Running a spider
After a successful installation, run a spider: ReadMoreSpider, or read-more for short.
uv run scrapy crawl read-more -a 'urls=https://www.example.org/' -O foo.jsonl
The command above crawls the URL and stores the scraped text in
foo.jsonl. The file is in JSONL (JSON Lines) format, a common format for processing large sets of text data: each line is a single JSON object.
Open the file with your favorite text editor; you will see one JSON object per line.
Piped through jq, the output becomes human friendly:
jq < foo.jsonl
{
"acquired_time": "2026-01-24T16:12:38.376752+00:00",
"body": "<main>\n <p>This domain is for use in documentation examples without needing permission. Avoid use in operations.</p>\n <p>Learn more</p>\n </main>",
"url": "https://www.example.org/",
"lang": "en",
"author": null,
"description": null,
"kind": null,
"modified_time": null,
"published_time": null,
"site_name": null,
"title": "Example Domain",
"item_type": "ArticleItem",
"character_count": 96,
"sources": []
}
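Because each line is an independent JSON object, the file can be processed incrementally without loading it all into memory. A minimal sketch of reading the output in Python (the field names such as "url" and "title" are taken from the sample record above; the helper name load_items is an assumption, not part of the project):

```python
import json

def load_items(path):
    """Read a JSONL file and return a list of dicts, one per line.

    Blank lines are skipped; each remaining line is parsed as one
    JSON object, matching the one-object-per-line format shown above.
    """
    items = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                items.append(json.loads(line))
    return items
```

For very large crawls you could yield each item instead of building a list, which keeps memory usage constant regardless of file size.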