XmlSpider

A spider that scrapes ArticleItem objects from the links in an XML file. It is useful when a site delivers its list of links in a dynamically loaded XML response that the browser then renders.

Usage

In addition to urls, xml_link_xpath is required: an XPath expression that selects the links in the XML file.

uv run scrapy crawl -a "urls=http://example.org/latest.xml" -a "xml_link_xpath=//link/text()" xml

How It Works

The spider:

fetches the XML file
parses the file and extracts URLs with the given XPath expression
generates an ArticleItem for each extracted URL
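
The parsing step above can be sketched in a few lines of Python. This is only an illustration, not the spider's actual code: it uses the standard library's xml.etree.ElementTree, which supports a limited XPath subset, so the path `.//link` plus reading each element's text stands in for `//link/text()`. The sample XML and the helper name extract_links are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample standing in for the fetched XML response.
SAMPLE_XML = """<items>
  <item><link>http://example.org/articles/1</link></item>
  <item><link> http://example.org/articles/2 </link></item>
  <item><link></link></item>
</items>
"""


def extract_links(xml_text: str, path: str = ".//link") -> list[str]:
    """Return the stripped, non-empty text of every element matching `path`.

    `path` uses ElementTree's XPath subset; it is an assumption of this
    sketch and not the xml_link_xpath syntax the spider accepts.
    """
    root = ET.fromstring(xml_text)
    links = []
    for element in root.iterfind(path):
        text = (element.text or "").strip()
        if text:  # skip empty <link/> elements
            links.append(text)
    return links


if __name__ == "__main__":
    # Each extracted URL is what the spider would go on to scrape.
    for url in extract_links(SAMPLE_XML):
        print(url)
```

In the spider itself, each URL extracted this way would then be requested and turned into an ArticleItem.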

Although the spider can collect ArticleItem objects from RSS feeds (which are XML), the feed spider does a better job for that purpose.