`generic.spiders.xml`

Module Contents

Classes

`XmlSpiderConfig`	A configuration class for XmlSpider.
`XmlSpider`	A spider that scrapes `ArticleItem` objects from the links listed in an XML document. It is useful when the list of links arrives in a dynamic XML response that the browser renders.

API

class generic.spiders.xml.XmlSpiderConfig(/, **data: Any)

Bases: generic.spiders.base.GenericSpiderConfig

A configuration class for XmlSpider.

Initialization

Create a new model by parsing and validating input data from keyword arguments.

Raises `pydantic_core.ValidationError` if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.
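The positional-only `self` can be illustrated in plain Python. `ToyModel` and `ToyModelWithoutSlash` are hypothetical stand-ins that only mimic the constructor signature; they are not pydantic's actual `BaseModel`. With the `/` marker, a keyword argument named `self` is collected into `**data`; without it, Python tries to bind it to the instance parameter and raises a `TypeError`.

```python
from typing import Any


class ToyModel:
    """Hypothetical stand-in mimicking the signature __init__(self, /, **data)."""

    def __init__(self, /, **data: Any) -> None:
        # `self` is positional-only, so a keyword argument named "self"
        # is collected into **data instead of clashing with the instance.
        self.fields = data


class ToyModelWithoutSlash:
    """Same constructor without the positional-only marker."""

    def __init__(self, **data: Any) -> None:
        self.fields = data


m = ToyModel(self="a field literally named self", xml_link_xpath="//link/text()")
print(m.fields["self"])

try:
    ToyModelWithoutSlash(self="x")
    rejected = False
except TypeError:
    # Without "/", self="x" collides with the instance parameter
    # (multiple values for argument 'self').
    rejected = True
print(rejected)
```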

xml_link_xpath: str = None

XPath expression used to extract URLs from the XML response, e.g., `//link/text()`.
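To show what an expression like `//link/text()` selects, here is a small offline sketch using the standard library. The XML body and the `extract_links` helper are hypothetical. ElementTree implements only a subset of XPath and has no `text()` step, so the sketch selects every `<link>` element with `.//link` and reads its `.text`; in Scrapy, the full expression would instead be evaluated with `response.xpath("//link/text()").getall()`.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML response body listing article links.
XML_BODY = """
<rss><channel>
  <item><link>https://example.com/a</link></item>
  <item><link> https://example.com/b </link></item>
</channel></rss>
"""


def extract_links(body: str) -> list[str]:
    """Approximate the XPath //link/text() with ElementTree.

    ElementTree has no text() step, so this selects every <link>
    element anywhere under the root (".//link") and reads its .text.
    """
    root = ET.fromstring(body.strip())
    return [el.text.strip() for el in root.iterfind(".//link") if el.text]


print(extract_links(XML_BODY))
```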

class generic.spiders.xml.XmlSpider(*args, **kwargs)

Bases: generic.spiders.base.GenericSpider[generic.spiders.xml.XmlSpiderConfig]

A spider that scrapes `ArticleItem` objects from the links listed in an XML document. It is useful when the list of links arrives in a dynamic XML response that the browser renders.

Initialization

name = 'xml'

The human-friendly name of the spider.

classmethod get_config_class() → Type[generic.spiders.xml.XmlSpiderConfig]

Returns the config class for this spider.
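The base class is parameterized by its config type (`GenericSpider[XmlSpiderConfig]`), and `get_config_class()` exposes that type at runtime. The sketch below shows one way such a pattern can be used; `ToyBaseConfig`, `ToyXmlConfig`, `ToyGenericSpider`, `ToyXmlSpider`, and the `build_config` helper are all hypothetical, and dataclasses stand in for the real pydantic models, so no validation is performed.

```python
from dataclasses import dataclass
from typing import Generic, Optional, Type, TypeVar


@dataclass
class ToyBaseConfig:
    """Hypothetical stand-in for GenericSpiderConfig."""
    urls: list


@dataclass
class ToyXmlConfig(ToyBaseConfig):
    """Hypothetical stand-in for XmlSpiderConfig."""
    xml_link_xpath: Optional[str] = None


C = TypeVar("C", bound=ToyBaseConfig)


class ToyGenericSpider(Generic[C]):
    @classmethod
    def get_config_class(cls) -> Type[C]:
        # Each concrete spider overrides this to name its config class.
        raise NotImplementedError

    @classmethod
    def build_config(cls, data: dict) -> C:
        # Hypothetical helper: the base class builds a typed config
        # without knowing which subclass it is running in.
        return cls.get_config_class()(**data)


class ToyXmlSpider(ToyGenericSpider[ToyXmlConfig]):
    @classmethod
    def get_config_class(cls) -> Type[ToyXmlConfig]:
        return ToyXmlConfig


cfg = ToyXmlSpider.build_config(
    {"urls": ["https://example.com/feed.xml"], "xml_link_xpath": "//link/text()"}
)
print(type(cfg).__name__, cfg.xml_link_xpath)
```

Keeping the lookup in a classmethod lets shared base-class code construct the right config for any subclass.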

async start()

The entry point. Starts crawling from the given URLs.

parse_xml(response: scrapy.http.Response)

A handler to parse the XML response.

parse_content(response: scrapy.http.Response)

A handler to parse the article.

Yields: ArticleItem
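The three methods form a callback chain: `start` yields requests for the XML URLs, `parse_xml` yields a request per extracted link, and `parse_content` yields items. The offline sketch below mimics that hand-off. `Request`, `Response`, `ArticleItem`, `PAGES`, `ToyXmlCrawlSpider`, and `crawl` are hypothetical stand-ins, not Scrapy's classes or the real item. The real `parse_xml` presumably applies `xml_link_xpath`, whereas this toy hard-codes `<link>` elements, and Scrapy's engine downloads concurrently while this toy runs requests in FIFO order.

```python
import asyncio
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class Request:  # hypothetical stand-in for scrapy.Request
    url: str
    callback: Callable


@dataclass
class Response:  # hypothetical stand-in for scrapy.http.Response
    url: str
    text: str


@dataclass
class ArticleItem:  # hypothetical stand-in; real fields are not documented here
    url: str
    body: str


# Offline "web": URL -> response body, replacing real downloads.
PAGES = {
    "https://example.com/feed.xml": (
        "<links><link>https://example.com/a</link>"
        "<link>https://example.com/b</link></links>"
    ),
    "https://example.com/a": "Article A",
    "https://example.com/b": "Article B",
}


class ToyXmlCrawlSpider:
    def __init__(self, urls: list) -> None:
        self.urls = urls

    async def start(self):
        # Entry point: one request per start URL, routed to parse_xml.
        for url in self.urls:
            yield Request(url, self.parse_xml)

    def parse_xml(self, response: Response) -> Iterator[Request]:
        # Extract every <link> and schedule it for parse_content.
        root = ET.fromstring(response.text)
        for link in root.iter("link"):
            if link.text:
                yield Request(link.text.strip(), self.parse_content)

    def parse_content(self, response: Response) -> Iterator[ArticleItem]:
        # Turn an article page into an item.
        yield ArticleItem(url=response.url, body=response.text)


async def crawl(spider: ToyXmlCrawlSpider) -> list:
    """Tiny FIFO scheduler: fetch from PAGES, run callbacks, collect items."""
    queue = [req async for req in spider.start()]
    items = []
    while queue:
        req = queue.pop(0)
        for out in req.callback(Response(req.url, PAGES[req.url])):
            if isinstance(out, Request):
                queue.append(out)
            else:
                items.append(out)
    return items


items = asyncio.run(crawl(ToyXmlCrawlSpider(["https://example.com/feed.xml"])))
print([(item.url, item.body) for item in items])
```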
