DirectorySpider
A spider that crawls pages under a directory. This spider is useful for scraping ArticleItem from only part of a website.
How It Works
The directory to crawl is the directory that contains the last component of the given URL's path.
When the URL is http://example.org/index.html, it crawls every page under /, that is, the whole site.
When the URL is http://example.org/foo/index.html, it crawls pages under /foo/.
When the URL is http://example.org/foo/bar/index.html, it crawls pages under /foo/bar/ but not /foo/bar.html.
When start_urls is http://example.org/a/b/c.html:
It crawls /a/b/index.html.
It crawls /a/b/foo.html.
It crawls /a/b/c/bar.html.
It does not crawl /index.html.
It does not crawl /a/index.html.
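
The scoping rule described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the spider's actual implementation: the helper names `base_directory` and `in_scope` are hypothetical, and the same-host check is an assumption that the text above does not state explicitly.

```python
from urllib.parse import urlsplit


def base_directory(url: str) -> str:
    """Return the directory part of the URL path, always ending in '/'.

    Hypothetical helper: drops the last path component,
    e.g. '/a/b/c.html' -> '/a/b/'.
    """
    path = urlsplit(url).path or "/"
    return path.rsplit("/", 1)[0] + "/"


def in_scope(start_url: str, candidate_url: str) -> bool:
    """Return True if candidate_url lies under the start URL's directory.

    Assumption: only URLs on the same host as start_url are considered.
    """
    start = urlsplit(start_url)
    candidate = urlsplit(candidate_url)
    if candidate.netloc != start.netloc:
        return False
    candidate_path = candidate.path or "/"
    return candidate_path.startswith(base_directory(start_url))


start = "http://example.org/a/b/c.html"
for path in ["/a/b/index.html", "/a/b/c/bar.html", "/index.html", "/a/index.html"]:
    print(path, in_scope(start, "http://example.org" + path))
```

Because the base directory always ends with a slash, a plain prefix test is enough to tell /foo/bar/ apart from /foo/bar.html.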