generic.pipelines
Module Contents
Classes
- Drops items without text.
- Save FeedItem on local disk.
- Process FileItem. This pipeline should be placed before FileItemStoragePipeline.
- Save FileItem on local disk. This pipeline should be at the end of ITEM_PIPELINES.
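Both ordering constraints above correspond to Scrapy's `ITEM_PIPELINES` setting, where pipelines with lower numbers run first. The following is a minimal sketch of a `settings.py` fragment. It assumes the order values 300 and 900, which were chosen only for illustration, and a hypothetical helper `pipeline_order_ok` that checks the constraints. The two pipelines whose class names are not documented on this page are left out.

```python
# settings.py fragment (sketch). Scrapy runs item pipelines in ascending
# order of these integers, conventionally within the 0-1000 range.
STORAGE = "generic.pipelines.FileItemStoragePipeline"

ITEM_PIPELINES = {
    "generic.pipelines.FileItemPipeline": 300,  # assumed value: runs before storage
    STORAGE: 900,  # assumed value: must be the last pipeline
}


def pipeline_order_ok(pipelines: dict[str, int]) -> bool:
    """Hypothetical helper: check that storage runs strictly after every other pipeline.

    Because FileItemPipeline is one of the "other" pipelines, this also
    checks that it runs before FileItemStoragePipeline.
    """
    store = pipelines[STORAGE]
    return all(order < store for name, order in pipelines.items() if name != STORAGE)
```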
API
- class generic.pipelines.FileItemPipeline
Process FileItem. This pipeline should be placed before FileItemStoragePipeline.
This pipeline expects each FileItem to have a filename with a proper file extension.
The pipeline's purpose is to:
1. Generate a unique, hashed file name.
2. Process FileItems if necessary, e.g., by adding context or metadata to the FileItem.
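The hashed-name step can be sketched as shown below. This sketch assumes that the hash is computed over the file's bytes with SHA-256 and that the original extension is kept, lowercased. The helper name `hashed_filename` is hypothetical, and the actual pipeline may hash a different input, such as the source URL.

```python
import hashlib
import os


def hashed_filename(original_name: str, content: bytes) -> str:
    """Sketch: derive a unique name from the file's bytes, keeping its extension."""
    _, ext = os.path.splitext(original_name)  # e.g. ".PDF" for "Report.PDF"
    if not ext:
        # The pipeline expects a filename with a proper extension.
        raise ValueError(f"filename {original_name!r} has no file extension")
    digest = hashlib.sha256(content).hexdigest()
    return f"{digest}{ext.lower()}"
```

With a content hash, identical files map to the same name whatever they were originally called, and different content yields a different name.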
- process_item(item: generic.items.FileItem, spider: scrapy.Spider) -> generic.items.FileItem
Process FileItem in three steps:
1. Call a specific method, such as process_pdf_item, to process the FileItem.
2. Generate a unique, hashed file name.
3. Create a new FileItem with the generated file name.
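Here is a minimal sketch of these three steps. It uses a hypothetical dataclass stand-in for `generic.items.FileItem` with `filename`, `content` and `metadata` fields. It also dispatches by file extension to methods named `process_<ext>_item`. This page does not confirm the field names or the dispatch rule. The PDF handler shown here only tags the item and does not edit the PDF itself.

```python
import hashlib
import os
from dataclasses import dataclass, field, replace


@dataclass
class FileItem:
    """Hypothetical stand-in for generic.items.FileItem."""
    filename: str
    content: bytes
    metadata: dict = field(default_factory=dict)


class FileItemPipelineSketch:
    """Sketch of the documented steps; not the actual implementation."""

    def process_item(self, item: FileItem, spider=None) -> FileItem:
        ext = os.path.splitext(item.filename)[1].lower().lstrip(".")
        if not ext:
            raise ValueError(f"filename {item.filename!r} has no file extension")
        # 1. Dispatch to a type-specific method, e.g. process_pdf_item for ".pdf".
        handler = getattr(self, f"process_{ext}_item", None)
        if handler is not None:
            item = handler(item, spider)
        # 2. Generate a unique, hashed file name that keeps the extension.
        new_name = f"{hashlib.sha256(item.content).hexdigest()}.{ext}"
        # 3. Return a new FileItem rather than mutating the incoming one.
        return replace(item, filename=new_name)

    def process_pdf_item(self, item: FileItem, spider=None) -> FileItem:
        # Placeholder: records metadata on the item, not inside the PDF.
        return replace(item, metadata={**item.metadata, "type": "pdf"})
```

`dataclasses.replace` returns a copy, so the original item is left untouched. This matches the documented "create a new FileItem" step.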
- process_pdf_item(item: generic.items.FileItem, spider: scrapy.Spider) -> generic.items.FileItem
Process a PDF FileItem by adding metadata to the PDF.
- class generic.pipelines.FileItemStoragePipeline
Save FileItem on local disk. This pipeline should be at the end of ITEM_PIPELINES.
- process_item(item, spider)
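This page does not document this method further. The sketch below shows one way to save a FileItem on local disk. It assumes the item's `filename` and `content` fields are read dict-style, which `scrapy.Item` supports, and it takes a hypothetical `files_dir` constructor argument. The real class may differ.

```python
import os
from pathlib import Path


class FileItemStoragePipelineSketch:
    """Sketch: write each FileItem's bytes into a local directory."""

    def __init__(self, files_dir: str = "downloads"):
        self.files_dir = Path(files_dir)  # hypothetical output location

    def process_item(self, item, spider=None):
        self.files_dir.mkdir(parents=True, exist_ok=True)
        # Keep only the base name so a filename cannot escape files_dir.
        target = self.files_dir / os.path.basename(item["filename"])
        target.write_bytes(item["content"])
        # Scrapy expects process_item to return the item (or raise DropItem).
        return item
```

The sketch returns the item unchanged, which follows the usual Scrapy pipeline contract even when this is the last pipeline.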