rokujo-collector-scrapy-generic

generic.utils.text_parser

Module Contents

Classes

ArticleTextParser

API

class generic.utils.text_parser.ArticleTextParser
    parse(html_string)
    _clean(text)
    _remove_element(el)
    segment(text)
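
The reference above lists only method names and parameters; it does not document return types or behaviour. The following is a minimal sketch of how a parser with this shape might be used. It assumes that parse() returns cleaned plain text and that segment() returns a list of sentences. SketchArticleTextParser and _TextExtractor are hypothetical stand-ins built on the Python standard library, not the actual generic.utils.text_parser.ArticleTextParser. The sketch skips script and style content while reading instead of implementing _remove_element(el), whose element type is not stated here.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects text, skipping <script> and <style> bodies."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside every skipped element.
        if self._skip_depth == 0:
            self.chunks.append(data)


class SketchArticleTextParser:
    """Hypothetical stand-in; NOT the real ArticleTextParser implementation."""

    def parse(self, html_string):
        # Assumption: parse() turns an HTML string into cleaned plain text.
        extractor = _TextExtractor()
        extractor.feed(html_string)
        extractor.close()
        return self._clean(" ".join(extractor.chunks))

    def _clean(self, text):
        # Assumption: cleaning means collapsing whitespace runs and trimming.
        return re.sub(r"\s+", " ", text).strip()

    def segment(self, text):
        # Assumption: naive sentence split after '.', '!' or '?' plus whitespace.
        return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


parser = SketchArticleTextParser()
text = parser.parse(
    "<p>Hello  world.</p><script>var x = 1;</script><p>Second one!</p>"
)
sentences = parser.segment(text)
print(text)
print(sentences)
```

The real class may segment text differently (for example with language-aware rules), so treat the regular-expression split here as a placeholder only.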


© Copyright 2026, Tomoyuki Sakurai.

Created using Sphinx 9.0.4.

Built with the PyData Sphinx Theme 0.16.1.