nltk.corpus.util module

class nltk.corpus.util.LazyCorpusLoader

Bases: object

To see the API documentation for this lazily loaded corpus, first run corpus.ensure_loaded(), and then run help(this_corpus).

LazyCorpusLoader is a proxy object which is used to stand in for a corpus object before the corpus is loaded. This allows NLTK to create an object for each corpus, but defer the costs associated with loading those corpora until the first time that they’re actually accessed.

The first time this object is accessed in any way, it will load the corresponding corpus, and transform itself into that corpus (by modifying its own __class__ and __dict__ attributes).
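The self-transformation described above can be illustrated with a minimal, standalone sketch. It is not NLTK's implementation: the names LazyProxy, WordList, and build are hypothetical and exist only for this example. It assumes the real object is an ordinary Python class instance, so that its __class__ and __dict__ can be adopted by the proxy.

```python
class WordList:
    """Hypothetical stand-in for a real corpus reader."""

    def __init__(self, words):
        self._words = list(words)

    def words(self):
        return list(self._words)


class LazyProxy:
    """Hypothetical proxy that becomes the real object on first access."""

    def __init__(self, factory):
        # Stored in the instance __dict__, so reading it later does not
        # go through __getattr__.
        self._factory = factory

    def _load(self):
        real = self._factory()
        # Adopt the real object's state and type; from here on this
        # instance behaves exactly like a WordList.
        self.__dict__ = real.__dict__
        self.__class__ = real.__class__

    def __getattr__(self, attr):
        # Called only when normal lookup fails, i.e. before loading.
        if attr == "_factory":
            raise AttributeError(attr)
        self._load()
        return getattr(self, attr)


calls = []


def build():
    # Records each expensive load so the deferral is observable.
    calls.append("loaded")
    return WordList(["alpha", "beta"])


corpus = LazyProxy(build)
type_before = type(corpus).__name__  # nothing has been loaded yet
first = corpus.words()               # first access triggers the load
type_after = type(corpus).__name__   # the proxy has changed its class
```

Because the class is swapped after the first access, later attribute lookups no longer pass through the proxy's __getattr__, so the load happens at most once.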

If the corpus cannot be found, accessing this object raises an exception that displays installation instructions for the NLTK data package. Once the user has properly installed the data package (or modified nltk.data.path to point to its location), they can use the corpus object without restarting Python.
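The recovery behaviour can be sketched without NLTK as follows. Everything here is an assumption made for illustration: search_path stands in for nltk.data.path, available is a fake data store, and LazyCorpus and Corpus are hypothetical classes. The point is that a failed load leaves the proxy untouched, so the same object can be retried once the data becomes reachable.

```python
search_path = []  # hypothetical stand-in for nltk.data.path
available = {"/opt/data": {"demo": ["alpha", "beta"]}}  # fake data store


class Corpus:
    """Hypothetical loaded corpus."""

    def __init__(self, words):
        self._words = words

    def words(self):
        return list(self._words)


class LazyCorpus:
    """Hypothetical proxy that transforms itself only on a successful load."""

    def __init__(self, name):
        self._name = name

    def _load(self):
        for root in search_path:
            store = available.get(root, {})
            if self._name in store:
                real = Corpus(store[self._name])
                self.__dict__ = real.__dict__
                self.__class__ = real.__class__
                return
        # Nothing was modified, so a later access can try again.
        raise LookupError(
            f"Resource {self._name!r} not found; searched in {search_path}"
        )

    def __getattr__(self, attr):
        if attr == "_name":
            raise AttributeError(attr)
        self._load()
        return getattr(self, attr)


demo = LazyCorpus("demo")
try:
    demo.words()
    first_attempt = "ok"
except LookupError:
    first_attempt = "missing"

# Make the data reachable, then reuse the same object in the same session.
search_path.append("/opt/data")
second_attempt = demo.words()
```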

Parameters
  • name (str) – The name of the corpus

  • reader_cls – The specific CorpusReader class, e.g. PlaintextCorpusReader, WordListCorpusReader

  • nltk_data_subdir (str) – The subdirectory where the corpus is stored. Passed as a keyword argument.

  • *args – Any other non-keyword arguments that reader_cls might need.

  • **kwargs – Any other keyword arguments that reader_cls might need.

__init__(name, reader_cls, *args, **kwargs)