nltk.lm.preprocessing module

nltk.lm.preprocessing.flatten(iterable, /)

Alias of itertools.chain.from_iterable: an alternative chain() constructor that takes a single iterable of iterables and evaluates it lazily.
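As a minimal illustration of the lazy flattening described above, the sketch below uses only the standard library and assumes that flatten behaves like itertools.chain.from_iterable. The sample data is made up for the example.

```python
from itertools import chain

# Assumption: flatten behaves like itertools.chain.from_iterable.
flatten = chain.from_iterable

sentences = [["a", "b"], ["c"], []]

flat = flatten(sentences)  # lazy: no inner iterable has been consumed yet
words = list(flat)         # consuming the iterator yields one flat sequence
print(words)
```

The result is a one-shot iterator, so it has to be rebuilt if the flat stream is needed a second time.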

nltk.lm.preprocessing.padded_everygram_pipeline(order, text)

Default preprocessing for a sequence of sentences.

Creates two iterators:

  • sentences padded and turned into sequences of nltk.util.everygrams

  • sentences padded as above and chained together for a flat stream of words

Parameters
  • order – Largest ngram length produced by everygrams.

  • text (Iterable[Iterable[str]]) – Text to iterate over. Expected to be an iterable of sentences.

Returns

A pair of iterators: the first yields the ngrams of each padded sentence, the second yields the padded words of the whole text as vocabulary data.
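The shape of the return value can be sketched without NLTK. The code below is a stdlib-only imitation, not the library's implementation: pipeline_sketch, pad, and ngrams_up_to are hypothetical names, and the padding symbols "<s>" and "</s>" with order - 1 pads on each side are assumptions based on NLTK's usual defaults. The ordering of ngrams produced by the real everygrams may also differ.

```python
from itertools import chain


def pipeline_sketch(order, text):
    """Hypothetical imitation of the two-iterator return value."""
    text = list(text)  # sketch only: materialise so both streams can read it

    def pad(sent):
        # Assumed padding: order - 1 symbols on each side.
        return ["<s>"] * (order - 1) + list(sent) + ["</s>"] * (order - 1)

    def ngrams_up_to(tokens):
        # Every ngram of length 1..order, as tuples.
        return (
            tuple(tokens[i:i + n])
            for n in range(1, order + 1)
            for i in range(len(tokens) - n + 1)
        )

    ngram_stream = (ngrams_up_to(pad(sent)) for sent in text)
    vocab_stream = chain.from_iterable(map(pad, text))
    return ngram_stream, vocab_stream


text = [["a", "b"], ["c"]]
train, vocab = pipeline_sketch(2, text)

vocab_words = list(vocab)
per_sentence = [list(sent_ngrams) for sent_ngrams in train]
print(vocab_words)
print(per_sentence)
```

Both streams are single-pass iterators. In typical NLTK usage they are handed straight to a language model's fit method, so the pipeline has to be called again if either stream is needed a second time.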

nltk.lm.preprocessing.padded_everygrams(order, sentence)

Helper with some useful defaults.

Applies pad_both_ends to sentence, then passes the padded sentence to everygrams with order as the largest ngram length.
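For a single sentence, the pad-then-everygrams step can be pictured with the stdlib-only sketch below. The helpers pad_both_ends_sketch and everygrams_sketch are hypothetical stand-ins, not NLTK functions. The pad symbols and the n - 1 pad count are assumed defaults, and the real everygrams may return the same ngrams in a different order.

```python
def pad_both_ends_sketch(sentence, n, left="<s>", right="</s>"):
    """Hypothetical stand-in: add n - 1 pad symbols on each side."""
    return [left] * (n - 1) + list(sentence) + [right] * (n - 1)


def everygrams_sketch(tokens, max_len):
    """Hypothetical stand-in: every ngram of length 1..max_len, as tuples."""
    return [
        tuple(tokens[i:i + size])
        for size in range(1, max_len + 1)
        for i in range(len(tokens) - size + 1)
    ]


padded = pad_both_ends_sketch(["a", "b"], n=2)
grams = everygrams_sketch(padded, max_len=2)
print(padded)
print(grams)
```

With order 2 the padded sentence has four tokens, which gives four unigrams and three bigrams.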