nltk.lm.preprocessing module

nltk.lm.preprocessing.flatten(iterable, /)

Alternative chain() constructor taking a single iterable argument that evaluates lazily.
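The description above matches the docstring of Python's itertools.chain.from_iterable. The following is a minimal sketch that assumes flatten behaves exactly like that function. It shows two things: items are pulled lazily from the outer iterable, and only one level of nesting is removed.

```python
from itertools import chain

# Assumption: flatten behaves like itertools.chain.from_iterable,
# whose docstring matches the description above.
flatten = chain.from_iterable

def sentences():
    # A generator, so nothing is produced until flatten is consumed
    yield ["a", "b"]
    yield ["c"]

words = flatten(sentences())
print(next(words))  # a
print(list(words))  # ['b', 'c']

# Only one level is flattened; the inner list ["y"] is kept as an item
print(list(flatten([["x", ["y"]], ["z"]])))  # ['x', ['y'], 'z']
```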

nltk.lm.preprocessing.padded_everygram_pipeline(order, text)

Default preprocessing for a sequence of sentences.

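Creates two iterators:

- sentences padded and turned into sequences of nltk.util.everygrams
- sentences padded as above and chained together for a flat stream of words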

Parameters:

order – Largest ngram length produced by everygrams.
text (Iterable[Iterable[str]]) – Text to iterate over. Expected to be an iterable of sentences.

Returns:

Two iterators: one over the text as ngrams and one over the text as vocabulary data.
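Below is a minimal standard-library sketch of what the pipeline produces. It is not NLTK's code. The helpers pad_both_ends and everygrams here are simplified stand-ins for the NLTK functions of the same names. They assume the default "<s>"/"</s>" padding symbols, and the n-gram ordering they use may differ from NLTK's. The sketch shows one property that follows from the pipeline's design: both returned iterators read from text. For that reason, text should be a re-iterable collection such as a list, not a one-shot generator.

```python
from itertools import chain
from typing import Iterable, Iterator, List, Sequence, Tuple

def pad_both_ends(sentence: Sequence[str], n: int) -> List[str]:
    # Simplified stand-in: n-1 "<s>" before and n-1 "</s>" after (assumed defaults)
    return ["<s>"] * (n - 1) + list(sentence) + ["</s>"] * (n - 1)

def everygrams(tokens: Sequence[str], max_len: int) -> Iterator[Tuple[str, ...]]:
    # Simplified stand-in: lazily yield every n-gram with 1 <= n <= max_len,
    # grouped by start position
    for start in range(len(tokens)):
        for n in range(1, min(max_len, len(tokens) - start) + 1):
            yield tuple(tokens[start:start + n])

def padded_everygram_pipeline(order: int, text: Iterable[Sequence[str]]):
    # Both iterators read `text`; a generator passed here would be exhausted
    # by whichever iterator is consumed first.
    train = (everygrams(pad_both_ends(sent, order), order) for sent in text)
    vocab = chain.from_iterable(pad_both_ends(sent, order) for sent in text)
    return train, vocab

text = [["a", "b", "c"], ["a", "c"]]
train, vocab = padded_everygram_pipeline(2, text)
print([list(sent) for sent in train])  # one list of n-grams per sentence
print(list(vocab))  # ['<s>', 'a', 'b', 'c', '</s>', '<s>', 'a', 'c', '</s>']
```

In NLTK itself, the two iterators are typically passed straight to a language model, for example MLE(order).fit(train, vocab) from nltk.lm.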

nltk.lm.preprocessing.padded_everygrams(order, sentence)

Helper with some useful defaults.

Applies pad_both_ends to sentence with n=order, then passes the padded sentence to everygrams with max_len=order, producing every n-gram of length 1 through order.
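The following is a minimal sketch of the two steps for a single sentence, written with the standard library only. It is not NLTK's implementation. It assumes the default "<s>"/"</s>" padding symbols and orders n-grams by start position, then by length, which may differ from the order NLTK's everygrams yields. The size of the result follows from simple counting. A sentence of L tokens is padded to P = L + 2(order - 1) tokens, so it yields the sum over k = 1..order of (P - k + 1) n-grams. For L = 2 and order = 2, that gives P = 4 and 4 + 3 = 7 n-grams.

```python
from typing import List, Sequence, Tuple

def padded_everygrams(order: int, sentence: Sequence[str]) -> List[Tuple[str, ...]]:
    # Step 1 (pad_both_ends): order-1 start and end symbols; "<s>"/"</s>" assumed
    tokens = ["<s>"] * (order - 1) + list(sentence) + ["</s>"] * (order - 1)
    # Step 2 (everygrams): every n-gram of length 1..order, grouped by start position
    return [
        tuple(tokens[start:start + n])
        for start in range(len(tokens))
        for n in range(1, min(order, len(tokens) - start) + 1)
    ]

grams = padded_everygrams(2, ["a", "b"])
print(grams)
# [('<s>',), ('<s>', 'a'), ('a',), ('a', 'b'), ('b',), ('b', '</s>'), ('</s>',)]
print(len(grams))  # 7
```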
