- nltk.lm.preprocessing.flatten(iterable, /)
Alternative itertools.chain() constructor: takes a single iterable of iterables and lazily yields their items in order.
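As a rough illustration of the lazy flattening described above, the sketch below uses `itertools.chain.from_iterable` directly. It assumes that `flatten` behaves the same way (the description matches that function's docstring). The sample sentences are made up.

```python
from itertools import chain

# Assumption: nltk.lm.preprocessing.flatten behaves like chain.from_iterable.
flatten = chain.from_iterable

sentences = [["a", "b"], ["c"], []]   # hypothetical tokenized sentences

stream = flatten(sentences)   # nothing is consumed yet; evaluation is lazy
first = next(stream)          # pulls only the first word
rest = list(stream)           # drains whatever is left

print(first, rest)
```

Because the result is a one-shot iterator, it has to be rebuilt if the flat stream is needed a second time.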
- nltk.lm.preprocessing.padded_everygrams(order, sentence)
Helper with some useful defaults.
Applies pad_both_ends to sentence, then passes the padded result to everygrams, with order as the largest ngram length.
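A minimal usage sketch of the two steps, assuming NLTK is installed and that `pad_both_ends` accepts the ngram order through its `n` keyword. The two-word sentence is made up. The ngrams are sorted before printing because the order in which `everygrams` yields them may differ between NLTK versions.

```python
from nltk.lm.preprocessing import pad_both_ends, padded_everygrams

sentence = ["a", "b"]   # hypothetical tokenized sentence

# Step 1 on its own: padding for order 2 adds one symbol on each side.
padded = list(pad_both_ends(sentence, n=2))

# Both steps together: every unigram and bigram of the padded sentence.
grams = sorted(padded_everygrams(2, sentence), key=lambda g: (len(g), g))

print(padded)
for gram in grams:
    print(gram)
```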
- nltk.lm.preprocessing.padded_everygram_pipeline(order, text)
Default preprocessing for a sequence of sentences.
Creates two iterators:
  - sentences padded and turned into sequences of nltk.util.everygrams
  - sentences padded as above and chained together for a flat stream of words
Parameters:
order – Largest ngram length produced by everygrams.
text (Iterable[Iterable[str]]) – Text to iterate over. Expected to be an iterable of sentences.
Returns:
iterator over text as ngrams, iterator over text as vocabulary data
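The pipeline can be sketched as follows, assuming NLTK is installed. The tiny corpus and the variable names `train` and `vocab` are illustrative only. A list is used for `text` on the assumption that each returned iterator traverses it independently.

```python
from nltk.lm.preprocessing import padded_everygram_pipeline

# Hypothetical corpus: a list of tokenized sentences.
text = [["a", "b"], ["c"]]

train, vocab = padded_everygram_pipeline(2, text)

# Second iterator: one flat stream of padded words, for building a vocabulary.
vocab_words = list(vocab)

# First iterator: one sequence of everygrams per sentence.
train_grams = [list(sentence_grams) for sentence_grams in train]

print(vocab_words)
print([len(sentence_grams) for sentence_grams in train_grams])
```

Both return values are single-use iterators, so they are normally handed straight to a consumer, for example a language model's `fit(train, vocab)` call, rather than materialized as lists the way this sketch does for inspection.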