nltk.lm.smoothing module

Smoothing algorithms for language modeling.

According to Chen & Goodman (1995), these algorithms should work with both backoff and interpolation.
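
Each class on this page exposes the same two methods, alpha_gamma(word, context) and unigram_score(word). The following is a minimal sketch, not NLTK's own code, of how an interpolated model could combine them, assuming the usual recursion: score = alpha + gamma * (score with a shorter context). ToySmoothing, interpolated_score, and the counts are hypothetical names and data made up for illustration.

```python
from collections import Counter


class ToySmoothing:
    """Hypothetical stand-in: keeps half of the mass on the observed counts."""

    def __init__(self, counts, vocab_size):
        self.counts = counts          # maps a context tuple to a Counter of next words
        self.vocab_size = vocab_size

    def alpha_gamma(self, word, context):
        seen = self.counts.get(context, Counter())
        total = sum(seen.values())
        if total == 0:
            return 0.0, 1.0           # unseen context: rely fully on the lower order
        return 0.5 * seen[word] / total, 0.5

    def unigram_score(self, word):
        return 1.0 / self.vocab_size  # uniform base distribution


def interpolated_score(smoothing, word, context):
    """Combine the higher-order estimate with the lower-order one recursively."""
    if not context:
        return smoothing.unigram_score(word)
    alpha, gamma = smoothing.alpha_gamma(word, context)
    # Drop the oldest context word and recurse on the shorter context.
    return alpha + gamma * interpolated_score(smoothing, word, context[1:])


# Made-up counts of words seen after the context ("the",).
counts = {("the",): Counter({"cat": 3, "dog": 1})}
toy = ToySmoothing(counts, vocab_size=4)
p_cat = interpolated_score(toy, "cat", ("the",))  # 0.5 * 3/4 + 0.5 * 1/4 = 0.5
```

In this sketch alpha carries the (reduced) evidence from the full context and gamma is the weight handed to the shorter context, so the scores over the four-word vocabulary still sum to one.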

class nltk.lm.smoothing.AbsoluteDiscounting[source]

Bases: Smoothing

Smoothing with absolute discount.

__init__(vocabulary, counter, discount=0.75, **kwargs)[source]
Parameters
  • vocabulary (nltk.lm.vocab.Vocabulary) – The Ngram vocabulary object.

  • counter (nltk.lm.counter.NgramCounter) – The counts of the vocabulary items.

  • discount (float) – The amount subtracted from each nonzero count. Defaults to 0.75.

alpha_gamma(word, context)[source]

unigram_score(word)[source]
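
As an illustration of what absolute discounting computes, here is a minimal standalone sketch using only the standard library. It assumes the textbook formulation: alpha is the discounted relative frequency of the word in its context, and gamma is the mass freed by discounting, which is passed to the lower-order estimate. The function name and the counts are hypothetical, and this is not presented as NLTK's implementation.

```python
from collections import Counter


def absolute_discount_alpha_gamma(context_counts, word, discount=0.75):
    """Return (alpha, gamma) for `word`, given counts of words seen after one context."""
    total = sum(context_counts.values())
    # Number of distinct word types seen at least once after this context.
    n_plus = sum(1 for c in context_counts.values() if c > 0)
    # Subtract the discount from the word's count, never going below zero.
    alpha = max(context_counts[word] - discount, 0.0) / total
    # The discount removed from every seen type becomes the lower-order weight.
    gamma = discount * n_plus / total
    return alpha, gamma


# Made-up counts of words following the context ("the",).
after_the = Counter({"cat": 3, "dog": 1})
alpha_cat, gamma = absolute_discount_alpha_gamma(after_the, "cat")
# alpha_cat = (3 - 0.75) / 4 = 0.5625, gamma = 0.75 * 2 / 4 = 0.375
```

With these counts the alphas of the seen words plus gamma add up to one, which is what lets gamma act as an interpolation weight.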
class nltk.lm.smoothing.KneserNey[source]

Bases: Smoothing

Kneser-Ney Smoothing.

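This extends absolute discounting: counts are reduced by a fixed discount, and the lower-order score reflects how many distinct contexts a word follows rather than how often the word occurs.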

Resources:
  • https://pages.ucsd.edu/~rlevy/lign256/winter2008/kneser_ney_mini_example.pdf
  • https://www.youtube.com/watch?v=ody1ysUTD7o
  • https://medium.com/@dennyc/a-simple-numerical-example-for-kneser-ney-smoothing-nlp-4600addf38b8
  • https://www.cl.uni-heidelberg.de/courses/ss15/smt/scribe6.pdf
  • https://www-i6.informatik.rwth-aachen.de/publications/download/951/Kneser-ICASSP-1995.pdf

__init__(vocabulary, counter, order, discount=0.1, **kwargs)[source]
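Parameters
  • vocabulary (nltk.lm.vocab.Vocabulary) – The Ngram vocabulary object.

  • counter (nltk.lm.counter.NgramCounter) – The counts of the vocabulary items.

  • order (int) – The highest n-gram order of the model.

  • discount (float) – The amount subtracted from each nonzero count. Defaults to 0.1.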

alpha_gamma(word, context)[source]

unigram_score(word)[source]
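
The part of Kneser-Ney that differs most from plain discounting is the lower-order score. The sketch below shows the textbook continuation probability for the bigram case: the share of distinct bigram types that end in a given word. It is a standalone illustration with a hypothetical function name and made-up counts, and it may differ in detail from how this class computes unigram_score.

```python
from collections import Counter


def continuation_unigram_score(bigram_counts, word):
    """Fraction of distinct seen bigram types whose second word is `word`."""
    seen_bigrams = [bigram for bigram, count in bigram_counts.items() if count > 0]
    ending_in_word = sum(1 for (_, second) in seen_bigrams if second == word)
    return ending_in_word / len(seen_bigrams)


# Made-up bigram counts: "francisco" is frequent but follows only one word,
# while "cat" is rarer per bigram but follows three different words.
bigrams = Counter({
    ("san", "francisco"): 5,
    ("the", "cat"): 1,
    ("a", "cat"): 1,
    ("my", "cat"): 1,
})

p_francisco = continuation_unigram_score(bigrams, "francisco")  # 1 / 4 = 0.25
p_cat = continuation_unigram_score(bigrams, "cat")              # 3 / 4 = 0.75
```

By raw frequency "francisco" would score 5/8, yet its continuation score is lower than that of "cat" because it appears after a single context only.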
class nltk.lm.smoothing.WittenBell[source]

Bases: Smoothing

Witten-Bell smoothing.

__init__(vocabulary, counter, **kwargs)[source]
Parameters
  • vocabulary (nltk.lm.vocab.Vocabulary) – The Ngram vocabulary object.

  • counter (nltk.lm.counter.NgramCounter) – The counts of the vocabulary items.

alpha_gamma(word, context)[source]

unigram_score(word)[source]
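
For comparison with the discount-based classes, here is a minimal standalone sketch of Witten-Bell weights. It assumes the standard formulation, in which the lower-order weight gamma grows with the number of distinct word types seen after the context, and alpha is the relative frequency scaled by 1 - gamma. The function name and counts are hypothetical, and the code is not presented as NLTK's implementation.

```python
from collections import Counter


def witten_bell_alpha_gamma(context_counts, word):
    """Return (alpha, gamma) for `word`, given counts of words seen after one context."""
    total = sum(context_counts.values())
    # Number of distinct word types seen at least once after this context.
    n_plus = sum(1 for c in context_counts.values() if c > 0)
    # Contexts followed by many different words give more weight to the lower order.
    gamma = n_plus / (n_plus + total)
    # Scale the relative frequency so that all alphas plus gamma sum to one.
    alpha = (1.0 - gamma) * context_counts[word] / total
    return alpha, gamma


# Made-up counts of words following the context ("the",).
after_the = Counter({"cat": 3, "dog": 1})
alpha_cat, gamma = witten_bell_alpha_gamma(after_the, "cat")
# gamma = 2 / (2 + 4), roughly 0.333; alpha_cat = (1 - gamma) * 3 / 4, roughly 0.5
```

No discount parameter is needed in this scheme, which matches the constructor signature above: the weight is derived entirely from the counts.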