nltk.lm.models module

Language Models

class nltk.lm.models.AbsoluteDiscountingInterpolated[source]

Bases: InterpolatedLanguageModel

Interpolated version of smoothing with absolute discount.

__init__(order, discount=0.75, **kwargs)[source]

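Creates a new LanguageModel. order is the highest ngram order of the model; discount is the fixed amount subtracted from each nonzero count (0.75 by default).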

Parameters
  • vocabulary (nltk.lm.Vocabulary or None) – If provided, this vocabulary will be used instead of creating a new one when training.

  • counter (nltk.lm.NgramCounter or None) – If provided, use this object to count ngrams.

  • ngrams_fn (function or None) – If given, defines how sentences in training text are turned to ngram sequences.

  • pad_fn (function or None) – If given, defines how sentences in training text are padded.

class nltk.lm.models.InterpolatedLanguageModel[source]

Bases: LanguageModel

Logic common to all interpolated language models.

The idea to abstract this comes from Chen & Goodman 1995. Do not instantiate this class directly!

__init__(smoothing_cls, order, **kwargs)[source]

Creates new LanguageModel.

Parameters
  • vocabulary (nltk.lm.Vocabulary or None) – If provided, this vocabulary will be used instead of creating a new one when training.

  • counter (nltk.lm.NgramCounter or None) – If provided, use this object to count ngrams.

  • ngrams_fn (function or None) – If given, defines how sentences in training text are turned to ngram sequences.

  • pad_fn (function or None) – If given, defines how sentences in training text are padded.

unmasked_score(word, context=None)[source]

Score a word given some optional context.

Concrete models are expected to provide an implementation. Note that this method does not mask its arguments with the OOV label. Use the score method for that.

Parameters
  • word (str) – Word for which we want the score

  • context (tuple(str) or None) – Context the word is in. If None, compute unigram score.

Return type

float
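The recursion shared by interpolated models can be sketched in plain Python. This is a minimal illustration, not NLTK's code: the name interpolated_score and the hand-written counts table are hypothetical, and absolute discounting is assumed as the smoothing scheme. Each level keeps a discounted share (alpha) of the higher-order estimate and hands the reserved mass (gamma) to the lower-order score.

```python
from collections import Counter

# Hypothetical toy counts: context tuple -> Counter of following words.
# The empty tuple holds the unigram counts.
counts = {
    (): Counter({"a": 2, "b": 1, "c": 3, "d": 1, "e": 1, "f": 1}),
    ("a",): Counter({"b": 1, "c": 1}),
}


def interpolated_score(word, context, counts, discount=0.75):
    """Absolute-discounting interpolation: alpha + gamma * lower-order score."""
    context = tuple(context)
    if not context:
        # Base case: relative frequency of the unigram.
        unigrams = counts[()]
        return unigrams[word] / sum(unigrams.values())
    seen = counts.get(context)
    if not seen:
        # No data for this context: defer entirely to the shorter context.
        return interpolated_score(word, context[1:], counts, discount)
    total = sum(seen.values())
    alpha = max(seen[word] - discount, 0.0) / total
    # Mass reserved for the lower order: one discount per distinct seen word.
    gamma = discount * len(seen) / total
    return alpha + gamma * interpolated_score(word, context[1:], counts, discount)


vocab = list(counts[()])
score_b_given_a = interpolated_score("b", ("a",), counts)
total_mass = sum(interpolated_score(w, ("a",), counts) for w in vocab)
```

In this sketch the scores for a fixed context sum to 1 over the vocabulary, because the mass removed by the discount is exactly what gamma redistributes.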

class nltk.lm.models.KneserNeyInterpolated[source]

Bases: InterpolatedLanguageModel

Interpolated version of Kneser-Ney smoothing.

__init__(order, discount=0.1, **kwargs)[source]

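Creates a new LanguageModel. order is the highest ngram order of the model; discount is the amount subtracted from each nonzero count (0.1 by default).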

Parameters
  • vocabulary (nltk.lm.Vocabulary or None) – If provided, this vocabulary will be used instead of creating a new one when training.

  • counter (nltk.lm.NgramCounter or None) – If provided, use this object to count ngrams.

  • ngrams_fn (function or None) – If given, defines how sentences in training text are turned to ngram sequences.

  • pad_fn (function or None) – If given, defines how sentences in training text are padded.

class nltk.lm.models.Laplace[source]

Bases: Lidstone

Implements Laplace (add one) smoothing.

Initialization is identical to that of LanguageModel because gamma is always 1.

__init__(*args, **kwargs)[source]

Creates new LanguageModel.

Parameters
  • vocabulary (nltk.lm.Vocabulary or None) – If provided, this vocabulary will be used instead of creating a new one when training.

  • counter (nltk.lm.NgramCounter or None) – If provided, use this object to count ngrams.

  • ngrams_fn (function or None) – If given, defines how sentences in training text are turned to ngram sequences.

  • pad_fn (function or None) – If given, defines how sentences in training text are padded.

class nltk.lm.models.Lidstone[source]

Bases: LanguageModel

Provides Lidstone-smoothed scores.

In addition to the initialization arguments from LanguageModel, it also requires gamma, the number added to every count.

__init__(gamma, *args, **kwargs)[source]

Creates new LanguageModel.

Parameters
  • vocabulary (nltk.lm.Vocabulary or None) – If provided, this vocabulary will be used instead of creating a new one when training.

  • counter (nltk.lm.NgramCounter or None) – If provided, use this object to count ngrams.

  • ngrams_fn (function or None) – If given, defines how sentences in training text are turned to ngram sequences.

  • pad_fn (function or None) – If given, defines how sentences in training text are padded.

unmasked_score(word, context=None)[source]

Additive smoothing: Lidstone (add gamma) in general, Laplace (add one) when gamma is 1.

To see which one applies, check the gamma attribute of the model.
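The additive formula can be written out in a few lines. The sketch below is a standalone illustration rather than NLTK's implementation; lidstone_score, the toy counts, and the vocabulary are hypothetical names chosen for the example. Every count is increased by gamma, and the denominator grows by gamma times the vocabulary size so the scores still sum to 1.

```python
from collections import Counter


def lidstone_score(word, context_counts, vocab_size, gamma):
    """(count + gamma) / (total + vocab_size * gamma) for one context."""
    total = sum(context_counts.values())
    return (context_counts[word] + gamma) / (total + vocab_size * gamma)


# Hypothetical counts of words observed after the context ("a",).
counts_after_a = Counter({"b": 1, "c": 1})
vocab = ["a", "b", "c", "d", "e"]

# gamma = 1 gives Laplace (add-one) smoothing: (1 + 1) / (2 + 5).
laplace_b = lidstone_score("b", counts_after_a, len(vocab), gamma=1.0)

# gamma = 0.5 on an unseen word: (0 + 0.5) / (2 + 2.5).
lidstone_d = lidstone_score("d", counts_after_a, len(vocab), gamma=0.5)

# The smoothed scores for one context form a probability distribution.
total_mass = sum(lidstone_score(w, counts_after_a, len(vocab), 0.5) for w in vocab)
```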

class nltk.lm.models.MLE[source]

Bases: LanguageModel

Class for providing MLE ngram model scores.

Inherits initialization from LanguageModel.

unmasked_score(word, context=None)[source]

Returns the MLE score for a word given a context.

Parameters
  • word – Expected to be a string.

  • context – Expected to be something reasonably convertible to a tuple.
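As a usage sketch, the snippet below trains a bigram MLE model on a toy corpus and calls unmasked_score directly. It assumes NLTK is installed and uses padded_everygram_pipeline from nltk.lm.preprocessing to build the training ngrams; the corpus and variable names are illustrative only.

```python
from nltk.lm import MLE
from nltk.lm.preprocessing import padded_everygram_pipeline

# Toy corpus: a list of tokenized sentences.
text = [["a", "b", "c"], ["a", "c", "d", "c", "e", "f"]]

# Build padded unigrams and bigrams, plus the flat word stream for the vocabulary.
train, vocab_text = padded_everygram_pipeline(2, text)

lm = MLE(2)
lm.fit(train, vocab_text)

# Unigram score: "a" occurs 2 times among 13 padded tokens.
unigram_a = lm.unmasked_score("a")

# Bigram score: "b" follows "a" in 1 of the 2 occurrences of "a".
bigram_b_given_a = lm.unmasked_score("b", ("a",))

# Unseen continuations get a score of zero under MLE.
unseen_f_given_a = lm.unmasked_score("f", ("a",))
```

Because unmasked_score does not map unknown words to the OOV label, the score method is usually the safer entry point for text that may contain words outside the vocabulary.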

class nltk.lm.models.StupidBackoff[source]

Bases: LanguageModel

Provides StupidBackoff scores.

In addition to the initialization arguments from LanguageModel, it also requires a parameter alpha, by which the lower-order scores are scaled. Note that the result is not a true probability distribution, as scores for ngrams of the same order do not sum to unity.

__init__(alpha=0.4, *args, **kwargs)[source]

Creates new LanguageModel.

Parameters
  • vocabulary (nltk.lm.Vocabulary or None) – If provided, this vocabulary will be used instead of creating a new one when training.

  • counter (nltk.lm.NgramCounter or None) – If provided, use this object to count ngrams.

  • ngrams_fn (function or None) – If given, defines how sentences in training text are turned to ngram sequences.

  • pad_fn (function or None) – If given, defines how sentences in training text are padded.

unmasked_score(word, context=None)[source]

Score a word given some optional context.

Concrete models are expected to provide an implementation. Note that this method does not mask its arguments with the OOV label. Use the score method for that.

Parameters
  • word (str) – Word for which we want the score

  • context (tuple(str) or None) – Context the word is in. If None, compute unigram score.

Return type

float
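The backoff rule behind these scores can be sketched without NLTK. The function name stupid_backoff and the hand-written counts table are hypothetical, and the sketch assumes the usual formulation: use the relative frequency when the ngram was seen, otherwise drop the leftmost context word and multiply by alpha.

```python
from collections import Counter

# Hypothetical toy counts: context tuple -> Counter of following words.
# The empty tuple holds the unigram counts.
counts = {
    (): Counter({"a": 2, "b": 1, "c": 3, "d": 1, "e": 1, "f": 1}),
    ("a",): Counter({"b": 1, "c": 1}),
}


def stupid_backoff(word, context, counts, alpha=0.4):
    """Relative frequency if seen, else alpha times the shorter-context score."""
    context = tuple(context)
    if not context:
        # Base case: unigram relative frequency.
        unigrams = counts[()]
        return unigrams[word] / sum(unigrams.values())
    seen = counts.get(context, Counter())
    if seen[word] > 0:
        return seen[word] / sum(seen.values())
    # Unseen ngram: back off to a shorter context, scaled by alpha.
    return alpha * stupid_backoff(word, context[1:], counts, alpha)


seen_score = stupid_backoff("b", ("a",), counts)      # 1 / 2
backed_off = stupid_backoff("d", ("a",), counts)      # 0.4 * (1 / 9)
total_mass = sum(stupid_backoff(w, ("a",), counts) for w in counts[()])
```

Here total_mass exceeds 1, which illustrates why the scores are not a true probability distribution: seen continuations already use all of the mass, and backed-off scores are added on top.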

class nltk.lm.models.WittenBellInterpolated[source]

Bases: InterpolatedLanguageModel

Interpolated version of Witten-Bell smoothing.

__init__(order, **kwargs)[source]

Creates new LanguageModel.

Parameters
  • vocabulary (nltk.lm.Vocabulary or None) – If provided, this vocabulary will be used instead of creating a new one when training.

  • counter (nltk.lm.NgramCounter or None) – If provided, use this object to count ngrams.

  • ngrams_fn (function or None) – If given, defines how sentences in training text are turned to ngram sequences.

  • pad_fn (function or None) – If given, defines how sentences in training text are padded.
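A short sketch of how the constructor arguments fit together, assuming NLTK is installed: the vocabulary is built up front and passed through the vocabulary keyword, so fit only needs the training ngrams. The toy corpus and variable names are illustrative.

```python
from nltk.lm import Vocabulary, WittenBellInterpolated
from nltk.lm.preprocessing import padded_everygram_pipeline

# Toy corpus: a list of tokenized sentences.
text = [["a", "b", "c"], ["a", "c", "d", "c", "e", "f"]]

# Padded unigrams and bigrams for training, plus the flat padded word stream.
train, padded_words = padded_everygram_pipeline(2, text)

# Build the vocabulary ahead of time instead of letting fit() create it.
vocab = Vocabulary(padded_words, unk_cutoff=1)

lm = WittenBellInterpolated(2, vocabulary=vocab)
lm.fit(train)

# score() masks out-of-vocabulary words before delegating to unmasked_score().
seen = lm.score("b", ["a"])
unseen = lm.score("d", ["a"])
```

With interpolation, a continuation that never followed "a" in training can still receive a nonzero score from the unigram level, which is the main practical difference from MLE.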