nltk.probability.KneserNeyProbDist

class nltk.probability.KneserNeyProbDist

Bases: ProbDistI

Kneser-Ney estimate of a probability distribution. This is a version of back-off that estimates how likely an n-gram is, given that its (n-1)-gram context was seen in training. It extends the ProbDistI interface and requires a trigram FreqDist instance to train on. A discount other than the default of 0.75 can optionally be specified.
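To make the role of the discount concrete, here is a minimal pure-Python sketch of a discounted relative frequency for a trigram that was seen in training. It illustrates the general idea of subtracting a fixed discount from each observed count; it is not a copy of the class's implementation, and the helper name discounted_trigram_prob is hypothetical.

```python
from collections import Counter


def discounted_trigram_prob(trigram, trigram_counts, discount=0.75):
    """Discounted relative frequency of a *seen* trigram (illustration only)."""
    w0, w1, _ = trigram
    # Count of the two-word context, summed over every observed continuation.
    context_count = sum(
        count for (a, b, _), count in trigram_counts.items() if (a, b) == (w0, w1)
    )
    # Subtract the discount from the trigram count before normalizing.
    return max(trigram_counts[trigram] - discount, 0.0) / context_count


# Hypothetical toy counts: "the cat" is followed by "sat" twice and "ran" once.
counts = Counter({("the", "cat", "sat"): 2, ("the", "cat", "ran"): 1})
p = discounted_trigram_prob(("the", "cat", "sat"), counts)
print(p)
```

The probability mass removed by the discount is what a back-off scheme can redistribute to continuations that were not observed after the same context.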

__init__(freqdist, bins=None, discount=0.75)
Parameters
  • freqdist (FreqDist) – The trigram frequency distribution upon which to base the estimation

  • bins (int or float) – Included for compatibility with nltk.tag.hmm

  • discount (float (preferred, but can be set to int)) – The discount applied when retrieving counts of trigrams
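As a rough usage sketch, the constructor can be fed a FreqDist of trigrams built with nltk.util.trigrams. The toy token list and the variable names below are assumptions made for illustration only.

```python
from nltk.probability import FreqDist, KneserNeyProbDist
from nltk.util import trigrams

# Toy corpus; any tokenized text would do.
tokens = "the cat sat on the mat and the cat sat on the hat".split()

# Count every trigram in the token sequence.
trigram_counts = FreqDist(trigrams(tokens))

# Train with the default discount ...
kn_default = KneserNeyProbDist(trigram_counts)
# ... or pass a different discount explicitly.
kn_custom = KneserNeyProbDist(trigram_counts, discount=0.5)

print(kn_default.discount(), kn_custom.discount())
```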

prob(trigram)

Return the probability for a given trigram. Probabilities are always real numbers in the range [0, 1].

Parameters

trigram (any) – The sample whose probability should be returned; for this class, a trigram of the kind counted in freqdist.

Return type

float
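A short sketch of querying prob, assuming the same kind of toy corpus as a training set. The trigrams are passed as 3-tuples, matching the keys of the trigram FreqDist; the example words are made up.

```python
from nltk.probability import FreqDist, KneserNeyProbDist
from nltk.util import trigrams

tokens = "the cat sat on the mat and the cat sat on the hat".split()
kn = KneserNeyProbDist(FreqDist(trigrams(tokens)))

# A trigram that occurs in the toy corpus.
p_seen = kn.prob(("the", "cat", "sat"))

# A trigram built from words that never occur in the corpus.
p_unseen = kn.prob(("purple", "dog", "flew"))

print(p_seen, p_unseen)
```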

discount()

Return the value by which counts are discounted. The default is 0.75.

Return type

float

set_discount(discount)

Set the value by which counts are discounted.

Parameters

discount (float (preferred, but int possible)) – The new value by which to discount counts

Return type

None
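The two accessors can be exercised together. This is a minimal sketch on an assumed toy corpus, not a recommendation for any particular discount value.

```python
from nltk.probability import FreqDist, KneserNeyProbDist
from nltk.util import trigrams

tokens = "the cat sat on the mat and the cat sat on the hat".split()
kn = KneserNeyProbDist(FreqDist(trigrams(tokens)))

# Read the discount the distribution was created with.
before = kn.discount()

# Replace it; set_discount returns None.
result = kn.set_discount(0.5)
after = kn.discount()

print(before, after)
```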

samples()

Return a list of all samples that have nonzero probabilities. Use prob to find the probability of each sample.

Return type

list

max()

Return the sample with the greatest probability. If two or more samples have the same probability, return one of them; which sample is returned is undefined.

Return type

any
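A brief sketch of samples and max on an assumed toy corpus. The result of samples() is wrapped in list() so the code does not depend on the exact collection type returned. Because several trigrams tie for the top count here, the example does not assume which one max() returns.

```python
from nltk.probability import FreqDist, KneserNeyProbDist
from nltk.util import trigrams

tokens = "the cat sat on the mat and the cat sat on the hat".split()
kn = KneserNeyProbDist(FreqDist(trigrams(tokens)))

# Materialize the samples so they can be iterated more than once.
all_samples = list(kn.samples())

# Look up the probability of each sample with prob().
for sample in all_samples:
    print(sample, kn.prob(sample))

best = kn.max()
print("max:", best)
```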

SUM_TO_ONE = True

True if the probabilities of the samples in this probability distribution will always sum to one.

generate()

Return a randomly selected sample from this probability distribution. The probability of returning each sample samp is equal to self.prob(samp).

logprob(sample)

Return the base 2 logarithm of the probability for a given sample.

Parameters

sample (any) – The sample whose probability should be returned.

Return type

float
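As a quick check of the relationship between logprob and prob, the sketch below compares logprob with the base 2 logarithm of prob for a trigram that has nonzero probability. The corpus and variable names are illustrative assumptions.

```python
import math

from nltk.probability import FreqDist, KneserNeyProbDist
from nltk.util import trigrams

tokens = "the cat sat on the mat and the cat sat on the hat".split()
kn = KneserNeyProbDist(FreqDist(trigrams(tokens)))

trigram = ("the", "cat", "sat")  # occurs in the toy corpus
lp = kn.logprob(trigram)

# For a nonzero probability, this should agree with log2 of prob().
print(lp, math.log2(kn.prob(trigram)))
```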