nltk.collocations.BigramCollocationFinder

class nltk.collocations.BigramCollocationFinder

Bases: AbstractCollocationFinder

A tool for finding bigram collocations and ranking them by an association measure. It is usually more convenient to use from_words() than to construct an instance directly.

default_ws = 2

__init__(word_fd, bigram_fd, window_size=2)

Construct a BigramCollocationFinder, given FreqDists for appearances of words and (possibly non-contiguous) bigrams.
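As a rough illustration, the two frequency distributions can be built by hand and passed to the constructor. This is only a sketch: the token list is made up, and the attribute names N, ngram_fd and window_size come from the library source rather than from the description above.

```python
from nltk.collocations import BigramCollocationFinder
from nltk.probability import FreqDist

# Toy data (hypothetical); any sequence of hashable tokens would do.
tokens = "the quick brown fox jumps over the lazy dog".split()

# Word counts and contiguous-bigram counts, built manually.
word_fd = FreqDist(tokens)
bigram_fd = FreqDist(zip(tokens, tokens[1:]))

finder = BigramCollocationFinder(word_fd, bigram_fd)

print(finder.N)                           # total number of word tokens: 9
print(finder.ngram_fd[("the", "quick")])  # 1
print(finder.window_size)                 # 2
```

In practice from_words() performs the same counting in one step.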

classmethod from_words(words, window_size=2)

Construct a BigramCollocationFinder for all bigrams in the given sequence. When window_size > 2, non-contiguous bigrams are counted as well, in the style of Church and Hanks’s (1990) association ratio: each word is paired with every word that follows it within the window.
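The effect of window_size is easiest to see on a tiny input. The sketch below uses a made-up four-token sequence and inspects ngram_fd, an attribute name taken from the library source.

```python
from nltk.collocations import BigramCollocationFinder

tokens = ["a", "b", "c", "d"]  # hypothetical toy sequence

# Default window: only adjacent pairs are counted.
contiguous = BigramCollocationFinder.from_words(tokens)

# Wider window: each word is also paired with the word two positions later.
windowed = BigramCollocationFinder.from_words(tokens, window_size=3)

print(sorted(contiguous.ngram_fd))
# [('a', 'b'), ('b', 'c'), ('c', 'd')]
print(sorted(windowed.ngram_fd))
# [('a', 'b'), ('a', 'c'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
```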

score_ngram(score_fn, w1, w2)

Returns the score for a given bigram using the given scoring function, or None if the bigram was never counted. Following Church and Hanks (1990), the bigram count is scaled by a factor of 1/(window_size - 1) before it is passed to the scoring function.
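To make the scaling visible, the sketch below passes a hypothetical scoring function, show_counts, that simply returns its arguments. It assumes the calling convention used by the library's bigram association measures: the scaled bigram count, a pair of word counts, and the total word count.

```python
from nltk.collocations import BigramCollocationFinder

def show_counts(n_ii, n_ix_xi, n_all):
    # Hypothetical "scoring" function: returns the raw inputs unchanged.
    return (n_ii, n_ix_xi, n_all)

tokens = "a b a b a c".split()  # toy data
finder2 = BigramCollocationFinder.from_words(tokens)
finder3 = BigramCollocationFinder.from_words(tokens, window_size=3)

# ("a", "b") is counted twice in both finders, but the wider window
# halves the count before scoring: 2 / (3 - 1) = 1.0.
print(finder2.score_ngram(show_counts, "a", "b"))  # (2.0, (3, 2), 6)
print(finder3.score_ngram(show_counts, "a", "b"))  # (1.0, (3, 2), 6)

# A bigram that never occurs yields None rather than a score.
print(finder2.score_ngram(show_counts, "c", "a"))  # None
```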

above_score(score_fn, min_score)

Returns a sequence of ngrams, ordered by decreasing score, whose scores each exceed the given minimum score.
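A short usage sketch, assuming BigramAssocMeasures.raw_freq from nltk.metrics as the scoring function and a made-up token list. The result is wrapped in list() so that the example works whether the method returns a list or a lazy iterable.

```python
from nltk.collocations import BigramCollocationFinder
from nltk.metrics import BigramAssocMeasures

tokens = "new york new york new york is big".split()  # toy data, 8 tokens
finder = BigramCollocationFinder.from_words(tokens)

# raw_freq is the bigram count divided by the number of words:
# ("new", "york") -> 3/8, ("york", "new") -> 2/8, the rest -> 1/8.
frequent = list(finder.above_score(BigramAssocMeasures.raw_freq, 0.2))
print(frequent)  # [('new', 'york'), ('york', 'new')]
```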

apply_freq_filter(min_freq)

Removes candidate ngrams which have frequency less than min_freq.
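For instance, a minimal sketch on made-up tokens that keeps only bigrams seen at least twice. The filter modifies the finder in place; ngram_fd is an attribute name taken from the library source.

```python
from nltk.collocations import BigramCollocationFinder

tokens = "new york new york new york is big".split()  # toy data
finder = BigramCollocationFinder.from_words(tokens)

print(len(finder.ngram_fd))  # 4 distinct bigrams before filtering

finder.apply_freq_filter(2)  # drop bigrams that occur fewer than 2 times

print(sorted(finder.ngram_fd.items()))
# [(('new', 'york'), 3), (('york', 'new'), 2)]
```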

apply_ngram_filter(fn)

Removes candidate ngrams (w1, w2, …) where fn(w1, w2, …) evaluates to True.
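Because the filter removes the ngrams for which fn returns True, a condition for keeping candidates has to be written as its negation. The sketch below, on made-up tokens, keeps only bigrams whose first word is "the".

```python
from nltk.collocations import BigramCollocationFinder

tokens = "the cat sat on the mat".split()  # toy data
finder = BigramCollocationFinder.from_words(tokens)

# Remove every bigram that does NOT start with "the".
finder.apply_ngram_filter(lambda w1, w2: w1 != "the")

print(sorted(finder.ngram_fd))  # [('the', 'cat'), ('the', 'mat')]
```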

apply_word_filter(fn)

Removes candidate ngrams (w1, w2, …) where any of (fn(w1), fn(w2), …) evaluates to True.
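A common use is discarding candidates that contain a stopword. The stopword set below is a small hypothetical one chosen for the example, not a list shipped with the library.

```python
from nltk.collocations import BigramCollocationFinder

tokens = "the cat sat on the mat".split()  # toy data
stopwords = {"the", "on"}                  # hypothetical stopword set

finder = BigramCollocationFinder.from_words(tokens)

# Remove any bigram in which at least one word is a stopword.
finder.apply_word_filter(lambda w: w in stopwords)

print(sorted(finder.ngram_fd))  # [('cat', 'sat')]
```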

classmethod from_documents(documents)

Constructs a collocation finder given a collection of documents, each of which is a list (or iterable) of tokens.
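A minimal sketch with two made-up documents. Whether a bigram spanning the end of one document and the start of the next is counted may depend on the installed NLTK version, so the example inspects only within-document pairs.

```python
from nltk.collocations import BigramCollocationFinder

# Each document is a list of tokens (toy data).
documents = [
    ["new", "york", "city"],
    ["york", "minster"],
]

finder = BigramCollocationFinder.from_documents(documents)

print(finder.ngram_fd[("new", "york")])      # 1
print(finder.ngram_fd[("york", "minster")])  # 1
print(finder.word_fd["york"])                # 2, counted across both documents
```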

nbest(score_fn, n)

Returns the top n ngrams when scored by the given function.
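For example, assuming BigramAssocMeasures.raw_freq from nltk.metrics as the scoring function and a made-up token list:

```python
from nltk.collocations import BigramCollocationFinder
from nltk.metrics import BigramAssocMeasures

tokens = "new york new york new york is big".split()  # toy data
finder = BigramCollocationFinder.from_words(tokens)

# The two most frequent bigrams: ("new", "york") occurs 3 times,
# ("york", "new") twice, every other bigram once.
top_two = finder.nbest(BigramAssocMeasures.raw_freq, 2)
print(top_two)  # [('new', 'york'), ('york', 'new')]
```

Other measures on BigramAssocMeasures, such as pmi or likelihood_ratio, can be passed in the same way.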

score_ngrams(score_fn)

Returns a sequence of (ngram, score) pairs ordered from highest to lowest score, as determined by the scoring function provided.
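A short sketch of the returned pairs, again assuming BigramAssocMeasures.raw_freq as the scoring function and made-up tokens. The order of bigrams with equal scores is not asserted here.

```python
from nltk.collocations import BigramCollocationFinder
from nltk.metrics import BigramAssocMeasures

tokens = "new york new york new york is big".split()  # toy data, 8 tokens
finder = BigramCollocationFinder.from_words(tokens)

scored = finder.score_ngrams(BigramAssocMeasures.raw_freq)

# Each entry is a (bigram, score) pair, best score first.
for bigram, score in scored:
    print(bigram, score)

best_bigram, best_score = scored[0]
print(best_bigram, best_score)  # ('new', 'york') 0.375
```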