nltk.classify.util module

Utility functions and classes for classifiers.

class nltk.classify.util.CutoffChecker

Bases: object

A helper class that implements cutoff checks based on number of iterations and log likelihood.

Accuracy cutoffs are also implemented, but using them is almost never a good idea.

__init__(cutoffs)
check(classifier, train_toks)

nltk.classify.util.accuracy(classifier, gold)

nltk.classify.util.apply_features(feature_func, toks, labeled=None)

Use the LazyMap class to construct a lazy list-like object that is analogous to map(feature_func, toks). In particular, if labeled=False, then the returned list-like object’s values are equal to:

[feature_func(tok) for tok in toks]

If labeled=True, then the returned list-like object’s values are equal to:

[(feature_func(tok), label) for (tok, label) in toks]

The primary purpose of this function is to avoid the memory overhead of storing the featuresets for every token in a corpus. Instead, these featuresets are constructed lazily, as needed. The reduction in memory overhead can be especially significant when the underlying list of tokens is itself lazy (as is the case with many corpus readers).

Parameters
  • feature_func – The function that will be applied to each token. It should return a featureset – i.e., a dict mapping feature names to feature values.

  • toks – The list of tokens to which feature_func should be applied. If labeled=False, then the list elements will be passed directly to feature_func(). If labeled=True, then the list elements should be tuples (tok, label), and tok will be passed to feature_func().

  • labeled – If true, then toks contains labeled tokens – i.e., tuples of the form (tok, label). (Default: auto-detect based on types.)
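
The two modes described above can be sketched without NLTK itself. The following is a minimal, standard-library-only illustration of the behaviour, not NLTK's LazyMap implementation: `LazyFeatures` and `word_features` are hypothetical names introduced for this example, and the `labeled` flag is always passed explicitly rather than auto-detected.

```python
from collections.abc import Sequence


class LazyFeatures(Sequence):
    """Hypothetical stand-in for a lazy list: featuresets are built on access."""

    def __init__(self, feature_func, toks, labeled):
        self._func = feature_func
        self._toks = toks
        self._labeled = labeled

    def __len__(self):
        return len(self._toks)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slicing stays lazy: wrap the sliced tokens again.
            return LazyFeatures(self._func, self._toks[index], self._labeled)
        item = self._toks[index]
        if self._labeled:
            # Labeled mode: apply feature_func to the token, keep the label.
            tok, label = item
            return (self._func(tok), label)
        # Unlabeled mode: the element is passed directly to feature_func.
        return self._func(item)


def word_features(word):
    # Hypothetical feature function: returns a featureset (a dict).
    return {"last_letter": word[-1], "length": len(word)}


unlabeled = ["Alice", "Bob"]
labeled = [("Alice", "female"), ("Bob", "male")]

lazy_unlabeled = LazyFeatures(word_features, unlabeled, labeled=False)
lazy_labeled = LazyFeatures(word_features, labeled, labeled=True)

# Each lazy object yields the same values as the eager list comprehension.
eager_unlabeled = [word_features(tok) for tok in unlabeled]
eager_labeled = [(word_features(tok), label) for (tok, label) in labeled]
```

No featureset is stored in the lazy objects. Each one is recomputed whenever an element is accessed, which trades some repeated computation for lower memory use.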

nltk.classify.util.attested_labels(tokens)

Returns

A list of all labels that are attested in the given list of tokens.

Return type

list of (immutable) labels

Parameters

tokens (list) – The list of classified tokens from which to extract labels. A classified token has the form (token, label).
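
As a rough illustration of what "attested" means here, the sketch below collects each distinct label from a list of classified tokens. It is a hypothetical re-implementation written for this example, not NLTK's code: the name `attested_labels_sketch` is invented, and the sorting step is an added assumption used only to make the result order predictable.

```python
def attested_labels_sketch(tokens):
    # Hypothetical re-implementation for illustration only.
    # A set keeps each label once, so labels must be hashable (immutable).
    # Sorting is an assumption added here to give a deterministic order.
    return sorted({label for (_tok, label) in tokens})


classified = [("Alice", "female"), ("Bob", "male"), ("Carol", "female")]
labels = attested_labels_sketch(classified)
```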

nltk.classify.util.binary_names_demo_features(name)

nltk.classify.util.check_megam_config()

Checks whether the MEGAM binary is configured.

nltk.classify.util.log_likelihood(classifier, gold)

nltk.classify.util.names_demo(trainer, features=names_demo_features)

nltk.classify.util.names_demo_features(name)

nltk.classify.util.partial_names_demo(trainer, features=names_demo_features)

nltk.classify.util.wsd_demo(trainer, word, features, n=1000)