nltk.classify.decisiontree module

A classifier model that decides which label to assign to a token on the basis of a tree structure, where branches correspond to conditions on feature values, and leaves correspond to label assignments.

class nltk.classify.decisiontree.DecisionTreeClassifier[source]

Bases: ClassifierI

__init__(label, feature_name=None, decisions=None, default=None)[source]
  • label – The most likely label for tokens that reach this node in the decision tree. If this decision tree has no children, then this label will be assigned to any token that reaches this decision tree.

  • feature_name – The name of the feature that this decision tree selects for.

  • decisions – A dictionary mapping from feature values for the feature identified by feature_name to child decision trees.

  • default – The child that will be used if the value of feature feature_name does not match any of the keys in decisions. This is used when constructing binary decision trees.
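The constructor arguments above can be exercised by assembling a small tree by hand. This is a minimal sketch: the feature name "outlook" and the labels "play" and "stay" are made up for illustration, and the fallback to the node's own label when no key matches and no default is set reflects the behaviour of the NLTK implementation as far as I am aware, so treat it as an assumption.

```python
from nltk.classify.decisiontree import DecisionTreeClassifier

# Leaves: nodes with no feature_name simply return their label.
play = DecisionTreeClassifier("play")
stay = DecisionTreeClassifier("stay")

# A node that tests the (hypothetical) feature "outlook".
# Any value not listed in `decisions` is routed to `default`.
tree = DecisionTreeClassifier(
    "play",
    feature_name="outlook",
    decisions={"sunny": play},
    default=stay,
)

sunny_result = tree.classify({"outlook": "sunny"})
rainy_result = tree.classify({"outlook": "rainy"})  # no matching key, uses default

# Without a default child, an unmatched value is assumed to fall back
# to the label stored on the node itself.
no_default = DecisionTreeClassifier(
    "play", feature_name="outlook", decisions={"sunny": play, "rainy": stay}
)
fallback_result = no_default.classify({"outlook": "cloudy"})

print(sunny_result, rainy_result, fallback_result)
```

Building trees manually like this is mostly useful for testing or inspection; in practice a tree would normally come from train().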

static best_binary_stump(feature_names, labeled_featuresets, feature_values, verbose=False)[source]
static best_stump(feature_names, labeled_featuresets, verbose=False)[source]
static binary_stump(feature_name, feature_value, labeled_featuresets)[source]

classify(featureset)[source]

Returns

the most appropriate label for the given featureset.

Return type

label

labels()[source]

Returns

the list of category labels used by this classifier.

Return type

list of (immutable)

static leaf(labeled_featuresets)[source]
pretty_format(width=70, prefix='', depth=4)[source]

Return a string containing a pretty-printed version of this decision tree. Each line in this string corresponds to a single decision tree node or leaf, and indentation is used to display the structure of the decision tree.

pseudocode(prefix='', depth=4)[source]

Return a string representation of this decision tree that expresses the decisions it makes as a nested set of pseudocode if statements.
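The two string renderings described above can be compared on a small hand-built tree. This is only a sketch: the features "outlook" and "windy" and the labels are hypothetical, and the exact layout of the returned strings may differ between NLTK versions, so no output is shown here.

```python
from nltk.classify.decisiontree import DecisionTreeClassifier

# Hypothetical two-level tree: test "outlook" first, then "windy".
windy_node = DecisionTreeClassifier(
    "play",
    feature_name="windy",
    decisions={
        True: DecisionTreeClassifier("stay"),
        False: DecisionTreeClassifier("play"),
    },
)
tree = DecisionTreeClassifier(
    "play",
    feature_name="outlook",
    decisions={"sunny": windy_node, "rainy": DecisionTreeClassifier("stay")},
)

# One line per node or leaf, indented to show nesting.
pretty = tree.pretty_format(width=60)

# The same decisions written as nested if statements.
code = tree.pseudocode()

# depth limits how many levels of the tree are rendered.
shallow = tree.pseudocode(depth=1)

print(pretty)
print(code)
print(shallow)
```

Lowering depth is a convenient way to get a quick overview of a large trained tree without printing every branch.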

refine(labeled_featuresets, entropy_cutoff, depth_cutoff, support_cutoff, binary=False, feature_values=None, verbose=False)[source]
static stump(feature_name, labeled_featuresets)[source]
static train(labeled_featuresets, entropy_cutoff=0.05, depth_cutoff=100, support_cutoff=10, binary=False, feature_values=None, verbose=False)[source]

  • binary – If true, then treat all feature/value pairs as individual binary features, rather than using a single n-way branch for each feature.
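As a rough illustration of train() and the binary flag, the sketch below fits both an n-way tree and a binary tree to a tiny made-up data set. The feature names, values, and labels are assumptions chosen so that "outlook" alone separates the two labels; with so few examples the default support_cutoff means only the top-level split is learned.

```python
from nltk.classify.decisiontree import DecisionTreeClassifier

# Toy training data (invented for this example): (featureset, label) pairs.
train_data = [
    ({"outlook": "sunny", "windy": False}, "play"),
    ({"outlook": "sunny", "windy": True}, "play"),
    ({"outlook": "rainy", "windy": False}, "stay"),
    ({"outlook": "rainy", "windy": True}, "stay"),
]

# Default mode: each node branches n ways on one feature.
nway_tree = DecisionTreeClassifier.train(train_data)

# Binary mode: each node tests a single feature/value pair and
# sends everything else to a default child.
binary_tree = DecisionTreeClassifier.train(train_data, binary=True)

for name, tree in (("n-way", nway_tree), ("binary", binary_tree)):
    prediction = tree.classify({"outlook": "sunny", "windy": True})
    print(name, sorted(tree.labels()), prediction)
```

Binary trees can be easier to read when a feature has many possible values, since each node isolates one value at a time.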