nltk.classify.weka module

Classifiers that make use of the external ‘Weka’ package.

class nltk.classify.weka.ARFF_Formatter

Bases: object

Converts featuresets and labeled featuresets to ARFF-formatted strings, appropriate for input into Weka.

Features and classes can be specified manually in the constructor, or may be determined from data using from_train.

__init__(labels, features)
Parameters:
  • labels – A list of all class labels that can be generated.

  • features – A list of feature specifications, where each specification is a tuple (fname, ftype) and ftype is an ARFF type string such as NUMERIC or STRING.
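The following minimal standalone sketch shows how the constructor's two arguments could map onto an ARFF header. The helper build_arff_header, the relation name rel and the class attribute name -label- are illustrative assumptions, not the NLTK implementation. The class attribute is declared last because Weka's command-line tools treat the last attribute as the class by default.

```python
# Minimal sketch: build an ARFF header from a label list and (fname, ftype)
# feature specs. build_arff_header is a hypothetical helper, not NLTK's code.


def build_arff_header(labels, features, relation="rel"):
    """Return an ARFF header string.

    labels: list of class labels, declared as a nominal class attribute.
    features: list of (fname, ftype) tuples, e.g. ("length", "NUMERIC").
    """
    lines = [f"@RELATION {relation}", ""]
    for fname, ftype in features:
        lines.append(f"@ATTRIBUTE {fname:<30} {ftype}")
    # Declare the class attribute last; "-label-" is an illustrative name.
    lines.append("@ATTRIBUTE {:<30} {{{}}}".format("-label-", ",".join(labels)))
    return "\n".join(lines)


header = build_arff_header(
    labels=["pos", "neg"],
    features=[("length", "NUMERIC"), ("first_word", "STRING")],
)
print(header)
```

The features list preserves order, so attributes appear in the header in the order they were specified.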

data_section(tokens, labeled=None)

Returns the ARFF data section for the given data.

Parameters:
  • tokens – A list of featuresets (dicts) or of labeled featuresets, which are (featureset, label) tuples.

  • labeled – Whether the given tokens are labeled. If None, the tokens are assumed to be labeled if the first token is a tuple or list.
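The sketch below illustrates the labeled-or-unlabeled rule described above, along with a simplified conversion of featuresets into ARFF data rows. The helpers is_labeled and data_rows are hypothetical. The formatting choices are simplifications and may not match how data_section() formats values: missing values and labels are written as ?, and strings are quoted with Python's repr().

```python
# Hypothetical helpers sketching data_section()-style behavior; the value
# formatting (repr() quoting, "?" for missing) is a simplifying assumption.


def is_labeled(tokens, labeled=None):
    """Guess whether tokens are (featureset, label) pairs when labeled is None."""
    if labeled is None:
        # Assume labeled data if the first token is a tuple or list.
        return bool(tokens) and isinstance(tokens[0], (tuple, list))
    return labeled


def data_rows(tokens, feature_names, labeled=None):
    """Return an ARFF @DATA section; missing features and labels become '?'."""
    labeled = is_labeled(tokens, labeled)
    rows = ["@DATA"]
    for token in tokens:
        featureset, label = token if labeled else (token, None)
        values = [featureset.get(name) for name in feature_names] + [label]
        rows.append(",".join("?" if v is None else repr(v) for v in values))
    return "\n".join(rows)


train = [({"length": 3, "first_word": "good"}, "pos")]
test = [{"length": 7}]
print(data_rows(train, ["length", "first_word"]))
print(data_rows(test, ["length", "first_word"]))
```

Writing ? for the label of unlabeled tokens keeps every row the same width as the header, which is what lets the same header serve both training and test data in this sketch.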

format(tokens)

Returns a string representation of ARFF output for the given data.

static from_train(tokens)

Constructs an ARFF_Formatter instance with class labels and feature types determined from the given data. Handles boolean, numeric and string (note: not nominal) types.
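The type inference that from_train() performs on labeled featuresets can be sketched as follows. The function infer_arff_types is hypothetical. Mapping bool values to the nominal ARFF type {True, False} is an assumption about how booleans are represented. bool is checked before int because bool is a subclass of int in Python.

```python
# Sketch of from_train()-style inference; infer_arff_types is hypothetical and
# the bool -> "{True, False}" mapping is an assumption for illustration.


def infer_arff_types(labeled_featuresets):
    """Return (labels, features) where features is a sorted (fname, ftype) list."""
    labels = sorted({label for _, label in labeled_featuresets})
    features = {}
    for featureset, _ in labeled_featuresets:
        for fname, fval in featureset.items():
            # bool must be tested before int/float: bool is a subclass of int.
            if isinstance(fval, bool):
                ftype = "{True, False}"
            elif isinstance(fval, (int, float)):
                ftype = "NUMERIC"
            elif isinstance(fval, str):
                ftype = "STRING"  # strings stay STRING, not nominal
            elif fval is None:
                continue  # a None value says nothing about the type
            else:
                raise ValueError(f"Unsupported value type for {fname!r}: {type(fval)}")
            # The same feature must have one consistent type across all tokens.
            if features.setdefault(fname, ftype) != ftype:
                raise ValueError(f"Inconsistent type for feature {fname!r}")
    return labels, sorted(features.items())


labels, features = infer_arff_types([
    ({"length": 3, "has_digit": False, "first_word": "good"}, "pos"),
    ({"length": 7.5, "has_digit": True, "first_word": None}, "neg"),
])
print(labels)
print(features)
```

In this sketch, a feature whose only observed value is None is left out entirely, because no type can be inferred from it.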

header_section()

Returns an ARFF header as a string.

labels()

Returns the list of classes.

write(outfile, tokens)

Writes the ARFF representation of the given data to outfile.

class nltk.classify.weka.WekaClassifier

Bases: ClassifierI

__init__(formatter, model_filename)
classify_many(featuresets)

Apply self.classify() to each element of featuresets. I.e.:

return [self.classify(fs) for fs in featuresets]

Return type:

list(label)

parse_weka_distribution(s)
parse_weka_output(lines)
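These two parsers are undocumented here. The sketch below illustrates the kind of parsing involved. Weka prints a probability-distribution column when run with its -p and -distribution options, for example 0.2,*0.8, where * marks the predicted class. The sketch turns such a string into a label-to-probability dict. The helper parse_distribution is hypothetical. It assumes the probabilities are listed in the same order the labels were declared in the ARFF header.

```python
import re

# Hypothetical helper: parse a Weka-style distribution string into a dict.
# Assumes probabilities follow the label order declared in the ARFF header.


def parse_distribution(s, labels):
    """Map a string such as '0.2,*0.8' to {label: probability}."""
    # Split on commas and the '*' prediction marker, dropping empty pieces.
    probs = [float(v) for v in re.split(r"[*,]+", s) if v.strip()]
    if len(probs) != len(labels):
        raise ValueError(f"expected {len(labels)} probabilities, got {len(probs)}")
    return dict(zip(labels, probs))


dist = parse_distribution("0.2,*0.8", ["neg", "pos"])
print(dist)
print(max(dist, key=dist.get))  # the most probable label
```

The result is a plain dict. A probability distribution object such as nltk.probability.DictionaryProbDist would be a natural container for it when returning results from prob_classify_many().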
prob_classify_many(featuresets)

Apply self.prob_classify() to each element of featuresets. I.e.:

return [self.prob_classify(fs) for fs in featuresets]

Return type:

list(ProbDistI)

classmethod train(model_filename, featuresets, classifier='naivebayes', options=[], quiet=True)
nltk.classify.weka.config_weka(classpath=None)