nltk.tag.perceptron module

class nltk.tag.perceptron.AveragedPerceptron[source]

Bases: object

An averaged perceptron, as implemented by Matthew Honnibal.

See more implementation details here:

json_tag = 'nltk.tag.perceptron.AveragedPerceptron'

predict(features, return_conf=False)[source]

Dot-product the features and current weights and return the best label.
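As an illustration of the dot-product step, here is a minimal standalone sketch. It is not the library's code: the function signature, the nested `{feature: {label: weight}}` layout, and the feature names are assumptions made for the example.

```python
from collections import defaultdict


def predict(features, weights, classes):
    """Return the label with the highest summed score (illustrative only)."""
    scores = defaultdict(float)
    for feat, value in features.items():
        # Skip inactive features and features the model has never seen.
        if value == 0 or feat not in weights:
            continue
        for label, weight in weights[feat].items():
            scores[label] += value * weight
    # Break ties deterministically by label name.
    return max(classes, key=lambda label: (scores[label], label))


# Hypothetical weights for two features and two labels.
weights = {
    "suffix=ing": {"VBG": 2.0, "NN": 0.5},
    "prev_tag=DT": {"NN": 1.5, "VBG": -1.0},
}
classes = {"NN", "VBG"}

best = predict({"suffix=ing": 1, "prev_tag=DT": 1}, weights, classes)
print(best)
```

With both features active, NN scores 0.5 + 1.5 and VBG scores 2.0 - 1.0, so the sketch picks NN.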

update(truth, guess, features)[source]

Update the feature weights.
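The standard perceptron update rewards the true label and penalizes the wrong guess for every active feature. The sketch below shows that rule in isolation; it assumes a nested `defaultdict` for the weights and a step size of 1.0, and it leaves out the bookkeeping an averaged perceptron needs for averaging.

```python
from collections import defaultdict


def update(weights, truth, guess, features):
    """Apply one perceptron update in place (illustrative only)."""
    if truth == guess:
        # A correct prediction leaves the weights untouched.
        return
    for feat in features:
        label_weights = weights[feat]
        label_weights[truth] += 1.0
        label_weights[guess] -= 1.0


# Hypothetical feature names; weights start at zero.
weights = defaultdict(lambda: defaultdict(float))
update(weights, truth="NN", guess="VBG", features={"suffix=ing": 1, "bias": 1})
update(weights, truth="NN", guess="NN", features={"bias": 1})
print({feat: dict(lw) for feat, lw in weights.items()})
```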


Average weights from all iterations.
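Averaging replaces each weight with its mean value over all training steps, which tends to reduce the influence of late, noisy updates. The sketch below is a deliberately naive version that stores a full snapshot per step; the snapshot list and function signature are assumptions for the example, and a practical implementation would normally track running totals instead.

```python
def average_weights(snapshots):
    """Average a list of {feature: {label: weight}} snapshots, one per step."""
    if not snapshots:
        return {}
    n = len(snapshots)
    totals = {}
    for snap in snapshots:
        for feat, label_weights in snap.items():
            for label, w in label_weights.items():
                key = (feat, label)
                totals[key] = totals.get(key, 0.0) + w
    averaged = {}
    for (feat, label), total in totals.items():
        # A weight missing from a snapshot counts as zero for that step.
        averaged.setdefault(feat, {})[label] = round(total / n, 3)
    return averaged


# Hypothetical weight history over three steps.
snapshots = [
    {"bias": {"NN": 1.0}},
    {"bias": {"NN": 2.0}},
    {"bias": {"NN": 0.0, "VB": 1.0}},
]
avg = average_weights(snapshots)
print(avg)
```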


Save the pickled model weights.


Load the pickled model weights.

classmethod decode_json_obj(obj)[source]

class nltk.tag.perceptron.PerceptronTagger[source]

Bases: nltk.tag.api.TaggerI

Greedy Averaged Perceptron tagger, as implemented by Matthew Honnibal. See more implementation details here:

>>> from nltk.tag.perceptron import PerceptronTagger

Train the model

>>> tagger = PerceptronTagger(load=False)
>>> tagger.train([[('today','NN'),('is','VBZ'),('good','JJ'),('day','NN')],
... [('yes','NNS'),('it','PRP'),('beautiful','JJ')]])
>>> tagger.tag(['today','is','a','beautiful','day'])
[('today', 'NN'), ('is', 'PRP'), ('a', 'PRP'), ('beautiful', 'JJ'), ('day', 'NN')]

Use the pretrained model (the default constructor)

>>> pretrain = PerceptronTagger()
>>> pretrain.tag('The quick brown fox jumps over the lazy dog'.split())
[('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN')]
>>> pretrain.tag("The red cat".split())
[('The', 'DT'), ('red', 'JJ'), ('cat', 'NN')]

json_tag = 'nltk.tag.sequential.PerceptronTagger'
START = ['-START-', '-START2-']
END = ['-END-', '-END2-']

load – Load the pickled model upon instantiation.

tag(tokens, return_conf=False, use_tagdict=True)[source]

Tag a tokenized sentence.

  • tokens (list(str)) – A list of words.

train(sentences, save_loc=None, nr_iter=5)[source]

Train a model from sentences and, if save_loc is not None, save it at save_loc. nr_iter controls the number of perceptron training iterations.

  • sentences – A list or iterator of sentences, where each sentence is a list of (word, tag) tuples.

  • save_loc – If not None, saves a pickled model in this location.

  • nr_iter – Number of training iterations.
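To show how sentences and nr_iter interact, here is a simplified standalone training loop. It is a sketch under stated assumptions, not the tagger's implementation: the three features, the `seed` argument, and the absence of averaging and saving are all choices made for the example.

```python
import random
from collections import defaultdict


def train(sentences, nr_iter=5, seed=0):
    """Greedy perceptron training over (word, tag) sentences (illustrative)."""
    weights = defaultdict(lambda: defaultdict(float))
    classes = sorted({tag for sent in sentences for _, tag in sent})
    sentences = list(sentences)
    rng = random.Random(seed)
    for _ in range(nr_iter):
        for sent in sentences:
            prev = "-START-"
            for word, tag in sent:
                # Hypothetical, minimal feature set.
                feats = ["bias", "word=" + word, "prev_tag=" + prev]
                scores = {c: sum(weights[f][c] for f in feats) for c in classes}
                guess = max(classes, key=lambda c: scores[c])
                if guess != tag:
                    for f in feats:
                        weights[f][tag] += 1.0
                        weights[f][guess] -= 1.0
                # The next word sees the predicted tag, not the gold tag.
                prev = guess
        # Reorder the sentences between passes.
        rng.shuffle(sentences)
    return weights, classes


data = [
    [("today", "NN"), ("is", "VBZ"), ("good", "JJ"), ("day", "NN")],
    [("yes", "NNS"), ("it", "PRP"), ("beautiful", "JJ")],
]
weights, classes = train(data, nr_iter=5)
print(classes)
```

Each pass visits every sentence once, so raising nr_iter gives the model more chances to correct its mistakes on the same data.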


loc (str) – Load a pickled model at location.

classmethod decode_json_obj(obj)[source]

Normalization used in pre-processing.

  • All words are lower cased.

  • Groups of digits of length 4 are represented as !YEAR.

  • Other digits are represented as !DIGITS.
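The rules above can be sketched as a small standalone function. This follows only the three rules listed here; treating "other digits" as any token that starts with a digit is an assumption, and the library's own method may apply additional rules.

```python
def normalize(word):
    """Normalize a token using the three documented rules (illustrative)."""
    if word.isdigit() and len(word) == 4:
        # Four-digit numbers are treated as years.
        return "!YEAR"
    if word and word[0].isdigit():
        # Assumption: any other token starting with a digit counts as digits.
        return "!DIGITS"
    return word.lower()


for token in ["1999", "42", "Hello"]:
    print(token, normalize(token))
```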

Return type