nltk.tag.perceptron module¶

class nltk.tag.perceptron.AveragedPerceptron[source]¶

Bases: object

An averaged perceptron, as implemented by Matthew Honnibal.

See more implementation details here: https://explosion.ai/blog/part-of-speech-pos-tagger-in-python

__init__(weights=None)[source]¶

average_weights()[source]¶: Average weights from all iterations.
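As a rough illustration of averaging, the sketch below takes the mean of per-step weight snapshots. The function name `average_snapshots` and the list-of-snapshots representation are assumptions made for this example only; the class itself may keep running totals rather than storing every snapshot.

```python
def average_snapshots(snapshots):
    """Average a list of {feature: weight} dicts, one dict per training step.

    A feature missing from a snapshot is treated as having weight 0.0 there.
    """
    totals = {}
    for snapshot in snapshots:
        for feat, weight in snapshot.items():
            totals[feat] = totals.get(feat, 0.0) + weight
    n_steps = len(snapshots)
    return {feat: total / n_steps for feat, total in totals.items()}


averaged = average_snapshots([{"a": 1.0}, {"a": 3.0, "b": 2.0}])
```

Averaging damps the influence of the last few updates, which tends to make the final weights less sensitive to the order of the training examples.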

classmethod decode_json_obj(obj)[source]¶

encode_json_obj()[source]¶

json_tag = 'nltk.tag.perceptron.AveragedPerceptron'¶

load(path)[source]¶: Load the json model weights.

predict(features, return_conf=False)[source]¶: Dot-product the features and current weights and return the best label.
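The dot-product scoring described above can be sketched in plain Python. This is a minimal illustration, not the library's code: the standalone function `predict_label`, the nested-dict layout `weights[feature][label]`, and the tie-breaking rule are assumptions for the example.

```python
from collections import defaultdict


def predict_label(features, weights, classes):
    """Return the label with the highest dot-product score.

    features: {feature_name: value}
    weights:  {feature_name: {label: weight}}  (assumed layout)
    classes:  iterable of all known labels
    """
    scores = defaultdict(float)
    for feat, value in features.items():
        if feat not in weights or value == 0:
            continue  # unknown or inactive features contribute nothing
        for label, weight in weights[feat].items():
            scores[label] += value * weight
    # Ties are broken by the label string so the result is deterministic.
    return max(classes, key=lambda label: (scores[label], label))


example_weights = {
    "suffix=ing": {"VBG": 2.0, "NN": 0.5},
    "bias": {"NN": 1.0},
}
best = predict_label({"suffix=ing": 1, "bias": 1}, example_weights, {"NN", "VBG"})
```

Here `VBG` scores 2.0 and `NN` scores 1.5, so `best` is `"VBG"`.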

save(path)[source]¶: Save the model weights as json.

update(truth, guess, features)[source]¶: Update the feature weights.
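A standard perceptron update rewards the true label and penalizes the wrong guess for every active feature. The sketch below shows that rule under assumed names (`perceptron_update`, a nested `weights[feature][label]` dict); it leaves out the bookkeeping an averaged perceptron needs for averaging.

```python
def perceptron_update(weights, truth, guess, features):
    """Apply one perceptron update in place.

    weights:  {feature_name: {label: weight}}  (assumed layout)
    features: iterable of active feature names
    """
    if truth == guess:
        return  # correct prediction: nothing to learn
    for feat in features:
        class_weights = weights.setdefault(feat, {})
        class_weights[truth] = class_weights.get(truth, 0.0) + 1.0
        class_weights[guess] = class_weights.get(guess, 0.0) - 1.0


demo_weights = {}
perceptron_update(demo_weights, "NN", "VB", ["bias", "suffix=ay"])
```

After this call, both features carry a weight of `1.0` for `NN` and `-1.0` for `VB`.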

class nltk.tag.perceptron.PerceptronTagger[source]¶

Bases: TaggerI

Greedy Averaged Perceptron tagger, as implemented by Matthew Honnibal. See more implementation details here: https://explosion.ai/blog/part-of-speech-pos-tagger-in-python

>>> from nltk.tag.perceptron import PerceptronTagger

Train the model

>>> tagger = PerceptronTagger(load=False)

>>> tagger.train([[('today','NN'),('is','VBZ'),('good','JJ'),('day','NN')],
... [('yes','NNS'),('it','PRP'),('beautiful','JJ')]])

>>> tagger.tag(['today','is','a','beautiful','day'])
[('today', 'NN'), ('is', 'PRP'), ('a', 'PRP'), ('beautiful', 'JJ'), ('day', 'NN')]

Use the pretrained model (the default constructor)

>>> pretrain = PerceptronTagger()

>>> pretrain.tag('The quick brown fox jumps over the lazy dog'.split())
[('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN')]

>>> pretrain.tag("The red cat".split())
[('The', 'DT'), ('red', 'JJ'), ('cat', 'NN')]

END = ['-END-', '-END2-']¶

START = ['-START-', '-START2-']¶
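Sentinel tokens like these are typically used to pad a sentence so that features looking two words back or ahead never run off the ends. The sketch below illustrates that idea. The helper `pad_context` is hypothetical, and lowercasing stands in for whatever normalization the tagger applies.

```python
START = ["-START-", "-START2-"]
END = ["-END-", "-END2-"]


def pad_context(tokens):
    """Surround the (lowercased) tokens with the START and END sentinels."""
    return START + [token.lower() for token in tokens] + END


context = pad_context(["The", "cat"])
# Token i of the original sentence sits at context[i + len(START)].
first_word_index = 0 + len(START)
previous_two = context[first_word_index - 2:first_word_index]
```

With this padding, the two "previous words" of the first token are simply the two START sentinels.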

__init__(load=True, lang='eng')[source]¶

Parameters: load – Load the json model upon instantiation.

classmethod decode_json_obj(obj)[source]¶

encode_json_obj()[source]¶

json_tag = 'nltk.tag.sequential.PerceptronTagger'¶

load_from_json(lang='eng')[source]¶

normalize(word)[source]¶

Normalization used in pre-processing.

- All words are lowercased.
- Groups of digits of length 4 are represented as !YEAR.
- Other digits are represented as !DIGITS.

Return type: str
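The three documented rules can be sketched as a standalone function. This is an illustration of the rules as listed, not the method's source: `normalize_word` is a hypothetical name, "other digits" is interpreted here as any token that starts with a digit, and the library may apply further rules not mentioned in the description.

```python
def normalize_word(word):
    """Map a token to a coarser form following the documented rules."""
    if word.isdigit() and len(word) == 4:
        return "!YEAR"  # four-digit groups are treated as years
    if word and word[0].isdigit():
        return "!DIGITS"  # assumption: any other token starting with a digit
    return word.lower()


normalized = [normalize_word(w) for w in ["In", "1999", "about", "42", "Cats"]]
```

Collapsing numbers this way keeps the feature space small, since the exact value of a number rarely matters for its part of speech.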

save_to_json(loc, lang='xxx')[source]¶

tag(tokens, return_conf=False, use_tagdict=True)[source]¶

Tag a tokenized sentence.

Parameters:

tokens (list(str)) – List of words.

train(sentences, save_loc=None, nr_iter=5)[source]¶

Train a model from sentences and, if save_loc is not None, save it at save_loc. nr_iter controls the number of Perceptron training iterations.

Parameters:

sentences – A list or iterator of sentences, where each sentence is a list of (word, tag) tuples.
save_loc – If not None, saves a json model in this location.
nr_iter – Number of training iterations.
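To make the expected shape of sentences concrete, the sketch below builds it from lines of slash-separated `word/TAG` text. The helper `to_training_sentences` and the input format are assumptions for this example and are not part of the module.

```python
def to_training_sentences(tagged_lines):
    """Parse 'word/TAG word/TAG ...' lines into lists of (word, tag) tuples."""
    sentences = []
    for line in tagged_lines:
        sentence = []
        for token in line.split():
            # rsplit on the last slash keeps slashes inside the word, e.g. "1/2/CD".
            word, tag = token.rsplit("/", 1)
            sentence.append((word, tag))
        sentences.append(sentence)
    return sentences


sentences = to_training_sentences([
    "today/NN is/VBZ good/JJ day/NN",
    "yes/NNS it/PRP beautiful/JJ",
])
```

The resulting list has one inner list per sentence and could be passed as the sentences argument of train.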
