nltk.tag.perceptron module¶
- class nltk.tag.perceptron.AveragedPerceptron[source]¶
Bases: object
An averaged perceptron, as implemented by Matthew Honnibal.
- See more implementation details here:
https://explosion.ai/blog/part-of-speech-pos-tagger-in-python
- json_tag = 'nltk.tag.perceptron.AveragedPerceptron'¶
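The averaging idea can be illustrated with a small self-contained sketch. The class below (a hypothetical `TinyAveragedPerceptron`, not NLTK's implementation; the method names `predict`, `update`, and `average_weights` are chosen for illustration) keeps a running total of each weight across update steps, so the final weights are an average over the whole run rather than the last, noisiest values.

```python
from collections import defaultdict

class TinyAveragedPerceptron:
    """Minimal averaged-perceptron sketch (illustrative; not NLTK's class)."""

    def __init__(self):
        self.weights = defaultdict(lambda: defaultdict(float))  # feature -> class -> weight
        self._totals = defaultdict(float)  # summed weight * steps per (feature, class)
        self._tstamps = defaultdict(int)   # step of the last update per (feature, class)
        self.i = 0                         # number of update steps seen so far

    def predict(self, features, classes):
        # Score each class as the sum of its weights over the active features.
        scores = {c: 0.0 for c in classes}
        for f in features:
            for c, w in self.weights[f].items():
                scores[c] += w
        # Break ties deterministically by class name.
        return max(classes, key=lambda c: (scores[c], c))

    def update(self, truth, guess, features):
        self.i += 1
        if truth == guess:
            return
        for f in features:
            for c, delta in ((truth, 1.0), (guess, -1.0)):
                key = (f, c)
                # Bank the old weight for every step it was in effect.
                self._totals[key] += (self.i - self._tstamps[key]) * self.weights[f][c]
                self._tstamps[key] = self.i
                self.weights[f][c] += delta

    def average_weights(self):
        # Replace each weight with its average over all update steps,
        # which damps the effect of noisy late-training updates.
        for f, class_weights in self.weights.items():
            for c, w in class_weights.items():
                key = (f, c)
                total = self._totals[key] + (self.i - self._tstamps[key]) * w
                class_weights[c] = total / self.i if self.i else w
```

Averaging is what makes the greedy perceptron tagger stable: a plain perceptron's final weights depend heavily on the last few sentences seen, while the averaged weights reflect the whole training run.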
- class nltk.tag.perceptron.PerceptronTagger[source]¶
Bases: TaggerI
Greedy Averaged Perceptron tagger, as implemented by Matthew Honnibal. See more implementation details here: https://explosion.ai/blog/part-of-speech-pos-tagger-in-python
>>> from nltk.tag.perceptron import PerceptronTagger
Train the model
>>> tagger = PerceptronTagger(load=False)
>>> tagger.train([[('today','NN'),('is','VBZ'),('good','JJ'),('day','NN')],
... [('yes','NNS'),('it','PRP'),('beautiful','JJ')]])
>>> tagger.tag(['today','is','a','beautiful','day'])
[('today', 'NN'), ('is', 'PRP'), ('a', 'PRP'), ('beautiful', 'JJ'), ('day', 'NN')]
Use the pretrained model (the default constructor)
>>> pretrain = PerceptronTagger()
>>> pretrain.tag('The quick brown fox jumps over the lazy dog'.split())
[('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN')]
>>> pretrain.tag("The red cat".split())
[('The', 'DT'), ('red', 'JJ'), ('cat', 'NN')]
- END = ['-END-', '-END2-']¶
- START = ['-START-', '-START2-']¶
- json_tag = 'nltk.tag.sequential.PerceptronTagger'¶
- normalize(word)[source]¶
Normalization used in pre-processing.
- All words are lower cased
- Groups of digits of length 4 are represented as !YEAR
- Other digits are represented as !DIGITS
- Return type:
str
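The three rules above can be sketched in plain Python (an illustrative re-implementation of the described behavior, not NLTK's exact source):

```python
def normalize(word):
    # Four-digit tokens stand in for a year, any other all-digit token
    # for a generic number; everything else is lowercased.
    if word.isdigit() and len(word) == 4:
        return "!YEAR"
    if word.isdigit():
        return "!DIGITS"
    return word.lower()
```

Collapsing rare surface forms like specific years and numbers into a few placeholder tokens keeps the feature space small and lets the tagger generalize across digits it never saw in training.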
- tag(tokens, return_conf=False, use_tagdict=True)[source]¶
Tag tokenized sentences.
- Parameters:
tokens (list(str)) – The list of words to tag.
- train(sentences, save_loc=None, nr_iter=5)[source]¶
Train a model from sentences, and save it at save_loc. nr_iter controls the number of Perceptron training iterations.
- Parameters:
sentences – A list or iterator of sentences, where each sentence is a list of (word, tag) tuples.
save_loc – If not None, saves a JSON model in this location.
nr_iter – Number of training iterations.
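The role of nr_iter can be illustrated with a toy training loop (a hypothetical `train_toy_tagger`; it omits weight averaging and uses only the current word as a feature, whereas the real tagger also uses suffixes, surrounding words, and previously predicted tags):

```python
import random
from collections import defaultdict

def train_toy_tagger(sentences, nr_iter=5, seed=0):
    """Toy mistake-driven training loop (illustrative; not NLTK's code)."""
    sentences = list(sentences)
    rng = random.Random(seed)
    weights = defaultdict(lambda: defaultdict(float))  # word -> tag -> weight
    tags = sorted({t for sent in sentences for _, t in sent})

    def predict(word):
        # Highest-scoring tag for this word, ties broken by tag name.
        return max(tags, key=lambda t: (weights[word][t], t))

    for _ in range(nr_iter):          # nr_iter passes over the data
        rng.shuffle(sentences)        # reshuffle each epoch
        for sentence in sentences:
            for word, truth in sentence:
                guess = predict(word)
                if guess != truth:
                    weights[word][truth] += 1.0  # reward the correct tag
                    weights[word][guess] -= 1.0  # penalize the wrong guess
    return predict
```

Each extra iteration gives the perceptron another chance to correct residual mistakes; a handful of passes (the default is 5) is usually enough for the updates to settle.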