nltk.tag.pos_tag

nltk.tag.pos_tag(tokens, tagset=None, lang='eng')

Use NLTK’s currently recommended part-of-speech tagger to tag the given list of tokens.

>>> from nltk.tag import pos_tag
>>> from nltk.tokenize import word_tokenize
>>> pos_tag(word_tokenize("John's big idea isn't all that bad.")) 
[('John', 'NNP'), ("'s", 'POS'), ('big', 'JJ'), ('idea', 'NN'), ('is', 'VBZ'),
("n't", 'RB'), ('all', 'PDT'), ('that', 'DT'), ('bad', 'JJ'), ('.', '.')]
>>> pos_tag(word_tokenize("John's big idea isn't all that bad."), tagset='universal') 
[('John', 'NOUN'), ("'s", 'PRT'), ('big', 'ADJ'), ('idea', 'NOUN'), ('is', 'VERB'),
("n't", 'ADV'), ('all', 'DET'), ('that', 'DET'), ('bad', 'ADJ'), ('.', '.')]

NB. Use pos_tag_sents() for efficient tagging of more than one sentence.
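
As a rough sketch of the note above, several pre-tokenized sentences can be passed to pos_tag_sents() in a single call instead of calling pos_tag() once per sentence. The helper name tag_token_lists is hypothetical and not part of NLTK, and calling it assumes that NLTK and its tagger model data are installed.

```python
def tag_token_lists(token_lists, tagset=None, lang="eng"):
    """Tag several pre-tokenized sentences with one call to pos_tag_sents().

    `tag_token_lists` is a hypothetical helper name, not part of NLTK.
    `token_lists` is a list of sentences, each one a list of token strings.
    """
    # Imported lazily: defining this helper does not need NLTK,
    # but calling it does (including the tagger model data).
    from nltk.tag import pos_tag_sents

    # One call tags every sentence; the result has one list of
    # (token, tag) tuples per input sentence, in the same order.
    return pos_tag_sents(token_lists, tagset=tagset, lang=lang)
```

With NLTK available, tag_token_lists([word_tokenize(s) for s in sentences]) would return one tagged list per sentence, in the same list(tuple(str, str)) shape that pos_tag() returns for a single sentence.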

Parameters
  • tokens (list(str)) – The sequence of tokens to be tagged

  • tagset (str) – The tagset to be used, e.g. ‘universal’, ‘wsj’, ‘brown’

  • lang (str) – The ISO 639 code of the language, e.g. ‘eng’ for English, ‘rus’ for Russian

Returns

The tagged tokens

Return type

list(tuple(str, str))
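
Because the result is a plain list of (token, tag) tuples, it can be processed with ordinary Python. The sketch below groups tokens by tag; the input is copied from the first example output above, so no NLTK call is made, and group_by_tag is a hypothetical helper name rather than an NLTK function.

```python
from collections import defaultdict

# Tagged tokens copied from the first pos_tag() example above.
tagged = [
    ("John", "NNP"), ("'s", "POS"), ("big", "JJ"), ("idea", "NN"), ("is", "VBZ"),
    ("n't", "RB"), ("all", "PDT"), ("that", "DT"), ("bad", "JJ"), (".", "."),
]


def group_by_tag(tagged_tokens):
    """Map each tag to the tokens that received it, in order of appearance."""
    groups = defaultdict(list)
    for token, tag in tagged_tokens:
        groups[tag].append(token)
    return dict(groups)


by_tag = group_by_tag(tagged)
adjectives = by_tag.get("JJ", [])  # ['big', 'bad']
```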