nltk.tag.stanford module

A module for interfacing with the Stanford taggers.

Tagger models must be downloaded from https://nlp.stanford.edu/software, and the STANFORD_MODELS environment variable must be set to a colon-separated list of paths.
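
As a minimal sketch of that setup, the variable can also be set from Python before a tagger is constructed. The directories below are hypothetical placeholders, not locations the Stanford downloads actually use; substitute wherever the archives were unpacked.

```python
import os

# Hypothetical install locations (assumptions for illustration only).
model_dirs = [
    "/opt/stanford/postagger/models",
    "/opt/stanford/ner/classifiers",
]

# STANFORD_MODELS is a colon-separated list of paths.
os.environ["STANFORD_MODELS"] = ":".join(model_dirs)

print(os.environ["STANFORD_MODELS"])
```

Setting the variable in the shell that launches Python works equally well; the in-process form is only convenient for scripts and notebooks.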

For more details see the documentation for StanfordPOSTagger and StanfordNERTagger.

class nltk.tag.stanford.StanfordNERTagger[source]

Bases: StanfordTagger

A class for named-entity tagging with the Stanford tagger. The inputs are:

  • the path to a model trained on training data

  • (optionally) the path to the Stanford tagger jar file. If it is not specified here, the jar file must be specified in the CLASSPATH environment variable.

  • (optionally) the encoding of the training data (default: UTF-8)

Example:

>>> from nltk.tag import StanfordNERTagger
>>> st = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz') 
>>> st.tag('Rami Eid is studying at Stony Brook University in NY'.split()) 
[('Rami', 'PERSON'), ('Eid', 'PERSON'), ('is', 'O'), ('studying', 'O'),
 ('at', 'O'), ('Stony', 'ORGANIZATION'), ('Brook', 'ORGANIZATION'),
 ('University', 'ORGANIZATION'), ('in', 'O'), ('NY', 'LOCATION')]
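
The tagger returns one tag per token, so multi-word names come back as runs of identically tagged tokens. The following sketch shows one possible way to merge such runs into phrases. `group_entities` is a hypothetical helper, not part of NLTK, and the input is simply the output shown above pasted in as a literal.

```python
from itertools import groupby


def group_entities(tagged_tokens, outside_tag="O"):
    """Merge runs of adjacent tokens that share a non-O tag into (phrase, tag) pairs."""
    entities = []
    # groupby yields one group per run of consecutive tokens with the same tag.
    for tag, run in groupby(tagged_tokens, key=lambda pair: pair[1]):
        if tag != outside_tag:
            entities.append((" ".join(token for token, _ in run), tag))
    return entities


tagged = [('Rami', 'PERSON'), ('Eid', 'PERSON'), ('is', 'O'), ('studying', 'O'),
          ('at', 'O'), ('Stony', 'ORGANIZATION'), ('Brook', 'ORGANIZATION'),
          ('University', 'ORGANIZATION'), ('in', 'O'), ('NY', 'LOCATION')]

print(group_entities(tagged))
# [('Rami Eid', 'PERSON'), ('Stony Brook University', 'ORGANIZATION'), ('NY', 'LOCATION')]
```

Because the tags carry no begin/inside markers, two distinct entities of the same type that happen to be adjacent would be merged into one phrase by this approach.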

__init__(*args, **kwargs)[source]

parse_output(text, sentences)[source]

class nltk.tag.stanford.StanfordPOSTagger[source]

Bases: StanfordTagger

A class for POS tagging with the Stanford tagger. The inputs are:

  • the path to a model trained on training data

  • (optionally) the path to the Stanford tagger jar file. If it is not specified here, the jar file must be specified in the CLASSPATH environment variable.

  • (optionally) the encoding of the training data (default: UTF-8)

Example:

>>> from nltk.tag import StanfordPOSTagger
>>> st = StanfordPOSTagger('english-bidirectional-distsim.tagger') 
>>> st.tag('What is the airspeed of an unladen swallow ?'.split()) 
[('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'), ('airspeed', 'NN'), ('of', 'IN'), ('an', 'DT'), ('unladen', 'JJ'), ('swallow', 'VB'), ('?', '.')]

__init__(*args, **kwargs)[source]

class nltk.tag.stanford.StanfordTagger[source]

Bases: TaggerI

An interface to Stanford taggers. Subclasses must define:

  • _cmd property: A property that returns the command that will be executed.

  • _SEPARATOR: Class constant that represents the character used to separate the tokens from their tags.

  • _JAR file: Class constant that represents the jar file name.
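
To illustrate the role of a separator character, here is a self-contained sketch of splitting tagger-style output into (token, tag) tuples. It is not NLTK's implementation: `split_tagged_output` is a hypothetical function, and the sample string assumes output in which each token is joined to its tag by the separator, with one sentence per line.

```python
def split_tagged_output(text, separator):
    """Turn lines of 'token<separator>tag' items into lists of (token, tag) tuples."""
    sentences = []
    for line in text.strip().split("\n"):
        sentence = []
        for item in line.strip().split():
            # Split on the LAST separator so tokens containing it stay intact.
            token, _, tag = item.rpartition(separator)
            sentence.append((token, tag))
        sentences.append(sentence)
    return sentences


# Assumed sample output using '_' as the separator.
sample = "What_WP is_VBZ\nswallow_VB ?_."
print(split_tagged_output(sample, "_"))
# [[('What', 'WP'), ('is', 'VBZ')], [('swallow', 'VB'), ('?', '.')]]
```

Splitting on the last occurrence is a design choice in this sketch: a token such as `snake_case` would otherwise be cut in the wrong place.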

__init__(model_filename, path_to_jar=None, encoding='utf8', verbose=False, java_options='-mx1000m')[source]

parse_output(text, sentences=None)[source]

tag(tokens)[source]

Determine the most appropriate tag sequence for the given token sequence, and return a corresponding list of tagged tokens. A tagged token is encoded as a tuple (token, tag).

Return type:

list(tuple(str, str))

tag_sents(sentences)[source]

Apply self.tag() to each element of sentences. I.e.:

return [self.tag(sent) for sent in sentences]
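
The relationship between tag() and tag_sents() described above can be seen with a toy stand-in that needs neither Java nor a model file. `CapitalizationTagger` and its tags are hypothetical and exist only for this illustration; they are not part of NLTK.

```python
class CapitalizationTagger:
    """Toy tagger: labels capitalized tokens 'CAP' and all others 'LOW'."""

    def tag(self, tokens):
        # One (token, tag) tuple per input token, in order.
        return [(tok, "CAP" if tok[:1].isupper() else "LOW") for tok in tokens]

    def tag_sents(self, sentences):
        # The equivalence stated in the description above.
        return [self.tag(sent) for sent in sentences]


tagger = CapitalizationTagger()
sentences = ["Rami studies".split(), "in NY".split()]
print(tagger.tag_sents(sentences))
# [[('Rami', 'CAP'), ('studies', 'LOW')], [('in', 'LOW'), ('NY', 'CAP')]]
```

The result has one inner list per input sentence, each of the form list(tuple(str, str)), matching the return type given for tag().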