nltk.tag.stanford module

A module for interfacing with the Stanford taggers.

Tagger models need to be downloaded from https://nlp.stanford.edu/software, and the STANFORD_MODELS environment variable must be set to a colon-separated list of paths where they can be found.
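To illustrate how a colon-separated STANFORD_MODELS value can be searched, here is a minimal sketch. The helper `find_model` is hypothetical (it is not NLTK's own lookup code); it splits the variable on `os.pathsep`, which is `:` on POSIX systems and `;` on Windows, and returns the first listed directory that contains the requested model file.

```python
import os
from typing import Optional


def find_model(model_name: str, env_var: str = "STANFORD_MODELS") -> Optional[str]:
    """Return the first existing path to model_name under the directories
    listed in env_var, or None if it is not found.

    Hypothetical helper for illustration; not part of NLTK.
    """
    # A path that already points at an existing file is used as-is.
    if os.path.isfile(model_name):
        return model_name
    # os.pathsep is ':' on POSIX and ';' on Windows.
    for directory in os.environ.get(env_var, "").split(os.pathsep):
        if not directory:
            continue  # skip empty entries, e.g. from a trailing separator
        candidate = os.path.join(directory, model_name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

For example, with STANFORD_MODELS set to `/opt/stanford/models:/home/user/models` (illustrative paths), `find_model('english-bidirectional-distsim.tagger')` returns the path in the first of those directories that actually holds the file.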

For more details see the documentation for StanfordPOSTagger and StanfordNERTagger.

class nltk.tag.stanford.StanfordNERTagger

Bases: StanfordTagger

A class for named-entity tagging with the Stanford tagger. The constructor takes:

a model trained on training data
(optionally) the path to the Stanford tagger jar file. If it is not given here, the jar file must be listed in the CLASSPATH environment variable.
(optionally) the encoding of the training data (default: UTF-8)
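The jar-file rule above (explicit path first, otherwise CLASSPATH) can be sketched as follows. `resolve_jar` is a hypothetical helper, not part of NLTK, and the file name `stanford-ner.jar` in the usage note is an assumption about the Stanford NER distribution. Only CLASSPATH entries that name the jar file directly are considered; wildcard entries such as `dir/*` are not handled.

```python
import os
from typing import Optional


def resolve_jar(jar_name: str, path_to_jar: Optional[str] = None) -> str:
    """Return a usable path to the tagger jar (hypothetical helper).

    An explicit path_to_jar wins; otherwise CLASSPATH is searched for an
    entry whose base name equals jar_name.
    """
    if path_to_jar is not None:
        if not os.path.isfile(path_to_jar):
            raise FileNotFoundError(path_to_jar)
        return path_to_jar
    for entry in os.environ.get("CLASSPATH", "").split(os.pathsep):
        # Only entries that name the jar file itself are accepted here.
        if os.path.basename(entry) == jar_name and os.path.isfile(entry):
            return entry
    raise LookupError(f"{jar_name} was not given and is not on CLASSPATH")
```

For example, `resolve_jar('stanford-ner.jar', path_to_jar='/opt/stanford-ner/stanford-ner.jar')` returns the explicit path if that file exists, while `resolve_jar('stanford-ner.jar')` falls back to CLASSPATH and raises `LookupError` if the jar is not listed there.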

Example:

>>> from nltk.tag import StanfordNERTagger
>>> st = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz') 
>>> st.tag('Rami Eid is studying at Stony Brook University in NY'.split()) 
[('Rami', 'PERSON'), ('Eid', 'PERSON'), ('is', 'O'), ('studying', 'O'),
 ('at', 'O'), ('Stony', 'ORGANIZATION'), ('Brook', 'ORGANIZATION'),
 ('University', 'ORGANIZATION'), ('in', 'O'), ('NY', 'LOCATION')]
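The tagger labels each token separately (`O` marks tokens outside any entity), so a multi-word name such as "Stony Brook University" comes back as three tuples. A minimal sketch for merging adjacent tokens that share a non-`O` label, applied to the output shown above; `group_entities` is a hypothetical helper, not an NLTK function.

```python
from itertools import groupby
from typing import List, Tuple


def group_entities(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Merge runs of adjacent tokens that share the same non-'O' label."""
    entities = []
    # groupby yields consecutive runs of tuples with the same label.
    for label, run in groupby(tagged, key=lambda pair: pair[1]):
        if label != "O":
            entities.append((" ".join(token for token, _ in run), label))
    return entities


tagged = [('Rami', 'PERSON'), ('Eid', 'PERSON'), ('is', 'O'), ('studying', 'O'),
          ('at', 'O'), ('Stony', 'ORGANIZATION'), ('Brook', 'ORGANIZATION'),
          ('University', 'ORGANIZATION'), ('in', 'O'), ('NY', 'LOCATION')]
print(group_entities(tagged))
# [('Rami Eid', 'PERSON'), ('Stony Brook University', 'ORGANIZATION'), ('NY', 'LOCATION')]
```

Because these labels carry no begin/inside prefixes, two different entities of the same type that happen to be adjacent would be merged into one.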

__init__(*args, **kwargs)

parse_output(text, sentences)

class nltk.tag.stanford.StanfordPOSTagger

Bases: StanfordTagger

A class for part-of-speech (POS) tagging with the Stanford tagger. The constructor takes:

a model trained on training data
(optionally) the path to the Stanford tagger jar file. If it is not given here, the jar file must be listed in the CLASSPATH environment variable.
(optionally) the encoding of the training data (default: UTF-8)

Example:

>>> from nltk.tag import StanfordPOSTagger
>>> st = StanfordPOSTagger('english-bidirectional-distsim.tagger') 
>>> st.tag('What is the airspeed of an unladen swallow ?'.split()) 
[('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'), ('airspeed', 'NN'), ('of', 'IN'), ('an', 'DT'), ('unladen', 'JJ'), ('swallow', 'VB'), ('?', '.')]

__init__(*args, **kwargs)

class nltk.tag.stanford.StanfordTagger

Bases: TaggerI

An interface to Stanford taggers. Subclasses must define:

_cmd: a property that returns the command to be executed.
_SEPARATOR: a class constant holding the character that separates each token from its tag in the tagger output.
_JAR: a class constant holding the jar file name.
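The three required members can be illustrated with a toy, Java-free class. `ToySlashTagger` is hypothetical: the separator `/`, the jar name, and the command arguments are assumptions for illustration (the arguments follow Stanford NER's command-line usage `CRFClassifier -loadClassifier ... -textFile ...`), not necessarily the values NLTK uses. Its `parse_output` shows one plausible reason the method receives `sentences`: the tagger output is flattened and then regrouped by the lengths of the input sentences.

```python
from typing import List, Tuple


class ToySlashTagger:
    """Hypothetical stand-in showing the members a StanfordTagger subclass
    must define. It never launches Java; all values are illustrative."""

    _SEPARATOR = "/"           # assumed token/tag separator in the output
    _JAR = "stanford-ner.jar"  # assumed jar file name

    def __init__(self, model: str, input_path: str) -> None:
        self._model = model
        self._input_path = input_path

    @property
    def _cmd(self) -> List[str]:
        # Main class and arguments as in Stanford NER's command-line usage;
        # not necessarily the exact command NLTK builds.
        return ["edu.stanford.nlp.ie.crf.CRFClassifier",
                "-loadClassifier", self._model,
                "-textFile", self._input_path]

    def parse_output(self, text: str,
                     sentences: List[List[str]]) -> List[List[Tuple[str, str]]]:
        flat: List[Tuple[str, str]] = []
        for line in text.strip().split("\n"):
            for item in line.split():
                # Split on the LAST separator so a token such as "1/2"
                # keeps its own slash.
                token, tag = item.rsplit(self._SEPARATOR, 1)
                flat.append((token, tag))
        # Regroup the flat list by the lengths of the input sentences.
        result, start = [], 0
        for sent in sentences:
            result.append(flat[start:start + len(sent)])
            start += len(sent)
        return result
```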

__init__(model_filename, path_to_jar=None, encoding='utf8', verbose=False, java_options='-mx1000m')
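The `java_options` default, `-mx1000m`, asks the JVM for a maximum heap of 1000 MB (an older spelling of `-Xmx1000m`). A hedged sketch of how such options might be combined with the jar path and the tagger's arguments into one command line; `build_java_command` is hypothetical, and NLTK's own process launching may differ.

```python
import shlex
from typing import List


def build_java_command(jar_path: str, main_args: List[str],
                       java_options: str = "-mx1000m") -> List[str]:
    """Assemble a java argument list (hypothetical helper, not NLTK's code).

    java_options is split like a shell string, so several JVM flags,
    e.g. "-mx2g -Dfile.encoding=UTF-8", can be passed in one string.
    """
    return ["java", *shlex.split(java_options), "-cp", jar_path, *main_args]
```

For instance, `build_java_command('stanford-postagger.jar', ['edu.stanford.nlp.tagger.maxent.MaxentTagger', '-model', 'english-bidirectional-distsim.tagger'])` yields `['java', '-mx1000m', '-cp', 'stanford-postagger.jar', 'edu.stanford.nlp.tagger.maxent.MaxentTagger', '-model', 'english-bidirectional-distsim.tagger']`. The jar name here is an assumption; the `MaxentTagger -model` arguments follow the Stanford POS tagger's command-line usage.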

parse_output(text, sentences=None)

tag(tokens)

Determine the most appropriate tag sequence for the given token sequence, and return a corresponding list of tagged tokens. A tagged token is encoded as a tuple (token, tag).

Return type: list(tuple(str, str))

tag_sents(sentences)

Apply self.tag() to each element of sentences; the result is equivalent to:

return [self.tag(sent) for sent in sentences]
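The `tag` / `tag_sents` contract can be seen without Java in a toy dictionary-based tagger. `LookupTagger` is hypothetical and only mirrors the return shapes described above (a list of `(token, tag)` tuples per sentence); it is not how the Stanford taggers assign tags.

```python
from typing import Dict, List, Tuple


class LookupTagger:
    """Hypothetical toy tagger illustrating the tag/tag_sents contract."""

    def __init__(self, lexicon: Dict[str, str], default: str = "NN") -> None:
        self.lexicon = lexicon    # lowercase token -> tag
        self.default = default    # tag for tokens missing from the lexicon

    def tag(self, tokens: List[str]) -> List[Tuple[str, str]]:
        # One (token, tag) tuple per input token, in order.
        return [(tok, self.lexicon.get(tok.lower(), self.default)) for tok in tokens]

    def tag_sents(self, sentences: List[List[str]]) -> List[List[Tuple[str, str]]]:
        # One tagged list per input sentence.
        return [self.tag(sent) for sent in sentences]


tagger = LookupTagger({"what": "WP", "is": "VBZ", "the": "DT", "?": "."})
tagged = tagger.tag_sents([["What", "is", "the", "airspeed", "?"], ["the", "swallow"]])
print(tagged[1])  # [('the', 'DT'), ('swallow', 'NN')]
```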
