nltk.tag.stanford module
A module for interfacing with the Stanford taggers.
Tagger models need to be downloaded from https://nlp.stanford.edu/software, and the STANFORD_MODELS environment variable must be set to a colon-separated list of paths.
For more details see the documentation for StanfordPOSTagger and StanfordNERTagger.
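As a minimal sketch of the environment setup described above, the variables can be set from Python before a tagger is constructed. The helper name `configure_stanford_env` and every directory below are hypothetical; only the variable names STANFORD_MODELS and CLASSPATH come from this documentation. The sketch joins paths with `os.pathsep`, which is ":" on POSIX systems.

```python
import os


def configure_stanford_env(model_dirs, jar_paths):
    """Set STANFORD_MODELS and CLASSPATH for the current process.

    Hypothetical helper, not part of NLTK. Note that it overwrites
    any value these variables already hold.
    """
    # os.pathsep is ":" on POSIX, matching the colon-separated list above.
    os.environ["STANFORD_MODELS"] = os.pathsep.join(model_dirs)
    os.environ["CLASSPATH"] = os.pathsep.join(jar_paths)
    return os.environ["STANFORD_MODELS"], os.environ["CLASSPATH"]


# Assumed install locations; substitute wherever the downloads were unpacked.
MODEL_DIRS = ["/opt/stanford/postagger/models", "/opt/stanford/ner/classifiers"]
JAR_PATHS = [
    "/opt/stanford/postagger/stanford-postagger.jar",
    "/opt/stanford/ner/stanford-ner.jar",
]

models, classpath = configure_stanford_env(MODEL_DIRS, JAR_PATHS)
```

Setting the variables in the shell before starting Python should work equally well; doing it in-process only affects the current interpreter and the Java subprocesses it launches.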
- class nltk.tag.stanford.StanfordNERTagger
Bases: StanfordTagger
A class for named-entity tagging with the Stanford tagger. The inputs are:
- the path to a model trained on training data
- (optionally) the path to the Stanford tagger jar file. If not specified here, this jar file must be specified in the CLASSPATH environment variable.
- (optionally) the encoding of the training data (default: UTF-8)
Example:
>>> from nltk.tag import StanfordNERTagger
>>> st = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz')
>>> st.tag('Rami Eid is studying at Stony Brook University in NY'.split())
[('Rami', 'PERSON'), ('Eid', 'PERSON'), ('is', 'O'), ('studying', 'O'), ('at', 'O'), ('Stony', 'ORGANIZATION'), ('Brook', 'ORGANIZATION'), ('University', 'ORGANIZATION'), ('in', 'O'), ('NY', 'LOCATION')]
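The tagger returns one label per token, so multi-word entities arrive as runs of adjacent tokens with the same label. The following sketch merges such runs into whole entities. `group_entities` is a hypothetical post-processing helper, not part of NLTK, and the input is copied from the example output above so that the snippet runs without Java or a model file.

```python
from itertools import groupby


def group_entities(tagged_tokens, outside_label="O"):
    """Merge runs of adjacent tokens that share a non-outside label.

    Hypothetical helper. Two distinct entities of the same type that
    sit directly next to each other would be merged into one.
    """
    entities = []
    for label, run in groupby(tagged_tokens, key=lambda pair: pair[1]):
        if label != outside_label:
            entities.append((" ".join(token for token, _ in run), label))
    return entities


# Output of st.tag(...) from the example above.
tagged = [
    ('Rami', 'PERSON'), ('Eid', 'PERSON'), ('is', 'O'), ('studying', 'O'),
    ('at', 'O'), ('Stony', 'ORGANIZATION'), ('Brook', 'ORGANIZATION'),
    ('University', 'ORGANIZATION'), ('in', 'O'), ('NY', 'LOCATION'),
]

entities = group_entities(tagged)
print(entities)
# [('Rami Eid', 'PERSON'), ('Stony Brook University', 'ORGANIZATION'), ('NY', 'LOCATION')]
```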
- class nltk.tag.stanford.StanfordPOSTagger
Bases: StanfordTagger
A class for POS tagging with the Stanford tagger. The inputs are:
- the path to a model trained on training data
- (optionally) the path to the Stanford tagger jar file. If not specified here, this jar file must be specified in the CLASSPATH environment variable.
- (optionally) the encoding of the training data (default: UTF-8)
Example:
>>> from nltk.tag import StanfordPOSTagger
>>> st = StanfordPOSTagger('english-bidirectional-distsim.tagger')
>>> st.tag('What is the airspeed of an unladen swallow ?'.split())
[('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'), ('airspeed', 'NN'), ('of', 'IN'), ('an', 'DT'), ('unladen', 'JJ'), ('swallow', 'VB'), ('?', '.')]
- class nltk.tag.stanford.StanfordTagger
Bases: TaggerI
An interface to Stanford taggers. Subclasses must define:
- _cmd property: a property that returns the command that will be executed.
- _SEPARATOR: class constant representing the character used to separate tokens from their tags.
- _JAR: class constant giving the jar file name.
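To illustrate the role of the separator constant, here is a minimal sketch of splitting one line of raw tagger output into (token, tag) pairs. It is not NLTK's own parsing code. The function `parse_tagged_line` is hypothetical, and the sample line assumes the Java tool prints whitespace-separated `token<separator>tag` items with an underscore as the separator.

```python
def parse_tagged_line(line, separator):
    """Split 'token<sep>tag' items on whitespace into (token, tag) pairs.

    Hypothetical helper. The split is taken at the LAST separator so
    that tokens which themselves contain the separator stay intact.
    """
    pairs = []
    for item in line.split():
        token, _, tag = item.rpartition(separator)
        pairs.append((token, tag))
    return pairs


# Assumed raw output format for a POS tagger using "_" as its separator.
pairs = parse_tagged_line("What_WP is_VBZ the_DT airspeed_NN", "_")
print(pairs)
# [('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'), ('airspeed', 'NN')]
```

Splitting at the last occurrence is a design choice of this sketch: a token such as `snake_case` tagged `NN` would otherwise lose part of its text.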
- __init__(model_filename, path_to_jar=None, encoding='utf8', verbose=False, java_options='-mx1000m')
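The constructor parameters can all be passed by keyword. The sketch below collects them in a dictionary so that the Java options can be varied in one place. `tagger_kwargs` and its `heap_mb` parameter are hypothetical, the paths are placeholders, and the reading of '-mx1000m' as a maximum heap size in megabytes is an assumption about the Java flag rather than something this documentation states.

```python
def tagger_kwargs(model_path, jar_path=None, heap_mb=1000):
    """Build keyword arguments matching the __init__ signature above.

    Hypothetical helper. heap_mb is assumed to map onto Java's
    maximum heap size through the '-mx<N>m' option.
    """
    return {
        "model_filename": model_path,
        "path_to_jar": jar_path,  # None falls back to CLASSPATH
        "encoding": "utf8",
        "verbose": False,
        "java_options": f"-mx{heap_mb}m",
    }


# Placeholder paths; point these at real downloads before use.
kwargs = tagger_kwargs(
    "english-bidirectional-distsim.tagger",
    jar_path="/opt/stanford/postagger/stanford-postagger.jar",
    heap_mb=2000,
)

# With NLTK, Java, and the model installed, the tagger could then be built as:
# from nltk.tag import StanfordPOSTagger
# st = StanfordPOSTagger(**kwargs)
```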