nltk.tag.senna module¶
Senna POS tagger, NER Tagger, Chunk Tagger
The input is:
path to the directory that contains SENNA executables. If the path is incorrect, SennaTagger will automatically search for executable file specified in SENNA environment variable
(optionally) the encoding of the input data (default:utf-8)
Note: Unit tests for this module can be found in test/unit/test_senna.py
>>> from nltk.tag import SennaTagger
>>> tagger = SennaTagger('/usr/share/senna-v3.0')
>>> tagger.tag('What is the airspeed of an unladen swallow ?'.split())
[('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'), ('airspeed', 'NN'),
('of', 'IN'), ('an', 'DT'), ('unladen', 'NN'), ('swallow', 'NN'), ('?', '.')]
>>> from nltk.tag import SennaChunkTagger
>>> chktagger = SennaChunkTagger('/usr/share/senna-v3.0')
>>> chktagger.tag('What is the airspeed of an unladen swallow ?'.split())
[('What', 'B-NP'), ('is', 'B-VP'), ('the', 'B-NP'), ('airspeed', 'I-NP'),
('of', 'B-PP'), ('an', 'B-NP'), ('unladen', 'I-NP'), ('swallow', 'I-NP'),
('?', 'O')]
>>> from nltk.tag import SennaNERTagger
>>> nertagger = SennaNERTagger('/usr/share/senna-v3.0')
>>> nertagger.tag('Shakespeare theatre was in London .'.split())
[('Shakespeare', 'B-PER'), ('theatre', 'O'), ('was', 'O'), ('in', 'O'),
('London', 'B-LOC'), ('.', 'O')]
>>> nertagger.tag('UN headquarters are in NY , USA .'.split())
[('UN', 'B-ORG'), ('headquarters', 'O'), ('are', 'O'), ('in', 'O'),
('NY', 'B-LOC'), (',', 'O'), ('USA', 'B-LOC'), ('.', 'O')]
- class nltk.tag.senna.SennaChunkTagger[source]¶
Bases:
Senna
- bio_to_chunks(tagged_sent, chunk_type)[source]¶
Extracts the chunks in a BIO chunk-tagged sentence.
>>> from nltk.tag import SennaChunkTagger >>> chktagger = SennaChunkTagger('/usr/share/senna-v3.0') >>> sent = 'What is the airspeed of an unladen swallow ?'.split() >>> tagged_sent = chktagger.tag(sent) >>> tagged_sent [('What', 'B-NP'), ('is', 'B-VP'), ('the', 'B-NP'), ('airspeed', 'I-NP'), ('of', 'B-PP'), ('an', 'B-NP'), ('unladen', 'I-NP'), ('swallow', 'I-NP'), ('?', 'O')] >>> list(chktagger.bio_to_chunks(tagged_sent, chunk_type='NP')) [('What', '0'), ('the airspeed', '2-3'), ('an unladen swallow', '5-6-7')]
- Parameters:
tagged_sent (str) – A list of tuples of word and BIO chunk tag.
tagged_sent – The chunk tag that users want to extract, e.g. ‘NP’ or ‘VP’
- Returns:
An iterable of tuples of chunks that users want to extract and their corresponding indices.
- Return type:
iter(tuple(str))