nltk.tag.senna module

SENNA POS tagger, NER tagger, and chunk tagger

The input is:

  • path to the directory that contains the SENNA executables. If the path is incorrect, SennaTagger automatically searches for the executable in the directory specified by the SENNA environment variable

  • (optionally) the encoding of the input data (default: utf-8)
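
The path fallback described in the first item can be sketched roughly as follows. This is a standalone illustration, not NLTK's implementation: the helper name resolve_senna_dir and its exe_name and environ parameters are hypothetical, and the executable name is passed in by the caller because it depends on the platform.

```python
import os


def resolve_senna_dir(candidate_dir, exe_name, environ=None):
    """Return the directory that contains exe_name.

    Hypothetical helper: tries candidate_dir first, then the directory
    named by the SENNA environment variable, and raises LookupError
    if neither contains the executable.
    """
    environ = os.environ if environ is None else environ

    search_dirs = [candidate_dir]
    if "SENNA" in environ:
        search_dirs.append(environ["SENNA"])

    for directory in search_dirs:
        if os.path.isfile(os.path.join(directory, exe_name)):
            return os.path.normpath(directory)

    raise LookupError(
        "%s not found in %r or in the SENNA environment variable"
        % (exe_name, candidate_dir)
    )
```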

Note: Unit tests for this module can be found in test/unit/test_senna.py

>>> from nltk.tag import SennaTagger
>>> tagger = SennaTagger('/usr/share/senna-v3.0')
>>> tagger.tag('What is the airspeed of an unladen swallow ?'.split()) 
[('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'), ('airspeed', 'NN'),
('of', 'IN'), ('an', 'DT'), ('unladen', 'NN'), ('swallow', 'NN'), ('?', '.')]
>>> from nltk.tag import SennaChunkTagger
>>> chktagger = SennaChunkTagger('/usr/share/senna-v3.0')
>>> chktagger.tag('What is the airspeed of an unladen swallow ?'.split()) 
[('What', 'B-NP'), ('is', 'B-VP'), ('the', 'B-NP'), ('airspeed', 'I-NP'),
('of', 'B-PP'), ('an', 'B-NP'), ('unladen', 'I-NP'), ('swallow', 'I-NP'),
('?', 'O')]
>>> from nltk.tag import SennaNERTagger
>>> nertagger = SennaNERTagger('/usr/share/senna-v3.0')
>>> nertagger.tag('Shakespeare theatre was in London .'.split()) 
[('Shakespeare', 'B-PER'), ('theatre', 'O'), ('was', 'O'), ('in', 'O'),
('London', 'B-LOC'), ('.', 'O')]
>>> nertagger.tag('UN headquarters are in NY , USA .'.split()) 
[('UN', 'B-ORG'), ('headquarters', 'O'), ('are', 'O'), ('in', 'O'),
('NY', 'B-LOC'), (',', 'O'), ('USA', 'B-LOC'), ('.', 'O')]
class nltk.tag.senna.SennaTagger

Bases: nltk.classify.senna.Senna

__init__(path, encoding='utf-8')
tag_sents(sentences)

Applies the tag method over a list of sentences. Returns, for each sentence, a list of (word, tag) tuples.
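
As an illustration of the expected input shape, the sketch below whitespace-tokenizes several raw strings and passes them to tag_sents in a single call. The helper tag_raw_sentences is hypothetical and works with any object that exposes tag_sents. Whitespace splitting mirrors the examples at the top of this page and is not a general tokenizer.

```python
def tag_raw_sentences(tagger, raw_sentences):
    """Tokenize each string on whitespace and tag all sentences at once.

    Hypothetical helper: `tagger` is any object with a tag_sents method
    that accepts a list of token lists.
    """
    tokenized = [sentence.split() for sentence in raw_sentences]
    return tagger.tag_sents(tokenized)


# Usage, assuming SENNA is installed under the path used in the
# examples above:
#
#     from nltk.tag import SennaTagger
#     tagger = SennaTagger('/usr/share/senna-v3.0')
#     tagged = tag_raw_sentences(tagger, [
#         'What is the airspeed of an unladen swallow ?',
#         'Shakespeare theatre was in London .',
#     ])
#     # tagged[0] is the list of (word, tag) tuples for the first sentence.
```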

class nltk.tag.senna.SennaChunkTagger

Bases: nltk.classify.senna.Senna

__init__(path, encoding='utf-8')
tag_sents(sentences)

Applies the tag method over a list of sentences. Returns, for each sentence, a list of (word, tag) tuples.

bio_to_chunks(tagged_sent, chunk_type)

Extracts the chunks in a BIO chunk-tagged sentence.

>>> from nltk.tag import SennaChunkTagger
>>> chktagger = SennaChunkTagger('/usr/share/senna-v3.0')
>>> sent = 'What is the airspeed of an unladen swallow ?'.split()
>>> tagged_sent = chktagger.tag(sent) 
>>> tagged_sent 
[('What', 'B-NP'), ('is', 'B-VP'), ('the', 'B-NP'), ('airspeed', 'I-NP'),
('of', 'B-PP'), ('an', 'B-NP'), ('unladen', 'I-NP'), ('swallow', 'I-NP'),
('?', 'O')]
>>> list(chktagger.bio_to_chunks(tagged_sent, chunk_type='NP')) 
[('What', '0'), ('the airspeed', '2-3'), ('an unladen swallow', '5-6-7')]
Parameters
  • tagged_sent (list(tuple(str))) – A list of (word, BIO chunk tag) tuples.

  • chunk_type (str) – The chunk tag to extract, e.g. ‘NP’ or ‘VP’

Returns

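An iterable of (chunk, indices) tuples for the requested chunk type, where chunk is the space-joined words of the chunk and indices is the hyphen-joined token positions (e.g. ('the airspeed', '2-3')).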

Return type

iter(tuple(str))
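
To make the extraction concrete, here is a minimal standalone sketch that reproduces the documented output without requiring SENNA. It is not NLTK's implementation: the function name extract_bio_chunks is hypothetical, and the handling of an I- tag that does not follow a chunk of the same type (it simply starts a new chunk) is an assumption.

```python
def extract_bio_chunks(tagged_sent, chunk_type):
    """Yield (chunk_text, indices) for every chunk of type chunk_type.

    chunk_text is the space-joined words of the chunk; indices is the
    hyphen-joined token positions, matching the example output above.
    """
    words, positions = [], []
    for i, (word, tag) in enumerate(tagged_sent):
        prefix, _, label = tag.partition("-")
        # An I- tag of the requested type extends the chunk being built.
        continues = bool(words) and prefix == "I" and label == chunk_type
        if words and not continues:
            # Anything else closes the chunk collected so far.
            yield " ".join(words), "-".join(positions)
            words, positions = [], []
        if prefix in ("B", "I") and label == chunk_type:
            words.append(word)
            positions.append(str(i))
    if words:
        yield " ".join(words), "-".join(positions)


tagged_sent = [
    ('What', 'B-NP'), ('is', 'B-VP'), ('the', 'B-NP'), ('airspeed', 'I-NP'),
    ('of', 'B-PP'), ('an', 'B-NP'), ('unladen', 'I-NP'), ('swallow', 'I-NP'),
    ('?', 'O'),
]
print(list(extract_bio_chunks(tagged_sent, 'NP')))
# [('What', '0'), ('the airspeed', '2-3'), ('an unladen swallow', '5-6-7')]
```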

class nltk.tag.senna.SennaNERTagger

Bases: nltk.classify.senna.Senna

__init__(path, encoding='utf-8')
tag_sents(sentences)

Applies the tag method over a list of sentences. Returns, for each sentence, a list of (word, tag) tuples.
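
The NER tagger returns one tag per token, so a common follow-up step is to merge the tokens into whole entities. The sketch below is a hypothetical post-processing helper (group_entities is not part of NLTK) and assumes the B-/I-/O tag scheme shown in the examples at the top of this page.

```python
def group_entities(tagged_sent):
    """Return a list of (entity_text, entity_type) from BIO-tagged tokens."""
    entities = []
    current_words, current_type = [], None
    for word, tag in tagged_sent:
        prefix, _, label = tag.partition("-")
        if prefix == "I" and label == current_type:
            # Continuation of the entity currently being built.
            current_words.append(word)
            continue
        if current_words:
            # Any other tag closes the open entity.
            entities.append((" ".join(current_words), current_type))
        if prefix in ("B", "I"):
            current_words, current_type = [word], label
        else:
            current_words, current_type = [], None
    if current_words:
        entities.append((" ".join(current_words), current_type))
    return entities


tagged = [('UN', 'B-ORG'), ('headquarters', 'O'), ('are', 'O'), ('in', 'O'),
          ('NY', 'B-LOC'), (',', 'O'), ('USA', 'B-LOC'), ('.', 'O')]
print(group_entities(tagged))
# [('UN', 'ORG'), ('NY', 'LOC'), ('USA', 'LOC')]
```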