nltk.tag.senna module

SENNA POS tagger, NER tagger, and chunk tagger

The input is:

  • the path to the directory that contains the SENNA executables. If the path is incorrect, SennaTagger automatically searches for the executable in the directory specified by the SENNA environment variable

  • (optionally) the encoding of the input data (default: utf-8)
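The path fallback described above can be sketched roughly as follows. This is an illustration only, not NLTK's code: `resolve_senna_dir` and its `exe_name` and `env` parameters are hypothetical names, and the real executable name depends on the platform.

```python
import os


def resolve_senna_dir(path, exe_name="senna", env=None):
    """Return the first directory that contains `exe_name`.

    Hypothetical helper: tries `path` first, then the directory named by
    the SENNA environment variable, and raises LookupError if neither works.
    """
    env = os.environ if env is None else env
    candidates = [path]
    if "SENNA" in env:
        candidates.append(env["SENNA"])
    for candidate in candidates:
        # A candidate is accepted only if the executable file is really there.
        if os.path.isfile(os.path.join(candidate, exe_name)):
            return os.path.normpath(candidate)
    raise LookupError(f"no {exe_name!r} executable found in {candidates}")
```

Passing the environment as a parameter keeps the sketch easy to exercise without modifying the real process environment.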

Note: Unit tests for this module can be found in test/unit/test_senna.py

>>> from nltk.tag import SennaTagger
>>> tagger = SennaTagger('/usr/share/senna-v3.0')  
>>> tagger.tag('What is the airspeed of an unladen swallow ?'.split()) 
[('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'), ('airspeed', 'NN'),
('of', 'IN'), ('an', 'DT'), ('unladen', 'NN'), ('swallow', 'NN'), ('?', '.')]
>>> from nltk.tag import SennaChunkTagger
>>> chktagger = SennaChunkTagger('/usr/share/senna-v3.0')  
>>> chktagger.tag('What is the airspeed of an unladen swallow ?'.split()) 
[('What', 'B-NP'), ('is', 'B-VP'), ('the', 'B-NP'), ('airspeed', 'I-NP'),
('of', 'B-PP'), ('an', 'B-NP'), ('unladen', 'I-NP'), ('swallow', 'I-NP'),
('?', 'O')]
>>> from nltk.tag import SennaNERTagger
>>> nertagger = SennaNERTagger('/usr/share/senna-v3.0')  
>>> nertagger.tag('Shakespeare theatre was in London .'.split()) 
[('Shakespeare', 'B-PER'), ('theatre', 'O'), ('was', 'O'), ('in', 'O'),
('London', 'B-LOC'), ('.', 'O')]
>>> nertagger.tag('UN headquarters are in NY , USA .'.split()) 
[('UN', 'B-ORG'), ('headquarters', 'O'), ('are', 'O'), ('in', 'O'),
('NY', 'B-LOC'), (',', 'O'), ('USA', 'B-LOC'), ('.', 'O')]
class nltk.tag.senna.SennaChunkTagger

Bases: Senna

__init__(path, encoding='utf-8')

bio_to_chunks(tagged_sent, chunk_type)

Extracts the chunks in a BIO chunk-tagged sentence.

>>> from nltk.tag import SennaChunkTagger
>>> chktagger = SennaChunkTagger('/usr/share/senna-v3.0')  
>>> sent = 'What is the airspeed of an unladen swallow ?'.split()
>>> tagged_sent = chktagger.tag(sent)  
>>> tagged_sent  
[('What', 'B-NP'), ('is', 'B-VP'), ('the', 'B-NP'), ('airspeed', 'I-NP'),
('of', 'B-PP'), ('an', 'B-NP'), ('unladen', 'I-NP'), ('swallow', 'I-NP'),
('?', 'O')]
>>> list(chktagger.bio_to_chunks(tagged_sent, chunk_type='NP'))  
[('What', '0'), ('the airspeed', '2-3'), ('an unladen swallow', '5-6-7')]
Parameters:

  • tagged_sent (list(tuple)) – A list of (word, BIO chunk tag) tuples.

  • chunk_type (str) – The chunk tag to extract, e.g. 'NP' or 'VP'.

Returns:

An iterable of (chunk, indices) tuples: each chunk of the requested type as a space-joined string, paired with the indices of its tokens joined by hyphens (e.g. '2-3').

Return type:

iter(tuple(str))
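To make the output format concrete without a SENNA installation, the extraction can be approximated in plain Python. This is a minimal sketch, not the library's implementation: `extract_bio_chunks` is a hypothetical name, and ignoring a stray I- tag that has no preceding B- tag is an assumption of this sketch.

```python
def extract_bio_chunks(tagged_sent, chunk_type):
    """Yield (chunk_text, hyphen_joined_indices) for each chunk of chunk_type."""
    begin, inside = f"B-{chunk_type}", f"I-{chunk_type}"
    words, positions = [], []
    for index, (word, tag) in enumerate(tagged_sent):
        # An I- tag only continues a chunk that is already open.
        continues = tag == inside and bool(words)
        if words and not continues:
            # The open chunk ends before this token, so emit it.
            yield " ".join(words), "-".join(positions)
            words, positions = [], []
        if tag == begin or continues:
            words.append(word)
            positions.append(str(index))
    if words:
        # Emit a chunk that runs to the end of the sentence.
        yield " ".join(words), "-".join(positions)


tagged_sent = [
    ("What", "B-NP"), ("is", "B-VP"), ("the", "B-NP"), ("airspeed", "I-NP"),
    ("of", "B-PP"), ("an", "B-NP"), ("unladen", "I-NP"), ("swallow", "I-NP"),
    ("?", "O"),
]
np_chunks = list(extract_bio_chunks(tagged_sent, "NP"))
# [('What', '0'), ('the airspeed', '2-3'), ('an unladen swallow', '5-6-7')]
```

The input is the chunk-tagged sentence from the doctest above, so the result can be compared directly with the documented output.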

tag_sents(sentences)

Applies the tag method to each sentence in a list of sentences. Returns, for each sentence, a list of (word, tag) tuples.

class nltk.tag.senna.SennaNERTagger

Bases: Senna

__init__(path, encoding='utf-8')

tag_sents(sentences)

Applies the tag method to each sentence in a list of sentences. Returns, for each sentence, a list of (word, tag) tuples.

class nltk.tag.senna.SennaTagger

Bases: Senna

__init__(path, encoding='utf-8')

tag_sents(sentences)

Applies the tag method to each sentence in a list of sentences. Returns, for each sentence, a list of (word, tag) tuples.
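The return shape of tag_sents, as documented for the three taggers, can be illustrated with a stand-in that needs no SENNA executable. `ToyTagger` and its lexicon are hypothetical and exist only for this sketch; treating tag as the one-sentence case of tag_sents is an assumption made here, not a statement about the library's internals.

```python
class ToyTagger:
    """Hypothetical stand-in for a SENNA tagger: looks tags up in a dict."""

    def __init__(self, lexicon, default="NN"):
        self._lexicon = lexicon
        self._default = default

    def tag_sents(self, sentences):
        # One list of (word, tag) tuples per input sentence, in input order.
        return [
            [(word, self._lexicon.get(word, self._default)) for word in sent]
            for sent in sentences
        ]

    def tag(self, tokens):
        # In this sketch, tagging one sentence is the one-element case.
        return self.tag_sents([tokens])[0]


tagger = ToyTagger({"the": "DT", "is": "VBZ", "?": "."})
sentences = [["What", "is", "the", "airspeed", "?"], ["the", "swallow"]]
tagged = tagger.tag_sents(sentences)
```

Each sentence is passed as a list of tokens, matching the `.split()` calls in the doctests above, and `tagged` holds one list per input sentence.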