nltk.corpus.reader.senseval module

Read from the Senseval 2 Corpus.

SENSEVAL: Evaluation exercises for Word Sense Disambiguation. Organized by ACL-SIGLEX.

Prepared by Ted Pedersen, University of Minnesota. Distributed with permission.

The NLTK version of the Senseval 2 files uses well-formed XML. Each instance of the ambiguous words “hard”, “interest”, “line”, and “serve” is tagged with a sense identifier, and supplied with context.
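As a rough illustration of how the tagged instances might be consumed, the sketch below tallies sense identifiers for one of the four words. It assumes that NLTK is installed and that the corpus has been fetched with nltk.download('senseval'). The fileid 'hard.pos' and the instances() method come from the NLTK distribution rather than from the description above, and the helper name count_senses is hypothetical. If the package or the data is missing, the helper returns an empty counter instead of failing.

```python
from collections import Counter


def count_senses(fileid="hard.pos"):
    """Tally sense identifiers for one Senseval 2 file (hypothetical helper).

    Returns an empty Counter when NLTK or the corpus data is unavailable.
    """
    counts = Counter()
    try:
        from nltk.corpus import senseval  # lazy loader; data is read on first use
        for inst in senseval.instances(fileid):
            # inst.senses holds the sense identifier(s) assigned to this instance
            counts.update(inst.senses)
    except (ImportError, LookupError):
        # NLTK is not installed, or the 'senseval' corpus has not been downloaded
        return Counter()
    return counts


sense_counts = count_senses()
for sense, n in sense_counts.most_common():
    print(sense, n)
```

Catching LookupError keeps the sketch usable on machines without the corpus; real code would more likely let that error surface so the missing download is noticed.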

class nltk.corpus.reader.senseval.SensevalCorpusReader[source]

Bases: CorpusReader

class nltk.corpus.reader.senseval.SensevalCorpusView[source]

Bases: StreamBackedCorpusView

__init__(fileid, encoding)[source]

Create a new corpus view, based on the file fileid, and read with block_reader. See the class documentation for more information.

Parameters

  • fileid – The path to the file that is read by this corpus view. fileid can either be a string or a PathPointer.

  • startpos – The file position at which the view will start reading. This can be used to skip over preface sections. It is documented for the base class StreamBackedCorpusView and does not appear in the signature above.

  • encoding – The unicode encoding that should be used to read the file’s contents. If no encoding is specified, then the file’s contents will be read as a non-unicode string (i.e., a str).


read_block(stream)[source]

Read a block from the input stream.

Returns

a block of tokens from the input stream

Return type

list(any)

Parameters

stream (stream) – an input stream
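The block-reading contract described above (a function that takes an input stream and returns a block of tokens) can be sketched without NLTK. The following is a minimal stand-in, not the library's implementation: read_line_block and read_all_blocks are hypothetical names, the sample text is made up, and splitting one line on whitespace is only one possible definition of a block.

```python
import io


def read_line_block(stream):
    """Hypothetical block reader: consume one line, return its whitespace tokens."""
    return stream.readline().split()


def read_all_blocks(stream, block_reader):
    """Call block_reader repeatedly and concatenate the tokens it returns."""
    tokens = []
    while True:
        start = stream.tell()
        block = block_reader(stream)
        if stream.tell() == start:
            # The reader consumed nothing, so the stream is exhausted
            break
        tokens.extend(block)
    return tokens


sample = io.StringIO("hard work pays\nserve the line\n")
all_tokens = read_all_blocks(sample, read_line_block)
print(all_tokens)
```

Detecting the end of input by comparing file positions, rather than by testing for an empty block, lets a reader return an empty list for a blank line without stopping the loop early.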

class nltk.corpus.reader.senseval.SensevalInstance[source]

Bases: object

__init__(word, position, context, senses)[source]
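The four constructor arguments can be pictured with a stand-in record. The sketch below does not use the NLTK class; it defines a hypothetical namedtuple with the same field names. It assumes that context is a list of (token, tag) pairs, that position is the index of the ambiguous word within context, and that senses is a tuple of sense identifiers. All values shown are invented for illustration.

```python
from collections import namedtuple

# Stand-in for illustration only; this is not the NLTK class itself.
Instance = namedtuple("Instance", ["word", "position", "context", "senses"])

example = Instance(
    word="hard-a",        # made-up lexical item label
    position=2,           # assumed: index of the ambiguous word in context
    context=[("a", "DT"), ("very", "RB"), ("hard", "JJ"), ("problem", "NN")],
    senses=("HARD1",),    # made-up sense identifier
)

# Look up the ambiguous token through its position in the context
head_token, head_tag = example.context[example.position]
print(head_token, head_tag, example.senses)
```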