nltk.corpus.reader.senseval module

Read from the Senseval 2 Corpus.

SENSEVAL [http://www.senseval.org/]: evaluation exercises for Word Sense Disambiguation, organized by ACL-SIGLEX [https://www.siglex.org/].

Prepared by Ted Pedersen <tpederse@umn.edu>, University of Minnesota (https://www.d.umn.edu/~tpederse/data.html). Distributed with permission.

The NLTK version of the Senseval 2 files uses well-formed XML. Each instance of the ambiguous words “hard”, “interest”, “line”, and “serve” is tagged with a sense identifier, and supplied with context.

class nltk.corpus.reader.senseval.SensevalInstance[source]

Bases: object

__init__(word, position, context, senses)[source]
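The four constructor arguments can be pictured with a small stand-in. This is only a sketch: `Instance` is a hypothetical namedtuple, not the NLTK class, and the sample sentence is invented for illustration. It assumes that `context` is a list of (token, tag) pairs, that `position` is the index of the ambiguous word within `context`, and that `senses` is a tuple of sense identifiers.

```python
from collections import namedtuple

# Hypothetical stand-in that mirrors the four constructor arguments.
# It is NOT nltk.corpus.reader.senseval.SensevalInstance.
Instance = namedtuple("Instance", ["word", "position", "context", "senses"])

# Invented example data (assumed shapes, see the lead-in).
inst = Instance(
    word="hard-a",
    position=2,
    context=[("it", "PRP"), ("was", "VBD"), ("hard", "JJ"), ("work", "NN")],
    senses=("HARD1",),
)

# Under the stated assumption, position points at the ambiguous word in context.
head_token, head_tag = inst.context[inst.position]
print(head_token, head_tag, inst.senses)
```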

class nltk.corpus.reader.senseval.SensevalCorpusReader[source]

Bases: nltk.corpus.reader.api.CorpusReader

instances(fileids=None)[source]
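As a rough usage sketch, the instances of one file can be tallied by sense. The helper names `load_instances` and `sense_counts` are hypothetical. The sketch assumes the corpus is installed (for example via `nltk.download("senseval")`), that it is exposed as `nltk.corpus.senseval`, and that each instance has a `senses` attribute holding its sense identifiers.

```python
from collections import Counter


def load_instances(fileid):
    """Return the instances of one corpus file (needs NLTK and the senseval data)."""
    # Imported lazily so that sense_counts() below can be used without NLTK.
    from nltk.corpus import senseval
    return senseval.instances(fileid)


def sense_counts(instances):
    """Count how often each sense identifier occurs across the given instances."""
    counts = Counter()
    for inst in instances:
        # Assumption: inst.senses is an iterable of sense identifier strings.
        counts.update(inst.senses)
    return counts
```

With the data installed, `sense_counts(load_instances("hard.pos"))` would give the sense distribution for "hard", assuming the file ID `hard.pos` is present in the local copy of the corpus.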

class nltk.corpus.reader.senseval.SensevalCorpusView[source]

Bases: nltk.corpus.reader.util.StreamBackedCorpusView

__init__(fileid, encoding)[source]

Create a new corpus view based on the file fileid; its contents are read block by block with read_block(). See the class documentation for more information.

Parameters
  • fileid – The path to the file that is read by this corpus view. fileid can either be a string or a PathPointer.

  • startpos – The file position at which the view will start reading. This can be used to skip over preface sections.

  • encoding – The unicode encoding that should be used to read the file’s contents. If no encoding is specified, then the file’s contents will be read as a non-unicode string (i.e., a str).

read_block(stream)[source]

Read a block from the input stream.

Returns

a block of tokens from the input stream

Return type

list(any)

Parameters

stream (stream) – an input stream
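The idea of reading one block at a time can be sketched as follows. This is a simplified illustration, not NLTK's implementation: `read_instance_block` is a hypothetical function, and it assumes each instance is delimited by `<instance ...>` and `</instance>` tags that start on their own lines. It returns the raw text of one instance rather than a parsed object.

```python
import io


def read_instance_block(stream):
    """Read lines until one </instance> closes; return them as a one-item list."""
    lines = []
    while True:
        line = stream.readline()
        if line == "":
            # End of stream: nothing complete is left, so return an empty block.
            return []
        if "<instance" in line or lines:
            # Start collecting at the opening tag and keep every line after it.
            lines.append(line)
        if "</instance>" in line and lines:
            return ["".join(lines)]


# Invented sample input in the assumed layout.
sample = io.StringIO(
    '<lexelt item="hard-a">\n'
    '<instance id="a1">\n<context>one</context>\n</instance>\n'
    '<instance id="a2">\n<context>two</context>\n</instance>\n'
    '</lexelt>\n'
)
first = read_instance_block(sample)
second = read_instance_block(sample)
third = read_instance_block(sample)
```

Each call consumes exactly one instance from the stream, so repeated calls walk through the file without loading it all at once.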