nltk.corpus.reader.ieer module

Corpus reader for the Information Extraction and Entity Recognition Corpus.

NIST 1999 Information Extraction: Entity Recognition Evaluation

This corpus contains the NEWSWIRE development test data for the NIST 1999 IE-ER Evaluation. The files were taken from the subdirectory /ie_er_99/english/devtest/newswire/ (those matching *.ref.nwt), and their filenames were shortened.
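Because this is an entity recognition corpus, each document carries inline entity annotations. The sketch below assumes those annotations are SGML-style markers of the form `<b_enamex TYPE="PERSON">...<e_enamex>` (with `enamex`, `timex` or `numex` as the marker kind). That tag syntax, the helper name `extract_entities`, and the sample sentence are assumptions made for illustration, not text copied from the corpus or from NLTK. It shows one way to pull `(label, text)` pairs out of such markup with a regular expression:

```python
import re

# ASSUMPTION: entities are wrapped as <b_KIND TYPE="LABEL">surface text<e_KIND>,
# where KIND is e.g. enamex, timex or numex. Matching is case-insensitive because
# the exact casing of the attribute name is not confirmed here.
ENTITY_RE = re.compile(
    r'<b_(?P<kind>\w+)\s+[^>]*?type="(?P<label>\w+)"[^>]*>'  # opening marker
    r'(?P<text>.*?)'                                         # entity surface text
    r'<e_(?P=kind)>',                                        # matching closing marker
    re.IGNORECASE | re.DOTALL,
)


def extract_entities(marked_up: str) -> list[tuple[str, str]]:
    """Return (label, text) pairs for every non-nested entity in the string."""
    return [
        # Normalise the label to upper case and collapse internal whitespace.
        (m.group("label").upper(), " ".join(m.group("text").split()))
        for m in ENTITY_RE.finditer(marked_up)
    ]


# Hypothetical sample sentence, written for this example only.
sample = (
    '<b_enamex TYPE="PERSON">Jane Doe<e_enamex> joined '
    '<b_enamex TYPE="ORGANIZATION">Acme Corp.<e_enamex> in '
    '<b_enamex TYPE="LOCATION">Nairobi<e_enamex> on '
    '<b_timex TYPE="DATE">Monday<e_timex>.'
)

print(extract_entities(sample))
# [('PERSON', 'Jane Doe'), ('ORGANIZATION', 'Acme Corp.'), ('LOCATION', 'Nairobi'), ('DATE', 'Monday')]
```

The lazy `.*?` stops at the first matching close marker, so this sketch does not handle entities nested inside entities of the same kind.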

The corpus contains the following files: APW_19980314, APW_19980424, APW_19980429, NYT_19980315, NYT_19980403, and NYT_19980407.

class nltk.corpus.reader.ieer.IEERCorpusReader

Bases: CorpusReader

class nltk.corpus.reader.ieer.IEERDocument

Bases: object

__init__(text, docno=None, doctype=None, date_time=None, headline='')
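The constructor parameters indicate that each document record carries body text plus an optional document number, document type, timestamp, and headline. The following is a minimal sketch, not NLTK's implementation: `DocSketch` and `parse_doc_record` are hypothetical names, and the `<DOC>`/`<DOCNO>`/`<DOCTYPE>`/`<DATE_TIME>`/`<HEADLINE>`/`<TEXT>` layout of the raw record is an assumption. It only shows how such a record could be split into arguments shaped like those of `IEERDocument.__init__`:

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class DocSketch:
    """Hypothetical stand-in mirroring IEERDocument.__init__'s parameters."""
    text: str
    docno: Optional[str] = None
    doctype: Optional[str] = None
    date_time: Optional[str] = None
    headline: str = ""


# ASSUMPTION: one record looks like <DOC><DOCNO>...</DOCNO>...<TEXT>...</TEXT></DOC>;
# every field except TEXT is treated as optional.
def _field(record: str, tag: str) -> Optional[str]:
    """Return the whitespace-stripped contents of <tag>...</tag>, or None."""
    m = re.search(rf"<{tag}>\s*(.*?)\s*</{tag}>", record, re.DOTALL)
    return m.group(1) if m else None


def parse_doc_record(record: str) -> DocSketch:
    """Split a raw record into DocSketch fields; TEXT is required."""
    text = _field(record, "TEXT")
    if text is None:
        raise ValueError("record has no <TEXT> element")
    return DocSketch(
        text=text,
        docno=_field(record, "DOCNO"),
        doctype=_field(record, "DOCTYPE"),
        date_time=_field(record, "DATE_TIME"),
        headline=_field(record, "HEADLINE") or "",  # default matches headline=''
    )


# Hypothetical record, invented for this example.
raw = """<DOC>
<DOCNO> XYZ19980101.0001 </DOCNO>
<DOCTYPE> NEWS STORY </DOCTYPE>
<DATE_TIME> 01/01/1998 09:00:00 </DATE_TIME>
<BODY>
<HEADLINE> Example headline </HEADLINE>
<TEXT> Example body text. </TEXT>
</BODY>
</DOC>"""

doc = parse_doc_record(raw)
print(doc.docno, "|", doc.headline)  # XYZ19980101.0001 | Example headline
```

Missing optional fields fall back to the same defaults the constructor signature uses (`None`, or `''` for the headline).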

nltk.corpus.reader.ieer.documents = ['APW_19980314', 'APW_19980424', 'APW_19980429', 'NYT_19980315', 'NYT_19980403', 'NYT_19980407']

A list of all documents in this corpus.

nltk.corpus.reader.ieer.titles = {'APW_19980314': 'Associated Press Weekly, 14 March 1998', 'APW_19980424': 'Associated Press Weekly, 24 April 1998', 'APW_19980429': 'Associated Press Weekly, 29 April 1998', 'NYT_19980315': 'New York Times, 15 March 1998', 'NYT_19980403': 'New York Times, 3 April 1998', 'NYT_19980407': 'New York Times, 7 April 1998'}

A dictionary whose keys are the names of documents in this corpus and whose values are descriptions of those documents’ contents.
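The document names follow a visible pattern: a source prefix (`APW` or `NYT`) joined by an underscore to a `YYYYMMDD` date, which `titles` spells out in words. As a sketch only (the helper `describe` and the `SOURCES` table are hypothetical, inferred from the values above rather than taken from NLTK), the descriptions can be regenerated from the names:

```python
from datetime import datetime

# Source prefixes, spelled out as they appear in the `titles` values.
SOURCES = {
    "APW": "Associated Press Weekly",
    "NYT": "New York Times",
}

# Copied from `documents` so this sketch runs without NLTK installed.
DOCUMENTS = [
    "APW_19980314", "APW_19980424", "APW_19980429",
    "NYT_19980315", "NYT_19980403", "NYT_19980407",
]


def describe(name: str) -> str:
    """Turn e.g. 'NYT_19980403' into 'New York Times, 3 April 1998'."""
    prefix, stamp = name.split("_")
    date = datetime.strptime(stamp, "%Y%m%d")
    # Day without a leading zero, as in `titles`; %B yields English month
    # names under Python's default C locale.
    return f"{SOURCES[prefix]}, {date.day} {date.strftime('%B')} {date.year}"


for name in DOCUMENTS:
    print(name, "->", describe(name))
```

For these six names the output matches the `titles` values exactly; a name with an unknown prefix raises `KeyError`, which is a deliberate choice to avoid guessing a source.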