nltk.corpus.reader.nombank module¶

class nltk.corpus.reader.nombank.NombankChainTreePointer[source]¶

Bases: NombankPointer

__init__(pieces)[source]¶

pieces¶: A list of the pieces that make up this chain. Elements may be either NombankSplitTreePointer or NombankTreePointer pointers.

select(tree)[source]¶

class nltk.corpus.reader.nombank.NombankCorpusReader[source]¶

Bases: CorpusReader

Corpus reader for the nombank corpus, which augments the Penn Treebank with information about the predicate-argument structure of every noun instance. The corpus consists of two parts: the predicate-argument annotations themselves, and a set of “frameset files” which define the argument labels used by the annotations, on a per-noun basis. Each “frameset file” contains one or more predicates, such as 'turn' or 'turn_on', each of which is divided into coarse-grained word senses called “rolesets”. For each “roleset”, the frameset file provides descriptions of the argument roles, along with examples.
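
The frameset → predicate → roleset → role hierarchy described above can be illustrated with a small standard-library sketch. The XML below is a hypothetical miniature frameset: the element and attribute names (`predicate`, `roleset`, `role`, `lemma`, `id`, `n`, `descr`) and the role descriptions are assumptions made for illustration, not text copied from the corpus, and `roles_by_roleset` is a hypothetical helper rather than part of the reader.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature frameset file. Element names, attribute names and
# descriptions are illustrative assumptions, not corpus content.
FRAMESET_XML = """
<frameset>
  <predicate lemma="turn">
    <roleset id="turn.01" name="rotation">
      <roles>
        <role n="0" descr="thing causing the turn"/>
        <role n="1" descr="thing turning"/>
      </roles>
    </roleset>
  </predicate>
  <predicate lemma="turn_on">
    <roleset id="turn_on.01" name="activation">
      <roles>
        <role n="1" descr="thing switched on"/>
      </roles>
    </roleset>
  </predicate>
</frameset>
"""


def roles_by_roleset(xml_text):
    """Map each roleset id to a list of (argument label, description) pairs."""
    root = ET.fromstring(xml_text)
    result = {}
    for roleset in root.iter("roleset"):
        result[roleset.get("id")] = [
            ("ARG" + role.get("n"), role.get("descr"))
            for role in roleset.iter("role")
        ]
    return result


summary = roles_by_roleset(FRAMESET_XML)
for roleset_id, roles in summary.items():
    print(roleset_id, roles)
```

Each predicate may carry several rolesets, so keying the result by roleset id rather than by lemma keeps the senses apart.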

__init__(root, nomfile, framefiles='', nounsfile=None, parse_fileid_xform=None, parse_corpus=None, encoding='utf8')[source]¶

Parameters:

root – The root directory for this corpus.
nomfile – The name of the file containing the predicate-argument annotations (relative to root).
framefiles – A list or regexp specifying the frameset fileids for this corpus.
parse_fileid_xform – A transform that should be applied to the fileids in this corpus. This should be a function of one argument (a fileid) that returns a string (the new fileid).
parse_corpus – The corpus containing the parse trees corresponding to this corpus. These parse trees are necessary to resolve the tree pointers used by nombank.
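
As a rough illustration of parse_fileid_xform, the sketch below shows a function of one argument that maps a fileid to a new fileid. The `wsj/NN/` directory layout and the name `strip_wsj_prefix` are assumptions chosen for the example; the actual transform depends on how the annotation file and the parse corpus name their files.

```python
import re


def strip_wsj_prefix(fileid):
    """Hypothetical transform: drop a leading 'wsj/NN/' directory prefix."""
    return re.sub(r"^wsj/\d\d/", "", fileid)


# A fileid as it might appear in the annotation file (assumed layout).
new_fileid = strip_wsj_prefix("wsj/01/wsj_0101.mrg")
print(new_fileid)
```

A fileid that does not start with the assumed prefix is returned unchanged, so the transform is safe to apply to every fileid.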

instances(baseform=None)[source]¶

Returns: a corpus view that acts as a list of NombankInstance objects, one for each noun in the corpus.

lines()[source]¶

Returns: a corpus view that acts as a list of strings, one for each line in the predicate-argument annotation file.

nouns()[source]¶

Returns: a corpus view that acts as a list of all noun lemmas in this corpus (from the nombank.1.0.words file).

roleset(roleset_id)[source]¶

Returns: the XML description for the given roleset.

rolesets(baseform=None)[source]¶

Returns: a list of XML descriptions for rolesets.

class nltk.corpus.reader.nombank.NombankInstance[source]¶

Bases: object

__init__(fileid, sentnum, wordnum, baseform, sensenumber, predicate, predid, arguments, parse_corpus=None)[source]¶

arguments¶: A list of tuples (argloc, argid), specifying the location and identifier for each of the predicate’s arguments in the containing sentence. Argument identifiers are strings such as 'ARG0' or 'ARGM-TMP'. This list does not contain the predicate.

baseform¶: The baseform of the predicate.

fileid¶: The name of the file containing the parse tree for this instance’s sentence.

static parse(s, parse_fileid_xform=None, parse_corpus=None)[source]¶

parse_corpus¶: A corpus reader for the parse trees corresponding to the instances in this nombank corpus.

predicate¶: A NombankTreePointer indicating the position of this instance’s predicate within its containing sentence.

predid¶: Identifier of the predicate.

property roleset¶: The name of the roleset used by this instance’s predicate. Use NombankCorpusReader.roleset() to look up information about the roleset.

sensenumber¶: The sense number of the predicate.

sentnum¶: The sentence number of this sentence within fileid. Indexing starts from zero.

property tree¶: The parse tree corresponding to this instance, or None if the corresponding tree is not available.

wordnum¶: The word number of this instance’s predicate within its containing sentence. Word numbers are indexed starting from zero, and include traces and other empty parse elements.
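
The attributes above can be pictured as the fields of one annotation line. The sketch below is a simplified stand-in for a line parser, not the library's implementation. It assumes a whitespace-separated layout of five leading fields followed by `location-label` items, assumes the predicate item carries the label `rel`, and assumes the roleset name is formed as `baseform.sensenumber`. The sample line, `ParsedLine` and `parse_line` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class ParsedLine(NamedTuple):
    fileid: str
    sentnum: int
    wordnum: int
    baseform: str
    sensenumber: str
    predicate: str                    # pointer string of the predicate
    predid: str
    arguments: List[Tuple[str, str]]  # (argloc, argid) pairs


def parse_line(line: str) -> ParsedLine:
    """Split one annotation line into its fields (assumed layout)."""
    pieces = line.split()
    if len(pieces) < 6:
        raise ValueError(f"badly formatted line: {line!r}")
    fileid, sentnum, wordnum, baseform, sensenumber = pieces[:5]

    predicate = None
    arguments = []
    for item in pieces[5:]:
        # Split on the first hyphen only, so 'ARGM-TMP' stays intact.
        loc, label = item.split("-", 1)
        if label == "rel":  # assumption: 'rel' marks the predicate
            predicate = (loc, label)
        else:
            arguments.append((loc, label))
    if predicate is None:
        raise ValueError(f"no predicate in line: {line!r}")

    return ParsedLine(fileid, int(sentnum), int(wordnum), baseform,
                      sensenumber, predicate[0], predicate[1], arguments)


# Hypothetical annotation line, written for this example.
inst = parse_line("wsj_0001.mrg 0 3 turn 01 3:0-rel 1:1-ARG0 5:2-ARGM-TMP")
roleset_name = f"{inst.baseform}.{inst.sensenumber}"  # assumed naming scheme
print(inst.arguments, roleset_name)
```

Keeping the predicate out of the argument list mirrors the note above that arguments does not contain the predicate.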

class nltk.corpus.reader.nombank.NombankPointer[source]¶

Bases: object

A pointer used by nombank to identify one or more constituents in a parse tree. NombankPointer is an abstract base class with three concrete subclasses:

NombankTreePointer is used to point to single constituents.
NombankSplitTreePointer is used to point to ‘split’ constituents, which consist of a sequence of two or more NombankTreePointer pointers.
NombankChainTreePointer is used to point to entire trace chains in a tree. It consists of a sequence of pieces, which can be NombankTreePointer or NombankSplitTreePointer pointers.

__init__()[source]¶

class nltk.corpus.reader.nombank.NombankSplitTreePointer[source]¶

Bases: NombankPointer

__init__(pieces)[source]¶

pieces¶: A list of the pieces that make up this split constituent. Elements are all NombankTreePointer pointers.

select(tree)[source]¶

class nltk.corpus.reader.nombank.NombankTreePointer[source]¶

Bases: NombankPointer

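A pointer to a single constituent is written as wordnum:height. Chains join their pieces with * (wordnum:height*wordnum:height*…), and split constituents join their pieces with a comma (wordnum:height,wordnum:height).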

__init__(wordnum, height)[source]¶

static parse(s)[source]¶
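
To make the three pointer kinds concrete, here is a minimal sketch of how a pointer string could be decomposed. It assumes that `*` separates the pieces of a chain, `,` separates the pieces of a split constituent, and `wordnum:height` denotes a single constituent. `parse_pointer` is a hypothetical helper that returns plain tuples rather than NombankPointer objects.

```python
def parse_pointer(s):
    """Return a nested tuple describing a pointer string (assumed syntax)."""
    # Chains are checked first, so a chain piece may itself be a split pointer.
    if "*" in s:
        return ("chain", [parse_pointer(piece) for piece in s.split("*")])
    if "," in s:
        return ("split", [parse_pointer(piece) for piece in s.split(",")])
    # Raises ValueError unless the string has exactly two ':'-separated parts.
    wordnum, height = s.split(":")
    return ("tree", int(wordnum), int(height))


print(parse_pointer("3:1"))
print(parse_pointer("2:0,4:1"))
print(parse_pointer("1:0*2:0,4:1"))
```

Checking for chains before splits matches the containment rule stated for NombankChainTreePointer: chain pieces may be split or plain tree pointers, while split pieces are always plain tree pointers.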

select(tree)[source]¶

treepos(tree)[source]¶: Convert this pointer to a standard ‘tree position’ pointer, given that it points to the given tree.
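
The conversion to a tree position can be sketched without the corpus. The example below represents a tree as nested `(label, children)` tuples with strings as leaves, and a tree position as a tuple of child indices. It assumes that height 0 denotes the node directly above the word and that each extra unit of height moves one level further up. `leaf_positions`, `treepos` and `select` here are hypothetical stand-ins, not the library methods.

```python
def leaf_positions(tree, prefix=()):
    """Yield the tree position of every leaf, left to right."""
    _label, children = tree
    for i, child in enumerate(children):
        if isinstance(child, str):
            yield prefix + (i,)
        else:
            yield from leaf_positions(child, prefix + (i,))


def treepos(tree, wordnum, height):
    """Tree position of the node `height` levels above the word's parent."""
    leaf_pos = list(leaf_positions(tree))[wordnum]
    cut = len(leaf_pos) - height - 1
    if cut < 0:
        raise ValueError("height exceeds the depth of the word")
    return leaf_pos[:cut]


def select(tree, pos):
    """Follow a tree position down from the root."""
    node = tree
    for i in pos:
        node = node[1][i]
    return node


# Hypothetical parse tree for "the turn ended".
TREE = ("S", [("NP", [("DT", ["the"]), ("NN", ["turn"])]),
              ("VP", [("VBD", ["ended"])])])

pos = treepos(TREE, 1, 1)    # word 1 is "turn"; one level above its tag
print(pos, select(TREE, pos)[0])
```

Because every leaf is counted, traces and other empty elements would occupy word numbers too, in line with the zero-based indexing described for wordnum.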
