nltk.corpus.reader.nombank module

class nltk.corpus.reader.nombank.NombankCorpusReader[source]

Bases: nltk.corpus.reader.api.CorpusReader

Corpus reader for the nombank corpus, which augments the Penn Treebank with information about the predicate-argument structure of every noun instance. The corpus consists of two parts: the predicate-argument annotations themselves, and a set of “frameset files” which define the argument labels used by the annotations, on a per-noun basis. Each “frameset file” contains one or more predicates, such as 'turn' or 'turn_on', each of which is divided into coarse-grained word senses called “rolesets”. For each “roleset”, the frameset file provides descriptions of the argument roles, along with examples.

__init__(root, nomfile, framefiles='', nounsfile=None, parse_fileid_xform=None, parse_corpus=None, encoding='utf8')[source]
Parameters
  • root – The root directory for this corpus.

  • nomfile – The name of the file containing the predicate-argument annotations (relative to root).

  • framefiles – A list or regexp specifying the frameset fileids for this corpus.

  • parse_fileid_xform – A transform that should be applied to the fileids in this corpus. This should be a function of one argument (a fileid) that returns a string (the new fileid).

  • parse_corpus – The corpus containing the parse trees corresponding to this corpus. These parse trees are necessary to resolve the tree pointers used by nombank.
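As a minimal sketch of what a parse_fileid_xform callable might look like, the function below strips a leading directory prefix so that the fileids in the annotation file line up with the fileids of the parse corpus. The `wsj/NN/` path layout and the name `strip_wsj_section` are assumptions made for illustration; this page does not specify them.

```python
import re


def strip_wsj_section(fileid):
    """Hypothetical transform: drop a leading 'wsj/NN/' section directory.

    Takes one argument (a fileid) and returns a string (the new fileid),
    which is the contract described for parse_fileid_xform.
    """
    return re.sub(r"^wsj/\d\d/", "", fileid)


print(strip_wsj_section("wsj/00/wsj_0001.mrg"))  # wsj_0001.mrg
```

A function like this would be passed as parse_fileid_xform=strip_wsj_section when constructing the reader.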

instances(baseform=None)[source]
Returns

a corpus view that acts as a list of NombankInstance objects, one for each noun in the corpus.

lines()[source]
Returns

a corpus view that acts as a list of strings, one for each line in the predicate-argument annotation file.

roleset(roleset_id)[source]
Returns

the XML description for the given roleset.

rolesets(baseform=None)[source]
Returns

a list of XML descriptions for rolesets.
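To illustrate how a roleset description might be inspected once it has been retrieved, here is a sketch using only the standard library. The XML snippet is a made-up stand-in, and its element and attribute names (`predicate`, `roleset`, `roles`, `role`, `id`, `n`, `descr`) are assumptions modeled on PropBank-style frameset files; check a real frameset file before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical frameset content, not taken from the real corpus.
FRAMESET_XML = """
<frameset>
  <predicate lemma="decision">
    <roleset id="decision.01" name="deciding">
      <roles>
        <role n="0" descr="decider"/>
        <role n="1" descr="thing decided"/>
      </roles>
    </roleset>
  </predicate>
</frameset>
""".strip()


def find_roleset(frameset_root, roleset_id):
    """Return the <roleset> element whose id matches roleset_id."""
    for roleset in frameset_root.findall("predicate/roleset"):
        if roleset.get("id") == roleset_id:
            return roleset
    raise ValueError(f"roleset {roleset_id!r} not found")


root = ET.fromstring(FRAMESET_XML)
roleset = find_roleset(root, "decision.01")

# Map argument labels to their descriptions.
role_descriptions = {
    f"ARG{role.get('n')}": role.get("descr")
    for role in roleset.findall("roles/role")
}
print(roleset.get("name"), role_descriptions)
```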

nouns()[source]
Returns

a corpus view that acts as a list of all noun lemmas in this corpus (from the nombank.1.0.words file).

class nltk.corpus.reader.nombank.NombankInstance[source]

Bases: object

__init__(fileid, sentnum, wordnum, baseform, sensenumber, predicate, predid, arguments, parse_corpus=None)[source]
fileid

The name of the file containing the parse tree for this instance’s sentence.

sentnum

The sentence number of this sentence within fileid. Indexing starts from zero.

wordnum

The word number of this instance’s predicate within its containing sentence. Word numbers are indexed starting from zero, and include traces and other empty parse elements.

baseform

The baseform of the predicate.

sensenumber

The sense number of the predicate.

predicate

A NombankTreePointer indicating the position of this instance’s predicate within its containing sentence.

predid

Identifier of the predicate.

arguments

A list of tuples (argloc, argid), specifying the location and identifier for each of the predicate’s arguments in the containing sentence. Argument identifiers are strings such as 'ARG0' or 'ARGM-TMP'. This list does not contain the predicate.
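The (argloc, argid) shape lends itself to grouping locations by label. The sketch below uses stand-in data: in a real instance each argloc would be a NombankPointer object, whereas plain strings are used here so the example runs without the corpus. The helper name `group_by_label` is hypothetical.

```python
from collections import defaultdict

# Stand-in for NombankInstance.arguments; the strings replace pointer objects.
arguments = [
    ("2:1", "ARG0"),
    ("5:0", "ARG1"),
    ("9:1", "ARGM-TMP"),
    ("7:0", "ARG1"),
]


def group_by_label(arguments):
    """Collect argument locations under their identifier, preserving order."""
    grouped = defaultdict(list)
    for argloc, argid in arguments:
        grouped[argid].append(argloc)
    return dict(grouped)


by_label = group_by_label(arguments)
print(by_label["ARG1"])  # ['5:0', '7:0']
```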

parse_corpus

A corpus reader for the parse trees corresponding to the instances in this nombank corpus.

property roleset

The name of the roleset used by this instance’s predicate. Use NombankCorpusReader.roleset() to look up information about the roleset.

property tree

The parse tree corresponding to this instance, or None if the corresponding tree is not available.

static parse(s, parse_fileid_xform=None, parse_corpus=None)[source]
class nltk.corpus.reader.nombank.NombankPointer[source]

Bases: object

A pointer used by nombank to identify one or more constituents in a parse tree. NombankPointer is an abstract base class with three concrete subclasses:

  • NombankTreePointer is used to point to single constituents.

  • NombankSplitTreePointer is used to point to ‘split’ constituents, which consist of a sequence of two or more NombankTreePointer pointers.

  • NombankChainTreePointer is used to point to entire trace chains in a tree. It consists of a sequence of pieces, which can be NombankTreePointer or NombankSplitTreePointer pointers.
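The nesting of the three pointer kinds can be sketched with a small stand-alone parser. This is not the library's implementation: it returns plain tuples instead of pointer objects, and the '*' (chain) and ',' (split) separators are assumptions inferred from the wordnum:height format shown for NombankTreePointer. The name `parse_pointer` is hypothetical.

```python
def parse_pointer(s):
    """Parse a pointer string into nested tuples.

    Assumed syntax: 'wordnum:height' for a single constituent,
    pieces joined by ',' for a split constituent,
    pieces joined by '*' for a trace chain.
    """
    # Chains are the outermost level, so split on '*' first.
    if "*" in s:
        return ("chain", [parse_pointer(piece) for piece in s.split("*")])
    # Split constituents may appear on their own or inside a chain.
    if "," in s:
        return ("split", [parse_pointer(piece) for piece in s.split(",")])
    # A plain pointer; unpacking raises ValueError if the shape is wrong.
    wordnum, height = s.split(":")
    return ("tree", int(wordnum), int(height))


print(parse_pointer("3:0"))
print(parse_pointer("1:0*3:0,5:1"))
```

Splitting on '*' before ',' mirrors the description above: a chain may contain split pieces, but a split constituent contains only single-constituent pointers.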

__init__()[source]
class nltk.corpus.reader.nombank.NombankChainTreePointer[source]

Bases: nltk.corpus.reader.nombank.NombankPointer

__init__(pieces)[source]
pieces

A list of the pieces that make up this chain. Elements may be either NombankSplitTreePointer or NombankTreePointer pointers.

select(tree)[source]
class nltk.corpus.reader.nombank.NombankSplitTreePointer[source]

Bases: nltk.corpus.reader.nombank.NombankPointer

__init__(pieces)[source]
pieces

A list of the pieces that make up this split constituent. Elements are all NombankTreePointer pointers.

select(tree)[source]
class nltk.corpus.reader.nombank.NombankTreePointer[source]

Bases: nltk.corpus.reader.nombank.NombankPointer

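A pointer to a single constituent, written as wordnum:height. In pointer strings, chains join pieces with '*' (wordnum:height*wordnum:height*…) and split constituents join pieces with ',' (wordnum:height,wordnum:height,…).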

__init__(wordnum, height)[source]
static parse(s)[source]
select(tree)[source]
treepos(tree)[source]

Convert this pointer to a standard ‘tree position’ pointer, given that it points to the given tree.
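One way to picture the conversion is the stand-alone sketch below, which works on a toy tree of nested (label, children) tuples rather than on NLTK tree objects. It assumes that height 0 refers to the preterminal directly above the word and that each additional unit of height moves one node toward the root; the function names and the sample tree are hypothetical.

```python
def leaf_positions(tree, prefix=()):
    """Yield the tree position of every leaf, left to right."""
    _label, children = tree
    for i, child in enumerate(children):
        if isinstance(child, str):
            yield prefix + (i,)
        else:
            yield from leaf_positions(child, prefix + (i,))


def pointer_to_treepos(tree, wordnum, height):
    """Return the position of the node `height` levels above the
    preterminal of word number `wordnum` (both zero-based)."""
    leaf_pos = list(leaf_positions(tree))[wordnum]
    cut = len(leaf_pos) - height - 1
    if cut < 0:
        raise ValueError("height exceeds the depth of the tree")
    return leaf_pos[:cut]


# Toy parse: (S (NP (DT the) (NN decision)) (VP (VBD stood)))
TREE = ("S", [
    ("NP", [("DT", ["the"]), ("NN", ["decision"])]),
    ("VP", [("VBD", ["stood"])]),
])

print(pointer_to_treepos(TREE, 1, 0))  # (0, 1): the NN node
print(pointer_to_treepos(TREE, 1, 1))  # (0,): the NP node
```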