nltk.corpus.reader.propbank module

class nltk.corpus.reader.propbank.PropbankChainTreePointer

Bases: PropbankPointer

__init__(pieces)

pieces: A list of the pieces that make up this chain. Elements may be either PropbankSplitTreePointer or PropbankTreePointer pointers.

select(tree)

class nltk.corpus.reader.propbank.PropbankCorpusReader

Bases: CorpusReader

Corpus reader for the propbank corpus, which augments the Penn Treebank with information about the predicate argument structure of every verb instance. The corpus consists of two parts: the predicate-argument annotations themselves, and a set of “frameset files” which define the argument labels used by the annotations, on a per-verb basis. Each “frameset file” contains one or more predicates, such as 'turn' or 'turn_on', each of which is divided into coarse-grained word senses called “rolesets”. For each “roleset”, the frameset file provides descriptions of the argument roles, along with examples.

__init__(root, propfile, framefiles='', verbsfile=None, parse_fileid_xform=None, parse_corpus=None, encoding='utf8')

Parameters:

root – The root directory for this corpus.
propfile – The name of the file containing the predicate-argument annotations (relative to root).
framefiles – A list or regexp specifying the frameset fileids for this corpus.
parse_fileid_xform – A transform that should be applied to the fileids in this corpus. This should be a function of one argument (a fileid) that returns a string (the new fileid).
parse_corpus – The corpus containing the parse trees corresponding to this corpus. These parse trees are necessary to resolve the tree pointers used by propbank.
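As an illustration of parse_fileid_xform, the sketch below assumes that annotation fileids carry a wsj/NN/ directory prefix that the parse corpus does not use (for example, wsj/00/wsj_0001.mrg in the annotations versus wsj_0001.mrg in the treebank). The function name strip_wsj_dir is hypothetical; any one-argument callable that returns the new fileid as a string fits the parameter description.

```python
import re


def strip_wsj_dir(fileid: str) -> str:
    """Hypothetical parse_fileid_xform: drop a leading 'wsj/NN/' directory.

    Assumes annotation fileids look like 'wsj/00/wsj_0001.mrg' while the
    parse corpus names the same file 'wsj_0001.mrg'.
    """
    return re.sub(r"^wsj/\d\d/", "", fileid)


print(strip_wsj_dir("wsj/00/wsj_0001.mrg"))  # wsj_0001.mrg
print(strip_wsj_dir("wsj_0001.mrg"))         # wsj_0001.mrg (already bare)
```

Such a function would be passed as parse_fileid_xform together with a parse_corpus whose fileids use the bare names.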

instances(baseform=None)

Returns: a corpus view that acts as a list of PropbankInstance objects, one for each verb instance in the corpus.

lines()

Returns: a corpus view that acts as a list of strings, one for each line in the predicate-argument annotation file.

roleset(roleset_id)

Returns: the XML description for the given roleset.
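A roleset lookup of this kind can be sketched with the standard library's xml.etree.ElementTree. The sketch assumes a frameset file holds predicate/roleset elements, each with an id attribute such as 'turn.01' and a roles/role list; the frameset text, its role descriptions, and the helper find_roleset are all made up for illustration and are not the NLTK implementation.

```python
import xml.etree.ElementTree as ET

# Made-up frameset fragment standing in for a real frames file.
FRAMESET = """
<frameset>
  <predicate lemma="turn">
    <roleset id="turn.01" name="change direction">
      <roles>
        <role n="0" descr="turner"/>
        <role n="1" descr="thing turning"/>
      </roles>
    </roleset>
    <roleset id="turn.02" name="transform">
      <roles>
        <role n="0" descr="agent"/>
      </roles>
    </roleset>
  </predicate>
</frameset>
"""


def find_roleset(frameset_xml: str, roleset_id: str) -> ET.Element:
    """Return the <roleset> element whose id attribute equals roleset_id."""
    root = ET.fromstring(frameset_xml.strip())
    for roleset in root.findall("predicate/roleset"):
        if roleset.get("id") == roleset_id:
            return roleset
    raise ValueError(f"roleset {roleset_id!r} not found")


rs = find_roleset(FRAMESET, "turn.01")
print([(r.get("n"), r.get("descr")) for r in rs.findall("roles/role")])
# [('0', 'turner'), ('1', 'thing turning')]
```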

rolesets(baseform=None)

Returns: a list of XML descriptions for rolesets.

verbs()

Returns: a corpus view that acts as a list of all verb lemmas in this corpus (from the verbs.txt file).

class nltk.corpus.reader.propbank.PropbankInflection

Bases: object

ACTIVE = 'a'

FINITE = 'v'

FUTURE = 'f'

GERUND = 'g'

INFINITIVE = 'i'

NONE = '-'

PARTICIPLE = 'p'

PASSIVE = 'p'

PAST = 'p'

PERFECT = 'p'

PERFECT_AND_PROGRESSIVE = 'b'

PRESENT = 'n'

PROGRESSIVE = 'o'

THIRD_PERSON = '3'

__init__(form='-', tense='-', aspect='-', person='-', voice='-')

static parse(s)
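The constants only make sense by position: 'p' means participle as a form, past as a tense, perfect as an aspect, and passive as a voice. The sketch below assumes, following the constructor's argument order, that an inflection string is five characters (form, tense, aspect, person, voice), each either a code from the constants above or '-' for NONE. Inflection, parse_inflection, and describe are hypothetical stand-ins for illustration, not the PropbankInflection API, and "vf--a" is just an example string.

```python
from typing import NamedTuple

# Allowed codes per slot, taken from the class constants; '-' is valid everywhere.
FORMS = {"i": "infinitive", "g": "gerund", "p": "participle", "v": "finite"}
TENSES = {"f": "future", "p": "past", "n": "present"}
ASPECTS = {"p": "perfect", "o": "progressive", "b": "perfect and progressive"}
PERSONS = {"3": "third person"}
VOICES = {"a": "active", "p": "passive"}
SLOT_TABLES = (FORMS, TENSES, ASPECTS, PERSONS, VOICES)


class Inflection(NamedTuple):
    form: str
    tense: str
    aspect: str
    person: str
    voice: str


def parse_inflection(s: str) -> Inflection:
    """Split a five-character string into slots, checking each code by position."""
    if len(s) != 5:
        raise ValueError(f"bad inflection string {s!r}")
    for code, table in zip(s, SLOT_TABLES):
        if code != "-" and code not in table:
            raise ValueError(f"bad inflection string {s!r}")
    return Inflection(*s)


def describe(infl: Inflection) -> dict:
    """Map each filled slot to a readable label, skipping '-' (NONE)."""
    return {
        field: table[code]
        for field, code, table in zip(Inflection._fields, infl, SLOT_TABLES)
        if code != "-"
    }


infl = parse_inflection("vf--a")
print(infl)            # Inflection(form='v', tense='f', aspect='-', person='-', voice='a')
print(describe(infl))  # {'form': 'finite', 'tense': 'future', 'voice': 'active'}
```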

class nltk.corpus.reader.propbank.PropbankInstance

Bases: object

__init__(fileid, sentnum, wordnum, tagger, roleset, inflection, predicate, arguments, parse_corpus=None)

arguments: A list of tuples (argloc, argid), specifying the location and identifier for each of the predicate’s arguments in the containing sentence. Argument identifiers are strings such as 'ARG0' or 'ARGM-TMP'. This list does not contain the predicate.

property baseform: The baseform of the predicate.

fileid: The name of the file containing the parse tree for this instance’s sentence.

inflection: A PropbankInflection object describing the inflection of this instance’s predicate.

static parse(s, parse_fileid_xform=None, parse_corpus=None)
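A sketch of how one annotation line might be split into these fields. It assumes a whitespace-separated layout: fileid, sentence number, word number, tagger, roleset, and inflection, followed by location-label tokens such as 0:2-ARG0, with the predicate carrying the label rel. The sample line is illustrative, pointer locations are left as raw strings rather than pointer objects, and Instance and parse_instance_line are hypothetical names.

```python
from typing import NamedTuple


class Instance(NamedTuple):
    fileid: str
    sentnum: int
    wordnum: int
    tagger: str
    roleset: str
    inflection: str
    predicate: str    # raw pointer string, e.g. "8:0"
    arguments: tuple  # ((pointer string, argument id), ...), predicate excluded


def parse_instance_line(line: str, fileid_xform=None) -> Instance:
    pieces = line.split()
    if len(pieces) < 7:
        raise ValueError(f"badly formatted propbank line: {line!r}")
    fileid, sentnum, wordnum, tagger, roleset, inflection = pieces[:6]
    if fileid_xform is not None:
        fileid = fileid_xform(fileid)
    rel = [p for p in pieces[6:] if p.endswith("-rel")]
    args = [p for p in pieces[6:] if not p.endswith("-rel")]
    if len(rel) != 1:
        raise ValueError(f"expected exactly one -rel token: {line!r}")
    predicate = rel[0][: -len("-rel")]
    # Split on the first hyphen only, so labels like ARGM-TMP stay whole.
    arguments = tuple(tuple(arg.split("-", 1)) for arg in args)
    return Instance(fileid, int(sentnum), int(wordnum), tagger, roleset,
                    inflection, predicate, arguments)


line = ("wsj/00/wsj_0001.mrg 0 8 gold join.01 vf--a "
        "0:2-ARG0 7:0-ARGM-MOD 8:0-rel 9:1-ARG1 11:1-ARGM-PRD 15:1-ARGM-TMP")
inst = parse_instance_line(line)
print(inst.roleset, inst.predicate)  # join.01 8:0
print(inst.arguments[1])             # ('7:0', 'ARGM-MOD')
```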

parse_corpus: A corpus reader for the parse trees corresponding to the instances in this propbank corpus.

predicate: A PropbankTreePointer indicating the position of this instance’s predicate within its containing sentence.

property predid: Identifier of the predicate.

roleset: The name of the roleset used by this instance’s predicate. Use PropbankCorpusReader.roleset() to look up information about the roleset.

property sensenumber: The sense number of the predicate.
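Both baseform and sensenumber appear to be recoverable from the roleset name. A minimal sketch, assuming roleset names of the form baseform.NN (such as 'join.01'); split_roleset is a hypothetical helper, and the sense number is kept as a string so a leading zero survives. Whether the real properties return strings or integers is not stated here.

```python
def split_roleset(roleset: str) -> tuple:
    """Split a roleset name like 'join.01' into (baseform, sense number)."""
    baseform, _, sense = roleset.partition(".")
    if not baseform or not sense:
        raise ValueError(f"expected 'baseform.NN', got {roleset!r}")
    return baseform, sense


print(split_roleset("join.01"))     # ('join', '01')
print(split_roleset("turn_on.01"))  # ('turn_on', '01')
```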

sentnum: The number of this instance’s sentence within fileid. Indexing starts from zero.

tagger: An identifier for the tagger who tagged this instance, or 'gold' if this is an adjudicated instance.

property tree: The parse tree corresponding to this instance, or None if the corresponding tree is not available.

wordnum: The word number of this instance’s predicate within its containing sentence. Word numbers are indexed starting from zero, and include traces and other empty parse elements.

class nltk.corpus.reader.propbank.PropbankPointer

Bases: object

A pointer used by propbank to identify one or more constituents in a parse tree. PropbankPointer is an abstract base class with three concrete subclasses:

PropbankTreePointer is used to point to single constituents.

PropbankSplitTreePointer is used to point to ‘split’ constituents, which consist of a sequence of two or more PropbankTreePointer pointers.

PropbankChainTreePointer is used to point to entire trace chains in a tree. It consists of a sequence of pieces, which can be PropbankTreePointer or PropbankSplitTreePointer pointers.
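Because chain pieces may be splits while split parts are always single pointers, a pointer string can be parsed by splitting on chain separators first. The sketch assumes the syntax wordnum:height for a single constituent, commas between the parts of a split, and asterisks between the pieces of a chain. TreePointer, SplitPointer, ChainPointer, and parse_pointer are hypothetical stand-ins for the three pointer classes, not their actual API.

```python
from typing import NamedTuple, Union


class TreePointer(NamedTuple):
    wordnum: int
    height: int


class SplitPointer(NamedTuple):
    pieces: tuple  # TreePointer items only


class ChainPointer(NamedTuple):
    pieces: tuple  # TreePointer or SplitPointer items


Pointer = Union[TreePointer, SplitPointer, ChainPointer]


def parse_pointer(s: str) -> Pointer:
    # Chains bind loosest ("a*b*c"); each piece may itself be a split.
    if "*" in s:
        return ChainPointer(tuple(parse_pointer(p) for p in s.split("*")))
    # Splits ("a,b,c") contain only single tree pointers at this point.
    if "," in s:
        return SplitPointer(tuple(parse_pointer(p) for p in s.split(",")))
    wordnum, sep, height = s.partition(":")
    if not sep:
        raise ValueError(f"bad pointer {s!r}")
    return TreePointer(int(wordnum), int(height))


print(parse_pointer("8:0"))  # TreePointer(wordnum=8, height=0)
chain = parse_pointer("0:1,2:0*3:1")
print(type(chain).__name__, type(chain.pieces[0]).__name__)  # ChainPointer SplitPointer
```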

__init__()

class nltk.corpus.reader.propbank.PropbankSplitTreePointer

Bases: PropbankPointer

__init__(pieces)

pieces: A list of the pieces that make up this split constituent. Elements are all PropbankTreePointer pointers.

select(tree)

class nltk.corpus.reader.propbank.PropbankTreePointer

Bases: PropbankPointer

Pointer strings take the form wordnum:height for a single constituent. A split constituent joins several such pointers with commas (wordnum:height,wordnum:height,…), and a trace chain joins its pieces with asterisks (wordnum:height*wordnum:height*…).

__init__(wordnum, height)

static parse(s)

select(tree)

treepos(tree): Convert this pointer to a standard ‘tree position’ pointer, given that it points to the given tree.
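The wordnum/height to tree-position conversion can be sketched as: find the path to the wordnum-th leaf, then drop height + 1 steps. This assumes that height 0 selects the word's part-of-speech node and each extra unit moves one level up, and that word numbers count every leaf left to right from zero, traces included. The (label, [children]) tuple representation and the helpers leaf_position, treepos, and select are hypothetical, not nltk.Tree.

```python
from typing import Union

# Hypothetical representation: an internal node is (label, [children]); a leaf is a str.
Node = Union[str, tuple]


def leaf_position(tree: Node, wordnum: int) -> tuple:
    """Child-index path to the wordnum-th leaf (traces count as leaves)."""
    count = 0

    def walk(node: Node, path: tuple):
        nonlocal count
        if isinstance(node, str):  # a word or trace leaf
            if count == wordnum:
                return path
            count += 1
            return None
        _label, children = node
        for i, child in enumerate(children):
            found = walk(child, path + (i,))
            if found is not None:
                return found
        return None

    found = walk(tree, ())
    if found is None:
        raise IndexError(f"sentence has no word number {wordnum}")
    return found


def treepos(tree: Node, wordnum: int, height: int) -> tuple:
    """Drop height + 1 steps from the leaf path; height 0 is the POS node."""
    path = leaf_position(tree, wordnum)
    if height + 1 > len(path):
        raise ValueError("height exceeds the depth of this word")
    return path[: len(path) - height - 1]


def select(tree: Node, pos: tuple) -> Node:
    """Follow a tree position from the root."""
    for i in pos:
        tree = tree[1][i]
    return tree


# (S (NP-SBJ (NNP Pierre)) (VP (MD will) (VP (VB join) (NP (DT the) (NN board)))))
tree = ("S", [("NP-SBJ", [("NNP", ["Pierre"])]),
              ("VP", [("MD", ["will"]),
                      ("VP", [("VB", ["join"]),
                              ("NP", [("DT", ["the"]), ("NN", ["board"])])])])])

print(select(tree, treepos(tree, 2, 0))[0])  # VB
print(select(tree, treepos(tree, 3, 1))[0])  # NP
print(treepos(tree, 0, 2))                   # ()
```

With nltk.Tree, Tree.leaf_treeposition(i) gives the leaf path directly, so presumably the same truncation by height + 1 would apply to it.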
