nltk.corpus.reader.propbank module

class nltk.corpus.reader.propbank.PropbankCorpusReader[source]

Bases: nltk.corpus.reader.api.CorpusReader

Corpus reader for the propbank corpus, which augments the Penn Treebank with information about the predicate argument structure of every verb instance. The corpus consists of two parts: the predicate-argument annotations themselves, and a set of “frameset files” which define the argument labels used by the annotations, on a per-verb basis. Each “frameset file” contains one or more predicates, such as 'turn' or 'turn_on', each of which is divided into coarse-grained word senses called “rolesets”. For each “roleset”, the frameset file provides descriptions of the argument roles, along with examples.

__init__(root, propfile, framefiles='', verbsfile=None, parse_fileid_xform=None, parse_corpus=None, encoding='utf8')[source]
Parameters
  • root – The root directory for this corpus.

  • propfile – The name of the file containing the predicate-argument annotations (relative to root).

  • framefiles – A list or regexp specifying the frameset fileids for this corpus.

  • parse_fileid_xform – A transform that should be applied to the fileids in this corpus. This should be a function of one argument (a fileid) that returns a string (the new fileid).

  • parse_corpus – The corpus containing the parse trees corresponding to this corpus. These parse trees are necessary to resolve the tree pointers used by propbank.

instances(baseform=None)[source]
Returns

a corpus view that acts as a list of PropbankInstance objects, one for each verb instance in the corpus.

lines()[source]
Returns

a corpus view that acts as a list of strings, one for each line in the predicate-argument annotation file.

roleset(roleset_id)[source]
Returns

the XML description for the given roleset.

rolesets(baseform=None)[source]
Returns

a list of XML descriptions for rolesets.

verbs()[source]
Returns

a corpus view that acts as a list of all verb lemmas in this corpus (from the verbs.txt file).
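
The corpus views returned by the methods above behave like lists, so ordinary Python tools can be applied to them. The following is a minimal sketch, not part of the module: `roleset_counts` is a hypothetical helper, and the stand-in objects carry made-up roleset names so that the code runs without the corpus data. It relies only on the `roleset` attribute of each instance.

```python
from collections import Counter
from types import SimpleNamespace


def roleset_counts(instances):
    """Count how often each roleset name occurs in an iterable of instances.

    Only the ``roleset`` attribute of each instance is read.
    """
    return Counter(inst.roleset for inst in instances)


# Stand-in objects (hypothetical data) so the sketch runs without the corpus.
# With NLTK and the PropBank data installed, one would instead pass
# nltk.corpus.propbank.instances().
sample = [
    SimpleNamespace(roleset="rise.01"),
    SimpleNamespace(roleset="join.01"),
    SimpleNamespace(roleset="rise.01"),
]
counts = roleset_counts(sample)
print(counts.most_common(1))
```

Because the helper accepts any iterable, it can be tried on a slice of the corpus view before being run over every instance.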

class nltk.corpus.reader.propbank.PropbankInstance[source]

Bases: object

__init__(fileid, sentnum, wordnum, tagger, roleset, inflection, predicate, arguments, parse_corpus=None)[source]
fileid

The name of the file containing the parse tree for this instance’s sentence.

sentnum

The sentence number of this sentence within fileid. Indexing starts from zero.

wordnum

The word number of this instance’s predicate within its containing sentence. Word numbers are indexed starting from zero, and include traces and other empty parse elements.

tagger

An identifier for the tagger who tagged this instance; or 'gold' if this is an adjudicated instance.

roleset

The name of the roleset used by this instance’s predicate. Use PropbankCorpusReader.roleset() to look up information about the roleset.

inflection

A PropbankInflection object describing the inflection of this instance’s predicate.

predicate

A PropbankTreePointer indicating the position of this instance’s predicate within its containing sentence.

arguments

A list of tuples (argloc, argid), specifying the location and identifier for each of the predicate’s arguments in the containing sentence. Argument identifiers are strings such as 'ARG0' or 'ARGM-TMP'. This list does not contain the predicate.

parse_corpus

A corpus reader for the parse trees corresponding to the instances in this propbank corpus.

property baseform

The baseform of the predicate.

property sensenumber

The sense number of the predicate.

property predid

Identifier of the predicate.

property tree

The parse tree corresponding to this instance, or None if the corresponding tree is not available.
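
The baseform and sensenumber properties are both derived from the roleset name. The sketch below illustrates one way to do that split, assuming roleset names take the form `baseform.sensenumber` (for example `'rise.01'`). `split_roleset` is a hypothetical helper, not a function of this module.

```python
def split_roleset(roleset):
    """Split a roleset name such as 'rise.01' into (baseform, sensenumber).

    Assumes the two parts are separated by the first '.' in the name.
    The sense number is kept as a string so leading zeros are preserved.
    """
    baseform, sep, sensenumber = roleset.partition(".")
    if not sep or not baseform or not sensenumber:
        raise ValueError(f"expected 'baseform.sensenumber', got {roleset!r}")
    return baseform, sensenumber


baseform, sensenumber = split_roleset("rise.01")
print(baseform, sensenumber)
```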

static parse(s, parse_fileid_xform=None, parse_corpus=None)[source]
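
To make the role of parse() more concrete, here is a simplified sketch of splitting one annotation line into the fields named in the constructor. It is not the library's implementation. The line layout (six leading fields followed by `pointer-label` pairs, with the predicate labelled `rel`) and the example line are assumptions for illustration, and `split_annotation_line` is a hypothetical name. Pointers are left as plain strings.

```python
def split_annotation_line(line):
    """Split one whitespace-separated annotation line into named fields.

    Assumed layout (for illustration only):
        fileid sentnum wordnum tagger roleset inflection pointer-label ...
    The predicate is taken to be the field whose label is 'rel'.
    """
    pieces = line.split()
    if len(pieces) < 7:
        raise ValueError(f"too few fields: {line!r}")
    fileid, sentnum, wordnum, tagger, roleset, inflection = pieces[:6]

    predicate = None
    arguments = []
    for piece in pieces[6:]:
        # Split on the first hyphen only, so labels like 'ARGM-TMP' survive.
        argloc, argid = piece.split("-", 1)
        if argid == "rel":
            predicate = argloc
        else:
            arguments.append((argloc, argid))
    if predicate is None:
        raise ValueError(f"no predicate ('rel') field: {line!r}")

    return {
        "fileid": fileid,
        "sentnum": int(sentnum),
        "wordnum": int(wordnum),
        "tagger": tagger,
        "roleset": roleset,
        "inflection": inflection,
        "predicate": predicate,
        "arguments": arguments,
    }


# Made-up example line, not taken from the corpus.
example = "wsj_0001.mrg 0 8 gold join.01 vf--a 0:2-ARG0 8:0-rel 9:1-ARG1 15:1-ARGM-TMP"
parsed = split_annotation_line(example)
print(parsed["roleset"], parsed["predicate"], parsed["arguments"])
```
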
class nltk.corpus.reader.propbank.PropbankPointer[source]

Bases: object

A pointer used by propbank to identify one or more constituents in a parse tree. PropbankPointer is an abstract base class with three concrete subclasses:

  • PropbankTreePointer is used to point to single constituents.

  • PropbankSplitTreePointer is used to point to ‘split’ constituents, which consist of a sequence of two or more PropbankTreePointer pointers.

  • PropbankChainTreePointer is used to point to entire trace chains in a tree. It consists of a sequence of pieces, which can be PropbankTreePointer or PropbankSplitTreePointer pointers.

__init__()[source]
class nltk.corpus.reader.propbank.PropbankChainTreePointer[source]

Bases: nltk.corpus.reader.propbank.PropbankPointer

__init__(pieces)[source]
pieces

A list of the pieces that make up this chain. Elements may be either PropbankSplitTreePointer or PropbankTreePointer pointers.

select(tree)[source]
class nltk.corpus.reader.propbank.PropbankSplitTreePointer[source]

Bases: nltk.corpus.reader.propbank.PropbankPointer

__init__(pieces)[source]
pieces

A list of the pieces that make up this split constituent. Elements are all PropbankTreePointer pointers.

select(tree)[source]
class nltk.corpus.reader.propbank.PropbankTreePointer[source]

Bases: nltk.corpus.reader.propbank.PropbankPointer

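A pointer to a single constituent, written as wordnum:height. In pointer strings, several such pieces may be joined with * or with a comma: wordnum:height*wordnum:height*… or wordnum:height,wordnum:height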

__init__(wordnum, height)[source]
static parse(s)[source]
select(tree)[source]
treepos(tree)[source]

Convert this pointer to a standard ‘tree position’ pointer, given that it points to the given tree.
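
The three pointer kinds can be told apart from the pointer string alone. The sketch below is a simplified stand-in for parse(), not the library's code: it assumes that `*` joins the pieces of a chain, that `,` joins the pieces of a split constituent, and that a single constituent is written `wordnum:height`. `parse_pointer` and the tagged tuples it returns are hypothetical.

```python
def parse_pointer(s):
    """Parse a pointer string into a nested, tagged tuple.

    Returns one of:
        ("chain", [piece, ...])       for 'a*b*c'
        ("split", [piece, ...])       for 'a,b'
        ("tree", wordnum, height)     for 'wordnum:height'
    """
    # Chains are checked first, so a chain piece may itself be a split.
    pieces = s.split("*")
    if len(pieces) > 1:
        return ("chain", [parse_pointer(p) for p in pieces])

    # Split constituents: a sequence of single-constituent pointers.
    pieces = s.split(",")
    if len(pieces) > 1:
        return ("split", [parse_pointer(p) for p in pieces])

    # A single constituent.
    wordnum, sep, height = s.partition(":")
    if not sep:
        raise ValueError(f"bad pointer {s!r}")
    return ("tree", int(wordnum), int(height))


single = parse_pointer("8:0")
chain = parse_pointer("0:1*2:0,4:0")
print(single)
print(chain)
```

Checking for `*` before `,` mirrors the description of PropbankChainTreePointer, whose pieces may be either single or split pointers, while the pieces of a split pointer are always single pointers.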

class nltk.corpus.reader.propbank.PropbankInflection[source]

Bases: object

INFINITIVE = 'i'
GERUND = 'g'
PARTICIPLE = 'p'
FINITE = 'v'
FUTURE = 'f'
PAST = 'p'
PRESENT = 'n'
PERFECT = 'p'
PROGRESSIVE = 'o'
PERFECT_AND_PROGRESSIVE = 'b'
THIRD_PERSON = '3'
ACTIVE = 'a'
PASSIVE = 'p'
NONE = '-'
__init__(form='-', tense='-', aspect='-', person='-', voice='-')[source]
static parse(s)[source]
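
Several of the constants above share the code 'p', so a code is only meaningful together with the field it belongs to. The sketch below decodes an inflection string under the assumption that it has exactly five characters, in the order of the constructor arguments (form, tense, aspect, person, voice), with '-' meaning NONE. The grouping of constants into fields, the name `decode_inflection`, and the `FIELDS` table are assumptions for illustration, not part of the module.

```python
# Assumed grouping of the documented constants by field, in constructor order.
FIELDS = (
    ("form", {"i": "INFINITIVE", "g": "GERUND", "p": "PARTICIPLE", "v": "FINITE"}),
    ("tense", {"f": "FUTURE", "p": "PAST", "n": "PRESENT"}),
    ("aspect", {"p": "PERFECT", "o": "PROGRESSIVE", "b": "PERFECT_AND_PROGRESSIVE"}),
    ("person", {"3": "THIRD_PERSON"}),
    ("voice", {"a": "ACTIVE", "p": "PASSIVE"}),
)


def decode_inflection(s):
    """Map a five-character inflection string to {field: constant name}."""
    if len(s) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} characters, got {s!r}")
    decoded = {}
    for ch, (name, codes) in zip(s, FIELDS):
        if ch == "-":
            decoded[name] = "NONE"
        elif ch in codes:
            decoded[name] = codes[ch]
        else:
            raise ValueError(f"bad {name} code {ch!r} in {s!r}")
    return decoded


decoded = decode_inflection("vp--a")
print(decoded)
```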