- class nltk.parse.viterbi.ViterbiParser¶
PCFGparser that uses dynamic programming to find the single most likely parse for a text. The
ViterbiParserparser parses texts by filling in a “most likely constituent table”. This table records the most probable tree representation for any given span and node value. In particular, it has an entry for every start index, end index, and node value, recording the most likely subtree that spans from the start index to the end index, and has the given node value.
ViterbiParserparser fills in this table incrementally. It starts by filling in all entries for constituents that span one element of text (i.e., entries where the end index is one greater than the start index). After it has filled in all table entries for constituents that span one element of text, it fills in the entries for constitutants that span two elements of text. It continues filling in the entries for constituents spanning larger and larger portions of the text, until the entire table has been filled. Finally, it returns the table entry for a constituent spanning the entire text, whose node value is the grammar’s start symbol.
In order to find the most likely constituent with a given span and node value, the
ViterbiParserparser considers all productions that could produce that node value. For each production, it finds all children that collectively cover the span and have the node values specified by the production’s right hand side. If the probability of the tree formed by applying the production to the children is greater than the probability of the current entry in the table, then the table is updated with this new tree.
A pseudo-code description of the algorithm used by
ViterbiParseris:Create an empty most likely constituent table, MLC.For width in 1…len(text):For start in 1…len(text)-width:For prod in grammar.productions:For each sequence of subtrees [t, t, …, t[n]] in MLC,where t[i].label()==prod.rhs[i],and the sequence covers [start:start+width]:old_p = MLC[start, start+width, prod.lhs]new_p = P(t)P(t)…P(t[n])P(prod)if new_p > old_p:new_tree = Tree(prod.lhs, t, t, …, t[n])MLC[start, start+width, prod.lhs] = new_treeReturn MLC[0, len(text), start_symbol]
_grammar – The grammar used to parse sentences.
_trace – The level of tracing output that should be generated when parsing a text.
- __init__(grammar, trace=0)¶
Create a new
ViterbiParserparser, that uses
grammarto parse texts.
grammar (PCFG) – The grammar used to parse texts.
trace (int) – The level of tracing that should be used when parsing a text.
0will generate no tracing output; and higher numbers will produce more verbose tracing output.
The grammar used by this parser.
Set the level of tracing output that should be generated when parsing a text.
trace (int) – The trace level. A trace level of
0will generate no tracing output; and higher trace levels will produce more verbose tracing output.
- Return type
A demonstration of the probabilistic parsers. The user is prompted to select which demo to run, and how many parses should be found; and then each parser is run on the same demo, and a summary of the results are displayed.