nltk.parse.chart module

Data classes and parser implementations for “chart parsers”, which use dynamic programming to efficiently parse a text. A chart parser derives parse trees for a text by iteratively adding “edges” to a “chart.” Each edge represents a hypothesis about the tree structure for a subsequence of the text. The chart is a “blackboard” for composing and combining these hypotheses.

When a chart parser begins parsing a text, it creates a new (empty) chart, spanning the text. It then incrementally adds new edges to the chart. A set of “chart rules” specifies the conditions under which new edges should be added to the chart. Once the chart reaches a stage where none of the chart rules adds any new edges, parsing is complete.

Charts are encoded with the Chart class, and edges are encoded with the TreeEdge and LeafEdge classes. The chart parser module defines three chart parsers:

  • ChartParser is a simple and flexible chart parser. Given a set of chart rules, it will apply those rules to the chart until no more edges are added.

  • SteppingChartParser is a subclass of ChartParser that can be used to step through the parsing process.

class nltk.parse.chart.AbstractChartRule[source]

Bases: ChartRuleI

An abstract base class for chart rules. AbstractChartRule provides:

  • A default implementation for apply.

  • A default implementation for apply_everywhere, (Currently, this implementation assumes that NUM_EDGES <= 3.)

  • A default implementation for __str__, which returns a name based on the rule’s class name.

apply(chart, grammar, *edges)[source]

Return a generator that will add edges licensed by this rule and the given edges to the chart, one at a time. Each time the generator is resumed, it will either add a new edge and yield that edge; or return.

Parameters:

edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to apply() is specified by the NUM_EDGES class variable.

Return type:

iter(EdgeI)

apply_everywhere(chart, grammar)[source]

Return a generator that will add all edges licensed by this rule, given the edges that are currently in the chart, one at a time. Each time the generator is resumed, it will either add a new edge and yield that edge; or return.

Return type:

iter(EdgeI)

class nltk.parse.chart.BottomUpChartParser[source]

Bases: ChartParser

A ChartParser using a bottom-up parsing strategy. See ChartParser for more information.

__init__(grammar, **parser_args)[source]

Create a new chart parser, that uses grammar to parse texts.

Parameters:
  • grammar (CFG) – The grammar used to parse texts.

  • strategy (list(ChartRuleI)) – A list of rules that should be used to decide what edges to add to the chart (top-down strategy by default).

  • trace (int) – The level of tracing that should be used when parsing a text. 0 will generate no tracing output; and higher numbers will produce more verbose tracing output.

  • trace_chart_width (int) – The default total width reserved for the chart in trace output. The remainder of each line will be used to display edges.

  • use_agenda (bool) – Use an optimized agenda-based algorithm, if possible.

  • chart_class – The class that should be used to create the parse charts.

class nltk.parse.chart.BottomUpLeftCornerChartParser[source]

Bases: ChartParser

A ChartParser using a bottom-up left-corner parsing strategy. This strategy is often more efficient than standard bottom-up. See ChartParser for more information.

__init__(grammar, **parser_args)[source]

Create a new chart parser, that uses grammar to parse texts.

Parameters:
  • grammar (CFG) – The grammar used to parse texts.

  • strategy (list(ChartRuleI)) – A list of rules that should be used to decide what edges to add to the chart (top-down strategy by default).

  • trace (int) – The level of tracing that should be used when parsing a text. 0 will generate no tracing output; and higher numbers will produce more verbose tracing output.

  • trace_chart_width (int) – The default total width reserved for the chart in trace output. The remainder of each line will be used to display edges.

  • use_agenda (bool) – Use an optimized agenda-based algorithm, if possible.

  • chart_class – The class that should be used to create the parse charts.

class nltk.parse.chart.BottomUpPredictCombineRule[source]

Bases: BottomUpPredictRule

A rule licensing any edge corresponding to a production whose right-hand side begins with a complete edge’s left-hand side. In particular, this rule specifies that [A -> alpha \*] licenses the edge [B -> A \* beta] for each grammar production B -> A beta.

Note:

This is like BottomUpPredictRule, but it also applies the FundamentalRule to the resulting edge.

NUM_EDGES = 1
apply(chart, grammar, edge)[source]

Return a generator that will add edges licensed by this rule and the given edges to the chart, one at a time. Each time the generator is resumed, it will either add a new edge and yield that edge; or return.

Parameters:

edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to apply() is specified by the NUM_EDGES class variable.

Return type:

iter(EdgeI)

class nltk.parse.chart.BottomUpPredictRule[source]

Bases: AbstractChartRule

A rule licensing any edge corresponding to a production whose right-hand side begins with a complete edge’s left-hand side. In particular, this rule specifies that [A -> alpha \*] licenses the edge [B -> \* A beta] for each grammar production B -> A beta.

NUM_EDGES = 1
apply(chart, grammar, edge)[source]

Return a generator that will add edges licensed by this rule and the given edges to the chart, one at a time. Each time the generator is resumed, it will either add a new edge and yield that edge; or return.

Parameters:

edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to apply() is specified by the NUM_EDGES class variable.

Return type:

iter(EdgeI)

class nltk.parse.chart.CachedTopDownPredictRule[source]

Bases: TopDownPredictRule

A cached version of TopDownPredictRule. After the first time this rule is applied to an edge with a given end and next, it will not generate any more edges for edges with that end and next.

If chart or grammar are changed, then the cache is flushed.

__init__()[source]
apply(chart, grammar, edge)[source]

Return a generator that will add edges licensed by this rule and the given edges to the chart, one at a time. Each time the generator is resumed, it will either add a new edge and yield that edge; or return.

Parameters:

edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to apply() is specified by the NUM_EDGES class variable.

Return type:

iter(EdgeI)

class nltk.parse.chart.Chart[source]

Bases: object

A blackboard for hypotheses about the syntactic constituents of a sentence. A chart contains a set of edges, and each edge encodes a single hypothesis about the structure of some portion of the sentence.

The select method can be used to select a specific collection of edges. For example chart.select(is_complete=True, start=0) yields all complete edges whose start indices are 0. To ensure the efficiency of these selection operations, Chart dynamically creates and maintains an index for each set of attributes that have been selected on.

In order to reconstruct the trees that are represented by an edge, the chart associates each edge with a set of child pointer lists. A child pointer list is a list of the edges that license an edge’s right-hand side.

Variables:
  • _tokens – The sentence that the chart covers.

  • _num_leaves – The number of tokens.

  • _edges – A list of the edges in the chart

  • _edge_to_cpls – A dictionary mapping each edge to a set of child pointer lists that are associated with that edge.

  • _indexes – A dictionary mapping tuples of edge attributes to indices, where each index maps the corresponding edge attribute values to lists of edges.

__init__(tokens)[source]

Construct a new chart. The chart is initialized with the leaf edges corresponding to the terminal leaves.

Parameters:

tokens (list) – The sentence that this chart will be used to parse.

child_pointer_lists(edge)[source]

Return the set of child pointer lists for the given edge. Each child pointer list is a list of edges that have been used to form this edge.

Return type:

list(list(EdgeI))

dot_digraph()[source]
edges()[source]

Return a list of all edges in this chart. New edges that are added to the chart after the call to edges() will not be contained in this list.

Return type:

list(EdgeI)

See:

iteredges, select

initialize()[source]

Clear the chart.

insert(edge, *child_pointer_lists)[source]

Add a new edge to the chart, and return True if this operation modified the chart. In particular, return true iff the chart did not already contain edge, or if it did not already associate child_pointer_lists with edge.

Parameters:
  • edge (EdgeI) – The new edge

  • child_pointer_lists (sequence of tuple(EdgeI)) – A sequence of lists of the edges that were used to form this edge. This list is used to reconstruct the trees (or partial trees) that are associated with edge.

Return type:

bool

insert_with_backpointer(new_edge, previous_edge, child_edge)[source]

Add a new edge to the chart, using a pointer to the previous edge.

iteredges()[source]

Return an iterator over the edges in this chart. It is not guaranteed that new edges which are added to the chart before the iterator is exhausted will also be generated.

Return type:

iter(EdgeI)

See:

edges, select

leaf(index)[source]

Return the leaf value of the word at the given index.

Return type:

str

leaves()[source]

Return a list of the leaf values of each word in the chart’s sentence.

Return type:

list(str)

num_edges()[source]

Return the number of edges contained in this chart.

Return type:

int

num_leaves()[source]

Return the number of words in this chart’s sentence.

Return type:

int

parses(root, tree_class=<class 'nltk.tree.tree.Tree'>)[source]

Return an iterator of the complete tree structures that span the entire chart, and whose root node is root.

pretty_format(width=None)[source]

Return a pretty-printed string representation of this chart.

Parameters:

width – The number of characters allotted to each index in the sentence.

Return type:

str

pretty_format_edge(edge, width=None)[source]

Return a pretty-printed string representation of a given edge in this chart.

Return type:

str

Parameters:

width – The number of characters allotted to each index in the sentence.

pretty_format_leaves(width=None)[source]

Return a pretty-printed string representation of this chart’s leaves. This string can be used as a header for calls to pretty_format_edge.

select(**restrictions)[source]

Return an iterator over the edges in this chart. Any new edges that are added to the chart before the iterator is exahusted will also be generated. restrictions can be used to restrict the set of edges that will be generated.

Parameters:
  • span – Only generate edges e where e.span()==span

  • start – Only generate edges e where e.start()==start

  • end – Only generate edges e where e.end()==end

  • length – Only generate edges e where e.length()==length

  • lhs – Only generate edges e where e.lhs()==lhs

  • rhs – Only generate edges e where e.rhs()==rhs

  • nextsym – Only generate edges e where e.nextsym()==nextsym

  • dot – Only generate edges e where e.dot()==dot

  • is_complete – Only generate edges e where e.is_complete()==is_complete

  • is_incomplete – Only generate edges e where e.is_incomplete()==is_incomplete

Return type:

iter(EdgeI)

trees(edge, tree_class=<class 'nltk.tree.tree.Tree'>, complete=False)[source]

Return an iterator of the tree structures that are associated with edge.

If edge is incomplete, then the unexpanded children will be encoded as childless subtrees, whose node value is the corresponding terminal or nonterminal.

Return type:

list(Tree)

Note:

If two trees share a common subtree, then the same Tree may be used to encode that subtree in both trees. If you need to eliminate this subtree sharing, then create a deep copy of each tree.

class nltk.parse.chart.ChartParser[source]

Bases: ParserI

A generic chart parser. A “strategy”, or list of ChartRuleI instances, is used to decide what edges to add to the chart. In particular, ChartParser uses the following algorithm to parse texts:

Until no new edges are added:
For each rule in strategy:
Apply rule to any applicable edges in the chart.
Return any complete parses in the chart
__init__(grammar, strategy=[<nltk.parse.chart.LeafInitRule object>, <nltk.parse.chart.EmptyPredictRule object>, <nltk.parse.chart.BottomUpPredictCombineRule object>, <nltk.parse.chart.SingleEdgeFundamentalRule object>], trace=0, trace_chart_width=50, use_agenda=True, chart_class=<class 'nltk.parse.chart.Chart'>)[source]

Create a new chart parser, that uses grammar to parse texts.

Parameters:
  • grammar (CFG) – The grammar used to parse texts.

  • strategy (list(ChartRuleI)) – A list of rules that should be used to decide what edges to add to the chart (top-down strategy by default).

  • trace (int) – The level of tracing that should be used when parsing a text. 0 will generate no tracing output; and higher numbers will produce more verbose tracing output.

  • trace_chart_width (int) – The default total width reserved for the chart in trace output. The remainder of each line will be used to display edges.

  • use_agenda (bool) – Use an optimized agenda-based algorithm, if possible.

  • chart_class – The class that should be used to create the parse charts.

chart_parse(tokens, trace=None)[source]

Return the final parse Chart from which all possible parse trees can be extracted.

Parameters:

tokens (list(str)) – The sentence to be parsed

Return type:

Chart

grammar()[source]
Returns:

The grammar used by this parser.

parse(tokens, tree_class=<class 'nltk.tree.tree.Tree'>)[source]
Returns:

An iterator that generates parse trees for the sentence. When possible this list is sorted from most likely to least likely.

Parameters:

sent (list(str)) – The sentence to be parsed

Return type:

iter(Tree)

class nltk.parse.chart.ChartRuleI[source]

Bases: object

A rule that specifies what new edges are licensed by any given set of existing edges. Each chart rule expects a fixed number of edges, as indicated by the class variable NUM_EDGES. In particular:

  • A chart rule with NUM_EDGES=0 specifies what new edges are licensed, regardless of existing edges.

  • A chart rule with NUM_EDGES=1 specifies what new edges are licensed by a single existing edge.

  • A chart rule with NUM_EDGES=2 specifies what new edges are licensed by a pair of existing edges.

Variables:

NUM_EDGES – The number of existing edges that this rule uses to license new edges. Typically, this number ranges from zero to two.

apply(chart, grammar, *edges)[source]

Return a generator that will add edges licensed by this rule and the given edges to the chart, one at a time. Each time the generator is resumed, it will either add a new edge and yield that edge; or return.

Parameters:

edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to apply() is specified by the NUM_EDGES class variable.

Return type:

iter(EdgeI)

apply_everywhere(chart, grammar)[source]

Return a generator that will add all edges licensed by this rule, given the edges that are currently in the chart, one at a time. Each time the generator is resumed, it will either add a new edge and yield that edge; or return.

Return type:

iter(EdgeI)

class nltk.parse.chart.EdgeI[source]

Bases: object

A hypothesis about the structure of part of a sentence. Each edge records the fact that a structure is (partially) consistent with the sentence. An edge contains:

  • A span, indicating what part of the sentence is consistent with the hypothesized structure.

  • A left-hand side, specifying what kind of structure is hypothesized.

  • A right-hand side, specifying the contents of the hypothesized structure.

  • A dot position, indicating how much of the hypothesized structure is consistent with the sentence.

Every edge is either complete or incomplete:

  • An edge is complete if its structure is fully consistent with the sentence.

  • An edge is incomplete if its structure is partially consistent with the sentence. For every incomplete edge, the span specifies a possible prefix for the edge’s structure.

There are two kinds of edge:

  • A TreeEdge records which trees have been found to be (partially) consistent with the text.

  • A LeafEdge records the tokens occurring in the text.

The EdgeI interface provides a common interface to both types of edge, allowing chart parsers to treat them in a uniform manner.

__init__()[source]
dot()[source]

Return this edge’s dot position, which indicates how much of the hypothesized structure is consistent with the sentence. In particular, self.rhs[:dot] is consistent with tokens[self.start():self.end()].

Return type:

int

end()[source]

Return the end index of this edge’s span.

Return type:

int

is_complete()[source]

Return True if this edge’s structure is fully consistent with the text.

Return type:

bool

is_incomplete()[source]

Return True if this edge’s structure is partially consistent with the text.

Return type:

bool

length()[source]

Return the length of this edge’s span.

Return type:

int

lhs()[source]

Return this edge’s left-hand side, which specifies what kind of structure is hypothesized by this edge.

See:

TreeEdge and LeafEdge for a description of the left-hand side values for each edge type.

nextsym()[source]

Return the element of this edge’s right-hand side that immediately follows its dot.

Return type:

Nonterminal or terminal or None

rhs()[source]

Return this edge’s right-hand side, which specifies the content of the structure hypothesized by this edge.

See:

TreeEdge and LeafEdge for a description of the right-hand side values for each edge type.

span()[source]

Return a tuple (s, e), where tokens[s:e] is the portion of the sentence that is consistent with this edge’s structure.

Return type:

tuple(int, int)

start()[source]

Return the start index of this edge’s span.

Return type:

int

class nltk.parse.chart.EmptyPredictRule[source]

Bases: AbstractChartRule

A rule that inserts all empty productions as passive edges, in every position in the chart.

NUM_EDGES = 0
apply(chart, grammar)[source]

Return a generator that will add edges licensed by this rule and the given edges to the chart, one at a time. Each time the generator is resumed, it will either add a new edge and yield that edge; or return.

Parameters:

edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to apply() is specified by the NUM_EDGES class variable.

Return type:

iter(EdgeI)

class nltk.parse.chart.FilteredBottomUpPredictCombineRule[source]

Bases: BottomUpPredictCombineRule

apply(chart, grammar, edge)[source]

Return a generator that will add edges licensed by this rule and the given edges to the chart, one at a time. Each time the generator is resumed, it will either add a new edge and yield that edge; or return.

Parameters:

edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to apply() is specified by the NUM_EDGES class variable.

Return type:

iter(EdgeI)

class nltk.parse.chart.FilteredSingleEdgeFundamentalRule[source]

Bases: SingleEdgeFundamentalRule

class nltk.parse.chart.FundamentalRule[source]

Bases: AbstractChartRule

A rule that joins two adjacent edges to form a single combined edge. In particular, this rule specifies that any pair of edges

  • [A -> alpha \* B beta][i:j]

  • [B -> gamma \*][j:k]

licenses the edge:

  • [A -> alpha B * beta][i:j]

NUM_EDGES = 2
apply(chart, grammar, left_edge, right_edge)[source]

Return a generator that will add edges licensed by this rule and the given edges to the chart, one at a time. Each time the generator is resumed, it will either add a new edge and yield that edge; or return.

Parameters:

edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to apply() is specified by the NUM_EDGES class variable.

Return type:

iter(EdgeI)

class nltk.parse.chart.LeafEdge[source]

Bases: EdgeI

An edge that records the fact that a leaf value is consistent with a word in the sentence. A leaf edge consists of:

  • An index, indicating the position of the word.

  • A leaf, specifying the word’s content.

A leaf edge’s left-hand side is its leaf value, and its right hand side is (). Its span is [index, index+1], and its dot position is 0.

__init__(leaf, index)[source]

Construct a new LeafEdge.

Parameters:
  • leaf – The new edge’s leaf value, specifying the word that is recorded by this edge.

  • index – The new edge’s index, specifying the position of the word that is recorded by this edge.

dot()[source]

Return this edge’s dot position, which indicates how much of the hypothesized structure is consistent with the sentence. In particular, self.rhs[:dot] is consistent with tokens[self.start():self.end()].

Return type:

int

end()[source]

Return the end index of this edge’s span.

Return type:

int

is_complete()[source]

Return True if this edge’s structure is fully consistent with the text.

Return type:

bool

is_incomplete()[source]

Return True if this edge’s structure is partially consistent with the text.

Return type:

bool

length()[source]

Return the length of this edge’s span.

Return type:

int

lhs()[source]

Return this edge’s left-hand side, which specifies what kind of structure is hypothesized by this edge.

See:

TreeEdge and LeafEdge for a description of the left-hand side values for each edge type.

nextsym()[source]

Return the element of this edge’s right-hand side that immediately follows its dot.

Return type:

Nonterminal or terminal or None

rhs()[source]

Return this edge’s right-hand side, which specifies the content of the structure hypothesized by this edge.

See:

TreeEdge and LeafEdge for a description of the right-hand side values for each edge type.

span()[source]

Return a tuple (s, e), where tokens[s:e] is the portion of the sentence that is consistent with this edge’s structure.

Return type:

tuple(int, int)

start()[source]

Return the start index of this edge’s span.

Return type:

int

class nltk.parse.chart.LeafInitRule[source]

Bases: AbstractChartRule

NUM_EDGES = 0
apply(chart, grammar)[source]

Return a generator that will add edges licensed by this rule and the given edges to the chart, one at a time. Each time the generator is resumed, it will either add a new edge and yield that edge; or return.

Parameters:

edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to apply() is specified by the NUM_EDGES class variable.

Return type:

iter(EdgeI)

class nltk.parse.chart.LeftCornerChartParser[source]

Bases: ChartParser

__init__(grammar, **parser_args)[source]

Create a new chart parser, that uses grammar to parse texts.

Parameters:
  • grammar (CFG) – The grammar used to parse texts.

  • strategy (list(ChartRuleI)) – A list of rules that should be used to decide what edges to add to the chart (top-down strategy by default).

  • trace (int) – The level of tracing that should be used when parsing a text. 0 will generate no tracing output; and higher numbers will produce more verbose tracing output.

  • trace_chart_width (int) – The default total width reserved for the chart in trace output. The remainder of each line will be used to display edges.

  • use_agenda (bool) – Use an optimized agenda-based algorithm, if possible.

  • chart_class – The class that should be used to create the parse charts.

class nltk.parse.chart.SingleEdgeFundamentalRule[source]

Bases: FundamentalRule

A rule that joins a given edge with adjacent edges in the chart, to form combined edges. In particular, this rule specifies that either of the edges:

  • [A -> alpha \* B beta][i:j]

  • [B -> gamma \*][j:k]

licenses the edge:

  • [A -> alpha B * beta][i:j]

if the other edge is already in the chart.

Note:

This is basically FundamentalRule, with one edge left unspecified.

NUM_EDGES = 1
apply(chart, grammar, edge)[source]

Return a generator that will add edges licensed by this rule and the given edges to the chart, one at a time. Each time the generator is resumed, it will either add a new edge and yield that edge; or return.

Parameters:

edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to apply() is specified by the NUM_EDGES class variable.

Return type:

iter(EdgeI)

class nltk.parse.chart.SteppingChartParser[source]

Bases: ChartParser

A ChartParser that allows you to step through the parsing process, adding a single edge at a time. It also allows you to change the parser’s strategy or grammar midway through parsing a text.

The initialize method is used to start parsing a text. step adds a single edge to the chart. set_strategy changes the strategy used by the chart parser. parses returns the set of parses that has been found by the chart parser.

Variables:

_restart – Records whether the parser’s strategy, grammar, or chart has been changed. If so, then step must restart the parsing algorithm.

__init__(grammar, strategy=[], trace=0)[source]

Create a new chart parser, that uses grammar to parse texts.

Parameters:
  • grammar (CFG) – The grammar used to parse texts.

  • strategy (list(ChartRuleI)) – A list of rules that should be used to decide what edges to add to the chart (top-down strategy by default).

  • trace (int) – The level of tracing that should be used when parsing a text. 0 will generate no tracing output; and higher numbers will produce more verbose tracing output.

  • trace_chart_width (int) – The default total width reserved for the chart in trace output. The remainder of each line will be used to display edges.

  • use_agenda (bool) – Use an optimized agenda-based algorithm, if possible.

  • chart_class – The class that should be used to create the parse charts.

chart()[source]

Return the chart that is used by this parser.

current_chartrule()[source]

Return the chart rule used to generate the most recent edge.

grammar()[source]

Return the grammar used by this parser.

initialize(tokens)[source]

Begin parsing the given tokens.

parse(tokens, tree_class=<class 'nltk.tree.tree.Tree'>)[source]
Returns:

An iterator that generates parse trees for the sentence. When possible this list is sorted from most likely to least likely.

Parameters:

sent (list(str)) – The sentence to be parsed

Return type:

iter(Tree)

parses(tree_class=<class 'nltk.tree.tree.Tree'>)[source]

Return the parse trees currently contained in the chart.

set_chart(chart)[source]

Load a given chart into the chart parser.

set_grammar(grammar)[source]

Change the grammar used by the parser.

set_strategy(strategy)[source]

Change the strategy that the parser uses to decide which edges to add to the chart.

Parameters:

strategy (list(ChartRuleI)) – A list of rules that should be used to decide what edges to add to the chart.

step()[source]

Return a generator that adds edges to the chart, one at a time. Each time the generator is resumed, it adds a single edge and yields that edge. If no more edges can be added, then it yields None.

If the parser’s strategy, grammar, or chart is changed, then the generator will continue adding edges using the new strategy, grammar, or chart.

Note that this generator never terminates, since the grammar or strategy might be changed to values that would add new edges. Instead, it yields None when no more edges can be added with the current strategy and grammar.

strategy()[source]

Return the strategy used by this parser.

class nltk.parse.chart.TopDownChartParser[source]

Bases: ChartParser

A ChartParser using a top-down parsing strategy. See ChartParser for more information.

__init__(grammar, **parser_args)[source]

Create a new chart parser, that uses grammar to parse texts.

Parameters:
  • grammar (CFG) – The grammar used to parse texts.

  • strategy (list(ChartRuleI)) – A list of rules that should be used to decide what edges to add to the chart (top-down strategy by default).

  • trace (int) – The level of tracing that should be used when parsing a text. 0 will generate no tracing output; and higher numbers will produce more verbose tracing output.

  • trace_chart_width (int) – The default total width reserved for the chart in trace output. The remainder of each line will be used to display edges.

  • use_agenda (bool) – Use an optimized agenda-based algorithm, if possible.

  • chart_class – The class that should be used to create the parse charts.

class nltk.parse.chart.TopDownInitRule[source]

Bases: AbstractChartRule

A rule licensing edges corresponding to the grammar productions for the grammar’s start symbol. In particular, this rule specifies that [S -> \* alpha][0:i] is licensed for each grammar production S -> alpha, where S is the grammar’s start symbol.

NUM_EDGES = 0
apply(chart, grammar)[source]

Return a generator that will add edges licensed by this rule and the given edges to the chart, one at a time. Each time the generator is resumed, it will either add a new edge and yield that edge; or return.

Parameters:

edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to apply() is specified by the NUM_EDGES class variable.

Return type:

iter(EdgeI)

class nltk.parse.chart.TopDownPredictRule[source]

Bases: AbstractChartRule

A rule licensing edges corresponding to the grammar productions for the nonterminal following an incomplete edge’s dot. In particular, this rule specifies that [A -> alpha \* B beta][i:j] licenses the edge [B -> \* gamma][j:j] for each grammar production B -> gamma.

Note:

This rule corresponds to the Predictor Rule in Earley parsing.

NUM_EDGES = 1
apply(chart, grammar, edge)[source]

Return a generator that will add edges licensed by this rule and the given edges to the chart, one at a time. Each time the generator is resumed, it will either add a new edge and yield that edge; or return.

Parameters:

edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to apply() is specified by the NUM_EDGES class variable.

Return type:

iter(EdgeI)

class nltk.parse.chart.TreeEdge[source]

Bases: EdgeI

An edge that records the fact that a tree is (partially) consistent with the sentence. A tree edge consists of:

  • A span, indicating what part of the sentence is consistent with the hypothesized tree.

  • A left-hand side, specifying the hypothesized tree’s node value.

  • A right-hand side, specifying the hypothesized tree’s children. Each element of the right-hand side is either a terminal, specifying a token with that terminal as its leaf value; or a nonterminal, specifying a subtree with that nonterminal’s symbol as its node value.

  • A dot position, indicating which children are consistent with part of the sentence. In particular, if dot is the dot position, rhs is the right-hand size, (start,end) is the span, and sentence is the list of tokens in the sentence, then tokens[start:end] can be spanned by the children specified by rhs[:dot].

For more information about edges, see the EdgeI interface.

__init__(span, lhs, rhs, dot=0)[source]

Construct a new TreeEdge.

Parameters:
  • span (tuple(int, int)) – A tuple (s, e), where tokens[s:e] is the portion of the sentence that is consistent with the new edge’s structure.

  • lhs (Nonterminal) – The new edge’s left-hand side, specifying the hypothesized tree’s node value.

  • rhs (list(Nonterminal and str)) – The new edge’s right-hand side, specifying the hypothesized tree’s children.

  • dot (int) – The position of the new edge’s dot. This position specifies what prefix of the production’s right hand side is consistent with the text. In particular, if sentence is the list of tokens in the sentence, then okens[span[0]:span[1]] can be spanned by the children specified by rhs[:dot].

dot()[source]

Return this edge’s dot position, which indicates how much of the hypothesized structure is consistent with the sentence. In particular, self.rhs[:dot] is consistent with tokens[self.start():self.end()].

Return type:

int

end()[source]

Return the end index of this edge’s span.

Return type:

int

static from_production(production, index)[source]

Return a new TreeEdge formed from the given production. The new edge’s left-hand side and right-hand side will be taken from production; its span will be (index,index); and its dot position will be 0.

Return type:

TreeEdge

is_complete()[source]

Return True if this edge’s structure is fully consistent with the text.

Return type:

bool

is_incomplete()[source]

Return True if this edge’s structure is partially consistent with the text.

Return type:

bool

length()[source]

Return the length of this edge’s span.

Return type:

int

lhs()[source]

Return this edge’s left-hand side, which specifies what kind of structure is hypothesized by this edge.

See:

TreeEdge and LeafEdge for a description of the left-hand side values for each edge type.

move_dot_forward(new_end)[source]

Return a new TreeEdge formed from this edge. The new edge’s dot position is increased by 1, and its end index will be replaced by new_end.

Parameters:

new_end (int) – The new end index.

Return type:

TreeEdge

nextsym()[source]

Return the element of this edge’s right-hand side that immediately follows its dot.

Return type:

Nonterminal or terminal or None

rhs()[source]

Return this edge’s right-hand side, which specifies the content of the structure hypothesized by this edge.

See:

TreeEdge and LeafEdge for a description of the right-hand side values for each edge type.

span()[source]

Return a tuple (s, e), where tokens[s:e] is the portion of the sentence that is consistent with this edge’s structure.

Return type:

tuple(int, int)

start()[source]

Return the start index of this edge’s span.

Return type:

int

nltk.parse.chart.demo(choice=None, print_times=True, print_grammar=False, print_trees=True, trace=2, sent='I saw John with a dog with my cookie', numparses=5)[source]

A demonstration of the chart parsers.

nltk.parse.chart.demo_grammar()[source]