nltk.parse.ProbabilisticNonprojectiveParser

class nltk.parse.ProbabilisticNonprojectiveParser[source]

Bases: object

A probabilistic non-projective dependency parser.

Nonprojective dependencies allows for “crossing branches” in the parse tree which is necessary for representing particular linguistic phenomena, or even typical parses in some languages. This parser follows the MST parsing algorithm, outlined in McDonald(2005), which likens the search for the best non-projective parse to finding the maximum spanning tree in a weighted directed graph.

>>> class Scorer(DependencyScorerI):
...     def train(self, graphs):
...         pass
...
...     def score(self, graph):
...         return [
...             [[], [5],  [1],  [1]],
...             [[], [],   [11], [4]],
...             [[], [10], [],   [5]],
...             [[], [8],  [8],  []],
...         ]
>>> npp = ProbabilisticNonprojectiveParser()
>>> npp.train([], Scorer())
>>> parses = npp.parse(['v1', 'v2', 'v3'], [None, None, None])
>>> len(list(parses))
1

Rule based example

>>> from nltk.grammar import DependencyGrammar
>>> grammar = DependencyGrammar.fromstring('''
... 'taught' -> 'play' | 'man'
... 'man' -> 'the' | 'in'
... 'in' -> 'corner'
... 'corner' -> 'the'
... 'play' -> 'golf' | 'dachshund' | 'to'
... 'dachshund' -> 'his'
... ''')
>>> ndp = NonprojectiveDependencyParser(grammar)
>>> parses = ndp.parse(['the', 'man', 'in', 'the', 'corner', 'taught', 'his', 'dachshund', 'to', 'play', 'golf'])
>>> len(list(parses))
4
__init__()[source]

Creates a new non-projective parser.

train(graphs, dependency_scorer)[source]

Trains a DependencyScorerI from a set of DependencyGraph objects, and establishes this as the parser’s scorer. This is used to initialize the scores on a DependencyGraph during the parsing procedure.

Parameters
  • graphs (list(DependencyGraph)) – A list of dependency graphs to train the scorer.

  • dependency_scorer (DependencyScorerI) – A scorer which implements the DependencyScorerI interface.

initialize_edge_scores(graph)[source]

Assigns a score to every edge in the DependencyGraph graph. These scores are generated via the parser’s scorer which was assigned during the training process.

Parameters

graph (DependencyGraph) – A dependency graph to assign scores to.

collapse_nodes(new_node, cycle_path, g_graph, b_graph, c_graph)[source]

Takes a list of nodes that have been identified to belong to a cycle, and collapses them into on larger node. The arcs of all nodes in the graph must be updated to account for this.

Parameters
  • new_node (Node.) – A Node (Dictionary) to collapse the cycle nodes into.

  • cycle_path (A list of integers.) – A list of node addresses, each of which is in the cycle.

  • c_graph (g_graph, b_graph,) – Graphs which need to be updated.

update_edge_scores(new_node, cycle_path)[source]

Updates the edge scores to reflect a collapse operation into new_node.

Parameters
  • new_node (A Node.) – The node which cycle nodes are collapsed into.

  • cycle_path (A list of integers.) – A list of node addresses that belong to the cycle.

compute_original_indexes(new_indexes)[source]

As nodes are collapsed into others, they are replaced by the new node in the graph, but it’s still necessary to keep track of what these original nodes were. This takes a list of node addresses and replaces any collapsed node addresses with their original addresses.

Parameters

new_indexes (A list of integers.) – A list of node addresses to check for subsumed nodes.

compute_max_subtract_score(column_index, cycle_indexes)[source]

When updating scores the score of the highest-weighted incoming arc is subtracted upon collapse. This returns the correct amount to subtract from that edge.

Parameters
  • column_index (integer.) – A index representing the column of incoming arcs to a particular node being updated

  • cycle_indexes (A list of integers.) – Only arcs from cycle nodes are considered. This is a list of such nodes addresses.

best_incoming_arc(node_index)[source]

Returns the source of the best incoming arc to the node with address: node_index

Parameters

node_index (integer.) – The address of the ‘destination’ node, the node that is arced to.

original_best_arc(node_index)[source]
parse(tokens, tags)[source]

Parses a list of tokens in accordance to the MST parsing algorithm for non-projective dependency parses. Assumes that the tokens to be parsed have already been tagged and those tags are provided. Various scoring methods can be used by implementing the DependencyScorerI interface and passing it to the training algorithm.

Parameters
  • tokens (list(str)) – A list of words or punctuation to be parsed.

  • tags (list(str)) – A list of tags corresponding by index to the words in the tokens list.

Returns

An iterator of non-projective parses.

Return type

iter(DependencyGraph)