nltk.parse.dependencygraph module

Tools for reading and writing dependency trees. The input is assumed to be in Malt-TAB format (https://stp.lingfil.uu.se/~nivre/research/MaltXML.html).

class nltk.parse.dependencygraph.DependencyGraph[source]

Bases: object

A container for the nodes and labelled edges of a dependency structure.

__init__(tree_str=None, cell_extractor=None, zero_based=False, cell_separator=None, top_relation_label='ROOT')[source]

Dependency graph.

We place a dummy TOP node with the index 0, since the root node is often assigned 0 as its head. This also means that the indexing of the nodes corresponds directly to the Malt-TAB format, which starts at 1.

If zero_based is True, the input is expected to be Malt-TAB-like, with node numbers starting at 0 and the root node assigned -1 as its head (as produced by, e.g., zpar).

Parameters
  • cell_separator (str) – the cell separator. If not provided, cells are split by whitespace.

  • top_relation_label (str) – the label by which the top relation is identified, for example, ROOT, null or TOP.
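The indexing convention described above can be illustrated with a small stand-alone sketch. It is not NLTK's implementation: the helper name parse_malt_tab and the simplified node dictionaries are assumptions made for illustration, and only 3-column (word, tag, head) input is handled.

```python
def parse_malt_tab(text, zero_based=False, cell_separator=None,
                   top_relation_label="ROOT"):
    """Parse 3-column (word, tag, head) lines into a dict of nodes keyed by address.

    Hypothetical helper, not part of NLTK.
    """
    # Address 0 is reserved for the dummy TOP node.
    nodes = {0: {"address": 0, "word": None, "tag": "TOP",
                 "head": None, "rel": None, "deps": []}}
    lines = [ln for ln in text.splitlines() if ln.strip()]
    for index, line in enumerate(lines, start=1):
        # split(None) splits on any run of whitespace.
        word, tag, head = line.split(cell_separator)
        head = int(head)
        if zero_based:
            # Shift 0-based heads (root = -1) to the 1-based convention (root = 0).
            head += 1
        rel = top_relation_label if head == 0 else ""
        nodes[index] = {"address": index, "word": word, "tag": tag,
                        "head": head, "rel": rel, "deps": []}
    # Record each word as a dependent of its head.
    for address, node in nodes.items():
        if address != 0:
            nodes[node["head"]]["deps"].append(address)
    return nodes


one_based = parse_malt_tab("John N 2\nloves V 0\nMary N 2")
zero_based = parse_malt_tab("John N 1\nloves V -1\nMary N 1", zero_based=True)
print(one_based[0]["deps"])  # [2]
print(one_based[2]["deps"])  # [1, 3]
```

Both inputs produce the same structure: the dummy TOP node at address 0 points at the word whose head was 0 (or -1 in the zero-based variant).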

add_arc(head_address, mod_address)[source]

Adds an arc from the node specified by head_address to the node specified by mod_address.

add_node(node)[source]

connect_graph()[source]

Fully connects all non-root nodes. All nodes are set to be dependents of the root node.

contains_address(node_address)[source]

Returns true if the graph contains a node with the given node address, false otherwise.

contains_cycle()[source]

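Check whether the graph contains cycles. As the example shows, the result is False when there is no cycle, and otherwise a list of the node addresses along a cycle.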

>>> dg = DependencyGraph(treebank_data)
>>> dg.contains_cycle()
False
>>> cyclic_dg = DependencyGraph()
>>> top = {'word': None, 'deps': [1], 'rel': 'TOP', 'address': 0}
>>> child1 = {'word': None, 'deps': [2], 'rel': 'NTOP', 'address': 1}
>>> child2 = {'word': None, 'deps': [4], 'rel': 'NTOP', 'address': 2}
>>> child3 = {'word': None, 'deps': [1], 'rel': 'NTOP', 'address': 3}
>>> child4 = {'word': None, 'deps': [3], 'rel': 'NTOP', 'address': 4}
>>> cyclic_dg.nodes = {
...     0: top,
...     1: child1,
...     2: child2,
...     3: child3,
...     4: child4,
... }
>>> cyclic_dg.root = top
>>> cyclic_dg.contains_cycle()
[1, 2, 4, 3]
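For readers who want to see how such a cycle path can be found, here is a minimal depth-first sketch. It is an illustration only, not NLTK's algorithm: find_cycle is a hypothetical helper, and it assumes the same simplified node dictionaries as the example above, with 'deps' given as a plain list of addresses.

```python
def find_cycle(nodes):
    """Return a list of addresses forming a cycle, or False if there is none.

    Hypothetical helper, not part of NLTK.
    """
    def visit(address, path):
        if address in path:
            # We came back to a node on the current path:
            # the cycle is the part of the path from that node onward.
            return path[path.index(address):]
        for dep in nodes[address]["deps"]:
            found = visit(dep, path + [address])
            if found:
                return found
        return False

    for start in nodes:
        found = visit(start, [])
        if found:
            return found
    return False


cyclic = {
    0: {"deps": [1]},
    1: {"deps": [2]},
    2: {"deps": [4]},
    3: {"deps": [1]},
    4: {"deps": [3]},
}
acyclic = {0: {"deps": [2]}, 1: {"deps": []}, 2: {"deps": [1, 3]}, 3: {"deps": []}}
print(find_cycle(cyclic))   # [1, 2, 4, 3]
print(find_cycle(acyclic))  # False
```

This version re-explores shared subtrees, which is acceptable for sentence-sized graphs but not a design one would pick for large inputs.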
get_by_address(node_address)[source]

Return the node with the given address.

get_cycle_path(curr_node, goal_node_index)[source]

left_children(node_index)[source]

Returns the number of left children under the node specified by the given address.

static load(filename, zero_based=False, cell_separator=None, top_relation_label='ROOT')[source]

Parameters
  • filename – the name of a file in Malt-TAB format

  • zero_based – nodes in the input file are numbered starting from 0 rather than 1 (as produced by, e.g., zpar)

  • cell_separator (str) – the cell separator. If not provided, cells are split by whitespace.

  • top_relation_label (str) – the label by which the top relation is identified, for example, ROOT, null or TOP.

Returns

a list of DependencyGraphs
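Since load returns one graph per sentence, the file has to be divided into sentence blocks first. The sketch below shows one way to do that, assuming sentences are separated by blank lines; split_sentence_blocks is a hypothetical helper and operates on a string rather than a file so that it stays self-contained.

```python
def split_sentence_blocks(text):
    """Split Malt-TAB text into one list of lines per sentence.

    Hypothetical helper; assumes blank lines separate sentences.
    """
    blocks = []
    for chunk in text.split("\n\n"):
        lines = [ln for ln in chunk.splitlines() if ln.strip()]
        if lines:  # skip empty chunks caused by trailing newlines
            blocks.append(lines)
    return blocks


sample = "John N 2\nloves V 0\nMary N 2\n\nDogs N 2\nbark V 0\n"
blocks = split_sentence_blocks(sample)
print(len(blocks))  # 2
print(blocks[1])    # ['Dogs N 2', 'bark V 0']
```

Each block could then be joined back with newlines and passed to the DependencyGraph constructor as tree_str.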

nx_graph()[source]

Convert the data in a nodelist into a networkx labeled directed graph.

redirect_arcs(originals, redirect)[source]

Redirects arcs to any of the nodes in the originals list to the redirect node address.
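The effect of redirecting arcs can be sketched on a bare mapping from head address to dependent addresses. This is a simplified stand-in for the graph's node data, and redirect_deps is a hypothetical name, not an NLTK function.

```python
def redirect_deps(deps_by_address, originals, redirect):
    """Return a copy in which every arc into `originals` points at `redirect`.

    Hypothetical helper, not part of NLTK.
    """
    return {
        address: [redirect if dep in originals else dep for dep in deps]
        for address, deps in deps_by_address.items()
    }


deps = {0: [2], 1: [], 2: [1, 3], 3: []}
redirected = redirect_deps(deps, originals=[1, 3], redirect=4)
print(redirected[2])  # [4, 4]
```

Arcs that do not target one of the original nodes, such as 0 -> 2 here, are left unchanged.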

remove_by_address(address)[source]

Removes the node with the given address. References to this node in others will still exist.
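The caveat about leftover references is easy to overlook, so here is a small sketch of what it means in practice. It uses plain dictionaries rather than a real DependencyGraph, and the cleanup step at the end is a suggestion, not something the method does.

```python
nodes = {
    0: {"word": None, "deps": [2]},
    1: {"word": "John", "deps": []},
    2: {"word": "loves", "deps": [1, 3]},
    3: {"word": "Mary", "deps": []},
}

# Remove node 3 only; other nodes are not touched.
del nodes[3]

# Arcs that now point at an address with no node behind it.
dangling = [(head, dep)
            for head, node in nodes.items()
            for dep in node["deps"]
            if dep not in nodes]
print(dangling)  # [(2, 3)]

# Optional cleanup: drop references to addresses that no longer exist.
for node in nodes.values():
    node["deps"] = [dep for dep in node["deps"] if dep in nodes]
print(nodes[2]["deps"])  # [1]
```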

right_children(node_index)[source]

Returns the number of right children under the node specified by the given address.
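Note that left_children and right_children return counts, not the children themselves. A minimal sketch of the idea, assuming a dependent is "left" when its address is smaller than the head's and "right" when it is larger (count_children is a hypothetical helper working on a plain address-to-dependents mapping):

```python
def count_children(deps_by_address, node_index):
    """Return (left_count, right_count) for the node at node_index.

    Hypothetical helper, not part of NLTK.
    """
    deps = deps_by_address[node_index]
    left = sum(1 for dep in deps if dep < node_index)
    right = sum(1 for dep in deps if dep > node_index)
    return left, right


# "John loves Mary": node 2 (loves) has John on its left and Mary on its right.
deps = {0: [2], 1: [], 2: [1, 3], 3: []}
print(count_children(deps, 2))  # (1, 1)
print(count_children(deps, 1))  # (0, 0)
```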

to_conll(style)[source]

The dependency graph in CoNLL format.

Parameters

style (int) – the style to use for the format (3, 4, 10 columns)

Return type

str
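The three styles differ in how many tab-separated columns each word gets. The sketch below illustrates this with format templates; the exact column layouts are an assumption made for illustration (word/tag/head, plus the relation for 4 columns, plus address, lemma, coarse tag and features for 10), and render_conll is a hypothetical helper operating on plain dictionaries.

```python
# Assumed column layouts; treat these as illustrative, not authoritative.
TEMPLATES = {
    3: "{word}\t{tag}\t{head}\n",
    4: "{word}\t{tag}\t{head}\t{rel}\n",
    10: "{address}\t{word}\t{lemma}\t{ctag}\t{tag}\t{feats}\t{head}\t{rel}\t_\t_\n",
}


def render_conll(nodes, style):
    """Render nodes (keyed by address) in the given column style."""
    if style not in TEMPLATES:
        raise ValueError(f"unsupported style {style!r}; expected 3, 4 or 10")
    template = TEMPLATES[style]
    # Skip the dummy TOP node at address 0.
    return "".join(template.format(**node)
                   for address, node in sorted(nodes.items())
                   if address != 0)


def make_node(address, word, tag, head, rel):
    return {"address": address, "word": word, "lemma": word, "ctag": tag,
            "tag": tag, "feats": "", "head": head, "rel": rel}


nodes = {
    1: make_node(1, "John", "N", 2, ""),
    2: make_node(2, "loves", "V", 0, "ROOT"),
    3: make_node(3, "Mary", "N", 2, ""),
}
print(render_conll(nodes, 3), end="")
```

Passing any style other than 3, 4 or 10 raises ValueError in this sketch.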

to_dot()[source]

Return a dot representation suitable for using with Graphviz.

>>> dg = DependencyGraph(
...     'John N 2\n'
...     'loves V 0\n'
...     'Mary N 2'
... )
>>> print(dg.to_dot())
digraph G{
edge [dir=forward]
node [shape=plaintext]

0 [label="0 (None)"]
0 -> 2 [label="ROOT"]
1 [label="1 (John)"]
2 [label="2 (loves)"]
2 -> 1 [label=""]
2 -> 3 [label=""]
3 [label="3 (Mary)"]
}

tree()[source]

Starting with the root node, build a dependency tree using the NLTK Tree constructor. Dependency labels are omitted.
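The shape of the resulting tree can be sketched without the NLTK Tree class by using nested tuples: a word with dependents becomes (word, [children]), and a word without dependents becomes a bare leaf. The helper build_tree and the simplified node dictionaries are assumptions for illustration.

```python
def build_tree(nodes, address):
    """Build a nested (word, children) structure rooted at `address`.

    Hypothetical helper; dependency labels are ignored, as in tree().
    """
    node = nodes[address]
    children = [build_tree(nodes, dep) for dep in sorted(node["deps"])]
    # Words without dependents are plain leaves.
    return (node["word"], children) if children else node["word"]


nodes = {
    0: {"word": None, "deps": [2]},
    1: {"word": "John", "deps": []},
    2: {"word": "loves", "deps": [1, 3]},
    3: {"word": "Mary", "deps": []},
}
# Start at the root word (address 2), not at the dummy TOP node.
tree = build_tree(nodes, 2)
print(tree)  # ('loves', ['John', 'Mary'])
```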

triples(node=None)[source]

Extract dependency triples of the form: ((head word, head tag), rel, (dep word, dep tag))
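A triple of this form can be produced by walking from a head to each of its dependents in turn. The following sketch shows the idea on plain dictionaries; iter_triples is a hypothetical helper, and the relation labels SUBJ and OBJ are made up for the example.

```python
def iter_triples(nodes, address):
    """Yield ((head word, head tag), rel, (dep word, dep tag)) below `address`.

    Hypothetical helper, not part of NLTK.
    """
    head = nodes[address]
    for dep_address in head["deps"]:
        dep = nodes[dep_address]
        yield ((head["word"], head["tag"]), dep["rel"], (dep["word"], dep["tag"]))
        # Continue into the dependent's own subtree.
        yield from iter_triples(nodes, dep_address)


nodes = {
    1: {"word": "John", "tag": "N", "rel": "SUBJ", "deps": []},
    2: {"word": "loves", "tag": "V", "rel": "ROOT", "deps": [1, 3]},
    3: {"word": "Mary", "tag": "N", "rel": "OBJ", "deps": []},
}
triples = list(iter_triples(nodes, 2))
for triple in triples:
    print(triple)
# (('loves', 'V'), 'SUBJ', ('John', 'N'))
# (('loves', 'V'), 'OBJ', ('Mary', 'N'))
```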

exception nltk.parse.dependencygraph.DependencyGraphError[source]

Bases: Exception

Dependency graph exception.

nltk.parse.dependencygraph.conll_demo()[source]

A demonstration of how to read a string representation of a CoNLL format dependency tree.

nltk.parse.dependencygraph.conll_file_demo()[source]

nltk.parse.dependencygraph.cycle_finding_demo()[source]

nltk.parse.dependencygraph.demo()[source]

nltk.parse.dependencygraph.dot2img(dot_string, t='svg')[source]

Create an image representation from dot_string, using the ‘dot’ program from the Graphviz package.

Use the ‘t’ argument to specify the image file format, e.g. ‘jpeg’, ‘eps’, ‘json’, ‘png’ or ‘webp’ (running ‘dot -T:’ lists all available formats).

Note that the “capture_output” option of subprocess.run() is only available with text formats (like svg), but not with binary image formats (like png).

nltk.parse.dependencygraph.malt_demo(nx=False)[source]

A demonstration of the result of reading a dependency version of the first sentence of the Penn Treebank.