nltk.parse.dependencygraph module
Tools for reading and writing dependency trees. The input is assumed to be in Malt-TAB format (https://stp.lingfil.uu.se/~nivre/research/MaltXML.html).
- class nltk.parse.dependencygraph.DependencyGraph
Bases: object
A container for the nodes and labelled edges of a dependency structure.
- __init__(tree_str=None, cell_extractor=None, zero_based=False, cell_separator=None, top_relation_label='ROOT')
Dependency graph.
We place a dummy TOP node with the index 0, since the root node is often assigned 0 as its head. This also means that the indexing of the nodes corresponds directly to the Malt-TAB format, which starts at 1.
If zero_based is True, the input is assumed to be Malt-TAB-like with node numbers starting at 0 and the root node assigned -1 as its head (as produced by, e.g., zpar).
- Parameters:
cell_separator (str) – the cell separator. If not provided, cells are split by whitespace.
top_relation_label (str) – the label by which the top relation is identified, for example, ROOT, null or TOP.
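As a rough illustration of the indexing scheme described above, the following standalone sketch parses 3- or 4-column Malt-TAB-like text into a plain dictionary with a dummy TOP node at address 0. It is not NLTK's implementation: the function name parse_malt_tab, the flat 'deps' list, and the choice of node fields are assumptions made for the example.

```python
def parse_malt_tab(tree_str, zero_based=False, cell_separator=None,
                   top_relation_label="ROOT"):
    """Parse 3- or 4-column Malt-TAB-like text into a dict of nodes.

    Hypothetical helper for illustration; not part of NLTK.
    """
    # Dummy TOP node at address 0, so real tokens start at address 1.
    nodes = {0: {"address": 0, "word": None, "tag": "TOP",
                 "head": None, "rel": None, "deps": []}}
    lines = [line for line in tree_str.splitlines() if line.strip()]
    for index, line in enumerate(lines, start=1):
        # split(None) splits on any run of whitespace.
        cells = line.split(cell_separator)
        if len(cells) == 3:
            word, tag, head = cells
            rel = ""
        elif len(cells) == 4:
            word, tag, head, rel = cells
        else:
            raise ValueError(f"expected 3 or 4 cells, got {len(cells)}")
        head = int(head)
        if zero_based:
            # Shift so that a root head of -1 becomes the TOP node 0.
            head += 1
        if len(cells) == 3 and head == 0:
            # Unlabelled input: label the arc from TOP explicitly.
            rel = top_relation_label
        nodes[index] = {"address": index, "word": word, "tag": tag,
                        "head": head, "rel": rel, "deps": []}
    # Second pass: register each token as a dependent of its head.
    for address in sorted(nodes):
        if address != 0:
            nodes[nodes[address]["head"]]["deps"].append(address)
    return nodes


nodes = parse_malt_tab("John N 2\nloves V 0\nMary N 2")
print(nodes[0]["deps"], nodes[2]["deps"])
```

The arcs are attached in a second pass so that a token may name a head that appears later in the input.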
- add_arc(head_address, mod_address)
Adds an arc from the node specified by head_address to the node specified by mod_address.
- connect_graph()
Fully connects all non-root nodes. All nodes are set to be dependents of the root node.
- contains_address(node_address)
Returns true if the graph contains a node with the given node address, false otherwise.
- contains_cycle()
Check whether the graph contains a cycle. As the examples show, the method returns False for an acyclic graph and a list of node addresses when a cycle is found.
>>> dg = DependencyGraph(treebank_data)
>>> dg.contains_cycle()
False
>>> cyclic_dg = DependencyGraph()
>>> top = {'word': None, 'deps': [1], 'rel': 'TOP', 'address': 0}
>>> child1 = {'word': None, 'deps': [2], 'rel': 'NTOP', 'address': 1}
>>> child2 = {'word': None, 'deps': [4], 'rel': 'NTOP', 'address': 2}
>>> child3 = {'word': None, 'deps': [1], 'rel': 'NTOP', 'address': 3}
>>> child4 = {'word': None, 'deps': [3], 'rel': 'NTOP', 'address': 4}
>>> cyclic_dg.nodes = {
...     0: top,
...     1: child1,
...     2: child2,
...     3: child3,
...     4: child4,
... }
>>> cyclic_dg.root = top
>>> cyclic_dg.contains_cycle()
[1, 2, 4, 3]
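To make the idea of a cycle check concrete, here is a minimal standalone sketch that runs a depth-first search over the same node dictionaries used in the example above. It is not the algorithm NLTK uses internally; the function name find_cycle and the flat 'deps' lists are assumptions, and only cycles reachable from the start address are reported.

```python
def find_cycle(nodes, start=0):
    """Return the addresses on a cycle reachable from start, or False.

    Hypothetical helper for illustration; not part of NLTK.
    """
    path = []         # addresses on the current DFS path, in visit order
    on_path = set()   # same addresses, for fast membership tests
    finished = set()  # addresses whose subtrees contain no cycle

    def visit(address):
        path.append(address)
        on_path.add(address)
        for dep in nodes[address]["deps"]:
            if dep in on_path:
                # Back edge: the cycle is the path suffix starting at dep.
                return path[path.index(dep):]
            if dep not in finished:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        on_path.discard(address)
        finished.add(address)
        return False

    return visit(start)


cyclic_nodes = {
    0: {"word": None, "deps": [1], "rel": "TOP", "address": 0},
    1: {"word": None, "deps": [2], "rel": "NTOP", "address": 1},
    2: {"word": None, "deps": [4], "rel": "NTOP", "address": 2},
    3: {"word": None, "deps": [1], "rel": "NTOP", "address": 3},
    4: {"word": None, "deps": [3], "rel": "NTOP", "address": 4},
}
print(find_cycle(cyclic_nodes))  # prints [1, 2, 4, 3]
```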
- left_children(node_index)
Returns the number of left children under the node specified by the given address.
- static load(filename, zero_based=False, cell_separator=None, top_relation_label='ROOT')
- Parameters:
filename – the name of a file in Malt-TAB format
zero_based – nodes in the input file are numbered starting from 0 rather than 1 (as produced by, e.g., zpar)
cell_separator (str) – the cell separator. If not provided, cells are split by whitespace.
top_relation_label (str) – the label by which the top relation is identified, for example, ROOT, null or TOP.
- Returns:
a list of DependencyGraphs
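Because load returns a list, a single file can hold several sentences. The sketch below shows one plausible way to split such a file into per-sentence blocks before each block is parsed. It assumes that sentences are separated by blank lines, which this page does not state; split_into_blocks and load_blocks are hypothetical names, not NLTK functions.

```python
import os
import tempfile


def split_into_blocks(text):
    """Split text into blocks of consecutive non-blank lines."""
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            # A blank line closes the current sentence block.
            blocks.append("\n".join(current))
            current = []
    if current:
        blocks.append("\n".join(current))
    return blocks


def load_blocks(filename):
    """Read a file and return one string per sentence block."""
    with open(filename, encoding="utf-8") as infile:
        return split_into_blocks(infile.read())


# Demonstration with a temporary two-sentence file.
sample = "John N 2\nloves V 0\nMary N 2\n\nMary N 2\nsleeps V 0\n"
with tempfile.NamedTemporaryFile("w", suffix=".tab", delete=False,
                                 encoding="utf-8") as handle:
    handle.write(sample)
try:
    blocks = load_blocks(handle.name)
finally:
    os.remove(handle.name)
print(len(blocks))  # prints 2
```

Each returned block could then be handed to a parser such as the DependencyGraph constructor, one graph per block.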
- redirect_arcs(originals, redirect)
Redirects every arc that points to a node in the originals list so that it points to the redirect node address instead.
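The effect of redirecting arcs can be sketched on plain node dictionaries. This is an illustration under the assumption that each node keeps a flat 'deps' list of dependent addresses; redirect_arcs_sketch is a hypothetical name and not the NLTK method itself.

```python
def redirect_arcs_sketch(nodes, originals, redirect):
    """Point every arc that targets an address in originals at redirect."""
    targets = set(originals)
    for node in nodes.values():
        node["deps"] = [redirect if dep in targets else dep
                        for dep in node["deps"]]


arc_nodes = {
    0: {"deps": [2]},
    1: {"deps": []},
    2: {"deps": [1, 3]},
    3: {"deps": []},
    4: {"deps": []},
}
redirect_arcs_sketch(arc_nodes, [1, 3], 4)
print(arc_nodes[2]["deps"])  # prints [4, 4]
```

Note that this sketch does not merge duplicates: a head that pointed to two of the original nodes ends up with two arcs to the redirect address.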
- remove_by_address(address)
Removes the node with the given address. References to this node held by other nodes are not removed.
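The caveat about leftover references is easy to overlook, so here is a small standalone sketch that removes a node from a plain dictionary and then lists the arcs that now point nowhere. Both helper names are hypothetical, and the flat 'deps' list is an assumption made for the example.

```python
def remove_by_address_sketch(nodes, address):
    """Delete the node itself; arcs held by other nodes are left untouched."""
    del nodes[address]


def dangling_references(nodes):
    """Return sorted (head, dependent) pairs whose dependent no longer exists."""
    return sorted(
        (address, dep)
        for address, node in nodes.items()
        for dep in node["deps"]
        if dep not in nodes
    )


sentence_nodes = {
    0: {"deps": [2]},
    1: {"deps": []},
    2: {"deps": [1, 3]},
    3: {"deps": []},
}
remove_by_address_sketch(sentence_nodes, 3)
print(dangling_references(sentence_nodes))  # prints [(2, 3)]
```

A caller that needs a consistent graph would have to clean up or redirect such arcs separately.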
- right_children(node_index)
Returns the number of right children under the node specified by the given address.
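The counts returned by left_children and right_children can be pictured with the sketch below. It assumes that "left" means a dependent whose address is lower than the node's own address and "right" means a higher one, and that each node keeps a flat 'deps' list; count_children is a hypothetical helper, not part of NLTK.

```python
def count_children(nodes, node_index):
    """Return (left, right) counts of direct dependents of node_index."""
    deps = nodes[node_index]["deps"]
    left = sum(1 for dep in deps if dep < node_index)
    right = sum(1 for dep in deps if dep > node_index)
    return left, right


# "John loves Mary": both arguments depend on the verb at address 2.
child_nodes = {
    0: {"deps": [2]},
    1: {"deps": []},
    2: {"deps": [1, 3]},
    3: {"deps": []},
}
print(count_children(child_nodes, 2))  # prints (1, 1)
```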
- to_conll(style)
The dependency graph in CoNLL format.
- Parameters:
style (int) – the style to use for the format (3, 4, or 10 columns)
- Return type:
str
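To show what a column style might look like in practice, the sketch below serialises plain node dictionaries into tab-separated lines. The column layouts (word, tag, head for 3 columns, plus the relation for 4) are assumptions made for this example, the 10-column style is not covered, and to_conll_sketch is a hypothetical name rather than the NLTK method.

```python
# Assumed layouts for the two short styles.
TEMPLATES = {
    3: "{word}\t{tag}\t{head}\n",
    4: "{word}\t{tag}\t{head}\t{rel}\n",
}


def to_conll_sketch(nodes, style):
    """Serialise every node except the dummy TOP node (address 0)."""
    if style not in TEMPLATES:
        raise ValueError(f"unsupported style: {style}")
    template = TEMPLATES[style]
    # str.format ignores keyword arguments the template does not use.
    return "".join(
        template.format(**nodes[address])
        for address in sorted(nodes)
        if address != 0
    )


conll_nodes = {
    0: {"word": None, "tag": "TOP", "head": None, "rel": None},
    1: {"word": "John", "tag": "N", "head": 2, "rel": "SUBJ"},
    2: {"word": "loves", "tag": "V", "head": 0, "rel": "ROOT"},
    3: {"word": "Mary", "tag": "N", "head": 2, "rel": "OBJ"},
}
print(to_conll_sketch(conll_nodes, 4), end="")
```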
- to_dot()
Return a dot representation suitable for using with Graphviz.
>>> dg = DependencyGraph(
...     'John N 2\n'
...     'loves V 0\n'
...     'Mary N 2'
... )
>>> print(dg.to_dot())
digraph G{
edge [dir=forward]
node [shape=plaintext]
0 [label="0 (None)"]
0 -> 2 [label="ROOT"]
1 [label="1 (John)"]
2 [label="2 (loves)"]
2 -> 1 [label=""]
2 -> 3 [label=""]
3 [label="3 (Mary)"]
}
- exception nltk.parse.dependencygraph.DependencyGraphError
Bases: Exception
Dependency graph exception.
- nltk.parse.dependencygraph.conll_demo()
A demonstration of how to read a string representation of a CoNLL format dependency tree.
- nltk.parse.dependencygraph.dot2img(dot_string, t='svg')
Create an image representation from dot_string, using the ‘dot’ program from the Graphviz package.
Use the ‘t’ argument to specify the image file format, for example ‘jpeg’, ‘eps’, ‘json’, ‘png’ or ‘webp’ (running ‘dot -T:’ lists all available formats).
Note that the “capture_output” option of subprocess.run() is only available with text formats (like svg), but not with binary image formats (like png).
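The text-versus-binary distinction can be sketched with subprocess directly. This is an illustration and not the body of dot2img: the names dot_command and render_dot and the TEXT_FORMATS set are assumptions, and the sketch needs the Graphviz 'dot' binary on PATH to actually render anything.

```python
import shutil
import subprocess

# Assumption: formats treated as text in this sketch.
TEXT_FORMATS = {"svg", "json", "dot"}


def dot_command(t="svg"):
    """Build the argument list for the Graphviz 'dot' program."""
    return ["dot", f"-T{t}"]


def render_dot(dot_string, t="svg"):
    """Return str for text formats and bytes for binary formats."""
    if shutil.which("dot") is None:
        raise RuntimeError("Graphviz 'dot' was not found on PATH")
    if t in TEXT_FORMATS:
        # Text mode: pass and receive str.
        proc = subprocess.run(dot_command(t), input=dot_string,
                              capture_output=True, text=True, check=True)
    else:
        # Binary mode: pass encoded bytes and read raw bytes from the pipe.
        proc = subprocess.run(dot_command(t),
                              input=dot_string.encode("utf-8"),
                              stdout=subprocess.PIPE, check=True)
    return proc.stdout


print(dot_command("png"))  # prints ['dot', '-Tpng']
```

Calling render_dot('digraph G { a -> b }') would return SVG markup as a string when Graphviz is installed, while t='png' would return raw bytes that can be written to a file opened in binary mode.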