nltk.tree.prettyprinter module

Pretty-printing of discontinuous trees. Adapted from the disco-dop project, by Andreas van Cranenburgh. https://github.com/andreasvc/disco-dop

Interesting reference (not used for this code): T. Eschbach et al., Orth. Hypergraph Drawing, Journal of Graph Algorithms and Applications, 10(2) 141–157 (2006). https://jgaa.info/accepted/2006/EschbachGuentherBecker2006.10.2.pdf

class nltk.tree.prettyprinter.TreePrettyPrinter

Bases: object

Pretty-print a tree in text format, either as ASCII or Unicode. The tree can be a normal tree, or discontinuous.

TreePrettyPrinter(tree, sentence=None, highlight=()) creates an object from which different visualizations can be created.

Parameters
  • tree – a Tree object.

  • sentence – a list of words (strings). If sentence is given, tree must contain integers as leaves, which are taken as indices in sentence. Using this you can display a discontinuous tree.

  • highlight – Optionally, a sequence of Tree objects in tree which should be highlighted. Has the effect of only applying colors to nodes in this sequence (nodes should be given as Tree objects, terminals as indices).

>>> from nltk.tree import Tree
>>> from nltk.tree.prettyprinter import TreePrettyPrinter
>>> tree = Tree.fromstring('(S (NP Mary) (VP walks))')
>>> print(TreePrettyPrinter(tree).text())
      S
  ____|____
 NP        VP
 |         |
Mary     walks
__init__(tree, sentence=None, highlight=())
static nodecoords(tree, sentence, highlight)

Produce coordinates of nodes on a grid.

Objective:

  • Produce coordinates for a non-overlapping placement of nodes and horizontal lines.

  • Order edges so that crossing edges cross a minimal number of previous horizontal lines (never vertical lines).

Approach:

  • bottom up level order traversal (start at terminals)

  • at each level, identify nodes which cannot be on the same row

  • identify nodes which cannot be in the same column

  • place nodes into a grid at (row, column)

  • order child-parent edges with crossing edges last

Coordinates are (row, column); the origin (0, 0) is at the top left; the root node is on row 0. Coordinates do not consider the size of a node (which depends on font, etc.), so the width of a column of the grid should be automatically determined by the element with the greatest width in that column. Alternatively, the integer coordinates could be converted to coordinates in which the distances between adjacent nodes are non-uniform.
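
The idea of sizing each grid column by its widest element can be sketched in plain Python. This is an illustration only, not the code used by text(): the helper render_grid and the labels and coords dictionaries below are hypothetical, hand-written stand-ins rather than output of nodecoords.

```python
def render_grid(labels, coords):
    """Place labels on a text grid; each column is as wide as its widest label."""
    nrows = max(row for row, _ in coords.values()) + 1
    ncols = max(col for _, col in coords.values()) + 1

    # Column width = length of the longest label placed in that column.
    widths = [0] * ncols
    for node_id, (_, col) in coords.items():
        widths[col] = max(widths[col], len(labels[node_id]))

    # Fill an empty grid with the labels at their (row, column) positions.
    grid = [["" for _ in range(ncols)] for _ in range(nrows)]
    for node_id, (row, col) in coords.items():
        grid[row][col] = labels[node_id]

    # Center every cell within its column and join columns with one space.
    lines = []
    for row in grid:
        cells = [cell.center(widths[col]) for col, cell in enumerate(row)]
        lines.append(" ".join(cells).rstrip())
    return "\n".join(lines)


# Hand-written example data (hypothetical, not produced by nodecoords):
# the root is on row 0, and the origin (0, 0) is at the top left.
labels = {0: "S", 1: "NP", 2: "VP", 3: "Mary", 4: "walks"}
coords = {0: (0, 1), 1: (1, 0), 2: (1, 2), 3: (2, 0), 4: (2, 2)}

grid_text = render_grid(labels, coords)
print(grid_text)
```

Because the widths are computed per column, a long leaf such as "walks" widens only its own column and leaves the others unchanged.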

Produces tuple (nodes, coords, edges, highlighted) where:

  • nodes[id]: Tree object for the node with this integer id

  • coords[id]: (row, column) coordinate at which to draw the node with this id in the grid

  • edges[id]: parent id of node with this id (ordered dictionary)

  • highlighted: set of ids that should be highlighted
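
How these four structures fit together can be shown with a small sketch. The data below is hand-written and hypothetical (plain strings stand in for the Tree objects and leaves that nodecoords stores in nodes), and path_to_root is an illustrative helper, not part of NLTK.

```python
def path_to_root(node_id, edges):
    """Follow child -> parent links until a node without a parent is reached."""
    path = [node_id]
    while path[-1] in edges:
        path.append(edges[path[-1]])
    return path


# Hypothetical data in the shape described above.
nodes = {0: "S", 1: "NP", 2: "VP", 3: "Mary", 4: "walks"}   # id -> node
coords = {0: (0, 1), 1: (1, 0), 2: (1, 2), 3: (2, 0), 4: (2, 2)}  # id -> (row, column)
edges = {1: 0, 2: 0, 3: 1, 4: 2}                            # child id -> parent id
highlighted = {2, 4}                                        # ids to highlight

# Walk from the leaf "walks" up to the root and collect labels and rows.
path = path_to_root(4, edges)
path_labels = [nodes[i] for i in path]
path_rows = [coords[i][0] for i in path]
print(path_labels)  # ['walks', 'VP', 'S']
print(path_rows)    # [2, 1, 0]
```

The root is the only id that does not appear as a key in edges, which is why the walk stops there.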

svg(nodecolor='blue', leafcolor='red', funccolor='green')
Returns

SVG representation of a tree.

text(nodedist=1, unicodelines=False, html=False, ansi=False, nodecolor='blue', leafcolor='red', funccolor='green', abbreviate=None, maxwidth=16)
Returns

ASCII art for a discontinuous tree.

Parameters
  • unicodelines – whether to use Unicode line drawing characters instead of plain (7-bit) ASCII.

  • html – whether to wrap output in html code (default plain text).

  • ansi – whether to produce colors with ANSI escape sequences (only effective when html==False).

  • leafcolor, nodecolor – colors of leaves and phrasal nodes, respectively; effective when either html or ansi is True.

  • abbreviate – if True, abbreviate labels longer than 5 characters. If an integer, abbreviate labels longer than that many characters.

  • maxwidth – maximum number of characters before a label starts to wrap; pass None to disable.