nltk.tree.ParentedTree

class nltk.tree.ParentedTree[source]

Bases: AbstractParentedTree

A Tree that automatically maintains parent pointers for single-parented trees. The following are methods for querying the structure of a parented tree: parent, parent_index, left_sibling, right_sibling, root, treeposition.

Each ParentedTree may have at most one parent. In particular, subtrees may not be shared. Any attempt to reuse a single ParentedTree as a child of more than one parent (or as multiple children of the same parent) will cause a ValueError exception to be raised.

ParentedTrees should never be used in the same tree as Trees or MultiParentedTrees. Mixing tree implementations may result in incorrect parent pointers and in TypeError exceptions.
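As a minimal sketch of the single-parent rule (the tree string and the names ptree, np_subtree, and other are illustrative, not taken from the NLTK documentation), the following shows a parent pointer being maintained automatically and a ValueError being raised on reuse:

```python
from nltk.tree import ParentedTree

ptree = ParentedTree.fromstring("(S (NP (D the) (N dog)) (VP (V barked)))")
np_subtree = ptree[0]

# Parent pointers are set automatically when the tree is built.
print(np_subtree.parent() is ptree)  # True
print(ptree.parent())                # None

# np_subtree already has a parent, so adding it to a second tree is rejected.
other = ParentedTree("X", [])
try:
    other.append(np_subtree)
    reused = True
except ValueError:
    reused = False

# A deep copy is a fresh, parentless tree and can be attached elsewhere.
other.append(np_subtree.copy(deep=True))
```

Copying the subtree (or removing it from its original parent first) is one way to work around the restriction on shared subtrees.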

__init__(node, children=None)[source]
copy(deep=False)[source]

Return a shallow copy of the list.

parent()[source]

The parent of this tree, or None if it has no parent.

parent_index()[source]

The index of this tree in its parent. I.e., ptree.parent()[ptree.parent_index()] is ptree. Note that ptree.parent_index() is not necessarily equal to ptree.parent().index(ptree), since the index() method returns the index of the first child that is equal to its argument.
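The difference between identity and equality can be seen in a small sketch. The tree below is a made-up example with two equal children; it is only meant to illustrate the note above:

```python
from nltk.tree import ParentedTree

# Two children that compare equal but are distinct objects.
ptree = ParentedTree.fromstring("(NP (N dog) (N dog))")
second = ptree[1]

by_identity = second.parent_index()   # position of this exact object
by_equality = ptree.index(second)     # position of the first equal child

print(by_identity)  # 1
print(by_equality)  # 0
```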

left_sibling()[source]

The left sibling of this tree, or None if it has none.

right_sibling()[source]

The right sibling of this tree, or None if it has none.

root()[source]

The root of this tree. I.e., the unique ancestor of this tree whose parent is None. If ptree.parent() is None, then ptree is its own root.

treeposition()[source]

The tree position of this tree, relative to the root of the tree. I.e., ptree.root()[ptree.treeposition()] is ptree.
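The structural queries can be combined as in the following sketch, which reuses the example sentence found elsewhere on this page; the variable name cat is illustrative:

```python
from nltk.tree import ParentedTree

ptree = ParentedTree.fromstring(
    "(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))"
)
cat = ptree[1, 1, 1]  # the subtree (N cat)

print(cat.treeposition())     # (1, 1, 1)
print(cat.root() is ptree)    # True
print(cat.left_sibling())     # (D the)
print(cat.right_sibling())    # None

# The identity stated above: indexing the root by the position
# returns the very same object.
same = cat.root()[cat.treeposition()] is cat
```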

__new__(**kwargs)
append(child)

Append object to the end of the list.

chomsky_normal_form(factor='right', horzMarkov=None, vertMarkov=0, childChar='|', parentChar='^')

This method can modify a tree in three ways:

  1. Convert a tree into its Chomsky Normal Form (CNF) equivalent: every subtree has either two non-terminals or one terminal as its children. This process requires the creation of additional “artificial” non-terminal nodes.

  2. Markov (horizontal) smoothing of children in new artificial nodes

  3. Vertical (parent) annotation of nodes

Parameters
  • factor (str = [left|right]) – Right or left factoring method (default = “right”)

  • horzMarkov (int | None) – Markov order for sibling smoothing in artificial nodes (None (default) = include all siblings)

  • vertMarkov (int | None) – Markov order for parent smoothing (0 (default) = no vertical annotation)

  • childChar (str) – A string used in construction of the artificial nodes, separating the head of the original subtree from the child nodes that have yet to be expanded (default = “|”)

  • parentChar (str) – A string used to separate the node representation from its vertical annotation (default = “^”)
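A minimal sketch of right factoring follows. It uses a plain Tree with a made-up four-way branching node; whether the transform can be applied directly to a ParentedTree is not confirmed here, so that is deliberately avoided:

```python
from nltk.tree import Tree

# Illustrative tree whose root has four children.
t = Tree.fromstring("(S (A a) (B b) (C c) (D d))")

# The tree is modified in place; artificial nodes are inserted
# so that no node has more than two children.
t.chomsky_normal_form(factor="right")
print(t)

max_children = max(len(subtree) for subtree in t.subtrees())
artificial = [st.label() for st in t.subtrees() if "|" in st.label()]
```

The labels of the artificial nodes contain childChar, which is what allows un_chomsky_normal_form to recognize and remove them later.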

clear()

Remove all items from list.

collapse_unary(collapsePOS=False, collapseRoot=False, joinChar='+')

Collapse subtrees with a single child (i.e., unary productions) into a new non-terminal (Tree node) whose label joins the collapsed labels with joinChar. This is useful when working with algorithms that do not allow unary productions, where removing the unary productions outright would lose useful information. The tree is modified in place and no value is returned.

Parameters
  • collapsePOS (bool) – ‘False’ (default) will not collapse the parent of leaf nodes (i.e., Part-of-Speech tags) since they are always unary productions

  • collapseRoot (bool) – ‘False’ (default) will not modify the root production if it is unary. For the Penn WSJ treebank corpus, this corresponds to the TOP -> productions.

  • joinChar (str) – A string used to connect collapsed node values (default = “+”)
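The following sketch shows the default behaviour on a plain Tree; the input tree, with its redundant NP over NP, is invented for illustration:

```python
from nltk.tree import Tree

t = Tree.fromstring("(S (NP (NP (D the) (N dog))) (VP (V barks)))")

# NP -> NP is a unary production between non-terminals, so it is collapsed.
# VP -> V is left alone because V is a part-of-speech node and
# collapsePOS defaults to False.
t.collapse_unary()
print(t)  # (S (NP+NP (D the) (N dog)) (VP (V barks)))

labels = [subtree.label() for subtree in t.subtrees()]
```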

classmethod convert(tree)

Convert a tree between different subtypes of Tree. cls determines which class will be used to encode the new tree.

Parameters

tree (Tree) – The tree that should be converted.

Returns

The new Tree.
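For instance, a plain Tree can be turned into a ParentedTree as in this short sketch (the sentence is illustrative):

```python
from nltk.tree import Tree, ParentedTree

t = Tree.fromstring("(S (NP I) (VP (V saw) (NP it)))")

# cls is ParentedTree here, so every subtree of the result is a ParentedTree.
pt = ParentedTree.convert(t)

print(type(pt).__name__)       # ParentedTree
print(pt[1, 0].parent())       # (VP (V saw) (NP it))
print(type(t).__name__)        # Tree (the input is left as it was)
```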

count(value, /)

Return number of occurrences of value.

draw()

Open a new window containing a graphical diagram of this tree.

extend(children)

Extend list by appending elements from the iterable.

flatten()

Return a flat version of the tree, with all non-root non-terminals removed.

>>> t = Tree.fromstring("(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))")
>>> print(t.flatten())
(S the dog chased the cat)
Returns

a tree consisting of this tree’s root connected directly to its leaves, omitting all intervening non-terminal nodes.

Return type

Tree

freeze(leaf_freezer=None)
classmethod fromlist(l)
Parameters

l (list) – a tree represented as nested lists

Returns

A tree corresponding to the list representation l.

Return type

Tree

Convert nested lists to an NLTK Tree.

classmethod fromstring(s, brackets='()', read_node=None, read_leaf=None, node_pattern=None, leaf_pattern=None, remove_empty_top_bracketing=False)

Read a bracketed tree string and return the resulting tree. Trees are represented as nested bracketings, such as:

(S (NP (NNP John)) (VP (V runs)))
Parameters
  • s (str) – The string to read

  • brackets (str (length=2)) – The bracket characters used to mark the beginning and end of trees and subtrees.

  • read_node, read_leaf –

    If specified, these functions are applied to the substrings of s corresponding to nodes and leaves (respectively) to obtain the values for those nodes and leaves. They should have the following signature:

    read_node(str) -> value

    For example, these functions could be used to process nodes and leaves whose values should be some type other than string (such as FeatStruct). Note that by default, node strings and leaf strings are delimited by whitespace and brackets; to override this default, use the node_pattern and leaf_pattern arguments.

  • node_pattern, leaf_pattern – Regular expression patterns used to find node and leaf substrings in s. By default, both patterns are defined to match any sequence of non-whitespace non-bracket characters.

  • remove_empty_top_bracketing (bool) – If the resulting tree has an empty node label, and is length one, then return its single child instead. This is useful for treebank trees, which sometimes contain an extra level of bracketing.

Returns

A tree corresponding to the string representation s. If this class method is called using a subclass of Tree, then it will return a tree of that type.

Return type

Tree
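A small sketch of the brackets and read_leaf arguments, called on the ParentedTree subclass. The input string and the choice of str.upper as the leaf reader are only for illustration:

```python
from nltk.tree import ParentedTree

pt = ParentedTree.fromstring(
    "[S [NP John] [VP [V runs]]]",
    brackets="[]",        # use square brackets instead of parentheses
    read_leaf=str.upper,  # applied to each leaf substring
)

print(type(pt).__name__)  # ParentedTree
print(pt.leaves())        # ['JOHN', 'RUNS']
print(pt)                 # (S (NP JOHN) (VP (V RUNS)))
```

Because the class method was called on ParentedTree, the result and all of its subtrees carry parent pointers.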

height()

Return the height of the tree.

>>> t = Tree.fromstring("(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))")
>>> t.height()
5
>>> print(t[0,0])
(D the)
>>> t[0,0].height()
2
Returns

The height of this tree. The height of a tree containing no children is 1; the height of a tree containing only leaves is 2; and the height of any other tree is one plus the maximum of its children’s heights.

Return type

int

index(value, start=0, stop=9223372036854775807, /)

Return first index of value.

Raises ValueError if the value is not present.

insert(index, child)

Insert object before index.

label()

Return the node label of the tree.

>>> t = Tree.fromstring('(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))')
>>> t.label()
'S'
Returns

the node label (typically a string)

Return type

any

leaf_treeposition(index)
Returns

The tree position of the index-th leaf in this tree. I.e., if tp=self.leaf_treeposition(i), then self[tp]==self.leaves()[i].

Raises

IndexError – If this tree contains fewer than index+1 leaves, or if index<0.
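A brief sketch using the example sentence from this page:

```python
from nltk.tree import Tree

t = Tree.fromstring(
    "(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))"
)

tp = t.leaf_treeposition(2)
print(tp)     # (1, 0, 0)
print(t[tp])  # chased

# The tree has five leaves, so index 5 is out of range.
try:
    t.leaf_treeposition(5)
    out_of_range = False
except IndexError:
    out_of_range = True
```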

leaves()

Return the leaves of the tree.

>>> t = Tree.fromstring("(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))")
>>> t.leaves()
['the', 'dog', 'chased', 'the', 'cat']
Returns

a list containing this tree’s leaves. The order reflects the order of the leaves in the tree’s hierarchical structure.

Return type

list

property node

Deprecated accessor for the node value; use the label() method instead.

pformat(margin=70, indent=0, nodesep='', parens='()', quotes=False)
Returns

A pretty-printed string representation of this tree.

Return type

str

Parameters
  • margin (int) – The right margin at which to do line-wrapping.

  • indent (int) – The indentation level at which printing begins. This number is used to decide how far to indent subsequent lines.

  • nodesep (str) – A string that is used to separate the node from the children. The default is the empty string; e.g., the value ':' gives trees like (S: (NP: I) (VP: (V: saw) (NP: it))).
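The effect of the formatting arguments can be sketched as follows; the variable names are illustrative:

```python
from nltk.tree import Tree

t = Tree.fromstring("(S (NP I) (VP (V saw) (NP it)))")

plain = t.pformat()
with_sep = t.pformat(nodesep=":")
square = t.pformat(parens="[]")
narrow = t.pformat(margin=20)  # too narrow for one line, so it wraps

print(plain)     # (S (NP I) (VP (V saw) (NP it)))
print(with_sep)  # (S: (NP: I) (VP: (V: saw) (NP: it)))
print(square)    # [S [NP I] [VP [V saw] [NP it]]]
print(narrow)
```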

pformat_latex_qtree()

Returns a representation of the tree compatible with the LaTeX qtree package. This consists of the string \Tree followed by the tree represented in bracketed notation.

For example, the following result was generated from a parse tree of the sentence The announcement astounded us:

\Tree [.I'' [.N'' [.D The ] [.N' [.N announcement ] ] ]
    [.I' [.V'' [.V' [.V astounded ] [.N'' [.N' [.N us ] ] ] ] ] ] ]

See https://www.ling.upenn.edu/advice/latex.html for the LaTeX style file for the qtree package.

Returns

A latex qtree representation of this tree.

Return type

str

pop(index=-1)

Remove and return item at index (default last).

Raises IndexError if list is empty or index is out of range.

pos()

Return a sequence of pos-tagged words extracted from the tree.

>>> t = Tree.fromstring("(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))")
>>> t.pos()
[('the', 'D'), ('dog', 'N'), ('chased', 'V'), ('the', 'D'), ('cat', 'N')]
Returns

a list of tuples containing leaves and pre-terminals (part-of-speech tags). The order reflects the order of the leaves in the tree’s hierarchical structure.

Return type

list(tuple)

pprint(**kwargs)

Print a string representation of this Tree to ‘stream’

pretty_print(sentence=None, highlight=(), stream=None, **kwargs)

Pretty-print this tree as ASCII or Unicode art. For explanation of the arguments, see the documentation for nltk.tree.prettyprinter.TreePrettyPrinter.

productions()

Generate the productions that correspond to the non-terminal nodes of the tree. For each subtree of the form (P: C1 C2 … Cn) this produces a production of the form P -> C1 C2 … Cn.

>>> t = Tree.fromstring("(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))")
>>> t.productions() 
[S -> NP VP, NP -> D N, D -> 'the', N -> 'dog', VP -> V NP, V -> 'chased',
NP -> D N, D -> 'the', N -> 'cat']
Return type

list(Production)

remove(child)

Remove first occurrence of value.

Raises ValueError if the value is not present.

reverse()

Reverse IN PLACE.

set_label(label)

Set the node label of the tree.

>>> t = Tree.fromstring("(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))")
>>> t.set_label("T")
>>> print(t)
(T (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))
Parameters

label (any) – the node label (typically a string)

sort(*, key=None, reverse=False)

Sort the list in ascending order and return None.

The sort is in-place (i.e. the list itself is modified) and stable (i.e. the order of two equal elements is maintained).

If a key function is given, apply it once to each list item and sort them, ascending or descending, according to their function values.

The reverse flag can be set to sort in descending order.

subtrees(filter=None)

Generate all the subtrees of this tree, optionally restricted to trees matching the filter function.

>>> t = Tree.fromstring("(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))")
>>> for s in t.subtrees(lambda t: t.height() == 2):
...     print(s)
(D the)
(N dog)
(V chased)
(D the)
(N cat)
Parameters

filter (function) – the function to filter all local trees

treeposition_spanning_leaves(start, end)
Returns

The tree position of the lowest descendant of this tree that dominates self.leaves()[start:end].

Raises

ValueError – if end <= start
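As a sketch, using the example sentence from this page (the leaf ranges are chosen for illustration):

```python
from nltk.tree import Tree

t = Tree.fromstring(
    "(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))"
)

# Leaves 0..1 ("the dog") are dominated by the subject NP.
subject = t.treeposition_spanning_leaves(0, 2)
print(subject, t[subject])  # (0,) (NP (D the) (N dog))

# Leaves 3..4 ("the cat") are dominated by the object NP.
obj = t.treeposition_spanning_leaves(3, 5)
print(obj, t[obj])          # (1, 1) (NP (D the) (N cat))

# An empty or reversed range is rejected.
try:
    t.treeposition_spanning_leaves(2, 2)
    rejected = False
except ValueError:
    rejected = True
```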

treepositions(order='preorder')
>>> t = Tree.fromstring("(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))")
>>> t.treepositions() 
[(), (0,), (0, 0), (0, 0, 0), (0, 1), (0, 1, 0), (1,), (1, 0), (1, 0, 0), ...]
>>> for pos in t.treepositions('leaves'):
...     t[pos] = t[pos][::-1].upper()
>>> print(t)
(S (NP (D EHT) (N GOD)) (VP (V DESAHC) (NP (D EHT) (N TAC))))
Parameters

order – One of: preorder, postorder, bothorder, leaves.

un_chomsky_normal_form(expandUnary=True, childChar='|', parentChar='^', unaryChar='+')

This method modifies the tree in three ways:

  1. Transforms a tree in Chomsky Normal Form back to its original structure (branching greater than two)

  2. Removes any parent annotation (if it exists)

  3. (optional) expands unary subtrees (if previously collapsed with collapse_unary(…) )

Parameters
  • expandUnary (bool) – Flag to expand unary or not (default = True)

  • childChar (str) – A string separating the head node from its children in an artificial node (default = “|”)

  • parentChar (str) – A string separating the node label from its parent annotation (default = “^”)

  • unaryChar (str) – A string joining two non-terminals in a unary production (default = “+”)
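A round trip through chomsky_normal_form and back can be sketched as below. The example uses a plain Tree with an invented four-way branching node; applying these transforms directly to a ParentedTree is not confirmed here and is avoided:

```python
from nltk.tree import Tree

original = Tree.fromstring("(S (A a) (B b) (C c) (D d))")

# Work on a deep copy because both transforms modify the tree in place.
t = original.copy(deep=True)
t.chomsky_normal_form()
binarized = t.copy(deep=True)

t.un_chomsky_normal_form()

print(binarized)
print(t)  # (S (A a) (B b) (C c) (D d))
```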