nltk.parse package

Submodules

Module contents

NLTK Parsers

Classes and interfaces for producing tree structures that represent the internal organization of a text. This task is known as “parsing” the text, and the resulting tree structures are called the text’s “parses”. Typically, the text is a single sentence, and the tree structure represents the syntactic structure of the sentence. However, parsers can also be used in other domains. For example, parsers can be used to derive the morphological structure of the morphemes that make up a word, or to derive the discourse structure for a set of utterances.

Sometimes, a single piece of text can be represented by more than one tree structure. Texts represented by more than one tree structure are called “ambiguous” texts. Note that there are actually two ways in which a text can be ambiguous:

  • The text has multiple correct parses.

  • There is not enough information to decide which of several candidate parses is correct.

However, the parser module does not distinguish these two types of ambiguity.

The parser module defines ParserI, a standard interface for parsing texts; and two simple implementations of that interface, ShiftReduceParser and RecursiveDescentParser. It also contains three sub-modules for specialized kinds of parsing:

  • nltk.parser.chart defines chart parsing, which uses dynamic programming to efficiently parse texts.

  • nltk.parser.probabilistic defines probabilistic parsing, which associates a probability with each parse.