nltk.grammar.PCFG

class nltk.grammar.PCFG

Bases: CFG

A probabilistic context-free grammar. A PCFG consists of a start state and a set of productions with probabilities. The set of terminals and nonterminals is implicitly specified by the productions.

PCFG productions use the ProbabilisticProduction class. PCFGs impose the constraint that the set of productions with any given left-hand-side must have probabilities that sum to 1 (allowing for a small margin of error).
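The sum-to-1 constraint can be sketched in plain Python. This is only an illustration and not NLTK's implementation: the `check_probabilities` helper and the `(lhs, rhs, prob)` tuple representation are hypothetical, and treating the boundary value as a failure is an assumption.

```python
from collections import defaultdict

EPSILON = 0.01  # same tolerance value as the documented PCFG.EPSILON


def check_probabilities(rules, epsilon=EPSILON):
    """Raise ValueError unless each left-hand side's probabilities sum to ~1.

    `rules` is a hypothetical list of (lhs, rhs, prob) tuples, not NLTK objects.
    """
    totals = defaultdict(float)
    for lhs, _rhs, prob in rules:
        totals[lhs] += prob
    for lhs, total in totals.items():
        # Assumption: a sum exactly EPSILON away from 1 is rejected.
        if abs(total - 1.0) >= epsilon:
            raise ValueError(f"Productions for {lhs} sum to {total}, not 1")
    return dict(totals)


rules = [
    ("S", ("NP", "VP"), 1.0),
    ("NP", ("I",), 0.4),
    ("NP", ("you",), 0.6),
]
totals = check_probabilities(rules)
```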

If you need efficient key-based access to productions, you can use a subclass to implement it.

Variables

EPSILON – The acceptable margin of error for checking that productions with a given left-hand side have probabilities that sum to 1.

EPSILON = 0.01

__init__(start, productions, calculate_leftcorners=True)

Create a new probabilistic context-free grammar from the given start state and set of ProbabilisticProductions.

Parameters
  • start (Nonterminal) – The start symbol

  • productions (list(Production)) – The list of productions that defines the grammar

  • calculate_leftcorners (bool) – Set to False to skip calculating the leftcorner relation. In that case, some optimized chart parsers won’t work.

Raises

ValueError – if the productions with any given left-hand side have probabilities that do not sum to a value within EPSILON of 1.
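A minimal sketch of building a grammar directly from ProbabilisticProduction objects, and of the ValueError raised when a left-hand side is not normalised. The toy symbols and words are made up for illustration.

```python
from nltk.grammar import PCFG, Nonterminal, ProbabilisticProduction

S, NP, VP = Nonterminal("S"), Nonterminal("NP"), Nonterminal("VP")

productions = [
    ProbabilisticProduction(S, [NP, VP], prob=1.0),
    ProbabilisticProduction(NP, ["I"], prob=0.4),
    ProbabilisticProduction(NP, ["you"], prob=0.6),
    ProbabilisticProduction(VP, ["sleep"], prob=1.0),
]
grammar = PCFG(S, productions)

# Dropping one NP rule leaves the NP probabilities summing to 0.4,
# which is outside the EPSILON margin.
try:
    PCFG(S, productions[:2] + productions[3:])
    error = None
except ValueError as exc:
    error = exc
```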

classmethod fromstring(input, encoding=None)

Return a probabilistic context-free grammar corresponding to the input string(s).

Parameters

input – a grammar, either in the form of a string or else as a list of strings.
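As an illustration, a small toy grammar can be read from a string in which each production carries its probability in square brackets. The grammar itself is invented for this example.

```python
from nltk.grammar import PCFG

grammar = PCFG.fromstring("""
    S -> NP VP [1.0]
    NP -> 'I' [0.4] | 'you' [0.6]
    VP -> V NP [1.0]
    V -> 'saw' [1.0]
""")

# Each production is a ProbabilisticProduction with a prob() accessor.
for production in grammar.productions():
    print(production, production.prob())
```

Alternatives separated by `|` share a left-hand side, so their bracketed probabilities must together sum to 1.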

classmethod binarize(grammar, padding='@$@')

Convert all non-binary rules into binary by introducing new tokens. Example:

Original:
    A => B C D
After Conversion:
    A => B A@$@B
    A@$@B => C D
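
The conversion shown above can be sketched in plain Python. This is not the library's code: rules are represented as hypothetical `(lhs, rhs)` tuples of strings, and the naming of new symbols for rules longer than three symbols is an assumption extrapolated from the example.

```python
def binarize(rules, padding="@$@"):
    """Split every rule with more than two right-hand-side symbols."""
    result = []
    for lhs, rhs in rules:
        left = lhs
        # Peel off one symbol at a time until only two remain.
        while len(rhs) > 2:
            first, rest = rhs[0], rhs[1:]
            new_symbol = f"{left}{padding}{first}"
            result.append((left, (first, new_symbol)))
            left, rhs = new_symbol, rest
        result.append((left, tuple(rhs)))
    return result


binary_rules = binarize([("A", ("B", "C", "D"))])
# binary_rules == [("A", ("B", "A@$@B")), ("A@$@B", ("C", "D"))]
```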
check_coverage(tokens)

Check whether the grammar rules cover the given list of tokens. If not, then raise an exception.
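A short sketch of a coverage check before parsing, using an invented toy grammar. The description above only says "an exception"; catching ValueError here reflects the NLTK versions this sketch was written against and should be treated as an assumption.

```python
from nltk.grammar import PCFG

grammar = PCFG.fromstring("""
    S -> NP VP [1.0]
    NP -> 'I' [0.4] | 'you' [0.6]
    VP -> V NP [1.0]
    V -> 'saw' [1.0]
""")

# Every token appears in some lexical rule, so this returns silently.
grammar.check_coverage("I saw you".split())

# 'them' is not in the grammar, so the check fails.
try:
    grammar.check_coverage("I saw them".split())
    covered = True
except ValueError as exc:  # assumed exception type, see note above
    covered = False
    print(exc)
```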

chomsky_normal_form(new_token_padding='@$@', flexible=False)

Return a new grammar that is in Chomsky normal form.

Parameters

new_token_padding – Customise new rule formation during binarisation.

classmethod eliminate_start(grammar)

Eliminate the start rule in case it appears on the right-hand side. Example: given S -> S0 S1 and S0 -> S1 S, another rule S0_Sigma -> S is added.

is_binarised()

Return True if all productions are at most binary. Note that there can still be empty and unary productions.

is_chomsky_normal_form()

Return True if the grammar is of Chomsky Normal Form, i.e. all productions are of the form A -> B C, or A -> “s”.

is_flexible_chomsky_normal_form()

Return True if all productions are of the forms A -> B C, A -> B, or A -> “s”.
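The difference between the strict and flexible variants can be seen on two invented toy grammars: the second one contains the unary nonlexical rule NP -> N, which only the flexible form allows.

```python
from nltk.grammar import PCFG

strict = PCFG.fromstring("""
    S -> NP VP [1.0]
    NP -> 'I' [1.0]
    VP -> 'sleep' [1.0]
""")

flexible = PCFG.fromstring("""
    S -> NP VP [1.0]
    NP -> N [1.0]
    N -> 'I' [1.0]
    VP -> 'sleep' [1.0]
""")

strict_is_cnf = strict.is_chomsky_normal_form()
flexible_is_cnf = flexible.is_chomsky_normal_form()
flexible_is_flexible_cnf = flexible.is_flexible_chomsky_normal_form()
both_binarised = strict.is_binarised() and flexible.is_binarised()
```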

is_leftcorner(cat, left)

True if left is a leftcorner of cat, where left can be a terminal or a nonterminal.

Parameters
  • cat (Nonterminal) – the parent of the leftcorner

  • left (Terminal or Nonterminal) – the suggested leftcorner

Return type

bool

is_lexical()

Return True if all productions are lexicalised.

is_nonempty()

Return True if there are no empty productions.

is_nonlexical()

Return True if all lexical rules are “preterminals”, that is, unary rules which can be separated in a preprocessing step.

This means that all productions are of the forms A -> B1 … Bn (n>=0), or A -> “s”.

Note: is_lexical() and is_nonlexical() are not opposites. There are grammars which are neither, and grammars which are both.

leftcorner_parents(cat)

Return the set of all nonterminals for which the given category is a left corner. This is the inverse of the leftcorner relation.

Parameters

cat (Nonterminal) – the suggested leftcorner

Returns

the set of all parents to the leftcorner

Return type

set(Nonterminal)

leftcorners(cat)

Return the set of all nonterminals that the given nonterminal can start with, including itself.

This is the reflexive, transitive closure of the immediate leftcorner relation: (A > B) iff (A -> B beta)

Parameters

cat (Nonterminal) – the parent of the leftcorners

Returns

the set of all leftcorners

Return type

set(Nonterminal)
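
The reflexive, transitive closure described above can be sketched in plain Python. This is an illustration under simplifying assumptions, not the library's code: rules are hypothetical `(lhs, rhs)` tuples of strings containing nonterminals only, and the helper names are invented.

```python
def leftcorners(rules, cat):
    """Reflexive, transitive closure of the immediate leftcorner relation."""
    immediate = {}
    for lhs, rhs in rules:
        if rhs:  # an empty right-hand side has no left corner
            immediate.setdefault(lhs, set()).add(rhs[0])
    seen = {cat}  # reflexive: a category is its own left corner
    agenda = [cat]
    while agenda:
        parent = agenda.pop()
        for child in immediate.get(parent, ()):
            if child not in seen:
                seen.add(child)
                agenda.append(child)
    return seen


def leftcorner_parents(rules, cat):
    """Inverse relation: every category that has `cat` as a left corner."""
    symbols = {lhs for lhs, _ in rules} | {s for _, rhs in rules for s in rhs}
    return {sym for sym in symbols if cat in leftcorners(rules, sym)}


rules = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("NP", ("NP", "PP")),
    ("VP", ("V", "NP")),
]
s_corners = leftcorners(rules, "S")
det_parents = leftcorner_parents(rules, "Det")
```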

max_len()

Return the right-hand side length of the longest grammar production.

min_len()

Return the right-hand side length of the shortest grammar production.

productions(lhs=None, rhs=None, empty=False)

Return the grammar productions, filtered by the left-hand side or the first item in the right-hand side.

Parameters
  • lhs – Only return productions with the given left-hand side.

  • rhs – Only return productions with the given first item in the right-hand side.

  • empty – Only return productions with an empty right-hand side.

Returns

A list of productions matching the given constraints.

Return type

list(Production)
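
A brief sketch of the filters on an invented toy grammar. Note that `rhs` matches only the first item of the right-hand side, which may be a Nonterminal or a terminal string.

```python
from nltk.grammar import PCFG, Nonterminal

grammar = PCFG.fromstring("""
    S -> NP VP [1.0]
    NP -> 'I' [0.4] | 'you' [0.6]
    VP -> V NP [1.0]
    V -> 'saw' [1.0]
""")

# All productions that expand NP.
np_rules = grammar.productions(lhs=Nonterminal("NP"))

# Productions whose right-hand side starts with the terminal 'saw'.
saw_rules = grammar.productions(rhs="saw")

# Productions whose right-hand side starts with the nonterminal V.
v_first_rules = grammar.productions(rhs=Nonterminal("V"))
```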

classmethod remove_unitary_rules(grammar)

Remove nonlexical unitary rules and convert them to lexical rules.

start()

Return the start symbol of the grammar.

Return type

Nonterminal