nltk.grammar.CFG

class nltk.grammar.CFG

Bases: object

A context-free grammar. A grammar consists of a start symbol and a set of productions. The sets of terminals and nonterminals are implicitly specified by the productions.

If you need efficient key-based access to productions, you can use a subclass to implement it.

__init__(start, productions, calculate_leftcorners=True)

Create a new context-free grammar from the given start symbol and list of Production instances.

Parameters
  • start (Nonterminal) – The start symbol

  • productions (list(Production)) – The list of productions that defines the grammar

  • calculate_leftcorners (bool) – If False, the leftcorner relation is not calculated, and some optimized chart parsers will not work.
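
A minimal sketch of calling the constructor directly, assuming NLTK is installed. Nonterminals are built with nltk.grammar.Nonterminal, terminals are plain strings, and each rule is an nltk.grammar.Production; the toy grammar itself is only illustrative.

```python
from nltk.grammar import CFG, Nonterminal, Production

# Nonterminals are explicit objects; terminals are plain strings.
S, NP, VP = Nonterminal("S"), Nonterminal("NP"), Nonterminal("VP")

productions = [
    Production(S, [NP, VP]),   # S -> NP VP
    Production(NP, ["John"]),  # NP -> 'John'
    Production(VP, ["runs"]),  # VP -> 'runs'
]

# The start symbol is passed explicitly; it is not inferred from the rules.
grammar = CFG(S, productions)
print(grammar.start())             # S
print(len(grammar.productions()))  # 3

# Skip the leftcorner computation, e.g. for a grammar that will not be
# used with a leftcorner-based chart parser.
lean = CFG(S, productions, calculate_leftcorners=False)
```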

classmethod fromstring(input, encoding=None)

Return the grammar instance corresponding to the input string(s).

Parameters

input – a grammar, either in the form of a string or as a list of strings.
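
A short sketch of both input forms, assuming NLTK is installed; the grammar is a made-up example. In the string form each line holds one rule, `|` separates alternatives, and terminals are quoted.

```python
from nltk.grammar import CFG, Nonterminal

grammar = CFG.fromstring("""
S -> NP VP
NP -> Det N | 'John'
VP -> V NP | 'runs'
Det -> 'the'
N -> 'dog'
V -> 'saw'
""")

# Without a %start directive, the start symbol is the left-hand side
# of the first production.
print(grammar.start())             # S
print(len(grammar.productions()))  # 8: each '|' alternative is its own production

# The same format, passed as a list of lines instead of a single string.
small = CFG.fromstring(["S -> NP VP", "NP -> 'John'", "VP -> 'runs'"])
print(len(small.productions()))    # 3
```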

start()

Return the start symbol of the grammar.

Return type

Nonterminal

productions(lhs=None, rhs=None, empty=False)

Return the grammar productions, filtered by the left-hand side or the first item in the right-hand side.

Parameters
  • lhs – Only return productions with the given left-hand side.

  • rhs – Only return productions with the given first item in the right-hand side.

  • empty – Only return productions with an empty right-hand side.

Returns

A list of productions matching the given constraints.

Return type

list(Production)
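
A sketch of the lhs and rhs filters on a small illustrative grammar, assuming NLTK is installed. Note that rhs matches only the first symbol of the right-hand side, and that passing both filters returns their intersection.

```python
from nltk.grammar import CFG, Nonterminal

grammar = CFG.fromstring("""
S -> NP VP
NP -> Det N | 'John'
VP -> V NP | 'runs'
Det -> 'the'
N -> 'dog'
V -> 'saw'
""")
NP, VP, V = Nonterminal("NP"), Nonterminal("VP"), Nonterminal("V")

by_lhs = grammar.productions(lhs=NP)        # every rule that rewrites NP
by_first_rhs = grammar.productions(rhs=NP)  # rules whose RHS *starts* with NP
both = grammar.productions(lhs=VP, rhs=V)   # VP rules whose RHS starts with V

print([str(p) for p in by_lhs])        # ['NP -> Det N', "NP -> 'John'"]
print([str(p) for p in by_first_rhs])  # ['S -> NP VP']; in VP -> V NP, NP is not first
print([str(p) for p in both])          # ['VP -> V NP']
```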

leftcorners(cat)

Return the set of all nonterminals that the given nonterminal can start with, including itself.

This is the reflexive, transitive closure of the immediate leftcorner relation, where A > B holds iff the grammar contains a production A -> B beta for some (possibly empty) sequence of symbols beta.

Parameters

cat (Nonterminal) – the parent of the leftcorners

Returns

the set of all leftcorners

Return type

set(Nonterminal)

is_leftcorner(cat, left)

Return True if left is a leftcorner of cat, where left can be a terminal or a nonterminal.

Parameters
  • cat (Nonterminal) – the parent of the leftcorner

  • left (Terminal or Nonterminal) – the suggested leftcorner

Return type

bool

leftcorner_parents(cat)

Return the set of all nonterminals for which the given category is a left corner. This is the inverse of the leftcorner relation.

Parameters

cat (Nonterminal) – the suggested leftcorner

Returns

the set of all parents to the leftcorner

Return type

set(Nonterminal)
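
The three leftcorner methods above can be compared on one small grammar. This is a sketch assuming NLTK is installed and the grammar was built with the default calculate_leftcorners=True; the grammar is illustrative. Here S starts with NP, and NP starts with Det, so Det (and the word 'the') are leftcorners of S.

```python
from nltk.grammar import CFG, Nonterminal

grammar = CFG.fromstring("""
S -> NP VP
NP -> Det N | 'John'
VP -> V NP
Det -> 'the'
N -> 'dog'
V -> 'saw'
""")
S, NP, VP, Det = map(Nonterminal, ["S", "NP", "VP", "Det"])

# Reflexive, transitive closure: S > NP > Det.
print(sorted(str(c) for c in grammar.leftcorners(S)))  # ['Det', 'NP', 'S']

print(grammar.is_leftcorner(S, Det))    # True, via S -> NP VP and NP -> Det N
print(grammar.is_leftcorner(S, "the"))  # True: terminals are accepted too
print(grammar.is_leftcorner(S, VP))     # False: VP never begins an S

# The inverse relation: every category that can start with Det.
print(sorted(str(c) for c in grammar.leftcorner_parents(Det)))  # ['Det', 'NP', 'S']
```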

check_coverage(tokens)

Check whether the grammar rules cover the given list of tokens, i.e. whether every token occurs as a terminal in some production. If not, raise a ValueError that names the uncovered tokens.
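
A sketch of turning the exception into a boolean, assuming NLTK is installed. The helper is_covered is hypothetical and not part of NLTK; it only wraps check_coverage and catches the ValueError.

```python
from nltk.grammar import CFG

grammar = CFG.fromstring("""
S -> NP VP
NP -> 'John' | 'Mary'
VP -> 'runs' | 'sleeps'
""")

def is_covered(grammar, tokens):
    """Hypothetical helper (not an NLTK API): True iff check_coverage passes."""
    try:
        grammar.check_coverage(tokens)
    except ValueError:
        return False
    return True

print(is_covered(grammar, ["Mary", "sleeps"]))  # True
print(is_covered(grammar, ["Mary", "walks"]))   # False: 'walks' is not a terminal
print(is_covered(grammar, ["sleeps", "John"]))  # True, although it is not a sentence
```

Coverage is only a vocabulary check: the last call passes even though the grammar cannot parse that token sequence.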

is_lexical()

Return True if all productions are lexicalised, i.e. every right-hand side contains at least one terminal.

is_nonlexical()

Return True if all lexical rules are “preterminals”, that is, unary rules which can be separated in a preprocessing step.

This means that all productions are of the forms A -> B1 … Bn (n>=0), or A -> “s”.

Note: is_lexical() and is_nonlexical() are not opposites. There are grammars which are neither, and grammars which are both.
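
The note above can be checked with four tiny grammars, one for each combination of the two predicates. This is a sketch assuming NLTK is installed; the grammars are illustrative only.

```python
from nltk.grammar import CFG

# Only unary lexical rules: both predicates hold.
both = CFG.fromstring("S -> 'a' | 'b'")
# Every rule has a terminal, but S -> 'a' 'b' is a multi-word lexical rule.
lexical_only = CFG.fromstring("S -> 'a' 'b'")
# Pure nonterminal rules plus unary "preterminal" rules.
nonlexical_only = CFG.fromstring("""
S -> NP VP
NP -> 'John'
VP -> 'runs'
""")
# S -> NP VP has no terminal, and VP -> 'runs' 'fast' is multi-word lexical.
neither = CFG.fromstring("""
S -> NP VP
NP -> 'John'
VP -> 'runs' 'fast'
""")

for name, g in [("both", both), ("lexical_only", lexical_only),
                ("nonlexical_only", nonlexical_only), ("neither", neither)]:
    print(name, g.is_lexical(), g.is_nonlexical())
```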

min_len()

Return the right-hand side length of the shortest grammar production.

max_len()

Return the right-hand side length of the longest grammar production.

is_nonempty()

Return True if there are no empty productions.

is_binarised()

Return True if all productions are at most binary. Note that there can still be empty and unary productions.

is_flexible_chomsky_normal_form()

Return True if all productions are of the forms A -> B C, A -> B, or A -> “s”.

is_chomsky_normal_form()

Return True if the grammar is of Chomsky Normal Form, i.e. all productions are of the form A -> B C, or A -> “s”.
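
A sketch comparing the length and normal-form predicates on three illustrative grammars, assuming NLTK is installed. The only difference between the first two is the unary nonterminal rule NP -> N, which flexible Chomsky Normal Form allows and strict Chomsky Normal Form does not.

```python
from nltk.grammar import CFG

cnf = CFG.fromstring("""
S -> NP VP
VP -> V NP
NP -> 'John'
V -> 'saw'
""")
# NP -> N is a unary rule between two nonterminals.
flexible = CFG.fromstring("""
S -> NP VP
NP -> N
N -> 'John'
VP -> 'runs'
""")
# S -> NP V NP has three right-hand-side symbols.
ternary = CFG.fromstring("""
S -> NP V NP
NP -> 'John'
V -> 'saw'
""")

for name, g in [("cnf", cnf), ("flexible", flexible), ("ternary", ternary)]:
    print(name, g.min_len(), g.max_len(), g.is_nonempty(), g.is_binarised(),
          g.is_flexible_chomsky_normal_form(), g.is_chomsky_normal_form())
```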

chomsky_normal_form(new_token_padding='@$@', flexible=False)

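Return a new grammar that is in Chomsky Normal Form. If the grammar is already in Chomsky Normal Form, it is returned unchanged. A ValueError is raised if the grammar contains empty productions, or lexical rules with more than one right-hand-side symbol.

Parameters
  • new_token_padding (str) – The string used to build the names of the new nonterminals introduced during binarisation.

  • flexible (bool) – If True, stop after binarisation and return a grammar in flexible Chomsky Normal Form, which may still contain unary rules of the form A -> B.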
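
A sketch of the conversion on an illustrative grammar with one ternary rule (NP -> Det Adj N) and one unit rule (VP -> IV), assuming NLTK is installed. The strict conversion removes both problems; flexible=True only binarises, so the unit rule is kept.

```python
from nltk.grammar import CFG

grammar = CFG.fromstring("""
S -> NP VP
NP -> Det Adj N | 'John'
VP -> V NP | IV
Det -> 'the'
Adj -> 'big'
N -> 'dog'
V -> 'saw'
IV -> 'runs'
""")
print(grammar.is_chomsky_normal_form())  # False

cnf = grammar.chomsky_normal_form()
print(cnf.is_chomsky_normal_form())      # True

# Binarisation only: the unit rule VP -> IV survives.
flex = grammar.chomsky_normal_form(flexible=True)
print(flex.is_flexible_chomsky_normal_form(), flex.is_chomsky_normal_form())  # True False
```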

classmethod remove_unitary_rules(grammar)

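Remove nonlexical unitary rules of the form A -> B. Each one is replaced by A -> beta for every production B -> beta, and this is repeated until every remaining unary rule is lexical.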
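
A sketch on a grammar with a single unit rule, NP -> N, assuming NLTK is installed; the grammar is illustrative. The unit rule is replaced by NP -> 'John', copied from the right-hand side of N -> 'John'.

```python
from nltk.grammar import CFG

grammar = CFG.fromstring("""
S -> NP VP
NP -> N
N -> 'John'
VP -> 'runs'
""")

reduced = CFG.remove_unitary_rules(grammar)
for prod in reduced.productions():
    print(prod)
# S -> NP VP
# N -> 'John'
# VP -> 'runs'
# NP -> 'John'
```

N -> 'John' is kept even though no rule reachable from S uses N any more; the method does not prune unreachable rules.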

classmethod binarize(grammar, padding='@$@')

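Convert every rule with more than two right-hand-side symbols into a chain of binary rules by introducing new nonterminals, whose names are built with the padding string. Example: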

Original:
    A -> B C D
After Conversion:
    A -> B A@$@B
    A@$@B -> C D
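
The example above can be reproduced with a sketch like the following, assuming NLTK is installed; the lexical rules for B, C and D are added only so the grammar is complete. The second call shows that padding affects nothing but the names of the new nonterminals.

```python
from nltk.grammar import CFG

grammar = CFG.fromstring("""
A -> B C D
B -> 'b'
C -> 'c'
D -> 'd'
""")

binary = CFG.binarize(grammar)  # default padding '@$@'
for prod in binary.productions():
    print(prod)
# A -> B A@$@B
# A@$@B -> C D
# B -> 'b'
# C -> 'c'
# D -> 'd'

underscored = CFG.binarize(grammar, padding="_")
print([str(p) for p in underscored.productions()[:2]])  # ['A -> B A_B', 'A_B -> C D']
```
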

classmethod eliminate_start(grammar)

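Eliminate the start symbol from right-hand sides. If the start symbol appears on the right-hand side of any rule, a new start symbol is introduced together with a unary rule that rewrites it as the old start symbol. Example: given S -> S0 S1 and S0 -> S1 S, the rule S0_Sigma -> S is added and S0_Sigma becomes the start symbol.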
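
A sketch of the example above, assuming NLTK is installed. The code deliberately does not hard-code the name of the new start symbol; it only checks that the start symbol changed and that the new symbol has exactly one rule, rewriting to the old start symbol.

```python
from nltk.grammar import CFG

grammar = CFG.fromstring("""
S -> S0 S1
S0 -> S1 S | 'a'
S1 -> 'b'
""")

fixed = CFG.eliminate_start(grammar)
print(grammar.start(), "->", fixed.start())  # old start vs. the new one
for prod in fixed.productions(lhs=fixed.start()):
    print(prod)                              # the single rule <new start> -> S

# If the start symbol never appears on a right-hand side, nothing changes.
plain = CFG.fromstring("S -> 'a'")
print(CFG.eliminate_start(plain).start())    # S
```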