nltk.grammar.CFG¶

class nltk.grammar.CFG[source]¶

Bases: object

A context-free grammar. A grammar consists of a start state and a set of productions. The set of terminals and nonterminals is implicitly specified by the productions.

If you need efficient key-based access to productions, you can use a subclass to implement it.

__init__(start, productions, calculate_leftcorners=True)[source]¶

Create a new context-free grammar, from the given start state and set of Production instances.

Parameters

start (Nonterminal) – The start symbol
productions (list(Production)) – The list of productions that defines the grammar
calculate_leftcorners (bool) – False if we don’t want to calculate the leftcorner relation. In that case, some optimized chart parsers won’t work.
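
As a minimal sketch of the constructor, the following builds a small grammar by hand from Nonterminal and Production objects. The symbol names and the toy rules are illustrative assumptions, not part of the API.

```python
from nltk.grammar import CFG, Nonterminal, Production

# Hypothetical toy symbols, chosen only for this example.
S, NP, VP = Nonterminal("S"), Nonterminal("NP"), Nonterminal("VP")

# Right-hand sides mix Nonterminal objects and plain strings (terminals).
productions = [
    Production(S, [NP, VP]),
    Production(NP, ["she"]),
    Production(VP, ["sleeps"]),
]

grammar = CFG(S, productions)
print(grammar.start())
print(grammar.productions())
```

Passing calculate_leftcorners=False would skip the leftcorner tables, at the cost described in the parameter list above.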

classmethod fromstring(input, encoding=None)[source]¶

Return the grammar instance corresponding to the input string(s).

Parameters: input – a grammar, either in the form of a string or as a list of strings.
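
A short sketch of fromstring, assuming the usual NLTK grammar notation: one rule per line, alternatives separated by |, and terminals in quotes. The toy grammar is invented for illustration.

```python
from nltk.grammar import CFG, Nonterminal

# Input given as a single multi-line string.
grammar = CFG.fromstring("""
    S -> NP VP
    NP -> 'she' | 'he'
    VP -> 'sleeps'
""")

# The same grammar given as a list of strings, one rule per item.
same_grammar = CFG.fromstring([
    "S -> NP VP",
    "NP -> 'she' | 'he'",
    "VP -> 'sleeps'",
])

# The left-hand side of the first rule becomes the start symbol.
print(grammar.start())
# The alternation on the NP line expands into two separate productions.
print(len(grammar.productions()))
```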

start()[source]¶

Return the start symbol of the grammar

Return type: Nonterminal

productions(lhs=None, rhs=None, empty=False)[source]¶

Return the grammar productions, filtered by the left-hand side or the first item in the right-hand side.

Parameters

lhs – Only return productions with the given left-hand side.
rhs – Only return productions with the given first item in the right-hand side.
empty – Only return productions with an empty right-hand side.

Returns

A list of productions matching the given constraints.

Return type

list(Production)
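
The filters can be sketched as follows. The grammar is a made-up example; note that rhs matches only the first item of a right-hand side, so VP -> V NP is not returned when filtering on rhs=NP.

```python
from nltk.grammar import CFG, Nonterminal

grammar = CFG.fromstring("""
    S -> NP VP
    NP -> Det N | 'she'
    VP -> V NP | V
    Det -> 'the'
    N -> 'dog'
    V -> 'sees'
""")
NP = Nonterminal("NP")

# Productions whose left-hand side is NP.
by_lhs = grammar.productions(lhs=NP)

# Productions whose right-hand side starts with NP.
by_rhs = grammar.productions(rhs=NP)

# The first right-hand-side item may also be a terminal.
by_word = grammar.productions(rhs="the")

# Both constraints at once.
both = grammar.productions(lhs=Nonterminal("VP"), rhs=Nonterminal("V"))

for label, prods in [("lhs=NP", by_lhs), ("rhs=NP", by_rhs),
                     ("rhs='the'", by_word), ("lhs=VP, rhs=V", both)]:
    print(label, list(prods))
```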

leftcorners(cat)[source]¶

Return the set of all nonterminals that the given nonterminal can start with, including itself.

This is the reflexive, transitive closure of the immediate leftcorner relation: (A > B) iff (A -> B beta)

Parameters: cat (Nonterminal) – the parent of the leftcorners
Returns: the set of all leftcorners
Return type: set(Nonterminal)
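
To illustrate the closure, here is a small sketch on an invented grammar. S reaches NP through S -> NP VP, and NP reaches Det through NP -> Det N, so all three appear in the leftcorner set of S.

```python
from nltk.grammar import CFG, Nonterminal

grammar = CFG.fromstring("""
    S -> NP VP
    NP -> Det N | 'she'
    VP -> V NP
    Det -> 'the'
    N -> 'dog'
    V -> 'sees'
""")
S, NP, Det = Nonterminal("S"), Nonterminal("NP"), Nonterminal("Det")

# Reflexive: S is in its own set. Transitive: Det is reached via NP.
corners_of_s = grammar.leftcorners(S)
print(sorted(str(nt) for nt in corners_of_s))

# A preterminal only has itself as a nonterminal leftcorner.
corners_of_det = grammar.leftcorners(Det)
print(sorted(str(nt) for nt in corners_of_det))
```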

is_leftcorner(cat, left)[source]¶

True if left is a leftcorner of cat, where left can be a terminal or a nonterminal.

Parameters

cat (Nonterminal) – the parent of the leftcorner
left (Terminal or Nonterminal) – the suggested leftcorner

Return type

bool

leftcorner_parents(cat)[source]¶

Return the set of all nonterminals for which the given category is a left corner. This is the inverse of the leftcorner relation.

Parameters: cat (Nonterminal) – the suggested leftcorner
Returns: the set of all parents to the leftcorner
Return type: set(Nonterminal)
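
The two methods above can be sketched together on an invented grammar. is_leftcorner accepts either a nonterminal or a word, and leftcorner_parents answers the reverse question for a nonterminal.

```python
from nltk.grammar import CFG, Nonterminal

grammar = CFG.fromstring("""
    S -> NP VP
    NP -> Det N | 'she'
    VP -> V NP
    Det -> 'the'
    N -> 'dog'
    V -> 'sees'
""")
S, NP, VP, Det = (Nonterminal(name) for name in ("S", "NP", "VP", "Det"))

# Nonterminal leftcorners: Det can start an S, VP cannot.
det_starts_s = grammar.is_leftcorner(S, Det)
vp_starts_s = grammar.is_leftcorner(S, VP)

# Terminal leftcorners: 'the' can be the first word of an S, 'dog' cannot.
the_starts_s = grammar.is_leftcorner(S, "the")
dog_starts_s = grammar.is_leftcorner(S, "dog")

# Inverse relation: every category that Det can start (including Det itself).
parents_of_det = grammar.leftcorner_parents(Det)

print(det_starts_s, vp_starts_s, the_starts_s, dog_starts_s)
print(sorted(str(nt) for nt in parents_of_det))
```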

check_coverage(tokens)[source]¶

Check whether the grammar rules cover the given list of tokens. If not, then raise an exception.
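
A sketch of a coverage check before parsing. The text above only promises "an exception"; the code below assumes it is a ValueError, which is what current NLTK releases raise as far as this example is aware. The grammar and token lists are invented.

```python
from nltk.grammar import CFG

grammar = CFG.fromstring("""
    S -> NP VP
    NP -> 'she'
    VP -> 'sleeps'
""")

# Every token is a terminal of the grammar, so nothing is raised.
grammar.check_coverage(["she", "sleeps"])

# 'flies' does not appear in any production.
try:
    grammar.check_coverage(["she", "flies"])
    covered = True
except ValueError as err:  # assumption: the exception type is ValueError
    covered = False
    print(err)
```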

is_lexical()[source]¶: Return True if all productions are lexicalised, that is, every production has at least one terminal on its right-hand side.

is_nonlexical()[source]¶

Return True if all lexical rules are “preterminals”, that is, unary rules which can be separated in a preprocessing step.

This means that all productions are of the forms A -> B1 … Bn (n>=0), or A -> “s”.

Note: is_lexical() and is_nonlexical() are not opposites. There are grammars which are neither, and grammars which are both.
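
The note can be made concrete with two invented grammars: one that satisfies both predicates and one that satisfies neither.

```python
from nltk.grammar import CFG

# Only unary rules that rewrite to a word: every production contains a
# terminal (lexical), and every terminal sits in a preterminal rule (nonlexical).
both = CFG.fromstring("""
    A -> 'a'
    B -> 'b'
""")

# S -> A A has no terminal, so the grammar is not lexical.
# S -> A 'x' mixes a terminal into a non-unary rule, so it is not nonlexical.
neither = CFG.fromstring("""
    S -> A 'x' | A A
    A -> 'a'
""")

print(both.is_lexical(), both.is_nonlexical())
print(neither.is_lexical(), neither.is_nonlexical())
```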

min_len()[source]¶: Return the right-hand side length of the shortest grammar production.

max_len()[source]¶: Return the right-hand side length of the longest grammar production.

is_nonempty()[source]¶: Return True if there are no empty productions.

is_binarised()[source]¶: Return True if all productions are at most binary. Note that there can still be empty and unary productions.
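
As a small illustration of the four length-based checks, consider an invented grammar with one ternary rule and three unary rules.

```python
from nltk.grammar import CFG

grammar = CFG.fromstring("""
    S -> A B C
    A -> 'a'
    B -> 'b'
    C -> 'c'
""")

shortest = grammar.min_len()        # the unary rules have length 1
longest = grammar.max_len()         # S -> A B C has length 3
nonempty = grammar.is_nonempty()    # no rule has an empty right-hand side
binarised = grammar.is_binarised()  # the ternary rule prevents this

print(shortest, longest, nonempty, binarised)
```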

is_flexible_chomsky_normal_form()[source]¶: Return True if all productions are of the forms A -> B C, A -> B, or A -> “s”.

is_chomsky_normal_form()[source]¶: Return True if the grammar is of Chomsky Normal Form, i.e. all productions are of the form A -> B C, or A -> “s”.
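
The difference between the two normal-form checks comes down to unary rules of the form A -> B. A sketch with two invented grammars that differ only in the rule VP -> V:

```python
from nltk.grammar import CFG

# Only binary rules and word rules: strict Chomsky normal form.
strict = CFG.fromstring("""
    S -> NP VP
    NP -> 'she'
    VP -> V NP
    V -> 'sees'
""")

# Adding the unary rule VP -> V keeps the flexible form but breaks the strict one.
loose = CFG.fromstring("""
    S -> NP VP
    NP -> 'she'
    VP -> V NP | V
    V -> 'sees'
""")

print(strict.is_flexible_chomsky_normal_form(), strict.is_chomsky_normal_form())
print(loose.is_flexible_chomsky_normal_form(), loose.is_chomsky_normal_form())
```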

chomsky_normal_form(new_token_padding='@$@', flexible=False)[source]¶

Return a new grammar that is in Chomsky normal form.

Parameters: new_token_padding – Customise new rule formation during binarisation
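
A hedged sketch of the conversion on an invented grammar that has one ternary rule and one unary nonterminal rule. With flexible=True the unary rule is expected to survive, which is why only the flexible check is asserted for that result.

```python
from nltk.grammar import CFG

grammar = CFG.fromstring("""
    S -> NP VP
    NP -> Det Adj N
    VP -> V
    Det -> 'the'
    Adj -> 'big'
    N -> 'dog'
    V -> 'barks'
""")

# Strict conversion: long rules are binarised and VP -> V is replaced
# by a lexical rule.
cnf = grammar.chomsky_normal_form()

# Flexible conversion: long rules are binarised, unary rules are kept.
flexible_cnf = grammar.chomsky_normal_form(flexible=True)

for prod in cnf.productions():
    print(prod)
```

The original grammar is left untouched; both calls return new grammar objects here because the input is not already in Chomsky normal form.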

classmethod remove_unitary_rules(grammar)[source]¶: Remove nonlexical unitary rules and convert them to lexical rules.
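
A minimal sketch on an invented grammar: the unary rule VP -> V is dropped, and VP inherits the lexical rule of V instead.

```python
from nltk.grammar import CFG, Nonterminal, Production

grammar = CFG.fromstring("""
    S -> NP VP
    NP -> 'she'
    VP -> V
    V -> 'barks'
""")
VP, V = Nonterminal("VP"), Nonterminal("V")

cleaned = CFG.remove_unitary_rules(grammar)
cleaned_prods = list(cleaned.productions())

for prod in cleaned_prods:
    print(prod)
```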

classmethod binarize(grammar, padding='@$@')[source]¶

Convert all non-binary rules into binary by introducing new tokens. Example:

Original:
    A => B C D
After Conversion:
    A => B A@$@B
    A@$@B => C D
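
The conversion shown above can be reproduced with a one-rule grammar. This is only a sketch; B, C and D have no rules of their own here, which is enough for binarisation.

```python
from nltk.grammar import CFG, Nonterminal, Production

grammar = CFG.fromstring("A -> B C D")

# Default padding '@$@' joins the parent symbol and the consumed child.
binarized = CFG.binarize(grammar)
for prod in binarized.productions():
    print(prod)

# A custom padding string changes only the names of the new symbols.
custom = CFG.binarize(grammar, padding="_")
for prod in custom.productions():
    print(prod)
```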

classmethod eliminate_start(grammar)[source]¶: Eliminate the start rule in case it appears on a right-hand side. Example: given S -> S0 S1 and S0 -> S1 S, another rule S0_Sigma -> S is added.
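
A sketch of the example above. The rule S1 -> 'a' is added only to give the grammar a terminal; the code reads the new start symbol from the result rather than assuming its exact name.

```python
from nltk.grammar import CFG, Production

grammar = CFG.fromstring("""
    S -> S0 S1
    S0 -> S1 S
    S1 -> 'a'
""")

# S occurs on the right-hand side of S0 -> S1 S, so a fresh start symbol
# that rewrites to the old one is introduced.
fixed = CFG.eliminate_start(grammar)
old_start, new_start = grammar.start(), fixed.start()

print(new_start)
for prod in fixed.productions():
    print(prod)
```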
