nltk.grammar.PCFG
- class nltk.grammar.PCFG
Bases: CFG
A probabilistic context-free grammar. A PCFG consists of a start state and a set of productions with probabilities. The set of terminals and nonterminals is implicitly specified by the productions.
PCFG productions use the ProbabilisticProduction class. PCFGs impose the constraint that the set of productions with any given left-hand side must have probabilities that sum to 1 (allowing for a small margin of error).
If you need efficient key-based access to productions, you can use a subclass to implement it.
- Variables
EPSILON – The acceptable margin of error for checking that productions with a given left-hand side have probabilities that sum to 1.
- EPSILON = 0.01
- __init__(start, productions, calculate_leftcorners=True)
Create a new context-free grammar, from the given start state and set of ProbabilisticProductions.
- Parameters
start (Nonterminal) – The start symbol
productions (list(Production)) – The list of productions that defines the grammar
calculate_leftcorners (bool) – False if we don’t want to calculate the leftcorner relation. In that case, some optimized chart parsers won’t work.
- Raises
ValueError – if the productions with any given left-hand side do not have probabilities that sum to a value within EPSILON of 1.
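The constructor and its sum-to-1 check can be sketched as follows. This is a minimal illustration; the toy rules and the names `productions` and `rejected` are made up for the example.

```python
from nltk.grammar import PCFG, Nonterminal, ProbabilisticProduction

S, NP, VP = Nonterminal("S"), Nonterminal("NP"), Nonterminal("VP")

# Each left-hand side must have probabilities summing to 1 (within EPSILON).
productions = [
    ProbabilisticProduction(S, [NP, VP], prob=1.0),
    ProbabilisticProduction(NP, ["dogs"], prob=0.6),
    ProbabilisticProduction(NP, ["cats"], prob=0.4),
    ProbabilisticProduction(VP, ["sleep"], prob=1.0),
]
grammar = PCFG(S, productions)

# Leaving out the second NP rule makes the NP probabilities sum to 0.6,
# which is outside the EPSILON margin, so construction should fail.
try:
    PCFG(S, productions[:2] + productions[3:])
    rejected = False
except ValueError:
    rejected = True
```

Here `rejected` ends up `True` because the incomplete NP distribution violates the constraint described above.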
- classmethod fromstring(input, encoding=None)
Return a probabilistic context-free grammar corresponding to the input string(s).
- Parameters
input – a grammar, either in the form of a string or else as a list of strings.
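As a rough illustration of the string form, the sketch below builds a small grammar in which each production carries its probability in square brackets. The rules themselves are invented for the example.

```python
from nltk.grammar import PCFG, Nonterminal

# Alternatives for one left-hand side are separated by "|",
# each followed by its own bracketed probability.
toy_pcfg = PCFG.fromstring("""
    S -> NP VP [1.0]
    NP -> Det N [0.7] | 'John' [0.3]
    VP -> V NP [1.0]
    Det -> 'the' [1.0]
    N -> 'dog' [0.5] | 'cat' [0.5]
    V -> 'saw' [1.0]
""")

# The probabilities of the NP rules should add up to 1.
np_rules = toy_pcfg.productions(lhs=Nonterminal("NP"))
np_total = sum(p.prob() for p in np_rules)
```

The left-hand side of the first production (`S` here) becomes the start symbol.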
- classmethod binarize(grammar, padding='@$@')
Convert all non-binary rules into binary by introducing new tokens. Example:
Original:
    A => B C D
After Conversion:
    A => B A@$@B
    A@$@B => C D
- check_coverage(tokens)
Check whether the grammar rules cover the given list of tokens. If not, then raise an exception.
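A small sketch of the coverage check, using an invented grammar. The text above only says "an exception"; catching `ValueError` here reflects the NLTK releases I am aware of and should be treated as an assumption.

```python
from nltk.grammar import PCFG

grammar = PCFG.fromstring("""
    S -> NP V [1.0]
    NP -> Det N [1.0]
    Det -> 'the' [1.0]
    N -> 'dog' [1.0]
    V -> 'barks' [1.0]
""")

# Every token appears as a terminal in the grammar: no exception is raised.
grammar.check_coverage("the dog barks".split())

# 'cat' is not a terminal of this grammar, so the check should fail.
try:
    grammar.check_coverage("the cat barks".split())
    covered = True
except ValueError:  # assumption: the exception type is ValueError
    covered = False
```

Running the check before parsing gives a clear error about unknown words instead of a parse that silently yields no trees.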
- chomsky_normal_form(new_token_padding='@$@', flexible=False)
Return a new grammar that is in Chomsky normal form.
- Parameters
new_token_padding – Customise new rule formation during binarisation
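The conversion can be sketched on a grammar with one ternary rule. The example deliberately uses a plain `CFG`, since the text above does not say how rule probabilities are carried over, and it assumes an NLTK release that provides `chomsky_normal_form`.

```python
from nltk.grammar import CFG

# S has three symbols on its right-hand side, so the grammar is not in CNF.
grammar = CFG.fromstring("""
    S -> NP VP PP
    NP -> 'she'
    VP -> 'ran'
    PP -> 'home'
""")

cnf = grammar.chomsky_normal_form()

# Binarisation introduces helper nonterminals whose names contain the padding.
padded = [p.lhs().symbol() for p in cnf.productions() if "@$@" in p.lhs().symbol()]
```

Passing a different `new_token_padding` changes only the names of the introduced nonterminals.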
- classmethod eliminate_start(grammar)
Eliminate the start rule in case it appears on the RHS. Example: S -> S0 S1 and S0 -> S1 S. Then another rule S0_Sigma -> S is added.
- is_binarised()
Return True if all productions are at most binary. Note that there can still be empty and unary productions.
- is_chomsky_normal_form()
Return True if the grammar is of Chomsky Normal Form, i.e. all productions are of the form A -> B C, or A -> “s”.
- is_flexible_chomsky_normal_form()
Return True if all productions are of the forms A -> B C, A -> B, or A -> “s”.
- is_leftcorner(cat, left)
True if left is a leftcorner of cat, where left can be a terminal or a nonterminal.
- Parameters
cat (Nonterminal) – the parent of the leftcorner
left (Terminal or Nonterminal) – the suggested leftcorner
- Return type
bool
- is_lexical()
Return True if all productions are lexicalised.
- is_nonempty()
Return True if there are no empty productions.
- is_nonlexical()
Return True if all lexical rules are “preterminals”, that is, unary rules which can be separated in a preprocessing step.
This means that all productions are of the forms A -> B1 … Bn (n>=0), or A -> “s”.
Note: is_lexical() and is_nonlexical() are not opposites. There are grammars which are neither, and grammars which are both.
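To see how the form predicates relate, the sketch below evaluates them on an invented grammar whose rules are all either A -> B C or A -> "s".

```python
from nltk.grammar import PCFG

grammar = PCFG.fromstring("""
    S -> NP VP [1.0]
    NP -> Det N [1.0]
    VP -> V NP [1.0]
    Det -> 'the' [1.0]
    N -> 'dog' [0.5] | 'cat' [0.5]
    V -> 'chased' [1.0]
""")

forms = {
    "binarised": grammar.is_binarised(),
    "nonempty": grammar.is_nonempty(),
    "chomsky_normal_form": grammar.is_chomsky_normal_form(),
    "flexible_chomsky_normal_form": grammar.is_flexible_chomsky_normal_form(),
    # Not every production contains a terminal (e.g. S -> NP VP) ...
    "lexical": grammar.is_lexical(),
    # ... but every terminal sits in a unary preterminal rule.
    "nonlexical": grammar.is_nonlexical(),
}
```

For this grammar only `lexical` is `False`, which illustrates the point that `is_lexical()` and `is_nonlexical()` are not simple opposites.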
- leftcorner_parents(cat)
Return the set of all nonterminals for which the given category is a left corner. This is the inverse of the leftcorner relation.
- Parameters
cat (Nonterminal) – the suggested leftcorner
- Returns
the set of all parents to the leftcorner
- Return type
set(Nonterminal)
- leftcorners(cat)
Return the set of all nonterminals that the given nonterminal can start with, including itself.
This is the reflexive, transitive closure of the immediate leftcorner relation: (A > B) iff (A -> B beta)
- Parameters
cat (Nonterminal) – the parent of the leftcorners
- Returns
the set of all leftcorners
- Return type
set(Nonterminal)
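The three leftcorner methods can be tried together on a toy grammar (invented for this sketch), constructed with the default `calculate_leftcorners=True`.

```python
from nltk.grammar import PCFG, Nonterminal

grammar = PCFG.fromstring("""
    S -> NP VP [1.0]
    NP -> Det N [1.0]
    VP -> V [1.0]
    Det -> 'the' [1.0]
    N -> 'dog' [1.0]
    V -> 'barks' [1.0]
""")
S, NP, VP, Det = (Nonterminal(name) for name in ("S", "NP", "VP", "Det"))

# S starts with NP, and NP starts with Det; the closure is reflexive,
# so S itself is included.
s_corners = grammar.leftcorners(S)

# Inverse direction: every nonterminal that can start with Det.
det_parents = grammar.leftcorner_parents(Det)

# is_leftcorner accepts either a nonterminal or a terminal as `left`.
starts_with_det = grammar.is_leftcorner(S, Det)
starts_with_the = grammar.is_leftcorner(S, "the")
starts_with_vp = grammar.is_leftcorner(S, VP)
```

`VP` appears in a rule for `S` but never in first position, so it is not a leftcorner of `S`.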
- max_len()
Return the right-hand side length of the longest grammar production.
- min_len()
Return the right-hand side length of the shortest grammar production.
- productions(lhs=None, rhs=None, empty=False)
Return the grammar productions, filtered by the left-hand side or the first item in the right-hand side.
- Parameters
lhs – Only return productions with the given left-hand side.
rhs – Only return productions with the given first item in the right-hand side.
empty – Only return productions with an empty right-hand side.
- Returns
A list of productions matching the given constraints.
- Return type
list(Production)
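A short sketch of the filters, using an invented grammar. The results are wrapped in `list(...)` so that the example does not depend on the exact container type returned.

```python
from nltk.grammar import PCFG, Nonterminal

grammar = PCFG.fromstring("""
    S -> NP VP [1.0]
    NP -> Det N [0.7] | 'John' [0.3]
    VP -> V NP [1.0]
    Det -> 'the' [1.0]
    N -> 'dog' [0.5] | 'cat' [0.5]
    V -> 'saw' [1.0]
""")

# Filter by left-hand side.
np_rules = list(grammar.productions(lhs=Nonterminal("NP")))

# Filter by the first item of the right-hand side (nonterminal or terminal).
det_first = list(grammar.productions(rhs=Nonterminal("Det")))
the_first = list(grammar.productions(rhs="the"))

# This grammar has no productions with an empty right-hand side.
empty_rules = list(grammar.productions(empty=True))
```

Note that `rhs` matches only the first symbol: `VP -> V NP` is not returned for `rhs=Nonterminal("NP")`.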
- classmethod remove_unitary_rules(grammar)
Remove nonlexical unitary rules and convert them to lexical rules.
- start()
Return the start symbol of the grammar.
- Return type