nltk.featstruct module

Basic data classes for representing feature structures, and for performing basic operations on those feature structures. A feature structure is a mapping from feature identifiers to feature values, where each feature value is either a basic value (such as a string or an integer), or a nested feature structure. There are two types of feature structure, implemented by two subclasses of FeatStruct:

  • feature dictionaries, implemented by FeatDict, act like Python dictionaries. Feature identifiers may be strings or instances of the Feature class.

  • feature lists, implemented by FeatList, act like Python lists. Feature identifiers are integers.

Feature structures are typically used to represent partial information about objects. A feature identifier that is not mapped to a value stands for a feature whose value is unknown (not a feature without a value). Two feature structures that represent (potentially overlapping) information about the same object can be combined by unification. When two inconsistent feature structures are unified, the unification fails and returns None.

Features can be specified using “feature paths”, or tuples of feature identifiers that specify path through the nested feature structures to a value. Feature structures may contain reentrant feature values. A “reentrant feature value” is a single feature value that can be accessed via multiple feature paths. Unification preserves the reentrance relations imposed by both of the unified feature structures. In the feature structure resulting from unification, any modifications to a reentrant feature value will be visible using any of its feature paths.

Feature structure variables are encoded using the nltk.sem.Variable class. The variables’ values are tracked using a bindings dictionary, which maps variables to their values. When two feature structures are unified, a fresh bindings dictionary is created to track their values; and before unification completes, all bound variables are replaced by their values. Thus, the bindings dictionaries are usually strictly internal to the unification process. However, it is possible to track the bindings of variables if you choose to, by supplying your own initial bindings dictionary to the unify() function.

When unbound variables are unified with one another, they become aliased. This is encoded by binding one variable to the other.

Lightweight Feature Structures

Many of the functions defined by nltk.featstruct can be applied directly to simple Python dictionaries and lists, rather than to full-fledged FeatDict and FeatList objects. In other words, Python dicts and lists can be used as “light-weight” feature structures.

>>> from nltk.featstruct import unify
>>> unify(dict(x=1, y=dict()), dict(a='a', y=dict(b='b')))  
{'y': {'b': 'b'}, 'x': 1, 'a': 'a'}

However, you should keep in mind the following caveats:

  • Python dictionaries & lists ignore reentrance when checking for equality between values. But two FeatStructs with different reentrances are considered nonequal, even if all their base values are equal.

  • FeatStructs can be easily frozen, allowing them to be used as keys in hash tables. Python dictionaries and lists can not.

  • FeatStructs display reentrance in their string representations; Python dictionaries and lists do not.

  • FeatStructs may not be mixed with Python dictionaries and lists (e.g., when performing unification).

  • FeatStructs provide a number of useful methods, such as walk() and cyclic(), which are not available for Python dicts and lists.

In general, if your feature structures will contain any reentrances, or if you plan to use them as dictionary keys, it is strongly recommended that you use full-fledged FeatStruct objects.

class nltk.featstruct.FeatDict[source]

Bases: FeatStruct, dict

A feature structure that acts like a Python dictionary. I.e., a mapping from feature identifiers to feature values, where a feature identifier can be a string or a Feature; and where a feature value can be either a basic value (such as a string or an integer), or a nested feature structure. A feature identifiers for a FeatDict is sometimes called a “feature name”.

Two feature dicts are considered equal if they assign the same values to all features, and have the same reentrances.

See:

FeatStruct for information about feature paths, reentrance, cyclic feature structures, mutability, freezing, and hashing.

__init__(features=None, **morefeatures)[source]

Create a new feature dictionary, with the specified features.

Parameters:
  • features – The initial value for this feature dictionary. If features is a FeatStruct, then its features are copied (shallow copy). If features is a dict, then a feature is created for each item, mapping its key to its value. If features is a string, then it is processed using FeatStructReader. If features is a list of tuples (name, val), then a feature is created for each tuple.

  • morefeatures – Additional features for the new feature dictionary. If a feature is listed under both features and morefeatures, then the value from morefeatures will be used.

clear() None.  Remove all items from D.

If self is frozen, raise ValueError.

get(name_or_path, default=None)[source]

If the feature with the given name or path exists, return its value; otherwise, return default.

has_key(name_or_path)[source]

Return true if a feature with the given name or path exists.

pop(k[, d]) v, remove specified key and return the corresponding value.

If the key is not found, return the default if given; otherwise, raise a KeyError. If self is frozen, raise ValueError.

popitem(*args, **kwargs)

Remove and return a (key, value) pair as a 2-tuple.

Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty. If self is frozen, raise ValueError.

setdefault(*args, **kwargs)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default. If self is frozen, raise ValueError.

update([E, ]**F) None.  Update D from dict/iterable E and F.[source]

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k] If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k]

class nltk.featstruct.FeatList[source]

Bases: FeatStruct, list

A list of feature values, where each feature value is either a basic value (such as a string or an integer), or a nested feature structure.

Feature lists may contain reentrant feature values. A “reentrant feature value” is a single feature value that can be accessed via multiple feature paths. Feature lists may also be cyclic.

Two feature lists are considered equal if they assign the same values to all features, and have the same reentrances.

See:

FeatStruct for information about feature paths, reentrance, cyclic feature structures, mutability, freezing, and hashing.

__init__(features=())[source]

Create a new feature list, with the specified features.

Parameters:

features – The initial list of features for this feature list. If features is a string, then it is paresd using FeatStructReader. Otherwise, it should be a sequence of basic values and nested feature structures.

append(*args, **kwargs)

Append object to the end of the list. If self is frozen, raise ValueError.

extend(*args, **kwargs)

Extend list by appending elements from the iterable. If self is frozen, raise ValueError.

insert(*args, **kwargs)

Insert object before index. If self is frozen, raise ValueError.

pop(*args, **kwargs)

Remove and return item at index (default last).

Raises IndexError if list is empty or index is out of range. If self is frozen, raise ValueError.

remove(*args, **kwargs)

Remove first occurrence of value.

Raises ValueError if the value is not present. If self is frozen, raise ValueError.

reverse(*args, **kwargs)

Reverse IN PLACE. If self is frozen, raise ValueError.

sort(*args, **kwargs)

Sort the list in ascending order and return None.

The sort is in-place (i.e. the list itself is modified) and stable (i.e. the order of two equal elements is maintained).

If a key function is given, apply it once to each list item and sort them, ascending or descending, according to their function values.

The reverse flag can be set to sort in descending order. If self is frozen, raise ValueError.

class nltk.featstruct.FeatStruct[source]

Bases: SubstituteBindingsI

A mapping from feature identifiers to feature values, where each feature value is either a basic value (such as a string or an integer), or a nested feature structure. There are two types of feature structure:

  • feature dictionaries, implemented by FeatDict, act like Python dictionaries. Feature identifiers may be strings or instances of the Feature class.

  • feature lists, implemented by FeatList, act like Python lists. Feature identifiers are integers.

Feature structures may be indexed using either simple feature identifiers or ‘feature paths.’ A feature path is a sequence of feature identifiers that stand for a corresponding sequence of indexing operations. In particular, fstruct[(f1,f2,...,fn)] is equivalent to fstruct[f1][f2]...[fn].

Feature structures may contain reentrant feature structures. A “reentrant feature structure” is a single feature structure object that can be accessed via multiple feature paths. Feature structures may also be cyclic. A feature structure is “cyclic” if there is any feature path from the feature structure to itself.

Two feature structures are considered equal if they assign the same values to all features, and have the same reentrancies.

By default, feature structures are mutable. They may be made immutable with the freeze() method. Once they have been frozen, they may be hashed, and thus used as dictionary keys.

static __new__(cls, features=None, **morefeatures)[source]

Construct and return a new feature structure. If this constructor is called directly, then the returned feature structure will be an instance of either the FeatDict class or the FeatList class.

Parameters:
  • features

    The initial feature values for this feature structure:

    • FeatStruct(string) -> FeatStructReader().read(string)

    • FeatStruct(mapping) -> FeatDict(mapping)

    • FeatStruct(sequence) -> FeatList(sequence)

    • FeatStruct() -> FeatDict()

  • morefeatures – If features is a mapping or None, then morefeatures provides additional features for the FeatDict constructor.

copy(deep=True)[source]

Return a new copy of self. The new copy will not be frozen.

Parameters:

deep – If true, create a deep copy; if false, create a shallow copy.

cyclic()[source]

Return True if this feature structure contains itself.

equal_values(other, check_reentrance=False)[source]

Return True if self and other assign the same value to to every feature. In particular, return true if self[p]==other[p] for every feature path p such that self[p] or other[p] is a base value (i.e., not a nested feature structure).

Parameters:

check_reentrance – If True, then also return False if there is any difference between the reentrances of self and other.

Note:

the == is equivalent to equal_values() with check_reentrance=True.

freeze()[source]

Make this feature structure, and any feature structures it contains, immutable. Note: this method does not attempt to ‘freeze’ any feature value that is not a FeatStruct; it is recommended that you use only immutable feature values.

frozen()[source]

Return True if this feature structure is immutable. Feature structures can be made immutable with the freeze() method. Immutable feature structures may not be made mutable again, but new mutable copies can be produced with the copy() method.

remove_variables()[source]

Return the feature structure that is obtained by deleting any feature whose value is a Variable.

Return type:

FeatStruct

rename_variables(vars=None, used_vars=(), new_vars=None)[source]
See:

nltk.featstruct.rename_variables()

retract_bindings(bindings)[source]
See:

nltk.featstruct.retract_bindings()

substitute_bindings(bindings)[source]
See:

nltk.featstruct.substitute_bindings()

subsumes(other)[source]

Return True if self subsumes other. I.e., return true If unifying self with other would result in a feature structure equal to other.

unify(other, bindings=None, trace=False, fail=None, rename_vars=True)[source]
variables()[source]
See:

nltk.featstruct.find_variables()

walk()[source]

Return an iterator that generates this feature structure, and each feature structure it contains. Each feature structure will be generated exactly once.

class nltk.featstruct.FeatStructReader[source]

Bases: object

VALUE_HANDLERS = [('read_fstruct_value', re.compile('\\s*(?:\\((\\d+)\\)\\s*)?(\\??[\\w-]+)?(\\[)')), ('read_var_value', re.compile('\\?[a-zA-Z_][a-zA-Z0-9_]*')), ('read_str_value', re.compile('[uU]?[rR]?([\'"])')), ('read_int_value', re.compile('-?\\d+')), ('read_sym_value', re.compile('[a-zA-Z_][a-zA-Z0-9_]*')), ('read_app_value', re.compile('<(app)\\((\\?[a-z][a-z]*)\\s*,\\s*(\\?[a-z][a-z]*)\\)>')), ('read_logic_value', re.compile('<(.*?)(?<!-)>')), ('read_set_value', re.compile('{')), ('read_tuple_value', re.compile('\\('))]

A table indicating how feature values should be processed. Each entry in the table is a pair (handler, regexp). The first entry with a matching regexp will have its handler called. Handlers should have the following signature:

def handler(s, position, reentrances, match): ...

and should return a tuple (value, position), where position is the string position where the value ended. (n.b.: order is important here!)

__init__(features=(*slash*, *type*), fdict_class=<class 'nltk.featstruct.FeatStruct'>, flist_class=<class 'nltk.featstruct.FeatList'>, logic_parser=None)[source]
fromstring(s, fstruct=None)[source]

Convert a string representation of a feature structure (as displayed by repr) into a FeatStruct. This process imposes the following restrictions on the string representation:

  • Feature names cannot contain any of the following: whitespace, parentheses, quote marks, equals signs, dashes, commas, and square brackets. Feature names may not begin with plus signs or minus signs.

  • Only the following basic feature value are supported: strings, integers, variables, None, and unquoted alphanumeric strings.

  • For reentrant values, the first mention must specify a reentrance identifier and a value; and any subsequent mentions must use arrows ('->') to reference the reentrance identifier.

read_app_value(s, position, reentrances, match)[source]

Mainly included for backwards compat.

read_fstruct_value(s, position, reentrances, match)[source]
read_int_value(s, position, reentrances, match)[source]
read_logic_value(s, position, reentrances, match)[source]
read_partial(s, position=0, reentrances=None, fstruct=None)[source]

Helper function that reads in a feature structure.

Parameters:
  • s – The string to read.

  • position – The position in the string to start parsing.

  • reentrances – A dictionary from reentrance ids to values. Defaults to an empty dictionary.

Returns:

A tuple (val, pos) of the feature structure created by parsing and the position where the parsed feature structure ends.

Return type:

bool

read_set_value(s, position, reentrances, match)[source]
read_str_value(s, position, reentrances, match)[source]
read_sym_value(s, position, reentrances, match)[source]
read_tuple_value(s, position, reentrances, match)[source]
read_value(s, position, reentrances)[source]
read_var_value(s, position, reentrances, match)[source]
class nltk.featstruct.Feature[source]

Bases: object

A feature identifier that’s specialized to put additional constraints, default values, etc.

__init__(name, default=None, display=None)[source]
property default

Default value for this feature.

property display

Custom display location: can be prefix, or slash.

property name

The name of this feature.

read_value(s, position, reentrances, parser)[source]
unify_base_values(fval1, fval2, bindings)[source]

If possible, return a single value.. If not, return the value UnificationFailure.

class nltk.featstruct.RangeFeature[source]

Bases: Feature

RANGE_RE = re.compile('(-?\\d+):(-?\\d+)')
read_value(s, position, reentrances, parser)[source]
unify_base_values(fval1, fval2, bindings)[source]

If possible, return a single value.. If not, return the value UnificationFailure.

class nltk.featstruct.SlashFeature[source]

Bases: Feature

read_value(s, position, reentrances, parser)[source]
nltk.featstruct.conflicts(fstruct1, fstruct2, trace=0)[source]

Return a list of the feature paths of all features which are assigned incompatible values by fstruct1 and fstruct2.

Return type:

list(tuple)

nltk.featstruct.subsumes(fstruct1, fstruct2)[source]

Return True if fstruct1 subsumes fstruct2. I.e., return true if unifying fstruct1 with fstruct2 would result in a feature structure equal to fstruct2.

Return type:

bool

nltk.featstruct.unify(fstruct1, fstruct2, bindings=None, trace=False, fail=None, rename_vars=True, fs_class='default')[source]

Unify fstruct1 with fstruct2, and return the resulting feature structure. This unified feature structure is the minimal feature structure that contains all feature value assignments from both fstruct1 and fstruct2, and that preserves all reentrancies.

If no such feature structure exists (because fstruct1 and fstruct2 specify incompatible values for some feature), then unification fails, and unify returns None.

Bound variables are replaced by their values. Aliased variables are replaced by their representative variable (if unbound) or the value of their representative variable (if bound). I.e., if variable v is in bindings, then v is replaced by bindings[v]. This will be repeated until the variable is replaced by an unbound variable or a non-variable value.

Unbound variables are bound when they are unified with values; and aliased when they are unified with variables. I.e., if variable v is not in bindings, and is unified with a variable or value x, then bindings[v] is set to x.

If bindings is unspecified, then all variables are assumed to be unbound. I.e., bindings defaults to an empty dict.

>>> from nltk.featstruct import FeatStruct
>>> FeatStruct('[a=?x]').unify(FeatStruct('[b=?x]'))
[a=?x, b=?x2]
Parameters:
  • bindings (dict(Variable -> any)) – A set of variable bindings to be used and updated during unification.

  • trace (bool) – If true, generate trace output.

  • rename_vars (bool) – If True, then rename any variables in fstruct2 that are also used in fstruct1, in order to avoid collisions on variable names.