nltk.featstruct module¶
Basic data classes for representing feature structures, and for
performing basic operations on those feature structures. A feature
structure is a mapping from feature identifiers to feature values,
where each feature value is either a basic value (such as a string or
an integer), or a nested feature structure. There are two types of
feature structure, implemented by two subclasses of FeatStruct:
feature dictionaries, implemented by FeatDict, act like Python dictionaries. Feature identifiers may be strings or instances of the Feature class.
feature lists, implemented by FeatList, act like Python lists. Feature identifiers are integers.
Feature structures are typically used to represent partial information about objects. A feature identifier that is not mapped to a value stands for a feature whose value is unknown (not a feature without a value). Two feature structures that represent (potentially overlapping) information about the same object can be combined by unification. When two inconsistent feature structures are unified, the unification fails and returns None.
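As a minimal sketch of the behaviour described above (the feature names `number` and `person` are illustrative assumptions, not part of the library), compatible structures merge into one, while inconsistent ones make unification return None:

```python
from nltk.featstruct import FeatStruct

# Two partial descriptions of the same object (feature names are made up).
fs_number = FeatStruct('[number=sg]')
fs_person = FeatStruct('[person=3]')

# Compatible information is merged into a single structure.
merged = fs_number.unify(fs_person)

# Inconsistent values for the same feature make unification fail.
clash = fs_number.unify(FeatStruct('[number=pl]'))

print(merged is not None, clash is None)
```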
Features can be specified using “feature paths”, or tuples of feature identifiers that specify a path through the nested feature structures to a value. Feature structures may contain reentrant feature values. A “reentrant feature value” is a single feature value that can be accessed via multiple feature paths. Unification preserves the reentrance relations imposed by both of the unified feature structures. In the feature structure resulting from unification, any modifications to a reentrant feature value will be visible using any of its feature paths.
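The following sketch illustrates feature paths and reentrance on a small hand-written structure. The feature names (`subj`, `obj`, `agr`, `num`) are assumptions chosen for the example:

```python
from nltk.featstruct import FeatStruct

# (1) tags the first mention of a value; ->(1) points back to it.
fs = FeatStruct('[subj=(1)[agr=[num=sg]], obj->(1)]')

# A feature path is a tuple of feature identifiers.
num = fs[('subj', 'agr', 'num')]

# Both paths lead to the very same nested object.
shared = fs['subj'] is fs['obj']

# A change made through one path is visible through the other.
fs['subj']['agr']['num'] = 'pl'
num_via_obj = fs[('obj', 'agr', 'num')]
```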
Feature structure variables are encoded using the nltk.sem.Variable class. The variables’ values are tracked using a bindings dictionary, which maps variables to their values. When two feature structures are unified, a fresh bindings dictionary is created to track their values; and before unification completes, all bound variables are replaced by their values. Thus, the bindings dictionaries are usually strictly internal to the unification process. However, it is possible to track the bindings of variables if you choose to, by supplying your own initial bindings dictionary to the unify() function.
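A short sketch of tracking bindings with a caller-supplied dictionary. It imports `Variable` from `nltk.sem.logic`, and the feature names are illustrative:

```python
from nltk.featstruct import FeatStruct
from nltk.sem.logic import Variable

bindings = {}  # caller-supplied dictionary, updated during unification
fs1 = FeatStruct('[a=?x]')
fs2 = FeatStruct('[a=12]')

# In the result, ?x has been replaced by its value.
result = fs1.unify(fs2, bindings)
bound_value = bindings[Variable('?x')]

# The recorded bindings can be applied to other structures afterwards.
fs3 = FeatStruct('[b=?x]').substitute_bindings(bindings)
```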
When unbound variables are unified with one another, they become aliased. This is encoded by binding one variable to the other.
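Aliasing can be observed indirectly, as in this sketch (variable and feature names are assumptions): once `?x` and `?y` are aliased and one of them is bound, both resolve to the same value.

```python
from nltk.featstruct import FeatStruct

fs1 = FeatStruct('[a=?x, b=?x]')
fs2 = FeatStruct('[a=?y, b=1]')

# ?x and ?y are aliased where the 'a' values meet, and ?x is bound to 1
# via 'b'; after unification both features carry the value 1.
result = fs1.unify(fs2)
```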
Lightweight Feature Structures¶
Many of the functions defined by nltk.featstruct can be applied directly to simple Python dictionaries and lists, rather than to full-fledged FeatDict and FeatList objects. In other words, Python dicts and lists can be used as “light-weight” feature structures.
>>> from nltk.featstruct import unify
>>> unify(dict(x=1, y=dict()), dict(a='a', y=dict(b='b')))
{'y': {'b': 'b'}, 'x': 1, 'a': 'a'}
However, you should keep in mind the following caveats:
Python dictionaries & lists ignore reentrance when checking for equality between values. But two FeatStructs with different reentrances are considered nonequal, even if all their base values are equal.
FeatStructs can be easily frozen, allowing them to be used as keys in hash tables. Python dictionaries and lists cannot.
FeatStructs display reentrance in their string representations; Python dictionaries and lists do not.
FeatStructs may not be mixed with Python dictionaries and lists (e.g., when performing unification).
FeatStructs provide a number of useful methods, such as walk() and cyclic(), which are not available for Python dicts and lists.
In general, if your feature structures will contain any reentrances, or if you plan to use them as dictionary keys, it is strongly recommended that you use full-fledged FeatStruct objects.
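As an illustration of two methods that only full-fledged FeatStruct objects provide, the sketch below calls walk() and cyclic(). The structures are invented for the example:

```python
from nltk.featstruct import FeatStruct

fs = FeatStruct('[a=[b=1], c=[d=2]]')

# walk() iterates over fs itself and every nested feature structure.
nested = list(fs.walk())

# cyclic() reports whether some feature path leads from fs back to itself.
acyclic = fs.cyclic()

loop = FeatStruct()
loop['next'] = loop      # build a cycle by hand
is_cyclic = loop.cyclic()
```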
- class nltk.featstruct.FeatDict[source]¶
Bases: FeatStruct, dict
A feature structure that acts like a Python dictionary, i.e., a mapping from feature identifiers to feature values, where a feature identifier can be a string or a Feature, and where a feature value can be either a basic value (such as a string or an integer) or a nested feature structure. A feature identifier for a FeatDict is sometimes called a “feature name”.
Two feature dicts are considered equal if they assign the same values to all features, and have the same reentrances.
- See:
FeatStruct for information about feature paths, reentrance, cyclic feature structures, mutability, freezing, and hashing.
- __init__(features=None, **morefeatures)[source]¶
Create a new feature dictionary, with the specified features.
- Parameters:
features – The initial value for this feature dictionary. If features is a FeatStruct, then its features are copied (shallow copy). If features is a dict, then a feature is created for each item, mapping its key to its value. If features is a string, then it is processed using FeatStructReader. If features is a list of tuples (name, val), then a feature is created for each tuple.
morefeatures – Additional features for the new feature dictionary. If a feature is listed under both features and morefeatures, then the value from morefeatures will be used.
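The accepted forms of `features` can be compared side by side. This is a sketch with made-up feature names and values:

```python
from nltk.featstruct import FeatDict

from_dict = FeatDict({'cat': 'NP', 'num': 'sg'})
from_pairs = FeatDict([('cat', 'NP'), ('num', 'sg')])
from_string = FeatDict('[cat=NP, num=sg]')

# A keyword argument overrides the same feature given in `features`.
overridden = FeatDict({'cat': 'NP', 'num': 'sg'}, num='pl')
```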
- clear() -> None. Remove all items from D.¶
If self is frozen, raise ValueError.
- get(name_or_path, default=None)[source]¶
If the feature with the given name or path exists, return its value; otherwise, return default.
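A brief sketch of get() with a plain name, a feature path, and a missing path. The feature names, including the absent `gnd`, are illustrative:

```python
from nltk.featstruct import FeatStruct

fs = FeatStruct('[agr=[num=sg, per=3]]')

by_name = fs.get('agr')                    # nested feature structure
by_path = fs.get(('agr', 'num'))           # value reached through a feature path
missing = fs.get(('agr', 'gnd'), 'n/a')    # default when the path does not exist
```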
- pop(k[, d]) -> v, remove specified key and return the corresponding value.¶
If the key is not found, return the default if given; otherwise, raise a KeyError. If self is frozen, raise ValueError.
- popitem(*args, **kwargs)¶
Remove and return a (key, value) pair as a 2-tuple.
Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty. If self is frozen, raise ValueError.
- setdefault(*args, **kwargs)¶
Insert key with a value of default if key is not in the dictionary.
Return the value for key if key is in the dictionary, else default. If self is frozen, raise ValueError.
- class nltk.featstruct.FeatList[source]¶
Bases: FeatStruct, list
A list of feature values, where each feature value is either a basic value (such as a string or an integer), or a nested feature structure.
Feature lists may contain reentrant feature values. A “reentrant feature value” is a single feature value that can be accessed via multiple feature paths. Feature lists may also be cyclic.
Two feature lists are considered equal if they assign the same values to all features, and have the same reentrances.
- See:
FeatStruct for information about feature paths, reentrance, cyclic feature structures, mutability, freezing, and hashing.
- __init__(features=())[source]¶
Create a new feature list, with the specified features.
- Parameters:
features – The initial list of features for this feature list. If features is a string, then it is parsed using FeatStructReader. Otherwise, it should be a sequence of basic values and nested feature structures.
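A small sketch of building a FeatList from a sequence, indexing it by integer and by feature path, and observing that mutators are rejected after freezing. The contents are invented for illustration:

```python
from nltk.featstruct import FeatList, FeatStruct

fl = FeatList(['x', FeatStruct('[a=1]')])

first = fl[0]            # integer feature identifier
nested = fl[(1, 'a')]    # feature path mixing an index and a name

fl.freeze()
try:
    fl.append('y')       # mutators are rejected once the list is frozen
    rejected = False
except ValueError:
    rejected = True
```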
- append(*args, **kwargs)¶
Append object to the end of the list. If self is frozen, raise ValueError.
- extend(*args, **kwargs)¶
Extend list by appending elements from the iterable. If self is frozen, raise ValueError.
- insert(*args, **kwargs)¶
Insert object before index. If self is frozen, raise ValueError.
- pop(*args, **kwargs)¶
Remove and return item at index (default last).
Raises IndexError if list is empty or index is out of range. If self is frozen, raise ValueError.
- remove(*args, **kwargs)¶
Remove first occurrence of value.
Raises ValueError if the value is not present. If self is frozen, raise ValueError.
- reverse(*args, **kwargs)¶
Reverse IN PLACE. If self is frozen, raise ValueError.
- sort(*args, **kwargs)¶
Sort the list in ascending order and return None.
The sort is in-place (i.e. the list itself is modified) and stable (i.e. the order of two equal elements is maintained).
If a key function is given, apply it once to each list item and sort them, ascending or descending, according to their function values.
The reverse flag can be set to sort in descending order. If self is frozen, raise ValueError.
- class nltk.featstruct.FeatStruct[source]¶
Bases:
SubstituteBindingsI
A mapping from feature identifiers to feature values, where each feature value is either a basic value (such as a string or an integer), or a nested feature structure. There are two types of feature structure:
feature dictionaries, implemented by FeatDict, act like Python dictionaries. Feature identifiers may be strings or instances of the Feature class.
feature lists, implemented by FeatList, act like Python lists. Feature identifiers are integers.
Feature structures may be indexed using either simple feature identifiers or “feature paths”. A feature path is a sequence of feature identifiers that stands for a corresponding sequence of indexing operations. In particular, fstruct[(f1,f2,...,fn)] is equivalent to fstruct[f1][f2]...[fn].
Feature structures may contain reentrant feature structures. A “reentrant feature structure” is a single feature structure object that can be accessed via multiple feature paths. Feature structures may also be cyclic. A feature structure is “cyclic” if there is any feature path from the feature structure to itself.
Two feature structures are considered equal if they assign the same values to all features, and have the same reentrancies.
By default, feature structures are mutable. They may be made immutable with the freeze() method. Once they have been frozen, they may be hashed, and thus used as dictionary keys.
- static __new__(cls, features=None, **morefeatures)[source]¶
Construct and return a new feature structure. If this constructor is called directly, then the returned feature structure will be an instance of either the FeatDict class or the FeatList class.
- Parameters:
features –
The initial feature values for this feature structure:
FeatStruct(string) -> FeatStructReader().read(string)
FeatStruct(mapping) -> FeatDict(mapping)
FeatStruct(sequence) -> FeatList(sequence)
FeatStruct() -> FeatDict()
morefeatures – If features is a mapping or None, then morefeatures provides additional features for the FeatDict constructor.
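The dispatch table above can be checked with a short sketch. The argument values are arbitrary examples:

```python
from nltk.featstruct import FeatDict, FeatList, FeatStruct

empty = FeatStruct()                   # no arguments
parsed = FeatStruct('[a=1]')           # string
mapped = FeatStruct({'a': 1}, b=2)     # mapping plus morefeatures
listed = FeatStruct(['x', 'y'])        # sequence

# Record which concrete class each call produced.
kinds = [type(fs).__name__ for fs in (empty, parsed, mapped, listed)]
print(kinds)
```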
- copy(deep=True)[source]¶
Return a new copy of self. The new copy will not be frozen.
- Parameters:
deep – If true, create a deep copy; if false, create a shallow copy.
- equal_values(other, check_reentrance=False)[source]¶
Return True if self and other assign the same value to every feature. In particular, return True if self[p]==other[p] for every feature path p such that self[p] or other[p] is a base value (i.e., not a nested feature structure).
- Parameters:
check_reentrance – If True, then also return False if there is any difference between the reentrances of self and other.
- Note:
the == operator is equivalent to equal_values() with check_reentrance=True.
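The effect of check_reentrance can be seen by comparing a structure with a shared value against one with two separate but equal values. This is a sketch, and the structures are invented for the example:

```python
from nltk.featstruct import FeatStruct

shared = FeatStruct('[a=(1)[b=1], c->(1)]')    # a and c are one object
separate = FeatStruct('[a=[b=1], c=[b=1]]')    # a and c are two objects

same_values = shared.equal_values(separate)
same_structure = shared.equal_values(separate, check_reentrance=True)
via_operator = (shared == separate)
```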
- freeze()[source]¶
Make this feature structure, and any feature structures it contains, immutable. Note: this method does not attempt to ‘freeze’ any feature value that is not a FeatStruct; it is recommended that you use only immutable feature values.
- frozen()[source]¶
Return True if this feature structure is immutable. Feature structures can be made immutable with the freeze() method. Immutable feature structures may not be made mutable again, but new mutable copies can be produced with the copy() method.
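A sketch of the freeze(), frozen(), and copy() life cycle, with illustrative feature names and values:

```python
from nltk.featstruct import FeatStruct

fs = FeatStruct('[cat=NP, agr=[num=sg]]')
fs.freeze()

# A frozen structure is hashable, so it can serve as a dictionary key.
table = {fs: 'noun phrase'}

try:
    fs['cat'] = 'VP'     # modifying a frozen structure is rejected
    rejected = False
except ValueError:
    rejected = True

# copy() yields a mutable structure; the frozen original is unchanged.
editable = fs.copy()
editable['cat'] = 'VP'
```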
- remove_variables()[source]¶
Return the feature structure that is obtained by deleting any feature whose value is a Variable.
- Return type:
- rename_variables(vars=None, used_vars=(), new_vars=None)[source]¶
- See:
nltk.featstruct.rename_variables()
- class nltk.featstruct.FeatStructReader[source]¶
Bases:
object
- VALUE_HANDLERS = [('read_fstruct_value', re.compile('\\s*(?:\\((\\d+)\\)\\s*)?(\\??[\\w-]+)?(\\[)')), ('read_var_value', re.compile('\\?[a-zA-Z_][a-zA-Z0-9_]*')), ('read_str_value', re.compile('[uU]?[rR]?([\'"])')), ('read_int_value', re.compile('-?\\d+')), ('read_sym_value', re.compile('[a-zA-Z_][a-zA-Z0-9_]*')), ('read_app_value', re.compile('<(app)\\((\\?[a-z][a-z]*)\\s*,\\s*(\\?[a-z][a-z]*)\\)>')), ('read_logic_value', re.compile('<(.*?)(?<!-)>')), ('read_set_value', re.compile('{')), ('read_tuple_value', re.compile('\\('))]¶
A table indicating how feature values should be processed. Each entry in the table is a pair (handler, regexp). The first entry with a matching regexp will have its handler called. Handlers should have the following signature:
def handler(s, position, reentrances, match): ...
and should return a tuple (value, position), where position is the string position where the value ended. (n.b.: order is important here!)
- __init__(features=(*slash*, *type*), fdict_class=<class 'nltk.featstruct.FeatStruct'>, flist_class=<class 'nltk.featstruct.FeatList'>, logic_parser=None)[source]¶
- fromstring(s, fstruct=None)[source]¶
Convert a string representation of a feature structure (as displayed by repr) into a FeatStruct. This process imposes the following restrictions on the string representation:
Feature names cannot contain any of the following: whitespace, parentheses, quote marks, equals signs, dashes, commas, and square brackets. Feature names may not begin with plus signs or minus signs.
Only the following basic feature values are supported: strings, integers, variables, None, and unquoted alphanumeric strings.
For reentrant values, the first mention must specify a reentrance identifier and a value; and any subsequent mentions must use arrows ('->') to reference the reentrance identifier.
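The string conventions listed above can be exercised with a reader built from its default arguments. The input string below is an assumption made up for this sketch:

```python
from nltk.featstruct import FeatStructReader

reader = FeatStructReader()
fs = reader.fromstring(
    "[name='Lee', address=(1)[number=74], spouse=[address->(1)]]"
)

name = fs['name']                        # quoted string value
number = fs[('address', 'number')]       # integer value
# The first mention carries the identifier (1); the arrow refers back to it.
shared = fs['address'] is fs[('spouse', 'address')]
```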
- read_partial(s, position=0, reentrances=None, fstruct=None)[source]¶
Helper function that reads in a feature structure.
- Parameters:
s – The string to read.
position – The position in the string to start parsing.
reentrances – A dictionary from reentrance ids to values. Defaults to an empty dictionary.
- Returns:
A tuple (val, pos) of the feature structure created by parsing and the position where the parsed feature structure ends.
- Return type:
tuple
- class nltk.featstruct.Feature[source]¶
Bases:
object
A feature identifier that’s specialized to put additional constraints, default values, etc.
- property default¶
Default value for this feature.
- property display¶
Custom display location: can be prefix, or slash.
- property name¶
The name of this feature.
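A minimal sketch of creating a Feature and using it as a feature identifier. The feature `kind` and its default value are hypothetical, chosen only to show the three properties:

```python
from nltk.featstruct import Feature, FeatStruct

# A hypothetical special feature; 'prefix' is one of the display locations.
kind = Feature('kind', default='unknown', display='prefix')

fs = FeatStruct()
fs[kind] = 'noun'    # Feature instances can be used as identifiers
fs['num'] = 'sg'     # alongside plain strings
```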
- class nltk.featstruct.RangeFeature[source]¶
Bases:
Feature
- RANGE_RE = re.compile('(-?\\d+):(-?\\d+)')¶
- nltk.featstruct.conflicts(fstruct1, fstruct2, trace=0)[source]¶
Return a list of the feature paths of all features which are assigned incompatible values by fstruct1 and fstruct2.
- Return type:
list(tuple)
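For instance, two structures that disagree on a single nested feature yield one path, as this sketch with illustrative feature names shows:

```python
from nltk.featstruct import FeatStruct, conflicts

fs1 = FeatStruct('[cat=NP, agr=[num=sg, per=3]]')
fs2 = FeatStruct('[cat=NP, agr=[num=pl, per=3]]')

# Each conflict is reported as a feature path (a tuple of identifiers).
clashes = conflicts(fs1, fs2)
print(clashes)
```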
- nltk.featstruct.subsumes(fstruct1, fstruct2)[source]¶
Return True if fstruct1 subsumes fstruct2. I.e., return True if unifying fstruct1 with fstruct2 would result in a feature structure equal to fstruct2.
- Return type:
bool
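Subsumption is directional, as the following sketch illustrates (feature names are assumptions): a more general structure subsumes a more specific one, but not the reverse.

```python
from nltk.featstruct import FeatStruct, subsumes

general = FeatStruct('[cat=NP]')
specific = FeatStruct('[cat=NP, num=sg]')

forward = subsumes(general, specific)     # general subsumes specific
backward = subsumes(specific, general)    # but not the other way round
```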
- nltk.featstruct.unify(fstruct1, fstruct2, bindings=None, trace=False, fail=None, rename_vars=True, fs_class='default')[source]¶
Unify fstruct1 with fstruct2, and return the resulting feature structure. This unified feature structure is the minimal feature structure that contains all feature value assignments from both fstruct1 and fstruct2, and that preserves all reentrancies.
If no such feature structure exists (because fstruct1 and fstruct2 specify incompatible values for some feature), then unification fails, and unify returns None.
Bound variables are replaced by their values. Aliased variables are replaced by their representative variable (if unbound) or the value of their representative variable (if bound). I.e., if variable v is in bindings, then v is replaced by bindings[v]. This will be repeated until the variable is replaced by an unbound variable or a non-variable value.
Unbound variables are bound when they are unified with values; and aliased when they are unified with variables. I.e., if variable v is not in bindings, and is unified with a variable or value x, then bindings[v] is set to x.
If bindings is unspecified, then all variables are assumed to be unbound. I.e., bindings defaults to an empty dict.
>>> from nltk.featstruct import FeatStruct
>>> FeatStruct('[a=?x]').unify(FeatStruct('[b=?x]'))
[a=?x, b=?x2]
- Parameters:
bindings (dict(Variable -> any)) – A set of variable bindings to be used and updated during unification.
trace (bool) – If true, generate trace output.
rename_vars (bool) – If True, then rename any variables in fstruct2 that are also used in fstruct1, in order to avoid collisions on variable names.