nltk.toolbox module

Module for reading, writing and manipulating Toolbox databases and settings files.

class nltk.toolbox.StandardFormat

Bases: object

Class for reading and processing standard format marker files and strings.

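__init__(filename=None, encoding=None)

If filename is given, that file is opened immediately, as with open(); encoding is the character encoding used when opening files.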

close()

Close a previously opened standard format marker file or string.

fields(strip=True, unwrap=True, encoding=None, errors='strict', unicode_fields=None)

Return an iterator over the fields, yielding each field as a (marker, value) tuple after the strip and unwrap processing described below. Both marker and value are unicode strings if an encoding was specified in the fields() method; otherwise they are non-unicode strings.

Parameters:

strip (bool) – Strip trailing whitespace from the last line of each field.
unwrap (bool) – Convert each run of newlines in a field to a single space.
encoding (str or None) – Name of an encoding to use. If it is specified, the fields() method returns unicode strings rather than non-unicode strings.
errors (str) – Error handling scheme for the codec, as accepted by the built-in decode() string method.
unicode_fields (sequence) – Set of marker names whose values are UTF-8 encoded. Ignored if encoding is None. If the whole file is UTF-8 encoded, set encoding='utf8' and leave unicode_fields at its default value of None.

Return type:

iter(tuple(str, str))
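As a rough illustration of the strip and unwrap options, the sketch below post-processes (marker, value) pairs the way those parameter descriptions suggest: unwrap first, then strip. cook_fields is a hypothetical helper, not part of nltk.toolbox; it assumes the input pairs come from something like raw_fields() and leaves out encoding handling entirely.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

Field = Tuple[Optional[str], str]

# A run of one or more newlines becomes a single space when unwrapping.
UNWRAP_PAT = re.compile(r"\n+")


def cook_fields(raw: Iterable[Field], strip: bool = True,
                unwrap: bool = True) -> Iterator[Field]:
    """Apply unwrap, then strip, to each raw (marker, value) pair."""
    for marker, value in raw:
        if unwrap:
            value = UNWRAP_PAT.sub(" ", value)
        if strip:
            # rstrip() trims whitespace at the end of the value,
            # including any trailing blank lines.
            value = value.rstrip()
        yield marker, value


raw = [("lx", "kaa"), ("gp", "nek i pas\n"), ("ge", "strangle\n  second line  ")]
for marker, value in cook_fields(raw):
    print(marker, repr(value))
```

With both options on, a multi-line value collapses to one line; passing unwrap=False keeps the internal line break and only trims the end.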

open(sfm_file)

Open a standard format marker file for sequential reading.

Parameters:

sfm_file (str) – name of the standard format marker input file

open_string(s)

Open a standard format marker string for sequential reading.

Parameters:

s (str) – string to parse as a standard format marker input file

raw_fields()

Return an iterator over the fields, yielding each field as a (marker, value) tuple. Line breaks and trailing whitespace are preserved, except for the final newline in each field.

Return type:

iter(tuple(str, str))

class nltk.toolbox.ToolboxData

Bases: StandardFormat

parse(grammar=None, **kwargs)
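parse() is listed here without a description. In the NLTK sources we have seen, calling it without a grammar groups the fields into a toolbox_data element holding a header element and one record element per occurrence of a key marker (by default the first marker that does not start with an underscore), while passing a grammar switches to chunk parsing. Treat that as an assumption to verify against your NLTK version. The hypothetical group_records function below sketches only the record grouping, using xml.etree.ElementTree.TreeBuilder.

```python
from typing import Iterable, Optional, Tuple
from xml.etree.ElementTree import Element, TreeBuilder


def group_records(fields: Iterable[Tuple[str, str]],
                  key: Optional[str] = None) -> Element:
    """Group flat fields into toolbox_data/header/record elements."""
    builder = TreeBuilder()
    builder.start("toolbox_data", {})
    builder.start("header", {})
    in_records = False
    for marker, value in fields:
        # Assumed default key: the first marker that does not start with "_".
        if key is None and not in_records and not marker.startswith("_"):
            key = marker
        if marker == key:
            # The key marker closes the header (first time) or the previous record.
            builder.end("record" if in_records else "header")
            in_records = True
            builder.start("record", {})
        builder.start(marker, {})
        builder.data(value)
        builder.end(marker)
    builder.end("record" if in_records else "header")
    builder.end("toolbox_data")
    return builder.close()


fields = [("_sh", "v3.0  400  Rotokas Dictionary"), ("_DateStampHasFourDigitYear", ""),
          ("lx", "kaa"), ("ps", "V.A"), ("ge", "gag"),
          ("lx", "kaa"), ("ps", "V.B"), ("ge", "strangle")]
root = group_records(fields)
print([child.tag for child in root])
print([ge.text for ge in root.findall("record/ge")])
```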

class nltk.toolbox.ToolboxSettings

Bases: StandardFormat

Base class for settings files.

__init__()

parse(encoding=None, errors='strict', **kwargs)

Return the contents of a Toolbox settings file as a nested element structure.

Parameters:

encoding (str) – encoding used by the settings file
errors (str) – Error handling scheme for the codec, as accepted by the built-in decode() method.
kwargs (dict) – Keyword arguments passed to StandardFormat.fields()

Return type:

ElementTree._ElementInterface
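Toolbox settings files open a nested block with a \+marker field and close it with a matching \-marker field, so the marker arrives from the field reader as "+name" or "-name". Assuming that convention, the hypothetical nest_settings function below sketches how a flat sequence of (marker, value) fields can be turned into a nested element structure like the one parse() returns, using only xml.etree.ElementTree.

```python
from typing import Iterable, Tuple
from xml.etree.ElementTree import Element, TreeBuilder


def nest_settings(fields: Iterable[Tuple[str, str]]) -> Element:
    """Build nested elements from +name ... -name marker blocks."""
    builder = TreeBuilder()
    for marker, value in fields:
        if marker.startswith("+"):
            # "+name": open an element that encloses the following fields.
            builder.start(marker[1:], {})
            builder.data(value)
        elif marker.startswith("-"):
            # "-name": close the element opened by the matching "+name".
            builder.end(marker[1:])
        else:
            # Any other marker becomes a leaf element holding its value.
            builder.start(marker, {})
            builder.data(value)
            builder.end(marker)
    return builder.close()


fields = [("+expset", ""), ("+expMDF", ""), ("+rtfPageSetup", ""),
          ("paperSize", "letter"), ("-rtfPageSetup", ""), ("-expMDF", ""),
          ("-expset", "")]
root = nest_settings(fields)
print(root.tag, root.find("expMDF/rtfPageSetup/paperSize").text)
```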

nltk.toolbox.add_blank_lines(tree, blanks_before, blanks_between)

Add blank lines before all elements and subelements specified in blanks_before, and between consecutive elements of the same type specified in blanks_between.

Parameters:

tree (ElementTree._ElementInterface) – toolbox data in an elementtree structure
blanks_before (dict(tuple)) – elements and subelements to add blank lines before
blanks_between (dict(tuple)) – elements and subelements to add blank lines between
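To make the two dictionaries concrete, here is a standard-library sketch. It assumes that both map a parent element's tag to a tuple of child tags, that blanks_before applies when a listed child follows a sibling with a different tag, that blanks_between applies between consecutive listed siblings with the same tag, and that a blank line is represented by appending a newline to the text of the last field written before it. add_blank_lines_sketch is hypothetical and is not NLTK's implementation.

```python
from typing import Collection, Mapping
from xml.etree.ElementTree import Element, SubElement


def add_blank_lines_sketch(tree: Element,
                           blanks_before: Mapping[str, Collection[str]],
                           blanks_between: Mapping[str, Collection[str]]) -> None:
    before = blanks_before.get(tree.tag, ())
    between = blanks_between.get(tree.tag, ())
    last = None
    for elem in tree:
        if last is not None:
            same_tag = last.tag == elem.tag
            if (not same_tag and elem.tag in before) or (same_tag and elem.tag in between):
                # The blank line is stored as an extra newline on the last
                # element written before elem (last, or its last descendant).
                target = list(last.iter())[-1]
                target.text = (target.text or "") + "\n"
        if len(elem):
            add_blank_lines_sketch(elem, blanks_before, blanks_between)
        last = elem


record = Element("record")
for tag, text in [("lx", "kaa"), ("ps", "V"), ("sn", "1"), ("ge", "gag"),
                  ("sn", "2"), ("ge", "strangle")]:
    SubElement(record, tag).text = text
add_blank_lines_sketch(record, {"record": ("sn",)}, {"record": ()})
print([(field.tag, field.text) for field in record])
```

Here every \sn that follows a different field gets a blank line in front of it, which a serializer writing "\marker value\n" lines would reproduce as an empty line.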

nltk.toolbox.add_default_fields(elem, default_fields)

Add the elements and subelements specified in default_fields as empty elements, where they are not already present.

Parameters:

elem (ElementTree._ElementInterface) – toolbox data in an elementtree structure
default_fields (dict(tuple)) – fields to add to each type of element and subelement
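A minimal sketch of the same idea with xml.etree.ElementTree, assuming default_fields maps an element's tag to a tuple of child tags and that missing children are appended as empty elements at the end. add_default_fields_sketch is a hypothetical name.

```python
from typing import Mapping, Sequence
from xml.etree.ElementTree import Element, SubElement


def add_default_fields_sketch(elem: Element,
                              default_fields: Mapping[str, Sequence[str]]) -> None:
    # Append an empty child for each default field this element lacks.
    for field in default_fields.get(elem.tag, ()):
        if elem.find(field) is None:
            SubElement(elem, field)
    # Recurse so that nested elements (e.g. each record) get their defaults too.
    for child in elem:
        add_default_fields_sketch(child, default_fields)


db = Element("toolbox_data")
record = SubElement(db, "record")
SubElement(record, "lx").text = "kaa"
add_default_fields_sketch(db, {"record": ("lx", "ps", "ge")})
print([child.tag for child in record])
```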

nltk.toolbox.demo()

nltk.toolbox.remove_blanks(elem)

Remove all elements and subelements with no text and no child elements.

Parameters:

elem (ElementTree._ElementInterface) – toolbox data in an elementtree structure
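The following sketch removes blank fields bottom-up, so an element whose only children were blank is itself removed once they are gone. remove_blanks_sketch is hypothetical and uses only the standard library; like the description above, it treats both missing and empty text as "no text".

```python
from xml.etree.ElementTree import Element, SubElement


def remove_blanks_sketch(elem: Element) -> None:
    kept = []
    for child in elem:
        # Clean the child first: it may become empty once its own blanks go.
        remove_blanks_sketch(child)
        if child.text or len(child) > 0:
            kept.append(child)
    elem[:] = kept


record = Element("record")
SubElement(record, "lx").text = "kaa"
SubElement(record, "ps")             # no text, no children
SubElement(record, "ge").text = ""   # empty text counts as blank
sense = SubElement(record, "sn")     # its only child is blank, so sn goes too
SubElement(sense, "de")
remove_blanks_sketch(record)
print([child.tag for child in record])
```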

nltk.toolbox.sort_fields(elem, field_orders)

Sort the elements and subelements in the order specified in field_orders.

Parameters:

elem (ElementTree._ElementInterface) – toolbox data in an elementtree structure
field_orders (dict(tuple)) – order of fields for each type of element and subelement
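A standard-library sketch of the sorting, assuming field_orders maps an element's tag to a tuple of child tags. Where NLTK places children that field_orders does not mention is an assumption here: this sketch moves them after the listed ones and keeps their original relative order. sort_fields_sketch and _sort are hypothetical helpers.

```python
from typing import Dict, Mapping, Sequence
from xml.etree.ElementTree import Element, SubElement


def sort_fields_sketch(elem: Element, field_orders: Mapping[str, Sequence[str]]) -> None:
    # Turn each tuple of tags into a tag -> position lookup table.
    positions = {tag: {name: i for i, name in enumerate(order)}
                 for tag, order in field_orders.items()}
    _sort(elem, positions)


def _sort(elem: Element, positions: Dict[str, Dict[str, int]]) -> None:
    order = positions.get(elem.tag)
    if order is not None:
        # Assumption: unlisted tags go last; ties keep their original order.
        ranked = sorted(enumerate(elem),
                        key=lambda pair: (order.get(pair[1].tag, len(order)), pair[0]))
        elem[:] = [child for _, child in ranked]
    for child in elem:
        if len(child):
            _sort(child, positions)


record = Element("record")
for tag in ("ge", "xx", "lx", "ps"):
    SubElement(record, tag)
sort_fields_sketch(record, {"record": ("lx", "ps", "ge")})
print([child.tag for child in record])
```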

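nltk.toolbox.to_settings_string(tree, encoding=None, errors='strict', unicode_fields=None)

Return a string with the settings-file representation of tree, the reverse of ToolboxSettings.parse(): an element with children is written as a \+marker line, its children, and a closing \-marker line; an element without children is written as a single \marker value line.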
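Assuming to_settings_string is the inverse of ToolboxSettings.parse(), writing an element with children as a \+marker line, its children and a closing \-marker line, and any other element as a single \marker value line, the hypothetical to_settings_string_sketch below illustrates that format with the standard library. It takes the root Element directly and ignores encoding, errors and unicode_fields; NLTK's own function appears to expect an ElementTree wrapping the root, so check that before swapping one for the other.

```python
from typing import List
from xml.etree.ElementTree import Element, SubElement


def _settings_lines(node: Element, out: List[str]) -> None:
    tag, text = node.tag, node.text
    if len(node) == 0:
        # Leaf element: "\tag value", or just "\tag" when it has no text.
        out.append(f"\\{tag} {text}\n" if text else f"\\{tag}\n")
    else:
        # Element with children: "\+tag [value]", the children, then "\-tag".
        out.append(f"\\+{tag} {text}\n" if text else f"\\+{tag}\n")
        for child in node:
            _settings_lines(child, out)
        out.append(f"\\-{tag}\n")


def to_settings_string_sketch(root: Element) -> str:
    out: List[str] = []
    _settings_lines(root, out)
    return "".join(out)


root = Element("expset")
mdf = SubElement(root, "expMDF")
SubElement(mdf, "paperSize").text = "letter"
print(to_settings_string_sketch(root), end="")
```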

nltk.toolbox.to_sfm_string(tree, encoding=None, errors='strict', unicode_fields=None)

Return a string with a standard format representation of the toolbox data in tree (tree can be a toolbox database or a single record).

Parameters:

tree (ElementTree._ElementInterface) – flat representation of toolbox data (whole database or single record)
encoding (str) – Name of an encoding to use.
errors (str) – Error handling scheme for the codec, as accepted by the built-in encode() string method.
unicode_fields (dict(str) or set(str)) – Set of marker names whose values are written as UTF-8 rather than in encoding.

Return type:

str
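A standard-library sketch of the serialization for the default case, assuming the toolbox_data/header/record layout that ToolboxData.parse() is described as producing, and ignoring encoding, errors and unicode_fields. to_sfm_string_sketch is hypothetical; in it, the header and records are separated by a blank line, and a marker is followed by a space only when its value contains non-whitespace text.

```python
import re
from xml.etree.ElementTree import Element, SubElement

HAS_VALUE = re.compile(r"\S")  # a value with at least one non-space character


def to_sfm_string_sketch(tree: Element) -> str:
    if tree.tag == "record":
        # Wrap a lone record so both kinds of input are handled alike.
        root = Element("toolbox_data")
        root.append(tree)
        tree = root
    if tree.tag != "toolbox_data":
        raise ValueError("not a toolbox_data element structure")
    chunks = []
    for rec in tree:  # the header and each record
        lines = []
        for field in rec:
            value = field.text or ""
            sep = " " if HAS_VALUE.search(value) else ""
            lines.append(f"\\{field.tag}{sep}{value}\n")
        chunks.append("".join(lines))
    # A blank line separates consecutive chunks.
    return "\n".join(chunks)


db = Element("toolbox_data")
header = SubElement(db, "header")
SubElement(header, "_sh").text = "v3.0  400  Rotokas Dictionary"
record = SubElement(db, "record")
SubElement(record, "lx").text = "kaa"
SubElement(record, "ps")
print(to_sfm_string_sketch(db), end="")
```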
