nltk.toolbox.StandardFormat

class nltk.toolbox.StandardFormat

Bases: object

Class for reading and processing standard format marker files and strings.

__init__(filename=None, encoding=None)

open(sfm_file)

Open a standard format marker file for sequential reading.

Parameters: sfm_file (str) – name of the standard format marker input file
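As a minimal sketch of the open/read/close cycle, the example below writes a small sample file to a temporary directory and reads it back. It assumes NLTK is installed; the file name sample.sfm and the two-field record are made up for illustration.

```python
import os
import tempfile

from nltk.toolbox import StandardFormat

# Hypothetical sample data: two fields in standard format marker notation.
sample = "\\lx kaa\n\\ge gag\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, used only for this illustration.
    path = os.path.join(tmp_dir, "sample.sfm")
    with open(path, "w", encoding="ascii") as handle:
        handle.write(sample)

    sf = StandardFormat()
    sf.open(path)
    try:
        # Read every (marker, value) pair before the file is closed.
        file_fields = list(sf.raw_fields())
    finally:
        sf.close()

print(file_fields)
```

Collecting the fields into a list before calling close() matters here, because the iterator reads from the open file lazily.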

open_string(s)

Open a standard format marker string for sequential reading.

Parameters: s (str) – string to parse as a standard format marker input file

raw_fields()

Return an iterator that yields each field as a (marker, value) tuple. Line breaks and trailing whitespace are preserved, except for the final newline in each field.

Return type: iter(tuple(str, str))
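The following sketch illustrates how raw_fields() treats a value that spans more than one line. It assumes NLTK is installed, and the record contents are invented for the example.

```python
from nltk.toolbox import StandardFormat

# Hypothetical record: the \ge value continues on a second line.
text = "\\lx kaa\n\\ge gag\nchoke\n"

sf = StandardFormat()
sf.open_string(text)
raw = list(sf.raw_fields())
sf.close()

# repr() makes the preserved line break inside the \ge value visible.
for marker, value in raw:
    print(marker, repr(value))
```

The marker is returned without its leading backslash, and the continuation line stays attached to the preceding marker with its internal newline intact.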

fields(strip=True, unwrap=True, encoding=None, errors='strict', unicode_fields=None)

Return an iterator that yields each field as a (marker, value) tuple. The marker and value are unicode strings if an encoding is passed to fields(); otherwise they are non-unicode strings.

Parameters

strip (bool) – Strip trailing whitespace from the last line of each field.
unwrap (bool) – Convert newlines in a field to spaces.
encoding (str or None) – Name of an encoding to use. If it is specified, fields() returns unicode strings rather than non-unicode strings.
errors (str) – Error-handling scheme for the codec, as accepted by the built-in decode() string method.
unicode_fields (sequence) – Set of marker names whose values are UTF-8 encoded. Ignored if encoding is None. If the whole file is UTF-8 encoded, set encoding='utf8' and leave unicode_fields at its default value of None.

Return type: iter(tuple(str, str))
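To make the effect of strip and unwrap concrete, the sketch below reads the same string twice, once with the defaults and once with both options turned off. It assumes NLTK is installed; read_fields is a hypothetical helper written for this example, not part of NLTK, and the record contents are invented.

```python
from nltk.toolbox import StandardFormat

# Hypothetical record with trailing spaces and a wrapped \ge value.
text = "\\lx kaa  \n\\ge gag\nchoke  \n"


def read_fields(s, **kwargs):
    """Hypothetical helper: parse s and return all fields as a list."""
    sf = StandardFormat()
    sf.open_string(s)
    try:
        return list(sf.fields(**kwargs))
    finally:
        sf.close()


# Defaults: newlines inside a value become spaces, trailing whitespace is removed.
default_fields = read_fields(text)

# Both options off: values keep their line breaks and trailing whitespace.
kept_fields = read_fields(text, strip=False, unwrap=False)

print(default_fields)
print(kept_fields)
```

The string is reopened for each call because a field iterator consumes the underlying input as it goes.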

close()

Close a previously opened standard format marker file or string.
