nltk.toolbox.StandardFormat
- class nltk.toolbox.StandardFormat
Bases: object
Class for reading and processing standard format marker files and strings.
- open(sfm_file)
Open a standard format marker file for sequential reading.
- Parameters
sfm_file (str) – name of the standard format marker input file
- open_string(s)
Open a standard format marker string for sequential reading.
- Parameters
s (str) – string to parse as a standard format marker input file
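The two methods above differ only in where the text comes from: open() takes the name of a file, while open_string() takes the text itself. A minimal sketch of that idea, assuming the reader simply keeps one text stream and later iterates over it line by line. MiniSFMReader and its lines() method are hypothetical names for illustration, not part of NLTK, and the UTF-8 file encoding is an assumption of the sketch.

```python
import io
import os
import tempfile


class MiniSFMReader:
    """Hypothetical reader that keeps a single text stream for sequential reading."""

    def __init__(self):
        self._file = None

    def open(self, sfm_file):
        # Read from a named file on disk (UTF-8 is an assumption of this sketch).
        self._file = open(sfm_file, encoding="utf-8")

    def open_string(self, s):
        # Wrap the string so it can be iterated line by line, just like a file.
        self._file = io.StringIO(s)

    def lines(self):
        # Yield the remaining lines of the current stream, in order.
        yield from self._file

    def close(self):
        self._file.close()


sample = "\\lx kaa\n\\ps N\n\\ge gag\n"

# Read the sample from an in-memory string.
reader = MiniSFMReader()
reader.open_string(sample)
from_string = list(reader.lines())
reader.close()

# Write the same text to a temporary file, then read it back through open().
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name
reader.open(path)
from_file = list(reader.lines())
reader.close()
os.remove(path)

print(from_string == from_file)  # True
```

With NLTK itself, the corresponding entry points are StandardFormat.open(sfm_file) and StandardFormat.open_string(s); either one prepares the input that raw_fields() and fields() then consume.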
- raw_fields()
Return an iterator that yields each field as a (marker, value) tuple. Line breaks and trailing whitespace are preserved, except for the final newline in each field.
- Return type
iter(tuple(str, str))
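The grouping behind raw_fields() can be sketched as follows, assuming that a line beginning with a backslash marker (such as \lx) starts a new field and that any other line continues the current one. raw_fields_sketch and LINE_PAT are hypothetical names; this is a simplified illustration of the documented behavior, not NLTK's implementation.

```python
import io
import re

# Hypothetical pattern: an optional "\marker" plus following whitespace, then the value text.
LINE_PAT = re.compile(r"^(?:\\(\S+)\s*)?(.*)$")


def raw_fields_sketch(stream):
    """Yield (marker, value) tuples from an iterable of lines (simplified illustration)."""
    marker, value_lines = None, []
    for line in stream:
        # Drop only the line terminator; trailing spaces stay in the value.
        line_marker, line_value = LINE_PAT.match(line.rstrip("\n")).groups()
        if line_marker is not None:
            # A new marker closes the previous field (if any) and starts a new one.
            if value_lines:
                yield (marker, "\n".join(value_lines))
            marker, value_lines = line_marker, [line_value]
        else:
            # Lines without a marker continue the current field.
            value_lines.append(line_value)
    if value_lines:
        yield (marker, "\n".join(value_lines))


sample = "\\lx kaa\n\\ps N  \n\\ge gag\nmore gloss\n"
for field in raw_fields_sketch(io.StringIO(sample)):
    print(field)
# ('lx', 'kaa')
# ('ps', 'N  ')
# ('ge', 'gag\nmore gloss')
```

The two trailing spaces after N and the line break inside the \ge value survive; only the final newline of each field is dropped.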
- fields(strip=True, unwrap=True, encoding=None, errors='strict', unicode_fields=None)
Return an iterator that yields each field as a (marker, value) tuple, where marker and value are unicode strings if an encoding was passed to fields(); otherwise they are non-unicode strings.
- Parameters
strip (bool) – Strip trailing whitespace from the last line of each field.
unwrap (bool) – Convert newlines in a field to spaces.
encoding (str or None) – Name of an encoding to use. If it is specified, the fields() method returns unicode strings rather than non-unicode strings.
errors (str) – Error-handling scheme for the codec, as accepted by the errors argument of decode().
unicode_fields (sequence) – Set of marker names whose values are UTF-8 encoded. Ignored if encoding is None. If the whole file is UTF-8 encoded, set encoding='utf8' and leave unicode_fields at its default value of None.
- Return type
iter(tuple(str, str))
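The strip and unwrap options can be read as post-processing applied to each raw (marker, value) pair. A minimal sketch, assuming unwrapping replaces each run of newlines with a single space and stripping is an rstrip() of the whole value, which only affects its last line. fields_sketch and UNWRAP_PAT are hypothetical names, the input list is hand-written sample data, and the encoding, errors and unicode_fields parameters are not modeled.

```python
import re

# Assumption of this sketch: every run of one or more newlines becomes a single space.
UNWRAP_PAT = re.compile(r"\n+")


def fields_sketch(raw_fields, strip=True, unwrap=True):
    """Post-process (marker, value) tuples as the strip/unwrap options describe."""
    for marker, value in raw_fields:
        if unwrap:
            value = UNWRAP_PAT.sub(" ", value)
        if strip:
            # rstrip() only touches the end of the value, i.e. its last line.
            value = value.rstrip()
        yield (marker, value)


# Hand-written raw fields, shaped like the output of raw_fields().
raw = [("lx", "kaa"), ("ps", "N  "), ("ge", "gag\nmore gloss  ")]

print(list(fields_sketch(raw)))
# [('lx', 'kaa'), ('ps', 'N'), ('ge', 'gag more gloss')]
print(list(fields_sketch(raw, unwrap=False)))
# [('lx', 'kaa'), ('ps', 'N'), ('ge', 'gag\nmore gloss')]
```

Unwrapping before stripping means that a trailing newline, once turned into a space, is removed as well.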