nltk.corpus.reader.switchboard module

class nltk.corpus.reader.switchboard.SwitchboardCorpusReader

Bases: CorpusReader

__init__(root, tagset=None)

Parameters:

root (PathPointer or str) – A path pointer identifying the root directory for this corpus. If a string is specified, then it will be converted to a PathPointer automatically.
fileids – A list of the files that make up this corpus. This list can either be specified explicitly, as a list of strings; or implicitly, as a regular expression over file paths. The absolute path for each file will be constructed by joining the reader’s root to each file name.
encoding – The default unicode encoding for the files that make up the corpus. The value of encoding can be any of the following:
- A string: encoding is the encoding name for all files.
- A dictionary: encoding[file_id] is the encoding name for the file whose identifier is file_id. If file_id is not in encoding, then the file contents will be processed using non-unicode byte strings.
- A list: encoding should be a list of (regexp, encoding) tuples. The encoding for a file whose identifier is file_id will be the encoding value for the first tuple whose regexp matches the file_id. If no tuple’s regexp matches the file_id, the file contents will be processed using non-unicode byte strings.
- None: the file contents of all files will be processed using non-unicode byte strings.
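tagset – The name of the tagset used by this corpus, to be used for normalizing or converting the POS tags returned by the tagged_...() methods.

Note: the fileids and encoding entries above come from the base CorpusReader constructor documentation. As its signature shows, SwitchboardCorpusReader.__init__ accepts only root and tagset.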
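The encoding rules listed above can be sketched as a small lookup function. This is a hypothetical helper written for illustration (`resolve_encoding` is not an NLTK function). It assumes that the regexps in the list form are tested with `re.match`, i.e. anchored at the start of the file identifier, and it returns `None` to stand for "process the file as byte strings". The file names and encodings in the example are made up.

```python
import re
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical alias covering the four documented forms of `encoding`.
EncodingSpec = Union[str, Dict[str, str], List[Tuple[str, str]], None]


def resolve_encoding(encoding: EncodingSpec, file_id: str) -> Optional[str]:
    """Return the encoding name for file_id, or None for raw byte strings.

    Illustrative sketch of the documented rules, not NLTK's own code.
    """
    if encoding is None:
        # None: every file is processed as byte strings.
        return None
    if isinstance(encoding, str):
        # A string: one encoding name for all files.
        return encoding
    if isinstance(encoding, dict):
        # A dictionary: per-file lookup; a missing file_id means bytes.
        return encoding.get(file_id)
    # A list of (regexp, encoding) tuples: the first matching regexp wins.
    for regexp, enc in encoding:
        if re.match(regexp, file_id):
            return enc
    # No tuple matched: fall back to byte strings.
    return None


# Made-up rules and file names, for illustration only.
rules = [(r".*\.pos$", "utf8"), (r".*\.txt$", "latin-1")]
print(resolve_encoding(rules, "discourse01.pos"))  # utf8
print(resolve_encoding(rules, "notes.txt"))        # latin-1
print(resolve_encoding(rules, "README"))           # None
```

Order matters in the list form: because the first matching tuple wins, a more specific pattern should be listed before a more general one.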

discourses()

tagged_discourses(tagset=False)

tagged_turns(tagset=None)

tagged_words(tagset=None)

turns()

words()
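The method names above suggest three levels of view: discourses made of turns, turns made of words, and tagged_ variants whose tokens are (word, tag) pairs. The sketch below walks a structure shaped like that assumption using plain Python lists, so it runs without NLTK or the corpus data. The sample discourse, its tags, and the helpers `flatten_turns`, `flatten_words` and `strip_tags` are all hypothetical.

```python
from typing import List, Sequence, Tuple

# Made-up sample shaped like tagged_discourses() output (assumed nesting):
# a list of discourses, each a list of turns, each turn a list of (word, tag).
TaggedTurn = List[Tuple[str, str]]
sample_tagged_discourses: List[List[TaggedTurn]] = [
    [
        [("uh", "UH"), ("hello", "UH")],
        [("hi", "UH"), ("there", "RB")],
    ],
    [
        [("okay", "UH")],
    ],
]


def flatten_turns(discourses: Sequence[Sequence[list]]) -> List[list]:
    """Concatenate the turns of every discourse, preserving order."""
    return [turn for discourse in discourses for turn in discourse]


def flatten_words(turns: Sequence[list]) -> list:
    """Concatenate the tokens of every turn, preserving order."""
    return [token for turn in turns for token in turn]


def strip_tags(tagged_tokens: Sequence[Tuple[str, str]]) -> List[str]:
    """Drop the POS tag from each (word, tag) pair."""
    return [word for word, _tag in tagged_tokens]


turns = flatten_turns(sample_tagged_discourses)
tagged_words = flatten_words(turns)
print(len(turns))                # 3
print(strip_tags(tagged_words))  # ['uh', 'hello', 'hi', 'there', 'okay']
```

With the corpus data installed, `from nltk.corpus import switchboard` provides a reader for this corpus, and its `tagged_discourses()` output could be passed to the same helpers if it nests as assumed here; that path is not exercised by the sketch.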

class nltk.corpus.reader.switchboard.SwitchboardTurn

Bases: list

A specialized list object used to encode Switchboard utterances. The elements of the list are the words in the utterance, and two attributes, speaker and id, are provided to retrieve the speaker identifier and the utterance id. Note that utterance ids are only unique within a given discourse.

__init__(words, speaker, id)
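Because utterance ids are only unique within one discourse, a key that identifies a turn across the whole corpus also needs the discourse's position. The sketch below uses `TurnStandIn`, a hypothetical minimal list subclass that mirrors the documented constructor `(words, speaker, id)` so the example runs without NLTK; it is not NLTK's class. The helper `turn_keys` and the sample speakers, words and ids are likewise invented for illustration.

```python
from typing import Iterable, List, Sequence, Tuple


class TurnStandIn(list):
    """Hypothetical stand-in mirroring SwitchboardTurn(words, speaker, id)."""

    def __init__(self, words: Iterable[str], speaker: str, id: int) -> None:
        super().__init__(words)  # the list elements are the words
        self.speaker = speaker
        self.id = id


def turn_keys(discourses: Sequence[Sequence[TurnStandIn]]) -> List[Tuple[int, int]]:
    """Build a (discourse index, utterance id) key for every turn.

    The discourse index disambiguates ids that repeat across discourses.
    """
    return [
        (d_index, turn.id)
        for d_index, discourse in enumerate(discourses)
        for turn in discourse
    ]


discourses = [
    [TurnStandIn(["hello"], "A", 1), TurnStandIn(["hi"], "B", 2)],
    [TurnStandIn(["okay"], "A", 1)],  # id 1 reappears in another discourse
]
keys = turn_keys(discourses)
print(keys)                                                 # [(0, 1), (0, 2), (1, 1)]
print([t.speaker for d in discourses for t in d])           # ['A', 'B', 'A']
```

The bare ids `[1, 2, 1]` collide, while the composite keys do not. Since real turns also expose `speaker` and `id`, the same helper should accept turns grouped by discourse from the NLTK reader, although that is not tested here.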
