nltk.corpus.reader.knbc module

class nltk.corpus.reader.knbc.KNBCorpusReader

Bases: SyntaxCorpusReader

This class implements:

__init__, which specifies the location of the corpus and a method for detecting the sentence blocks in corpus files.
_read_block, which reads a block from the input stream.
_word, which takes a block and returns a list of lists of words.
_tag, which takes a block and returns a list of lists of tagged words.
_parse, which takes a block and returns a list of parsed sentences.

The structure of tagged words:

tagged_word = (word(str), tags(tuple))
tags = (surface, reading, lemma, pos1, posid1, pos2, posid2, pos3, posid3, others …)
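
The structure above can be unpacked by position. The following is a minimal sketch, assuming the tags arrive as a tuple in exactly the documented order. The helper name tag_fields and the sample values are hypothetical and are not part of NLTK or taken from the corpus. It is worth comparing against the output of knbc.tagged_words() on the installed version before relying on the field order.

```python
# Field names in the order given by the documented tags tuple.
FIELD_NAMES = (
    "surface", "reading", "lemma",
    "pos1", "posid1", "pos2", "posid2", "pos3", "posid3",
)


def tag_fields(tagged_word):
    """Split a (word, tags) pair into the word and a dict of named tag fields.

    Hypothetical helper: assumes tags is a tuple laid out as documented.
    Anything after posid3 is collected under the key "others".
    """
    word, tags = tagged_word
    fields = dict(zip(FIELD_NAMES, tags))
    fields["others"] = tuple(tags[len(FIELD_NAMES):])
    return word, fields


# Made-up sample in the documented shape (not real corpus output).
sample = (
    "太郎",
    ("太郎", "たろう", "太郎", "名詞", "6", "人名", "5", "*", "0", "NIL"),
)
word, fields = tag_fields(sample)
# fields["reading"] now holds the second tag, fields["pos1"] the fourth.
```

Keeping the field names in one tuple means a single place to adjust if the real layout turns out to differ from the documented one.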

Usage example

>>> from nltk.corpus.util import LazyCorpusLoader
>>> from nltk.corpus.reader.knbc import KNBCorpusReader
>>> knbc = LazyCorpusLoader(
...     'knbc/corpus1',
...     KNBCorpusReader,
...     r'.*/KN.*',
...     encoding='euc-jp',
... )

>>> len(knbc.sents()[0])
9

__init__(root, fileids, encoding='utf8', morphs2str=<function <lambda>>)

Initialize KNBCorpusReader. morphs2str is a function that converts a morph list to a str, which is used for the tree representation built by _parse().
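
A custom morphs2str callable might look like the sketch below. It assumes that each morph is a (word, tags) pair like the tagged words described earlier and that an end-of-sentence marker "EOS" can appear in the list and should be skipped. Both points are assumptions, and the name join_surfaces is hypothetical.

```python
def join_surfaces(morphs, sep="/"):
    """Convert a morph list to a single str by joining the surface forms.

    Assumption: each morph is a (word, tags) pair and "EOS" marks the end
    of a sentence, so it is left out of the label.
    """
    return sep.join(m[0] for m in morphs if m[0] != "EOS")


# Made-up morph list for illustration (not real corpus output).
sample_morphs = [("太郎", ()), ("は", ()), ("EOS", ())]
label = join_surfaces(sample_morphs)

# With the corpus installed, the callable would be passed by keyword, e.g.
# KNBCorpusReader(root, fileids, encoding='euc-jp', morphs2str=join_surfaces)
```

Because the reader only needs a callable that returns a str, the separator or the choice of field (surface, reading, part of speech) can be changed without touching the reader itself.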

nltk.corpus.reader.knbc.demo()

nltk.corpus.reader.knbc.test()
