nltk.chunk.util module

class nltk.chunk.util.ChunkScore[source]

Bases: object

A utility class for scoring chunk parsers. ChunkScore can evaluate a chunk parser’s output, based on a number of statistics (precision, recall, f-measure, missed chunks, incorrect chunks). It can also combine the scores from the parsing of multiple texts; this makes it significantly easier to evaluate a chunk parser that operates one sentence at a time.

Texts are evaluated with the score method. The results of evaluation can be accessed via a number of accessor methods, such as precision and f_measure. A typical use of the ChunkScore class is:

>>> chunkscore = ChunkScore()
>>> for correct in correct_sentences:
...     guess = chunkparser.parse(correct.leaves())
...     chunkscore.score(correct, guess)
>>> print('F Measure:', chunkscore.f_measure())
F Measure: 0.823

Variables
  • kwargs

    Keyword arguments:

    • max_tp_examples: The maximum number of actual examples of true positives to record. This affects the correct member function: correct will not return more than this number of true positive examples. This does not affect any of the numerical metrics (precision, recall, or f-measure).

    • max_fp_examples: The maximum number of actual examples of false positives to record. This affects the incorrect member function and the guessed member function: incorrect will not return more than this number of examples, and guessed will not return more than this number of false positive examples. This does not affect any of the numerical metrics (precision, recall, or f-measure).

    • max_fn_examples: The maximum number of actual examples of false negatives to record. This affects the missed member function and the correct member function: missed will not return more than this number of examples, and correct will not return more than this number of false negative examples. This does not affect any of the numerical metrics (precision, recall, or f-measure).

    • chunk_label: A regular expression indicating which chunks should be compared. Defaults to '.*' (i.e., all chunks).

  • _tp – List of true positives

  • _fp – List of false positives

  • _fn – List of false negatives

  • _tp_num – Number of true positives

  • _fp_num – Number of false positives

  • _fn_num – Number of false negatives.
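
A minimal sketch of how these keyword arguments might be passed, assuming NLTK is installed. The two chunkings are made-up data built with tagstr2tree from this module; chunk_label='NP' restricts scoring to NP chunks, and the max_*_examples values only cap the stored examples as documented above.

```python
from nltk.chunk.util import ChunkScore, tagstr2tree

# Made-up gold-standard and guessed chunkings of the same sentence.
gold = tagstr2tree("[ the/DT cat/NN ] sat/VBD on/IN [ the/DT mat/NN ]")
guess = tagstr2tree("[ the/DT cat/NN ] sat/VBD on/IN the/DT mat/NN")

# Score NP chunks only; the example caps are arbitrary illustration values.
chunkscore = ChunkScore(
    chunk_label="NP",
    max_tp_examples=10,
    max_fp_examples=10,
    max_fn_examples=10,
)
chunkscore.score(gold, guess)

precision = chunkscore.precision()  # 1 of 1 guessed chunks is correct
recall = chunkscore.recall()        # 1 of 2 gold chunks was found
missed = chunkscore.missed()        # gold chunks absent from the guess
print(precision, recall, len(missed))
```

The guess finds only the first noun phrase, so precision is perfect while recall is halved.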

__init__(**kwargs)[source]

accuracy()[source]

Return the overall tag-based accuracy for all texts that have been scored by this ChunkScore, using the IOB (CoNLL 2000) tag encoding.

Return type

float
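
As a rough illustration of what tag-based accuracy measures, the sketch below compares two IOB tag sequences token by token and reports the fraction that agree. The helper name iob_accuracy and the sample tags are hypothetical; this is a conceptual sketch, not NLTK's implementation.

```python
def iob_accuracy(gold_tags, guessed_tags):
    """Return the fraction of positions where the two IOB tag lists agree."""
    if len(gold_tags) != len(guessed_tags):
        raise ValueError("tag sequences must have the same length")
    if not gold_tags:
        return 0.0
    matches = sum(1 for g, t in zip(gold_tags, guessed_tags) if g == t)
    return matches / len(gold_tags)


# Hypothetical IOB tags for "the cat sat on the mat".
gold_tags = ["B-NP", "I-NP", "O", "O", "B-NP", "I-NP"]
guessed_tags = ["B-NP", "I-NP", "O", "O", "O", "O"]

tag_accuracy = iob_accuracy(gold_tags, guessed_tags)
print(tag_accuracy)  # 4 of the 6 tags agree
```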

correct()[source]

Return the chunks which were included in the correct chunk structures, listed in input order.

Return type

list of chunks

f_measure(alpha=0.5)[source]

Return the overall F measure for all texts that have been scored by this ChunkScore.

Parameters

alpha (float) – the relative weighting of precision and recall. Larger alpha biases the score towards the precision value, while smaller alpha biases the score towards the recall value. alpha should have a value in the range [0,1].

Return type

float
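
The effect of alpha can be checked by hand. The sketch below assumes the weighted harmonic mean form F = 1 / (alpha/P + (1 - alpha)/R), which is consistent with the behaviour described above (alpha = 1 yields the precision, alpha = 0 yields the recall). The function name weighted_f_measure is hypothetical and not part of NLTK.

```python
def weighted_f_measure(precision, recall, alpha=0.5):
    """Weighted harmonic mean of precision and recall (assumed form)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha should lie in [0, 1]")
    if precision == 0 or recall == 0:
        return 0.0
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)


p, r = 1.0, 0.5
balanced = weighted_f_measure(p, r)               # alpha=0.5: 2PR / (P + R)
toward_precision = weighted_f_measure(p, r, 0.9)  # closer to p
toward_recall = weighted_f_measure(p, r, 0.1)     # closer to r
print(balanced, toward_precision, toward_recall)
```

With precision higher than recall, raising alpha raises the score and lowering it pulls the score down toward the recall.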

guessed()[source]

Return the chunks which were included in the guessed chunk structures, listed in input order.

Return type

list of chunks

incorrect()[source]

Return the chunks which were included in the guessed chunk structures, but not in the correct chunk structures, listed in input order.

Return type

list of chunks

missed()[source]

Return the chunks which were included in the correct chunk structures, but not in the guessed chunk structures, listed in input order.

Return type

list of chunks
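
The relationship between correct, guessed, incorrect and missed can be pictured with plain sets. This is a conceptual sketch only: chunks are represented here as hypothetical (start position, words) tuples rather than NLTK chunk structures, and the data is made up.

```python
# Hypothetical chunks, identified by start position and words.
correct_chunks = {(0, ("the", "cat")), (4, ("the", "mat"))}
guessed_chunks = {(0, ("the", "cat")), (2, ("sat", "on"))}

true_positives = correct_chunks & guessed_chunks  # in both sets
incorrect = guessed_chunks - correct_chunks       # guessed but not correct
missed = correct_chunks - guessed_chunks          # correct but not guessed

# Precision and recall follow from the sizes of these sets.
precision = len(true_positives) / len(guessed_chunks)
recall = len(true_positives) / len(correct_chunks)
print(sorted(incorrect), sorted(missed), precision, recall)
```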

precision()[source]

Return the overall precision for all texts that have been scored by this ChunkScore.

Return type

float

recall()[source]

Return the overall recall for all texts that have been scored by this ChunkScore.

Return type

float

score(correct, guessed)[source]

Given a correctly chunked sentence, score another chunked version of the same sentence.

Parameters
  • correct (chunk structure) – The known-correct (“gold standard”) chunked sentence.

  • guessed (chunk structure) – The chunked sentence to be scored.

nltk.chunk.util.accuracy(chunker, gold)[source]

Score the accuracy of the chunker against the gold standard. Strip the chunk information from the gold standard and rechunk it using the chunker, then compute the accuracy score.

Parameters
  • chunker (ChunkParserI) – The chunker being evaluated.

  • gold (tree) – The chunk structures to score the chunker on.

Return type

float
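
A hedged sketch of calling this function, assuming NLTK is installed. NoChunkParser is a hypothetical baseline defined only for this example, and gold is passed as a list holding one made-up chunk structure.

```python
from nltk.chunk.api import ChunkParserI
from nltk.chunk.util import accuracy, tagstr2tree


class NoChunkParser(ChunkParserI):
    """Hypothetical baseline chunker that never proposes a chunk."""

    def parse(self, tokens):
        # Return the input unchanged, so every token stays outside a chunk.
        return tokens


# Made-up gold standard: a list with one chunked sentence.
gold = [tagstr2tree("[ the/DT cat/NN ] sat/VBD on/IN [ the/DT mat/NN ]")]

baseline_accuracy = accuracy(NoChunkParser(), gold)
print(baseline_accuracy)  # only the two unchunked tokens are tagged correctly
```

A baseline like this gives a floor against which a real chunker's accuracy can be compared.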

nltk.chunk.util.conllstr2tree(s, chunk_types=('NP', 'PP', 'VP'), root_label='S')[source]

Return a chunk structure for a single sentence encoded in the given CoNLL 2000 style string. This function converts a CoNLL IOB string into a tree, using the specified chunk types (NP, PP and VP by default) and rooting the tree at a node labeled S (by default).

Parameters
  • s (str) – The CoNLL string to be converted.

  • chunk_types (tuple) – The chunk types to be converted.

  • root_label (str) – The node label to use for the root.

Return type

Tree
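
A short sketch of the conversion, assuming NLTK is installed. The sentence is made up, with one "word tag IOB-tag" triple per line; the second call illustrates narrowing chunk_types to NP only.

```python
from nltk.chunk.util import conllstr2tree

# Made-up sentence in CoNLL 2000 style.
conll_text = """he PRP B-NP
saw VBD B-VP
the DT B-NP
dog NN I-NP
. . O"""

full_tree = conllstr2tree(conll_text)                     # NP, PP and VP chunks
np_only = conllstr2tree(conll_text, chunk_types=("NP",))  # NP chunks only

print(full_tree)
print(np_only)
```

In the second tree the verb is no longer wrapped in a VP chunk and appears as a plain (word, tag) token under the root.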

nltk.chunk.util.conlltags2tree(sentence, chunk_types=('NP', 'PP', 'VP'), root_label='S', strict=False)[source]

Convert the CoNLL IOB format to a tree.
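
A minimal usage sketch, assuming NLTK is installed and that sentence is a list of (word, POS tag, IOB tag) triples. The data is made up.

```python
from nltk.chunk.util import conlltags2tree

# Made-up (word, POS tag, IOB tag) triples for one sentence.
tagged = [
    ("the", "DT", "B-NP"),
    ("cat", "NN", "I-NP"),
    ("sat", "VBD", "O"),
]

tree = conlltags2tree(tagged)
print(tree)

chunk = tree[0]    # the NP subtree built from the B-NP/I-NP tokens
outside = tree[1]  # the token tagged O stays directly under the root
```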

nltk.chunk.util.demo()[source]

nltk.chunk.util.ieerstr2tree(s, chunk_types=['LOCATION', 'ORGANIZATION', 'PERSON', 'DURATION', 'DATE', 'CARDINAL', 'PERCENT', 'MONEY', 'MEASURE'], root_label='S')[source]

Return a chunk structure containing the chunked tagged text that is encoded in the given IEER style string. This function converts a string of chunked tagged text in the IEER named entity format into a chunk structure. Chunks are of several types: LOCATION, ORGANIZATION, PERSON, DURATION, DATE, CARDINAL, PERCENT, MONEY, and MEASURE.

Return type

Tree

nltk.chunk.util.tagstr2tree(s, chunk_label='NP', root_label='S', sep='/', source_tagset=None, target_tagset=None)[source]

Divide a string of bracketed tagged text into chunks and unchunked tokens, and produce a Tree. Chunks are marked by square brackets ([...]). Words are delimited by whitespace, and each word should have the form text/tag. Words that do not contain a slash are assigned a tag of None.

Parameters
  • s (str) – The string to be converted

  • chunk_label (str) – The label to use for chunk nodes

  • root_label (str) – The label to use for the root of the tree

Return type

Tree
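
A brief sketch of the bracket format, assuming NLTK is installed. The input strings are made up; the second call shows a word without a slash.

```python
from nltk.chunk.util import tagstr2tree

# Made-up input: brackets mark chunks, "/" separates word and tag.
tree = tagstr2tree("[ the/DT cat/NN ] sat/VBD")
print(tree)

# A word without a slash is assigned a tag of None.
untagged = tagstr2tree("[ the/DT cat ]", chunk_label="NP")
untagged_leaves = untagged.leaves()
```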

nltk.chunk.util.tree2conllstr(t)[source]

Return a multiline string where each line contains a word, tag and IOB tag. Convert a tree to the CoNLL IOB string format.

Parameters

t (Tree) – The tree to be converted.

Return type

str

nltk.chunk.util.tree2conlltags(t)[source]

Return a list of 3-tuples containing (word, tag, IOB-tag). Convert a tree to the CoNLL IOB tag format.

Parameters

t (Tree) – The tree to be converted.

Return type

list(tuple)
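
The tree-to-CoNLL conversions can be sketched as a round trip, assuming NLTK is installed. The chunk structure is made up and is built with conlltags2tree from this module.

```python
from nltk.chunk.util import conlltags2tree, tree2conllstr, tree2conlltags

# Made-up chunk structure built from IOB triples.
tagged = [("the", "DT", "B-NP"), ("cat", "NN", "I-NP"), ("sat", "VBD", "O")]
tree = conlltags2tree(tagged)

triples = tree2conlltags(tree)    # list of (word, tag, IOB-tag) tuples
conll_text = tree2conllstr(tree)  # one line per token: word, tag, IOB tag
print(triples)
print(conll_text)
```

Converting the tree back to triples recovers the input list, which makes these functions convenient for moving between tree-based and tag-based representations.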