nltk.chunk.util module
- class nltk.chunk.util.ChunkScore
- Bases: object
- A utility class for scoring chunk parsers. ChunkScore can evaluate a chunk parser’s output, based on a number of statistics (precision, recall, f-measure, missed chunks, incorrect chunks). It can also combine the scores from the parsing of multiple texts; this makes it significantly easier to evaluate a chunk parser that operates one sentence at a time.
- Texts are evaluated with the score method. The results of evaluation can be accessed via a number of accessor methods, such as precision and f_measure. A typical use of the ChunkScore class is:
    >>> chunkscore = ChunkScore()
    >>> for correct in correct_sentences:
    ...     guess = chunkparser.parse(correct.leaves())
    ...     chunkscore.score(correct, guess)
    >>> print('F Measure:', chunkscore.f_measure())
    F Measure: 0.823
- Variables:
- kwargs – Keyword arguments:
- max_tp_examples: The maximum number of actual examples of true positives to record. This affects the correct member function: correct will not return more than this number of true positive examples. This does not affect any of the numerical metrics (precision, recall, or f-measure).
- max_fp_examples: The maximum number of actual examples of false positives to record. This affects the incorrect member function and the guessed member function: incorrect will not return more than this number of examples, and guessed will not return more than this number of false positive examples. This does not affect any of the numerical metrics (precision, recall, or f-measure).
- max_fn_examples: The maximum number of actual examples of false negatives to record. This affects the missed member function and the correct member function: missed will not return more than this number of examples, and correct will not return more than this number of false negative examples. This does not affect any of the numerical metrics (precision, recall, or f-measure).
- chunk_label: A regular expression indicating which chunks should be compared. Defaults to '.*' (i.e., all chunks).
 
- _tp – List of true positives 
- _fp – List of false positives 
- _fn – List of false negatives 
- _tp_num – Number of true positives 
- _fp_num – Number of false positives 
- _fn_num – Number of false negatives. 
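As a concrete illustration of the scoring workflow described above, the following sketch scores one hand-built sentence. The sentence and the deliberately wrong `guess` tree are made up for illustration. `tagstr2tree` from this module is used only as a convenient way to build the trees, and `recall()` is assumed to be available alongside `precision()`.

```python
from nltk.chunk.util import ChunkScore, tagstr2tree

# Hypothetical gold-standard and guessed chunkings of the same sentence.
gold = tagstr2tree("[ the/DT dog/NN ] barked/VBD at/IN [ the/DT cat/NN ]")
guess = tagstr2tree("[ the/DT dog/NN ] barked/VBD at/IN the/DT [ cat/NN ]")

chunkscore = ChunkScore()
chunkscore.score(gold, guess)

# One guessed chunk matches the gold standard, one guessed chunk is wrong,
# and one gold chunk is missed.
print("Precision:", chunkscore.precision())
print("Recall:   ", chunkscore.recall())
print("F measure:", chunkscore.f_measure())
print("Missed:   ", chunkscore.missed())
print("Incorrect:", chunkscore.incorrect())
```

Calling `score` again with further sentence pairs accumulates the counts, so the same accessors then report totals over all scored texts.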
 
 - accuracy()
- Return the overall tag-based accuracy for all texts that have been scored by this ChunkScore, using the IOB (conll2000) tag encoding.
- Return type:
- float 
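To make "tag-based accuracy" concrete: each token carries one IOB chunk tag, and accuracy is the fraction of tokens whose guessed tag matches the gold tag. The helper below is a hypothetical, library-free sketch of that idea (its name and the sample tags are invented here), not NLTK's own implementation.

```python
def iob_tag_accuracy(gold_tags, guessed_tags):
    """Fraction of positions where the guessed IOB tag equals the gold tag."""
    if len(gold_tags) != len(guessed_tags):
        raise ValueError("Both tag sequences must cover the same tokens")
    if not gold_tags:
        return 0.0
    matches = sum(1 for gold, guess in zip(gold_tags, guessed_tags) if gold == guess)
    return matches / len(gold_tags)


# Hypothetical IOB tags for a six-token sentence.
gold_tags = ["B-NP", "I-NP", "O", "O", "B-NP", "I-NP"]
guessed_tags = ["B-NP", "I-NP", "O", "O", "O", "B-NP"]

# Four of the six tags agree.
tag_accuracy = iob_tag_accuracy(gold_tags, guessed_tags)
print(round(tag_accuracy, 3))
```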
 
 - correct()
- Return the chunks which were included in the correct chunk structures, listed in input order.
- Return type:
- list of chunks 
 
 - f_measure(alpha=0.5)
- Return the overall F measure for all texts that have been scored by this ChunkScore.
- Parameters:
- alpha (float) – The relative weighting of precision and recall. Larger alpha biases the score towards the precision value, while smaller alpha biases the score towards the recall value. alpha should have a value in the range [0,1].
- Return type:
- float 
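The effect of alpha can be sketched with a weighted harmonic mean of precision and recall. The formula below is an assumption consistent with the description above (alpha = 1 yields precision, alpha = 0 yields recall). The helper name and the sample values are hypothetical.

```python
def weighted_f_measure(precision, recall, alpha=0.5):
    """Weighted harmonic mean of precision and recall (illustrative helper)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha should lie in the range [0, 1]")
    if precision == 0 or recall == 0:
        return 0.0
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)


# Hypothetical scores: precision is higher than recall.
balanced = weighted_f_measure(0.8, 0.5)
precision_heavy = weighted_f_measure(0.8, 0.5, alpha=0.9)
recall_heavy = weighted_f_measure(0.8, 0.5, alpha=0.1)

# A larger alpha pulls the result towards precision (0.8),
# a smaller alpha pulls it towards recall (0.5).
print(recall_heavy < balanced < precision_heavy)
```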
 
 - guessed()
- Return the chunks which were included in the guessed chunk structures, listed in input order.
- Return type:
- list of chunks 
 
 - incorrect()
- Return the chunks which were included in the guessed chunk structures, but not in the correct chunk structures, listed in input order.
- Return type:
- list of chunks 
 
 - missed()
- Return the chunks which were included in the correct chunk structures, but not in the guessed chunk structures, listed in input order.
- Return type:
- list of chunks 
 
 - precision()
- Return the overall precision for all texts that have been scored by this ChunkScore.
- Return type:
- float 
 
 
- nltk.chunk.util.accuracy(chunker, gold)
- Score the accuracy of the chunker against the gold standard. Strip the chunk information from the gold standard and rechunk it using the chunker, then compute the accuracy score.
- Parameters:
- chunker (ChunkParserI) – The chunker being evaluated. 
- gold (tree) – The chunk structures to score the chunker on. 
 
- Return type:
- float 
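A minimal sketch of calling this function, assuming `RegexpParser` from `nltk.chunk` as the `ChunkParserI` implementation and assuming that `gold` is passed as a list of chunk trees. The grammar and the sentence are made up for illustration.

```python
from nltk.chunk import RegexpParser
from nltk.chunk.util import accuracy, tagstr2tree

# Hypothetical one-rule grammar: optional determiner, adjectives, then a noun.
chunker = RegexpParser("NP: {<DT>?<JJ>*<NN>}")

# Hypothetical gold standard containing a single chunked sentence.
gold = [tagstr2tree("[ the/DT dog/NN ] barked/VBD at/IN [ the/DT cat/NN ]")]

# The chunks are stripped from each gold tree, the sentence is re-chunked,
# and the resulting tags are compared with the gold tags.
chunker_accuracy = accuracy(chunker, gold)
print("Accuracy:", chunker_accuracy)
```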
 
- nltk.chunk.util.conllstr2tree(s, chunk_types=('NP', 'PP', 'VP'), root_label='S')
- Return a chunk structure for a single sentence encoded in the given CoNLL 2000 style string. This function converts a CoNLL IOB string into a tree. It uses the specified chunk types (defaults to NP, PP and VP), and creates a tree rooted at a node labeled S (by default).
- Parameters:
- s (str) – The CoNLL string to be converted. 
- chunk_types (tuple) – The chunk types to be converted. 
- root_label (str) – The node label to use for the root. 
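The conversion can be illustrated with a short sketch. The input text is made up, and it assumes the usual CoNLL 2000 layout of one "word POS-tag IOB-tag" triple per line.

```python
from nltk.chunk.util import conllstr2tree

# Hypothetical CoNLL 2000 style input: one "word POS-tag IOB-tag" line per token.
conll_text = """
he PRP B-NP
saw VBD B-VP
the DT B-NP
dog NN I-NP
"""

# Keep only NP chunks; tokens belonging to other chunk types stay unchunked.
np_tree = conllstr2tree(conll_text, chunk_types=("NP",))
print(np_tree)
```

Restricting `chunk_types` is a simple way to evaluate or inspect one chunk type at a time without editing the input.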
 
- Return type:
- Tree
- nltk.chunk.util.conlltags2tree(sentence, chunk_types=('NP', 'PP', 'VP'), root_label='S', strict=False)
- Convert the CoNLL IOB format to a tree. 
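The expected shape of `sentence` is not spelled out above. The sketch below assumes a list of (word, POS tag, IOB chunk tag) triples and assumes the companion helper `tree2conlltags` from the same module for the reverse direction. The sample tokens are hypothetical.

```python
from nltk.chunk.util import conlltags2tree, tree2conlltags

# Assumed input shape: one (word, POS tag, IOB chunk tag) triple per token.
iob_triples = [
    ("the", "DT", "B-NP"),
    ("dog", "NN", "I-NP"),
    ("barked", "VBD", "O"),
]

chunk_tree = conlltags2tree(iob_triples)
print(chunk_tree)

# Converting back should reproduce the original triples for well-formed input.
round_trip = tree2conlltags(chunk_tree)
```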
- nltk.chunk.util.ieerstr2tree(s, chunk_types=['LOCATION', 'ORGANIZATION', 'PERSON', 'DURATION', 'DATE', 'CARDINAL', 'PERCENT', 'MONEY', 'MEASURE'], root_label='S')
- Return a chunk structure containing the chunked tagged text that is encoded in the given IEER style string. Convert a string of chunked tagged text in the IEER named entity format into a chunk structure. Chunks are of several types: LOCATION, ORGANIZATION, PERSON, DURATION, DATE, CARDINAL, PERCENT, MONEY, and MEASURE.
- Return type:
- Tree
- nltk.chunk.util.tagstr2tree(s, chunk_label='NP', root_label='S', sep='/', source_tagset=None, target_tagset=None)
- Divide a string of bracketed tagged text into chunks and unchunked tokens, and produce a Tree. Chunks are marked by square brackets ([...]). Words are delimited by whitespace, and each word should have the form text/tag. Words that do not contain a slash are assigned a tag of None.
- Parameters:
- s (str) – The string to be converted 
- chunk_label (str) – The label to use for chunk nodes 
- root_label (str) – The label to use for the root of the tree 
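The bracket notation described above can be tried with a short sketch. The input string is hypothetical and includes one word without a slash to show the None tag.

```python
from nltk.chunk.util import tagstr2tree

# Hypothetical input: two bracketed chunks, plus one word that has no tag.
tagged = "[ the/DT dog/NN ] barked/VBD at/IN [ the/DT cat/NN ] loudly"
chunk_tree = tagstr2tree(tagged, chunk_label="NP")

# Top-level children are either NP subtrees or (word, tag) tuples.
for child in chunk_tree:
    print(child)
```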
 
- Return type: