nltk.chunk.util module
- class nltk.chunk.util.ChunkScore
Bases: object
A utility class for scoring chunk parsers. ChunkScore can evaluate a chunk parser’s output, based on a number of statistics (precision, recall, f-measure, missed chunks, incorrect chunks). It can also combine the scores from the parsing of multiple texts; this makes it significantly easier to evaluate a chunk parser that operates one sentence at a time.
Texts are evaluated with the score method. The results of evaluation can be accessed via a number of accessor methods, such as precision and f_measure. A typical use of the ChunkScore class is:
>>> chunkscore = ChunkScore()
>>> for correct in correct_sentences:
...     guess = chunkparser.parse(correct.leaves())
...     chunkscore.score(correct, guess)
>>> print('F Measure:', chunkscore.f_measure())
F Measure: 0.823
- Variables:
kwargs – Keyword arguments:
- max_tp_examples: The maximum number of actual examples of true positives to record. This affects the correct member function: correct will not return more than this number of true positive examples. This does not affect any of the numerical metrics (precision, recall, or f-measure).
- max_fp_examples: The maximum number of actual examples of false positives to record. This affects the incorrect member function and the guessed member function: incorrect will not return more than this number of examples, and guessed will not return more than this number of false positive examples. This does not affect any of the numerical metrics (precision, recall, or f-measure).
- max_fn_examples: The maximum number of actual examples of false negatives to record. This affects the missed member function and the correct member function: missed will not return more than this number of examples, and correct will not return more than this number of false negative examples. This does not affect any of the numerical metrics (precision, recall, or f-measure).
- chunk_label: A regular expression indicating which chunks should be compared. Defaults to '.*' (i.e., all chunks).
_tp – List of true positives
_fp – List of false positives
_fn – List of false negatives
_tp_num – Number of true positives
_fp_num – Number of false positives
_fn_num – Number of false negatives.
- accuracy()
Return the overall tag-based accuracy for all texts that have been scored by this ChunkScore, using the IOB (conll2000) tag encoding.
- Return type:
float
- correct()
Return the chunks which were included in the correct chunk structures, listed in input order.
- Return type:
list of chunks
- f_measure(alpha=0.5)
Return the overall F measure for all texts that have been scored by this ChunkScore.
- Parameters:
alpha (float) – the relative weighting of precision and recall. Larger alpha biases the score towards the precision value, while smaller alpha biases the score towards the recall value. alpha should have a value in the range [0,1].
- Return type:
float
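The effect of alpha can be illustrated with a small standalone sketch. It assumes the F measure is the alpha-weighted harmonic mean of precision and recall, 1 / (alpha / p + (1 - alpha) / r), and that a zero precision or recall yields zero. The helper name weighted_f_measure is hypothetical and is not part of NLTK.

```python
def weighted_f_measure(precision, recall, alpha=0.5):
    """Alpha-weighted harmonic mean of precision and recall (illustrative helper)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha should be in the range [0, 1]")
    # Assumption: a zero precision or recall gives an F measure of zero.
    if precision == 0 or recall == 0:
        return 0.0
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)


p, r = 0.9, 0.6
balanced = weighted_f_measure(p, r)             # equal weighting
toward_p = weighted_f_measure(p, r, alpha=0.9)  # pulled towards precision
toward_r = weighted_f_measure(p, r, alpha=0.1)  # pulled towards recall
print(balanced, toward_p, toward_r)
```

With alpha=1 the sketch returns the precision itself, and with alpha=0 it returns the recall.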
- guessed()
Return the chunks which were included in the guessed chunk structures, listed in input order.
- Return type:
list of chunks
- incorrect()
Return the chunks which were included in the guessed chunk structures, but not in the correct chunk structures, listed in input order.
- Return type:
list of chunks
- missed()
Return the chunks which were included in the correct chunk structures, but not in the guessed chunk structures, listed in input order.
- Return type:
list of chunks
- precision()
Return the overall precision for all texts that have been scored by this ChunkScore.
- Return type:
float
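The relationship between the correct, guessed, incorrect and missed accessors can be sketched with plain sets. This is an illustration only: it assumes each chunk is identified by a hypothetical (start, end, label) triple and does not reproduce how ChunkScore stores chunks internally.

```python
# Hypothetical chunk identifiers: (start_token, end_token, label).
correct = {(0, 2, "NP"), (3, 5, "NP"), (6, 8, "PP")}
guessed = {(0, 2, "NP"), (3, 4, "NP"), (6, 8, "PP"), (9, 10, "NP")}

true_positives = guessed & correct  # chunks present in both structures
incorrect = guessed - correct       # false positives: guessed but not correct
missed = correct - guessed          # false negatives: correct but not guessed

# Precision is the share of guessed chunks that are right;
# recall is the share of correct chunks that were found.
precision = len(true_positives) / len(guessed)
recall = len(true_positives) / len(correct)
print(sorted(incorrect), sorted(missed), precision, recall)
```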
- nltk.chunk.util.accuracy(chunker, gold)
Score the accuracy of the chunker against the gold standard. Strip the chunk information from the gold standard and rechunk it using the chunker, then compute the accuracy score.
- Parameters:
chunker (ChunkParserI) – The chunker being evaluated.
gold (tree) – The chunk structures to score the chunker on.
- Return type:
float
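As a rough illustration of a tag-based score, the sketch below compares two IOB tag sequences token by token. It assumes accuracy means the fraction of tokens whose guessed tag equals the gold tag; the helper name iob_tag_accuracy is hypothetical and is not an NLTK function.

```python
def iob_tag_accuracy(gold_tags, guessed_tags):
    """Fraction of tokens whose guessed IOB tag equals the gold IOB tag."""
    if len(gold_tags) != len(guessed_tags):
        raise ValueError("both sequences must cover the same tokens")
    if not gold_tags:
        raise ValueError("cannot score an empty sequence")
    matches = sum(1 for g, t in zip(gold_tags, guessed_tags) if g == t)
    return matches / len(gold_tags)


gold = ["B-NP", "I-NP", "B-VP", "B-PP", "B-NP", "I-NP", "O"]
guess = ["B-NP", "I-NP", "B-VP", "O", "B-NP", "I-NP", "O"]
score = iob_tag_accuracy(gold, guess)  # one of the seven tags differs
print(score)
```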
- nltk.chunk.util.conllstr2tree(s, chunk_types=('NP', 'PP', 'VP'), root_label='S')
Return a chunk structure for a single sentence encoded in the given CoNLL 2000 style string. The function converts the CoNLL IOB string into a tree, keeping only the specified chunk types (NP, PP and VP by default) and rooting the tree at a node labeled root_label (S by default).
- Parameters:
s (str) – The CoNLL string to be converted.
chunk_types (tuple) – The chunk types to be converted.
root_label (str) – The node label to use for the root.
- Return type:
Tree
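The input format can be illustrated with a standard-library sketch. It assumes one token per line with three whitespace-separated fields (word, part-of-speech tag, IOB chunk tag), as in the CoNLL 2000 chunking data. The helper parse_conll_lines is hypothetical and only splits the fields; it does not build a tree.

```python
conll_text = """\
He PRP B-NP
reckons VBZ B-VP
the DT B-NP
deficit NN I-NP
. . O
"""


def parse_conll_lines(text):
    """Split a CoNLL-style string into (word, pos_tag, chunk_tag) triples."""
    triples = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        word, pos_tag, chunk_tag = line.split()
        triples.append((word, pos_tag, chunk_tag))
    return triples


triples = parse_conll_lines(conll_text)
print(triples)
```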
- nltk.chunk.util.conlltags2tree(sentence, chunk_types=('NP', 'PP', 'VP'), root_label='S', strict=False)
Convert the CoNLL IOB format to a tree.
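How IOB tags map onto chunks can be sketched without NLTK. The code below is an illustration under stated assumptions, not the library's implementation: the Chunk type and group_iob helper are hypothetical, chunk types outside chunk_types are treated as outside any chunk, and an I- tag with no matching open chunk simply starts a new chunk (a lenient reading, in the spirit of strict=False).

```python
from typing import NamedTuple


class Chunk(NamedTuple):
    label: str
    tokens: list


def group_iob(triples, chunk_types=("NP", "PP", "VP")):
    """Group (word, pos, iob_tag) triples into Chunk objects and loose tokens."""
    result = []
    current = None  # the chunk currently being extended, if any
    for word, pos, iob_tag in triples:
        state, _, label = iob_tag.partition("-")
        if label not in chunk_types:
            state = "O"  # unwanted chunk types are treated as unchunked
        starts_new = state == "B" or (
            state == "I" and (current is None or current.label != label)
        )
        if starts_new:
            current = Chunk(label, [(word, pos)])
            result.append(current)
        elif state == "I":
            current.tokens.append((word, pos))
        else:
            current = None
            result.append((word, pos))
    return result


sentence = [
    ("the", "DT", "B-NP"), ("cat", "NN", "I-NP"), ("sat", "VBD", "B-VP"),
    ("on", "IN", "B-PP"), ("the", "DT", "B-NP"), ("mat", "NN", "I-NP"),
    (".", ".", "O"),
]
all_chunks = group_iob(sentence)
np_only = group_iob(sentence, chunk_types=("NP",))
print(all_chunks)
print(np_only)
```

Restricting chunk_types to ("NP",) leaves the verb and the preposition as plain tokens at the top level.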
- nltk.chunk.util.ieerstr2tree(s, chunk_types=['LOCATION', 'ORGANIZATION', 'PERSON', 'DURATION', 'DATE', 'CARDINAL', 'PERCENT', 'MONEY', 'MEASURE'], root_label='S')
Return a chunk structure containing the chunked tagged text that is encoded in the given IEER style string. The string is expected to be in the IEER named entity format. Chunks can be of several types: LOCATION, ORGANIZATION, PERSON, DURATION, DATE, CARDINAL, PERCENT, MONEY, and MEASURE.
- Return type:
- nltk.chunk.util.tagstr2tree(s, chunk_label='NP', root_label='S', sep='/', source_tagset=None, target_tagset=None)
Divide a string of bracketed tagged text into chunks and unchunked tokens, and produce a Tree. Chunks are marked by square brackets ([...]). Words are delimited by whitespace, and each word should have the form text/tag. Words that do not contain a slash are assigned a tag of None.
- Parameters:
s (str) – The string to be converted
chunk_label (str) – The label to use for chunk nodes
root_label (str) – The label to use for the root of the tree
- Return type:
Tree
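The bracketed format described above can be illustrated with a standard-library sketch. This is not the NLTK implementation: the helper parse_bracketed is hypothetical, chunks are returned as plain lists rather than Tree nodes, and tags are kept exactly as written.

```python
import re

# Matches an opening bracket, a closing bracket, or a whitespace-delimited word.
_TOKEN_RE = re.compile(r"\[|\]|[^\[\]\s]+")


def parse_bracketed(s, sep="/"):
    """Parse bracketed text/tag words into chunks (lists) and loose tokens (tuples)."""
    result = []
    chunk = None  # the chunk currently open, if any
    for text in _TOKEN_RE.findall(s):
        if text == "[":
            if chunk is not None:
                raise ValueError("nested chunks are not supported")
            chunk = []
            result.append(chunk)
        elif text == "]":
            if chunk is None:
                raise ValueError("unexpected closing bracket")
            chunk = None
        else:
            word, found, tag = text.rpartition(sep)
            # Words without a separator get a tag of None.
            token = (word, tag) if found else (text, None)
            (chunk if chunk is not None else result).append(token)
    if chunk is not None:
        raise ValueError("missing closing bracket")
    return result


parsed = parse_bracketed("[ the/DT cat/NN ] sat/VBD on")
print(parsed)
```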