nltk.metrics.paice module

Computes Paice’s performance statistics for evaluating stemming algorithms.

What is required:

A dictionary of words grouped by their real lemmas
A dictionary of words grouped by stems from a stemming algorithm

Given these two dictionaries, the module computes the Understemming Index (UI), the Overstemming Index (OI), the Stemming Weight (SW) and the Error-rate relative to truncation (ERRT).
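As a rough illustration of what the first three statistics measure, the sketch below counts word pairs directly: UI is the share of same-lemma pairs that the stemmer failed to merge, OI is the share of different-lemma pairs that it merged wrongly, and SW is OI divided by UI. This is a pair-counting reading of Paice (1994), not NLTK's implementation. The helper names `invert` and `paice_indices` are made up for the example, the word groups are toy data, every word is assumed to belong to exactly one lemma and one stem, and ERRT is left out.

```python
from itertools import combinations


def invert(groups):
    """Map each word to the key (lemma or stem) of the group containing it."""
    return {word: key for key, words in groups.items() for word in words}


def paice_indices(lemmas, stems):
    """Return (UI, OI, SW) by counting unordered word pairs.

    Hypothetical helper: assumes at least one pair of each kind exists
    and that UI is non-zero; a fuller version would guard the divisions.
    """
    lemma_of = invert(lemmas)
    stem_of = invert(stems)

    desired_merges = unachieved_merges = 0
    desired_nonmerges = wrong_merges = 0
    for a, b in combinations(sorted(lemma_of), 2):
        same_stem = stem_of[a] == stem_of[b]
        if lemma_of[a] == lemma_of[b]:
            # The pair should have been merged by the stemmer.
            desired_merges += 1
            if not same_stem:
                unachieved_merges += 1
        else:
            # The pair should have been kept apart.
            desired_nonmerges += 1
            if same_stem:
                wrong_merges += 1

    ui = unachieved_merges / desired_merges
    oi = wrong_merges / desired_nonmerges
    return ui, oi, oi / ui


# Toy data: words grouped by real lemma, and by the stem a stemmer produced.
lemmas = {
    "kneel": ["kneel", "knelt"],
    "range": ["range", "ranged"],
    "ring": ["ring", "rang", "rung"],
}
stems = {
    "kneel": ["kneel"],
    "knelt": ["knelt"],
    "rang": ["rang", "range", "ranged"],
    "ring": ["ring"],
    "rung": ["rung"],
}

ui, oi, sw = paice_indices(lemmas, stems)
print(ui, oi, sw)  # UI = 4/5, OI = 2/16, SW = OI / UI
```

In this toy data, four of the five same-lemma pairs end up under different stems, and two of the sixteen different-lemma pairs share the stem "rang".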

References: Chris D. Paice (1994). An evaluation method for stemming algorithms. In Proceedings of SIGIR, 42–50.

class nltk.metrics.paice.Paice

Bases: object

Class for storing lemmas, stems and evaluation metrics.

__init__(lemmas, stems)

Parameters:

lemmas (dict(str): list(str)) – A dictionary where keys are lemmas and values are sets or lists of words corresponding to that lemma.
stems (dict(str): set(str)) – A dictionary where keys are stems and values are sets or lists of words corresponding to that stem.

update()

Update statistics after lemmas and stems have been set.
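A minimal usage sketch of the class, assuming NLTK is installed. The word groups are toy data. The attribute names `ui`, `oi`, `sw` and `errt` come from the NLTK source rather than from the text above, as does the assumption that the constructor computes the statistics itself, so check both against your installed version.

```python
from nltk.metrics.paice import Paice

# Words grouped by their real lemmas.
lemmas = {
    "kneel": ["kneel", "knelt"],
    "range": ["range", "ranged"],
    "ring": ["ring", "rang", "rung"],
}
# The same words grouped by the stems a stemmer produced.
stems = {
    "kneel": ["kneel"],
    "knelt": ["knelt"],
    "rang": ["rang", "range", "ranged"],
    "ring": ["ring"],
    "rung": ["rung"],
}

p = Paice(lemmas, stems)
print(p.ui, p.oi, p.sw, p.errt)

# Swap in another stemmer's grouping, then recompute the statistics.
p.stems = {
    "kne": ["kneel", "knelt"],
    "rang": ["rang", "range", "ranged"],
    "ring": ["ring"],
    "rung": ["rung"],
}
p.update()
print(p.ui, p.oi, p.sw, p.errt)
```

The second grouping merges "kneel" and "knelt", so UI should drop while OI stays the same.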

nltk.metrics.paice.demo()

Demonstration of the module.

nltk.metrics.paice.get_words_from_dictionary(lemmas)

Get original set of words used for analysis.

Parameters: lemmas (dict(str): list(str)) – A dictionary where keys are lemmas and values are sets or lists of words corresponding to that lemma.
Returns: Set of words that exist as values in the dictionary
Return type: set(str)
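A short sketch of the call, assuming NLTK is installed. The dictionary is toy data and mixes a list and sets as values, which the parameter description allows.

```python
from nltk.metrics.paice import get_words_from_dictionary

lemmas = {
    "kneel": ["kneel", "knelt"],
    "range": {"range", "ranged"},
    "ring": {"ring", "rang", "rung"},
}

# Flatten the lemma groups into one set of surface words.
words = get_words_from_dictionary(lemmas)
print(sorted(words))
# ['kneel', 'knelt', 'rang', 'range', 'ranged', 'ring', 'rung']
```

The lemma keys themselves are included only when they also occur among the values, as "kneel", "range" and "ring" do here.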
