nltk.metrics.paice module
Computes Paice’s performance statistics for evaluating stemming algorithms.
What is required:
- A dictionary of words grouped by their real lemmas
- A dictionary of words grouped by stems from a stemming algorithm
Given these, the module computes the Understemming Index (UI), Overstemming Index (OI), Stemming Weight (SW) and Error Rate Relative to Truncation (ERRT).
References: Chris D. Paice (1994). An evaluation method for stemming algorithms. In Proceedings of SIGIR, 42–50.
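The UI, OI and SW statistics are pair counts over the two groupings. Understemming counts word pairs that share a lemma but received different stems. Overstemming counts pairs that share a stem but belong to different lemmas. The following is a minimal, dependency-free sketch of those definitions as given in Paice (1994). It is not the nltk.metrics.paice source, and `paice_indexes` is a hypothetical name. The sketch assumes that both dictionaries cover the same words and that each word appears in exactly one group of each. The toy word lists are illustrative only. ERRT is omitted because it also requires computing a truncation line.

```python
from collections import Counter


def paice_indexes(lemmas, stems):
    """Return (UI, OI, SW) for two groupings of the same words.

    Sketch of the Paice (1994) definitions; assumes every word belongs to
    exactly one lemma group and exactly one stem group.
    """
    # Invert both groupings: word -> lemma and word -> stem.
    lemma_of = {w: lemma for lemma, ws in lemmas.items() for w in ws}
    stem_of = {w: stem for stem, ws in stems.items() for w in ws}
    n_words = len(lemma_of)

    gdmt = gdnt = gumt = gwmt = 0.0
    for group in lemmas.values():
        words = set(group)
        n_g = len(words)
        # Desired merges: all word pairs inside this lemma group.
        gdmt += 0.5 * n_g * (n_g - 1)
        # Desired non-merges: pairs with one word inside and one outside the group.
        gdnt += 0.5 * n_g * (n_words - n_g)
        # Unachieved merges: pairs in this lemma group that received different stems.
        stem_counts = Counter(stem_of[w] for w in words)
        gumt += 0.5 * sum(u * (n_g - u) for u in stem_counts.values())

    for group in stems.values():
        words = set(group)
        n_s = len(words)
        # Wrongly-made merges: pairs sharing this stem but having different lemmas.
        lemma_counts = Counter(lemma_of[w] for w in words)
        gwmt += 0.5 * sum(v * (n_s - v) for v in lemma_counts.values())

    ui = gumt / gdmt if gdmt else 0.0
    oi = gwmt / gdnt if gdnt else 0.0
    # Stemming weight is the ratio OI / UI; guard against UI == 0.
    if ui:
        sw = oi / ui
    else:
        sw = float("nan") if oi == 0.0 else float("inf")
    return ui, oi, sw


if __name__ == "__main__":
    lemmas = {
        "kneel": ["kneel", "knelt"],
        "range": ["range", "ranged"],
        "ring": ["ring", "rang", "rung"],
    }
    stems = {
        "kneel": ["kneel"],
        "knelt": ["knelt"],
        "rang": ["rang", "range", "ranged"],
        "ring": ["ring"],
        "rung": ["rung"],
    }
    print(paice_indexes(lemmas, stems))
```

For the toy data, GUMT = 4, GDMT = 5, GWMT = 2 and GDNT = 16. That gives UI = 0.8, OI = 0.125 and SW = 0.15625. Returning NaN or infinity when UI is zero is a choice made for this sketch.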
- class nltk.metrics.paice.Paice
Bases: object
Class for storing lemmas, stems and evaluation metrics.
- __init__(lemmas, stems)
- Parameters:
lemmas (dict(str): list(str)) – A dictionary where keys are lemmas and values are sets or lists of words corresponding to that lemma.
stems (dict(str): set(str)) – A dictionary where keys are stems and values are sets or lists of words corresponding to that stem.
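Both arguments are plain dictionaries, so the `stems` grouping can be derived from any function that maps a word to its stem. Here is a minimal sketch under that assumption. `group_by_stem` and the fixed-length truncation stemmer `truncate` are hypothetical helpers and are not part of nltk.metrics.paice.

```python
from collections import defaultdict


def group_by_stem(words, stem):
    """Map each stem to the set of words that a stemming function reduces to it."""
    groups = defaultdict(set)
    for word in words:
        groups[stem(word)].add(word)
    return dict(groups)


def truncate(word, n=4):
    """Hypothetical stand-in stemmer: keep only the first n characters."""
    return word[:n]


if __name__ == "__main__":
    lemmas = {
        "kneel": ["kneel", "knelt"],
        "range": ["range", "ranged"],
        "ring": ["ring", "rang", "rung"],
    }
    # Collect every word from the lemma groups, then regroup them by stem.
    words = {w for group in lemmas.values() for w in group}
    stems = group_by_stem(words, truncate)
    print(stems)
```

When words are truncated to four characters, range, ranged and rang all share the stem rang (overstemming), while kneel and knelt are split into knee and knel (understemming). Any word-to-stem callable can replace `truncate`, for example the `stem` method of an NLTK `PorterStemmer()` instance.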
- nltk.metrics.paice.get_words_from_dictionary(lemmas)
Get the original set of words used for the analysis.
- Parameters:
lemmas (dict(str): list(str)) – A dictionary where keys are lemmas and values are sets or lists of words corresponding to that lemma.
- Returns:
Set of words that exist as values in the dictionary
- Return type:
set(str)
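As documented, this function returns the union of all value collections in the dictionary. The following is a minimal sketch of that reading. `words_from_dictionary` is a hypothetical name chosen so the sketch is not mistaken for the nltk source. Because it only inspects the values, the same approach also works for a `stems` dictionary.

```python
def words_from_dictionary(groups):
    """Return the set of all words that appear as values in a grouping dictionary."""
    words = set()
    for group in groups.values():
        # Values may be lists or sets; set.update() accepts either.
        words.update(group)
    return words


if __name__ == "__main__":
    lemmas = {"ring": ["ring", "rang", "rung"], "range": {"range", "ranged"}}
    print(sorted(words_from_dictionary(lemmas)))
    # ['rang', 'range', 'ranged', 'ring', 'rung']
```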