nltk.classify.rte_classify module
Simple classifier for RTE corpus.
It calculates the overlap in words and named entities between the text and the hypothesis. It also checks whether the hypothesis contains words or named entities that do not occur in the text, since this indicates that the hypothesis is more informative than (i.e., not entailed by) the text.
TO DO: better Named Entity classification
TO DO: add lemmatization
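The overlap and "hypothesis-extra" idea described above can be sketched with plain Python sets. This is a minimal illustration, not NLTK's implementation: the function names `is_named_entity` and `overlap_features`, the feature-key names, and the capitalization heuristic for named entities are all assumptions chosen for the example.

```python
from __future__ import annotations


def is_named_entity(token: str) -> bool:
    # Crude heuristic (an assumption): title-case or all-caps tokens are treated as named entities.
    return token.istitle() or token.isupper()


def overlap_features(text_words: set[str], hyp_words: set[str]) -> dict[str, int]:
    """Count shared and hypothesis-only tokens, split into plain words and named entities."""
    overlap = hyp_words & text_words      # tokens found in both text and hypothesis
    hyp_extra = hyp_words - text_words    # tokens the hypothesis adds beyond the text
    ne_overlap = {t for t in overlap if is_named_entity(t)}
    ne_extra = {t for t in hyp_extra if is_named_entity(t)}
    return {
        "word_overlap": len(overlap - ne_overlap),
        "word_hyp_extra": len(hyp_extra - ne_extra),
        "ne_overlap": len(ne_overlap),
        "ne_hyp_extra": len(ne_extra),
    }


text = {"Paris", "capital", "France", "largest", "city"}
hyp = {"Paris", "capital", "Germany"}
print(overlap_features(text, hyp))
# {'word_overlap': 1, 'word_hyp_extra': 0, 'ne_overlap': 1, 'ne_hyp_extra': 1}
```

Here the nonzero `ne_hyp_extra` count ("Germany" appears only in the hypothesis) is the kind of signal that suggests the hypothesis is not entailed by the text.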
- class nltk.classify.rte_classify.RTEFeatureExtractor
Bases: object
This builds a bag of words for both the text and the hypothesis after throwing away some stopwords, then calculates the overlap and difference between the two bags.
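A rough sketch of the bag-of-words step follows, assuming a simple `\w+` regular-expression tokenizer and a small hand-picked stopword set; both are illustrative assumptions, not the tokenizer or stopword list NLTK actually uses. The helper `bag_of_words` is hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical stopword list for illustration only (matching is case-sensitive).
STOPWORDS = {"a", "the", "of", "in", "is", "and", "to"}


def bag_of_words(sentence: str, stop: bool = True) -> set[str]:
    """Tokenize on runs of word characters and return the set of word types."""
    words = set(re.findall(r"\w+", sentence))
    return words - STOPWORDS if stop else words


text = bag_of_words("the cat sat on the mat in the kitchen")
hyp = bag_of_words("a cat sat on a mat")

overlap = hyp & text      # content words shared by text and hypothesis
hyp_extra = hyp - text    # hypothesis words missing from the text
txt_extra = text - hyp    # text words missing from the hypothesis

print(sorted(overlap))    # ['cat', 'mat', 'on', 'sat']
print(hyp_extra)          # set()
print(sorted(txt_extra))  # ['kitchen']
```

An empty `hyp_extra` means every content word of the hypothesis already appears in the text, which is consistent with entailment; extra words in the text alone do not count against it.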
- __init__(rtepair, stop=True, use_lemmatize=False)
- Parameters:
rtepair – an RTEPair from which features should be extracted
stop (bool) – if True, stopwords are thrown away.
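To show why the `stop` parameter matters, the sketch below mirrors the `(rtepair, stop=True)` constructor shape with a stand-in class. `SimplePair` and `SketchExtractor` are hypothetical names: `SimplePair` assumes only that a pair exposes `text` and `hyp` strings, and `SketchExtractor` is not `RTEFeatureExtractor` itself.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class SimplePair:
    """Hypothetical stand-in for an RTEPair: only .text and .hyp are used."""
    text: str
    hyp: str


class SketchExtractor:
    """Illustrative extractor mirroring the (rtepair, stop) parameters."""

    # Hypothetical stopword list, not NLTK's.
    STOPWORDS = {"a", "the", "of", "in", "is", "and", "to"}

    def __init__(self, rtepair: SimplePair, stop: bool = True) -> None:
        self.text_words = set(re.findall(r"\w+", rtepair.text))
        self.hyp_words = set(re.findall(r"\w+", rtepair.hyp))
        if stop:
            # Discard stopwords before comparing text and hypothesis.
            self.text_words -= self.STOPWORDS
            self.hyp_words -= self.STOPWORDS
        # Hypothesis words that never occur in the text.
        self.hyp_extra = self.hyp_words - self.text_words


pair = SimplePair(text="the cat sat on the mat", hyp="a cat sat on a mat")
print(SketchExtractor(pair).hyp_extra)              # set()
print(SketchExtractor(pair, stop=False).hyp_extra)  # {'a'}
```

With stopwords kept, the article "a" shows up as a spurious hypothesis-only word, even though the two sentences say the same thing; dropping stopwords removes that noise from the difference features.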