nltk.corpus.reader.comparative_sents module

CorpusReader for the Comparative Sentence Dataset.

Comparative Sentence Dataset information -

Annotated by: Nitin Jindal and Bing Liu, 2006. Department of Computer Science, University of Illinois at Chicago
Contact: Nitin Jindal, njindal@cs.uic.edu; Bing Liu, liub@cs.uic.edu (https://www.cs.uic.edu/~liub)

Distributed with permission.

Related papers:

Nitin Jindal and Bing Liu. “Identifying Comparative Sentences in Text Documents”.
Proceedings of the ACM SIGIR International Conference on Information Retrieval (SIGIR-06), 2006.
Nitin Jindal and Bing Liu. “Mining Comparative Sentences and Relations”.
Proceedings of Twenty First National Conference on Artificial Intelligence (AAAI-2006), 2006.
Murthy Ganapathibhotla and Bing Liu. “Mining Opinions in Comparative Sentences”.
Proceedings of the 22nd International Conference on Computational Linguistics (Coling-2008), Manchester, 18-22 August, 2008.

class nltk.corpus.reader.comparative_sents.ComparativeSentencesCorpusReader

Bases: CorpusReader

Reader for the Comparative Sentence Dataset by Jindal and Liu (2006).

>>> from nltk.corpus import comparative_sentences
>>> comparison = comparative_sentences.comparisons()[0]
>>> comparison.text 
['its', 'fast-forward', 'and', 'rewind', 'work', 'much', 'more', 'smoothly',
'and', 'consistently', 'than', 'those', 'of', 'other', 'models', 'i', "'ve",
'had', '.']
>>> comparison.entity_2
'models'
>>> (comparison.feature, comparison.keyword)
('rewind', 'more')
>>> len(comparative_sentences.comparisons())
853

CorpusView: alias of StreamBackedCorpusView

__init__(root, fileids, word_tokenizer=WhitespaceTokenizer(pattern='\\s+', gaps=True, discard_empty=True, flags=re.UNICODE | re.MULTILINE | re.DOTALL), sent_tokenizer=None, encoding='utf8')

Parameters:

root – The root directory for this corpus.
fileids – a list or regexp specifying the fileids in this corpus.
word_tokenizer – tokenizer for breaking sentences or paragraphs into words. Default: WhitespaceTokenizer
sent_tokenizer – tokenizer for breaking paragraphs into sentences.
encoding – the encoding that should be used to read the corpus.

comparisons(fileids=None)

Return all comparisons in the corpus.

Parameters: fileids – a list or regexp specifying the ids of the files whose comparisons have to be returned.
Returns: the given file(s) as a list of Comparison objects.
Return type: list(Comparison)
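
As an illustration of working with the returned list, the sketch below tallies comparisons by their comp_type. The helper name count_by_type is hypothetical (it is not part of NLTK), and the stand-in objects only mimic the comp_type attribute so that the example runs without the corpus. The final block is skipped when NLTK or the comparative_sentences data is not installed.

```python
from collections import Counter
from types import SimpleNamespace


def count_by_type(comparisons):
    """Count Comparison-like objects by their comp_type attribute.

    Hypothetical helper for illustration; not part of NLTK.
    """
    return Counter(c.comp_type for c in comparisons)


# Stand-in objects (not real corpus data) so the sketch runs offline.
sample = [
    SimpleNamespace(comp_type=1),
    SimpleNamespace(comp_type=1),
    SimpleNamespace(comp_type=4),
]
sample_counts = count_by_type(sample)
print(sample_counts)

# Optional: run against the real corpus when it is available.
try:
    from nltk.corpus import comparative_sentences

    print(count_by_type(comparative_sentences.comparisons()))
except (ImportError, LookupError):
    print("comparative_sentences corpus not available; skipping corpus tally")
```

Because the helper only reads comp_type, it should work on any iterable of Comparison objects, including a subset selected with the fileids argument.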

keywords(fileids=None)

Return a set of all keywords used in the corpus.

Parameters: fileids – a list or regexp specifying the ids of the files whose keywords have to be returned.
Returns: the set of keywords and comparative phrases used in the corpus.
Return type: set(str)
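
One possible use of the returned set is to flag tokens of a new sentence that match a known comparison keyword. The sketch below is a minimal example under stated assumptions: keyword_hits is a hypothetical helper, the toy keyword set is hand-made rather than taken from the corpus, and matching is done on lowercased single tokens only, so multi-word comparative phrases would not be found.

```python
def keyword_hits(tokens, keywords):
    """Return the tokens whose lowercase form appears in the keyword set.

    Hypothetical helper for illustration; not part of NLTK.
    """
    return [tok for tok in tokens if tok.lower() in keywords]


# Hand-made keyword set for illustration only; not taken from the corpus.
toy_keywords = {"more", "than", "best"}
tokens = ["This", "lens", "is", "More", "compact", "than", "mine", "."]
hits = keyword_hits(tokens, toy_keywords)
print(hits)

# Optional: look the same tokens up in the real keyword set when available.
try:
    from nltk.corpus import comparative_sentences

    print(keyword_hits(tokens, comparative_sentences.keywords()))
except (ImportError, LookupError):
    print("comparative_sentences corpus not available; skipping corpus lookup")
```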

keywords_readme()

Return the list of words and constituents considered as clues of a comparison (from listOfkeywords.txt).

sents(fileids=None)

Return all sentences in the corpus.

Parameters: fileids – a list or regexp specifying the ids of the files whose sentences have to be returned.
Returns: all sentences of the corpus as lists of tokens (or as plain strings, if no word tokenizer is specified).
Return type: list(list(str)) or list(str)
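
Since the return type can be either token lists or plain strings, downstream code may want to handle both. The sketch below computes a mean sentence length that accepts either form. The function name mean_sentence_length is hypothetical, and splitting plain strings on whitespace is an assumption made for this example, not something the reader does itself.

```python
def mean_sentence_length(sents):
    """Average number of tokens per sentence.

    Accepts sentences as token lists or as plain strings; plain strings
    are split on whitespace (an assumption made for this sketch).
    Hypothetical helper for illustration; not part of NLTK.
    """
    lengths = [len(s.split()) if isinstance(s, str) else len(s) for s in sents]
    return sum(lengths) / len(lengths) if lengths else 0.0


# Toy input mixing both documented forms; not real corpus data.
toy_sents = [["a", "beats", "b", "."], "c is as good as d ."]
toy_mean = mean_sentence_length(toy_sents)
print(toy_mean)

# Optional: apply the same helper to the real corpus when available.
try:
    from nltk.corpus import comparative_sentences

    print(mean_sentence_length(comparative_sentences.sents()))
except (ImportError, LookupError):
    print("comparative_sentences corpus not available; skipping corpus statistics")
```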

words(fileids=None)

Return all words and punctuation symbols in the corpus.

Parameters: fileids – a list or regexp specifying the ids of the files whose words have to be returned.
Returns: the given file(s) as a list of words and punctuation symbols.
Return type: list(str)

class nltk.corpus.reader.comparative_sents.Comparison

Bases: object

A Comparison represents a comparative sentence and its constituents.

__init__(text=None, comp_type=None, entity_1=None, entity_2=None, feature=None, keyword=None)

Parameters:

text – a string (optionally tokenized) containing a comparison.
comp_type – an integer defining the type of comparison expressed. Values can be: 1 (Non-equal gradable), 2 (Equative), 3 (Superlative), 4 (Non-gradable).
entity_1 – the first entity considered in the comparison relation.
entity_2 – the second entity considered in the comparison relation.
feature – the feature considered in the comparison relation.
keyword – the word or phrase which is used for that comparative relation.
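
The parameters above can also be supplied by hand, for example when building test fixtures. The following sketch constructs a Comparison directly and maps its comp_type to the labels listed above. The sentence and its field values are invented for illustration, and COMP_TYPE_LABELS is a hypothetical lookup table, not something NLTK provides.

```python
from nltk.corpus.reader.comparative_sents import Comparison

# Labels copied from the comp_type description; the dict itself is
# a hypothetical convenience, not part of NLTK.
COMP_TYPE_LABELS = {
    1: "Non-equal gradable",
    2: "Equative",
    3: "Superlative",
    4: "Non-gradable",
}

# Invented example sentence; not taken from the dataset.
comparison = Comparison(
    text=["camera", "a", "is", "lighter", "than", "camera", "b", "."],
    comp_type=1,
    entity_1="camera a",
    entity_2="camera b",
    feature="weight",
    keyword="lighter",
)

label = COMP_TYPE_LABELS[comparison.comp_type]
print(label, "|", comparison.entity_1, "vs", comparison.entity_2)
```

Constructing the object this way needs only the NLTK package, not the downloaded corpus data.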
