nltk.corpus.reader.opinion_lexicon module¶

CorpusReader for the Opinion Lexicon.

Opinion Lexicon information¶

Authors: Minqing Hu and Bing Liu, 2004. Department of Computer Science, University of Illinois at Chicago
Contact: Bing Liu, liub@cs.uic.edu, https://www.cs.uic.edu/~liub

Distributed with permission.

Related papers:

Minqing Hu and Bing Liu. “Mining and summarizing customer reviews”. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD-04), Aug 22-25, 2004, Seattle, Washington, USA.
Bing Liu, Minqing Hu and Junsheng Cheng. “Opinion Observer: Analyzing and Comparing Opinions on the Web”. Proceedings of the 14th International World Wide Web Conference (WWW-2005), May 10-14, 2005, Chiba, Japan.

class nltk.corpus.reader.opinion_lexicon.IgnoreReadmeCorpusView[source]¶

Bases: StreamBackedCorpusView

This CorpusView is used to skip the initial readme block of the corpus.

__init__(*args, **kwargs)[source]¶

Create a new corpus view, based on the file fileid, and read with block_reader. See the class documentation for more information.

Parameters:

fileid – The path to the file that is read by this corpus view. fileid can either be a string or a PathPointer.
startpos – The file position at which the view will start reading. This can be used to skip over preface sections.
encoding – The unicode encoding that should be used to read the file’s contents. If no encoding is specified, then the file’s contents will be read as a non-unicode string (i.e., a str).
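The idea of skipping an initial readme block, as this view does, can be sketched without NLTK. The following is a minimal illustration only, not the library's implementation: the helper name skip_readme_block and the sample text are hypothetical, and it assumes the readme is the first run of non-blank lines and ends at the first blank line that follows it.

```python
import io


def skip_readme_block(stream):
    """Advance ``stream`` past a leading readme block; return the new position.

    Assumption: the readme is the first run of non-blank lines and is
    terminated by a blank line (or by end of file).
    """
    seen_text = False
    while True:
        line = stream.readline()
        if not line:
            # End of file: nothing left to skip.
            break
        if line.strip():
            # Still inside the readme block.
            seen_text = True
        elif seen_text:
            # First blank line after the readme text: stop here.
            break
    return stream.tell()


# Hypothetical file contents: a short header, a blank line, then the word list.
sample = "; Opinion Lexicon readme\n; more header text\n\nabound\nabounds\n"
stream = io.StringIO(sample)
start = skip_readme_block(stream)

# Everything read from this point on belongs to the word list proper.
words = [line.strip() for line in stream if line.strip()]
print(words)
```

Recording the position returned by the helper is what lets a view start reading at the first real entry, in the same spirit as the startpos parameter described above.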

class nltk.corpus.reader.opinion_lexicon.OpinionLexiconCorpusReader[source]¶

Bases: WordListCorpusReader

Reader for the Liu and Hu opinion lexicon. Blank lines and the readme block are ignored.

>>> from nltk.corpus import opinion_lexicon
>>> opinion_lexicon.words()
['2-faced', '2-faces', 'abnormal', 'abolish', ...]

The OpinionLexiconCorpusReader provides shortcuts to retrieve positive/negative words:

>>> opinion_lexicon.negative()
['2-faced', '2-faces', 'abnormal', 'abolish', ...]

Note that the words returned by words() are ordered by file id, not alphabetically:

>>> opinion_lexicon.words()[0:10] 
['2-faced', '2-faces', 'abnormal', 'abolish', 'abominable', 'abominably',
'abominate', 'abomination', 'abort', 'aborted']
>>> sorted(opinion_lexicon.words())[0:10] 
['2-faced', '2-faces', 'a+', 'abnormal', 'abolish', 'abominable', 'abominably',
'abominate', 'abomination', 'abort']
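The difference between file-id order and alphabetical order can be reproduced with toy data. This is a sketch under assumptions, not the reader's actual code: the function words_in_fileid_order is hypothetical, and the two file names and their tiny word lists are stand-ins chosen for illustration.

```python
def words_in_fileid_order(files):
    """Concatenate per-file word lists, visiting files in sorted file id order."""
    result = []
    for fileid in sorted(files):
        result.extend(files[fileid])
    return result


# Toy stand-in for the lexicon: each file is itself alphabetically sorted.
toy_files = {
    "positive-words.txt": ["a+", "abound"],
    "negative-words.txt": ["2-faced", "abnormal"],
}

combined = words_in_fileid_order(toy_files)
print(combined)          # all negative words come first, then the positive ones
print(sorted(combined))  # a global sort interleaves the two files
```

Because the negative file id sorts before the positive one, a positive entry such as 'a+' only moves near the front once the combined list is sorted explicitly.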

CorpusView¶: alias of IgnoreReadmeCorpusView

negative()[source]¶

Return all negative words in alphabetical order.

Returns: a list of negative words.
Return type: list(str)

positive()[source]¶

Return all positive words in alphabetical order.

Returns: a list of positive words.
Return type: list(str)
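One possible use of the two word lists is a simple count-based polarity score. The sketch below is illustrative only: polarity_score is a hypothetical helper, not part of NLTK, and the toy word lists stand in for the lists that positive() and negative() would return when the corpus data is installed.

```python
def polarity_score(tokens, positive_words, negative_words):
    """Return (#positive tokens) - (#negative tokens), matching case-insensitively."""
    # Convert to sets once so each membership test is constant time.
    pos = set(positive_words)
    neg = set(negative_words)
    score = 0
    for token in tokens:
        word = token.lower()
        if word in pos:
            score += 1
        elif word in neg:
            score -= 1
    return score


# Toy lists; with the corpus available one might instead pass
# opinion_lexicon.positive() and opinion_lexicon.negative().
toy_positive = ["great", "fast"]
toy_negative = ["slow", "buggy"]

score = polarity_score(["Great", "but", "slow", "and", "buggy"], toy_positive, toy_negative)
print(score)
```

Converting the lists to sets before the loop matters for the full lexicon, which contains thousands of entries per polarity.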

words(fileids=None)[source]¶

Return all words in the opinion lexicon. Note that these words are not sorted in alphabetical order.

Parameters: fileids – a list or regexp specifying the ids of the files whose words have to be returned.
Returns: the given file(s) as a list of words and punctuation symbols.
Return type: list(str)
