nltk.corpus.reader.opinion_lexicon module

CorpusReader for the Opinion Lexicon.

Opinion Lexicon information

Authors: Minqing Hu and Bing Liu, 2004.

Department of Computer Science, University of Illinois at Chicago

Contact: Bing Liu, liub@cs.uic.edu

https://www.cs.uic.edu/~liub

Distributed with permission.

Related papers:

  • Minqing Hu and Bing Liu. “Mining and summarizing customer reviews”. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD-04), Aug 22-25, 2004, Seattle, Washington, USA.

  • Bing Liu, Minqing Hu and Junsheng Cheng. “Opinion Observer: Analyzing and Comparing Opinions on the Web”. Proceedings of the 14th International World Wide Web Conference (WWW-2005), May 10-14, 2005, Chiba, Japan.

class nltk.corpus.reader.opinion_lexicon.IgnoreReadmeCorpusView

Bases: StreamBackedCorpusView

This CorpusView skips the readme block at the beginning of the file before reading any words.

__init__(*args, **kwargs)

Create a new corpus view over the file fileid, whose contents are read with block_reader. All arguments are passed unchanged to StreamBackedCorpusView; see that class's documentation for more information.

Parameters
  • fileid – The path to the file that is read by this corpus view. fileid can either be a string or a PathPointer.

  • startpos – The file position at which the view will start reading. This can be used to skip over preface sections.

  • encoding – The unicode encoding that should be used to read the file’s contents. If no encoding is specified, the file’s contents are read as raw bytes rather than decoded text.
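
The readme-skipping step can be illustrated without NLTK. The sketch below is a minimal, self-contained approximation, assuming the readme is the first blank-line-delimited block of the file and that each later non-blank line holds one entry; `skip_leading_block` and `iter_words` are hypothetical helpers, not NLTK functions, and the sample text is invented.

```python
import io
from typing import Iterator, TextIO


def skip_leading_block(stream: TextIO) -> None:
    """Consume the first blank-line-delimited block of `stream` (the readme).

    Blank lines before the block are skipped; the block ends at the first
    blank line that follows non-blank content, or at end of file.
    """
    seen_content = False
    while True:
        line = stream.readline()
        if not line:  # end of file reached inside the readme
            return
        if line.strip():
            seen_content = True
        elif seen_content:  # first blank line after the readme text
            return


def iter_words(stream: TextIO) -> Iterator[str]:
    """Yield one entry per non-blank line that follows the readme block."""
    skip_leading_block(stream)
    for line in stream:
        word = line.strip()
        if word:  # blank lines between entries are ignored
            yield word


# Invented file contents: a ';'-prefixed readme, a blank line, then entries.
sample = io.StringIO(
    "; Opinion Lexicon (readme text)\n"
    "; more readme text\n"
    "\n"
    "2-faced\n"
    "2-faces\n"
    "\n"
    "abnormal\n"
)
print(list(iter_words(sample)))  # ['2-faced', '2-faces', 'abnormal']
```

Because blank lines before the block are skipped, a file that opens with blank lines still has its readme removed.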

class nltk.corpus.reader.opinion_lexicon.OpinionLexiconCorpusReader

Bases: WordListCorpusReader

Reader for the Liu and Hu opinion lexicon. Blank lines and the readme header are ignored.

>>> from nltk.corpus import opinion_lexicon
>>> opinion_lexicon.words()
['2-faced', '2-faces', 'abnormal', 'abolish', ...]

The OpinionLexiconCorpusReader also provides the shortcuts positive() and negative() for retrieving the positive and negative word lists:

>>> opinion_lexicon.negative()
['2-faced', '2-faces', 'abnormal', 'abolish', ...]

Note that the words returned by the words() method are ordered by file id, not alphabetically:

>>> opinion_lexicon.words()[0:10] 
['2-faced', '2-faces', 'abnormal', 'abolish', 'abominable', 'abominably',
'abominate', 'abomination', 'abort', 'aborted']
>>> sorted(opinion_lexicon.words())[0:10] 
['2-faced', '2-faces', 'a+', 'abnormal', 'abolish', 'abominable', 'abominably',
'abominate', 'abomination', 'abort']
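
The difference between the two outputs follows from how the lists are assembled. A minimal sketch, assuming two invented miniature files with the hypothetical ids `negative` and `positive`: concatenating them file by file gives a different order from sorting the combined list.

```python
# Invented miniature lexicon: two "files", each holding its own word list.
files = {
    "negative": ["2-faced", "2-faces", "abnormal"],
    "positive": ["a+", "abound"],
}

# File-by-file concatenation: all words of one file, then the next.
by_file = [word for fileid in sorted(files) for word in files[fileid]]

# Alphabetical order of the combined list.
alphabetical = sorted(by_file)

print(by_file)       # ['2-faced', '2-faces', 'abnormal', 'a+', 'abound']
print(alphabetical)  # ['2-faced', '2-faces', 'a+', 'abnormal', 'abound']
```

As in the doctest, 'a+' moves ahead of 'abnormal' only after sorting, because it lives in the second file.
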

CorpusView

alias of IgnoreReadmeCorpusView
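
The alias lets the reader choose its view class through a class attribute. A minimal sketch of that pattern, assuming a reader whose `words()` instantiates whatever class `CorpusView` names; every class here (`PlainView`, `SkipReadmeView`, `BaseReader`, `LexiconReader`) is a hypothetical stand-in working on in-memory lines, not NLTK code.

```python
from typing import Iterable, List


class PlainView:
    """Hypothetical basic view: keeps every non-blank line as an entry."""

    def __init__(self, lines: Iterable[str]) -> None:
        self.items: List[str] = [ln.strip() for ln in lines if ln.strip()]


class SkipReadmeView(PlainView):
    """Hypothetical view that first drops the leading readme block."""

    def __init__(self, lines: Iterable[str]) -> None:
        all_lines = list(lines)
        start = len(all_lines)  # no terminating blank line: all of it is readme
        seen_content = False
        for i, ln in enumerate(all_lines):
            if ln.strip():
                seen_content = True
            elif seen_content:  # first blank line after the readme text
                start = i + 1
                break
        super().__init__(all_lines[start:])


class BaseReader:
    # words() builds whatever view class this attribute names.
    CorpusView = PlainView

    def words(self, lines: Iterable[str]) -> List[str]:
        return self.CorpusView(lines).items


class LexiconReader(BaseReader):
    # Overriding the attribute is enough to change how files are read.
    CorpusView = SkipReadmeView


lines = ["; readme", "", "good", "", "great"]
print(BaseReader().words(lines))     # ['; readme', 'good', 'great']
print(LexiconReader().words(lines))  # ['good', 'great']
```

In this sketch, `LexiconReader` changes how files are read without redefining `words()`.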

negative()

Return all negative words in alphabetical order.

Returns

a list of negative words.

Return type

list(str)

positive()

Return all positive words in alphabetical order.

Returns

a list of positive words.

Return type

list(str)
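
Both methods are documented to return a list(str), and a membership test on a list scans it element by element. A minimal sketch, assuming many repeated lookups are wanted: copy each list into a `frozenset` once. `PolarityLookup` is a hypothetical helper and the word lists in the runnable part are invented; the commented lines show where the real `positive()` and `negative()` results would go.

```python
from typing import FrozenSet, Iterable, Optional


class PolarityLookup:
    """Hypothetical helper for repeated membership tests on two word lists."""

    def __init__(self, positive: Iterable[str], negative: Iterable[str]) -> None:
        # `word in some_list` scans the list; a frozenset lookup is
        # average O(1), which matters when checking many tokens.
        self._pos: FrozenSet[str] = frozenset(positive)
        self._neg: FrozenSet[str] = frozenset(negative)

    def label(self, word: str) -> Optional[str]:
        """Return 'positive', 'negative', 'both', or None if absent."""
        in_pos = word in self._pos
        in_neg = word in self._neg
        if in_pos and in_neg:
            return "both"
        if in_pos:
            return "positive"
        if in_neg:
            return "negative"
        return None


# With NLTK and the corpus installed, the inputs would come from the reader:
#   from nltk.corpus import opinion_lexicon
#   lookup = PolarityLookup(opinion_lexicon.positive(), opinion_lexicon.negative())
# Invented stand-in lists so the example runs on its own:
lookup = PolarityLookup(["good", "great"], ["abnormal", "bad"])
print(lookup.label("great"))     # positive
print(lookup.label("abnormal"))  # negative
print(lookup.label("table"))     # None
```

The "both" case is kept separate rather than resolved in favor of one list, so the caller decides how to treat ambiguous entries.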

words(fileids=None)

Return all words in the opinion lexicon. Note that these words are not sorted alphabetically; they are concatenated file by file.

Parameters

fileids – a single file id, or a list of file ids, specifying the files whose words should be returned. If omitted, the words of all files in the corpus are returned.

Returns

the words in the given file(s), as a list.

Return type

list(str)
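
A minimal sketch of how a `words(fileids=None)` method can normalize its argument and concatenate per-file results, assuming `fileids` is `None`, a single id, or a sequence of ids. The `files` dictionary is a hypothetical in-memory stand-in for the corpus files, and this is not NLTK's implementation.

```python
from typing import Dict, List, Optional, Sequence, Union


def words(
    files: Dict[str, List[str]],
    fileids: Optional[Union[str, Sequence[str]]] = None,
) -> List[str]:
    """Concatenate the word lists of the selected files, in the order given.

    `files` maps a file id to that file's words; it is a hypothetical
    in-memory stand-in for the corpus. Regular-expression matching of
    file ids is not implemented here.
    """
    if fileids is None:
        fileids = sorted(files)  # default: every file, in file id order
    elif isinstance(fileids, str):
        fileids = [fileids]  # a single id becomes a one-item list
    result: List[str] = []
    for fileid in fileids:
        result.extend(files[fileid])  # raises KeyError for an unknown id
    return result


corpus = {"negative": ["2-faced", "abnormal"], "positive": ["a+", "abound"]}
print(words(corpus))              # ['2-faced', 'abnormal', 'a+', 'abound']
print(words(corpus, "positive"))  # ['a+', 'abound']
```

The `isinstance(fileids, str)` branch matters: without it, a single id such as `"positive"` would be iterated character by character.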