
SePL (Sentiment Phrase List)

Sentiment Phrase List (SePL) is a generated list of opinion-bearing words and phrases. The list is currently available for German; a list for English is being planned.

Version 1.0 of the list contains adjectives and nouns, as well as adjective- and noun-based phrases, together with their opinion values on a continuous scale from −1.00 to +1.00. Two additional quality measures are given for each word or phrase. The list was produced from a large number of product review titles, each of which provides a textual assessment alongside a numerical star rating.
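To make the idea concrete, the following is a simplified illustration and not the published procedure. It assumes that star ratings from 1 to 5 are mapped linearly onto the range −1.00 to +1.00 and that the opinion value of a phrase is the mean of the mapped ratings of all review titles containing it. The function names, the linear mapping and the sample ratings are assumptions made for this sketch; the two quality measures are computed here as the sample standard deviation and the standard error of the mean.

```python
import math
import statistics


def stars_to_opinion(stars: int, max_stars: int = 5) -> float:
    """Map a star rating (1..max_stars) linearly onto [-1.0, +1.0] (assumed mapping)."""
    if not 1 <= stars <= max_stars:
        raise ValueError(f"star rating out of range: {stars}")
    return 2.0 * (stars - 1) / (max_stars - 1) - 1.0


def summarize_phrase(star_ratings: list[int]) -> tuple[float, float, float]:
    """Return (opinion value, standard deviation, standard error) for one phrase."""
    values = [stars_to_opinion(s) for s in star_ratings]
    mean = statistics.fmean(values)
    # The sample standard deviation needs at least two observations.
    sd = statistics.stdev(values) if len(values) > 1 else 0.0
    # Standard error of the mean: shrinks as the number of reviews grows.
    se = sd / math.sqrt(len(values))
    return mean, sd, se


# Hypothetical star ratings of review titles that contain one phrase.
opinion, sd, se = summarize_phrase([5, 5, 4, 5, 5, 4, 5, 5])
print(f"{opinion:+.2f} {sd:.2f} {se:.2f}")
```

Under these assumptions, a phrase that appears mostly in five-star titles receives a value close to +1.00, and the standard error falls as more reviews contribute to it.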

All phrases were lemmatized, that is, reduced to their base form. For example, "kleiner", "kleines" and "kleine" were reduced to "klein", and the phrase "große Hilfe" to "groß Hilfe".
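Because the list stores lemmatized phrases, input text has to be lemmatized in the same way before it can be matched against the list. A minimal sketch of such a lookup follows. It assumes a tiny hand-written lemma table (`LEMMAS`) that covers only the examples above and stands in for a real German lemmatizer, and a two-entry excerpt of phrases (`PHRASES`); both names, and the greedy longest-match strategy, are choices made for this illustration.

```python
# Hypothetical lemma table covering only the examples from the text;
# a real application would use a German lemmatizer instead.
LEMMAS = {"kleiner": "klein", "kleines": "klein", "kleine": "klein", "große": "groß"}

# Lemmatized phrases as they would appear in the list (excerpt, without values).
PHRASES = {"klein", "groß Hilfe"}
MAX_PHRASE_LEN = max(len(p.split()) for p in PHRASES)


def lemmatize(tokens: list[str]) -> list[str]:
    """Replace each token by its lemma if known, otherwise keep it unchanged."""
    return [LEMMAS.get(t, t) for t in tokens]


def find_phrases(tokens: list[str]) -> list[str]:
    """Greedy longest-match lookup of lemmatized phrases in a token sequence."""
    lemmas = lemmatize(tokens)
    found = []
    i = 0
    while i < len(lemmas):
        # Try the longest candidate first so "groß Hilfe" wins over shorter matches.
        for n in range(min(MAX_PHRASE_LEN, len(lemmas) - i), 0, -1):
            candidate = " ".join(lemmas[i:i + n])
            if candidate in PHRASES:
                found.append(candidate)
                i += n
                break
        else:
            i += 1  # no phrase starts at this token
    return found


print(find_phrases("eine große Hilfe".split()))
print(find_phrases("ein kleines Gerät".split()))
```

Matching the longest phrase first matters because a multi-word phrase can carry a different opinion value than its individual words.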

In the current version 1.1, the list was extended with verbs and verb-based phrases. In addition to the product review titles, the review texts are now used as well, which extended the list to more than 14,000 phrases. Furthermore, outliers were handled by manual correction: their opinion values were examined and corrected where necessary. A corrected phrase is either neutral (0.00), weakly evaluative (±0.40) or strongly evaluative (±0.80). Corrected phrases are marked in a new column of the list.

The procedure for generating the list is described in detail in the publications "A Generic Approach to Generate Opinion Lists of Phrases for Opinion Mining Applications" and "A Phrase-Based Opinion List for the German Language".

SePL can be requested using the form. A sample of the German list can be downloaded here:

The following table shows the structure of the list:

Phrase       | Opinion value | Standard deviation | Standard error | Type | Correction
einfach gut  |  0.93         | 0.19               | 0.01           | a    |
großartig    |  0.95         | 0.23               | 0.01           | a    |
sehr gut     |  0.90         | 0.22               | 0.00           | a    |
nur Schrott  | -0.85         | 0.52               | 0.10           | n    |
nur schlecht | -0.97         | 0.16               | 0.01           | a    |
bayrisch     |  0.00         | 0.00               | 0.00           | a    | m

  • Phrase - One or more words that express an opinion.
  • Opinion value - A value between -1.00 (very negative) and +1.00 (very positive).
  • Standard deviation - A low value indicates that the word or phrase is used very consistently, that is, almost always positively, almost always neutrally or almost always negatively.
  • Standard error - A low value indicates that the word or phrase is used very consistently and that the opinion value is based on a very large number of reviews.
  • Type - a = adjective-based phrase, n = noun-based phrase.
  • Correction - m = manually corrected.
  • For more information, see the publications named above.
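
The column structure described above can be read into typed records. The sketch below assumes that each row is a line of whitespace-separated columns exactly as in the sample rows; the actual file format of the distributed list may differ. The names `Entry`, `ROW` and `parse_row` are hypothetical. Since a phrase may itself contain spaces, the three numeric columns are used to locate the end of the phrase.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed layout: phrase, three decimal numbers, a type letter, optional "m".
ROW = re.compile(
    r"^(?P<phrase>.+?)\s+"
    r"(?P<value>[+\-\u2212]?\d+\.\d+)\s+"
    r"(?P<sd>\d+\.\d+)\s+"
    r"(?P<se>\d+\.\d+)\s+"
    r"(?P<type>[a-z])"
    r"(?:\s+(?P<correction>m))?\s*$"
)


@dataclass(frozen=True)
class Entry:
    phrase: str
    opinion_value: float      # between -1.00 and +1.00
    std_dev: float
    std_err: float
    phrase_type: str          # "a" or "n" according to the legend
    manually_corrected: bool  # True if the Correction column holds "m"


def parse_row(line: str) -> Optional[Entry]:
    """Parse one row; return None for lines that do not match (e.g. the header)."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    return Entry(
        phrase=m["phrase"],
        # Accept the typographic minus sign as well as the ASCII hyphen.
        opinion_value=float(m["value"].replace("\u2212", "-")),
        std_dev=float(m["sd"]),
        std_err=float(m["se"]),
        phrase_type=m["type"],
        manually_corrected=m["correction"] is not None,
    )


sample = [
    "Phrase Opinion value Standard deviation Standard error Type Correction",
    "nur Schrott -0.85 0.52 0.10 n",
    "bayrisch 0.00 0.00 0.00 a m",
]
entries = [e for e in map(parse_row, sample) if e is not None]
for entry in entries:
    print(entry)
```

Lines that do not fit the assumed layout, such as the header row, are skipped rather than raising an error, which keeps the sketch tolerant of surrounding text.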