IIT-TUDA at SemEval-2016 Task 5: Beyond Sentiment Lexicon: Combining Domain Dependency and Distributional Semantics Features for Aspect Based Sentiment Analysis
Ayush Kumar, Sarah Kohail, Amit Kumar, Asif Ekbal, Chris Biemann
IIT Patna, India
TU Darmstadt, Germany
Presented by: Alexander Panchenko, TU Darmstadt, Germany
Similaire à IIT-TUDA at SemEval-2016 Task 5: Beyond Sentiment Lexicon: Combining Domain Dependency and Distributional Semantics Features for Aspect Based Sentiment Analysis
Trilinos progress, challenges and future plansM Reza Rahmati
Similaire à IIT-TUDA at SemEval-2016 Task 5: Beyond Sentiment Lexicon: Combining Domain Dependency and Distributional Semantics Features for Aspect Based Sentiment Analysis (20)
Engler and Prantl system of classification in plant taxonomy
IIT-TUDA at SemEval-2016 Task 5: Beyond Sentiment Lexicon: Combining Domain Dependency and Distributional Semantics Features for Aspect Based Sentiment Analysis
1. IIT-TUDA at SemEval-2016 Task 5: Beyond Sentiment Lexicon:
Combining Domain Dependency and Distributional Semantics
Features for Aspect Based Sentiment Analysis
Ayush Kumar1, Sarah Kohail2, Amit Kumar1, Asif Ekbal1, Chris Biemann2
1IIT Patna, India 2TU Darmstadt, Germany
Presented by:
Alexander Panchenko, TU Darmstadt, Germany
2. Motivation
People write blog posts, comments, reviews, tweets, etc.
Attitudes, feelings, emotions, opinions, etc.
Mining and summarizing opinions/sentiment from text about
specific entities and their aspects can help:
Organizations to monitor their reputation and products.
Customers to make a decision or choose among multiple options.
2
4. SemEval-Task 5: Aspect-Based Sentiment analysis (ABSA)
Aspect Based Sentiment Analysis (ABSA) task analysis performs a
fine-grained sentiment analysis by addressing three slots:
1. Aspect Category Detection: Identifying the entity#attribute that is referred
to by the aspect. E and A should be chosen from predefined inventories of
entity types (e.g. LAPTOP, MOUSE, RESTAURANT, FOOD) and attribute
labels (e.g. DESIGN, PRICE, QUALITY).
2. Opinion Target (OT) Extraction: Extracting aspects, given a set of
sentences with pre-identified entities (e.g., restaurants), identify the aspect
terms “opinion target” from the review text which present in the sentence.
3. Sentiment Polarity Classification: Each identified Entity#Attribute, OT
tuple has to be assigned one of the following polarity labels: positive,
negative, or neutral.
4
5. Our Submission
We participated in Slot 1 (aspect category detection) and Slot 3
(sentiment polarity classification) for 7 languages and 4 different
domains.
We also conducted experiments for Slot 2 (opinion target
extraction) for 4 languages in restaurants domain.
Overall, we submitted 29 runs, covering 7 languages (English,
Spanish, Dutch, French, Turkish, Russian and Arabic) and 4
different domains (laptop, restaurants, phones, hotels).
5
6. Experimental Setup: Supervised Models
For Slot 1 and Slot 3, we use supervised classification using
Support Vector Machine (SVM) with the linear kernel.
For Slot 2, we use linear-chain Conditional Random Field
(CRF) with default parameters.
We perform 5-fold cross-validation on the training set to
evaluate the performance.
6
7. Feature Extraction: Preprocessing
Normalize digits to ‘num’ and remove stop words for tf-idf
computation.
For English, we use Stanford tools to tokenize, parse and
extract lemma, Part-of-Speech (PoS) and named entity (NE)
information.
For the other languages, we use taggers and dependency
parsers based on Universal Dependencies (UD).
7
8. Contribution I: Lexicon Expansion based on DT
8
1. Based on the notion of distributional thesaurus (DTs), we expand
existing lexical resources to reach a higher coverage of
sentiment lexicons and improve the extraction of rare/unseen
aspect words.
Examples of DT expansions
Token DT Expansion
good bad, excellent, decent, great
powerful potential, influential, strong, sophisticated
small tiny, large, sized, huge, sizable
efficient reliable, effective, energy-efficient, flexible
11. Contribution I: Lexicon Expansion based on DT
11
Expansion statistics for induced lexicons.
Common entries denote the number of words which are present
both in the seed lexicon and the induced lexicon
12. Contribution II: DDGs for Aspect Category Detection
12
processor .
.
.
(.....)
(.....)
(.....)
..
(.....)
(.....)
(.....)
..
(.....)
(.....)
(.....)
..
(.....)
(.....)
(.....)
..
(.....)
(.....)
(.....)
..
(.....)
(.....)
(.....)
..
(.....)
(.....)
(.....)
..
(.....)
(.....)
(.....)
..
.
.
.
d1
d
2
dn
fast
good
amod(processor, fast)
amod(processor, good)
conj(good, fast)
amod(processor, fast)
#
amod(processor, fast)
24
amod(processor, good)
13
conj(good, fast)
19
amod, 24
amod, 13
conj_and, 19
1. detect topics
underlying a
mixed-domain
dataset using
topic modeling.
2. Aggregate individual dependency relations between domain-specific
content words, weigh them with tf-idf and select the highest-ranked
words and their dependency relations.
13. Contribution II: DDGs for Aspect Category Detection
13
processor
fast
good
amod, 2
amod, 1
#
amod(processor, fast) 2
amod(processor, good) 1
conj(good, fast) 1
#
amod(processor, fast) 2
amod(processor, good) 1
3. Resulting graphs were filtered and only ‘amod’ (adjective modifying a noun) and
‘nsubj’ (nominal subjects of predicates) relations were selected.
4. For each extracted aspect from the
opinion-aspect pairs, we determine the
existence or absence of this aspect
using a binary feature.
14. Aspect Category Detection: Slot 1
Features:
Aspect list produced by Domain Dependency Graphs (DDG). (0/1)
Top 10 DTs expansions for every 5 five words based on tfidf score in
each aspect category (for example: ‘overpriced’, ‘$’, ‘pricey’, ‘cheap’,
‘expensive’ are the most significant terms in ‘food#price’ category).
(0/1)
Bag of Words. (freq)
14
15. Opinion Term “OT” Extraction Features: Slot 2
15
Features:
PoS context [-2..2]
Word and Local Context [-5..5]
5 DT expansions of current token
Expansion Score
Prefix and Suffix up to 4 characters
Noun phrase head word and its PoS
Character N-grams
Presence of adjective modifier dependency relations
Orthographic features (starts with capital letter)
Is frequent aspect?
Additional features for English:
WordNet (4 noun synsets of current
token)
NE information
Chunk information
Lemma
16. Sentiment Polarity Classification: Slot 3
Features:
N-Gram (unigram and bigram)
The sum of sentiment scores (including our DT-expanded lexicons)
Entity#Attribute pair given in the training set.
16
18. Impact of the Induced Lexicon
18
Feature Ablation Experiment for Sentiment Polarity Classification
(Slot 3)
19. Impact of the Induced Lexicon
19
Feature Ablation Experiment for Sentiment Polarity Classification
(Slot 3)
20. Future Work
20
Apply the Aspect-based Sentiment Analysis approach for German
Analysis of the Deutsche Bahn (DB) passenger user feedback
texts
http://lt.informatik.tu-darmstadt.de/de/research/absa-db-aspect-based-sentiment-analysis-for-db-products-and-
services
22. Opinion Target “OT” Extraction: Slot 2
Since we deal with the OT (opinion target) as a sequence labeling
problem, we identify the boundary of OT using the standard BIO
notation.
We follow the standard BIO notation, where ‘BASP’, ‘I-ASP’ and
‘O’ represent the beginning, intermediate and outside tokens of a
multi-word OT respectively.
22
The (O) Beef (B-ASP) Chow (I-ASP) Fun (I-ASP) was (O) very
(O) dry (O) . (O)
’ Beef Chow Fun’ is the
OT.
Notes de l'éditeur
In the next 12 min I am going to talk about our submission to the SemEval Aspect based sentiment analysis task. This work is done in cooperation betweet TU-Darmstadt and IIT Patna.
Social media allow online users to share and explain their views and opinions about products and events. Mining and summarizing customers opinions from text can help organizations to monitor their products and customers as well to make decisions about their purchase.
1) The first task is to identify the entity#attribute that is referred to by the aspect. E and A should be chosen from predefined classes.
2) Extract the aspect terms “opinion target” from the review text in which the opinion is expressed toward.
3) And finally for identified Entity#Attribute and OT, assign one of three polarity labels: positive, negative, or neutral.
Aspect level analysis is a fine grained type of sentiment analysis which identifies the sentiment orientation towards each aspect.
Semeval task 5 offers the opportunity to experiment Aspect based sentiment analysis on benchmark datasets and across various
domains and languages through three subtasks.
Add the name of slot
Mainly we our submission is based on two contributions:
We use distributional thesaurus to expand the existing lexical resources and sentiment lexicon. This allows us to reach a higher coverage on rare/unseen sentiment or aspect words.
The idea is to expand all the words in the existing seed lexicon. Eg. For english ………..
Then for the words which are not present in the original seed lexicon we assign a new sentiment score by the following equation.
Here are the expansion statistics for the induced lexicons.
7 languages .. 7 seed lexicons.
Induced lexicons after expansion using DTs..
And common entries are the words which present in both the seed and the induced lexicon.
Our second contribution is using Domain Dependency Graphs to extract a list of features.
The idea is to detect topics underlying a mixed-domain dataset, aggregate individual dependency relations between domain-specific
content words, weigh them with tf-idf and produce a DDG by selecting the highest-ranked words and their dependency relations. Since the domains are already given, no topic modeling is required.