
Can Deep Learning solve the Sentiment Analysis Problem

Sentiment analysis appears to be one of the easier tasks in the realm of text analytics: given a text such as a tweet or a product review, decide whether it expresses a positive or a negative opinion. This task is almost trivial for humans, but it turns out to be a real challenge for automated systems. In fact, state-of-the-art sentiment analysis tools are wrong on approximately 4 out of 10 documents.
Current sentiment analysis tools are rule-based, feature-based, or combinations of both. However, recent research uses deep learning on very large sets of documents.

In this talk, we will explain the intrinsic difficulties of automated sentiment analysis; present existing solution approaches and their performance; describe an architecture for a deep learning system; and explore whether deep learning can improve sentiment analysis accuracy.
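
As a rough illustration of the simplest existing approach covered in the talk (counting positive and negative dictionary words, plus a naive negation rule), here is a minimal Python sketch. The word list, the scores, the tokenizer, and the function names `score` and `label` are assumptions made for this example; they are not taken from the talk or from any tool it evaluates.

```python
import re

# Hypothetical mini sentiment dictionary (illustration only);
# real lexica contain thousands of scored words.
LEXICON = {
    "great": 1, "beautiful": 1, "good": 1, "appealing": 1, "comfortable": 1,
    "terrible": -1, "awful": -1, "expensive": -1, "bad": -1, "ugly": -1,
}
NEGATIONS = {"not", "no", "never"}


def score(text):
    """Sum the scores of dictionary words found in the text.

    Naive negation rule (an assumption of this sketch): once a negation
    word is seen, the score of every later sentiment word is inverted
    until the sentence ends.
    """
    tokens = re.findall(r"[a-z']+|[.!?]", text.lower())
    total = 0
    negated = False
    for tok in tokens:
        if tok in ".!?":
            negated = False          # a new sentence resets the negation
        elif tok in NEGATIONS:
            negated = True
        elif tok in LEXICON:
            value = LEXICON[tok]
            total += -value if negated else value
    return total


def label(total):
    """Map a numeric score to a three-class label."""
    if total > 0:
        return "pos"
    if total < 0:
        return "neg"
    return "neu"
```

With this sketch, `score("The car is appealing and I do not find it expensive.")` returns 2, but `score("I do not find the car expensive and it is appealing.")` returns 0, because the naive rule also inverts "appealing". This is the kind of failure that motivates analysing sentence structure instead of only word order.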


  1. Can Deep Learning solve the Sentiment Analysis Problem? Mark Cieliebak, Zurich University of Applied Sciences. Annual Meeting of SGAICO (Swiss Group for Artificial Intelligence and Cognitive Science), 18.11.2014
  2. Outline: 1. What is sentiment analysis? 2. How good are "classical" approaches? 3. Does deep learning solve the problem?
  3. About Me: Mark Cieliebak, Institute of Applied Information Technology (InIT), ZHAW, Winterthur. Email: ciel@zhaw.ch, Website: www.zhaw.ch/~ciel. Research Interests: Text Analytics, Open Data, Automated Test Generation, Software Engineering
  4. What is Sentiment Analysis: "… WiFi Analytics is a free Android app that I find very handy when it comes to troubleshooting and monitoring a home network." [1]
  5. Sample Application: Social Media Monitoring. Text Analytics Components: • Find relevant documents • Hot topic analysis • Sentiment analysis [7]
  6. Flavours of Sentiment Analysis: • Document-based • Sentence-based • Target-specific • Rating prediction
  7. Classic Approaches to Sentiment Analysis: Rule-Based, Corpus-Based (diagram: Predicted Label) [3] [4]
  8. Simple Sentiment Analysis. Idea: Count the number of positive and negative words. Use a sentiment dictionary: POSITIVE: great, love, nice, ... NEUTRAL: hello, see, I, … NEGATIVE: bad, hate, ugly, ... Examples: "This camera is great[+1]." +1 (pos); "I find it beautiful[+1] and good[+1]." +2 (pos); "It looks terrible[-1]." -1 (neg); "This car has a blue color." 0 (neu)
  9. Sample Rules: • Detect booster words: "The car is really very expensive[-1 -1 -2]." • New category "Mixed": "This car has an appealing[+1] design and comfortable[+1] seats, but it is expensive[-1]." • Negation: invert only the score of words occurring after the negation: "The car is appealing[+3] and I do not[*-1] find it expensive[-2]" • "I do not find the car expensive and it is appealing." Need to “understand” the sentence
  10. Linguistic Analysis -> RULE: Invert scores of words that are in the same phrase as the negation. “I do not find the car expensive[+2] and it is appealing[+3].” → +5 (pos) (Figure: parse tree of the sentence "I do not find the car expensive and it is appealing", two sentences joined by a conjunction, each split into a noun phrase and a verb phrase.)
  11. Rule-Based Sentiment Analysis. Most Important Issues: - Requires good hand-crafted rules - Hard to transfer to new tasks or languages - Does not work well for texts with bad grammar (Twitter) [5]
  12. Classic Approaches to Sentiment Analysis: Rule-Based, Corpus-Based (diagram: Predicted Label) [3] [4]
  13. Corpus-Based Sentiment Analysis (diagram: Predicted Label) [4]
  14. Corpus-Based Sentiment Analysis. Annotated Corpus (Sentence | Polarity):
      This analysis is good. | Pos
      It looks awful. | Neg
      This car has a blue color. | Neu
      This car has an appealing design, comfortable seats, but it is expensive. | Mix
      This car has a very appealing design, comfortable seats, but it is really expensive. | Mix
      This analysis is not good. | Neg
      This car has an appealing design, comfortable seats and it is not expensive. | Mix
      This movie was like a horror event. | Neg
      This car is appealing and is not expensive. | Mix
      ... | ...
  15. Sample Features for Tweets • Word ngrams: presence or absence of contiguous sequences of 1, 2, 3, and 4 tokens; noncontiguous ngrams • POS: the number of occurrences of each part-of-speech tag • Sentiment lexica: each word annotated with tonality score (-1..0..+1) • Negation: the number of negated contexts • Punctuation: the number of contiguous sequences of exclamation marks, question marks, and both exclamation and question marks • Emoticons: presence or absence; whether the last token is a positive or negative emoticon • Hashtags: the number of hashtags • Elongated words: the number of words with one character repeated (e.g. ‘soooo’). From: Mohammad et al., SemEval 2013
  16. Corpus-Based Sentiment Analysis. Most Important Issues: - Requires large annotated corpora - Depends on good features [6]
  17. How good are Sentiment Analysis Tools?
  18. Quick Poll: "How good are state-of-the-art sentiment analysis tools?" • Short texts: 1-2 sentences from Twitter, news, reviews etc. • Three-class classification: positive, negative, other • Accuracy = #correct docs / #docs. Answer options (Accuracy | Votes): <50%, 50-60%, 60-70%, 70-80%, 80-90%, >90%
  19. Tool Accuracy (chart: accuracy per corpus, axis from 0.2 to 0.8; series: Best Tool per Corpus, Worst Tool per Corpus). Avg.: 61% and 40% [14]
  20. Tool Accuracy (chart: accuracy per corpus, axis from 0.2 to 0.8; series: Best Tool per Corpus, Worst Tool per Corpus, Overall Best Tool). Avg.: 61%, 40% and 59%
  21. Take-Home Lesson: Accuracy of the best commercial tool on arbitrary short texts is 59%
  22. Approaches to Sentiment Analysis: Rule-Based, Corpus-Based, Deep Learning (diagram: Predicted Label) [9] [8]
  23. Deep Learning on Text: It's all about Word Vectors!
  24. Word2Vec • Huge set of text samples (billions of words) • Extract dictionary • Word matrix: k-dimensional vector for each word (k typically 50-500) • Word vectors initialized randomly • Train word vectors to predict next words, given a sequence of words from sample text. Major contributions by Bengio et al. 2003, Collobert & Weston 2008, Socher et al. 2011, Mikolov et al. 2013 [9]
  25. The Magic of Word Vectors: King - Man + Woman ≈ Queen. Live demo on 100b words from Google News dataset: http://radimrehurek.com/2014/02/word2vec-tutorial/ [10]
  26. Relations Learned by Word2Vec [11]
  27. Using Word Vectors in NLP. Collobert et al., 2011: • SENNA: generic NLP system based on word vectors • No manual feature engineering • Solves many NLP tasks as well as benchmark systems [12]
  28. Deep Learning and Sentiment. Maas et al., 2011: • Enrich word vectors with sentiment context • Capture semantics of words (unsupervised) and sentiment (supervised) in parallel, using multiple learning tasks (example words in figure: wonderful, amazing, terrible, awful)
  29. Deep Learning and Sentiment. Socher et al. 2013: • Word vectors do not help for sentiment analysis • Recursive Neural Tensor Networks • Representing sentence structures as trees while adding sentiment annotations at the same time • Restricted to single, well-structured sentences [13]
  30. Deep Learning and Sentiment. Le and Mikolov, 2014: • "Paragraph Vectors" • Add context (sentence, paragraph, document) to word vectors during training • Improves many existing approaches [9]
  31. Does Deep Learning solve the Sentiment Analysis Problem?
  32. Conclusion: Deep Learning for Sentiment • Small improvements, not a revolution • Very recent research, not yet the "end of the story" • SemEval 2015 will be the benchmark
  33. Talk in Short! 1. Classic approaches are rule-based or corpus-based 2. State-of-the-art tools classify 4 out of 10 docs wrong 3. Deep learning does not need hand-crafted features 4. Deep learning improves existing benchmarks
  34. Thank You! Mark Cieliebak, Zurich University of Applied Sciences (ZHAW), Winterthur, Switzerland. Email: ciel@zhaw.ch, Website: www.zhaw.ch/~ciel [15]
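
The "King - Man + Woman ≈ Queen" arithmetic from slide 25 can be sketched in a few lines of Python. The vectors below are hypothetical, hand-picked 3-dimensional toy values constructed so that the analogy holds; real word2vec vectors are learned from billions of words and typically have 50-500 dimensions. The helper names `cosine` and `analogy` are assumptions of this sketch, not part of any word2vec library.

```python
import math

# Hypothetical toy vectors (dimensions loosely: royalty, male, female).
# They are chosen by hand for illustration and are NOT trained embeddings.
VECTORS = {
    "king":  [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "car":   [0.0, 0.5, 0.5],
}


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c):
    """Return the word whose vector is closest to vec(a) - vec(b) + vec(c).

    The three input words are excluded from the candidates, which is the
    usual convention when evaluating word analogies.
    """
    target = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = [w for w in VECTORS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(VECTORS[w], target))
```

Here `analogy("king", "man", "woman")` returns `"queen"`. With trained vectors the match is only approximate, which is why the nearest neighbour by cosine similarity is used instead of an exact comparison.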
