Social Media Week
1. Sentiment Analysis
#SMWsentiment
Tuesday 25th September 2-3pm
Stephen Tagg & Jillian Ney
2. Workshop Overview
1. Have you checked in (& on Foursquare)?
Backchat on #SMWsentiment…
2. A little introduction, definitions
3. Sentiment Analysis issues
4. An example using free software
5. Other free software
6. Other software
7. Workshop Discussion Q&A
3. Definitions
• Text Analysis – a bit more specific
• Opinion Mining – not quite the same, but the two overlap – will
it make surveys less relevant?
• “Sentiment analysis or opinion mining refers to the application
of natural language processing, computational linguistics, and text
analytics to identify and extract subjective information in source
materials.
• Generally speaking, sentiment analysis aims to determine the
attitude of a speaker or a writer with respect to some topic or the
overall contextual polarity of a document. The attitude may be his
or her judgement or evaluation (see appraisal theory), affective
state (that is to say, the emotional state of the author when
writing), or the intended emotional communication (that is to say,
the emotional effect the author wishes to have on the reader).”
Wikipedia
• Appraisal theory – psychological theories of emotion
4. Sentiment analysis and Web 2.0
• The rise of social media such as blogs and social networks has fuelled interest in
sentiment analysis. With the proliferation of reviews, ratings, recommendations
and other forms of online expression, online opinion has turned into a kind of
virtual currency for businesses looking to market their products, identify new
opportunities and manage their reputations. As businesses look to automate the
process of filtering out the noise, understanding the conversations, identifying the
relevant content and actioning it appropriately, many are now looking to the field
of sentiment analysis. If web 2.0 was all about democratizing publishing, then the
next stage of the web may well be based on democratizing data mining of all the
content that is getting published.
• One step towards this aim is accomplished in research. Several research teams in
universities around the world currently focus on understanding the dynamics of
sentiment in e-communities through sentiment analysis. The CyberEmotions
project, for instance, recently identified the role of negative emotions in driving
social networks discussions. Sentiment analysis could therefore help understand
why certain e-communities die or fade away (e.g., MySpace) while others seem to
grow without limits (e.g., Facebook).
• The problem is that most sentiment analysis algorithms use simple terms to
express sentiment about a product or service. However, cultural factors, linguistic
nuances and differing contexts make it extremely difficult to turn a string of
written text into a simple pro or con sentiment. The fact that humans often
disagree on the sentiment of text illustrates how big a task it is for computers to
get this right. The shorter the string of text, the harder it becomes.
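A minimal sketch of the problem described above, assuming a tiny hand-made word list (the POSITIVE and NEGATIVE sets are illustrative, not a published lexicon). It counts sentiment words and ignores context, so negation and very short texts defeat it:

```python
import re

# Toy word lists: an assumption for illustration, not a published lexicon.
POSITIVE = {"good", "clean", "friendly", "superb", "nice"}
NEGATIVE = {"poor", "expensive", "small", "problems", "dirty"}

def naive_polarity(text: str) -> str:
    """Return "+", "-" or "0" by counting sentiment words, ignoring context."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "+"
    if score < 0:
        return "-"
    return "0"

print(naive_polarity("The hotel is clean and the staff are friendly"))  # +
print(naive_polarity("The room was not clean"))  # + (the negation is missed)
print(naive_polarity("OK"))  # 0 (too short to contain any listed word)
```

The second call shows how a simple term count turns a complaint into a positive score; handling this needs context, which is where the harder methods come in.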
5. Sentiment Analysis issues
Hype
• Gartner hype cycle – text analysis is in a good
place – after the initial hype. More avenues
for research
– General Sentiment, Inc
– Attensity
– Lexalytics
– Telligent Systems
– CyberEmotions project
8. An example
• 31,000+ hotel evaluations from Dubai. (thanks to Prof
Alan Wilson).
• Applied the R tm package. R is a statistical package and
tm is a package for text mining available in it.
• Considerable time was spent learning how to read in the
data correctly.
• sentiment is another R package, for sentiment analysis
of texts.
• Hotel evaluations need feature/aspect-based
sentiment analysis
• Machine learning – latent semantic analysis, support
vector machines, Bag of Words, Semantic orientation.
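The workshop example used the R tm package. As a rough illustration of the Bag of Words idea only, here is a sketch in Python (not the tm API) that turns texts into a document-term matrix of word counts. The stopword list and function names are assumptions made for this example:

```python
import re
from collections import Counter

# Toy stopword list: an assumption, far shorter than any real one.
STOPWORDS = {"the", "is", "and", "a", "to", "of", "it", "has", "are", "in"}

def tokenize(text):
    """Lower-case the text, keep alphabetic words, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def document_term_matrix(docs):
    """Return (vocabulary, matrix) with one row of word counts per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix

docs = [
    "The hotel is clean and the staff are friendly.",
    "Clean rooms, nice disco.",
]
vocab, matrix = document_term_matrix(docs)
print(vocab)   # ['clean', 'disco', 'friendly', 'hotel', 'nice', 'rooms', 'staff']
print(matrix)  # [[1, 0, 1, 1, 0, 0, 1], [1, 1, 0, 0, 1, 1, 0]]
```

Word order is thrown away, which is why a matrix like this feeds methods such as latent semantic analysis or support vector machines but cannot by itself tell "not clean" from "clean".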
9. First three cases
Columns: ID, Hotel, Date of Review, Data source, Title, Body

Case 1
ID: 24760659
Hotel: Ibis World Trade Centre Dubai Hotel
Date of Review: 07-MAY-2010
Data source: agoda
Title: OK if your work is at the world trade center
Body: PROS: - Access to WTC - Not far from anywhere. CONS: - No in-room safe - No mini bar in room - No comp water - Rather expensive internet access - poor information at lobby. . Upon check in they asked me for a cash deposit which was higher than the entire hotel bill and informed in advance that the change will not be in US $ but in local currency. Anyway did a credit card inprint which satisfied them. No bell boy service, you carry all your luggage yourself; travel light. Breakfast is ok, and with a few chages day by day.

Case 2
ID: 24761560
Hotel: Ramada Continental Hotel, Dubai
Date of Review: 28-MAY-2010
Data source: agoda
Title: Conveniently located with all the facilities
Body: PROS: - Convenient location: very near to airport, shopping centres - Hotel facilities are superb - Good restaurants and entertainment centres - Easy to take taxis. CONS: - problems with the room key card, which get spoilt fast and need to be repalced at least once a day. . . If you go to Dubai, I would definately recommend this hotel. It is conveniently located and has all the facilities. The hotel is clean and the staff are friendly. It has a wide choice of restaurants and entertainment centres. It has some shopping centres nearby and a metro station is currently being built very near to this hotel. The airport is only about 15 mins drive from this hotel

Case 3
ID: 25658726
Hotel: Nihal Hotel, Dubai
Date of Review: 01-MAY-2010
Data source: agoda
Title: Nihal Hotel Dubai with
Body: PROS: - Cozy - Inexpensive - Good service - Nice disco (Filipino disco) - 450 Meter from Dubai metro - Clean rooms . CONS: - Small - No swimming pool/Gym - Rooms' doors are with old-style locks. . I like the place so much as it is inexpensive, cozy, in the down town, has a nice disco (which i like too much), has clean rooms, and near to Dubai metro.
11. Sentiment analysis results
Hotel * BEST_FIT Crosstabulation

BEST_FIT             anger  disgust    fear     joy  sadness  surprise   Total

ABC Arabain Suites
  Count                  0        0       0       1        0         0       1
  % within Hotel      0.0%     0.0%    0.0%  100.0%     0.0%      0.0%  100.0%
Abu Dhabi Gulf Hotel
  Count                  0        0       0       0        1         0       1
  % within Hotel      0.0%     0.0%    0.0%    0.0%   100.0%      0.0%  100.0%
Admiral Plaza Hotel, Dubai
  Count                  8        0       1      92       12         7     120
  % within Hotel      6.7%     0.0%    0.8%   76.7%    10.0%      5.8%  100.0%
Akas-Inn Hotel Apartment
  Count                  0        0       0       2        0         0       2
  % within Hotel      0.0%     0.0%    0.0%  100.0%     0.0%      0.0%  100.0%
Al Bustan Centre & Residence Hotel, Dubai
  Count                  1        0       1      13        1         1      17
  % within Hotel      5.9%     0.0%    5.9%   76.5%     5.9%      5.9%  100.0%
Al Bustan Rotana Hotel, Dubai
  Count                 11        1       0      55        5         4      76
  % within Hotel     14.5%     1.3%    0.0%   72.4%     6.6%      5.3%  100.0%
Al Deyafa Hotel Apartments 3, Dubai
  Count                  0        0       0       9        1         1      11
  % within Hotel      0.0%     0.0%    0.0%   81.8%     9.1%      9.1%  100.0%
Al Faris Hotel Apartments 1, Dubai
  Count                  0        0       0       0        0         1       1
  % within Hotel      0.0%     0.0%    0.0%    0.0%     0.0%    100.0%  100.0%
Al Faris Hotel Apartments 2, Dubai
  Count                  0        0       1       8        2         0      11
  % within Hotel      0.0%     0.0%    9.1%   72.7%    18.2%      0.0%  100.0%
Al Jawhara Gardens Hotel, Dubai
  Count                  5        2       1      21        4         3      36
  % within Hotel     13.9%     5.6%    2.8%   58.3%    11.1%      8.3%  100.0%
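The "% within Hotel" rows are each emotion's count divided by the hotel's total. A small sketch of that arithmetic, using the Admiral Plaza Hotel counts from the table; the function names are made up for this example, and the original crosstabulation was presumably produced by a statistics package rather than code like this:

```python
from collections import Counter, defaultdict

EMOTIONS = ["anger", "disgust", "fear", "joy", "sadness", "surprise"]

def crosstab(rows):
    """Turn (hotel, best_fit) pairs into {hotel: Counter of emotions}."""
    table = defaultdict(Counter)
    for hotel, emotion in rows:
        table[hotel][emotion] += 1
    return table

def row_percentages(counter):
    """Percentage of each emotion within one hotel, to one decimal place."""
    total = sum(counter.values())
    return {e: round(100 * counter[e] / total, 1) for e in EMOTIONS}

# Rebuild one (hotel, best_fit) pair per review from the Admiral Plaza counts.
hotel = "Admiral Plaza Hotel, Dubai"
counts = {"anger": 8, "disgust": 0, "fear": 1, "joy": 92, "sadness": 12, "surprise": 7}
rows = [(hotel, e) for e, n in counts.items() for _ in range(n)]

table = crosstab(rows)
print(sum(table[hotel].values()))     # 120
print(row_percentages(table[hotel]))
# {'anger': 6.7, 'disgust': 0.0, 'fear': 0.8, 'joy': 76.7, 'sadness': 10.0, 'surprise': 5.8}
```

Hotels with only one or two reviews show 100% for a single emotion, so the percentages are only worth comparing where the count is reasonably large.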
12. Sentiment Analysis issues
Indicators
• The R sentiment package uses either Janyce Wiebe’s
subjectivity lexicon to classify polarity (+, 0, -)
or WordNet-Affect to generate scores for
anger, disgust, fear, joy, sadness and surprise.
• You may need your own lexicon.
• Remove objective statements.
• Use machine learning and artificial
intelligence to address ambiguity, multiple
meanings, context…
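To show what a "best fit" emotion means in practice, here is a simplified sketch that counts matches against a toy emotion word list and picks the most frequent category. The word list is invented for this example (it is not taken from WordNet-Affect), and plain counting is a stand-in here, not a description of how the R package scores texts:

```python
import re
from collections import Counter

# Toy emotion lexicon: illustrative entries only, an assumption for this sketch.
EMOTION_LEXICON = {
    "angry": "anger", "furious": "anger",
    "disgusting": "disgust", "filthy": "disgust",
    "afraid": "fear", "scared": "fear",
    "happy": "joy", "like": "joy", "nice": "joy",
    "sad": "sadness", "disappointed": "sadness",
    "surprised": "surprise", "unexpected": "surprise",
}

def best_fit(text):
    """Return the emotion with the most matching words, or None if none match."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = Counter(EMOTION_LEXICON[w] for w in words if w in EMOTION_LEXICON)
    if not scores:
        return None  # no emotion words found: possibly an objective statement
    return scores.most_common(1)[0][0]

print(best_fit("I like the place so much, it has a nice disco"))  # joy
print(best_fit("No mini bar in room"))                            # None
```

The second call returns None even though the reviewer is clearly complaining, which illustrates why a generic lexicon may need to be replaced by one built for the domain.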
13. Other free software
• GATE – an academic computer science tool which
aims to allow sophisticated processing.
• RapidMiner – built on Java. Assumes you’ve
got maybe 20 or 200 texts (not 31,000) so
rapid it wasn’t!
• You’ll struggle to acquire data and process it, and
then you need to be able to summarise and
interpret it!
14. Sentiment Analysis as an add-on to
social media metrics
• Brandwatch – sentiment analysis – based on
machine learning – select an industry for the
query.
• Alterian SM2 – part of dictionary – includes
emoticons
• Radian6 – includes automated sentiment
analysis and Clarabridge sentiment analysis
• Other options on the resources hand-out
15. Agenda for participants
• Your experience and interests
– What sentiment analysis have you seen/used?
– What could sentiment analysis contribute?
• Your concerns, barriers to use
– Privacy issues
– Effort/cost of acquiring data and doing it yourself
– Trusting a third party to do sentiment analysis –
judging their offerings – cost and value for money
16. Workshop activities
• What do people say about
– your brand,
– your company/ organisation?
• How do they feel?
• What UGC (User Generated Content) do you
contribute to social media?
17. Other developments
• What do you say about…? Scottish
independence issues – effect of other forum
content (are Scots overwhelmed by so many
English on the BBC discussion forum?)
• I’m always on the lookout for data – but I can
take months/years!