2. We are all connected to each other...
● Information, thoughts and opinions are shared prolifically on the social web these days
● 72% of online adults use social networking sites
3. ● In Britain and the US, people spend approx. 1 hour a day on social media
● 90% of marketers use social media channels for business
4. Popularity of Social Networking Sites
Twitter
● 284 million monthly active users
● 100 million daily active users
● 80% of world leaders use Twitter
Facebook
● 1.35 billion monthly active users. 864 million
daily active, 10 billion messages a day
● 30% of Americans get their news from Facebook
● Facebook has more users than the whole of the
Internet did in 2005
● Google+: 300 million monthly active users
● LinkedIn: 332 million users
● MySpace: 36 million users
● 44% of “users” have never sent a tweet
● 390 million users have no followers
● Google+: only 7 minutes per month
5. Your grandmother is three times as likely to
use a social networking site now as in 2009
6. Why analyse social media?
● Contrary to popular belief, Twitter isn't just full of tweets about
Justin Bieber.
● In an emergency, one in two people would use social media to
let people know they were safe or to find out more information
● Less than 24 hours after the recent Nepal trekking disaster hit,
Facebook and Twitter accounts had been set up to provide
information channels, missing persons register etc.
● For companies, sentiment analysis tools are critical to keep
track of the market pulse, customer feedback, etc.
● Fast-growing, highly dynamic and high-volume source of data
● Reflects language and current views of today's society
● Analysing social media is far more efficient than e.g. YouGov polls
7. Opinion mining from social media
● Understanding customer reviews and so on is a huge business
● But also:
● Tracking political opinions: what events make people change
their minds?
● How does public mood influence the stock market, consumer
choices etc?
● How are opinions distributed in relation to demographics?
● Who are the opinion influencers?
● Social media analysis (SMA) tools are crucial for making sense of all the
information
8. Social media analysis for journalists
● Twitter is immensely valuable to news
professionals
● gauging opinion on breaking news
● discovering new stories
● first hand reports from disasters,
war zones, ...
● Issues of veracity: London Eye on Fire!
9. Analysing language in social media is hard
● Grundman:politics makes #climatechange scientific
issue,people don’t like knowitall rational voice tellin em wat 2do
● @adambation Try reading this article , it looks like it would be
really helpful and not obvious at all. http://t.co/mo3vODoX
● Want to solve the problem of #ClimateChange? Just #vote for a
#politician! Poof! Problem gone! #sarcasm #TVP #99%
● Human Caused #ClimateChange is a Monumental Scam!
http://www.youtube.com/watch?v=LiX792kNQeE … F**k yes!!
Lying to us like MOFO's Tax The Air We Breath! F**k Them!
10. We need tools for hashtag analysis
● Hashtags need unravelling:
● #gasprices
● And disambiguating:
● #therapist
● #nowthatcherisdead
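The two hashtag problems above, unravelling and disambiguating, can be illustrated with a minimal sketch. It enumerates every way of splitting a hashtag into known words. The tiny vocabulary and the function names are assumptions made for illustration only; this is not the method used by any particular tool, and a real system would need a large lexicon plus context to rank the candidate readings.

```python
# Toy vocabulary, assumed purely for illustration.
TOY_VOCAB = {
    "gas", "prices", "therapist", "the", "rapist",
    "now", "that", "thatcher", "cher", "is", "dead",
}


def segmentations(text, vocab):
    """Return every way of splitting `text` into words found in `vocab`."""
    if not text:
        return [[]]  # one way to split the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in vocab:
            # Keep this prefix and recursively split the remainder.
            for tail in segmentations(text[end:], vocab):
                results.append([head] + tail)
    return results


def unravel(hashtag, vocab=TOY_VOCAB):
    """Strip the leading '#', lower-case, and list all candidate readings."""
    body = hashtag.lstrip("#").lower()
    return [" ".join(words) for words in segmentations(body, vocab)]


if __name__ == "__main__":
    for tag in ["#gasprices", "#therapist", "#nowthatcherisdead"]:
        print(tag, "->", unravel(tag))
```

With this vocabulary, #gasprices has a single reading, while #therapist and #nowthatcherisdead each have two, which is exactly why a disambiguation step is needed after segmentation.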
14. NER on Tweets
● Named entity recognition (NER) on tweets is much harder than on longer text
● Tweets are very short, so ambiguous terms are hard to interpret
● Poor grammar and spelling, use of abbreviations and shorthand
● Twitter-specific features: hashtags, @mentions, etc.
● Tools designed for longer texts do very badly on Twitter
System       P      R      F1     F0.5
OpenCalais   68.59  67.17  67.87  68.30
Lupedia      70.93  44.17  54.44  63.27
TextRazor    59.12  83.83  69.34  62.82
TwitIE       69.69  61.03  65.07  67.76
Zemanta      29.64  29.31  29.47  29.57
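For readers unfamiliar with the column headings, the F1 and F0.5 figures follow from precision (P) and recall (R) via the standard F-beta formula. The sketch below recomputes them for two rows of the table; the function name is an assumption for illustration, while the formula itself is the standard one.

```python
def f_beta(precision, recall, beta=1.0):
    """Standard F-beta score: weighted harmonic mean of precision and recall.

    beta < 1 weights precision more heavily; beta > 1 weights recall more.
    """
    if precision == 0 and recall == 0:
        return 0.0  # avoid dividing by zero when both scores are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # (P, R) pairs copied from the table above.
    rows = {"OpenCalais": (68.59, 67.17), "TextRazor": (59.12, 83.83)}
    for system, (p, r) in rows.items():
        print(f"{system}: F1={f_beta(p, r):.2f} F0.5={f_beta(p, r, beta=0.5):.2f}")
```

Because F0.5 favours precision, TextRazor has the highest F1 in the table (helped by its high recall) but not the highest F0.5.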
15. Tools for Sentiment Analysis
● There are lots of tools for sentiment analysis around
● Many of them don't work well at more than a very basic level
● They mainly use dictionary lookup for positive and negative words
● ML methods only work for text that's similar in style to the
training data, and it's hard to understand when they go wrong
● Things like sarcasm tend not to get picked up
● They classify the tweets as positive or negative, but not with
respect to the keyword you're searching for
● keyword search just retrieves any tweet mentioning it, but not
necessarily about it as a topic
● no correlation between the keyword and the sentiment
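To make these weaknesses concrete, here is a minimal sketch of the dictionary-lookup approach described above. The word lists and function name are toy assumptions, not taken from any real tool. The classifier simply counts positive and negative words, so it has no notion of negation, sarcasm, or what the sentiment is about.

```python
import re

# Toy sentiment word lists, assumed purely for illustration.
POSITIVE = {"good", "great", "love", "helpful"}
NEGATIVE = {"bad", "awful", "hate", "scam"}


def lexicon_polarity(text):
    """Classify text by counting positive and negative dictionary words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    examples = [
        # Negation is ignored: "good" alone decides the label.
        "This is not good",
        # Sarcastic tweet from the earlier slide: "helpful" is taken literally.
        "Try reading this article , it looks like it would be really helpful "
        "and not obvious at all.",
        # Retrieved by a keyword search for "rain", but the sentiment is about coffee.
        "Stuck in the rain again, but I love this coffee",
    ]
    for text in examples:
        print(lexicon_polarity(text), "<-", text)
```

All three examples come out as "positive", although the first is negative, the second is sarcastic, and the third says nothing positive about the searched keyword.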
16. Sentiment Analysis in GATE
● Knowledge-based linguistic approach based on entity
detection for opinion holders and targets
● Sentiment words have to be in a linguistic relation to the
opinion holder and target
● Use linguistic analysis to deal with scope issues
(negation, hashtags, sarcasm etc)
● Sentiment word scores are modified incrementally
● Easy understanding of errors and adaptation of the rules
● Twitter-specific pre-processing using TwitIE
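The idea of modifying sentiment word scores incrementally can be sketched roughly as follows. This is an illustrative toy only, not GATE's actual rules or API: the lexicon, the modifier lists, the two-token window (a crude stand-in for a real linguistic relation) and the #sarcasm flip are all assumptions.

```python
import re

# Toy resources, assumed purely for illustration.
LEXICON = {"good": 1.0, "helpful": 1.0, "bad": -1.0, "scam": -1.0}
NEGATORS = {"not", "never", "no"}
INTENSIFIERS = {"really": 1.5, "very": 1.5}


def score_tweet(text):
    """Score a tweet by adjusting each sentiment word's base score step by step."""
    tokens = re.findall(r"#?[a-z']+", text.lower())
    total = 0.0
    for i, token in enumerate(tokens):
        if token not in LEXICON:
            continue
        score = LEXICON[token]
        # Apply modifiers found in the two tokens before the sentiment word.
        for prev in tokens[max(0, i - 2):i]:
            if prev in INTENSIFIERS:
                score *= INTENSIFIERS[prev]  # strengthen the score
            if prev in NEGATORS:
                score = -score  # negation flips polarity
        total += score
    # A sarcasm hashtag is treated as reversing the whole tweet.
    if "#sarcasm" in tokens:
        total = -total
    return total


if __name__ == "__main__":
    for text in ["not good", "really helpful", "not very good",
                 "really good idea #sarcasm"]:
        print(score_tweet(text), "<-", text)
```

Because every adjustment is an explicit rule, it is easy to trace why a tweet received its score and to adapt the rule that caused an error.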
17. This all sounds like it would be hard
to set up on my system!
18. GATE Cloud to the Rescue
● What?
● end-to-end text and web processing solutions from the
GATE family running on cloud computing infrastructures.
● Why?
● Solve any sort of text processing problem: web, text or
opinion mining; indexing and search (fulltext, boolean,
conceptual, structural); information extraction; semantic
annotation; sentiment analysis; ontology population; etc.
● Run large-scale jobs without investing in server hardware
or other fixed costs.
● Exploit a 15-year R&D programme, the expertise of the
GATE community and a defined and repeatable process.
19. Benefits of GATE Cloud
Text Analytics Consumer
● Cloud: large scale, no CAPEX, no system admin, no commitment
● Open Source: no vendor lock-in
● TA Services: Twitter, News, BioMed, Sentiment, etc.; low-level pre-processing support (POS tagging etc.)
● APIs: Integrate
20. Application Types
● Low-level: stemmers, PoS taggers, phrase chunkers,
morphological analysers
● Coverage: tools for 18 languages including Bulgarian and Russian
● General Purpose IE: named entities, numbers,
measurements, language ID
● Domain-specific IE: News, TwitIE, Biomed
● LOD-based semantic annotation: DBpedia, GeoNames,
Freebase
● Sentiment analysis
● Summarisation
● Includes many 3rd party tools also
22. It's just like online shopping
● Click through to the online shop, browse products and add them to
your shopping basket.
● Create an account and then buy credit vouchers
● Put the vouchers in your account, and go to checkout.
● We'll email you the login or job creation details for your cloud servers.
● Monitor and control your cloud machines on your dashboard.
● Use our existing applications:
● Just upload your documents and sit down with a cup of tea
● Create your own pipeline:
● Upload your own customised application along with your
documents, and sit down with a cup of tea
23. Summary
● SMA tools are crucial, but it's hard to find out which ones are good
● Solutions are readily available in GATE
● Easy to test different versions and configurations
● Open source and easily customisable
● Big data and installation problems are solved with GATE Cloud:
● PaaS for text analytics
● Low barrier to entry
● Just pay for what you use
● State-of-the-art pipelines for news and social media
● More pipelines constantly being added
24. Acknowledgements and more information
● GATE: http://gate.ac.uk
● GATE Cloud: http://gatecloud.net
● Annomarket: http://www.annomarket.eu
● Research partially supported by the European Union/EU under
the Information and Communication Technologies (ICT) theme
of the 7th Framework Programme for R&D (FP7) DecarboNet
(610829) and AnnoMarket (296322)
● Original GATE Cloud development supported by JISC/EPSRC,
reference number EP/I034092/1
This document does not represent the opinion of the European Community, and the
European Community is not responsible for any use that might be made of its content