SlideShare une entreprise Scribd logo
1  sur  58
Télécharger pour lire hors ligne
University of Sheffield, NLP
Tools for (Almost) Real-Time
Social Media Analysis
Dr. Diana Maynard
Dept of Computer Science
University of Sheffield, UK
19 March 2015, Vienna
University of Sheffield, NLP
We are all connected to each other...
● Information,
thoughts and
opinions are
shared prolifically
on the social web
these days
● 72% of online
adults use social
networking sites
University of Sheffield, NLP
Your grandmother is three times as likely to
use a social networking site now as in 2009
University of Sheffield, NLP
There are hundreds of tools for social media
analytics
● Most of them are commercial and not freely available
● The research tools tend to focus on specific topics and
scenarios, and aren't easily adaptable
● The analysis they do often doesn't go much beyond number
crunching, e.g.
– look at number of tweets, retweets, favourites
– filter by hashtag or keyword for topic categorisation
– use off-the-shelf sentiment tools
– use counts of word length, POS categories etc
– very little semantics, don't deal with variation, ambiguity,
slang, sarcasm etc.
University of Sheffield, NLP
Analysing Social Media is harder than it sounds
There are lots
of things to
think about!
University of Sheffield, NLP
Analysing language in social media is hard
● Grundman:politics makes #climatechange scientific issue,people
don’t like knowitall rational voice tellin em wat 2do
● @adambation Try reading this article , it looks like it would be
really helpful and not obvious at all. http://t.co/mo3vODoX
● Want to solve the problem of #ClimateChange? Just #vote for a
#politician! Poof! Problem gone! #sarcasm #TVP #99%
● Human Caused #ClimateChange is a Monumental Scam!
http://www.youtube.com/watch?v=LiX792kNQeE … F**k yes!!
Lying to us like MOFO's Tax The Air We Breath! F**k Them!
University of Sheffield, NLP
Let's search for keywords like “Arctic”
Oops!
University of Sheffield, NLP
Seems like we need something to help!
How about NLP?
University of Sheffield, NLP
It is difficult to access unstructured information
efficiently
Information extraction tools can help you:
● Save time and money on management of text and data from
multiple sources
● Find hidden links scattered across huge volumes of diverse
information
● Integrate structured data from variety of sources
● Interlink text and data
● Collect information and extract new facts
University of Sheffield, NLP
What is Entity Recognition?
●
Entity Recognition is about recogising and classifying key Named
Entities and terms in the text
●
A Named Entity is a Person, Location, Organisation, Date etc.
●
A term is a key concept or phrase that is representative of the text
●
Entities and terms may be described in different ways but refer to
the same thing. We call this co-reference.
Mitt Romney, the favorite to win the Republican nomination for president in 2012
DatePerson Term
The GOP tweeted that they had knocked on 75,000 doors in Ohio the day prior.
Organisation
co-reference
Location
University of Sheffield, NLP
What is Event Recognition?
●
An event is an action or situation relevant to the domain
expressed by some relation between entities or terms.
●
It is always grounded in time, e.g. the performance of a
band, an election, the death of a person
Mitt Romney, the favorite to win the Republican nomination for president in 2012
Event DatePerson
Relation Relation
University of Sheffield, NLP
Why are Entities and Events Useful?
● They can help answer the “Big 5” journalism questions
(who, what, when, where, why)
● They can be used to categorise the texts in different ways
– look at all texts about Obama.
● They can be used as targets for opinion mining
– find out what people think about President Obama
● When linked to an ontology and/or combined with other
information, they can be used for reasoning about things
not explicit in the text
– seeing how opinions about different American
presidents have changed over the years
University of Sheffield, NLP
Approaches to Information Extraction
Knowledge Engineering

rule based

developed by
experienced language
engineers

make use of human
intuition

easier to understand
results

development could be
very time consuming

some changes may be
hard to accommodate
Learning Systems

use statistics or other
machine learning

developers do not need
LE expertise

requires large amounts of
annotated training data

some changes may
require re-annotation of
the entire training corpus
University of Sheffield, NLP
Seems like we need a tool to do this clever
stuff for us.
How about GATE?
University of Sheffield, NLP
What is GATE?
GATE is an NLP toolkit developed at the University of Sheffield
over the last 20 years.
It's open source and freely available. http://gate.ac.uk
• components for language processing, e.g. parsers, machine
learning tools, stemmers, IR tools, IE components for various
languages...
• tools for visualising and manipulating text, annotations,
ontologies, parse trees, etc.
• various information extraction tools
• evaluation and benchmarking tools
University of Sheffield, NLP
GATE components
● Language Resources (LRs), e.g. lexicons, corpora,
ontologies
● Processing Resources (PRs), e.g. parsers, generators,
taggers
●
Visual Resources (VRs), i.e. visualisation and editing
components
● Algorithms are separated from the data, which means:
– the two can be developed independently by users with
different expertise.
– alternative resources of one type can be used without
affecting the other, e.g. a different visual resource can be
used with the same language resource
University of Sheffield, NLP
ANNIE
• ANNIE is GATE's rule-based IE system
• It uses the language engineering approach (though we also
have tools in GATE for ML)
• Distributed as part of GATE
• Uses a finite-state pattern-action rule language, JAPE
• ANNIE contains a reusable and easily extendable set of
components:
– generic preprocessing components for tokenisation,
sentence splitting etc
– components for performing NE on general open domain text
University of Sheffield, NLP
ANNIE Modules
University of Sheffield, NLP
19
Document with Tokens
University of Sheffield, NLP
20
Document with Sentences
University of Sheffield, NLP
Gazetteer editor
definition file
entries entries for selected list
University of Sheffield, NLP
Named Entity Grammars
• Hand-coded rules written in JAPE applied to annotations to
identify NEs
• Phases run sequentially and constitute a cascade of FSTs over
annotations
• Annotations from format analysis, tokeniser. splitter, POS tagger,
morphological analysis, gazetteer etc.
• Because phases are sequential, annotations can be built up over
a period of phases, as new information is gleaned
• Standard named entities: persons, locations, organisations, dates,
addresses, money
• Basic NE grammars can be adapted for new applications, domains
and languages
University of Sheffield, NLP
Document with NEs
University of Sheffield, NLP
Coreference
University of Sheffield, NLP
Right, so we have a technology, and we have
a tool to apply the technology.
Now how do we do it?
University of Sheffield, NLP
Framework
● Data collection (via Twitter streaming API)
● Documents stored as JSON and processed (annotated) via
GCP
● Documents indexed via MIMIR
● Search and visualisation via MIMIR/Prospector
University of Sheffield, NLP
Live streaming (coming soon)
● If we want to process the tweets in real time, we can use the
Twitter streaming client to feed the incoming tweets to a
message queue.
● A separate process then reads messages from the queue,
annotates them and pushes them into Mimir.
● If the rate of incoming tweets exceeds the capacity of the
processing side, we can simply launch more instances of the
message consumer across different machines to scale the
capacity.
● Query and visualisation can then be performed as before on
whatever data we currently have available
University of Sheffield, NLP
Let's look at some examples
(For anyone who grew up in the UK):
“Here's one I prepared earlier”
University of Sheffield, NLP
DecarboNet project: what do people think about
climate change?
And how much do
we really know
about it?
How do we know
what's really true?
“It's cold in my flat“
University of Sheffield, NLP
Political Futures Tracker Application
● Example of using the technology on a real scenario - analysing
political tweets in the run-up to the UK elections
● Project funded by Nesta http://www.nesta.org.uk/
● Series of blog posts about the project, leading up to the
election, see e.g.
http://www.nesta.org.uk/blog/silver-surfers-and-westminster-
twitterati
University of Sheffield, NLP
Twitter collection
● collected Tweets using Twitter's “statuses/filter” streaming API
● can follow up to 5000 user IDs and receive in real time
● collected all tweets and retweets posted by these users
● also retweets of, and replies to, any tweet posted by these
users
University of Sheffield, NLP
Twitter collection (2)
● Initial list of 506 UK MPs' Twitter accounts, extracted from a
CSV file made available by BBC News Labs and cleaned
● Also added list of UK election candidates collected and made
available at https://yournextmp.com, and updated periodically
– 1,504 on 13th
January 2015
– 1,811 on 2nd
February 2015
● Added 21 official party accounts
● Total number of accounts followed at 16th
Feb: 1,894
– 444 MPs standing again are included in both the MP and
candidate lists
University of Sheffield, NLP
Tweets per hour collected
Government U-turn
on fracking
Douglas Carswell's accidental
“Hello Kitty” tweet
University of Sheffield, NLP
Longer web documents
● Also crawled websites of UK political parties (Con, Lab, LD,
Green, UKIP, BNP, SNP, PC, plus the NI parties and various
smaller parties)
●
Initial crawl on 28th
-29th
October retrieved 375MB
(compressed)
● Re-crawled regularly to pick up new pages
University of Sheffield, NLP
Politician and candidate annotation
● Acquired and corrected a list of UK MPs and election candidates
and their affiliations, twitter accounts and DBpedia URIs
● Converted to gazetteers so that MPs in various forms (name or
twitter handle) can be recognised in tweets and annotated with
the relevant info (URI, full name, constituency etc.)
University of Sheffield, NLP
Recognition of MPs / Candidates
University of Sheffield, NLP
Topic Recognition
● A set of themes was taken from the categories used on http://www.gov.uk
● For each theme, a gazetteer list was developed containing typical
keywords and phrases representative of that theme
● e.g. “asylum seeker” indicates the topic “borders and immigration”
● Each list was expanded via:
– automatic term recognition (based on tf.idf) over a corpus of
manifestos and other political documents
– manual additions
● Each list also contains potential head terms and modifiers which can be
expanded into longer terms on the fly from the text during analysis stage
● e.g. “terrorist” can modify many other words (terrorist attack, terrorst
threat, ...)
University of Sheffield, NLP
Topic recognition
This term is found by first
recognising the head
word “job” from a list
under the theme
“employment” and
matching against its root
form in the text, i.e. “job”.
It is then extended to
include the adjectival
modifier “British”, which
is not present in a list
anywhere.
University of Sheffield, NLP
Sentiment annotation
● Annotations are created over the whole sentence and contain
the following features:
– sentiment_kind: optimism / pessimism
– holder: the person holding the opinion (MP's name)
– holder_URI: the URI fo the holder
– target: the target of the opinion, e.g. MP or topic
– target_URI: if appropriate, the URI of the target
– score: a positive/negative value reflecting the strength of opinion
– sarcasm: yes / no (whether sarcasm is present)
– sentiment_string: the main word(s) that contain sentiment
● These annotations and features will be used as input to MIMIR
to facilitate analysis/aggregation
University of Sheffield, NLP
Positive opinion about science and innovation
University of Sheffield, NLP
GATE Mímir:
Answering Questions Google Can't
University of Sheffield, NLP
GATE Mimir
● can be used to index and search over text,
annotations, semantic metadata (concepts and
instances)
● allows queries that arbitrarily mix full-text,
structural, linguistic and semantic annotations
● is open source
University of Sheffield, NLP
Show me:
● all documents mentioning a temperature between 30 and 90
degrees F (expressed in any unit)
● all abstracts written in French on Patent Documents from the
last 6 months which mention any form of the word “transistor”
in the English abstract
● the names of the patent inventors of those abstracts
● all documents mentioning steel industries in the UK, along with
their location
University of Sheffield, NLP
Search news articles for politicians born in Sheffield
University of Sheffield, NLP
Document Indexing with MIMIR
● MIMIR allows for indexing and querying text, annotations and
semantic knowledge
– this gives a rich source of data for analysis
● Currently we have used MIMIR to index
– the raw collected text
– annotations provided by Twitter and by the applications
University of Sheffield, NLP
Examples of Mimir queries on our corpus
● Get all documents which talk about the borders/immigration topic
{Topic theme = "borders_and_immigration"}
● Get all documents where the author of the document is a candidate
{DocumentAuthor sparql = "?c nesta:candidate ?author_uri"}
● Get all documents where the author is an MP standing for re-election for the
same seat
{DocumentAuthor sparql="?c nesta:candidate ?author_uri . ?c dbp-prop:mp ?
author_uri"}
● Get all documents where the author is a candidate for the Sheffield Hallam
constituency
{DocumentAuthor
sparql="<http://dbpedia.org/resource/Sheffield_Hallam_(UK_Parliament_constit
uency)> nesta:candidate ?author_uri"}
University of Sheffield, NLP
What do the different parties talk about?
Conservative vs Labour
Transport, Europe and employment are
mentioned more by Conservatives
NHS is mentioned more by Labour
University of Sheffield, NLP
Topics mentioned by UKIP
University of Sheffield, NLP
Topics mentioned by the Green Party
Expected topics are high on the list
University of Sheffield, NLP
How old are MPs?
University of Sheffield, NLP
Where did people tweet about the economy?
University of Sheffield, NLP
Measuring engagement about climate change
● We also used the tools to measure how engaged both MPs
and the public are about the topic of climate change
● Comparison of climate change with other political topics
● Theory is that people are quite apathetic about most political
topics in general
● But people are more enthusiastic about climate change,
because it's something they can actively do something about
● Results showed that climate change is not frequently tweeted
about by most politicians apart from the Green Party, but is in
top 3 topics for most of the engagement criteria we applied
(retweets, replies, sentiment, optimism, @mentions, URLs)
● Climate change tweets contained the highest number of URLs
- direct engagement with additional information
University of Sheffield, NLP
Average retweets per tweet
University of Sheffield, NLP
Terms that co-occur with environmental topic
University of Sheffield, NLP
Terms that co-occur with environment topic
University of Sheffield, NLP
Terms that co-occur with immigration topic
University of Sheffield, NLP
Climate change tweets often express sentiment
University of Sheffield, NLP
Summary
● Once you have the indexed data, you can carry on doing all
kinds of interesting comparisons and analysis.
● Simple analysis tools can give you pretty pictures, but you can
do much more interesting things when you delve a bit deeper
and make use of information not explicit in the text
● For this you need both NLP and Linked Open Data
● Our tools are all freely available and open source

Contenu connexe

Tendances

GATE : General Architecture for Text Engineering
GATE : General Architecture for Text EngineeringGATE : General Architecture for Text Engineering
GATE : General Architecture for Text EngineeringAhmed Magdy Ezzeldin, MSc.
 
Practical sentiment analysis
Practical sentiment analysisPractical sentiment analysis
Practical sentiment analysisDiana Maynard
 
Filth and lies: analysing social media
Filth and lies: analysing social mediaFilth and lies: analysing social media
Filth and lies: analysing social mediaDiana Maynard
 
Do we really know what people mean when they tweet?
Do we really know what people mean when they tweet?Do we really know what people mean when they tweet?
Do we really know what people mean when they tweet?Diana Maynard
 
Social media & sentiment analysis splunk conf2012
Social media & sentiment analysis   splunk conf2012Social media & sentiment analysis   splunk conf2012
Social media & sentiment analysis splunk conf2012Michael Wilde
 
Practical Natural Language Processing
Practical Natural Language ProcessingPractical Natural Language Processing
Practical Natural Language ProcessingJaganadh Gopinadhan
 
What do you really mean when you tweet? Challenges for opinion mining on soci...
What do you really mean when you tweet? Challenges for opinion mining on soci...What do you really mean when you tweet? Challenges for opinion mining on soci...
What do you really mean when you tweet? Challenges for opinion mining on soci...Diana Maynard
 
The language of social media
The language of social mediaThe language of social media
The language of social mediaDiana Maynard
 
Using language to save the world: interactions between society, behaviour and...
Using language to save the world: interactions between society, behaviour and...Using language to save the world: interactions between society, behaviour and...
Using language to save the world: interactions between society, behaviour and...Diana Maynard
 
Using Data Science for Social Good: Fighting Human Trafficking
Using Data Science for Social Good: Fighting Human TraffickingUsing Data Science for Social Good: Fighting Human Trafficking
Using Data Science for Social Good: Fighting Human TraffickingAnidata
 
Artificial Assistants: How can I help you? by Christopher Currin
Artificial Assistants: How can I help you? by Christopher CurrinArtificial Assistants: How can I help you? by Christopher Currin
Artificial Assistants: How can I help you? by Christopher CurrinChristopher Currin
 
Dark Data and Improving Human Rights in Fulton County
Dark Data and Improving Human Rights in Fulton CountyDark Data and Improving Human Rights in Fulton County
Dark Data and Improving Human Rights in Fulton CountyAnidata
 
2017 ncu experience sharing
2017 ncu experience sharing2017 ncu experience sharing
2017 ncu experience sharingYi-Shin Chen
 
Semantic technology: The tourists’ voice comes alive.
Semantic technology: The tourists’ voice comes alive.Semantic technology: The tourists’ voice comes alive.
Semantic technology: The tourists’ voice comes alive.Luisa Mich
 
Technical_Trends_Role_Machine_Translation_march15
Technical_Trends_Role_Machine_Translation_march15Technical_Trends_Role_Machine_Translation_march15
Technical_Trends_Role_Machine_Translation_march15Hardik Gohel
 
Algorithms for the thematic analysis of twitter datasets
Algorithms for the thematic analysis of twitter datasetsAlgorithms for the thematic analysis of twitter datasets
Algorithms for the thematic analysis of twitter datasetsaneeshabakharia
 
Twitter provides a selfie of envolving language
Twitter provides a selfie of envolving languageTwitter provides a selfie of envolving language
Twitter provides a selfie of envolving languageTERMCAT
 
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.Lifeng (Aaron) Han
 
Chinese Character Decomposition for Neural MT with Multi-Word Expressions
Chinese Character Decomposition for  Neural MT with Multi-Word ExpressionsChinese Character Decomposition for  Neural MT with Multi-Word Expressions
Chinese Character Decomposition for Neural MT with Multi-Word ExpressionsLifeng (Aaron) Han
 
Communities of interpretation2014
Communities of interpretation2014Communities of interpretation2014
Communities of interpretation2014John Griffiths
 

Tendances (20)

GATE : General Architecture for Text Engineering
GATE : General Architecture for Text EngineeringGATE : General Architecture for Text Engineering
GATE : General Architecture for Text Engineering
 
Practical sentiment analysis
Practical sentiment analysisPractical sentiment analysis
Practical sentiment analysis
 
Filth and lies: analysing social media
Filth and lies: analysing social mediaFilth and lies: analysing social media
Filth and lies: analysing social media
 
Do we really know what people mean when they tweet?
Do we really know what people mean when they tweet?Do we really know what people mean when they tweet?
Do we really know what people mean when they tweet?
 
Social media & sentiment analysis splunk conf2012
Social media & sentiment analysis   splunk conf2012Social media & sentiment analysis   splunk conf2012
Social media & sentiment analysis splunk conf2012
 
Practical Natural Language Processing
Practical Natural Language ProcessingPractical Natural Language Processing
Practical Natural Language Processing
 
What do you really mean when you tweet? Challenges for opinion mining on soci...
What do you really mean when you tweet? Challenges for opinion mining on soci...What do you really mean when you tweet? Challenges for opinion mining on soci...
What do you really mean when you tweet? Challenges for opinion mining on soci...
 
The language of social media
The language of social mediaThe language of social media
The language of social media
 
Using language to save the world: interactions between society, behaviour and...
Using language to save the world: interactions between society, behaviour and...Using language to save the world: interactions between society, behaviour and...
Using language to save the world: interactions between society, behaviour and...
 
Using Data Science for Social Good: Fighting Human Trafficking
Using Data Science for Social Good: Fighting Human TraffickingUsing Data Science for Social Good: Fighting Human Trafficking
Using Data Science for Social Good: Fighting Human Trafficking
 
Artificial Assistants: How can I help you? by Christopher Currin
Artificial Assistants: How can I help you? by Christopher CurrinArtificial Assistants: How can I help you? by Christopher Currin
Artificial Assistants: How can I help you? by Christopher Currin
 
Dark Data and Improving Human Rights in Fulton County
Dark Data and Improving Human Rights in Fulton CountyDark Data and Improving Human Rights in Fulton County
Dark Data and Improving Human Rights in Fulton County
 
2017 ncu experience sharing
2017 ncu experience sharing2017 ncu experience sharing
2017 ncu experience sharing
 
Semantic technology: The tourists’ voice comes alive.
Semantic technology: The tourists’ voice comes alive.Semantic technology: The tourists’ voice comes alive.
Semantic technology: The tourists’ voice comes alive.
 
Technical_Trends_Role_Machine_Translation_march15
Technical_Trends_Role_Machine_Translation_march15Technical_Trends_Role_Machine_Translation_march15
Technical_Trends_Role_Machine_Translation_march15
 
Algorithms for the thematic analysis of twitter datasets
Algorithms for the thematic analysis of twitter datasetsAlgorithms for the thematic analysis of twitter datasets
Algorithms for the thematic analysis of twitter datasets
 
Twitter provides a selfie of envolving language
Twitter provides a selfie of envolving languageTwitter provides a selfie of envolving language
Twitter provides a selfie of envolving language
 
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
 
Chinese Character Decomposition for Neural MT with Multi-Word Expressions
Chinese Character Decomposition for  Neural MT with Multi-Word ExpressionsChinese Character Decomposition for  Neural MT with Multi-Word Expressions
Chinese Character Decomposition for Neural MT with Multi-Word Expressions
 
Communities of interpretation2014
Communities of interpretation2014Communities of interpretation2014
Communities of interpretation2014
 

Similaire à Tools for (Almost) Real-Time Social Media Analysis

Arcomem training entities-and-events_advanced
Arcomem training entities-and-events_advancedArcomem training entities-and-events_advanced
Arcomem training entities-and-events_advancedarcomem
 
Twitter data analysis using R
Twitter data analysis using RTwitter data analysis using R
Twitter data analysis using Rsantoshi mangalgi
 
Efficient named entity annotation through pre-empting
Efficient named entity annotation through pre-emptingEfficient named entity annotation through pre-empting
Efficient named entity annotation through pre-emptingLeon Derczynski
 
Press Kit -LiMoSINe Project
Press Kit -LiMoSINe ProjectPress Kit -LiMoSINe Project
Press Kit -LiMoSINe ProjectLiMoSINe Project
 
Text analysis-semantic-search
Text analysis-semantic-searchText analysis-semantic-search
Text analysis-semantic-searchDiana Maynard
 
Arcomem training opinions_advanced
Arcomem training opinions_advancedArcomem training opinions_advanced
Arcomem training opinions_advancedarcomem
 
IRJET - Twitter Sentiment Analysis using Machine Learning
IRJET -  	  Twitter Sentiment Analysis using Machine LearningIRJET -  	  Twitter Sentiment Analysis using Machine Learning
IRJET - Twitter Sentiment Analysis using Machine LearningIRJET Journal
 
Integrate Metadata Creation into Oral History Production Process, Xiaoli Ma
Integrate Metadata Creation into Oral History Production Process, Xiaoli MaIntegrate Metadata Creation into Oral History Production Process, Xiaoli Ma
Integrate Metadata Creation into Oral History Production Process, Xiaoli MaVisual Resources Association
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSumit Raj
 
AI生成工具的新衝擊 - MS Bing & Google Bard 能否挑戰ChatGPT-4領導地位
AI生成工具的新衝擊 - MS Bing & Google Bard 能否挑戰ChatGPT-4領導地位AI生成工具的新衝擊 - MS Bing & Google Bard 能否挑戰ChatGPT-4領導地位
AI生成工具的新衝擊 - MS Bing & Google Bard 能否挑戰ChatGPT-4領導地位eLearning Consortium 電子學習聯盟
 
Social Phrases Having Impact in Altmetrics - SOPHIA
Social Phrases Having Impact in Altmetrics - SOPHIASocial Phrases Having Impact in Altmetrics - SOPHIA
Social Phrases Having Impact in Altmetrics - SOPHIAInsight_Altmetrics
 
NATURAL LANGUAGE PROCESSING.pptx
NATURAL LANGUAGE PROCESSING.pptxNATURAL LANGUAGE PROCESSING.pptx
NATURAL LANGUAGE PROCESSING.pptxsaivinay93
 
Natural Language Processing Use Cases for Business Optimization
Natural Language Processing Use Cases for Business OptimizationNatural Language Processing Use Cases for Business Optimization
Natural Language Processing Use Cases for Business OptimizationTakayuki Yamazaki
 
slides_ZU_Text_mining_final (MEDIUM).pdf
slides_ZU_Text_mining_final (MEDIUM).pdfslides_ZU_Text_mining_final (MEDIUM).pdf
slides_ZU_Text_mining_final (MEDIUM).pdfPetr Korab
 
Forum Tal 2014: Celi company presentation
Forum Tal 2014: Celi company presentationForum Tal 2014: Celi company presentation
Forum Tal 2014: Celi company presentationCELI
 
Impact the UX of Your Website with Contextual Inquiry
Impact the UX of Your Website with Contextual InquiryImpact the UX of Your Website with Contextual Inquiry
Impact the UX of Your Website with Contextual InquiryRachel Vacek
 
Data Science - Experiments
Data Science - ExperimentsData Science - Experiments
Data Science - ExperimentsGaurav Marwaha
 
Changing a workshop from physical to online delivery
Changing a workshop from physical to online deliveryChanging a workshop from physical to online delivery
Changing a workshop from physical to online deliverynortherncollaboration
 
NAACL Tutorial
Social Media Predictive Analytics
NAACL Tutorial
Social Media Predictive AnalyticsNAACL Tutorial
Social Media Predictive Analytics
NAACL Tutorial
Social Media Predictive Analyticsshengjing 孙胜晶
 

Similaire à Tools for (Almost) Real-Time Social Media Analysis (20)

Arcomem training entities-and-events_advanced
Arcomem training entities-and-events_advancedArcomem training entities-and-events_advanced
Arcomem training entities-and-events_advanced
 
Entity Search Engine
Entity Search Engine Entity Search Engine
Entity Search Engine
 
Twitter data analysis using R
Twitter data analysis using RTwitter data analysis using R
Twitter data analysis using R
 
Efficient named entity annotation through pre-empting
Efficient named entity annotation through pre-emptingEfficient named entity annotation through pre-empting
Efficient named entity annotation through pre-empting
 
Press Kit -LiMoSINe Project
Press Kit -LiMoSINe ProjectPress Kit -LiMoSINe Project
Press Kit -LiMoSINe Project
 
Text analysis-semantic-search
Text analysis-semantic-searchText analysis-semantic-search
Text analysis-semantic-search
 
Arcomem training opinions_advanced
Arcomem training opinions_advancedArcomem training opinions_advanced
Arcomem training opinions_advanced
 
IRJET - Twitter Sentiment Analysis using Machine Learning
IRJET -  	  Twitter Sentiment Analysis using Machine LearningIRJET -  	  Twitter Sentiment Analysis using Machine Learning
IRJET - Twitter Sentiment Analysis using Machine Learning
 
Integrate Metadata Creation into Oral History Production Process, Xiaoli Ma
Integrate Metadata Creation into Oral History Production Process, Xiaoli MaIntegrate Metadata Creation into Oral History Production Process, Xiaoli Ma
Integrate Metadata Creation into Oral History Production Process, Xiaoli Ma
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter Data
 
AI生成工具的新衝擊 - MS Bing & Google Bard 能否挑戰ChatGPT-4領導地位
AI生成工具的新衝擊 - MS Bing & Google Bard 能否挑戰ChatGPT-4領導地位AI生成工具的新衝擊 - MS Bing & Google Bard 能否挑戰ChatGPT-4領導地位
AI生成工具的新衝擊 - MS Bing & Google Bard 能否挑戰ChatGPT-4領導地位
 
Social Phrases Having Impact in Altmetrics - SOPHIA
Social Phrases Having Impact in Altmetrics - SOPHIASocial Phrases Having Impact in Altmetrics - SOPHIA
Social Phrases Having Impact in Altmetrics - SOPHIA
 
NATURAL LANGUAGE PROCESSING.pptx
NATURAL LANGUAGE PROCESSING.pptxNATURAL LANGUAGE PROCESSING.pptx
NATURAL LANGUAGE PROCESSING.pptx
 
Natural Language Processing Use Cases for Business Optimization
Natural Language Processing Use Cases for Business OptimizationNatural Language Processing Use Cases for Business Optimization
Natural Language Processing Use Cases for Business Optimization
 
slides_ZU_Text_mining_final (MEDIUM).pdf
slides_ZU_Text_mining_final (MEDIUM).pdfslides_ZU_Text_mining_final (MEDIUM).pdf
slides_ZU_Text_mining_final (MEDIUM).pdf
 
Forum Tal 2014: Celi company presentation
Forum Tal 2014: Celi company presentationForum Tal 2014: Celi company presentation
Forum Tal 2014: Celi company presentation
 
Impact the UX of Your Website with Contextual Inquiry
Impact the UX of Your Website with Contextual InquiryImpact the UX of Your Website with Contextual Inquiry
Impact the UX of Your Website with Contextual Inquiry
 
Data Science - Experiments
Data Science - ExperimentsData Science - Experiments
Data Science - Experiments
 
Changing a workshop from physical to online delivery
Changing a workshop from physical to online deliveryChanging a workshop from physical to online delivery
Changing a workshop from physical to online delivery
 
NAACL Tutorial
Social Media Predictive Analytics
NAACL Tutorial
Social Media Predictive AnalyticsNAACL Tutorial
Social Media Predictive Analytics
NAACL Tutorial
Social Media Predictive Analytics
 

Plus de Diana Maynard

Adding value to NLP: a little semantics goes a long way
Adding value to NLP: a little semantics goes a long wayAdding value to NLP: a little semantics goes a long way
Adding value to NLP: a little semantics goes a long wayDiana Maynard
 
Methodological possibilities for strengthening the monitoring of SDG indicato...
Methodological possibilities for strengthening the monitoring of SDG indicato...Methodological possibilities for strengthening the monitoring of SDG indicato...
Methodological possibilities for strengthening the monitoring of SDG indicato...Diana Maynard
 
Getting the-most-out-of-conferences
Getting the-most-out-of-conferencesGetting the-most-out-of-conferences
Getting the-most-out-of-conferencesDiana Maynard
 
Understanding the world with NLP: interactions between society, behaviour and...
Understanding the world with NLP: interactions between society, behaviour and...Understanding the world with NLP: interactions between society, behaviour and...
Understanding the world with NLP: interactions between society, behaviour and...Diana Maynard
 
Adapting NLP tools to diverse data: challenges and solutions
Adapting NLP tools to diverse data: challenges and solutionsAdapting NLP tools to diverse data: challenges and solutions
Adapting NLP tools to diverse data: challenges and solutionsDiana Maynard
 
Ontologies as bridges between data sources and user queries: the KNOWMAK proj...
Ontologies as bridges between data sources and user queries: the KNOWMAK proj...Ontologies as bridges between data sources and user queries: the KNOWMAK proj...
Ontologies as bridges between data sources and user queries: the KNOWMAK proj...Diana Maynard
 
20 Years of Text Mining Applications with GATE: from Donald Trump to curing c...
20 Years of Text Mining Applications with GATE: from Donald Trump to curing c...20 Years of Text Mining Applications with GATE: from Donald Trump to curing c...
20 Years of Text Mining Applications with GATE: from Donald Trump to curing c...Diana Maynard
 
Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?Diana Maynard
 
Disability and Adventure Travel: the Double-Edged Sword
Disability and Adventure Travel: the Double-Edged SwordDisability and Adventure Travel: the Double-Edged Sword
Disability and Adventure Travel: the Double-Edged SwordDiana Maynard
 
Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...
Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...
Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...Diana Maynard
 
Multimodal opinion mining from social media
Multimodal opinion mining from social mediaMultimodal opinion mining from social media
Multimodal opinion mining from social mediaDiana Maynard
 
Practical Opinion Mining for Social Media
Practical Opinion Mining for Social MediaPractical Opinion Mining for Social Media
Practical Opinion Mining for Social MediaDiana Maynard
 
A tailor-made one-size-fits-all approach to sentiment analysis
A tailor-made one-size-fits-all approach to sentiment analysisA tailor-made one-size-fits-all approach to sentiment analysis
A tailor-made one-size-fits-all approach to sentiment analysisDiana Maynard
 
Opinion mining for social media
Opinion mining for social mediaOpinion mining for social media
Opinion mining for social mediaDiana Maynard
 

Plus de Diana Maynard (14)

Adding value to NLP: a little semantics goes a long way
Adding value to NLP: a little semantics goes a long wayAdding value to NLP: a little semantics goes a long way
Adding value to NLP: a little semantics goes a long way
 
Methodological possibilities for strengthening the monitoring of SDG indicato...
Methodological possibilities for strengthening the monitoring of SDG indicato...Methodological possibilities for strengthening the monitoring of SDG indicato...
Methodological possibilities for strengthening the monitoring of SDG indicato...
 
Getting the-most-out-of-conferences
Getting the-most-out-of-conferencesGetting the-most-out-of-conferences
Getting the-most-out-of-conferences
 
Understanding the world with NLP: interactions between society, behaviour and...
Understanding the world with NLP: interactions between society, behaviour and...Understanding the world with NLP: interactions between society, behaviour and...
Understanding the world with NLP: interactions between society, behaviour and...
 
Adapting NLP tools to diverse data: challenges and solutions
Adapting NLP tools to diverse data: challenges and solutionsAdapting NLP tools to diverse data: challenges and solutions
Adapting NLP tools to diverse data: challenges and solutions
 
Ontologies as bridges between data sources and user queries: the KNOWMAK proj...
Ontologies as bridges between data sources and user queries: the KNOWMAK proj...Ontologies as bridges between data sources and user queries: the KNOWMAK proj...
Ontologies as bridges between data sources and user queries: the KNOWMAK proj...
 
20 Years of Text Mining Applications with GATE: from Donald Trump to curing c...
20 Years of Text Mining Applications with GATE: from Donald Trump to curing c...20 Years of Text Mining Applications with GATE: from Donald Trump to curing c...
20 Years of Text Mining Applications with GATE: from Donald Trump to curing c...
 
Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?
 
Disability and Adventure Travel: the Double-Edged Sword
Disability and Adventure Travel: the Double-Edged SwordDisability and Adventure Travel: the Double-Edged Sword
Disability and Adventure Travel: the Double-Edged Sword
 
Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...
Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...
Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...
 
Multimodal opinion mining from social media
Multimodal opinion mining from social mediaMultimodal opinion mining from social media
Multimodal opinion mining from social media
 
Practical Opinion Mining for Social Media
Practical Opinion Mining for Social MediaPractical Opinion Mining for Social Media
Practical Opinion Mining for Social Media
 
A tailor-made one-size-fits-all approach to sentiment analysis
A tailor-made one-size-fits-all approach to sentiment analysisA tailor-made one-size-fits-all approach to sentiment analysis
A tailor-made one-size-fits-all approach to sentiment analysis
 
Opinion mining for social media
Opinion mining for social mediaOpinion mining for social media
Opinion mining for social media
 

Dernier

HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 

Dernier (20)

HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 

Tools for (Almost) Real-Time Social Media Analysis

  • 1. University of Sheffield, NLP Tools for (Almost) Real-Time Social Media Analysis Dr. Diana Maynard Dept of Computer Science University of Sheffield, UK 19 March 2015, Vienna
  • 2. University of Sheffield, NLP We are all connected to each other... ● Information, thoughts and opinions are shared prolifically on the social web these days ● 72% of online adults use social networking sites
  • 3. University of Sheffield, NLP Your grandmother is three times as likely to use a social networking site now as in 2009
  • 4. University of Sheffield, NLP There are hundreds of tools for social media analytics ● Most of them are commercial and not freely available ● The research tools tend to focus on specific topics and scenarios, and aren't easily adaptable ● The analysis they do often doesn't go much beyond number crunching, e.g. – look at number of tweets, retweets, favourites – filter by hashtag or keyword for topic categorisation – use off-the-shelf sentiment tools – use counts of word length, POS categories etc – very little semantics, don't deal with variation, ambiguity, slang, sarcasm etc.
  • 5. University of Sheffield, NLP Analysing Social Media is harder than it sounds There are lots of things to think about!
  • 6. University of Sheffield, NLP Analysing language in social media is hard ● Grundman:politics makes #climatechange scientific issue,people don’t like knowitall rational voice tellin em wat 2do ● @adambation Try reading this article , it looks like it would be really helpful and not obvious at all. http://t.co/mo3vODoX ● Want to solve the problem of #ClimateChange? Just #vote for a #politician! Poof! Problem gone! #sarcasm #TVP #99% ● Human Caused #ClimateChange is a Monumental Scam! http://www.youtube.com/watch?v=LiX792kNQeE … F**k yes!! Lying to us like MOFO's Tax The Air We Breath! F**k Them!
  • 7. University of Sheffield, NLP Let's search for keywords like “Arctic” Oops!
  • 8. University of Sheffield, NLP Seems like we need something to help! How about NLP?
  • 9. University of Sheffield, NLP It is difficult to access unstructured information efficiently Information extraction tools can help you: ● Save time and money on management of text and data from multiple sources ● Find hidden links scattered across huge volumes of diverse information ● Integrate structured data from variety of sources ● Interlink text and data ● Collect information and extract new facts
  • 10. University of Sheffield, NLP What is Entity Recognition? ● Entity Recognition is about recogising and classifying key Named Entities and terms in the text ● A Named Entity is a Person, Location, Organisation, Date etc. ● A term is a key concept or phrase that is representative of the text ● Entities and terms may be described in different ways but refer to the same thing. We call this co-reference. Mitt Romney, the favorite to win the Republican nomination for president in 2012 DatePerson Term The GOP tweeted that they had knocked on 75,000 doors in Ohio the day prior. Organisation co-reference Location
  • 11. University of Sheffield, NLP What is Event Recognition? ● An event is an action or situation relevant to the domain expressed by some relation between entities or terms. ● It is always grounded in time, e.g. the performance of a band, an election, the death of a person Mitt Romney, the favorite to win the Republican nomination for president in 2012 Event DatePerson Relation Relation
  • 12. University of Sheffield, NLP Why are Entities and Events Useful? ● They can help answer the “Big 5” journalism questions (who, what, when, where, why) ● They can be used to categorise the texts in different ways – look at all texts about Obama. ● They can be used as targets for opinion mining – find out what people think about President Obama ● When linked to an ontology and/or combined with other information, they can be used for reasoning about things not explicit in the text – seeing how opinions about different American presidents have changed over the years
  • 13. University of Sheffield, NLP Approaches to Information Extraction Knowledge Engineering  rule based  developed by experienced language engineers  make use of human intuition  easier to understand results  development could be very time consuming  some changes may be hard to accommodate Learning Systems  use statistics or other machine learning  developers do not need LE expertise  requires large amounts of annotated training data  some changes may require re-annotation of the entire training corpus
  • 14. University of Sheffield, NLP Seems like we need a tool to do this clever stuff for us. How about GATE?
  • 15. University of Sheffield, NLP What is GATE? GATE is an NLP toolkit developed at the University of Sheffield over the last 20 years. It's open source and freely available. http://gate.ac.uk • components for language processing, e.g. parsers, machine learning tools, stemmers, IR tools, IE components for various languages... • tools for visualising and manipulating text, annotations, ontologies, parse trees, etc. • various information extraction tools • evaluation and benchmarking tools
  • 16. University of Sheffield, NLP GATE components ● Language Resources (LRs), e.g. lexicons, corpora, ontologies ● Processing Resources (PRs), e.g. parsers, generators, taggers ● Visual Resources (VRs), i.e. visualisation and editing components ● Algorithms are separated from the data, which means: – the two can be developed independently by users with different expertise. – alternative resources of one type can be used without affecting the other, e.g. a different visual resource can be used with the same language resource
  • 17. University of Sheffield, NLP ANNIE • ANNIE is GATE's rule-based IE system • It uses the language engineering approach (though we also have tools in GATE for ML) • Distributed as part of GATE • Uses a finite-state pattern-action rule language, JAPE • ANNIE contains a reusable and easily extendable set of components: – generic preprocessing components for tokenisation, sentence splitting etc – components for performing NE on general open domain text
  • 18. University of Sheffield, NLP ANNIE Modules
  • 19. University of Sheffield, NLP 19 Document with Tokens
  • 20. University of Sheffield, NLP 20 Document with Sentences
  • 21. University of Sheffield, NLP Gazetteer editor definition file entries entries for selected list
  • 22. University of Sheffield, NLP Named Entity Grammars • Hand-coded rules written in JAPE applied to annotations to identify NEs • Phases run sequentially and constitute a cascade of FSTs over annotations • Annotations from format analysis, tokeniser. splitter, POS tagger, morphological analysis, gazetteer etc. • Because phases are sequential, annotations can be built up over a period of phases, as new information is gleaned • Standard named entities: persons, locations, organisations, dates, addresses, money • Basic NE grammars can be adapted for new applications, domains and languages
  • 23. University of Sheffield, NLP Document with NEs
  • 24. University of Sheffield, NLP Coreference
  • 25. University of Sheffield, NLP Right, so we have a technology, and we have a tool to apply the technology. Now how do we do it?
  • 26. University of Sheffield, NLP Framework ● Data collection (via Twitter streaming API) ● Documents stored as JSON and processed (annotated) via GCP ● Documents indexed via MIMIR ● Search and visualisation via MIMIR/Prospector
  • 27. University of Sheffield, NLP Live streaming (coming soon) ● If we want to process the tweets in real time, we can use the Twitter streaming client to feed the incoming tweets to a message queue. ● A separate process then reads messages from the queue, annotates them and pushes them into Mimir. ● If the rate of incoming tweets exceeds the capacity of the processing side, we can simply launch more instances of the message consumer across different machines to scale the capacity. ● Query and visualisation can then be performed as before on whatever data we currently have available
  • 28. University of Sheffield, NLP Let's look at some examples (For anyone who grew up in the UK): “Here's one I prepared earlier”
  • 29. University of Sheffield, NLP DecarboNet project: what do people think about climate change? And how much do we really know about it? How do we know what's really true? “It's cold in my flat“
  • 30. University of Sheffield, NLP Political Futures Tracker Application ● Example of using the technology on a real scenario - analysing political tweets in the run-up to the UK elections ● Project funded by Nesta http://www.nesta.org.uk/ ● Series of blog posts about the project, leading up to the election, see e.g. http://www.nesta.org.uk/blog/silver-surfers-and-westminster- twitterati
  • 31. University of Sheffield, NLP Twitter collection ● collected Tweets using Twitter's “statuses/filter” streaming API ● can follow up to 5000 user IDs and receive in real time ● collected all tweets and retweets posted by these users ● also retweets of, and replies to, any tweet posted by these users
  • 32. University of Sheffield, NLP Twitter collection (2) ● Initial list of 506 UK MPs' Twitter accounts, extracted from a CSV file made available by BBC News Labs and cleaned ● Also added list of UK election candidates collected and made available at https://yournextmp.com, and updated periodically – 1,504 on 13th January 2015 – 1,811 on 2nd February 2015 ● Added 21 official party accounts ● Total number of accounts followed at 16th Feb: 1,894 – 444 MPs standing again are included in both the MP and candidate lists
  • 33. University of Sheffield, NLP Tweets per hour collected Government U-turn on fracking Douglas Carswell's accidental “Hello Kitty” tweet
  • 34. University of Sheffield, NLP Longer web documents ● Also crawled websites of UK political parties (Con, Lab, LD, Green, UKIP, BNP, SNP, PC, plus the NI parties and various smaller parties) ● Initial crawl on 28th -29th October retrieved 375MB (compressed) ● Re-crawled regularly to pick up new pages
  • 35. University of Sheffield, NLP Politician and candidate annotation ● Acquired and corrected a list of UK MPs and election candidates and their affiliations, twitter accounts and DBpedia URIs ● Converted to gazetteers so that MPs in various forms (name or twitter handle) can be recognised in tweets and annotated with the relevant info (URI, full name, constituency etc.)
  • 36. University of Sheffield, NLP Recognition of MPs / Candidates
  • 37. University of Sheffield, NLP Topic Recognition ● A set of themes was taken from the categories used on http://www.gov.uk ● For each theme, a gazetteer list was developed containing typical keywords and phrases representative of that theme ● e.g. “asylum seeker” indicates the topic “borders and immigration” ● Each list was expanded via: – automatic term recognition (based on tf.idf) over a corpus of manifestos and other political documents – manual additions ● Each list also contains potential head terms and modifiers which can be expanded into longer terms on the fly from the text during analysis stage ● e.g. “terrorist” can modify many other words (terrorist attack, terrorst threat, ...)
  • 38. University of Sheffield, NLP Topic recognition This term is found by first recognising the head word “job” from a list under the theme “employment” and matching against its root form in the text, i.e. “job”. It is then extended to include the adjectival modifier “British”, which is not present in a list anywhere.
  • 39. University of Sheffield, NLP Sentiment annotation ● Annotations are created over the whole sentence and contain the following features: – sentiment_kind: optimism / pessimism – holder: the person holding the opinion (MP's name) – holder_URI: the URI fo the holder – target: the target of the opinion, e.g. MP or topic – target_URI: if appropriate, the URI of the target – score: a positive/negative value reflecting the strength of opinion – sarcasm: yes / no (whether sarcasm is present) – sentiment_string: the main word(s) that contain sentiment ● These annotations and features will be used as input to MIMIR to facilitate analysis/aggregation
  • 40. University of Sheffield, NLP Positive opinion about science and innovation
  • 41. University of Sheffield, NLP GATE Mímir: Answering Questions Google Can't
  • 42. University of Sheffield, NLP GATE Mimir ● can be used to index and search over text, annotations, semantic metadata (concepts and instances) ● allows queries that arbitrarily mix full-text, structural, linguistic and semantic annotations ● is open source
  • 43. University of Sheffield, NLP Show me: ● all documents mentioning a temperature between 30 and 90 degrees F (expressed in any unit) ● all abstracts written in French on Patent Documents from the last 6 months which mention any form of the word “transistor” in the English abstract ● the names of the patent inventors of those abstracts ● all documents mentioning steel industries in the UK, along with their location
  • 44. University of Sheffield, NLP Search news articles for politicians born in Sheffield
  • 45. University of Sheffield, NLP Document Indexing with MIMIR ● MIMIR allows for indexing and querying text, annotations and semantic knowledge – this gives a rich source of data for analysis ● Currently we have used MIMIR to index – the raw collected text – annotations provided by Twitter and by the applications
  • 46. University of Sheffield, NLP Examples of Mimir queries on our corpus ● Get all documents which talk about the borders/immigration topic {Topic theme = "borders_and_immigration"} ● Get all documents where the author of the document is a candidate {DocumentAuthor sparql = "?c nesta:candidate ?author_uri"} ● Get all documents where the author is an MP standing for re-election for the same seat {DocumentAuthor sparql="?c nesta:candidate ?author_uri . ?c dbp-prop:mp ? author_uri"} ● Get all documents where the author is a candidate for the Sheffield Hallam constituency {DocumentAuthor sparql="<http://dbpedia.org/resource/Sheffield_Hallam_(UK_Parliament_constit uency)> nesta:candidate ?author_uri"}
  • 47. University of Sheffield, NLP What do the different parties talk about? Conservative vs Labour Transport, Europe and employment are mentioned more by Conservatives NHS is mentioned more by Labour
  • 48. University of Sheffield, NLP Topics mentioned by UKIP
  • 49. University of Sheffield, NLP Topics mentioned by the Green Party Expected topics are high on the list
  • 50. University of Sheffield, NLP How old are MPs?
  • 51. University of Sheffield, NLP Where did people tweet about the economy?
  • 52. University of Sheffield, NLP Measuring engagement about climate change ● We also used the tools to measure how engaged both MPs and the public are about the topic of climate change ● Comparison of climate change with other political topics ● Theory is that people are quite apathetic about most political topics in general ● But people are more enthusiastic about climate change, because it's something they can actively do something about ● Results showed that climate change is not frequently tweeted about by most politicians apart from the Green Party, but is in top 3 topics for most of the engagement criteria we applied (retweets, replies, sentiment, optimism, @mentions, URLs) ● Climate change tweets contained the highest number of URLs - direct engagement with additional information
  • 53. University of Sheffield, NLP Average retweets per tweet
  • 54. University of Sheffield, NLP Terms that co-occur with environmental topic
  • 55. University of Sheffield, NLP Terms that co-occur with environment topic
  • 56. University of Sheffield, NLP Terms that co-occur with immigration topic
  • 57. University of Sheffield, NLP Climate change tweets often express sentiment
  • 58. University of Sheffield, NLP Summary ● Once you have the indexed data, you can carry on doing all kinds of interesting comparisons and analysis. ● Simple analysis tools can give you pretty pictures, but you can do much more interesting things when you delve a bit deeper and make use of information not explicit in the text ● For this you need both NLP and Linked Open Data ● Our tools are all freely available and open source