Presentation for the weekly Artificial Intelligence meeting of the VU. It covers work on affect analysis (master's project) and planned work on ranking of Linked Data.
1. AFFECT ANALYSIS OF DUTCH SOCIAL MEDIA AND RANKING OF QUERY RESULTS OVER LINKED DATA
Laurens Rietveld
2. Master Project Background
Affect analysis of Dutch social media
Finished July 2010
VU (Stefan)
GfK Daphne
Marketing research
Not yet involved in web mining
[Diagram: Data collection, Data processing, Analysis, Online dashboard]
Business case: National Railway Company (NS)
3. Project Background
Affect Analysis
Affect: experience of feeling or emotion[1]
Multiple measurements
Physiological
Behavioral
Vocal
Linguistic
[1] W. Huitt, The Affective System
4. Project Background
Affect Analysis
What is online affect analysis?
Detect emotions on web pages
Types of emotions[2]:
Love
Joy
Surprise
Anger
Sadness
Fear
[2] W. Parrott, Emotions in Social Psychology
5. Project Background
Affect Analysis
Main problems
Unstructured data
Internet (html)
Text
Domain dependencies
“Go read the book” is positive in book reviews, negative in movie reviews
Ambivalence
Text
Emotion
6. Project Background
Dutch Social Media
Used Social Media Types:
Blogs (www.blogspot.com)
Online news item reactions (www.fok.nl)
Micro-blogs (www.twitter.com)
7. Project Background
Crowd Sourcing
Problems:
Affect analysis needs training data
Annotating data is time-consuming
Annotate every domain
Normally done by researcher
Solution: Crowd Sourcing
Mechanical Turks
Outsourcing simple tasks to large community
+ Many tasks
+ Quick
+ Cheap
- English only
- Risk of lower quality
- Unethical (debatably)
9. Research Questions
Is it possible to apply crowd-sourcing to affect analysis of Dutch social media?
Are there differences between social media types in affect analysis?
10. Results
Inter-annotator agreement: low
Neutral outvotes emotion
Possible causes:
Missing sentence context
Too few annotators
Noise introduced by translation
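A minimal sketch of how crowd-sourced votes for one sentence could be combined and how agreement could be measured. The label names, the tie-handling rule and the pairwise agreement measure are illustrative assumptions, not the procedure used in the project.

```python
from collections import Counter
from itertools import combinations


def majority_label(labels):
    """Return the most frequent label, or None if there is no unique winner."""
    counts = Counter(labels).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie between the two most frequent labels
    return counts[0][0]


def pairwise_agreement(labels):
    """Fraction of annotator pairs that chose the same label."""
    pairs = list(combinations(labels, 2))
    if not pairs:
        return 1.0
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical votes of five annotators for a single sentence.
votes = ["neutral", "anger", "neutral", "sadness", "neutral"]
winner = majority_label(votes)
agreement = pairwise_agreement(votes)
print(winner, agreement)
```

In this hypothetical case the two emotion votes are split over different labels, so "neutral" wins the vote even though pairwise agreement is low.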
11. Results
Period          Event
July 2007       Problems in the payment system of ticket machines
January 2009    Required chip card payment method for students
December 2009   Train and railway malfunctions due to snow
February 2010   Filthy train stations due to cleaning crew strikes
[Chart: All social media. Y-axis: % of all documents. X-axis: Period. Series: Joy, Surprise, Anger, Sadness]
12. Future work
Other list of emotions
Improve annotation process
More voting
Use other strategies for annotation tasks
Not sentence annotation but paragraph/document
Different social media types, different feature extraction/classifier/annotation strategies
13. AFFECT ANALYSIS OF DUTCH SOCIAL MEDIA AND RANKING OF QUERY RESULTS OVER LINKED DATA
Laurens Rietveld
16. Data2Semantics
Wicherts JM, Bakker M, Molenaar D, 2011
Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results.
PLoS ONE 6(11)
17. Data2Semantics
Provide semantic infrastructure for e-Science
How to share, publish, access, analyze, interpret and reuse data?
Querying
Ranking
Information utility
Enriched publications
Provenance
Annotation/interpretation
19. Clinical Decision Support
[Diagram labels: Linked Data; Clinical evidence (e.g. CT report); Hospital; AERS; CDS tools; Patient Profile; EMR; LIS; Elsevier-published Clinical Guideline]
20. My Research
Ranking
http://dbpedia.org/fct/ http://google.com
21. My Research
Ranking
1. Relevance
No proper 'PageRank' equivalent for semantic web
Heterogeneous and imprecise data
2. Ordering
Performance
22. Relevance
What query results are most relevant?
Semantic web comes with implicit orderings.
Possible indicators:
Which ontologies are used more often?
What can we say about these ontologies?
Which query results are semantically similar?
Which query results can I trust?
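A minimal sketch of the first indicator, counting how often each vocabulary occurs among the predicates of a result set. The namespace-splitting rule and the example triples are assumptions made for illustration; whether usage frequency is a good relevance signal is exactly the open question.

```python
from collections import Counter


def namespace(uri):
    """Cut a URI at its last '#' or '/' to approximate the vocabulary namespace."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[:cut + 1]


def vocabulary_usage(triples):
    """Count how often each predicate namespace occurs in (s, p, o) triples."""
    return Counter(namespace(p) for _, p, _ in triples)


# Hypothetical query results as (subject, predicate, object) triples.
triples = [
    ("ex:a", "http://xmlns.com/foaf/0.1/name", "Alice"),
    ("ex:a", "http://xmlns.com/foaf/0.1/knows", "ex:b"),
    ("ex:b", "http://www.w3.org/2000/01/rdf-schema#label", "Bob"),
]
usage = vocabulary_usage(triples)
print(usage.most_common())
```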
24. Ordering
Related work: Sara Magliacane
SPARQL-Rank
[Figure: traditional query plan compared with a SPARQL-Rank plan. Operators: Slice, Order, Join, RankJoin, Rank, BGP. Variables: ?product, ?producer, ?offer, ?price, ?rating, ?popularity. Edges are annotated with intermediate result counts.]
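A minimal sketch of why ordering matters for performance: sorting every result and then slicing does more work than keeping only the best k while scanning. This is not the SPARQL-Rank algorithm itself, only an illustration of the general idea; the product data and the scoring function are hypothetical.

```python
import heapq


def top_k_sort_then_slice(items, score, k):
    """Order all items first, then slice off the first k."""
    return sorted(items, key=score, reverse=True)[:k]


def top_k_heap(items, score, k):
    """Keep only the k best items while scanning; no full sort is needed."""
    return heapq.nlargest(k, items, key=score)


# Hypothetical (product, rating) results from an online store query.
products = [("p1", 3.5), ("p2", 4.8), ("p3", 2.1), ("p4", 4.2), ("p5", 3.9)]


def rating(product):
    return product[1]


full_sort_result = top_k_sort_then_slice(products, rating, 2)
heap_result = top_k_heap(products, rating, 2)
print(heap_result)
```

Both functions return the same top-k list here; the heap-based variant only avoids ordering results that can never reach the top k.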
25. Current Question
What if reasoning is required to materialize
information?
Top-k Closure (Stefan Schlobach)
Avoid full materialization while still being complete
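A minimal sketch of the baseline this question starts from: full materialization, where a rule is applied until nothing new can be derived before any ranking takes place. It is not the Top-k Closure approach; the transitive rule and the class names are assumptions chosen for illustration.

```python
def transitive_closure(pairs):
    """Naive forward chaining: apply (a, b), (b, c) -> (a, c) until a fixpoint."""
    closure = set(pairs)
    while True:
        derived = {(a, d) for a, b in closure for c, d in closure if b == c}
        if derived <= closure:
            return closure  # nothing new was derived
        closure |= derived


# Hypothetical subclass statements from an online store vocabulary.
sub_class_of = {
    ("Espresso", "Coffee"),
    ("Coffee", "Beverage"),
    ("Beverage", "Product"),
}
materialized = transitive_closure(sub_class_of)
print(sorted(materialized))
```

Three stated facts grow to six after materialization, which is the cost a top-k approach would try to avoid paying for results that never reach the top of the ranking.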
Example of materialization
Physiological: heart rate. Behavioral: by means of questionnaires. Vocal: voice pitch. Linguistic: analyzing text.
Master project based on rankings, made explicit for a certain application (affect analysis). Thesis written more from the application view. PhD research more from the data view.
Information utility: suitability of dataset in answering a query (based on complexity measures)
1795 onwards. Information on municipalities, occupations, housing, etc.
AERS: adverse event reporting system. CT: clinical trial. CDS: clinical decision support. Where are the enriched publications of Elsevier?
Mention implicit vs. explicit
Example: online store. Simplified example.
Avoid full joins
In other words: what if we need to apply rules to get all the values to rank on?