Lecture 4: How can we MINE, ANALYSE & VISUALISE the Social Web? (2014)
1. Social Web 2014
Lecture IV: How can we MINE, ANALYSE & VISUALISE the Social Web? (1)
Lora Aroyo
The Network Institute
VU University Amsterdam
Social Web 2014, Lora Aroyo!
2. The Age of BIG Data
• 25 billion tweets on Twitter in 2010, by 175 million users
• 360 billion pieces of content on Facebook in 2010, by 600 million different users
• 35 hours of video uploaded to YouTube every minute
• 130 million photos uploaded to Flickr per month
5. Why?
enormous wealth of data = lots of insights
• insights in users’ daily lives and activities
• insights in history
• insights in politics
• insights in communities
• insights in trends
• insights in businesses & brands
6. Why?
enormous wealth of data = lots of insights
• who uploads/talks? (age, gender, nationality,
community, etc.)
• what are the trending topics? when?
• what else do these users like? on which platform?
• who are the most/least active users?
• ...
7. This doesn’t work
Image: http://www.co.olmsted.mn.us/prl/propertyrecords/RecordingDocuments/PublishingImages/forms.jpg
38. The Rise of the Data Scientist
Data Geeks Skills:
Statistics
Data munging
Visualisation
39. The Rise of the Data Scientist
http://radar.oreilly.com/2010/06/what-is-data-science.html
40. Data Science
• Data Science enables the creation of data products
• Data products are applications that acquire their
value from the data, and create more data as a result.
• Users are in a feedback loop: they constantly provide
information about the products they use, which gets
used in the data product.
43. Popular Data Products
Data Science is about
building products
not just answering questions
44. Popular Data Products
• empower others to do their own analysis
• empower others to use the data
45. Data Mining 101
Data mining is the exploration & analysis of
large quantities of data
in order to discover valid, novel, potentially useful,
& ultimately understandable patterns in data
(Inspired by George Tziralis’ FOSS Conf’09, John Elder IV’s Salford Systems Data
Mining Conf. and Toon Calders’ slides)
Image: http://www.freefoto.com/images/33/12/33_12_7---Pebbles_web.jpg
46. Data Mining 101
Databases, Statistics, Artificial Intelligence
• Data input & exploration
• Preprocessing
• Data mining algorithms
• Evaluation & Interpretation
47. Data Input & Exploration
• What data do I need to answer question X?
• What variables are in the data?
• Basic stats of my data?
“LikeMiner”
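A minimal sketch of the "basic stats" step, assuming the data is a list of (user, page) like records. The sample records and the `basic_stats` helper are invented for illustration and are not taken from LikeMiner.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical sample: (user_id, liked_page) pairs, invented for illustration.
likes = [
    ("u1", "music"), ("u1", "sports"), ("u2", "music"),
    ("u3", "music"), ("u3", "film"), ("u3", "sports"),
]

def basic_stats(records):
    """Summarise a list of (user, item) records."""
    per_user = Counter(user for user, _ in records)
    per_item = Counter(item for _, item in records)
    counts = list(per_user.values())
    return {
        "records": len(records),          # how much data do I have?
        "users": len(per_user),           # distinct values of the user variable
        "items": len(per_item),           # distinct values of the item variable
        "mean_per_user": mean(counts),    # average activity per user
        "median_per_user": median(counts),
        "top_item": per_item.most_common(1)[0][0],
    }

stats = basic_stats(likes)
print(stats)
```

Looking at counts per user and per item first tends to reveal skew (a few very active users, a few very popular pages) before any mining starts.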
48. Preprocessing
“LikeMiner”
• Cleanup!
• Choose a suitable data model
• What happens if you integrate data from multiple sources?
• Reformat your data
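The cleanup and integration questions above can be sketched as follows. This assumes two hypothetical sources whose records use different field names; the field names, the sample records and both helpers are assumptions made for illustration.

```python
def normalise(record, source):
    """Map a raw record from one source onto a shared (user, item, source) model."""
    user = record.get("user") or record.get("username") or ""
    item = record.get("page") or record.get("url") or ""
    # Cleanup: trim whitespace and ignore case so equal values compare equal.
    return (user.strip().lower(), item.strip().lower(), source)

def integrate(sources):
    """Merge records from several sources, dropping incomplete rows and duplicates."""
    seen, merged = set(), []
    for source, records in sources.items():
        for raw in records:
            user, item, src = normalise(raw, source)
            if not user or not item:
                continue  # incomplete record
            if (user, item) in seen:
                continue  # same fact already seen, possibly from another source
            seen.add((user, item))
            merged.append((user, item, src))
    return merged

# Hypothetical raw data with inconsistent field names and formatting.
raw_sources = {
    "facebook": [{"user": " Alice ", "page": "Jazz"}, {"user": "", "page": "Rock"}],
    "delicious": [{"username": "alice", "url": "jazz"}, {"username": "Bob", "url": "Rock"}],
}
clean = integrate(raw_sources)
print(clean)
```

Integrating sources typically surfaces exactly these issues: differing schemas, inconsistent formatting, missing values and the same fact recorded twice.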
49. Data Mining Algorithms
• Classification: Generalising a known structure and applying it to new data
• Association: Finding relationships between variables
• Clustering: Discovering groups and structures in data
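As one illustration of the association task, here is a minimal sketch that computes support and confidence for pairs of items. The sample "baskets" of liked genres and the `pair_rules` helper are hypothetical; real association mining (for example Apriori) also handles larger itemsets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: the set of genres each user likes.
baskets = [
    {"jazz", "blues"},
    {"jazz", "blues", "rock"},
    {"rock", "metal"},
    {"jazz", "rock"},
]

def pair_rules(transactions, min_support=0.5):
    """Return {(a, b): (support, confidence)} for rules a -> b over item pairs."""
    n = len(transactions)
    item_count = Counter(i for t in transactions for i in t)
    pair_count = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = {}
    for (a, b), c in pair_count.items():
        support = c / n  # fraction of transactions containing both items
        if support < min_support:
            continue
        rules[(a, b)] = (support, c / item_count[a])  # confidence of a -> b
        rules[(b, a)] = (support, c / item_count[b])  # confidence of b -> a
    return rules

rules = pair_rules(baskets)
for (a, b), (support, confidence) in sorted(rules.items()):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

Note that confidence is directional: in this sample every user who likes blues also likes jazz, but not the other way round.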
50. Mining in “LikeMiner”
• Filter users by interests
• Construct user graphs
• PageRank on graphs to mine representativeness
• Result: set of influential users
• Compare page topics to user interests to find pages most representative for topics
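The PageRank step can be sketched with plain power iteration. This is a generic textbook version, not LikeMiner's actual implementation; the user graph, the damping factor of 0.85 and the iteration count are assumptions for illustration.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; graph maps each node to the nodes it links to."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the same baseline "teleport" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical user graph: an edge u -> v means u points to (e.g. follows) v.
user_graph = {"ann": ["bob"], "cat": ["bob"], "dan": ["bob", "ann"], "bob": ["ann"]}
ranks = pagerank(user_graph)
most_influential = max(ranks, key=ranks.get)
print(most_influential, ranks)
```

Users who are pointed to by many others, or by highly ranked others, end up with high scores, which is one way to read "representativeness" on such a graph.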
51. Evaluation & Interpretation
What does the pattern I found mean?
• Pitfalls:
• Meaningless Discoveries
• Implication ≠ Causality (Intensive care -> death)
• Simpson’s paradox
• Data Dredging
• Redundancy
• No New Information
• Overfitting
• Bad Experimental Setup
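Simpson's paradox is easiest to see with numbers. The sketch below uses illustrative counts (successes, trials) for two options, A and B, split into two subgroups; the counts are chosen only to demonstrate the effect.

```python
# (successes, trials) per subgroup; illustrative counts, not measured results.
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, trials):
    """Success rate within one subgroup."""
    return successes / trials

def overall(groups):
    """Success rate after pooling all subgroups together."""
    total_successes = sum(s for s, _ in groups.values())
    total_trials = sum(t for _, t in groups.values())
    return total_successes / total_trials

# A has the higher success rate inside every subgroup...
a_wins_each_group = all(
    rate(*data["A"][g]) > rate(*data["B"][g]) for g in ("small", "large")
)
# ...yet B has the higher rate once the subgroups are pooled.
b_wins_overall = overall(data["B"]) > overall(data["A"])

print(a_wins_each_group, b_wins_overall)
```

The reversal appears because the subgroups have very different sizes for A and B, so the pooled rate mostly reflects which subgroup each option was applied to.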
58. Mining Social Web Data
source: http://kunau.us/wp-content/uploads/2011/02/Screen-shot-2011-02-09at-9.03.46-PM-w600-h900.png
65. Assignment 2: Semantic Markup
• Part I: enrich/create a Web page with semantic markup
• Step 1: Mark up two different Web pages with the appropriate markup describing properties of
at least people, relationships to other people, locations, some temporally related data and
some multimedia. You can also try out tools such as Google Markup Helper.
• Step 2: Validate your semantic markup. Use an existing validator.
• Step 3: Explain why you chose particular markups. Compare the advantages and disadvantages
of the different markups. Include screenshots from validators.
• Part II: analyse another team’s Web page markup, as a consumer & as a publisher
• Step 1: Perform an evaluation and report your findings (consider findability or content extraction)
• Step 2: Support your critique with examples of how the semantic markup could be improved.
• In an introductory section explain what semantic markup is, what it is for, what it looks like, etc.
• Support your choices and explanations with appropriate literature references.
• 5 pages (excluding screen shots).
• Other group’s evaluation details in appendix.
• Deadline: 4 March 23:59
http://www.actmedia.eu/media/img/text_zones/English/small_38421.jpg
66. Final Assignment: Your SocWeb App
• Create your own Social Web app (in a group)
• Use structured data, entity relations, data analysis, visualisation
• Write individual report on one of the main aspects of your app
• Pitch your app idea before finalising: 13 March, during Hands-on
• Submit: 28 March 23:59
Image source: http://blog.compete.com/wp-content/uploads/2012/03/Like.jpg
67. Hands-on Teaser
• Build your own recommender system 101
• Recommend pages on del.icio.us
• Recommend pages to your Facebook friends
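As a preview of the idea, a recommender can be sketched as user-based collaborative filtering over bookmark sets. The bookmark data, the Jaccard similarity choice and the `recommend` helper are assumptions for illustration, not the method used in the hands-on session.

```python
def jaccard(a, b):
    """Similarity of two sets: size of the overlap divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(target, bookmarks, top_n=3):
    """Suggest pages that users similar to `target` saved and `target` has not."""
    mine = bookmarks[target]
    scores = {}
    for other, pages in bookmarks.items():
        if other == target:
            continue
        similarity = jaccard(mine, pages)
        if similarity == 0:
            continue  # nothing in common, so this user gets no vote
        for page in pages - mine:
            scores[page] = scores.get(page, 0.0) + similarity
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [page for page, _ in ranked[:top_n]]

# Hypothetical bookmark sets per user.
bookmarks = {
    "ann": {"python.org", "d3js.org"},
    "bob": {"python.org", "d3js.org", "scipy.org"},
    "cat": {"python.org", "flickr.com"},
    "dan": {"vimeo.com"},
}
suggestions = recommend("ann", bookmarks)
print(suggestions)
```

Pages saved by the most similar users rank highest; users with no overlap contribute nothing.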
Image source: http://www.flickr.com/photos/bionicteaching/1375254387/