This document discusses seven problems related to human-generated content and quantitative analysis. The first concerns knowledge extraction, moving from data to information to knowledge. The second examines social spaces using data from services such as Foursquare and Flickr. The third analyzes social aspects of software development using data from GitHub and developer communities. The fourth applies machine learning to computational social science topics such as politics, debates, and societal issues. The fifth uses knowledge bases to support content understanding tasks. The sixth uses digital humanities to engage citizens in envisioning future policies. The seventh covers developing data models to support mobility services.
3. The Answer to the Great Question...
Of Life, the Universe and Everything
Data → Information (understanding relations)
Information → Knowledge (understanding patterns)
Knowledge → Wisdom (understanding principles)
Axes: context independence vs. understanding
7. Input (1): Domain Specific Types
Types selected by the expert
Relevant for the domain
8. Input (2): Seeds (emerging entities)
Known and selected by the domain expert
Belonging to an expert type
Thoroughly Described
9. Objectives
(1) Discover candidate unknown emerging entities
(2) Determine the relevance of the candidate
(3) Determine the type of the candidate
10. Building Triples (Subject-Predicate-Object)
• Relation extraction
• Subject and object extraction
• Triples composition
Example: "Beautiful #AngelinaJolie on the cover of @THR wears Amen silk top"
• Subject: #AngelinaJolie
• Relation: wear (verb)
• Object: Amen silk top
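The three steps on this slide (relation extraction, subject and object extraction, triples composition) can be illustrated with a minimal rule-based sketch. It is not the method used in the slides, which do not describe their extractor. The verb lexicon `VERB_LEMMAS`, the function `extract_triple`, and the rules "first hashtag or mention before the verb is the subject" and "everything after the verb is the object" are assumptions made purely for illustration.

```python
import re
from typing import Optional, Tuple

# Hypothetical, hand-picked lexicon mapping verb forms to a relation lemma.
VERB_LEMMAS = {"wears": "wear", "wear": "wear", "wore": "wear"}

# Hashtags and mentions are treated as candidate entity markers (an assumption).
MENTION = re.compile(r"[#@]\w+")


def extract_triple(text: str) -> Optional[Tuple[str, str, str]]:
    """Return a (subject, relation, object) triple, or None if a step fails."""
    tokens = text.split()

    # Step 1: relation extraction. Take the first token found in the lexicon.
    verb_idx = next(
        (i for i, tok in enumerate(tokens) if tok.lower() in VERB_LEMMAS), None
    )
    if verb_idx is None:
        return None
    relation = VERB_LEMMAS[tokens[verb_idx].lower()]

    # Step 2: subject and object extraction.
    # Subject: first hashtag or mention to the left of the verb.
    match = MENTION.search(" ".join(tokens[:verb_idx]))
    if match is None:
        return None
    subject = match.group(0)
    # Object: the text to the right of the verb, without trailing punctuation.
    obj = " ".join(tokens[verb_idx + 1:]).strip(" .,!?")
    if not obj:
        return None

    # Step 3: triples composition.
    return (subject, relation, obj)


if __name__ == "__main__":
    tweet = "Beautiful #AngelinaJolie on the cover of @THR wears Amen silk top"
    print(extract_triple(tweet))
```

A real pipeline would likely replace the lexicon lookup with part-of-speech tagging or dependency parsing; the sketch only makes the order of the three steps concrete.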
19. Foursquare
• Check-ins explicitly performed in venues all around the world
• Data set: geo-located Foursquare venues, collected through a query every 50 m with radius >50 m over:
• Milan area: 20 km x 17.5 km
• Some numbers
• Total n° of venues: 90K (dirty)
• Total n° of valid venues: 43K
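To make the sampling scheme concrete, the following sketch generates query centres on a 50 m grid over a 20 km x 17.5 km rectangle. The function `grid_points`, the south-west corner coordinates, and the choice of 20 km as the east-west side are assumptions; the slides give only the step, the radius bound, and the area size. No Foursquare API call is shown.

```python
import math
from typing import Iterator, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def grid_points(
    south_west: Tuple[float, float],
    width_m: float,
    height_m: float,
    step_m: float,
) -> Iterator[Tuple[float, float]]:
    """Yield (lat, lon) query centres spaced step_m apart over a rectangle."""
    lat0, lon0 = south_west
    n_rows = int(height_m // step_m) + 1  # north-south steps, both edges included
    n_cols = int(width_m // step_m) + 1   # east-west steps, both edges included
    dlat = math.degrees(step_m / EARTH_RADIUS_M)
    for row in range(n_rows):
        lat = lat0 + row * dlat
        # A degree of longitude shrinks with the cosine of the latitude.
        dlon = math.degrees(
            step_m / (EARTH_RADIUS_M * math.cos(math.radians(lat)))
        )
        for col in range(n_cols):
            yield (lat, lon0 + col * dlon)


if __name__ == "__main__":
    # Hypothetical south-west corner near Milan; not taken from the slides.
    points = list(grid_points((45.38, 9.04), 20_000, 17_500, 50))
    print(len(points))  # 401 columns x 351 rows
```

On a square grid with spacing s, circles of radius at least s/√2 (about 35.4 m for s = 50 m) cover the whole area, so a radius above 50 m leaves no gaps and makes neighbouring queries overlap.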
21. Correlation Google Place - Foursquare
Dataset  # obs      Min      1Q  Median      3Q     Max
Grid       230  -0.6406  0.0744  0.3536  0.5529  0.8796
Place      283  -0.6406  0.0654  0.3569  0.5829  0.8796
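The columns above form a five-number summary (minimum, first quartile, median, third quartile, maximum) of a set of correlation coefficients. The sketch below shows how such a summary could be computed. The function names and the toy values are assumptions, and linear interpolation between order statistics is only one of several quartile conventions; the slides do not say which one was used.

```python
import math
from typing import Dict, Sequence


def quantile(sorted_vals: Sequence[float], q: float) -> float:
    """Quantile by linear interpolation between neighbouring order statistics."""
    if not sorted_vals:
        raise ValueError("quantile of an empty sequence")
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def five_number_summary(values: Sequence[float]) -> Dict[str, float]:
    """Return the min, quartiles, and max of a collection of values."""
    ordered = sorted(values)
    return {
        "Min": quantile(ordered, 0.0),
        "1Q": quantile(ordered, 0.25),
        "Median": quantile(ordered, 0.5),
        "3Q": quantile(ordered, 0.75),
        "Max": quantile(ordered, 1.0),
    }


if __name__ == "__main__":
    # Toy correlation values for illustration only; not data from the slides.
    toy_correlations = [0.5, -0.5, 1.0, 0.0, 0.25]
    print(five_number_summary(toy_correlations))
```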
25. Approach
• City scale: mobile telephone and (coarse-grained geo-located) social media data
• Street/square: people-counting and profiling IoT sensors
• Point of interest: people-counting sensors, WiFi log analysis, beacons and (fine-grained geo-located) social media
Descriptive, predictive, privacy-preserving and, when needed, real-time analysis of a variety of (fused) data sources
27. Collaborative activities in software development
• Development repositories
• GitHub
• Developer communities
• Interactions and contributions
• Networks (social?)
Machine learning on network data, representation learning
In collaboration with UOC, Barcelona
35. Radio Shows & Public Debates
• Real-time radio transcripts from 60 stations
• Twitter data in some US states
• Collaboration with MIT & cortico.ai
36. News and News Sharing
• Understanding how and when people share pieces of news on social networks
• Profiling users against possible risks (fake news, superficial behaviour)
38. KB and Text
• How can I use KBs for improving…
• Topic analysis
• Other general NLP tasks
• Pre-trained models available (language models, …)
• https://www.aaai-make.info/
44. Gamification
• The use of game thinking and game mechanics to engage users and solve problems
• Turning the user experience into a game can produce behavior change
45. KB and Text
• Possible futures
• The KB of science fiction
• Asimov, …
46. Alexa, Tell me a NEW Story
• NN-based approaches for generating new content
• Stories for children
• Jokes !?