Available Data Science M.Sc. Thesis Proposals

Possible thesis topics available at the Data Science Lab at Politecnico di Milano, DEIB department.

Published in: Education

  1. Theses on Human-generated Content and Quantitative Analysis. Marco Brambilla, marco.brambilla@polimi.it, @marcobrambi
  2. Problem 1. Knowledge Extraction
  3. The Answer to the Great Question... Of Life, the Universe and Everything. [Figure: Data → Information (understanding relations) → Knowledge (understanding patterns) → Wisdom (understanding principles), plotted against understanding and context independence.]
  4. Overview
  5. Knowledge Enrichment Setting. [Figure: instances, split into high-frequency entities (HF Entity1 to HF Entity5) and low-frequency entities (LF Entity1 to LF Entity4), linked by <<instanceof>> to types (Type1, Type11, Type111, Type2). Legend: seed entity, seed type, type of interest; expert inputs vs. enrichment problems. Enrichment problems: relations between HF and LF entities, relations between LF entities, typing of LF entities, extraction of new LF entities, finding attribute values (Property1, Property2).]
  6. Emerging Knowledge Harvesting
  7. Input (1): Domain-specific types. Types selected by the expert, relevant for the domain
  8. Input (2): Seeds (emerging entities). Known and selected by the domain expert, belonging to an expert type, thoroughly described
  9. Objectives: (1) discover candidate unknown emerging entities; (2) determine the relevance of each candidate; (3) determine the type of each candidate
  10. Building Triples (Subject-Predicate-Object): relation extraction, subject and object extraction, triples composition. Example: in "Beautiful #AngelinaJolie on the cover of @THR wears Amen silk top", the subject is #AngelinaJolie, the relation (verb) is "wears" and the object is "Amen silk top".
  11. Problem 2. Social Spaces: Volume, Consumption, Presence, Flows
  12. Foursquare check-ins. Copyright © Milano-Hub project @ Politecnico di Milano
  13. Not only space…
  14. Model of social media and reality sensing
  15. Model of social media and reality sensing
  16. Model of social media and reality sensing
  17. Flickr
  18. Cities into cities, by language. http://urbanscope.polimi.it
  19. Foursquare
      • Check-ins explicitly performed in venues all around the world
      • Data set: geo-localized Foursquare venues, collected through a query every 50 m with radius >50 m over:
        • Milan area: 20 km x 17.5 km
      • Some numbers:
        • Total number of venues: 90K (dirty)
        • Total number of valid venues: 43K
  20. Google Places: some data is available only in the UI (requires scraping), some via the API
  21. Correlation between Google Places and Foursquare (summary of the correlation values):
      Dataset  # obs  Min      1Q      Median  3Q      Max
      Grid     230    -0.6406  0.0744  0.3536  0.5529  0.8796
      Place    283    -0.6406  0.0654  0.3569  0.5829  0.8796
  22. Electricity consumption data
      • Electrical hubs + mobile phone calls
      • Grid-based analysis
  23. Which locations do people visit, and from where? Statistics about nationality
  24. Event location per cluster of users
  25. Approach
      • City scale: mobile telephone and (coarse-grained geo-located) social media data
      • Street/square: IoT sensors for people counting & profiling
      • Point of interest: people counting sensors, WiFi log analysis, beacons and (fine-grained geo-located) social media
      Descriptive, predictive, privacy-preserving and, when needed, real-time analysis of a variety of (fused) data sources
  26. Problem 3. Social Aspects of Software Development
  27. Collaborative activities in software development
      • Development repositories
      • GitHub
      • Developer communities
      • Interactions and contributions
      • Networks (social?)
      Machine learning on network data, representation learning. In collaboration with UOC, Barcelona
  28. Roles of developers
  29. Collaboration networks
      • Cross-project collaborations
      • Networking
  30. Problem 4. Computational Social Science
  31. Politics, Debates and Other Societal Issues
      • Brexit
      • US Political Observer
      • Other cases
  32. US Midterm. Antonio Lopardo
  33. Brexit. Emre Calisir, now at MIT Media Lab
  34. Brexit
  35. Radio Shows & Public Debates
      • Real-time radio transcripts from 60 stations
      • Twitter data in some US states
      • Collaboration with MIT & cortico.ai
  36. News and News Sharing
      • Understanding how and when people share pieces of news on social networks
      • Profiling users against possible risks (fake news, superficial behaviour)
  37. Problem 5. Content Understanding
  38. KB and Text
      • How can I use KBs for improving…
        • Topic analysis
        • Other general NLP tasks
      • Pre-trained models available (language models, …)
      • https://www.aaai-make.info/
  39. Problem 6. Digital Humanities for the Future
  40. Engagement for Future Visions
      • Mission-oriented policies
      • Gamification and user engagement for policy directions
  41. Perspective: THESIS
  42. Engage: SHARED VISION. A new way of engaging citizens: the evolution of the PE spectrum
  43. Images are the new Esperanto
  44. Gamification
      • The process of using game thinking and game mechanics to engage users and solve problems
      • Turning the user experience into a game can produce behavior change
  45. KB and Text
      • Possible futures
      • The KB of science fiction
      • Asimov, …
  46. Alexa, Tell Me a NEW Story
      • NN-based approaches for generating new content
      • Stories for children
      • Jokes!?
  47. Problem 7. Data for Moving Mobility
  48. Data Models for Gita
  49. Further Problems
  50. …
      • ANN for solving differential equations (with Harvard IACS)
      • Conditional GANs for generating data for specific contexts (with Harvard IACS)
  51. THANKS! QUESTIONS? Marco Brambilla, @marcobrambi, marco.brambilla@polimi.it, http://datascience.deib.polimi.it, http://home.deib.polimi.it/marcobrambi
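
To make the triple-building pipeline of Problem 1 (slide 10) concrete, here is a deliberately naive, rule-based sketch that reproduces the #AngelinaJolie example. It assumes a tiny hand-written verb lexicon (`RELATION_VERBS`, hypothetical) and treats the last hashtag before the verb as the subject. A real extractor would more likely rely on a part-of-speech tagger or dependency parser than on these heuristics.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical verb lexicon, for illustration only; a real pipeline would
# detect relations with a POS tagger or a dependency parser.
RELATION_VERBS = {"wears", "wear", "wore", "wearing"}


def extract_triple(text: str) -> Optional[Triple]:
    """Naive subject-predicate-object extraction from a single post."""
    tokens = text.split()
    # 1. Relation extraction: the first token found in the verb lexicon.
    verb_idx = next(
        (i for i, tok in enumerate(tokens) if tok.lower() in RELATION_VERBS), None
    )
    if verb_idx is None:
        return None
    # 2a. Subject extraction: the last hashtag before the verb (heuristic).
    hashtags = [tok for tok in tokens[:verb_idx] if tok.startswith("#")]
    # 2b. Object extraction: everything after the verb.
    obj_tokens = tokens[verb_idx + 1:]
    if not hashtags or not obj_tokens:
        return None
    # 3. Triples composition.
    return Triple(hashtags[-1], tokens[verb_idx], " ".join(obj_tokens))


triple = extract_triple("Beautiful #AngelinaJolie on the cover of @THR wears Amen silk top")
print(triple)
```

On the slide's example this prints `Triple(subject='#AngelinaJolie', predicate='wears', obj='Amen silk top')`. Mentions such as @THR are ignored by the subject heuristic, which is a design choice of this sketch rather than a property of the original method.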
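
The Foursquare collection strategy of Problem 2 (slide 19: one query every 50 m with radius above 50 m over a 20 km x 17.5 km area of Milan) implies a regular grid of query centres. The sketch below enumerates such a grid, checks that the radius leaves no gaps, and converts metre offsets to latitude/longitude with an equirectangular approximation. The origin coordinates and the 111,320 m-per-degree constant are assumptions for illustration, and no Foursquare API call is shown.

```python
import math
from typing import Iterator, Tuple


def grid_offsets(width_m: float, height_m: float, step_m: float) -> Iterator[Tuple[float, float]]:
    """Yield (dx, dy) offsets in metres of query centres on a regular square grid.

    ceil(...) + 1 points per axis makes the grid reach (or overshoot) the far edges.
    """
    nx = math.ceil(width_m / step_m) + 1
    ny = math.ceil(height_m / step_m) + 1
    for i in range(nx):
        for j in range(ny):
            yield (i * step_m, j * step_m)


def radius_covers_grid(step_m: float, radius_m: float) -> bool:
    """Circles centred on a square grid cover the plane iff radius >= half the cell diagonal."""
    return radius_m >= step_m * math.sqrt(2) / 2


# Assumption: approximate metres per degree of latitude (spherical Earth).
M_PER_DEG_LAT = 111_320.0


def offset_to_latlon(lat0: float, lon0: float, dx_m: float, dy_m: float) -> Tuple[float, float]:
    """Equirectangular approximation, adequate over a city-scale (~20 km) area."""
    lat = lat0 + dy_m / M_PER_DEG_LAT
    lon = lon0 + dx_m / (M_PER_DEG_LAT * math.cos(math.radians(lat0)))
    return lat, lon


# Hypothetical south-west corner of the study area (not taken from the slides).
ORIGIN = (45.40, 9.05)
centres = [offset_to_latlon(*ORIGIN, dx, dy) for dx, dy in grid_offsets(20_000, 17_500, 50)]
print(len(centres), radius_covers_grid(50, 50))
```

With the slide's parameters this yields 401 x 351 = 140,751 query centres. Because a 50 m radius exceeds the coverage threshold of about 35.4 m, neighbouring queries overlap, so returned venues should be de-duplicated (for example by venue id) before counting.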
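
Problem 3 (slides 27 and 29) mentions collaboration and cross-project networks mined from development repositories such as GitHub. One common construction, sketched below, is a weighted co-contribution graph: two developers are linked when they contributed to the same project, and the edge weight counts their shared projects. The input format (`(developer, project)` pairs, e.g. commit authors per repository) and all names are hypothetical; no GitHub API is called, and the slides do not specify this construction.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def co_contribution_edges(contributions: Iterable[Tuple[str, str]]) -> Counter:
    """Weighted developer-developer edges: weight = number of shared projects."""
    devs_by_project: Dict[str, Set[str]] = defaultdict(set)
    for dev, project in contributions:
        devs_by_project[project].add(dev)  # sets ignore repeated commits
    edges: Counter = Counter()
    for devs in devs_by_project.values():
        # sorted() gives each undirected edge a canonical (a, b) key
        for a, b in combinations(sorted(devs), 2):
            edges[(a, b)] += 1
    return edges


def cross_project_developers(contributions: Iterable[Tuple[str, str]]) -> Set[str]:
    """Developers who contributed to more than one project."""
    projects_by_dev: Dict[str, Set[str]] = defaultdict(set)
    for dev, project in contributions:
        projects_by_dev[dev].add(project)
    return {dev for dev, projects in projects_by_dev.items() if len(projects) > 1}


# Hypothetical contribution records.
contribs = [("alice", "repoA"), ("bob", "repoA"), ("alice", "repoA"),
            ("alice", "repoB"), ("bob", "repoB"), ("carol", "repoB")]
edges = co_contribution_edges(contribs)
print(edges.most_common(1), cross_project_developers(contribs))
```

Projects with many contributors produce a quadratic number of pairs, so very large repositories may need to be filtered or down-weighted before the edge list is used for network analysis or representation learning.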
