SlideShare une entreprise Scribd logo
1  sur  36
Part I → Part II → Part III → Part IV → Part V

         Data and Software
Part II: Outline
       Types of datasets

       Propagation of information “memes”

       Propagation of other actions

       Synthetic datasets

       Software tools


2
Contents of a dataset
       Action traces
        ◦ Sometimes not obvious (e.g. gaining weight
          can be an action)
        ◦ Propagation explicitly / implicitly attributed

       Social network
        ◦ Explicitly declared / Implicitly inferred
        ◦ Symmetrical / Non-symmetrical


3
Data availability limits research
       Often you have to pick two of these


                                 Includes
                                   Social
                                 Network




                   Is Publicly              Includes
                   Available                 Action
                                             Traces



4
Classification: according to
    availability
       Proprietary, impossible or very hard to
        reproduce (e.g. shopping history in e-
        commerce)
        ◦ Increasingly being rejected in IR, DM
          communities

       Proprietary, reproducible (e.g. web crawl of
        a sub-set of public websites)

       Existing open dataset

       New open dataset
5
Propagation of Information
           “Memes”




6
Memes and “Internet Memes”




7
Microblogging data
       Providers: Twitter, Identi.ca, Diaspora, etc.
        ◦ Directly or through data re-sellers

       Actions: posting a message

       Connections: explicitly declared, non-
        symmetrical

       Propagations: explicitly linked (in principle),
        but implicitly linked (in practice) due to
        client implementations
8
Extracting info. propagations
       Idea: start from a large corpus and then
        extract information propagations
        ◦ Blogs, news articles, academic papers, generic
          web pages, etc.
        ◦ Simple in theory, extremely difficult in practice

       Looking for citations doesn't work
        ◦ People on the web seldom attribute explicitly

       Keywords and phrases
        ◦ Usually end up with a mixture of too broad (e.g.
          stylistic idioms) and/or too narrow (e.g. one
          specific copy of a news item) “topics”
9
Using #hashtags and URLs
        Twitter: #hashtags and URLs

        With some exceptions
         ◦ #hashtags are too broad,
         ◦ URLs are too narrow

        Let's propose two methods that can
         alleviate these problems ...


10
Extracting info. propagations:
     Meme tracker
        Public dataset: http://memetracker.org/

        Tracks “mutated” key phrases in a
         document collection, example cluster:
           the fundamentals of our economy are strong
           the fundamentals of the economy are strong
           i promise you we will never put america in this position again we will clean up wall street
           the fundamentals of our economy are strong but these are very very difficult times and i
           promise you we will never put america in this position again
           but these are very very difficult times



        No a-priori network exists. Inference
         methods are used.
11                                                                      [Leskovec et al. KDD 2009]
Extracting info. propagations:
     Meme tracker




12                          [Leskovec et al. KDD 2009]
Extracting info. propagation:
     Trending topics
        Method
         ◦ Look for “bursty” (spiky, trending) topics,
           represented e.g. as a collection of keywords
         ◦ Track the propagation of those topics

        Rely on a proven method for burst
           detection
         ◦ The tricky part is not to detect the burst, but
           to represent it (e.g. as a query) e.g. Haiti
           earthquake tweets might not include “Haiti”
           or “earthquake”
13                             [Mathioudakis and Koudas, SIGMOD 2010]
Extracting information
     propagations: Other methods
        Internet chain letters; look for copies
         online of petition letters




14                           [Liben-Nowell and Kleinberg, PNAS 2008]
Sampling issues
   Issues with recall along information
    cascades
    ◦ e.g. twitter stream 1% sample gardenhose




                                [Sadikov et al. WSDM 2011]
Propagation of Other Actions




16
Consuming media and products
        Media consumption/appraisal platforms
         ◦ Examples: Flixter / Last.fm / GoodReads
           Action: rating, watching, listening or reading a
            movie, a song, or a book
           Connections: Explicit friendships
         ◦ Propagations: usually implicitly linked unless
           “recommend to a friend” feature is used and
           publicly available

        Product recommendations
         ◦ Example: @cosme cosmetics
           recommendations
17
@cosme recommendations




18                [Matsuo and Yamamoto, WWW 2009]
Cross-provider data
        One provides the network, the other the
         actions

        MSN + Bing: Social network is MSN IM,
         actions are searches

        YIM + YMovies



19                [Singla and Richardson, WWW 2008] [Goyal et al. CIKM 2008]
Phone calls
        Social networks are calls, actions are
         leaving the company (“churning”)

        Some call datasets are available for
         academic labs (not for industrial ones)




20                                  [Dasgupta et al. EDBT 2008]
Phone calls




21                 [Senseable City Lab, MIT, 2009]
Community membership
        DBLP/Arnetminer
         ◦ Social network is co-authorship
         ◦ Action is publishing in a conference or
           publishing on a topic

        Livejournal / Flickr
         ◦ Social network is friendship graph
         ◦ Action is joining a community/group

        Bloglines
         ◦ Action is subscribing to a rss feed
22
Other datasets
        Flickr
         ◦ Explicit friendship, action is (1) favoring a
           photo or (2) using a tag

        Digg/Reddit votes
           Explicit friendship, action is vote-up




23
Off-line datasets
        Participation of women in 14 social
         activities over 9 months in US south (n=18)

        Romantic network in a high school (n=288)

        Medical records during 32 years (n=12,067)

        Network only
         ◦ Zachary's Karate club
         ◦ Presumed acquaintances links between terrorist
           suspects (n=74, n=63 if main CC is used)
24
“Chains of Affection”




25               [Bearman et al. Amer. Journal of Sociology 2004]
“Chains of Affection”


      Probably
      not a future
      computer
      scientist 




26
Size proportional to BMI, yellow fill indicates obesity. Blue border=men, Red border=women

27                        [Christakis and Fowler, New England Journal of Medicine 2007]
Synthetic Datasets




28
Network data are widely available
        Domains
         ◦ Online social networks: slashdot, epinions, …
         ◦ Communication: internet as, p2p, roads, …
         ◦ Collaboration: scientists, actors, jazz
           musicians, wikipedia editors, ...
         ◦ Citations: web graphs, academic
           publications, patents, …
         ◦ References: linked data in
           freebase/dbpedia, protein interactions,
           metabolic networks, ...
29
Publishing your own datasets
        Document every step of sampling, filtering,
         processing methodology
        CC0 (public domain) data releases
        Ad-hoc data releases: look at items in example
         agreements (duration, purpose, warranties, item
         deletion policies, etc.)
        Privacy concerns
        It may take some extra work, but remember that
         it is also in YOUR interest that your data is used
         by others
30
Software




31
Graph software Tools
        Software
         ◦ SNAP [GPL] Gephi [GPL, gui]
         ◦ Pajek [Free for non-commercial use, Windows, gui
         ◦ Webgraph [GPL] Graphviz [GPL]

        Graph generation, transformation,
         ◦ SNAP, Gephi, Pajek, Webgraph [compress], ...

        Subgraphs: clustering, connected components, etc. Node
         metrics: centrality, local clustering coeff.
         ◦ SNAP, Gephi, Pajek

        Graph visualization: Gephi, Pajek, Graphviz

        Other:
         https://sites.google.com/site/ucinetsoftware/downloads
32
Propagation software tools
        SPINE software
         ◦ IC model
         ◦ Inference with given social network
         ◦ Sparsification of influence models

        Internet network simulator

        Ask authors, some software is known to
         be available on request

33
Visualization




34
Visualization




35                   [Project Cascade by NYT Labs 2011]
Key takeaways of part II
        Data availability affects our research

        Current alternatives are not good
         ◦ Results on proprietary data sources are not
           reproducible
         ◦ Synthetic information propagations might
           not be realistic

        Software is not readily available

        This is something to work on collectively!
36

Contenu connexe

Tendances

Maura.fujieh
Maura.fujiehMaura.fujieh
Maura.fujiehNASAPMC
 
JTerm Day 2 - History, Definitions & Stats
JTerm Day 2 - History, Definitions & StatsJTerm Day 2 - History, Definitions & Stats
JTerm Day 2 - History, Definitions & StatsAndrew Hoffman
 
Web 2.0 and virtual worlds
Web 2.0 and virtual worldsWeb 2.0 and virtual worlds
Web 2.0 and virtual worldsBryan Alexander
 
"You can just tell whether a website looks reliable or not." People's modes o...
"You can just tell whether a website looks reliable or not." People's modes o..."You can just tell whether a website looks reliable or not." People's modes o...
"You can just tell whether a website looks reliable or not." People's modes o...Lynn Connaway
 
On Line Basics LinkedIn Facebook Twitter_Joyce M Sullivan NYPL SIBL 2010 05 19
On Line Basics LinkedIn Facebook Twitter_Joyce M Sullivan NYPL SIBL 2010 05 19On Line Basics LinkedIn Facebook Twitter_Joyce M Sullivan NYPL SIBL 2010 05 19
On Line Basics LinkedIn Facebook Twitter_Joyce M Sullivan NYPL SIBL 2010 05 19SocMediaFin - Joyce Sullivan
 
CrowdTruth @VU Faculty Colloquium (June 2015)
CrowdTruth @VU Faculty Colloquium (June 2015)CrowdTruth @VU Faculty Colloquium (June 2015)
CrowdTruth @VU Faculty Colloquium (June 2015)Lora Aroyo
 
Micro-Blogging, Futurism, and the Agora
Micro-Blogging, Futurism, and the AgoraMicro-Blogging, Futurism, and the Agora
Micro-Blogging, Futurism, and the AgoraThe New School
 
The End Run Around The Information Gate Keepers Week# 3 Global Net Activism
The End Run Around The  Information Gate Keepers Week# 3 Global Net ActivismThe End Run Around The  Information Gate Keepers Week# 3 Global Net Activism
The End Run Around The Information Gate Keepers Week# 3 Global Net ActivismThe New School
 
Mining the Social Web - Lecture 2 - T61.6020
Mining the Social Web - Lecture 2 - T61.6020Mining the Social Web - Lecture 2 - T61.6020
Mining the Social Web - Lecture 2 - T61.6020Michael Mathioudakis
 
How Web2 Is Revolutionising Education
How Web2 Is Revolutionising EducationHow Web2 Is Revolutionising Education
How Web2 Is Revolutionising EducationMichael Coghlan
 
Hatfieldskip
HatfieldskipHatfieldskip
HatfieldskipNASAPMC
 
Personalizing Media Interaction on the (Semantic & Social) Web
Personalizing Media Interaction on the (Semantic & Social) WebPersonalizing Media Interaction on the (Semantic & Social) Web
Personalizing Media Interaction on the (Semantic & Social) WebLora Aroyo
 
Isle of Man open data overview
Isle of Man open data overviewIsle of Man open data overview
Isle of Man open data overviewChris Taggart
 
Open Access, Open Data. Open Research?
Open Access, Open Data. Open Research?Open Access, Open Data. Open Research?
Open Access, Open Data. Open Research?Cameron Neylon
 
Eduwebinar: Our Everyday Tools for Success
Eduwebinar:  Our Everyday Tools for SuccessEduwebinar:  Our Everyday Tools for Success
Eduwebinar: Our Everyday Tools for SuccessJudy O'Connell
 
Social Media. Introduction to the Syllabus
Social Media. Introduction to the SyllabusSocial Media. Introduction to the Syllabus
Social Media. Introduction to the SyllabusThe New School
 
People Like You Like Presentations Like This
People Like You Like Presentations Like ThisPeople Like You Like Presentations Like This
People Like You Like Presentations Like ThisDavid Millard
 
Helping Darwin: How to think about evolution of consciousness (Biosciences ta...
Helping Darwin: How to think about evolution of consciousness (Biosciences ta...Helping Darwin: How to think about evolution of consciousness (Biosciences ta...
Helping Darwin: How to think about evolution of consciousness (Biosciences ta...Aaron Sloman
 

Tendances (20)

Arc 08 12 10 final short
Arc 08 12 10 final shortArc 08 12 10 final short
Arc 08 12 10 final short
 
Maura.fujieh
Maura.fujiehMaura.fujieh
Maura.fujieh
 
JTerm Day 2 - History, Definitions & Stats
JTerm Day 2 - History, Definitions & StatsJTerm Day 2 - History, Definitions & Stats
JTerm Day 2 - History, Definitions & Stats
 
Web 2.0 and virtual worlds
Web 2.0 and virtual worldsWeb 2.0 and virtual worlds
Web 2.0 and virtual worlds
 
"You can just tell whether a website looks reliable or not." People's modes o...
"You can just tell whether a website looks reliable or not." People's modes o..."You can just tell whether a website looks reliable or not." People's modes o...
"You can just tell whether a website looks reliable or not." People's modes o...
 
On Line Basics LinkedIn Facebook Twitter_Joyce M Sullivan NYPL SIBL 2010 05 19
On Line Basics LinkedIn Facebook Twitter_Joyce M Sullivan NYPL SIBL 2010 05 19On Line Basics LinkedIn Facebook Twitter_Joyce M Sullivan NYPL SIBL 2010 05 19
On Line Basics LinkedIn Facebook Twitter_Joyce M Sullivan NYPL SIBL 2010 05 19
 
CrowdTruth @VU Faculty Colloquium (June 2015)
CrowdTruth @VU Faculty Colloquium (June 2015)CrowdTruth @VU Faculty Colloquium (June 2015)
CrowdTruth @VU Faculty Colloquium (June 2015)
 
Micro-Blogging, Futurism, and the Agora
Micro-Blogging, Futurism, and the AgoraMicro-Blogging, Futurism, and the Agora
Micro-Blogging, Futurism, and the Agora
 
The End Run Around The Information Gate Keepers Week# 3 Global Net Activism
The End Run Around The  Information Gate Keepers Week# 3 Global Net ActivismThe End Run Around The  Information Gate Keepers Week# 3 Global Net Activism
The End Run Around The Information Gate Keepers Week# 3 Global Net Activism
 
Week3
Week3 Week3
Week3
 
Mining the Social Web - Lecture 2 - T61.6020
Mining the Social Web - Lecture 2 - T61.6020Mining the Social Web - Lecture 2 - T61.6020
Mining the Social Web - Lecture 2 - T61.6020
 
How Web2 Is Revolutionising Education
How Web2 Is Revolutionising EducationHow Web2 Is Revolutionising Education
How Web2 Is Revolutionising Education
 
Hatfieldskip
HatfieldskipHatfieldskip
Hatfieldskip
 
Personalizing Media Interaction on the (Semantic & Social) Web
Personalizing Media Interaction on the (Semantic & Social) WebPersonalizing Media Interaction on the (Semantic & Social) Web
Personalizing Media Interaction on the (Semantic & Social) Web
 
Isle of Man open data overview
Isle of Man open data overviewIsle of Man open data overview
Isle of Man open data overview
 
Open Access, Open Data. Open Research?
Open Access, Open Data. Open Research?Open Access, Open Data. Open Research?
Open Access, Open Data. Open Research?
 
Eduwebinar: Our Everyday Tools for Success
Eduwebinar:  Our Everyday Tools for SuccessEduwebinar:  Our Everyday Tools for Success
Eduwebinar: Our Everyday Tools for Success
 
Social Media. Introduction to the Syllabus
Social Media. Introduction to the SyllabusSocial Media. Introduction to the Syllabus
Social Media. Introduction to the Syllabus
 
People Like You Like Presentations Like This
People Like You Like Presentations Like ThisPeople Like You Like Presentations Like This
People Like You Like Presentations Like This
 
Helping Darwin: How to think about evolution of consciousness (Biosciences ta...
Helping Darwin: How to think about evolution of consciousness (Biosciences ta...Helping Darwin: How to think about evolution of consciousness (Biosciences ta...
Helping Darwin: How to think about evolution of consciousness (Biosciences ta...
 

En vedette

Kdd12 tutorial-inf-part-iv
Kdd12 tutorial-inf-part-ivKdd12 tutorial-inf-part-iv
Kdd12 tutorial-inf-part-ivLaks Lakshmanan
 
Kdd12 tutorial-inf-part-iii
Kdd12 tutorial-inf-part-iiiKdd12 tutorial-inf-part-iii
Kdd12 tutorial-inf-part-iiiLaks Lakshmanan
 
Social Media News Communities: Gatekeeping, Coverage, and Statement Bias
 Social Media News Communities: Gatekeeping, Coverage, and Statement Bias Social Media News Communities: Gatekeeping, Coverage, and Statement Bias
Social Media News Communities: Gatekeeping, Coverage, and Statement BiasMounia Lalmas-Roelleke
 
The Effects of Time on Query Flow Graph-based Models for Query Suggestion
The Effects of Time on Query Flow Graph-based Models for Query SuggestionThe Effects of Time on Query Flow Graph-based Models for Query Suggestion
The Effects of Time on Query Flow Graph-based Models for Query SuggestionCarlos Castillo (ChaTo)
 
Social Media News Mining and Automatic Content Analysis of News
Social Media News Mining and Automatic Content Analysis of NewsSocial Media News Mining and Automatic Content Analysis of News
Social Media News Mining and Automatic Content Analysis of NewsCarlos Castillo (ChaTo)
 
Dr. Searcher and Mr. Browser: A unified hyperlink-click graph
Dr. Searcher and Mr. Browser: A unified hyperlink-click graphDr. Searcher and Mr. Browser: A unified hyperlink-click graph
Dr. Searcher and Mr. Browser: A unified hyperlink-click graphCarlos Castillo (ChaTo)
 
Characterizing the Life Cycle of Online News Stories Using Social Media React...
Characterizing the Life Cycle of Online News Stories Using Social Media React...Characterizing the Life Cycle of Online News Stories Using Social Media React...
Characterizing the Life Cycle of Online News Stories Using Social Media React...Carlos Castillo (ChaTo)
 
Information Verification During Natural Disasters
Information Verification During Natural DisastersInformation Verification During Natural Disasters
Information Verification During Natural DisastersCarlos Castillo (ChaTo)
 
Keynote talk: Big Crisis Data, an Open Invitation
Keynote talk: Big Crisis Data, an Open InvitationKeynote talk: Big Crisis Data, an Open Invitation
Keynote talk: Big Crisis Data, an Open InvitationCarlos Castillo (ChaTo)
 
TweetCred: Real-Time Credibility Assessment of 
 Content on Twitter @ Socinfo...
TweetCred: Real-Time Credibility Assessment of 
 Content on Twitter @ Socinfo...TweetCred: Real-Time Credibility Assessment of 
 Content on Twitter @ Socinfo...
TweetCred: Real-Time Credibility Assessment of 
 Content on Twitter @ Socinfo...IIIT Hyderabad
 
Extracting Information Nuggets from Disaster-Related Messages in Social Media
Extracting Information Nuggets from Disaster-Related Messages in Social MediaExtracting Information Nuggets from Disaster-Related Messages in Social Media
Extracting Information Nuggets from Disaster-Related Messages in Social MediaMuhammad Imran
 
Chapter 17 Designing And Managing Integrated Marketing Communications
Chapter 17 Designing And Managing Integrated Marketing CommunicationsChapter 17 Designing And Managing Integrated Marketing Communications
Chapter 17 Designing And Managing Integrated Marketing CommunicationsDiarta
 
What to Expect When the Unexpected Happens: Social Media Communications Acros...
What to Expect When the Unexpected Happens: Social Media Communications Acros...What to Expect When the Unexpected Happens: Social Media Communications Acros...
What to Expect When the Unexpected Happens: Social Media Communications Acros...Carlos Castillo (ChaTo)
 
Emotions and dialogue in a peer-production community: the case of Wikipedia
Emotions and dialogue in a peer-production community: the case of WikipediaEmotions and dialogue in a peer-production community: the case of Wikipedia
Emotions and dialogue in a peer-production community: the case of WikipediaDavid Laniado
 
SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...
SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...
SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...Artificial Intelligence Institute at UofSC
 
Chapter 17 - Designing and Managing Integrated Marketing Communications
Chapter 17 - Designing and Managing Integrated Marketing CommunicationsChapter 17 - Designing and Managing Integrated Marketing Communications
Chapter 17 - Designing and Managing Integrated Marketing CommunicationsDr. Ankit Kesharwani
 

En vedette (20)

Kdd12 tutorial-inf-part-iv
Kdd12 tutorial-inf-part-ivKdd12 tutorial-inf-part-iv
Kdd12 tutorial-inf-part-iv
 
Kdd12 tutorial-inf-part-iii
Kdd12 tutorial-inf-part-iiiKdd12 tutorial-inf-part-iii
Kdd12 tutorial-inf-part-iii
 
Social Media News Communities: Gatekeeping, Coverage, and Statement Bias
 Social Media News Communities: Gatekeeping, Coverage, and Statement Bias Social Media News Communities: Gatekeeping, Coverage, and Statement Bias
Social Media News Communities: Gatekeeping, Coverage, and Statement Bias
 
The Effects of Time on Query Flow Graph-based Models for Query Suggestion
The Effects of Time on Query Flow Graph-based Models for Query SuggestionThe Effects of Time on Query Flow Graph-based Models for Query Suggestion
The Effects of Time on Query Flow Graph-based Models for Query Suggestion
 
Social Media News Mining and Automatic Content Analysis of News
Social Media News Mining and Automatic Content Analysis of NewsSocial Media News Mining and Automatic Content Analysis of News
Social Media News Mining and Automatic Content Analysis of News
 
Dr. Searcher and Mr. Browser: A unified hyperlink-click graph
Dr. Searcher and Mr. Browser: A unified hyperlink-click graphDr. Searcher and Mr. Browser: A unified hyperlink-click graph
Dr. Searcher and Mr. Browser: A unified hyperlink-click graph
 
Characterizing the Life Cycle of Online News Stories Using Social Media React...
Characterizing the Life Cycle of Online News Stories Using Social Media React...Characterizing the Life Cycle of Online News Stories Using Social Media React...
Characterizing the Life Cycle of Online News Stories Using Social Media React...
 
Information Verification During Natural Disasters
Information Verification During Natural DisastersInformation Verification During Natural Disasters
Information Verification During Natural Disasters
 
Keynote talk: Big Crisis Data, an Open Invitation
Keynote talk: Big Crisis Data, an Open InvitationKeynote talk: Big Crisis Data, an Open Invitation
Keynote talk: Big Crisis Data, an Open Invitation
 
TweetCred: Real-Time Credibility Assessment of 
 Content on Twitter @ Socinfo...
TweetCred: Real-Time Credibility Assessment of 
 Content on Twitter @ Socinfo...TweetCred: Real-Time Credibility Assessment of 
 Content on Twitter @ Socinfo...
TweetCred: Real-Time Credibility Assessment of 
 Content on Twitter @ Socinfo...
 
Extracting Information Nuggets from Disaster-Related Messages in Social Media
Extracting Information Nuggets from Disaster-Related Messages in Social MediaExtracting Information Nuggets from Disaster-Related Messages in Social Media
Extracting Information Nuggets from Disaster-Related Messages in Social Media
 
Chapter 17 Designing And Managing Integrated Marketing Communications
Chapter 17 Designing And Managing Integrated Marketing CommunicationsChapter 17 Designing And Managing Integrated Marketing Communications
Chapter 17 Designing And Managing Integrated Marketing Communications
 
What to Expect When the Unexpected Happens: Social Media Communications Acros...
What to Expect When the Unexpected Happens: Social Media Communications Acros...What to Expect When the Unexpected Happens: Social Media Communications Acros...
What to Expect When the Unexpected Happens: Social Media Communications Acros...
 
Fairness-Aware Data Mining
Fairness-Aware Data MiningFairness-Aware Data Mining
Fairness-Aware Data Mining
 
Crisis Computing
Crisis ComputingCrisis Computing
Crisis Computing
 
Emotions and dialogue in a peer-production community: the case of Wikipedia
Emotions and dialogue in a peer-production community: the case of WikipediaEmotions and dialogue in a peer-production community: the case of Wikipedia
Emotions and dialogue in a peer-production community: the case of Wikipedia
 
SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...
SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...
SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...
 
Social Media Mining and Retrieval
Social Media Mining and RetrievalSocial Media Mining and Retrieval
Social Media Mining and Retrieval
 
Discrimination Discovery
Discrimination DiscoveryDiscrimination Discovery
Discrimination Discovery
 
Chapter 17 - Designing and Managing Integrated Marketing Communications
Chapter 17 - Designing and Managing Integrated Marketing CommunicationsChapter 17 - Designing and Managing Integrated Marketing Communications
Chapter 17 - Designing and Managing Integrated Marketing Communications
 

Similaire à Kdd12 tutorial-inf-part-ii

Practical Applications for Social Network Analysis in Public Sector Marketing...
Practical Applications for Social Network Analysis in Public Sector Marketing...Practical Applications for Social Network Analysis in Public Sector Marketing...
Practical Applications for Social Network Analysis in Public Sector Marketing...Mike Kujawski
 
Lecture 7: Social Web Challenges (2012)
Lecture 7: Social Web Challenges (2012)Lecture 7: Social Web Challenges (2012)
Lecture 7: Social Web Challenges (2012)Lora Aroyo
 
Keynote baezayates
Keynote baezayatesKeynote baezayates
Keynote baezayatescaise2013vlc
 
Keynote baezayates
Keynote baezayatesKeynote baezayates
Keynote baezayatesPROS-UPV
 
Big data in the web
Big data in the webBig data in the web
Big data in the webcaise2013
 
SocialCom09-tutorial.pdf
SocialCom09-tutorial.pdfSocialCom09-tutorial.pdf
SocialCom09-tutorial.pdfBalasundaramSr
 
Toward a System Building Agenda for Data Integration(and Dat.docx
Toward a System Building Agenda for Data Integration(and Dat.docxToward a System Building Agenda for Data Integration(and Dat.docx
Toward a System Building Agenda for Data Integration(and Dat.docxjuliennehar
 
Researching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media AnalysisResearching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media AnalysisFarida Vis
 
Lida change-reference-abels
Lida change-reference-abelsLida change-reference-abels
Lida change-reference-abelsfpehar
 
Research Life Cycle for GeoData 2014
Research Life Cycle for GeoData 2014Research Life Cycle for GeoData 2014
Research Life Cycle for GeoData 2014Carly Strasser
 
2014 TheNextWeb-Mapping connections with NodeXL
2014 TheNextWeb-Mapping connections with NodeXL2014 TheNextWeb-Mapping connections with NodeXL
2014 TheNextWeb-Mapping connections with NodeXLMarc Smith
 
Cl15 a koene_ca_sma
Cl15 a koene_ca_smaCl15 a koene_ca_sma
Cl15 a koene_ca_smaAnsgar Koene
 
Crawling Big Data in a New Frontier for Socioeconomic Research: Testing with ...
Crawling Big Data in a New Frontier for Socioeconomic Research: Testing with ...Crawling Big Data in a New Frontier for Socioeconomic Research: Testing with ...
Crawling Big Data in a New Frontier for Socioeconomic Research: Testing with ...BO TRUE ACTIVITIES SL
 
Streamlining Nonprofit Organizations - It's all About the Cloud!
Streamlining Nonprofit Organizations - It's all About the Cloud!Streamlining Nonprofit Organizations - It's all About the Cloud!
Streamlining Nonprofit Organizations - It's all About the Cloud!Marc Baizman
 
Streamlining Nonprofit Organizations: It's All About the Cloud
Streamlining Nonprofit Organizations: It's All About the CloudStreamlining Nonprofit Organizations: It's All About the Cloud
Streamlining Nonprofit Organizations: It's All About the CloudDebra Askanase
 
Accessing and Using Big Data to Advance Social Science Knowledge
Accessing and Using Big Data to Advance Social Science KnowledgeAccessing and Using Big Data to Advance Social Science Knowledge
Accessing and Using Big Data to Advance Social Science KnowledgeJosh Cowls
 
Using DITA-wiki Hybrid Solutions For Better Knowledge Systems
Using DITA-wiki Hybrid Solutions For Better Knowledge SystemsUsing DITA-wiki Hybrid Solutions For Better Knowledge Systems
Using DITA-wiki Hybrid Solutions For Better Knowledge SystemsLisa Dyer
 

Similaire à Kdd12 tutorial-inf-part-ii (20)

Practical Applications for Social Network Analysis in Public Sector Marketing...
Practical Applications for Social Network Analysis in Public Sector Marketing...Practical Applications for Social Network Analysis in Public Sector Marketing...
Practical Applications for Social Network Analysis in Public Sector Marketing...
 
Lecture 7: Social Web Challenges (2012)
Lecture 7: Social Web Challenges (2012)Lecture 7: Social Web Challenges (2012)
Lecture 7: Social Web Challenges (2012)
 
Keynote baezayates
Keynote baezayatesKeynote baezayates
Keynote baezayates
 
Keynote baezayates
Keynote baezayatesKeynote baezayates
Keynote baezayates
 
Big data in the web
Big data in the webBig data in the web
Big data in the web
 
Conclusions: Summary and Outlook
Conclusions: Summary and OutlookConclusions: Summary and Outlook
Conclusions: Summary and Outlook
 
SocialCom09-tutorial.pdf
SocialCom09-tutorial.pdfSocialCom09-tutorial.pdf
SocialCom09-tutorial.pdf
 
Toward a System Building Agenda for Data Integration(and Dat.docx
Toward a System Building Agenda for Data Integration(and Dat.docxToward a System Building Agenda for Data Integration(and Dat.docx
Toward a System Building Agenda for Data Integration(and Dat.docx
 
Researching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media AnalysisResearching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media Analysis
 
Lida change-reference-abels
Lida change-reference-abelsLida change-reference-abels
Lida change-reference-abels
 
Research Life Cycle for GeoData 2014
Research Life Cycle for GeoData 2014Research Life Cycle for GeoData 2014
Research Life Cycle for GeoData 2014
 
2014 TheNextWeb-Mapping connections with NodeXL
2014 TheNextWeb-Mapping connections with NodeXL2014 TheNextWeb-Mapping connections with NodeXL
2014 TheNextWeb-Mapping connections with NodeXL
 
Cl15 a koene_ca_sma
Cl15 a koene_ca_smaCl15 a koene_ca_sma
Cl15 a koene_ca_sma
 
sm@jgc Session Three
sm@jgc Session Threesm@jgc Session Three
sm@jgc Session Three
 
Crawling Big Data in a New Frontier for Socioeconomic Research: Testing with ...
Crawling Big Data in a New Frontier for Socioeconomic Research: Testing with ...Crawling Big Data in a New Frontier for Socioeconomic Research: Testing with ...
Crawling Big Data in a New Frontier for Socioeconomic Research: Testing with ...
 
Streamlining Nonprofit Organizations - It's all About the Cloud!
Streamlining Nonprofit Organizations - It's all About the Cloud!Streamlining Nonprofit Organizations - It's all About the Cloud!
Streamlining Nonprofit Organizations - It's all About the Cloud!
 
Streamlining Nonprofit Organizations: It's All About the Cloud
Streamlining Nonprofit Organizations: It's All About the CloudStreamlining Nonprofit Organizations: It's All About the Cloud
Streamlining Nonprofit Organizations: It's All About the Cloud
 
Accessing and Using Big Data to Advance Social Science Knowledge
Accessing and Using Big Data to Advance Social Science KnowledgeAccessing and Using Big Data to Advance Social Science Knowledge
Accessing and Using Big Data to Advance Social Science Knowledge
 
Using DITA-wiki Hybrid Solutions For Better Knowledge Systems
Using DITA-wiki Hybrid Solutions For Better Knowledge SystemsUsing DITA-wiki Hybrid Solutions For Better Knowledge Systems
Using DITA-wiki Hybrid Solutions For Better Knowledge Systems
 
Horizon
HorizonHorizon
Horizon
 

Plus de Laks Lakshmanan

Combating Fake News: Combating Fake News: A Data Management and Mining Perspe...
Combating Fake News: Combating Fake News: A Data Management and Mining Perspe...Combating Fake News: Combating Fake News: A Data Management and Mining Perspe...
Combating Fake News: Combating Fake News: A Data Management and Mining Perspe...Laks Lakshmanan
 
Big-O(Q) VLDB 2015 Keynote: Social Network Analytics: Beyond the Obvious
Big-O(Q) VLDB 2015 Keynote: Social Network Analytics: Beyond the ObviousBig-O(Q) VLDB 2015 Keynote: Social Network Analytics: Beyond the Obvious
Big-O(Q) VLDB 2015 Keynote: Social Network Analytics: Beyond the ObviousLaks Lakshmanan
 
Big-O(Q) Social Network Analytics
Big-O(Q) Social Network AnalyticsBig-O(Q) Social Network Analytics
Big-O(Q) Social Network AnalyticsLaks Lakshmanan
 

Plus de Laks Lakshmanan (6)

cuhk-fb-mi-talk.pdf
cuhk-fb-mi-talk.pdfcuhk-fb-mi-talk.pdf
cuhk-fb-mi-talk.pdf
 
Combating Fake News: Combating Fake News: A Data Management and Mining Perspe...
Combating Fake News: Combating Fake News: A Data Management and Mining Perspe...Combating Fake News: Combating Fake News: A Data Management and Mining Perspe...
Combating Fake News: Combating Fake News: A Data Management and Mining Perspe...
 
SDM 2019 Keynote
SDM 2019 KeynoteSDM 2019 Keynote
SDM 2019 Keynote
 
Big-O(Q) VLDB 2015 Keynote: Social Network Analytics: Beyond the Obvious
Big-O(Q) VLDB 2015 Keynote: Social Network Analytics: Beyond the ObviousBig-O(Q) VLDB 2015 Keynote: Social Network Analytics: Beyond the Obvious
Big-O(Q) VLDB 2015 Keynote: Social Network Analytics: Beyond the Obvious
 
Big-O(Q) Social Network Analytics
Big-O(Q) Social Network AnalyticsBig-O(Q) Social Network Analytics
Big-O(Q) Social Network Analytics
 
Pro max icdm2012-slides
Pro max icdm2012-slidesPro max icdm2012-slides
Pro max icdm2012-slides
 

Dernier

Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 

Dernier (20)

Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 

Kdd12 tutorial-inf-part-ii

  • 1. Part I → Part II → Part III → Part IV → Part V Data and Software
  • 2. Part II: Outline  Types of datasets  Propagation of information “memes”  Propagation of other actions  Synthetic datasets  Software tools 2
  • 3. Contents of a dataset  Action traces ◦ Sometimes not obvious (e.g. gaining weight can be an action) ◦ Propagation explicitly / implicitly attributed  Social network ◦ Explicitly declared / Implicitly inferred ◦ Symmetrical / Non-symmetrical 3
  • 4. Data availability limits research  Often you have to pick two of these Includes Social Network Is Publicly Includes Available Action Traces 4
  • 5. Classification: according to availability  Proprietary, impossible or very hard to reproduce (e.g. shopping history in e- commerce) ◦ Increasingly being rejected in IR, DM communities  Proprietary, reproducible (e.g. web crawl of a sub-set of public websites)  Existing open dataset  New open dataset 5
  • 8. Microblogging data  Providers: Twitter, Identi.ca, Diaspora, etc. ◦ Directly or through data re-sellers  Actions: posting a message  Connections: explicitly declared, non- symmetrical  Propagations: explicitly linked (in principle), but implicitly linked (in practice) due to client implementations 8
  • 9. Extracting info. propagations  Idea: start from a large corpus and then extract information propagations ◦ Blogs, news articles, academic papers, generic web pages, etc. ◦ Simple in theory, extremely difficult in practice  Looking for citations doesn't work ◦ People on the web seldom attribute explicitly  Keywords and phrases ◦ Usually end up with a mixture of too broad (e.g. stylistic idioms) and/or too narrow (e.g. one specific copy of a news item) “topics” 9
  • 10. Using #hashtags and URLs  Twitter: #hashtags and URLs  With some exceptions ◦ #hashtags are too broad, ◦ URLs are too narrow  Let's propose two methods that can alleviate these problems ... 10
  • 11. Extracting info. propagations: Meme tracker  Public dataset: http://memetracker.org/  Tracks “mutated” key phrases in a document collection, example cluster: the fundamentals of our economy are strong the fundamentals of the economy are strong i promise you we will never put america in this position again we will clean up wall street the fundamentals of our economy are strong but these are very very difficult times and i promise you we will never put america in this position again but these are very very difficult times  No a-priori network exists. Inference methods are used. 11 [Leskovec et al. KDD 2009]
  • 12. Extracting info. propagations: Meme tracker 12 [Leskovec et al. KDD 2009]
  • 13. Extracting info. propagation: Trending topics  Method ◦ Look for “bursty” (spiky, trending) topics, represented e.g. as a collection of keywords ◦ Track the propagation of those topics  Rely on a proven method for burst detection ◦ The tricky part is not to detect the burst, but to represent it (e.g. as a query) e.g. Haiti earthquake tweets might not include “Haiti” or “earthquake” 13 [Mathioudakis and Koudas, SIGMOD 2010]
  • 14. Extracting information propagations: Other methods  Internet chain letters; look for copies online of petition letters 14 [Liben-Nowell and Kleinberg, PNAS 2008]
  • 15. Sampling issues  Issues with recall along information cascades ◦ e.g. twitter stream 1% sample gardenhose [Sadikov et al. WSDM 2011]
  • 16. Propagation of Other Actions 16
  • 17. Consuming media and products  Media consumption/appraisal platforms ◦ Examples: Flixter / Last.fm / GoodReads  Action: rating, watching, listening or reading a movie, a song, or a book  Connections: Explicit friendships ◦ Propagations: usually implicitly linked unless “recommend to a friend” feature is used and publicly available  Product recommendations ◦ Example: @cosme cosmetics recommendations 17
  • 18. @cosme recommendations 18 [Matsuo and Yamamoto, WWW 2009]
  • 19. Cross-provider data  One provides the network, the other the actions  MSN + Bing: Social network is MSN IM, actions are searches  YIM + YMovies 19 [Singla and Richardson, WWW 2008] [Goyal et al. CIKM 2008]
  • 20. Phone calls  Social networks are calls, actions are leaving the company (“churning”)  Some call datasets are available for academic labs (not for industrial ones) 20 [Dasgupta et al. EDBT 2008]
  • 21. Phone calls 21 [Senseable City Lab, MIT, 2009]
  • 22. Community membership  DBLP/Arnetminer ◦ Social network is co-authorship ◦ Action is publishing in a conference or publishing on a topic  Livejournal / Flickr ◦ Social network is friendship graph ◦ Action is joining a community/group  Bloglines ◦ Action is subscribing to a rss feed 22
  • 23. Other datasets  Flickr ◦ Explicit friendship, action is (1) favoring a photo or (2) using a tag  Digg/Reddit votes  Explicit friendship, action is vote-up 23
  • 24. Off-line datasets  Participation of women in 14 social activities over 9 months in US south (n=18)  Romantic network in a high school (n=288)  Medical records during 32 years (n=12,067)  Network only ◦ Zachary's Karate club ◦ Presumed acquaintances links between terrorist suspects (n=74, n=63 if main CC is used) 24
  • 25. “Chains of Affection” 25 [Bearman et al. Amer. Journal of Sociology 2004]
  • 26. “Chains of Affection” Probably not a future computer scientist  26
  • 27. Size proportional to BMI, yellow fill indicates obesity. Blue border=men, Red border=women 27 [Christakis and Fowler, New England Journal of Medicine 2007]
  • 29. Network data are widely available  Domains ◦ Online social networks: slashdot, epinions, … ◦ Communication: internet as, p2p, roads, … ◦ Collaboration: scientists, actors, jazz musicians, wikipedia editors, ... ◦ Citations: web graphs, academic publications, patents, … ◦ References: linked data in freebase/dbpedia, protein interactions, metabolic networks, ... 29
  • 30. Publishing your own datasets  Document every step of sampling, filtering, processing methodology  CC0 (public domain) data releases  Ad-hoc data releases: look at items in example agreements (duration, purpose, warranties, item deletion policies, etc.)  Privacy concerns  It may take some extra work, but remember that it is also in YOUR interest that your data is used by others 30
  • 32. Graph software Tools  Software ◦ SNAP [GPL] Gephi [GPL, gui] ◦ Pajek [Free for non-commercial use, Windows, gui ◦ Webgraph [GPL] Graphviz [GPL]  Graph generation, transformation, ◦ SNAP, Gephi, Pajek, Webgraph [compress], ...  Subgraphs: clustering, connected components, etc. Node metrics: centrality, local clustering coeff. ◦ SNAP, Gephi, Pajek  Graph visualization: Gephi, Pajek, Graphviz  Other: https://sites.google.com/site/ucinetsoftware/downloads 32
  • 33. Propagation software tools  SPINE software ◦ IC model ◦ Inference with given social network ◦ Sparsification of influence models  Internet network simulator  Ask authors, some software is known to be available on request 33
  • 35. Visualization 35 [Project Cascade by NYT Labs 2011]
  • 36. Key takeaways of part II  Data availability affects our research  Current alternatives are not good ◦ Results on proprietary data sources are not reproducible ◦ Synthetic information propagations might not be realistic  Software is not readily available  This is something to work on collectively! 36