9th European Summer School in Information Retrieval September 4th, 2013
http://bit.ly/ESSIR13IRSocMedia
IR and Social Media
Arjen P. de Vries
arjen@acm.org
Centrum Wiskunde & Informatica
Delft University of Technology
Spinque B.V.
On slideshare,
IR = Investor Relations
Social Media
Noun
social media (plural only)
Interactive forms of media that allow users
to interact with and publish to each other,
generally by means of the Internet.
The early 21st century saw a huge increase in social
media thanks to the widespread availability of the
Internet.
http://www.webanalyticsworld.net/2010/11/history-of-social-media-infographic.html
Social Media
 “Social bookmarking” sites
 “User generated content”
 Images (Flickr) and videos (YouTube, Vimeo), but also
blogs
 Social network services
 Twitter, Facebook
Not just one beast!
ESSIR 2013 - IR and Social Media
IR and Social Media?
Red Hot Chili Peppers
“Rock group” in
author’s metadata...
Organisation in
groups may help
disambiguate
query!
More implicit
metadata...
Information Science
“Search for the fundamental knowledge
which will allow us to postulate and utilize
the most efficient combination of [human
and machine] resources”
 M.E. Senko. Information systems: records, relations, sets, entities,
and things. Information systems, 1(1):3–13, 1975.
Core Questions
 How to represent information?
 The information need and search requests
 The objects to be shown in response to an
information request
 How to match information
representations?
IR and Social Media
 Richer information representations!
Richer representations
 User profiles
 User name, full name, description, image,
homepage url, etc.
 Connections between users
 Networks of friends, followers, etc
 Comments/reactions
 Endorsing and sharing
Q: Is the Web ancient social media?
(C) 2008, The New York Times Company
Anchor text:
“continue reading”
Not a lot of info
to represent
the page…
A fan’s Hyves page:
Kyteman's HipHop Orchestra: www.kyteman.com
Ticket sales, Luxor Theater:
May 22 - Kyteman's Hiphop Orkest - www.kyteman.com
Kluun.nl:
Kyteman's site
Blog Rockin’ Beats:
The 21-year-old Kyteman
(trumpeter, composer and
producer Colin Benders)
worked for three years on
his debut:
the Hermit Sessions.
Jazzenzo:
...a performance by the popular
Kyteman’s Hiphop Orkest
‘Co-creation’
 Social Media:
 Consumer becomes a co-creator
 ‘Data consumption’ traces
 In essence: many new sources to play the
role of anchor text
 Tags and/or ratings
 Tweets
 Comments, reviews
Potential Benefits for IR
 Expand content representation
 Reduce the vocabulary gap(s) between
creators of content, indexers, and users
 More diverse views on the same content
Potential Benefits for IR
 Relevance depends on user context
 User task
 User knowledge
 Social media provide an opportunity to
make much better assumptions about
user context
 A specific user’s context
 The variety of user contexts that may exist
Maarten Clements, Arjen P. de Vries and Marcel J.T. Reinders.
The task dependent effect of tags and ratings on social media access.
TOIS 28, 4, article 21 (November 2010), 42 pages.
LibraryThing
 Items
 People
 Tags
 Ratings
See also: http://www.macle.nl/tud/LT/
Synonyms
Examples
 Humour
 Classic
Search with Random Walk
 Rank nodes by the estimated probability
that a random walk starting from the
(task-dependent) start nodes ends at
that node
 E.g., tag suggestion starts in a tag node;
personalized search in tag and user nodes
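The walk described above can be sketched as follows. This is a minimal illustration, not the implementation from the cited TOIS paper: the toy triples, the node prefixes, the `stay` self-transition probability and the number of steps are all assumptions made for this example. Users, items and tags become nodes of one undirected graph, and nodes are ranked by the probability mass they hold after a few steps.

```python
from collections import defaultdict

# Hypothetical toy annotations: (user, item, tag) triples.
TRIPLES = [
    ("alice", "java-concurrency", "java"),
    ("alice", "clean-code", "programming"),
    ("bob", "java-travel-guide", "java"),
    ("bob", "bali-on-a-budget", "travel"),
]


def build_graph(triples):
    """Undirected graph linking every user, item and tag that co-occur."""
    graph = defaultdict(set)
    for user, item, tag in triples:
        u, i, t = "user:" + user, "item:" + item, "tag:" + tag
        for a, b in ((u, i), (i, t), (u, t)):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def random_walk(graph, start_nodes, steps=2, stay=0.5):
    """Probability of being at each node after `steps` steps.

    The walk starts uniformly in `start_nodes`; at every step it stays
    put with probability `stay` (an assumed parameter) and otherwise
    moves to a uniformly chosen neighbour.
    """
    prob = {n: 1.0 / len(start_nodes) for n in start_nodes}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, p in prob.items():
            nxt[node] += stay * p
            share = (1.0 - stay) * p / len(graph[node])
            for neighbour in graph[node]:
                nxt[neighbour] += share
        prob = dict(nxt)
    return prob


def rank(prob, prefix):
    """Nodes of one type (e.g. "item:"), most probable first."""
    nodes = [n for n in prob if n.startswith(prefix)]
    return sorted(nodes, key=lambda n: prob[n], reverse=True)


graph = build_graph(TRIPLES)

# Tag suggestion: start in a tag node, rank the tag nodes.
print(rank(random_walk(graph, ["tag:java"]), "tag:"))

# Personalized search: start in the query tag and the target user.
print(rank(random_walk(graph, ["tag:java", "user:alice"]), "item:"))
print(rank(random_walk(graph, ["tag:java", "user:bob"]), "item:"))
```

In this toy graph the two "java" items are symmetric when the walk starts in the tag alone; adding the user as a second start node is what pulls that user's own reading of "java" to the top.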
Tagging Relationships
An item recommendation walk
Ratings
 Ratings may enhance the graph, or just
be used for evaluation
Personalized Search
 Assume a user who types a single tag as
query
Personalized Search
 A soft clustering effect smoothly relates
similar concepts before converging to the
background probability
 Homographs like “Java” are
disambiguated because the walk starts in
both the query tag and the target user
 So, content that matches the user’s
preference is more likely to be found first
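The "converging to the background probability" remark can be made concrete with a small sketch. It assumes a lazy random walk (with an assumed `stay` probability) on a hypothetical connected, undirected graph; for such a walk the limiting distribution is proportional to node degree, so after many steps the start nodes no longer matter. Only the early steps carry the query and user signal.

```python
# Hypothetical undirected graph; adjacency lists are symmetric.
GRAPH = {
    "tag:java": ["item:a", "item:b", "user:u1"],
    "item:a": ["tag:java", "user:u1"],
    "item:b": ["tag:java", "tag:island"],
    "user:u1": ["tag:java", "item:a"],
    "tag:island": ["item:b"],
}


def walk(graph, start, steps, stay=0.5):
    """Distribution over nodes after `steps` lazy random-walk steps."""
    prob = {n: 0.0 for n in graph}
    prob[start] = 1.0
    for _ in range(steps):
        nxt = {n: stay * p for n, p in prob.items()}
        for node, p in prob.items():
            for neighbour in graph[node]:
                nxt[neighbour] += (1.0 - stay) * p / len(graph[node])
        prob = nxt
    return prob


def background(graph):
    """Limiting distribution of the walk: proportional to node degree."""
    total = sum(len(nbs) for nbs in graph.values())
    return {n: len(nbs) / total for n, nbs in graph.items()}


def distance(p, q):
    """Total variation distance between two distributions."""
    return 0.5 * sum(abs(p[n] - q[n]) for n in p)


for steps in (1, 4, 16, 64):
    d = distance(walk(GRAPH, "tag:java", steps), background(GRAPH))
    print(steps, round(d, 4))
```

The printed distance shrinks as the number of steps grows, which is why a short walk is the interesting regime for search: it is long enough to reach related nodes, but short enough to still depend on where it started.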
Common System Designs
Analysis results
 Allowing all users to tag all available
content improves retrieval tasks
 Combining tags and ratings may improve
both search and recommendation tasks
Ternary relation lost!
 The UIT matrix represents a ternary
relation that is lost when creating the
three UI, IT and UT matrices
 Potentially a problem if tags express an opinion
about an item; e.g.,
 “poetry” can still describe the user,
independent of the item
 “awful” requires knowing which item the term
belongs to
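A small sketch of why the projection loses information, using made-up users, items and tags: two different sets of (user, item, tag) triples can produce exactly the same UI, IT and UT counts, so the three matrices cannot tell which item a user called "awful".

```python
from collections import Counter


def project(triples):
    """Pairwise UI, IT and UT count tables of a set of (user, item, tag) triples."""
    ui = Counter((u, i) for u, i, t in triples)
    it = Counter((i, t) for u, i, t in triples)
    ut = Counter((u, t) for u, i, t in triples)
    return ui, it, ut


# Hypothetical data: in A, ann calls book2 "awful"; in B, she calls book1 "awful".
A = [
    ("ann", "book1", "poetry"), ("ann", "book2", "awful"),
    ("ben", "book1", "awful"), ("ben", "book2", "poetry"),
]
B = [
    ("ann", "book1", "awful"), ("ann", "book2", "poetry"),
    ("ben", "book1", "poetry"), ("ben", "book2", "awful"),
]

print(set(A) == set(B))          # the ternary relations differ
print(project(A) == project(B))  # their pairwise projections do not
```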
Tags vs. rating
 Most tags do not deviate far from the
mean rating
 Only a few tags are strongly correlated with
opinion
 Note: poetry higher quality than chicklit
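One way to look at tags versus ratings is to compare each tag's mean rating with the overall mean. The sketch below does this on invented numbers; the (tag, rating) pairs are purely hypothetical and are not the LibraryThing data behind the slide.

```python
from collections import defaultdict

# Hypothetical (tag, rating) pairs: the rating a user gave to an item
# that the same user labelled with the tag.
OBSERVATIONS = [
    ("poetry", 4.5), ("poetry", 4.0),
    ("chicklit", 3.0), ("chicklit", 3.5),
    ("awful", 1.0), ("awful", 1.5),
    ("classic", 4.0), ("humour", 3.5),
]


def tag_deviation(observations):
    """Mean rating per tag minus the overall mean rating."""
    overall = sum(r for _, r in observations) / len(observations)
    by_tag = defaultdict(list)
    for tag, rating in observations:
        by_tag[tag].append(rating)
    return {tag: sum(rs) / len(rs) - overall for tag, rs in by_tag.items()}


deviation = tag_deviation(OBSERVATIONS)
for tag in sorted(deviation, key=lambda t: abs(deviation[t]), reverse=True):
    print(tag, round(deviation[tag], 3))
```

Tags with a deviation near zero say little about opinion; a tag such as "awful" in this toy data stands out because its mean sits far from the overall mean.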
Metadata
 Scientific articles have many types of
metadata associated:
 Abstract
 Author
 Booktitle
 Description
 Journal
 Tags
 Are all these types of metadata useful for
item recommendation?
Metadata
 According to Toine Bogers’ PhD thesis:
 Concatenate all metadata fields of the items in
a single user’s profile into one large text field,
and use an off-the-shelf IR model to match
the profile against the metadata of the items:
“Profile-centric Matching”
 Or, construct item profiles from the metadata that
all users assigned to that item, and apply an
item-based collaborative filtering approach:
“Item-based Hybrid Filtering”
 Author, description, tags, title, url, journal
and booktitle all contribute
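Profile-centric matching can be sketched in a few lines. This is an illustration under stated assumptions, not the setup from the thesis: TF-IDF with cosine similarity stands in for the "off-the-shelf IR model", and the item metadata below is invented.

```python
import math
from collections import Counter

# Hypothetical item metadata (field name -> text).
ITEMS = {
    "paper1": {"title": "Random walks for tag recommendation",
               "tags": "graph tagging", "author": "smith"},
    "paper2": {"title": "Language models for retrieval",
               "tags": "ir smoothing", "author": "jones"},
    "paper3": {"title": "Graph based collaborative filtering",
               "tags": "graph recommendation", "author": "lee"},
}


def tokens(fields):
    """Concatenate all metadata fields of one item into a bag of words."""
    return " ".join(fields.values()).lower().split()


def cosine(a, b):
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend(profile_item_ids, items):
    """Rank items outside the profile against the concatenated profile text."""
    docs = {i: Counter(tokens(f)) for i, f in items.items()}
    df = Counter(w for tf in docs.values() for w in tf)
    idf = {w: math.log(1 + len(docs) / d) for w, d in df.items()}

    def weigh(tf):
        return {w: c * idf[w] for w, c in tf.items()}

    profile = Counter()
    for i in profile_item_ids:  # one large text field for the user
        profile.update(docs[i])
    scores = {i: cosine(weigh(profile), weigh(tf))
              for i, tf in docs.items() if i not in profile_item_ids}
    return sorted(scores, key=scores.get, reverse=True)


print(recommend(["paper1"], ITEMS))
```

Restricting `tokens` to a subset of fields is a simple way to probe which metadata types contribute to the ranking.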
Finally: a recent case study
Artist Popularity?
 Let’s ask widely used social media music
platforms!
 I.e., query their APIs
Artist Popularity (1-3)
 Top-5 popular artists in dataset
 Jan 21 – Mar 21
 3-hourly timestamped popularity indices
Artist Popularity (?!)
The Black Keys
 Three Grammy awards received!
The Black Keys
 Web responds, while the service-based
popularity index is static
Implications
 An “artist popularity” index depends on
the platform and its user population
 Web-based popularity – estimated via a URL
shortener’s API – “reacts” to real-world
events
 Suitable as an academics’ search log
replacement?
 Q: What is the most useful popularity –
one that changes dynamically or one that
lasts?
Many topics I skipped…
Tweets about blip.tv
 “Twanchor text”
 E.g.: http://blip.tv/file/2168377
 Amazing
 Watching “World’s most realistic 3D city
models?”
 Google Earth/Maps killer
 Ludvig Emgard shows how maps/satellite pics
on web is done (learn Google and MS!)
 and ~120 more Tweets
Wikipedia
 Wikipedia contains semantically very rich
annotations:
 Wikipedia Categories
 Wikipedia Lists
 Times (1930, 1931, 1932, etc. etc.)
 Names
 Disambiguation pages
 Etc.
 Note: DBpedia is just Wikipedia :-)
Wikipedia
 People have used Wikipedia edit history to
look for events
Geotags / POIs
 Many social media items carry explicit geo
information
 Geotags are low-level “coordinates”
 POIs are high-level “point-of-interest” labels
 Applications
 Recommend geo-locations to people
 Predict POI tags from (tweet) text
 Predict where a user will go next
Map text to locations
 Build a language model from all tags
assigned to flickr images that belong to a
predefined grid cell
 Neighbouring cells used for smoothing
(like the hierarchical language models used
previously for video / scene / shot)
 User frequency of a term in a location
(instead of term frequency)
Neil O’Hare and Vanessa Murdock
Modeling Locations with Social Media
Information Retrieval, February 2013, Volume 16, Issue 1, pp 30-62
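The three ingredients listed above (a model per grid cell, smoothing with neighbouring cells, user frequency instead of term frequency) can be sketched as follows. This is a rough illustration, not the model from the cited paper: the photos, the interpolation weights `lam` and `mu`, and the uniform background term are assumptions, and the interpolated score is not renormalised when a cell has no populated neighbours.

```python
import math
from collections import defaultdict

# Hypothetical geotagged photos: (user, (row, col) grid cell, tags).
PHOTOS = [
    ("u1", (0, 0), ["acropolis", "athens"]),
    ("u1", (0, 0), ["acropolis", "athens"]),  # same user: counted once per term
    ("u2", (0, 0), ["parthenon", "athens"]),
    ("u3", (0, 1), ["piraeus", "harbour"]),
    ("u4", (5, 5), ["athens", "ohio", "university"]),
]


def user_frequencies(photos):
    """cell -> term -> number of distinct users who used the term there."""
    users = defaultdict(lambda: defaultdict(set))
    for user, cell, tags in photos:
        for tag in tags:
            users[cell][tag].add(user)
    return {cell: {t: len(us) for t, us in terms.items()}
            for cell, terms in users.items()}


def neighbours(cell):
    r, c = cell
    return [(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)]


def term_score(term, cell, uf, vocab_size, lam=0.7, mu=0.2):
    """Interpolate the cell model, the neighbour model and a uniform background."""
    def relative(counts):
        total = sum(counts.values())
        return counts.get(term, 0) / total if total else 0.0

    around = defaultdict(int)
    for n in neighbours(cell):
        for t, k in uf.get(n, {}).items():
            around[t] += k
    return (lam * relative(uf.get(cell, {})) + mu * relative(around)
            + (1.0 - lam - mu) / vocab_size)


def place(tags, uf):
    """Grid cell whose smoothed model gives the tags the highest log score."""
    vocab = {t for terms in uf.values() for t in terms}

    def log_score(cell):
        return sum(math.log(term_score(t, cell, uf, len(vocab))) for t in tags)

    return max(uf, key=log_score)


uf = user_frequencies(PHOTOS)
print(place(["athens", "acropolis"], uf))
print(place(["athens", "ohio"], uf))
```

Counting users rather than raw tag occurrences keeps one prolific uploader from dominating a cell's model.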
Placing Images: Easy
http://www.flickr.com/photos/63666148@N00/3615989115/
Athens, Ohio or Athens, Greece?
Placing Images: Hard
Ballooning company
in Ottawa
Searching the Social Graph
 Search entities, and the relationships
between them, in the (facebook) social
graph
 Clearly IR problems, but who has the data
to work with?
Michael Curtiss et al.
Unicorn: A System for Searching the Social Graph
PVLDB, Vol. 6, No. 11
Crawling
 How to get “the” data?
 Rate limited APIs
 ToS
HEADACHES!
Fred Morstatter, Jürgen Pfeffer, Huan Liu and Kathleen M. Carley
Is the Sample Good Enough? Comparing Data from Twitter’s Streaming
API with Twitter’s Firehose
ICWSM 2013
Not IR yet, but…
Interesting stuff nevertheless!
de Volkskrant, March 13, 2013
Michal Kosinski, David Stillwell, and Thore Graepel
Private traits and attributes are predictable from digital records of
human behavior
PNAS 2013 ; published ahead of print March 11, 2013,
doi:10.1073/pnas.1218772110
Take home message(s)
 Social media give us IR researchers
access to a rich resource of context
 Including time & location!
 Gather the right data for your problem
domain, and it may be a good substitute
for the click data we all want
so badly
 Various recommendation and retrieval
tasks exist in social media – can one
theory address all of these?
C U @ #ECIR2014 ? !

 
Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...
 
CAULIFLOWER BREEDING 1 Parmar pptx
CAULIFLOWER BREEDING 1 Parmar pptxCAULIFLOWER BREEDING 1 Parmar pptx
CAULIFLOWER BREEDING 1 Parmar pptx
 
Maximizing Impact_ Nonprofit Website Planning, Budgeting, and Design.pdf
Maximizing Impact_ Nonprofit Website Planning, Budgeting, and Design.pdfMaximizing Impact_ Nonprofit Website Planning, Budgeting, and Design.pdf
Maximizing Impact_ Nonprofit Website Planning, Budgeting, and Design.pdf
 
Education and training program in the hospital APR.pptx
Education and training program in the hospital APR.pptxEducation and training program in the hospital APR.pptx
Education and training program in the hospital APR.pptx
 
How to Use api.constrains ( ) in Odoo 17
How to Use api.constrains ( ) in Odoo 17How to Use api.constrains ( ) in Odoo 17
How to Use api.constrains ( ) in Odoo 17
 
General views of Histopathology and step
General views of Histopathology and stepGeneral views of Histopathology and step
General views of Histopathology and step
 
Easter in the USA presentation by Chloe.
Easter in the USA presentation by Chloe.Easter in the USA presentation by Chloe.
Easter in the USA presentation by Chloe.
 
Human-AI Co-Creation of Worked Examples for Programming Classes
Human-AI Co-Creation of Worked Examples for Programming ClassesHuman-AI Co-Creation of Worked Examples for Programming Classes
Human-AI Co-Creation of Worked Examples for Programming Classes
 
Prescribed medication order and communication skills.pptx
Prescribed medication order and communication skills.pptxPrescribed medication order and communication skills.pptx
Prescribed medication order and communication skills.pptx
 
Clinical Pharmacy Introduction to Clinical Pharmacy, Concept of clinical pptx
Clinical Pharmacy  Introduction to Clinical Pharmacy, Concept of clinical pptxClinical Pharmacy  Introduction to Clinical Pharmacy, Concept of clinical pptx
Clinical Pharmacy Introduction to Clinical Pharmacy, Concept of clinical pptx
 
Ultra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptxUltra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptx
 

ESSIR 2013 - IR and Social Media

  • 1. 9th European Summer School in Information Retrieval September 4th, 2013 http://bit.ly/ESSIR13IRSocMedia IR and Social Media Arjen P. de Vries arjen@acm.org Centrum Wiskunde & Informatica Delft University of Technology Spinque B.V.
  • 2. On slideshare, IR = Investor Relations
  • 3. Social Media Noun social media (plural only) Interactive forms of media that allow users to interact with and publish to each other, generally by means of the Internet. The early 21st century saw a huge increase in social media thanks to the widespread availability of the Internet.
  • 5. Social Media  “Social bookmarking” sites  “User generated content”  Images (flickr) and videos (youtube, vimeo), but also blogs  Social network services  Twitter, facebook
  • 6. Not just one beast!
  • 9. IR and Social Media?
  • 10. Red Hot Chili Peppers
  • 11. “Rock group” in author’s metadata... Organisation in groups may help disambiguate query! More implicit metadata...
  • 12. Information Science “Search for the fundamental knowledge which will allow us to postulate and utilize the most efficient combination of [human and machine] resources”  M.E. Senko. Information systems: records, relations, sets, entities, and things. Information systems, 1(1):3–13, 1975.
  • 13. Core Questions  How to represent information?  The information need and search requests  The objects to be shown in response to an information request  How to match information representations?
  • 14. IR and Social Media  Richer information representations!
  • 15. Richer representations  User profiles  User name, full name, description, image, homepage url, etc.  Connections between users  Networks of friends, followers, etc  Comments/reactions  Endorsing and sharing
  • 16. Q: Web ancient social media?
  • 17. (C) 2008, The New York Times Company Anchor text: “continue reading”
  • 18. Not a lot of info to represent the page… A fan’s Hyves page: Kyteman's HipHop Orchestra: www.kyteman.com; Ticket sales Luxor theater: May 22 - Kyteman's hiphop Orkest - www.kyteman.com; Kluun.nl: Kyteman’s site; Blog Rockin’ Beats: The 21-year-old Kyteman (trumpeter, composer and producer Colin Benders) worked for three years on his debut: the Hermit sessions; Jazzenzo: ...a performance by the popular Kyteman’s Hiphop Orkest
  • 20. ‘Co-creation’  Social Media:  Consumer becomes a co-creator  ‘Data consumption’ traces  In essence: many new sources to play the role of anchor text  Tags and/or ratings  Tweets  Comments, reviews
  • 21. Potential Benefits for IR  Expand content representation  Reduce the vocabulary gap(s) between creators of content, indexers, and users  More diverse views on the same content
  • 23. Potential Benefits for IR  Relevance depends on user context  User task  User knowledge  Social media provide an opportunity to make much better assumptions about user context  A specific user’s context  The variety of user contexts that may exist
  • 24. Maarten Clements, Arjen P. de Vries and Marcel J.T. Reinders. The task dependent effect of tags and ratings on social media access. TOIS 28, 4, article 21 (November 2010), 42 pages.
  • 26. LibraryThing  Items  People  Tags  Ratings See also: http://www.macle.nl/tud/LT/
  • 33. Search with Random Walk  Present nodes according to estimated probability that a random walk that starts from (task dependent) starting nodes, would end at this node  E.g., tag suggestion starts in a tag node; personalized search in tag and user nodes
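
The random-walk ranking on the slide above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy graph, the node names, the uniform transition probabilities, and the walk length of three steps are all assumptions made for the example.

```python
from collections import defaultdict


def random_walk_scores(edges, start_nodes, steps=3):
    """Probability of being at each node after `steps` uniform random steps."""
    # Build an undirected adjacency list from the edge list.
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Spread the initial probability mass evenly over the starting nodes.
    probs = {node: 1.0 / len(start_nodes) for node in start_nodes}

    for _ in range(steps):
        nxt = defaultdict(float)
        for node, mass in probs.items():
            # Move to a uniformly chosen neighbour of the current node.
            for nb in neighbours[node]:
                nxt[nb] += mass / len(neighbours[node])
        probs = dict(nxt)
    return probs


# Hypothetical toy graph: users and tags are both linked to items.
edges = [
    ("user:alice", "item:java_programming"),
    ("user:bob", "item:java_travel_guide"),
    ("tag:java", "item:java_programming"),
    ("tag:java", "item:java_travel_guide"),
    ("tag:programming", "item:java_programming"),
    ("tag:travel", "item:java_travel_guide"),
]

# Personalized search: start in both the query tag and the target user.
scores = random_walk_scores(edges, ["tag:java", "user:alice"], steps=3)
items = sorted(
    (n for n in scores if n.startswith("item:")),
    key=scores.get,
    reverse=True,
)

# Non-personalized variant: start in the query tag only.
tag_only = random_walk_scores(edges, ["tag:java"], steps=3)
```

In this toy graph, starting from the tag alone leaves the two "java" items with equal probability, while adding the user node as a second starting point moves the programming book to the top of `items`.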
  • 37. Ratings  Ratings may enhance the graph, or just be used for evaluation
  • 38. Personalized Search  Assume a user who types a single tag as query
  • 40.  A soft clustering effect smoothly relates similar concepts before converging to the background probability
  • 41.  Homographs like “Java” are disambiguated because the walk starts in both the query tag and the target user  So, content that matches the user’s preference is more likely to be found first
  • 43. Analysis results  Allowing all users to tag all available content improves retrieval tasks  Combining tags and ratings may improve both search and recommendation tasks
  • 45. Ternary relation lost!  The UIT matrix represents a ternary relation, which is lost when creating the three UI, IT and UT matrices  Potentially a problem if tags express an opinion about an item; e.g.,  “poetry” can still describe the user, independent of the item  “awful” requires knowing which item the term belongs to
  • 47. Tags vs. rating  Most tags do not deviate far from the mean rating  Only a few tags are strongly correlated with opinion  Note: poetry is higher quality than chicklit
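
A small sketch of why the pairwise matrices lose information. The three (user, item, tag) assignments are hypothetical; the code projects them onto the UI, IT and UT relations and then checks which triples are consistent with all three projections.

```python
# Hypothetical (user, item, tag) assignments: the ternary UIT relation.
uit = {
    ("alice", "book_a", "poetry"),
    ("alice", "book_b", "awful"),
    ("bob", "book_b", "poetry"),
}

# Project the ternary relation onto its three pairwise relations.
ui = {(u, i) for u, i, t in uit}
it = {(i, t) for u, i, t in uit}
ut = {(u, t) for u, i, t in uit}

# Re-join: every triple that is consistent with all three projections.
rejoined = {
    (u, i, t)
    for (u, i) in ui
    for (i2, t) in it
    if i == i2 and (u, t) in ut
}

# Triples the projections allow but that were never actually assigned.
spurious = rejoined - uit
print(spurious)
```

The re-joined relation contains the extra triple (alice, book_b, poetry): from the pairwise data alone one cannot tell whether alice called book_b "poetry" or "awful", which matters exactly when the tag expresses an opinion.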
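
One way to inspect how far a tag deviates from the mean rating is sketched below. The ratings and tag assignments are made-up illustration data, and the simple "mean rating of tagged items minus global mean" measure is an assumption, not necessarily the analysis behind the slide.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ratings, keyed by (user, item).
ratings = {
    ("alice", "book_a"): 5,
    ("alice", "book_b"): 1,
    ("bob", "book_a"): 4,
    ("bob", "book_c"): 3,
}

# Hypothetical tag assignments as (user, item, tag) triples.
tags = [
    ("alice", "book_a", "poetry"),
    ("alice", "book_b", "awful"),
    ("bob", "book_a", "poetry"),
    ("bob", "book_c", "fiction"),
]

global_mean = mean(ratings.values())

# Collect the rating the same user gave to the item they tagged.
by_tag = defaultdict(list)
for user, item, tag in tags:
    if (user, item) in ratings:
        by_tag[tag].append(ratings[(user, item)])

# Deviation of each tag's mean rating from the global mean rating.
deviation = {tag: mean(vals) - global_mean for tag, vals in by_tag.items()}
most_negative = min(deviation, key=deviation.get)
```

Note that this lookup needs the full (user, item, tag) triple, so it only works on the ternary relation, not on the separate pairwise matrices.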
  • 48. Metadata  Scientific articles have many types of metadata associated:  Abstract  Author  Booktitle  Description  Journal  Tags  Are all these types of metadata useful for item recommendation?
  • 49. Metadata  According to Toine Bogers’ PhD thesis:  Concatenate all metadata fields of the items in a single user’s profile into one large text field, and use an off-the-shelf IR model to match the profile against the metadata of the items: “Profile-centric Matching”  Or, construct item profiles from the metadata of all users for that item, and apply an item-based collaborative filtering approach: “Item-based Hybrid Filtering”  Author, description, tags, title, url, journal and booktitle all contribute
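
A minimal sketch of the profile-centric idea, under stated assumptions: the item metadata is invented, and a plain TF-IDF cosine score stands in for the "off-the-shelf IR model"; the thesis may well use a different retrieval model and field weighting.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def tfidf(tokens, idf):
    """Weight raw term counts by inverse document frequency."""
    counts = Counter(tokens)
    return {t: c * idf.get(t, 0.0) for t, c in counts.items()}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


# Hypothetical item metadata; only two fields are used for brevity.
items = {
    "paper1": {"title": "language models for retrieval", "tags": "ir ranking"},
    "paper2": {"title": "collaborative filtering with tags", "tags": "recommender tags"},
    "paper3": {"title": "protein folding simulation", "tags": "biology"},
    "paper4": {"title": "ranking with language models", "tags": "ir"},
}
user_items = ["paper1"]  # items already in the user's profile

# One concatenated text field per item.
docs = {name: tokenize(" ".join(fields.values())) for name, fields in items.items()}

# Inverse document frequency over the item collection.
n_docs = len(docs)
df = Counter(t for toks in docs.values() for t in set(toks))
idf = {t: math.log(1 + n_docs / d) for t, d in df.items()}

# The user profile: all metadata of the user's items, concatenated.
profile = tfidf([t for name in user_items for t in docs[name]], idf)

# Rank the unseen items by similarity to the profile.
ranking = sorted(
    (name for name in docs if name not in user_items),
    key=lambda name: cosine(profile, tfidf(docs[name], idf)),
    reverse=True,
)
```

Here the profile acts as a long query, and `ranking` lists the candidate items for recommendation, most similar first.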
  • 50. Finally: a recent case study
  • 51. Artist Popularity?  Let’s ask widely used social media music platforms!  I.e., query their APIs
  • 53. Artist Popularity (1-3)  Top-5 popular artists in dataset  Jan 21 – Mar 21  3 hourly timestamped popularity indices
  • 56. Artist Popularity (?!)  Top-5 popular artists in dataset  Jan 21 – Mar 21  3 hourly timestamped popularity indices
  • 58. The Black Keys  Three grammy awards received!
  • 59. The Black Keys  Web responds, while service based popularity index is static
  • 61. Implications  An “artist popularity” index depends on the platform and its user population  Web based popularity – estimated via URL shortener’s API – “reacts” to real-world events  Suitable as an academics’ search log replacement?  Q: What is the most useful popularity – one that changes dynamically or one that lasts?
  • 63. Many topics I skipped…
  • 65. Tweets about blip.tv  “Twanchor text”  E.g.: http://blip.tv/file/2168377  Amazing  Watching “World’s most realistic 3D city models?”  Google Earth/Maps killer  Ludvig Emgard shows how maps/satellite pics on web is done (learn Google and MS!)  and ~120 more Tweets
  • 66. Wikipedia  Wikipedia contains semantically very rich annotations:  Wikipedia Categories  Wikipedia Lists  Times (1930, 1931, 1932, etc. etc.)  Names  Disambiguation pages Etc.  Note: DBPedia is just Wikipedia 
  • 67. Wikipedia  People have used Wikipedia edit history to look for events
  • 68. Geotags / POIs  Many social media items carry explicit geo information  Geotags are low-level “coordinates”  POIs are high-level “point-of-interest” labels  Applications  Recommend geo-locations to people  Predict POI tags from (tweet) text  Predict where a user will go next
  • 69. Map text to locations  Build a language model from all tags assigned to flickr images that belong to a predefined grid cell  Neighbouring cells used for smoothing (like hierarchic language models used previously for video / scene / shot)  User frequency of a term in a location (instead of term frequency) Neil O’Hare and Vanessa Murdock. Modeling Locations with Social Media. Information Retrieval, February 2013, Volume 16, Issue 1, pp 30-62
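
The grid-cell language model can be sketched roughly as below. Everything concrete here is an assumption for illustration: the photo records, the one-degree grid, and the fixed interpolation weights `lam` and `mu`; the cited paper's actual estimation and smoothing may differ.

```python
import math
from collections import defaultdict

# Hypothetical photos: (user, latitude, longitude, tags).
photos = [
    ("u1", 52.37, 4.90, ["canal", "bike"]),
    ("u1", 52.37, 4.91, ["canal"]),
    ("u2", 52.36, 4.89, ["canal", "museum"]),
    ("u3", 48.86, 2.35, ["tower", "museum"]),
    ("u4", 48.85, 2.29, ["tower"]),
]


def cell_of(lat, lon, size=1.0):
    """Map coordinates to a grid cell of `size` degrees (assumed grid)."""
    return (math.floor(lat / size), math.floor(lon / size))


# User frequency: distinct users per (cell, tag), not raw tag occurrences.
users = defaultdict(set)
for user, lat, lon, tags in photos:
    for tag in tags:
        users[(cell_of(lat, lon), tag)].add(user)

counts = defaultdict(dict)
for (cell, tag), tag_users in users.items():
    counts[cell][tag] = len(tag_users)

vocab = {tag for cell_tags in counts.values() for tag in cell_tags}


def neighbours(cell):
    r, c = cell
    return [(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def estimate(tag, cells):
    """Maximum-likelihood estimate of the tag over a group of cells."""
    known = [c for c in cells if c in counts]
    total = sum(sum(counts[c].values()) for c in known)
    hits = sum(counts[c].get(tag, 0) for c in known)
    return hits / total if total else 0.0


def prob(tag, cell, lam=0.8, mu=0.1):
    """P(tag | cell), interpolating own cell, neighbouring cells, and all cells."""
    own = estimate(tag, [cell])
    near = estimate(tag, neighbours(cell))
    background = estimate(tag, list(counts))
    return lam * own + mu * near + (1 - lam - mu) * background


def place(tags):
    """Return the grid cell whose language model best explains the tags."""
    known = [t for t in tags if t in vocab]  # skip tags never seen in training
    return max(counts, key=lambda cell: sum(math.log(prob(t, cell)) for t in known))


amsterdam_cell = cell_of(52.37, 4.90)
paris_cell = cell_of(48.86, 2.35)
```

Calling `place(["canal"])` returns the cell containing the Amsterdam photos, and `place(["tower", "museum"])` the Paris cell. User u1 tagged "canal" twice but is counted once, which is the point of using user frequency rather than term frequency.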
  • 71. Placing Images: Hard Ballooning company in Ottawa
  • 72. Searching the Social Graph  Search entities, and the relationships between them, in the (facebook) social graph  Clearly IR problems, but who has the data to work with? Michael Curtiss et al. Unicorn: A System for Searching the Social Graph. PVLDB, Vol. 6, No. 11
  • 73. Crawling  How to get “the” data?  Rate limited APIs  ToS HEADACHES!
  • 74. Fred Morstatter, Jürgen Pfeffer, Huan Liu and Kathleen M. Carley. Is the Sample Good Enough? Comparing Data from Twitter’s Streaming API with Twitter’s Firehose. ICWSM 2013
  • 75. Not IR yet, but… Interesting stuff nevertheless! de Volkskrant, March 13, 2013. Michal Kosinski, David Stillwell, and Thore Graepel. Private traits and attributes are predictable from digital records of human behavior. PNAS 2013; published ahead of print March 11, 2013, doi:10.1073/pnas.1218772110
  • 79. Take home message(s)  Social media give us IR researchers access to a rich resource of context  Including time & location!  Gather the right data for your problem domain, and it may be a good alternative for not having the click data we all want so badly  Various recommendation and retrieval tasks exist in social media – can one theory address all of these?
  • 80. C U @ #ECIR2014 ? !