Using Tags and Clustering to Identify Topic-specific Blogs
Conor Hayes, Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland
Paolo Avesani, Bruno Kessler Institute (ITC-IRST), Trento, Italy
Outline
Tag clouds
The Long Tail
Data
Clustering
Clustering: Tags vs. Content
Partitioning the tag space
Tag frequency distribution per cluster
A-bloggers
Intrablog similarity: A- vs. C-blogs
Similarity to centroid: A- vs. C-blogs
A-bloggers are
Relevance?
Verification 2: by Google
Similarity to pages from Google
Consistency over time?
Blogger Entropy
(Diagram labels: win t, win t+n)
$q$: number of clusters at win $t+n$ containing users from cluster $r$
$n_r^i$: number of users from cluster $r$ contained in cluster $i$ at win $t+n$
$n_r$: number of users from cluster $r$ available at win $t+n$
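The entropy formula on this slide did not survive extraction; only the symbol definitions did. As a minimal sketch, assuming the measure is the usual cluster entropy normalized by log q (an assumption, not confirmed by the slide), it can be computed from the later-window cluster assignments of the users who started in cluster r. The function name `blogger_entropy` and the input format are hypothetical.

```python
import math
from collections import Counter


def blogger_entropy(later_clusters):
    """Normalized entropy of how the users of one cluster r disperse.

    later_clusters: one cluster id per user from cluster r who is still
    available at window t+n (hypothetical input format).
    Assumed formula: E_r = -(1 / log q) * sum_i (n_r^i / n_r) * log(n_r^i / n_r)
    """
    n_r = len(later_clusters)          # users from cluster r available at win t+n
    if n_r == 0:
        raise ValueError("no users from cluster r are available at win t+n")
    counts = Counter(later_clusters)   # n_r^i for each cluster i at win t+n
    q = len(counts)                    # clusters at win t+n containing users from r
    if q == 1:
        return 0.0                     # all users stayed together: no dispersion
    h = -sum((n / n_r) * math.log(n / n_r) for n in counts.values())
    return h / math.log(q)             # scale into [0, 1]


# Four users from cluster r; three land in cluster "a", one in cluster "b".
example = blogger_entropy(["a", "a", "a", "b"])
```

Under this assumed formula, a value of 0 means every user from cluster r ends up in the same cluster at win t+n, and a value of 1 means they are spread evenly over the q clusters that contain them.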
Entropy: A-blogs vs. C-blogs
Example of A-blogs and C-blogs
Conclusions
Appendix: spam
