SlideShare une entreprise Scribd logo
1  sur  26
at io n
In fo rm          l
          ri ev a
     R et
            en ges
   C h a ll           Bruno Pedro
                        March 2010
Bruno Pedro
A n e x p e r i e n c e d We b d e v e l o p e r a n d
entrepreneur. Has extensive background in
large scale projects and technical writing.

http://tarpipe.com/user/bpedro
What is tarpipe?




       User
What is tarpipe?
3 Challenges
• Real-Time Retrieval
• Understanding Context
• Inferring Identify
Real-Time Retrieval




             http://www.flickr.com/photos/josephrobertson/127758523/
WordPress


                     source: wordpress.com




• Average ~10K posts/hour
• ~3 new posts every second
twitter


                      source: mashable.com




• Average ~1.1M tweets/hour
• ~300 new tweets every second
Challenge
• ~300 reads/second
• 160 X 300 = 48 KB/second = 4 GB/day
  (approximate calculation)




• How to process all this information?
Strategy
• Read and store immediately:
 • High performance write storage
 • No locks allowed
 • Prepare for lots of reading errors
Strategy
• Process later:
 • Regular expressions
 • Term extraction
 • Machine learning
Context




     For me context is the key - from
     that comes the understanding of
     everything. — Kenneth Noland
source: joelonsoftware.com




Dogs?
source: Google Reader Play




Dogs with unmatched title?
source: Google Buzz




Still doesn’t make a lot of sense...
This is the
worst case
scenario
Challenge
• Find context from associated content:
 • Pictures
 • Comments
 • Location information
 • Timelines
 • Authors
Strategy
• Associate content through common
  identifiers
• Establish timeline of different pieces
• Group pieces by same author
• Present in a comprehensible fashion
Identity




           source: abc Australia
Many Identifiers
• E-mail: user@example.com
• facebook: @User Name
• flickr: user or User Name (?)
• Google Buzz: @user@example.com
• twitter: @user
 ...
Addressable
• http://facebook.com/user
• http://flickr.com/user
• http://www.google.com/profiles/user
• http://twitter.com/user
  ...
How to make sense?
Challenge
• Parse every message, tweet or post
• Find possible user identifiers
• Substitute for meaningful information:
 • A link to the original profile
 • Equivalent identity on destination
Strategy
• Decentralized processing:
 • Browser based (plugin)
 • Extract identities from page
 • Process
 • Replace with meaningful information
Food for thought
• PubsubHubbub
  http://code.google.com/p/pubsubhubbub/

• Activity Streams
  http://activitystrea.ms/
• Web Finger
  http://webfinger.org/
tarpipe streamlines your     tarpipe is one of the most   Today I had a chance to
updates to various social    curious experiments in       spend time experimenting
web sites, creating simple   social media that I've       with tarpipe and I have to
or complex workflows to       seen lately. The service     say that I am intrigued by
update several buckets in    has the potential to be      the concept and impressed
one fell swoop.              the answer to the lament     by the implementation.
                             I first talked about in The
Adam Pash                    looming crisis: Personal     Jeff Barr
lifehacker                   syndication overload.        Amazon.com

                             Rafe Needleman
                             CNET news




thank you                                                 share your life

Contenu connexe

Similaire à Information Retrieval Challenges

Triple your blog post frequency
Triple your blog post frequencyTriple your blog post frequency
Triple your blog post frequencyAndraz Tori
 
Introduction to Text Mining
Introduction to Text MiningIntroduction to Text Mining
Introduction to Text MiningMinha Hwang
 
The Web Application Hackers Toolchain
The Web Application Hackers ToolchainThe Web Application Hackers Toolchain
The Web Application Hackers Toolchainjasonhaddix
 
Big Data Analytics course: Named Entities and Deep Learning for NLP
Big Data Analytics course: Named Entities and Deep Learning for NLPBig Data Analytics course: Named Entities and Deep Learning for NLP
Big Data Analytics course: Named Entities and Deep Learning for NLPChristian Morbidoni
 
Blogging for a better classroom
Blogging for a better classroomBlogging for a better classroom
Blogging for a better classroomVicki Davis
 
Technical Communication for Unity Developers
Technical Communication for Unity DevelopersTechnical Communication for Unity Developers
Technical Communication for Unity DevelopersUnity Technologies
 
Real-World Challenges of Real-Time Social Analytics
Real-World Challenges of Real-Time Social AnalyticsReal-World Challenges of Real-Time Social Analytics
Real-World Challenges of Real-Time Social AnalyticsAttensity
 
Write a better FM
Write a better FMWrite a better FM
Write a better FMRich Bowen
 
Energizing PowerPoint
Energizing PowerPointEnergizing PowerPoint
Energizing PowerPointLaDonna Coy
 
Ubiquitous Angels; ambient sensor networks to crowd source crisis response an...
Ubiquitous Angels; ambient sensor networks to crowd source crisis response an...Ubiquitous Angels; ambient sensor networks to crowd source crisis response an...
Ubiquitous Angels; ambient sensor networks to crowd source crisis response an...Anselm Hook
 
Nguyen phuong truong anh a story of bug bounty hunter
Nguyen phuong truong anh   a story of bug bounty hunterNguyen phuong truong anh   a story of bug bounty hunter
Nguyen phuong truong anh a story of bug bounty hunterSecurity Bootcamp
 
The Art of APPlication: Using Apps to Engage Students as Collaborators, Creat...
The Art of APPlication: Using Apps to Engage Students as Collaborators, Creat...The Art of APPlication: Using Apps to Engage Students as Collaborators, Creat...
The Art of APPlication: Using Apps to Engage Students as Collaborators, Creat...sewilkie
 
Social Media Academy 2016 Presentation Slides
Social Media Academy 2016 Presentation SlidesSocial Media Academy 2016 Presentation Slides
Social Media Academy 2016 Presentation SlidesHarvardComms
 
Code Quality, Standards and Best Practices, Discuss
Code Quality, Standards and Best Practices, DiscussCode Quality, Standards and Best Practices, Discuss
Code Quality, Standards and Best Practices, DiscussJapheth Thomson
 
Keith J. Jones, Ph.D. - Crash Course malware analysis
Keith J. Jones, Ph.D. - Crash Course malware analysisKeith J. Jones, Ph.D. - Crash Course malware analysis
Keith J. Jones, Ph.D. - Crash Course malware analysisKeith Jones, PhD
 
项亮 推荐系统实践 从入门到精通
项亮 推荐系统实践 从入门到精通 项亮 推荐系统实践 从入门到精通
项亮 推荐系统实践 从入门到精通 topgeek
 

Similaire à Information Retrieval Challenges (20)

From OSINT to Phishing presentation
From OSINT to Phishing presentationFrom OSINT to Phishing presentation
From OSINT to Phishing presentation
 
Doonish
DoonishDoonish
Doonish
 
Triple your blog post frequency
Triple your blog post frequencyTriple your blog post frequency
Triple your blog post frequency
 
Introduction to Text Mining
Introduction to Text MiningIntroduction to Text Mining
Introduction to Text Mining
 
The Web Application Hackers Toolchain
The Web Application Hackers ToolchainThe Web Application Hackers Toolchain
The Web Application Hackers Toolchain
 
Big Data Analytics course: Named Entities and Deep Learning for NLP
Big Data Analytics course: Named Entities and Deep Learning for NLPBig Data Analytics course: Named Entities and Deep Learning for NLP
Big Data Analytics course: Named Entities and Deep Learning for NLP
 
Blogging for a better classroom
Blogging for a better classroomBlogging for a better classroom
Blogging for a better classroom
 
Fighting Spam at Flickr
Fighting Spam at FlickrFighting Spam at Flickr
Fighting Spam at Flickr
 
Technical Communication for Unity Developers
Technical Communication for Unity DevelopersTechnical Communication for Unity Developers
Technical Communication for Unity Developers
 
Real-World Challenges of Real-Time Social Analytics
Real-World Challenges of Real-Time Social AnalyticsReal-World Challenges of Real-Time Social Analytics
Real-World Challenges of Real-Time Social Analytics
 
Write a better FM
Write a better FMWrite a better FM
Write a better FM
 
Energizing PowerPoint
Energizing PowerPointEnergizing PowerPoint
Energizing PowerPoint
 
Ideation,demos
Ideation,demosIdeation,demos
Ideation,demos
 
Ubiquitous Angels; ambient sensor networks to crowd source crisis response an...
Ubiquitous Angels; ambient sensor networks to crowd source crisis response an...Ubiquitous Angels; ambient sensor networks to crowd source crisis response an...
Ubiquitous Angels; ambient sensor networks to crowd source crisis response an...
 
Nguyen phuong truong anh a story of bug bounty hunter
Nguyen phuong truong anh   a story of bug bounty hunterNguyen phuong truong anh   a story of bug bounty hunter
Nguyen phuong truong anh a story of bug bounty hunter
 
The Art of APPlication: Using Apps to Engage Students as Collaborators, Creat...
The Art of APPlication: Using Apps to Engage Students as Collaborators, Creat...The Art of APPlication: Using Apps to Engage Students as Collaborators, Creat...
The Art of APPlication: Using Apps to Engage Students as Collaborators, Creat...
 
Social Media Academy 2016 Presentation Slides
Social Media Academy 2016 Presentation SlidesSocial Media Academy 2016 Presentation Slides
Social Media Academy 2016 Presentation Slides
 
Code Quality, Standards and Best Practices, Discuss
Code Quality, Standards and Best Practices, DiscussCode Quality, Standards and Best Practices, Discuss
Code Quality, Standards and Best Practices, Discuss
 
Keith J. Jones, Ph.D. - Crash Course malware analysis
Keith J. Jones, Ph.D. - Crash Course malware analysisKeith J. Jones, Ph.D. - Crash Course malware analysis
Keith J. Jones, Ph.D. - Crash Course malware analysis
 
项亮 推荐系统实践 从入门到精通
项亮 推荐系统实践 从入门到精通 项亮 推荐系统实践 从入门到精通
项亮 推荐系统实践 从入门到精通
 

Plus de Bruno Pedro

What are Web APIs
What are Web APIsWhat are Web APIs
What are Web APIsBruno Pedro
 
Growing your business with an API
Growing your business with an APIGrowing your business with an API
Growing your business with an APIBruno Pedro
 
Product growth with an API
Product growth with an APIProduct growth with an API
Product growth with an APIBruno Pedro
 
How to grow your business with an API
How to grow your business with an APIHow to grow your business with an API
How to grow your business with an APIBruno Pedro
 
APIs Love to Chat
APIs Love to ChatAPIs Love to Chat
APIs Love to ChatBruno Pedro
 
How to Automate API Testing
How to Automate API TestingHow to Automate API Testing
How to Automate API TestingBruno Pedro
 
Asynchronous Microservices in nodejs
Asynchronous Microservices in nodejsAsynchronous Microservices in nodejs
Asynchronous Microservices in nodejsBruno Pedro
 
How to Automate API Discovery
How to Automate API DiscoveryHow to Automate API Discovery
How to Automate API DiscoveryBruno Pedro
 
Api Design & The Paris Subway
Api Design & The Paris SubwayApi Design & The Paris Subway
Api Design & The Paris SubwayBruno Pedro
 
The importance of /me
The importance of /meThe importance of /me
The importance of /meBruno Pedro
 
Maintainable consumers
Maintainable consumersMaintainable consumers
Maintainable consumersBruno Pedro
 
API Code Generation
API Code GenerationAPI Code Generation
API Code GenerationBruno Pedro
 
Bridging the Gap Between APIs and Customers
Bridging the Gap Between APIs and CustomersBridging the Gap Between APIs and Customers
Bridging the Gap Between APIs and CustomersBruno Pedro
 
Who's using your API?
Who's using your API?Who's using your API?
Who's using your API?Bruno Pedro
 
Is OAuth Really Secure?
Is OAuth Really Secure?Is OAuth Really Secure?
Is OAuth Really Secure?Bruno Pedro
 
tarpipe WordPress plugin demo
tarpipe WordPress plugin demotarpipe WordPress plugin demo
tarpipe WordPress plugin demoBruno Pedro
 
Everything OAuth
Everything OAuthEverything OAuth
Everything OAuthBruno Pedro
 
Activity Streams And Contexts
Activity Streams And ContextsActivity Streams And Contexts
Activity Streams And ContextsBruno Pedro
 

Plus de Bruno Pedro (20)

What are Web APIs
What are Web APIsWhat are Web APIs
What are Web APIs
 
Growing your business with an API
Growing your business with an APIGrowing your business with an API
Growing your business with an API
 
Product growth with an API
Product growth with an APIProduct growth with an API
Product growth with an API
 
How to grow your business with an API
How to grow your business with an APIHow to grow your business with an API
How to grow your business with an API
 
APIs Love to Chat
APIs Love to ChatAPIs Love to Chat
APIs Love to Chat
 
How to Automate API Testing
How to Automate API TestingHow to Automate API Testing
How to Automate API Testing
 
Asynchronous Microservices in nodejs
Asynchronous Microservices in nodejsAsynchronous Microservices in nodejs
Asynchronous Microservices in nodejs
 
How to Automate API Discovery
How to Automate API DiscoveryHow to Automate API Discovery
How to Automate API Discovery
 
Api Design & The Paris Subway
Api Design & The Paris SubwayApi Design & The Paris Subway
Api Design & The Paris Subway
 
The importance of /me
The importance of /meThe importance of /me
The importance of /me
 
Maintainable consumers
Maintainable consumersMaintainable consumers
Maintainable consumers
 
API Code Generation
API Code GenerationAPI Code Generation
API Code Generation
 
Bridging the Gap Between APIs and Customers
Bridging the Gap Between APIs and CustomersBridging the Gap Between APIs and Customers
Bridging the Gap Between APIs and Customers
 
Who's using your API?
Who's using your API?Who's using your API?
Who's using your API?
 
node-fs
node-fsnode-fs
node-fs
 
Is OAuth Really Secure?
Is OAuth Really Secure?Is OAuth Really Secure?
Is OAuth Really Secure?
 
tarpipe WordPress plugin demo
tarpipe WordPress plugin demotarpipe WordPress plugin demo
tarpipe WordPress plugin demo
 
OAuth checklist
OAuth checklistOAuth checklist
OAuth checklist
 
Everything OAuth
Everything OAuthEverything OAuth
Everything OAuth
 
Activity Streams And Contexts
Activity Streams And ContextsActivity Streams And Contexts
Activity Streams And Contexts
 

Dernier

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 

Dernier (20)

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 

Information Retrieval Challenges

  • 1. at io n In fo rm l ri ev a R et en ges C h a ll Bruno Pedro March 2010
  • 2. Bruno Pedro A n e x p e r i e n c e d We b d e v e l o p e r a n d entrepreneur. Has extensive background in large scale projects and technical writing. http://tarpipe.com/user/bpedro
  • 5. 3 Challenges • Real-Time Retrieval • Understanding Context • Inferring Identify
  • 6. Real-Time Retrieval http://www.flickr.com/photos/josephrobertson/127758523/
  • 7. WordPress source: wordpress.com • Average ~10K posts/hour • ~3 new posts every second
  • 8. twitter source: mashable.com • Average ~1.1M tweets/hour • ~300 new tweets every second
  • 9. Challenge • ~300 reads/second • 160 X 300 = 48 KB/second = 4 GB/day (approximate calculation) • How to process all this information?
  • 10. Strategy • Read and store immediately: • High performance write storage • No locks allowed • Prepare for lots of reading errors
  • 11. Strategy • Process later: • Regular expressions • Term extraction • Machine learning
  • 12. Context For me context is the key - from that comes the understanding of everything. — Kenneth Noland
  • 14. source: Google Reader Play Dogs with unmatched title?
  • 15. source: Google Buzz Still doesn’t make a lot of sense...
  • 16. This is the worst case scenario
  • 17. Challenge • Find context from associated content: • Pictures • Comments • Location information • Timelines • Authors
  • 18. Strategy • Associate content through common identifiers • Establish timeline of different pieces • Group pieces by same author • Present in a comprehensible fashion
  • 19. Identity source: abc Australia
  • 20. Many Identifiers • E-mail: user@example.com • facebook: @User Name • flickr: user or User Name (?) • Google Buzz: @user@example.com • twitter: @user ...
  • 21. Addressable • http://facebook.com/user • http://flickr.com/user • http://www.google.com/profiles/user • http://twitter.com/user ...
  • 22. How to make sense?
  • 23. Challenge • Parse every message, tweet or post • Find possible user identifiers • Substitute for meaningful information: • A link to the original profile • Equivalent identity on destination
  • 24. Strategy • Decentralized processing: • Browser based (plugin) • Extract identities from page • Process • Replace with meaningful information
  • 25. Food for thought • PubsubHubbub http://code.google.com/p/pubsubhubbub/ • Activity Streams http://activitystrea.ms/ • Web Finger http://webfinger.org/
  • 26. tarpipe streamlines your tarpipe is one of the most Today I had a chance to updates to various social curious experiments in spend time experimenting web sites, creating simple social media that I've with tarpipe and I have to or complex workflows to seen lately. The service say that I am intrigued by update several buckets in has the potential to be the concept and impressed one fell swoop. the answer to the lament by the implementation. I first talked about in The Adam Pash looming crisis: Personal Jeff Barr lifehacker syndication overload. Amazon.com Rafe Needleman CNET news thank you share your life

Notes de l'éditeur

  1. - Google Chrome plugin - Get identity information from Google Social Graph API or other means