Applications of Open Search Tools: WWW2010 Tutorial
Rosie Jones and Ted Drake, Yahoo! Inc.
April 26th, 2010
[email_address], [email_address]
Introductions
Schedule
2:00-2:15 Introductions and Overview (Rosie & Ted)
2:15-2:30 Motivation: state of the industry (Ted Drake)
2:30-3:00 Indexing and Search (Rosie & Ted)
3:00-3:30 Hello World! Using Search Service APIs & Examples (Ted Drake)
3:30-4:00 Coffee Break
4:00-4:30 Mashup Patterns (Ted Drake)
4:30-5:00 Ranking and Evaluation (Rosie Jones)
5:00-5:30 Discussion, Questions (Ted & Rosie)
Caveat. For the slides: [email_address] [email_address] http://www.slideshare.net/7mary4
Motivation
State of the Industry - Mashups
State of the Industry - Healthy Market
Open Source Technology Reduces Barriers
Motivation II: Tools for Academic Papers
In Academia: Paper in WWW 2010. The server uses Yahoo BOSS2 to search the web for snippets that resemble a paraphrase entered by the user.
In Academia: Papers from SIGIR 2008
More Publications using Open Source Search Engines
Web Search Architecture
Figure: Offline: Crawlers (find documents, follow links, fetch freshest content) feed Indexers (build graph of hyperlinks; process text and meta-data; index text and meta-data, compressed for quick lookup). Runtime: Retrieval (find documents containing query words), Ranking, Interface.
What is Open Search
Open Source Search and Open Search: open source code lets you build your own search engine; open search lets you leverage existing commercial search engines.
Why Open Search?
Scraping Modules http://search.cpan.org/~jfriedl/Yahoo-Search-1.10.13/lib/Yahoo/Search.pm
Do I Look Like A Piece of Bad Software?
Information Superhighway for Known Robots. A search engine may stop accepting requests from your IP address, or simply slow down its service.
Scrape with the Search Engine’s Blessing. Much more detail in the next section!
Other Parts to the Search Process
Indexing Your Own Content
Task of Indexing
Brute Force Document Scoring
Figure: Inverted Index. Each term (open, drake, search, ted, mit) points to a posting list of document IDs (e.g. D1, D3, D8, D9, D15, D32); answering the query "open search ted drake" means looking up and combining the posting lists of its terms instead of scoring every document.
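The contrast between brute-force scoring (scan every document for every query) and an inverted index (look up only the posting lists of the query terms) can be sketched in a few lines of Java. This is a minimal in-memory sketch under stated assumptions: the class name `InvertedIndex`, the lowercase split-on-non-alphanumerics tokenizer, and the AND-only query semantics are illustrative choices, not how Lucene, Indri, or Sphinx actually store postings (they compress them on disk).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical minimal inverted index: term -> sorted posting list of document IDs.
class InvertedIndex {
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();

    // Lowercase, split on non-letter/non-digit runs, and add docId to each term's postings.
    void add(int docId, String text) {
        for (String token : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) continue;
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    List<Integer> postingList(String term) {
        TreeSet<Integer> p = postings.get(term.toLowerCase());
        return p == null ? new ArrayList<>() : new ArrayList<>(p);
    }

    // Conjunctive (AND) query: intersect the sorted posting lists of all query terms.
    List<Integer> searchAll(String query) {
        List<Integer> result = null;
        for (String term : query.toLowerCase().split("\\s+")) {
            if (term.isEmpty()) continue;
            List<Integer> list = postingList(term);
            result = (result == null) ? list : intersect(result, list);
            if (result.isEmpty()) break; // no document can contain every term
        }
        return result == null ? new ArrayList<>() : result;
    }

    // Linear merge of two sorted lists, keeping IDs present in both.
    static List<Integer> intersect(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = Integer.compare(a.get(i), b.get(j));
            if (cmp == 0) { out.add(a.get(i)); i++; j++; }
            else if (cmp < 0) i++;
            else j++;
        }
        return out;
    }

    public static void main(String[] args) {
        InvertedIndex idx = new InvertedIndex();
        idx.add(1, "Open search with Ted Drake");
        idx.add(3, "Ted Drake talks about open source");
        idx.add(8, "Search engines and open data");
        idx.add(9, "Open search: Ted Drake and Rosie Jones");
        System.out.println(idx.searchAll("open search ted drake"));
    }
}
```

Because the posting lists stay sorted, each intersection is a single linear merge, and a query touches only the lists of its own terms rather than every document in the collection.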
High Level Comparison
Platform | License | Language | Doc. formats | Ranking | Notable users | Parallel | Scale
Lucene | Apache | Java | Many | Flexible | Amazon | Yes | TB
zettair | BSD-like | C | HTML, TREC, TXT | Flexible | Research | No | TB
Indri | BSD-like | C++ | Many | Very flexible | Research | Yes | TB
Sphinx | GPL | C++ | Many | Flexible | craigslist | Yes | TB
RDBMS | BSD, GPL | C | SQL text | Limited | - | Maybe | GB
Xapian | GPL | C++ | Many | Flexible | gmane | Yes | TB
Previous Benchmarks (Middleton and Baeza-Yates, 2007)
Open Search Benchmarking
Benchmarks
In action
Lucene
Lucene Indexing
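Lucene Search. Compile and run the Index program against the Lucene 2.4.1 core jar (the same classpath in both commands):
    javac -cp /lucenedir/lucene-2.4.1/lucene-core-2.4.1.jar:. Index.java
    java -Xmx512m -cp /lucenedir/lucene-2.4.1/lucene-core-2.4.1.jar:. Index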
Sphinx
Sphinx Indexing
Sphinx Search: clients query the searchd Sphinx service over a socket connection.
Indri
Indri: Hello World
Indexed Info in Search API
Index - Structured Meta Data
Index - Social
Index - Machine Tags
Hello, World! Open Search Service APIs. Photo by Oskay
Roadmap of APIs. Photo by Scorpions and Centaurs
Google AJAX Search
Google Custom Search
Bing 2.0 API
Yahoo! BOSS
Unrestricted?
BOSS API (figure labels: SearchMonkey, keyterms, Bookmarks)
Web = Cross Platform
Platforms
Yahoo! YQL: many standard and “open tables” services
Amazon Web Services (AWS)
Google App Engine
Examples
BOSS Out in the Open
Google Custom Search Examples
Bing Examples
Coolest Features Across the Board
Mashups
Let’s Build Something
Digression: TF-IDF for Ranking
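The slide's TF-IDF details did not survive extraction, so the sketch below uses one common textbook variant, assumed rather than taken from the tutorial: score(q, d) is the sum over query terms t of tf(t, d) * ln(N / df(t)), with raw term counts for tf. The class `TfIdf` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical TF-IDF scorer: score(q, d) = sum over t in q of tf(t, d) * ln(N / df(t)).
// Raw counts and natural-log IDF are assumptions; engines use many variants (log tf, length norms).
class TfIdf {
    private final List<Map<String, Integer>> docTermCounts = new ArrayList<>();
    private final Map<String, Integer> docFreq = new HashMap<>();

    static String[] tokenize(String text) {
        return text.toLowerCase().split("[^\\p{L}\\p{N}]+");
    }

    // Adds a document and returns its id (its 0-based insertion index).
    int addDocument(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : tokenize(text)) {
            if (!t.isEmpty()) counts.merge(t, 1, Integer::sum);
        }
        for (String t : counts.keySet()) docFreq.merge(t, 1, Integer::sum); // once per document
        docTermCounts.add(counts);
        return docTermCounts.size() - 1;
    }

    double idf(String term) {
        int df = docFreq.getOrDefault(term, 0);
        if (df == 0) return 0.0; // unseen term contributes nothing
        return Math.log((double) docTermCounts.size() / df);
    }

    double score(String query, int docId) {
        Map<String, Integer> counts = docTermCounts.get(docId);
        double s = 0.0;
        for (String t : tokenize(query)) {
            if (t.isEmpty()) continue;
            s += counts.getOrDefault(t, 0) * idf(t);
        }
        return s;
    }

    public static void main(String[] args) {
        TfIdf model = new TfIdf();
        int a = model.addDocument("open search tools open source");
        int b = model.addDocument("search engine ranking");
        int c = model.addDocument("tweet news mashup");
        String q = "open search";
        System.out.printf("%.3f %.3f %.3f%n", model.score(q, a), model.score(q, b), model.score(q, c));
    }
}
```

In this toy collection "open" appears in one document and "search" in two, so "open" carries more weight per occurrence; that is the intuition TF-IDF encodes.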
TweetNews Model
TweetNews Main Source
Non-Search: delicious Classifier
Mashup: Related terms
Mashup – Social Impact
Mashup – The Fire Hose
Mashup – Government Data
Coming Soon: Twitter Annotations
Mashup – Open Tables on YQL
Mashup – Using an Open Table
Blending Vertical + Service
Delicious Blending Idea
From Web
Hack Ideas
Ranking
Retrieval and Ranking
Ranking with Open Source Tools
Evaluation with Click Logs
Evaluating with Clicks: people click on the good results, right?
Not All Results Are Equally Likely to be Looked At (Source: iprospect.com WhitePaper_2006_SearchEngineUserBehavior.pdf)
Clicks and Views Depend on Rank [Joachims et al, 2005]
Evaluation from Click Logs. Figure: the user reads from top to bottom: skip, skip, skip, click! [Joachims et al., SIGIR 2005]
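The skip, skip, skip, click pattern underlies the "Click > Skip Above" strategy from Joachims et al.: a clicked result is taken as preferred over each unclicked result ranked above it, since the user presumably read and passed over those. A minimal sketch, assuming Java 16+ for `record`; the names `SkipAbove` and `Preference` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// "Click > Skip Above": a clicked result is inferred to be preferred
// over every unclicked result ranked above it.
class SkipAbove {
    // 'preferred' is inferred to be more relevant than 'other' for this query.
    record Preference(String preferred, String other) {}

    static List<Preference> preferences(List<String> ranking, Set<String> clicked) {
        List<Preference> prefs = new ArrayList<>();
        for (int i = 0; i < ranking.size(); i++) {
            String doc = ranking.get(i);
            if (!clicked.contains(doc)) continue;
            // Compare the clicked doc only against skipped (unclicked) docs above it.
            for (int j = 0; j < i; j++) {
                String above = ranking.get(j);
                if (!clicked.contains(above)) prefs.add(new Preference(doc, above));
            }
        }
        return prefs;
    }

    public static void main(String[] args) {
        List<String> ranking = List.of("d1", "d2", "d3", "d4", "d5");
        // The slide's pattern: skip, skip, skip, click!
        System.out.println(preferences(ranking, Set.of("d4")));
    }
}
```

These relative judgments sidestep the position bias shown on the previous slides: they never claim that a low-ranked click is "relevant" in absolute terms, only that it beat what the user skipped.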
Mining Clicks for Ranking
Interleaving for Learning from Clicks – Pairwise Judgments. Figure panels: Results from Method 1; Results from Method 2.
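The slide's interleaving procedure is not recoverable from the extracted text. As one concrete instance, the sketch below implements balanced interleaving (Joachims, 2002): it merges the two methods' results so the user sees a fair mix, then credits each click to the method(s) that ranked the clicked document within a cutoff k. Class and method names are hypothetical, and ranks are 0-based.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Balanced interleaving (after Joachims, 2002). Ranks are 0-based throughout.
class BalancedInterleaving {

    // Merge rankings a and b; aFirst is the coin flip deciding which list leads on ties.
    static List<String> interleave(List<String> a, List<String> b, boolean aFirst) {
        Set<String> merged = new LinkedHashSet<>(); // keeps insertion order, drops duplicates
        int ka = 0, kb = 0;
        while (ka < a.size() && kb < b.size()) {
            if (ka < kb || (ka == kb && aFirst)) {
                merged.add(a.get(ka++));
            } else {
                merged.add(b.get(kb++));
            }
        }
        return new ArrayList<>(merged);
    }

    // Credit assignment: +1 if a wins, -1 if b wins, 0 for a tie (including no clicks).
    static int compare(List<String> a, List<String> b, List<String> shown, Set<Integer> clickedRanks) {
        if (clickedRanks.isEmpty()) return 0;
        String lowestClicked = shown.get(Collections.max(clickedRanks));
        // k = smallest cutoff at which the lowest clicked doc appears in the top k of a or b.
        int k = Math.min(rankOrLast(a, lowestClicked), rankOrLast(b, lowestClicked)) + 1;
        int ha = 0, hb = 0;
        for (int r : clickedRanks) {
            String doc = shown.get(r);
            int ia = a.indexOf(doc), ib = b.indexOf(doc);
            if (ia >= 0 && ia < k) ha++;
            if (ib >= 0 && ib < k) hb++;
        }
        return Integer.signum(ha - hb);
    }

    // A doc absent from a list never determines the cutoff.
    private static int rankOrLast(List<String> list, String doc) {
        int i = list.indexOf(doc);
        return i >= 0 ? i : Integer.MAX_VALUE - 1;
    }

    public static void main(String[] args) {
        List<String> method1 = List.of("a1", "a2", "a3");
        List<String> method2 = List.of("b1", "a1", "b2");
        List<String> shown = interleave(method1, method2, true);
        System.out.println(shown);
        System.out.println(compare(method1, method2, shown, Set.of(1))); // click on rank 1
    }
}
```

Each query yields one pairwise judgment between the two methods; aggregating wins over many queries indicates which method users prefer. Team-draft interleaving is a common alternative with simpler credit assignment.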
Evaluation using Discounted Cumulative Gain. Gain by relevance grade: highly relevant = 3, somewhat relevant = 2, tangentially relevant = 1, irrelevant = 0. Discount by position: the most important (top) position has weight 1; less important positions i have weight 1/log(i).
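Combining the slide's gains and position weights gives DCG_p = rel_1 + sum over i = 2..p of rel_i / log(i). The sketch below computes it in Java. The slide does not state the log base, so base 2 (the usual choice, under which rank 2 also gets weight 1) is an assumption, as is the class name `Dcg`.

```java
// DCG with the slide's gains and position weights.
// Assumption: log base 2 (ranks 1 and 2 both get weight 1); the slide leaves the base unstated.
class Dcg {
    // gains[i] is the relevance grade of the result at rank i + 1:
    // 3 = highly, 2 = somewhat, 1 = tangentially relevant, 0 = irrelevant.
    static double dcg(int[] gains) {
        double sum = 0.0;
        for (int rank = 1; rank <= gains.length; rank++) {
            double weight = (rank == 1) ? 1.0 : Math.log(2) / Math.log(rank); // 1 / log2(rank)
            sum += gains[rank - 1] * weight;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] rankingA = {3, 2, 3, 0, 1};
        int[] rankingB = {1, 0, 3, 2, 3};
        System.out.printf("DCG(A) = %.3f, DCG(B) = %.3f%n", dcg(rankingA), dcg(rankingB));
    }
}
```

Both rankings contain the same grades; only their order differs, so the higher DCG goes to the ranking that puts the highly relevant results near the top.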
Directly Modeling Relevance From Clicks [Carterette and Jones, NIPS 2007]. Which ranking of web pages is better for the query “NIPS 2007”? Figure: two rankings (Rank 1 to Rank 5 each), annotated with click counts. Instead of asking "Is DCG_1 > DCG_2?", estimate P(DCG_1 > DCG_2).
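Carterette and Jones derive this probability from a model of relevance learned from clicks; that model is not reproduced here. The Monte Carlo sketch below only illustrates the quantity itself. It assumes each document's relevance grade (0 to 3) has a known, hypothetical probability distribution, samples grades independently per trial (a document that appears in both rankings gets the same grade within a trial), and counts how often DCG_1 exceeds DCG_2. The log-base-2 discount and all names are assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Monte Carlo estimate of P(DCG_1 > DCG_2) when relevance grades are uncertain.
// Assumptions: grades 0..3, independent per document, weight 1 at rank 1 and 1/log2(i) below.
class DcgComparison {

    // DCG of a ranking under one sampled grade assignment (missing docs count as grade 0).
    static double dcg(List<String> ranking, Map<String, Integer> grades) {
        double sum = 0.0;
        for (int i = 1; i <= ranking.size(); i++) {
            double weight = (i == 1) ? 1.0 : Math.log(2) / Math.log(i);
            sum += grades.getOrDefault(ranking.get(i - 1), 0) * weight;
        }
        return sum;
    }

    // Draws grade g with probability p[g].
    static int sampleGrade(double[] p, Random rng) {
        double u = rng.nextDouble();
        double cumulative = 0.0;
        for (int g = 0; g < p.length; g++) {
            cumulative += p[g];
            if (u < cumulative) return g;
        }
        return p.length - 1; // guards against probabilities summing to slightly under 1
    }

    // dist maps each document to its hypothetical distribution {P(0), P(1), P(2), P(3)}.
    static double probFirstBetter(List<String> r1, List<String> r2,
                                  Map<String, double[]> dist, int trials, long seed) {
        Random rng = new Random(seed);
        int wins = 0;
        for (int t = 0; t < trials; t++) {
            // One grade per document per trial, shared by both rankings.
            Map<String, Integer> grades = new HashMap<>();
            for (Map.Entry<String, double[]> e : dist.entrySet()) {
                grades.put(e.getKey(), sampleGrade(e.getValue(), rng));
            }
            if (dcg(r1, grades) > dcg(r2, grades)) wins++;
        }
        return (double) wins / trials;
    }

    public static void main(String[] args) {
        Map<String, double[]> dist = new HashMap<>();
        dist.put("a", new double[] {0.1, 0.2, 0.3, 0.4}); // probably relevant
        dist.put("b", new double[] {0.6, 0.2, 0.1, 0.1}); // probably not
        dist.put("c", new double[] {0.3, 0.3, 0.2, 0.2});
        double p = probFirstBetter(List.of("a", "b", "c"), List.of("b", "c", "a"), dist, 10_000, 42L);
        System.out.printf("P(DCG_1 > DCG_2) ~ %.3f%n", p);
    }
}
```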
Ingredients for Learning from Clicks
 
How Do You Get Search Engine Results to Modify? Search engine services allow you to do this kind of thing yourself.
Query Logs
Other Wishlist Items
Reasons to Build a Demo
“Eat Your Own Dogfood” in algorithm design and testing:
- allows you to improve without labeled data
- look closely at the results
- convince your advisor/funders it works!
Observe user behavior. Example query sequences:
Cheap flight to boston / Cheap flights to boston / Cheap flights / Travelocity / Expedia
American arlines.com / American airlines.com / Americanairlines.com
Puppy / Cute puppy / More cute puppy picutres
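The example sequences show users reformulating: adding or dropping a word, fixing a typo, or removing a space. A simple, hypothetical heuristic for flagging consecutive queries as reformulations is sketched below: two queries count as a reformulation if they share a word or if their Levenshtein edit distance is at most 2. The threshold and the class name `Reformulation` are assumptions for illustration, not a method from the tutorial.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical reformulation detector for consecutive queries in a session.
class Reformulation {
    static final int MAX_EDITS = 2; // assumed threshold

    // Classic Levenshtein distance with two rolling rows.
    static int editDistance(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] cur = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) prev[j] = j;
        for (int i = 1; i <= s.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int sub = prev[j - 1] + (s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1);
                cur[j] = Math.min(sub, Math.min(prev[j] + 1, cur[j - 1] + 1));
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        return prev[t.length()];
    }

    // A reformulation if the queries share a word, or differ by a few character edits.
    static boolean isReformulation(String q1, String q2) {
        String a = q1.toLowerCase().trim(), b = q2.toLowerCase().trim();
        Set<String> wordsA = new HashSet<>(Arrays.asList(a.split("\\s+")));
        for (String w : b.split("\\s+")) {
            if (wordsA.contains(w)) return true;
        }
        return editDistance(a, b) <= MAX_EDITS;
    }

    public static void main(String[] args) {
        String[] session = {"American arlines.com", "American airlines.com", "Americanairlines.com"};
        for (int i = 1; i < session.length; i++) {
            System.out.println(session[i - 1] + " -> " + session[i] + ": "
                    + isReformulation(session[i - 1], session[i]));
        }
    }
}
```

The heuristic misses task-level switches with no lexical overlap, such as Travelocity to Expedia; catching those needs session timing or click context.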
More About Logs and Evaluation in Other Tutorials
What Doesn’t Exist?
Other Open Source Tools
Lemur Query Log Toolbar
Book on Hadoop Scale Processing Coming Out
Take Home Messages
Pointers - Tools
Mashup Resources
Acknowledgements
[object Object]

Contenu connexe

Tendances

Inferring Web Citations using Social Data and SPARQL Rules
Inferring Web Citations using Social Data and SPARQL RulesInferring Web Citations using Social Data and SPARQL Rules
Inferring Web Citations using Social Data and SPARQL Rules
Matthew Rowe
 

Tendances (7)

2011 and still bruteforcing - OWASP Spain
2011 and still bruteforcing - OWASP Spain2011 and still bruteforcing - OWASP Spain
2011 and still bruteforcing - OWASP Spain
 
Inferring Web Citations using Social Data and SPARQL Rules
Inferring Web Citations using Social Data and SPARQL RulesInferring Web Citations using Social Data and SPARQL Rules
Inferring Web Citations using Social Data and SPARQL Rules
 
Aqua Browser Implementation at Oklahoma State University
Aqua Browser Implementation at Oklahoma State UniversityAqua Browser Implementation at Oklahoma State University
Aqua Browser Implementation at Oklahoma State University
 
Open Source Community Metrics for FOSDEM
Open Source Community Metrics for FOSDEMOpen Source Community Metrics for FOSDEM
Open Source Community Metrics for FOSDEM
 
Open Source Community Metrics LibreOffice Conference
Open Source Community Metrics LibreOffice ConferenceOpen Source Community Metrics LibreOffice Conference
Open Source Community Metrics LibreOffice Conference
 
Paper Presentation for INF 384H (http://courses.ischool.utexas.edu/Lease_Matt...
Paper Presentation for INF 384H (http://courses.ischool.utexas.edu/Lease_Matt...Paper Presentation for INF 384H (http://courses.ischool.utexas.edu/Lease_Matt...
Paper Presentation for INF 384H (http://courses.ischool.utexas.edu/Lease_Matt...
 
Open Source Community Metrics: LinuxCon Barcelona
Open Source Community Metrics: LinuxCon BarcelonaOpen Source Community Metrics: LinuxCon Barcelona
Open Source Community Metrics: LinuxCon Barcelona
 

En vedette

GEOSS Future Products & GeoSocial API
GEOSS Future Products & GeoSocial APIGEOSS Future Products & GeoSocial API
GEOSS Future Products & GeoSocial API
Pat Cappelaere
 
Finding Public Policy briefs
Finding Public Policy briefsFinding Public Policy briefs
Finding Public Policy briefs
guest388eb8e
 
Project Report - Raymond Chepkwony
Project Report - Raymond ChepkwonyProject Report - Raymond Chepkwony
Project Report - Raymond Chepkwony
Raymond Chepkwony
 
Combined Boolean Slideshare
Combined Boolean SlideshareCombined Boolean Slideshare
Combined Boolean Slideshare
Commvault
 
NACHR Conf Presentation Headhunting in Healthcare - Free Tools Greg Hawkes
NACHR Conf Presentation Headhunting in Healthcare - Free Tools Greg HawkesNACHR Conf Presentation Headhunting in Healthcare - Free Tools Greg Hawkes
NACHR Conf Presentation Headhunting in Healthcare - Free Tools Greg Hawkes
Greg Hawkes
 

En vedette (20)

Reddit
RedditReddit
Reddit
 
Web search engines ( Mr.Mirza )
Web search engines ( Mr.Mirza )Web search engines ( Mr.Mirza )
Web search engines ( Mr.Mirza )
 
RESTful OGC Services
RESTful OGC ServicesRESTful OGC Services
RESTful OGC Services
 
GEOSS Future Products & GeoSocial API
GEOSS Future Products & GeoSocial APIGEOSS Future Products & GeoSocial API
GEOSS Future Products & GeoSocial API
 
GCE11 Apache Rave Presentation
GCE11 Apache Rave PresentationGCE11 Apache Rave Presentation
GCE11 Apache Rave Presentation
 
EPC Cloud: Using the Web to Simplify the Global RFID Network
EPC Cloud: Using the Web to Simplify the Global RFID NetworkEPC Cloud: Using the Web to Simplify the Global RFID Network
EPC Cloud: Using the Web to Simplify the Global RFID Network
 
Learn REST API at ASIT
Learn REST API at ASITLearn REST API at ASIT
Learn REST API at ASIT
 
Using the Google AJAX APIs
Using the Google AJAX APIsUsing the Google AJAX APIs
Using the Google AJAX APIs
 
Search APIs for Hack Days
Search APIs for Hack DaysSearch APIs for Hack Days
Search APIs for Hack Days
 
Research Tools
Research ToolsResearch Tools
Research Tools
 
Finding Public Policy briefs
Finding Public Policy briefsFinding Public Policy briefs
Finding Public Policy briefs
 
Searching for evidence
Searching for evidenceSearching for evidence
Searching for evidence
 
Tips for searching (and finding!): Library Elevenses
Tips for searching (and finding!): Library ElevensesTips for searching (and finding!): Library Elevenses
Tips for searching (and finding!): Library Elevenses
 
Project Report - Raymond Chepkwony
Project Report - Raymond ChepkwonyProject Report - Raymond Chepkwony
Project Report - Raymond Chepkwony
 
Combined Boolean Slideshare
Combined Boolean SlideshareCombined Boolean Slideshare
Combined Boolean Slideshare
 
An Introduction to Data Journalism
An Introduction to Data JournalismAn Introduction to Data Journalism
An Introduction to Data Journalism
 
Lemur Tutorial at SIGIR 2006
Lemur Tutorial at SIGIR 2006Lemur Tutorial at SIGIR 2006
Lemur Tutorial at SIGIR 2006
 
Model-Driven Development of Semantic Mashup Applications with the Open-Source...
Model-Driven Development of Semantic Mashup Applications with the Open-Source...Model-Driven Development of Semantic Mashup Applications with the Open-Source...
Model-Driven Development of Semantic Mashup Applications with the Open-Source...
 
NACHR Conf Presentation Headhunting in Healthcare - Free Tools Greg Hawkes
NACHR Conf Presentation Headhunting in Healthcare - Free Tools Greg HawkesNACHR Conf Presentation Headhunting in Healthcare - Free Tools Greg Hawkes
NACHR Conf Presentation Headhunting in Healthcare - Free Tools Greg Hawkes
 
Deploying Next Generation Firewalling with ASA - CX
Deploying Next Generation Firewalling with ASA - CXDeploying Next Generation Firewalling with ASA - CX
Deploying Next Generation Firewalling with ASA - CX
 

Similaire à Open Source Search Tools for www2010 conferencesourcesearchtoolswww20100426dApplications of Open Search Tools: WWW2010 Tutorial

PyCon India 2012: Rapid development of website search in python
PyCon India 2012: Rapid development of website search in pythonPyCon India 2012: Rapid development of website search in python
PyCon India 2012: Rapid development of website search in python
Chetan Giridhar
 
Commodity Semantic Search: A Case Study of DiscoverEd
Commodity Semantic Search: A Case Study of DiscoverEdCommodity Semantic Search: A Case Study of DiscoverEd
Commodity Semantic Search: A Case Study of DiscoverEd
Nathan Yergler
 
Devops kc meetup_5_20_2013
Devops kc meetup_5_20_2013Devops kc meetup_5_20_2013
Devops kc meetup_5_20_2013
Aaron Blythe
 

Similaire à Open Source Search Tools for www2010 conferencesourcesearchtoolswww20100426dApplications of Open Search Tools: WWW2010 Tutorial (20)

PyCon India 2012: Rapid development of website search in python
PyCon India 2012: Rapid development of website search in pythonPyCon India 2012: Rapid development of website search in python
PyCon India 2012: Rapid development of website search in python
 
Semantic Web, e-commerce
Semantic Web, e-commerceSemantic Web, e-commerce
Semantic Web, e-commerce
 
Reviewing RESTful Web Apps
Reviewing RESTful Web AppsReviewing RESTful Web Apps
Reviewing RESTful Web Apps
 
Apache Lucene Searching The Web
Apache Lucene Searching The WebApache Lucene Searching The Web
Apache Lucene Searching The Web
 
Apache Lucene: Searching the Web and Everything Else (Jazoon07)
Apache Lucene: Searching the Web and Everything Else (Jazoon07)Apache Lucene: Searching the Web and Everything Else (Jazoon07)
Apache Lucene: Searching the Web and Everything Else (Jazoon07)
 
Advanced full text searching techniques using Lucene
Advanced full text searching techniques using LuceneAdvanced full text searching techniques using Lucene
Advanced full text searching techniques using Lucene
 
Pratical Deep Dive into the Semantic Web - #smconnect
Pratical Deep Dive into the Semantic Web - #smconnectPratical Deep Dive into the Semantic Web - #smconnect
Pratical Deep Dive into the Semantic Web - #smconnect
 
Commodity Semantic Search: A Case Study of DiscoverEd
Commodity Semantic Search: A Case Study of DiscoverEdCommodity Semantic Search: A Case Study of DiscoverEd
Commodity Semantic Search: A Case Study of DiscoverEd
 
Metadata, Open Access and More: Crossref presentation
Metadata, Open Access and More: Crossref presentationMetadata, Open Access and More: Crossref presentation
Metadata, Open Access and More: Crossref presentation
 
Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...
Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...
Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...
 
Devops kc meetup_5_20_2013
Devops kc meetup_5_20_2013Devops kc meetup_5_20_2013
Devops kc meetup_5_20_2013
 
ProjectHub
ProjectHubProjectHub
ProjectHub
 
Making things findable
Making things findableMaking things findable
Making things findable
 
Enterprise Search in SharePoint 2013
Enterprise Search in SharePoint 2013Enterprise Search in SharePoint 2013
Enterprise Search in SharePoint 2013
 
Utilizing the natural langauage toolkit for keyword research
Utilizing the natural langauage toolkit for keyword researchUtilizing the natural langauage toolkit for keyword research
Utilizing the natural langauage toolkit for keyword research
 
How Oracle Uses CrowdFlower For Sentiment Analysis
How Oracle Uses CrowdFlower For Sentiment AnalysisHow Oracle Uses CrowdFlower For Sentiment Analysis
How Oracle Uses CrowdFlower For Sentiment Analysis
 
Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?
 
Internet Searching Version2
Internet Searching Version2Internet Searching Version2
Internet Searching Version2
 
R Programming Overview
R Programming Overview R Programming Overview
R Programming Overview
 
Big Data Week 2013 Flow
Big Data Week 2013 FlowBig Data Week 2013 Flow
Big Data Week 2013 Flow
 

Plus de Ted Drake

Introduce Trauma-Informed Design to Your Organization - CSUN ATC 2024
Introduce Trauma-Informed Design to Your Organization - CSUN ATC 2024Introduce Trauma-Informed Design to Your Organization - CSUN ATC 2024
Introduce Trauma-Informed Design to Your Organization - CSUN ATC 2024
Ted Drake
 
Transforming Accessibility one lunch at a tiime - CSUN 2023
Transforming Accessibility one lunch at a tiime - CSUN 2023Transforming Accessibility one lunch at a tiime - CSUN 2023
Transforming Accessibility one lunch at a tiime - CSUN 2023
Ted Drake
 
The Saga of Accessible Colors
The Saga of Accessible ColorsThe Saga of Accessible Colors
The Saga of Accessible Colors
Ted Drake
 

Plus de Ted Drake (20)

Introduce Trauma-Informed Design to Your Organization - CSUN ATC 2024
Introduce Trauma-Informed Design to Your Organization - CSUN ATC 2024Introduce Trauma-Informed Design to Your Organization - CSUN ATC 2024
Introduce Trauma-Informed Design to Your Organization - CSUN ATC 2024
 
Transforming Accessibility one lunch at a tiime - CSUN 2023
Transforming Accessibility one lunch at a tiime - CSUN 2023Transforming Accessibility one lunch at a tiime - CSUN 2023
Transforming Accessibility one lunch at a tiime - CSUN 2023
 
Inclusive Design for cognitive disabilities, neurodiversity, and chronic illness
Inclusive Design for cognitive disabilities, neurodiversity, and chronic illnessInclusive Design for cognitive disabilities, neurodiversity, and chronic illness
Inclusive Design for cognitive disabilities, neurodiversity, and chronic illness
 
Inclusive design for Long Covid
 Inclusive design for Long Covid  Inclusive design for Long Covid
Inclusive design for Long Covid
 
Covid 19, brain fog, and inclusive design
Covid 19, brain fog, and inclusive designCovid 19, brain fog, and inclusive design
Covid 19, brain fog, and inclusive design
 
Customer obsession and accessibility
Customer obsession and accessibilityCustomer obsession and accessibility
Customer obsession and accessibility
 
The Saga of Accessible Colors
The Saga of Accessible ColorsThe Saga of Accessible Colors
The Saga of Accessible Colors
 
Artificial Intelligence and Accessibility - GAAD 2020 - Hello A11y
Artificial Intelligence and Accessibility - GAAD 2020 - Hello A11yArtificial Intelligence and Accessibility - GAAD 2020 - Hello A11y
Artificial Intelligence and Accessibility - GAAD 2020 - Hello A11y
 
Expand your outreach with an accessibility champions program
Expand your outreach with an accessibility champions program Expand your outreach with an accessibility champions program
Expand your outreach with an accessibility champions program
 
Intuit's Accessibility Champion Program - Coaching and Celebrating
Intuit's Accessibility Champion Program - Coaching and Celebrating Intuit's Accessibility Champion Program - Coaching and Celebrating
Intuit's Accessibility Champion Program - Coaching and Celebrating
 
Accessibility First Innovation
Accessibility First InnovationAccessibility First Innovation
Accessibility First Innovation
 
Inclusive customer interviews make it your friday task
Inclusive customer interviews  make it your friday taskInclusive customer interviews  make it your friday task
Inclusive customer interviews make it your friday task
 
Coaching and Celebrating Accessibility Champions
Coaching and Celebrating Accessibility ChampionsCoaching and Celebrating Accessibility Champions
Coaching and Celebrating Accessibility Champions
 
Accessibility statements and resource publishing best practices csun 2019
Accessibility statements and resource publishing best practices   csun 2019Accessibility statements and resource publishing best practices   csun 2019
Accessibility statements and resource publishing best practices csun 2019
 
Raising Accessibility Awareness at Intuit
Raising Accessibility Awareness at IntuitRaising Accessibility Awareness at Intuit
Raising Accessibility Awareness at Intuit
 
Trickle Down Accessibility
Trickle Down AccessibilityTrickle Down Accessibility
Trickle Down Accessibility
 
Trickle-Down Accessibility - CSUN 2018
Trickle-Down Accessibility - CSUN 2018Trickle-Down Accessibility - CSUN 2018
Trickle-Down Accessibility - CSUN 2018
 
Accessibility metrics Accessibility Data Metrics and Reporting – Industry Bes...
Accessibility metrics Accessibility Data Metrics and Reporting – Industry Bes...Accessibility metrics Accessibility Data Metrics and Reporting – Industry Bes...
Accessibility metrics Accessibility Data Metrics and Reporting – Industry Bes...
 
Mystery Meat 2.0 – Making hidden mobile interactions accessible
Mystery Meat 2.0 – Making hidden mobile interactions accessibleMystery Meat 2.0 – Making hidden mobile interactions accessible
Mystery Meat 2.0 – Making hidden mobile interactions accessible
 
React Native Accessibility - San Diego React and React Native Meetup
React Native Accessibility - San Diego React and React Native MeetupReact Native Accessibility - San Diego React and React Native Meetup
React Native Accessibility - San Diego React and React Native Meetup
 

Dernier

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Dernier (20)

Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 

Open Source Search Tools for WWW2010 Conference
Applications of Open Search Tools: WWW2010 Tutorial

  • 1. Applications of Open Search Tools: WWW2010 Tutorial. Rosie Jones and Ted Drake, Yahoo! Inc., April 26th, 2010. [email_address], [email_address]
  • 3. Schedule
      2:00–2:15  Introductions and Overview (Rosie & Ted)
      2:15–2:30  Motivation – state of the industry (Ted Drake)
      2:30–3:00  Indexing and Search (Rosie & Ted)
      3:00–3:30  Hello World! Using Search Service APIs & Examples (Ted Drake)
      3:30–4:00  Coffee Break
      4:00–4:30  Mashup Patterns (Ted Drake)
      4:30–5:00  Ranking and Evaluation (Rosie Jones)
      5:00–5:30  Discussion, Questions (Ted & Rosie)
  • 10. Motivation II: Tools for Academic Papers
  • 15. Web Search Architecture
      Offline
        Crawlers: find documents, follow links, fetch the freshest content, build a graph of hyperlinks
        Indexers: process text and meta-data; index text and meta-data, compressed for quick lookup
      Runtime
        Retrieval: find documents containing the query words
        Ranking
        Interface
  • 16. What is Open Search?
  • 17. Open Source Search and Open Search: Open source code lets you build your own search engine. Open search lets you leverage existing commercial search engines.
  • 20. Do I Look Like A Piece of Bad Software?
  • 21. Information Superhighway for Known Robots: The search engine may stop accepting requests from your IP, or just slow down its service.
  • 24. Indexing Your Own Content
  • 27. Inverted Index (diagram): for query = open search ted drake, each term (open, drake, search, ted, mit, …) points to its posting list of document IDs (e.g. D1, D3, D6, D8, D9, D15, D32, D46); each entry in a posting list is a posting.
  • 28. High Level Comparison
      Platform  License   Language  Doc formats      Ranking        Notable users  Parallel  Scale
      Lucene    Apache    Java      Many             Flexible       Amazon         Yes       TB
      Zettair   BSD-like  C         HTML, TREC, TXT  Flexible       Research       No        TB
      Indri     BSD-like  C++       Many             Very flexible  Research       Yes       TB
      Sphinx    GPL       C++       Many             Flexible       craigslist     Yes       TB
      RDBMS     BSD, GPL  C         SQL text         Limited        -              Maybe     GB
      Xapian    GPL       C++       Many             Flexible       gmane          Yes       TB
  • 29. Previous Benchmarks (Middleton and Baeza-Yates, 2007)
  • 35. Lucene Search
      javac -cp /lucenedir/lucene-2.4.1/lucene-core-2.4.1.jar:. Index.java
      java -Xmx512m -cp /lucenedir/lucene-2.4.1/lucene-core-2.4.1.jar:. Index
  • 38. Sphinx Search: socket connection to the searchd Sphinx service
  • 41. Indexed Info in Search API
  • 46. Hello, World! Open Search Service APIs (photo by Oskay)
  • 73. Mashup – The Fire Hose
  • 78. Mashup – Using an Open Table
  • 88. Evaluating with Clicks: People click on the good results, right?
  • 89. Not All Results Are Equally Likely to Be Looked At (Source: iprospect.com, WhitePaper_2006_SearchEngineUserBehavior.pdf)
  • 90. Clicks and Views Depend on Rank [Joachims et al., 2005]
  • 95. Directly Modeling Relevance from Clicks: Which ranking of web pages is better for the query “NIPS 2007”? [Carterette and Jones, NIPS 2007] (Figure: two rankings, each listing Rank 1 to Rank 5, with click counts.) Is DCG_1 > DCG_2? P(DCG_1 > DCG_2)
  • 101. Reasons to Build a Demo
      “Eat Your Own Dogfood”: algorithm design and testing
        - allows you to improve without labeled data
        - look closely at the results
        - convince your advisor/funders it works!
      Observe user behavior, e.g. queries:
        Cheap flight to boston / Cheap flights to boston / Cheap flights / Travelocity / Expedia
        American arlines.com / American airlines.com / Americanairlines.com
        Puppy / Cute puppy / More cute puppy picutres
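
Slide 21 warns that a search engine may stop accepting requests from your IP, or slow its service, if your client does not behave like a known, well-mannered robot. Below is a minimal sketch, assuming you have already downloaded the target site's robots.txt: it checks each URL against those rules with Python's standard urllib.robotparser and enforces a minimum gap between requests. The class name PoliteFetcher, the user-agent string, and the 1-second default interval are hypothetical; a real service's published rate limits take precedence.

```python
import time
import urllib.robotparser


class PoliteFetcher:
    """Decides whether a URL may be fetched and paces requests.

    A sketch only (hypothetical helper): it performs no HTTP itself.
    robots_lines is the text of an already-downloaded robots.txt, split into lines.
    """

    def __init__(self, robots_lines, user_agent, min_interval_s=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.rules = urllib.robotparser.RobotFileParser()
        self.rules.parse(robots_lines)
        self.user_agent = user_agent
        self.min_interval_s = min_interval_s  # hypothetical default; honor the site's own limits
        self.clock = clock    # injectable so pacing can be tested without real waiting
        self.sleep = sleep
        self.last_request = None

    def allowed(self, url):
        """True if robots.txt permits this user agent to fetch url."""
        return self.rules.can_fetch(self.user_agent, url)

    def wait_turn(self):
        """Sleep until min_interval_s has elapsed since the previous request."""
        now = self.clock()
        if self.last_request is not None:
            remaining = self.min_interval_s - (now - self.last_request)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_request = now


robots_txt = """User-agent: *
Disallow: /private/
"""
fetcher = PoliteFetcher(robots_txt.splitlines(), "my-research-bot")
for url in ["http://example.com/public/page.html", "http://example.com/private/page.html"]:
    if fetcher.allowed(url):
        fetcher.wait_turn()
        print("would fetch", url)  # replace with the real HTTP request
    else:
        print("skipping", url)
```

Keeping the pacing logic separate from the HTTP call makes it easy to reuse the same gatekeeper in front of a crawler or in front of a search service API client.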
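
The inverted index on slide 27 is a mapping from words to the documents containing them. This minimal Python sketch assumes whitespace tokenization and lower-casing only; engines such as Lucene, Indri, or Sphinx add analyzers, compression, and ranking on top. build_inverted_index creates a sorted posting list per term, and and_query answers a conjunctive query by intersecting posting lists. The document texts and the integer IDs (standing in for D1, D3, D8, D9) are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lower-cased term to a sorted posting list of document IDs."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def intersect(p1, p2):
    """Merge two sorted posting lists, keeping IDs present in both (AND)."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out


def and_query(index, query):
    """Return the IDs of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    # Shortest posting list first: the running intersection can only shrink.
    lists = sorted((index.get(t, []) for t in terms), key=len)
    result = lists[0]
    for plist in lists[1:]:
        result = intersect(result, plist)
        if not result:
            break
    return result


# Hypothetical documents; integer IDs stand in for D1, D3, D8, D9.
docs = {
    1: "Open search with Ted Drake",
    3: "open source search engines",
    8: "Ted Drake talks about open search",
    9: "search ranking and evaluation",
}
index = build_inverted_index(docs)
print(and_query(index, "open search ted drake"))  # prints [1, 8]
```

Starting from the shortest posting list is a standard choice: no intermediate result can be longer than it, so each later merge stays cheap.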
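
Slides 89 and 90 show that whether a result is looked at, and therefore clicked, depends on its rank, so raw click-through rates favor whatever was already ranked high. One common correction, sketched here as an assumption rather than something taken from the slides, is the examination hypothesis: P(click) = P(examined | rank) × P(attractive | document). Dividing each document's clicks by its expected number of examinations gives a position-debiased score. The EXAM_PROB values and the click log are hypothetical placeholders; real examination probabilities would have to be estimated, for example from eye-tracking studies or randomized result orderings.

```python
from collections import defaultdict

# Hypothetical probability that a user examines the result at each rank.
# Real values must be estimated (eye tracking, randomized experiments, ...).
EXAM_PROB = {1: 1.0, 2: 0.7, 3: 0.5, 4: 0.35, 5: 0.25}


def debiased_ctr(impressions, exam_prob=EXAM_PROB):
    """Clicks divided by expected examinations, per document.

    impressions: iterable of (doc_id, rank, clicked) tuples from a click log.
    Under the examination hypothesis, P(click) = P(examined | rank) * P(attractive | doc),
    so clicks / sum(P(examined | rank)) estimates P(attractive | doc).
    """
    clicks = defaultdict(int)
    expected_exams = defaultdict(float)
    for doc_id, rank, clicked in impressions:
        expected_exams[doc_id] += exam_prob[rank]
        clicks[doc_id] += int(clicked)
    return {doc: clicks[doc] / exams for doc, exams in expected_exams.items() if exams > 0}


# Made-up log: "a" always shown at rank 1, "b" always at rank 5.
log = [("a", 1, True), ("a", 1, False),
       ("b", 5, True), ("b", 5, False), ("b", 5, False), ("b", 5, False)]
scores = debiased_ctr(log)
# Raw CTR would favor "a" (1/2 vs 1/4); the debiased scores favor "b".
print(scores)
```

With few impressions the estimate is noisy and can exceed 1, so it is best read as a relative score rather than a probability.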
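
Slide 95 asks whether one ranking has a higher DCG than another when relevance is known only uncertainly from clicks. The approach on the slide models relevance from clicks; this sketch skips that modeling step and simply assumes each document already has a probability of being relevant (binary relevance, independent across documents). It computes DCG@k = Σ_{i=1..k} rel_i / log2(i + 1), one common variant of the discount, and estimates P(DCG_1 > DCG_2) by Monte Carlo sampling. The relevance probabilities and rankings are hypothetical inputs, and this is a simplified illustration, not the paper's estimator.

```python
import math
import random


def dcg(relevances):
    """DCG with the rel_i / log2(i + 1) discount; ranks start at 1."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))


def prob_dcg1_greater(p_rel, ranking1, ranking2, samples=20000, seed=0):
    """Monte Carlo estimate of P(DCG_1 > DCG_2); ties count as "not greater".

    p_rel maps each document ID to an assumed probability that it is relevant.
    Each sample draws one binary judgment per document and scores both
    rankings with the same judgments, so shared documents stay consistent.
    """
    rng = random.Random(seed)
    docs = sorted(set(ranking1) | set(ranking2))  # fixed order keeps runs reproducible
    wins = 0
    for _ in range(samples):
        rel = {d: int(rng.random() < p_rel[d]) for d in docs}
        if dcg([rel[d] for d in ranking1]) > dcg([rel[d] for d in ranking2]):
            wins += 1
    return wins / samples


# Hypothetical relevance probabilities, e.g. as produced by some click model.
p_rel = {"a": 0.9, "b": 0.4, "c": 0.2, "d": 0.6}
ranking1 = ["a", "d", "b", "c"]
ranking2 = ["b", "a", "c", "d"]
print(prob_dcg1_greater(p_rel, ranking1, ranking2))
```

Scoring both rankings against the same sampled judgments matters: it measures the difference between rankings rather than noise from two independent draws.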
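
Slide 101 lists query sequences as an example of observing user behavior. Treating each group as consecutive queries from one session (an assumption; the slide does not state it), this sketch labels each reformulation with a simple heuristic: word-set containment for specialization and generalization, and a small character edit distance for spelling variants. The label names and the max_edits=2 threshold are hypothetical choices, not a standard taxonomy. The typos "arlines" and "picutres" belong to the example queries and are kept as-is.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]


def classify_reformulation(q1, q2, max_edits=2):
    """Heuristically label how query q2 relates to the preceding query q1."""
    s1, s2 = q1.lower().strip(), q2.lower().strip()
    if s1 == s2:
        return "repeat"
    t1, t2 = set(s1.split()), set(s2.split())
    if t1 < t2:
        return "specialization"    # words added
    if t2 < t1:
        return "generalization"    # words removed
    if edit_distance(s1, s2) <= max_edits:
        return "spelling variant"  # typo fix, plural, spacing
    return "other"


sessions = [
    ["Cheap flight to boston", "Cheap flights to boston", "Cheap flights", "Travelocity", "Expedia"],
    ["American arlines.com", "American airlines.com", "Americanairlines.com"],
    ["Puppy", "Cute puppy", "More cute puppy picutres"],
]
for queries in sessions:
    for q1, q2 in zip(queries, queries[1:]):
        print(f"{q1!r} -> {q2!r}: {classify_reformulation(q1, q2)}")
```

The heuristic labels "Cheap flights" to "Travelocity" as "other", even though a person would see a related switch to a travel site; catching such cases would need more than string similarity, for example the clicked URLs.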

Editor's Notes

  1. http://developer.yahoo.com/everything.html - for logos
  2. ROSIE – SHOW PSEUDOCODE FOR SIMPLIFIED VERSION – THEN CONVERT TO YQL (TED) OR PERL (ROSIE)? The user uses a search interface to rapidly gather many snippets that contain similar phrases, and then selects those that they would like to mark (Figure 6). The server uses Yahoo BOSS2 to search the web for snippets that resemble a paraphrase entered by the user.
     SIGIR 2008 proceedings: http://portal.acm.org/toc.cfm?id=1390334&idx=SERIES278&type=proceeding&coll=ACM&dl=ACM&part=series&WantType=Proceedings&title=SIGIR&CFID=43145604&CFTOKEN=93348762
     Jung et al., IP&M: http://search.yahoo.com/search?p=Click+data+as+implicit+relevance+feedback+in+web+search&ei=UTF-8&fr=moz2
     Affective feedback: http://eprints.gla.ac.uk/4825/1/4825.pdf
     http://portal.acm.org/citation.cfm?id=1390566&dl=GUIDE&coll=GUIDE&CFID=43143609&CFTOKEN=22951859
  3. ROSIE – SHOW PSEUDOCODE FOR SIMPLIFIED VERSION – THEN CONVERT TO YQL (TED) OR PERL (ROSIE)?
     SIGIR 2008 proceedings: http://portal.acm.org/toc.cfm?id=1390334&idx=SERIES278&type=proceeding&coll=ACM&dl=ACM&part=series&WantType=Proceedings&title=SIGIR&CFID=43145604&CFTOKEN=93348762
     Jung et al., IP&M: http://search.yahoo.com/search?p=Click+data+as+implicit+relevance+feedback+in+web+search&ei=UTF-8&fr=moz2
     Affective feedback: http://eprints.gla.ac.uk/4825/1/4825.pdf
     http://portal.acm.org/citation.cfm?id=1390566&dl=GUIDE&coll=GUIDE&CFID=43143609&CFTOKEN=22951859
  4. ROSIE WORK ON THIS TONIGHT
  5. Eran / Ashim: okay to include BOSS HERE?
  6. WHAT DO WE SHOW FOR PRESENTATION?? – the SIGIR 2008 papers? Query-biased summaries
  7. Mapping from words to documents containing them
  8. SIGIR 2008 proceedings: http://portal.acm.org/toc.cfm?id=1390334&idx=SERIES278&type=proceeding&coll=ACM&dl=ACM&part=series&WantType=Proceedings&title=SIGIR&CFID=43145604&CFTOKEN=93348762
     Jung et al., IP&M: http://search.yahoo.com/search?p=Click+data+as+implicit+relevance+feedback+in+web+search&ei=UTF-8&fr=moz2
     Affective feedback: http://eprints.gla.ac.uk/4825/1/4825.pdf
     http://portal.acm.org/citation.cfm?id=1390566&dl=GUIDE&coll=GUIDE&CFID=43143609&CFTOKEN=22951859
  9. TED check correctness
  10. TED check correctness
  11. LOGO NEEDED!
  12. TED UPDATE
  13. TED UPDATE?
  21. ROSIE ADD A PICTURE
  22. TALK MORE ABOUT THIS EXAMPLE
  23. MAKE A NEW SCREENSHOT WITHOUT CLIPPED TEXT. DESCRIBE arXiv.org more fully: who uses it, what it does, etc. Radlinski et al. implemented arXiv search on top of Lucene (http://search.arxiv.org/). One could use, e.g., Yahoo result ordering as one baseline: BOSS with restriction to arxiv.org. What would this pseudocode look like?
  24. TODO: Examples from SIGIR 2008 papers for each of those
  25. RJ show unigram/n-gram examples; add refs for Observe User Behavior