Faceted Search

   New York CTO Club
   December 9, 2009



 Daniel Tunkelang, Google
Otis Gospodnetić, Sematext
Agenda
Daniel:
•   What is faceted search?
•   Why use faceted search?
•   Thoughts about design and user experience.

Otis:
•   What are Lucene and Solr?
•   Why use an open-source search library?
•   Thoughts about implementation.
“Regular” Search
Interface:
•   User expresses information need as short query.
•   Search engine returns ranked, pageable result set.

User happy when...
•   Top-ranked result satisfies information need.
•   At least some result on first page is relevant.

User unhappy when...
•   No result on first page satisfies information need.
•   Results misleadingly appear relevant (bait and switch).
Relevance Is Subjective

Relevance is defined as a measure of
information conveyed by a document relative to
a query.

It is shown that the relationship between the
document and the query, though necessary, is
not sufficient to determine relevance.


William Goffman, On relevance as a measure, 1964.
Regular Search Experience
Assumptions Are Dangerous
tf-idf, PageRank

•   self-awareness
•   self-expression
•   model knows best
•   answer is a document
•   one-shot query
What is Faceted Search?
•   Best understood through examples.
       -   See the following slides.
       -   Or shop on almost any ecommerce site.
•   Facets = multiple ways to organize information.
       -   Often based on available structured information.
       -   But not always, e.g., facets obtained via text mining.
•   Typical interaction:
       -   User starts with a full-text search.
       -   Facets guide query refinement process.
Faceted Search for News
Faceted Search for People
Faceted Search for Breakfast
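The interaction these examples illustrate (a full-text query, then facet-guided refinement) can be sketched in a few lines of Python. This is only a toy model: the documents, the field names `category` and `brand`, and the helpers `facet_counts` and `refine` are invented for illustration, and a real engine computes the counts inside its index rather than by scanning results.

```python
from collections import Counter

# Hypothetical result set for a full-text query such as "camera".
results = [
    {"title": "Compact camera", "category": "electronics", "brand": "Acme"},
    {"title": "Camera bag", "category": "accessories", "brand": "Acme"},
    {"title": "DSLR camera", "category": "electronics", "brand": "Zenith"},
]

def facet_counts(docs, fields):
    """Count how many docs carry each value of each facet field."""
    counts = {field: Counter() for field in fields}
    for doc in docs:
        for field in fields:
            if field in doc:
                counts[field][doc[field]] += 1
    return counts

def refine(docs, field, value):
    """Narrow the result set to docs whose facet field has the chosen value."""
    return [doc for doc in docs if doc.get(field) == value]

# Step 1: show the counts next to the results.
counts = facet_counts(results, ["category", "brand"])
# Step 2: the user clicks "electronics"; the result set shrinks accordingly.
narrowed = refine(results, "category", "electronics")
```

After a refinement, the counts would normally be recomputed over the narrowed set so that the remaining facet values stay consistent with what the user sees.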
But Facets are Not a Silver Bullet...
•   Screen real estate is finite.
       -   Choose facets wisely.
       -   Choose facet values wisely for monster facets.
•   Multiple selection within a facet is powerful, but...
       -   Has to be intuitive, especially AND vs. OR.
       -   Even trickier for hierarchical facets.
•   Search relevance still matters!
       -   Most faceted search applications rank results.
       -   Irrelevant results ==> irrelevant facet refinements.
Exploring Information Science
Deliver Precision and Recall




Easier said than done!

Ranking of facet values is an open research topic.
Be Careful with Faceted Search!



     Cameras have artists?!
Clarify, Then Refine
Take-Aways
•   Faceted search addresses the subjectivity of relevance and information overload.
•   But deploying faceted search effectively requires that you think about user experience.
•   Recommended reading:
       -   My thin book entitled Faceted Search
       -   Marti Hearst's book on Search User Interfaces
       -   Peter Morville's upcoming book on Search Patterns
Faceted Search with Lucene & Solr




         Otis Gospodnetić, Sematext
What is / isn't Lucene
•   Free, ASL, Java IR library, Jar
•   Doug Cutting, ASF, 2001
•   Application agnostic: Indexing & Searching
•   High performance, scalable
•   No dependencies
•   Heavily ported
•   No: crawler, rich doc parser, turn-key solution
•   No: out-of-the-box faceted search capability... but...
What is/isn't Solr
•   Indexing/Search server with HTTP API built on top of Lucene
•   Fast & scalable (distributed search, index replication)
•   XML, JSON, Ruby, Perl, PHP, javabin
•   No: crawler (but Nutch ==> Solr works)
•   Yes: rich text parser
•   Yes: Faceted Search out of the box!
Solr and Faceted Search
•   3 types of facets: Field Values (text), Dates, Queries.
•   “Text”: return counts for all/top terms in a field for a result set, e.g. categories à la Amazon
•   Dates: return counts for docs in specified date ranges
•   Queries: return counts for docs that also match a given query; handy for number ranges (think prices!)
Facet Field Requirements
•   Must be indexed
•   Often not tokenized
•   Often not altered (lowercase, punctuation)
•   Storing not required
•   Multivalued fields OK
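Why facet fields are often left untokenized and unaltered is easiest to see with a toy count. The sketch below uses plain Python rather than Solr, and the category values are made up; it compares counting whole field values with counting lowercased whitespace tokens.

```python
from collections import Counter

# Hypothetical category values for three documents.
categories = ["Digital Cameras", "Digital Cameras", "Camera Bags"]

# Untokenized, unaltered: each whole value is one facet value.
untokenized = Counter(categories)

# Tokenized and lowercased: every word becomes its own facet value.
tokenized = Counter(
    token for value in categories for token in value.lower().split()
)
```

Here `untokenized` yields the two labels a shopper would expect to click, while `tokenized` yields fragments such as `digital` and `bags` that no longer correspond to a category.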
Turn It On
•   0 facets:
    •   http://host:80/solr/select?q=foo

•   1 facet:
    •   http://host:80/solr/select?q=foo&facet=true&facet.field=category

•   N facets:
    •   http://host:80/solr/select?q=foo&facet=true&facet.field=category&facet.field=inStock

•   facet=true or facet=on
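A client would usually assemble these URLs programmatically. The following is a minimal sketch using only the Python standard library; the helper name `facet_url` is invented here, and the base URL is the placeholder host from the slide, not a real server.

```python
from urllib.parse import urlencode

def facet_url(base, query, facet_fields):
    """Build a Solr select URL, adding facet parameters only when needed."""
    params = [("q", query)]
    if facet_fields:
        params.append(("facet", "true"))
        # Repeating facet.field once per field asks for N facets.
        params.extend(("facet.field", field) for field in facet_fields)
    return base + "?" + urlencode(params)

base = "http://host:80/solr/select"  # placeholder host taken from the slide
no_facets = facet_url(base, "foo", [])
two_facets = facet_url(base, "foo", ["category", "inStock"])
```

Passing the parameters as a list of pairs, rather than a dict, is what allows the same key (`facet.field`) to appear more than once.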
Text Facet Response
<result numFound="4" start="0"/>
<lst name="facet_counts">
<lst name="facet_fields">
 <lst name="category">
     <int name="electronics">3</int>
     <int name="copier">0</int>
 </lst>
 <lst name="inStock">
     <int name="false">3</int>
     <int name="true">1</int>
 </lst>
</lst>
</lst>

•   facet.mincount=1 to avoid 0-count facet values
•   facet.limit=N to limit to top N facet values
•   facet.missing=true to catch uncategorized
•   lots of other options!
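On the client side, a response like this can be turned into a plain dictionary of counts. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `parse_facet_fields` is invented, and the fragment is wrapped in a `<response>` root element (as in a full Solr XML response) so that it parses as a single document.

```python
import xml.etree.ElementTree as ET

# The slide's fragment, wrapped in a <response> root so it is well-formed.
xml_text = """<response>
<result numFound="4" start="0"/>
<lst name="facet_counts">
 <lst name="facet_fields">
  <lst name="category">
   <int name="electronics">3</int>
   <int name="copier">0</int>
  </lst>
  <lst name="inStock">
   <int name="false">3</int>
   <int name="true">1</int>
  </lst>
 </lst>
</lst>
</response>"""

def parse_facet_fields(text):
    """Return {field name: {facet value: count}} from a Solr XML response."""
    root = ET.fromstring(text)
    fields = root.find("./lst[@name='facet_counts']/lst[@name='facet_fields']")
    return {
        field.get("name"): {
            value.get("name"): int(value.text) for value in field.findall("int")
        }
        for field in fields.findall("lst")
    }

facets = parse_facet_fields(xml_text)
```

The zero count for `copier` is the kind of entry that `facet.mincount=1` would suppress on the server side.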
Date Facets
•   http://.../solr/select/?q=*:*&rows=0&facet=true&facet.date=timestamp&facet.date.start=NOW/DAY-5DAYS&facet.date.end=NOW/DAY%2B1DAY&facet.date.gap=%2B1DAY
•   (%2B is the URL-encoded form of +, so %2B1DAY ==> +1DAY)
•   Solr Date Math Parser syntax: /HOUR, +2YEARS, -1DAY, /DAY+6MONTHS+3DAYS, +6MONTHS+3DAYS/DAY
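The `+` to `%2B` escaping is easy to get wrong by hand, because an unescaped `+` in a query string is read as a space. A small sketch with Python's standard `urlencode` shows the encoding being applied automatically; the `safe` argument is only there to keep `/`, `:` and `*` readable, as they are in the slide's URL.

```python
from urllib.parse import urlencode

# Date facet parameters written with a literal "+", as one would type them.
params = [
    ("q", "*:*"),
    ("rows", "0"),
    ("facet", "true"),
    ("facet.date", "timestamp"),
    ("facet.date.start", "NOW/DAY-5DAYS"),
    ("facet.date.end", "NOW/DAY+1DAY"),
    ("facet.date.gap", "+1DAY"),
]

# urlencode percent-encodes "+" as %2B; "/", ":" and "*" are left as-is.
query_string = urlencode(params, safe="/:*")
```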
Date Facet Response
<result name="response" numFound="42" start="0"/>
<lst name="facet_counts">
<lst name="facet_dates">
 <lst name="timestamp">
     <int name="2007-08-11T00:00:00.000Z">1</int>
     <int name="2007-08-12T00:00:00.000Z">5</int>
     <int name="2007-08-13T00:00:00.000Z">3</int>
     <int name="2007-08-14T00:00:00.000Z">7</int>
     <int name="2007-08-15T00:00:00.000Z">2</int>
     <int name="2007-08-16T00:00:00.000Z">16</int>
     <str name="gap">+1DAY</str>
     <date name="end">2007-08-17T00:00:00Z</date>
 </lst>
Query Facets
•   http://.../solr/select?q=shoes&rows=0&facet=true&facet.field=inStock&facet.query=price:[*+TO+500]&facet.query=price:[500+TO+*]
•   Avoids the bucket-at-index-time work-around
•   Keep queries disjoint (square brackets are inclusive, so the two ranges above both count a document priced at exactly 500)
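One way to keep range queries disjoint is to generate them from a list of boundaries instead of typing them by hand. The sketch below is an illustration, not Solr functionality: the helper `range_queries` is invented, and it assumes the field holds whole numbers, so that `[* TO 499]` and `[500 TO *]` cannot match the same document.

```python
from urllib.parse import urlencode

def range_queries(field, boundaries):
    """Build inclusive, non-overlapping range queries from sorted boundaries.

    Assumes integer field values: each bucket ends one below the next start.
    """
    queries = []
    lower = "*"
    for bound in boundaries:
        queries.append(f"{field}:[{lower} TO {bound - 1}]")
        lower = bound
    queries.append(f"{field}:[{lower} TO *]")
    return queries

buckets = range_queries("price", [100, 500])

# One facet.query parameter per bucket; urlencode handles the escaping.
params = [("q", "shoes"), ("rows", "0"), ("facet", "true")]
params += [("facet.query", query) for query in buckets]
query_string = urlencode(params)
```

With disjoint buckets the facet counts add up to the number of matching documents that have a price, which is what users tend to expect from a price filter.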
Query Facet Response
<result numFound="3" start="0"/>
<lst name="facet_counts">
<lst name="facet_queries">
 <int name="price:[* TO 500]">3</int>
 <int name="price:[500 TO *]">1</int>
</lst>
<lst name="facet_fields">
 <lst name="inStock">
     <int name="false">3</int>
     <int name="true">1</int>
 </lst>
</lst>
</lst>
UI Integration
•   Use Filter Queries via fq
•   http://.../solr/select?q=shoes&facet=true&facet.field=category&fq=price:[0 TO 300]
•   http://.../solr/select?q=shoes&facet=true&facet.field=category&fq=price:[0 TO 300]&fq=inStock:true
•   Important: a single request returns both the filtered results and the facet counts
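In a UI, each facet click typically adds or extends a filter query. A minimal sketch of that mapping follows, with the helper names `filter_queries` and `refined_url`, the field names, and the host all chosen for illustration. It relies on two Solr behaviours: separate `fq` parameters are intersected (AND), and values inside one parenthesised group can be joined with OR. Values are wrapped in double quotes so that spaces survive; escaping of quotes inside values is not handled here.

```python
from urllib.parse import urlencode

def filter_queries(selections):
    """Map {facet field: [chosen values]} to one fq string per facet.

    Values within a facet are OR-ed; Solr ANDs the separate fq parameters.
    """
    fqs = []
    for field, values in selections.items():
        if values:
            quoted = " OR ".join(f'"{value}"' for value in values)
            fqs.append(f"{field}:({quoted})")
    return fqs

def refined_url(base, query, facet_fields, selections):
    """Build one request that returns filtered results and facet counts."""
    params = [("q", query), ("facet", "true")]
    params += [("facet.field", field) for field in facet_fields]
    params += [("fq", fq) for fq in filter_queries(selections)]
    return base + "?" + urlencode(params)

# Hypothetical UI state: two categories ticked, one brand ticked.
selections = {"category": ["sneakers", "boots"], "brand": ["Acme"]}
fqs = filter_queries(selections)
url = refined_url("http://host:80/solr/select", "shoes", ["category"], selections)
```

Keeping the user's text in `q` and the clicks in `fq` means the facet selections narrow the result set without changing how the remaining results are ranked.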
State of Lucene & Solr
•   Super healthy community, exploding development
•   Lucene 3.0 – 2009-11-25:
       •   Performance, faster range queries, clean API, better Unicode support, more non-English support
•   Solr 1.4 – 2009-11-10:
       •   Performance, new replication, DB indexing, rich-doc indexing, results clustering, faster response protocol, deduplication...
Lucene, Solr, Enterprise
•   Free: Community
       •   Lucene ~600 emails/month (dev: 2000/month)
       •   Solr ~1300 emails/month (dev: 800/month)

•   Commercial: Support Subscriptions
       •   Sematext
       •   Lucid Imagination

Faceted Search Nycto Talk

  • 1. Faceted Search New York CTO Club December 9, 2009 Daniel Tunkelang, Google Otis Gospodneti!, Sematext
  • 2. Agenda Daniel: ! What is faceted search? ! Why use faceted search? ! Thoughts about design and user experience. Otis: ! What are Lucene and Solr? ! Why use an open-source search library? ! Thoughts about implementation.
  • 3. “Regular” Search Interface: ! User expresses information need as short query. ! Search engine returns ranked, pageable result set. User happy when... ! Top-ranked result satisfies information need. ! At least some result on first page is relevant. User unhappy when... ! No result on first page satisfies information need. ! Results misleadingly appear relevant (bait and switch).
  • 4. Relevance Is Subjective Relevance is defined as a measure of information conveyed by a document relative to a query. It is shown that the relationship between the document and the query, though necessary, is not sufficient to determine relevance. William Goffman, On relevance as a measure, 1964.
  • 6. Assumptions Are Dangerous ! self-awareness tf-idf PageRank ! self-expression ! model knows best ! answer is a document ! one-shot query
  • 7. What is Faceted Search? ! Best understood through examples. " See the following slides. " Or shop on almost any ecommerce site. ! Facets = multiple ways to organize information. " Often based on available structured information. " But not always, e.g., facets obtained via text mining. ! Typical interaction: " User starts with a full-text search. " Facets guide query refinement process.
  • 10. Faceted Search for Breakfast
  • 12. But Facets are Not a Silver Bullet... Screen real estate is finite: choose facets wisely; choose facet values wisely for monster facets. Multiple selection within a facet is powerful, but it has to be intuitive, especially AND vs. OR, and it is even trickier for hierarchical facets. Search relevance still matters! Most faceted search applications rank results. Irrelevant results lead to irrelevant facet refinements.
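One common reading of the AND vs. OR question is "OR within a facet, AND across facets". The sketch below illustrates that convention on a toy in-memory catalog; the documents, field names, and the `refine` helper are hypothetical and are not taken from the talk.

```python
from typing import Dict, List, Set

# Hypothetical toy catalog; field names and values are illustrative only.
DOCS: List[Dict[str, str]] = [
    {"id": "1", "category": "camera", "brand": "acme"},
    {"id": "2", "category": "camera", "brand": "zenit"},
    {"id": "3", "category": "copier", "brand": "acme"},
    {"id": "4", "category": "phone", "brand": "zenit"},
]


def refine(docs: List[Dict[str, str]],
           selections: Dict[str, Set[str]]) -> List[Dict[str, str]]:
    """Keep documents that match every facet with selections (AND across
    facets), where a facet matches if the document carries any one of the
    selected values (OR within a facet)."""
    return [
        doc for doc in docs
        if all(doc.get(field) in values
               for field, values in selections.items() if values)
    ]


# Two values picked in "category" (OR), one in "brand" (ANDed with category).
selected = {"category": {"camera", "copier"}, "brand": {"acme"}}
matching_ids = [doc["id"] for doc in refine(DOCS, selected)]
print(matching_ids)
```

Other conventions are possible (for example AND within a multivalued facet), which is part of why the interface has to make the choice obvious to the user.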
  • 14. Deliver Precision and Recall. Easier said than done! Ranking of facet values is an open research topic.
  • 15. Be Careful with Faceted Search! Cameras have artists?!
  • 17. Take-Aways. Faceted search addresses the subjectivity of relevance and information overload. But deploying faceted search effectively requires that you think about user experience. Recommended reading: my thin book entitled Faceted Search; Marti Hearst's book on Search User Interfaces; Peter Morville's upcoming book on Search Patterns.
  • 18. Faceted Search with Lucene & Solr. Otis Gospodnetić, Sematext
  • 19. What is / isn't Lucene. Free, ASL, Java IR library, Jar. Doug Cutting, ASF, 2001. Application agnostic: indexing & searching. High performance, scalable. No dependencies. Heavily ported. No: crawler, rich doc parser, turn-key solution. No: out-of-the-box faceted search capability... but...
  • 21. What is / isn't Solr. Indexing/search server with an HTTP API, built on top of Lucene. Fast & scalable (distributed search, index replication). XML, JSON, Ruby, Perl, PHP, javabin. No: crawler (but Nutch ==> Solr works). Yes: rich text parser. Yes: faceted search out of the box!
  • 22. Solr and Faceted Search. 3 types of facets: field values (text), dates, queries. “Text”: return counts for all/top terms in a field for a result set, e.g., categories à la Amazon. Dates: return counts for docs in specified date ranges. Queries: return counts for docs that also match a given query; handy for number ranges (think prices!).
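To make the difference between field-value facets and query facets concrete, here is a minimal in-memory sketch of what the counts mean. It does not use Solr at all; the result set, field names, and helper functions are assumptions made up for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical result set for some user query; fields are illustrative.
RESULTS: List[dict] = [
    {"category": "electronics", "price": 120},
    {"category": "electronics", "price": 650},
    {"category": "books", "price": 15},
]


def field_facet(docs: List[dict], field: str) -> Counter:
    """Field-value facet: count documents per distinct value of one field."""
    return Counter(doc[field] for doc in docs if field in doc)


def query_facet(docs: List[dict],
                queries: Dict[str, Callable[[dict], bool]]) -> Dict[str, int]:
    """Query facet: count documents that also match each named predicate."""
    return {label: sum(1 for doc in docs if matches(doc))
            for label, matches in queries.items()}


category_counts = field_facet(RESULTS, "category")
price_counts = query_facet(RESULTS, {
    "under 500": lambda doc: doc["price"] < 500,
    "500 and up": lambda doc: doc["price"] >= 500,
})
print(dict(category_counts), price_counts)
```

A date facet can be thought of as a family of query facets, one per date range, generated from a start, an end, and a gap.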
  • 23. Facet Field Requirements. Must be indexed. Often not tokenized. Often not altered (lowercase, punctuation). Storing not required. Multivalued fields OK.
  • 24. Turn It On. 0 facets: http://host:80/solr/select?q=foo 1 facet: http://host:80/solr/select?q=foo&facet=true&facet.field=category N facets: http://host:80/solr/select?q=foo&facet=true&facet.field=category&facet.field=inStock Enable with facet=true or facet=on.
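The three URLs above differ only in how many `facet.field` parameters are appended. A small sketch of building them programmatically follows; the `select_url` helper is hypothetical, and the host and field names are simply the placeholders from the slide.

```python
from typing import Iterable
from urllib.parse import urlencode


def select_url(base: str, q: str, facet_fields: Iterable[str] = ()) -> str:
    """Build a Solr select URL, repeating facet.field once per facet."""
    params = [("q", q)]
    fields = list(facet_fields)
    if fields:
        # Faceting is only switched on when at least one facet is requested.
        params.append(("facet", "true"))
        params.extend(("facet.field", field) for field in fields)
    return f"{base}/select?{urlencode(params)}"


no_facets = select_url("http://host:80/solr", "foo")
two_facets = select_url("http://host:80/solr", "foo", ["category", "inStock"])
print(no_facets)
print(two_facets)
```

Passing a list of pairs rather than a dict to `urlencode` is what allows the same parameter name to appear more than once.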
  • 25. Text Facet Response. Options: facet.mincount=1 to avoid 0-count facet values; facet.limit=N to limit to top N facet values; facet.missing=true to catch uncategorized; lots of other options! Response: <result numFound="4" start="0"/> <lst name="facet_counts"> <lst name="facet_fields"> <lst name="category"> <int name="electronics">3</int> <int name="copier">0</int> </lst> <lst name="inStock"> <int name="false">3</int> <int name="true">1</int> </lst> </lst> </lst>
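A client usually turns this response into a per-field mapping of value to count. The sketch below does that with the standard library; it assumes the fragment from the slide is wrapped in a `<response>` root element, and the `min_count` argument is a hypothetical client-side stand-in for `facet.mincount`.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# The slide's fragment, wrapped in an assumed <response> root element.
SAMPLE = """<response>
<result numFound="4" start="0"/>
<lst name="facet_counts">
 <lst name="facet_fields">
  <lst name="category">
   <int name="electronics">3</int>
   <int name="copier">0</int>
  </lst>
  <lst name="inStock">
   <int name="false">3</int>
   <int name="true">1</int>
  </lst>
 </lst>
</lst>
</response>"""


def parse_field_facets(xml_text: str,
                       min_count: int = 0) -> Dict[str, Dict[str, int]]:
    """Return {field: {value: count}}, dropping values below min_count."""
    root = ET.fromstring(xml_text)
    path = "./lst[@name='facet_counts']/lst[@name='facet_fields']/lst"
    facets: Dict[str, Dict[str, int]] = {}
    for field in root.findall(path):
        counts = {item.get("name"): int(item.text)
                  for item in field.findall("int")}
        facets[field.get("name")] = {
            value: n for value, n in counts.items() if n >= min_count
        }
    return facets


all_facets = parse_field_facets(SAMPLE)
non_empty = parse_field_facets(SAMPLE, min_count=1)
print(all_facets)
print(non_empty)
```

In practice it is cheaper to send `facet.mincount=1` with the request than to filter afterwards; the client-side filter is shown only to make the effect visible.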
  • 26. Date Facets. http://.../solr/select/?q=*:*&rows=0&facet=true&facet.date=timestamp&facet.date.start=NOW/DAY-5DAYS&facet.date.end=NOW/DAY%2B1DAY&facet.date.gap=%2B1DAY (%2B1 ==> +1). Solr Date Math Parser syntax: /HOUR, +2YEARS, -1DAY, /DAY+6MONTHS+3DAYS, +6MONTHS+3DAYS/DAY
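The `%2B` in the URL matters because a literal `+` in a query string is decoded as a space. A minimal sketch of letting the standard library do the encoding is shown below; the parameter values are the ones from the slide, and keeping `/`, `:` and `*` unescaped is a readability choice made here, not a Solr requirement.

```python
from urllib.parse import urlencode

# Date math is written with a literal "+" and encoded when the URL is built.
params = {
    "q": "*:*",
    "rows": 0,
    "facet": "true",
    "facet.date": "timestamp",
    "facet.date.start": "NOW/DAY-5DAYS",
    "facet.date.end": "NOW/DAY+1DAY",
    "facet.date.gap": "+1DAY",
}

# safe="/:*" leaves those characters readable; "+" is still escaped as %2B.
query_string = urlencode(params, safe="/:*")
print(query_string)
```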
  • 27. Date Facet Response <result name="response" numFound="42" start="0"/> <lst name="facet_counts"> <lst name="facet_dates"> <lst name="timestamp"> <int name="2007-08-11T00:00:00.000Z">1</int> <int name="2007-08-12T00:00:00.000Z">5</int> <int name="2007-08-13T00:00:00.000Z">3</int> <int name="2007-08-14T00:00:00.000Z">7</int> <int name="2007-08-15T00:00:00.000Z">2</int> <int name="2007-08-16T00:00:00.000Z">16</int> <str name="gap">+1DAY</str> <date name="end">2007-08-17T00:00:00Z</date> </lst>
  • 28. Query Facets. http://.../solr/select?q=shoes&rows=0&facet=true&facet.field=inStock&facet.query=price:[*+TO+500]&facet.query=price:[500+TO+*] Avoids the bucket-at-index-time work-around. Keep queries disjoint.
  • 29. Query Facet Response <result numFound="3" start="0"/> <lst name="facet_counts"> <lst name="facet_queries"> <int name="price:[* TO 500]">3</int> <int name="price:[500 TO *]">1</int> </lst> <lst name="facet_fields"> <lst name="inStock"> <int name="false">3</int> <int name="true">1</int> </lst> </lst> </lst>
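Square brackets in a range query are inclusive, so two ranges that share an endpoint, such as `[* TO 500]` and `[500 TO *]`, would both count a document priced at exactly 500. One way to follow the "keep queries disjoint" advice is sketched below; it assumes integer-valued prices, and the `price_buckets` helper is hypothetical.

```python
from typing import List, Optional
from urllib.parse import parse_qs, urlencode


def price_buckets(boundaries: List[int], field: str = "price") -> List[str]:
    """Build non-overlapping inclusive range queries.

    Assumes integer values, so the bucket below a boundary b ends at b - 1.
    """
    edges: List[Optional[int]] = [None, *sorted(boundaries), None]
    queries = []
    for low, high in zip(edges, edges[1:]):
        lower = "*" if low is None else str(low)
        upper = "*" if high is None else str(high - 1)
        queries.append(f"{field}:[{lower} TO {upper}]")
    return queries


buckets = price_buckets([100, 500])
params = [("q", "shoes"), ("rows", 0), ("facet", "true")]
params += [("facet.query", bucket) for bucket in buckets]
query_string = urlencode(params)
print(buckets)
print(query_string)
```

For non-integer prices this boundary trick does not apply directly, and the bucket edges would need a different treatment.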
  • 30. UI Integration. Use filter queries via fq: http://.../solr/select?q=shoes&facet=true&facet.field=category&fq=price:[0+TO+300] and, after a second refinement, http://.../solr/select?q=shoes&facet=true&facet.field=category&fq=price:[0+TO+300]&fq=inStock:true Important: a single request does it all.
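In a UI, each facet click typically adds one `fq` to the current state and the whole request is reissued, so results and refreshed facet counts come back together. A rough sketch of that loop follows; the `SearchState` class and its method names are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import List
from urllib.parse import parse_qs, urlencode


@dataclass
class SearchState:
    """Hypothetical client-side state: the user query plus applied filters."""
    q: str
    facet_fields: List[str]
    filters: List[str] = field(default_factory=list)

    def refine(self, filter_query: str) -> "SearchState":
        """Return a new state with one more filter query applied."""
        return SearchState(self.q, self.facet_fields,
                           [*self.filters, filter_query])

    def query_string(self) -> str:
        """One request carries the query, the facets, and every filter."""
        params = [("q", self.q), ("facet", "true")]
        params += [("facet.field", name) for name in self.facet_fields]
        params += [("fq", fq) for fq in self.filters]
        return urlencode(params)


state = SearchState("shoes", ["category"])
state = state.refine("price:[0 TO 300]").refine("inStock:true")
qs = state.query_string()
print(qs)
```

Returning a new state on each refinement keeps earlier states intact, which makes removing a filter (for example from a breadcrumb) straightforward.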
  • 31. State of Lucene & Solr. Super healthy community, exploding development. Lucene 3.0 (2009-11-25): performance, faster range queries, clean API, better Unicode support, more non-English support. Solr 1.4 (2009-11-10): performance, new replication, DB indexing, rich-doc indexing, results clustering, faster response protocol, deduplication...
  • 32. Lucene, Solr, Enterprise. Free: community. Lucene ~600 emails/month (dev: 2000/month). Solr ~1300 emails/month (dev: 800/month). Commercial: support subscriptions from Sematext and Lucid Imagination.