Faceted Search
New York CTO Club
December 9, 2009
Daniel Tunkelang, Google
Otis Gospodnetić, Sematext

Agenda
Daniel:
- What is faceted search?
- Why use faceted search?
- Thoughts about design and user experience.
Otis:
- What are Lucene and Solr?
- Why use an open-source search library?
- Thoughts about implementation.

“Regular” Search
Interface:
- User expresses information need as short query.
- Search engine returns ranked, pageable result set.
User happy when...
- Top-ranked result satisfies information need.
- At least some result on first page is relevant.
User unhappy when...
- No result on first page satisfies information need.
- Results misleadingly appear relevant (bait and switch).

Relevance Is Subjective
Relevance is defined as a measure of information conveyed by a document relative to a query. It is shown that the relationship between the document and the query, though necessary, is not sufficient to determine relevance.
William Goffman, On relevance as a measure, 1964.

Regular Search Experience

Assumptions Are Dangerous
tf-idf, PageRank
- self-awareness
- self-expression
- model knows best
- answer is a document
- one-shot query

What is Faceted Search?
- Best understood through examples.
  - See the following slides.
  - Or shop on almost any ecommerce site.
- Facets = multiple ways to organize information.
  - Often based on available structured information.
  - But not always, e.g., facets obtained via text mining.
- Typical interaction:
  - User starts with a full-text search.
  - Facets guide query refinement process.

Faceted Search for News

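The typical interaction listed under "What is Faceted Search?" (full-text search first, then facet-guided refinement) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy catalog, its field names, and the helper functions `search`, `facet_counts`, and `refine` are hypothetical and do not come from the talk or from any search library.

```python
from collections import Counter

# Hypothetical toy catalog; titles and field names are illustrative only.
DOCS = [
    {"title": "red running shoes", "category": "shoes", "brand": "Acme"},
    {"title": "blue running shorts", "category": "apparel", "brand": "Acme"},
    {"title": "trail running shoes", "category": "shoes", "brand": "Zenith"},
    {"title": "leather dress shoes", "category": "shoes", "brand": "Zenith"},
]


def search(docs, query):
    """Naive full-text match: every query term must appear in the title."""
    terms = query.lower().split()
    return [d for d in docs if all(t in d["title"].split() for t in terms)]


def facet_counts(results, field):
    """Count how many results carry each value of the given facet field."""
    return Counter(d[field] for d in results)


def refine(results, field, value):
    """Narrow the result set to documents with the chosen facet value."""
    return [d for d in results if d[field] == value]


# Step 1: the user starts with a full-text search.
results = search(DOCS, "running")
# Step 2: facet counts over the result set suggest refinements.
category_counts = facet_counts(results, "category")
# Step 3: the user picks a facet value and the result set shrinks.
refined = refine(results, "category", "shoes")
brand_counts = facet_counts(refined, "brand")
```

The counts are always computed over the current result set, so every refinement offered leads to at least one result.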
Faceted Search for People

Faceted Search for Breakfast

But Facets are Not a Silver Bullet...
- Screen real estate is finite.
  - Choose facets wisely.
  - Choose facet values wisely for monster facets.
- Multiple selection within a facet is powerful, but...
  - Has to be intuitive, especially AND vs. OR.
  - Even trickier for hierarchical facets.
- Search relevance still matters!
  - Most faceted search applications rank results.
  - Irrelevant results ==> irrelevant facet refinements.

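The AND vs. OR point about multiple selection within a facet can be made concrete with a small Python sketch. The data, the `features` field, and the `select` helper are hypothetical, chosen only to show how the same two selected values give different result sets under each interpretation.

```python
# Hypothetical documents with a multivalued facet field called "features".
DOCS = [
    {"name": "A", "features": ["wifi", "outdoor"]},
    {"name": "B", "features": ["wifi"]},
    {"name": "C", "features": ["outdoor", "parking"]},
    {"name": "D", "features": ["parking"]},
]


def select(docs, field, chosen, mode="OR"):
    """Filter docs by several selected values of one facet.

    OR keeps documents that have at least one selected value;
    AND keeps only documents that have every selected value.
    """
    chosen = set(chosen)
    if mode == "OR":
        return [d for d in docs if chosen & set(d[field])]
    if mode == "AND":
        return [d for d in docs if chosen <= set(d[field])]
    raise ValueError("mode must be 'AND' or 'OR'")


either = [d["name"] for d in select(DOCS, "features", ["wifi", "outdoor"], "OR")]
both = [d["name"] for d in select(DOCS, "features", ["wifi", "outdoor"], "AND")]
```

Here the OR reading returns A, B, and C, while the AND reading returns only A, which is why the interface has to make the chosen semantics obvious to the user.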
Exploring Information Science

Deliver Precision and Recall
Easier said than done!
Ranking of facet values is an open research topic.

Be Careful with Faceted Search!
Cameras have artists?!

Clarify, Then Refine

Take-Aways
- Faceted search addresses the subjectivity of relevance and information overload.
- But deploying faceted search effectively requires that you think about user experience.
- Recommended reading:
  - My thin book entitled Faceted Search
  - Marti Hearst's book on Search User Interfaces
  - Peter Morville's upcoming book on Search Patterns

Faceted Search with Lucene & Solr
Otis Gospodnetić, Sematext

What is / isn't Lucene
- Free, ASL, Java IR library, Jar
- Doug Cutting, ASF, 2001
- Application agnostic: Indexing & Searching
- High performance, scalable
- No dependencies
- Heavily ported
- No: crawler, rich doc parser, turn-key solution
- No: out of the box faceted search capability... but...

What is/isn't Solr
- Indexing/Search server with HTTP API built on top of Lucene
- Fast & scalable (distributed search, index replication)
- XML, JSON, Ruby, Perl, PHP, javabin
- No: crawler (but Nutch ==> Solr works)
- Yes: rich text parser
- Yes: Faceted Search out of the box!

Solr and Faceted Search
- 3 types of facets: Field Values (text), Dates, Queries.
- “Text”: return counts for all/top terms in a field for a result set, e.g. categories a la Amazon
- Dates: return counts for docs in specified date ranges
- Queries: return counts for docs that also match a given query; handy for number ranges (think prices!)

Facet Field Requirements
- Must be indexed
- Often not tokenized
- Often not altered (lowercase, punctuation)
- Storing not required
- Multivalued fields OK

Turn It On
- 0 facets:
  - http://host:80/solr/select?q=foo
- 1 facet:
  - http://host:80/solr/select?q=foo&facet=true&facet.field=category
- N facets:
  - http://host:80/solr/select?q=foo&facet=true&facet.field=category&facet.field=inStock
- facet=true or facet=on

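The three request shapes on the "Turn It On" slide differ only in how many `facet.field` parameters are appended. A minimal Python sketch of that pattern follows; the `facet_url` helper is hypothetical, and the base URL simply reuses the `http://host:80/solr/select` placeholder from the slide.

```python
from urllib.parse import urlencode

# Placeholder endpoint copied from the slide; replace with a real Solr host.
BASE = "http://host:80/solr/select"


def facet_url(query, facet_fields=()):
    """Build a select URL with zero or more field facets.

    A list of (key, value) pairs is used instead of a dict because
    facet.field may legitimately appear several times.
    """
    params = [("q", query)]
    if facet_fields:
        params.append(("facet", "true"))
        params.extend(("facet.field", name) for name in facet_fields)
    return BASE + "?" + urlencode(params)


no_facets = facet_url("foo")
one_facet = facet_url("foo", ["category"])
n_facets = facet_url("foo", ["category", "inStock"])
```

Each returned string matches the corresponding URL on the slide, so the helper can be pointed at a running Solr instance by changing only `BASE`.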
Text Facet Response
<result numFound="4" start="0"/>
<lst name="facet_counts">
<lst name="facet_fields">
 <lst name="category">
     <int name="electronics">3</int>
     <int name="copier">0</int>
 </lst>
 <lst name="inStock">
     <int name="false">3</int>
     <int name="true">1</int>
 </lst>
</lst>
</lst>
- facet.mincount=1 to avoid 0-count facet values
- facet.limit=N to limit to top N facet values
- facet.missing=true to catch uncategorized
- lots of other options!

Date Facets
- http://.../solr/select/?q=*:*&rows=0&facet=true&facet.date=timestamp&facet.date.start=NOW/DAY-5DAYS&facet.date.end=NOW/DAY%2B1DAY&facet.date.gap=%2B1DAY
- (%2B1 ==> +1)
- Solr Date Math Parser syntax: /HOUR, +2YEARS, -1DAY, /DAY+6MONTHS+3DAYS, +6MONTHS+3DAYS/DAY

Date Facet Response
<result name="response" numFound="42" start="0"/>
<lst name="facet_counts">
<lst name="facet_dates">
 <lst name="timestamp">
     <int name="2007-08-11T00:00:00.000Z">1</int>
     <int name="2007-08-12T00:00:00.000Z">5</int>
     <int name="2007-08-13T00:00:00.000Z">3</int>
     <int name="2007-08-14T00:00:00.000Z">7</int>
     <int name="2007-08-15T00:00:00.000Z">2</int>
     <int name="2007-08-16T00:00:00.000Z">16</int>
     <str name="gap">+1DAY</str>
     <date name="end">2007-08-17T00:00:00Z</date>
 </lst>
</lst>
</lst>

Query Facets
- http://.../solr/select?q=shoes&rows=0&facet=true&facet.field=inStock&facet.query=price:[*+TO+500]&facet.query=price:[500+TO+*]
- Avoids the bucket-at-index-time work-around
- Keep queries disjoint

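The "%2B1 ==> +1" note on the Date Facets slide reflects URL encoding: a literal `+` in a query string is read as a space, so the plus signs in date math must be percent-encoded, while the spaces in range queries become `+`. The Python sketch below shows one way to get both right. The helper names are hypothetical, and only the query string is built because the slides elide the host.

```python
from urllib.parse import urlencode

# Characters left unescaped so the output resembles the slide URLs.
# This choice is an assumption for readability, not a Solr requirement.
SAFE = "/:*[]"


def date_facet_query(field, start, end, gap):
    """Query string for a date facet; '+' in date math becomes %2B."""
    params = [
        ("q", "*:*"),
        ("rows", 0),
        ("facet", "true"),
        ("facet.date", field),
        ("facet.date.start", start),
        ("facet.date.end", end),
        ("facet.date.gap", gap),
    ]
    return urlencode(params, safe=SAFE)


def query_facet_query(q, field, ranges):
    """Query string with one facet.query per range; spaces become '+'."""
    params = [("q", q), ("rows", 0), ("facet", "true"), ("facet.field", field)]
    params.extend(("facet.query", r) for r in ranges)
    return urlencode(params, safe=SAFE)


date_qs = date_facet_query("timestamp", "NOW/DAY-5DAYS", "NOW/DAY+1DAY", "+1DAY")
range_qs = query_facet_query(
    "shoes", "inStock", ["price:[* TO 500]", "price:[500 TO *]"]
)
```

Writing the date math and range queries in their natural form and letting `urlencode` do the escaping avoids hand-editing `%2B` into URLs.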
Query Facet Response
<result numFound="3" start="0"/>
<lst name="facet_counts">
<lst name="facet_queries">
 <int name="price:[* TO 500]">3</int>
 <int name="price:[500 TO *]">1</int>
</lst>
<lst name="facet_fields">
 <lst name="inStock">
     <int name="false">3</int>
     <int name="true">1</int>
 </lst>
</lst>
</lst>

UI Integration
- Use Filter Queries via fq
- http://.../solr/select?q=shoes&facet=true&facet.field=category&fq=price:[0 TO 300]
- http://.../solr/select?q=shoes&facet=true&facet.field=category&fq=price:[0 TO 300]&fq=inStock:true
- Important: single request does it all

State of Lucene & Solr
- Super healthy community, exploding development
- Lucene 3.0 – 2009-11-25:
  - Performance, faster range queries, clean API, better Unicode support, more non-English support
- Solr 1.4 – 2009-11-10:
  - Performance, new replication, Db indexing, rich-doc indexing, results clustering, faster response protocol, deduplication...

Lucene, Solr, Enterprise
- Free: Community
  - Lucene ~600 emails/month (dev: 2000/month)
  - Solr ~1300 emails/month (dev: 800/month)
- Commercial: Support Subscriptions
  - Sematext
  - Lucid Imagination
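Returning to the Query Facet Response and UI Integration slides: a UI typically reads the facet counts from the response and turns each value into a link that repeats the search with one more `fq`. The Python sketch below shows that loop under stated assumptions. The sample XML is the slide's fragment wrapped in a `<response>` root so that it parses as a single document, and `parse_facets` and `refinement_query` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Slide fragment wrapped in a <response> root (added here for parsing).
RESPONSE = """<response>
<result numFound="3" start="0"/>
<lst name="facet_counts">
 <lst name="facet_queries">
  <int name="price:[* TO 500]">3</int>
  <int name="price:[500 TO *]">1</int>
 </lst>
 <lst name="facet_fields">
  <lst name="inStock">
   <int name="false">3</int>
   <int name="true">1</int>
  </lst>
 </lst>
</lst>
</response>"""


def parse_facets(xml_text):
    """Return (query facet counts, field facet counts) as plain dicts."""
    root = ET.fromstring(xml_text)
    counts = root.find("lst[@name='facet_counts']")
    queries = {
        i.get("name"): int(i.text)
        for i in counts.findall("lst[@name='facet_queries']/int")
    }
    fields = {
        f.get("name"): {i.get("name"): int(i.text) for i in f.findall("int")}
        for f in counts.findall("lst[@name='facet_fields']/lst")
    }
    return queries, fields


def refinement_query(base_params, fq):
    """Query string for the same search narrowed by one more filter query."""
    return urlencode(base_params + [("fq", fq)], safe="/:*[]")


queries, fields = parse_facets(RESPONSE)
base = [("q", "shoes"), ("facet", "true"), ("facet.field", "category")]
price_link = refinement_query(base, "price:[0 TO 300]")
stock_link = refinement_query(base + [("fq", "price:[0 TO 300]")], "inStock:true")
```

Because each refinement only appends another `fq` to the same request, results and updated facet counts come back together, which is the "single request does it all" point on the slide.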

Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 

Faceted Search and Solr

  • 1. Faceted Search
       New York CTO Club, December 9, 2009
       Daniel Tunkelang, Google
       Otis Gospodnetić, Sematext

       Agenda
       Daniel:
         - What is faceted search?
         - Why use faceted search?
         - Thoughts about design and user experience.
       Otis:
         - What are Lucene and Solr?
         - Why use an open-source search library?
         - Thoughts about implementation.

       “Regular” Search
       Interface:
         - User expresses information need as short query.
         - Search engine returns ranked, pageable result set.
       User happy when...
         - Top-ranked result satisfies information need.
         - At least some result on first page is relevant.
       User unhappy when...
         - No result on first page satisfies information need.
         - Results misleadingly appear relevant (bait and switch).

       Relevance Is Subjective
       "Relevance is defined as a measure of information conveyed by a document relative to a query. It is shown that the relationship between the document and the query, though necessary, is not sufficient to determine relevance."
       William Goffman, On relevance as a measure, 1964.
  • 2. Regular Search Experience

       Assumptions Are Dangerous
       tf-idf, PageRank
         - self-awareness
         - self-expression
         - model knows best
         - answer is a document
         - one-shot query

       What is Faceted Search?
         - Best understood through examples.
             - See the following slides.
             - Or shop on almost any ecommerce site.
         - Facets = multiple ways to organize information.
             - Often based on available structured information.
             - But not always, e.g., facets obtained via text mining.
         - Typical interaction:
             - User starts with a full-text search.
             - Facets guide query refinement process.

       Faceted Search for News
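The typical interaction described above (a full-text search followed by facet-guided refinement) can be sketched with a toy in-memory collection. This is only an illustration: the documents, the field names `category` and `brand`, and the helper names `search` and `facet_counts` are all hypothetical, and the matching is a naive term lookup rather than a real ranking function.

```python
from collections import Counter

# Hypothetical toy catalog; field names are illustrative only.
DOCS = [
    {"title": "red running shoes", "category": "shoes", "brand": "Acme"},
    {"title": "blue running shorts", "category": "apparel", "brand": "Acme"},
    {"title": "trail running shoes", "category": "shoes", "brand": "Zenith"},
    {"title": "leather dress shoes", "category": "shoes", "brand": "Zenith"},
]


def search(docs, query):
    """Naive full-text match: keep docs whose title contains every query term."""
    terms = query.lower().split()
    return [d for d in docs if all(t in d["title"].split() for t in terms)]


def facet_counts(results, fields):
    """Count how many results carry each value of each facet field."""
    return {f: Counter(d[f] for d in results) for f in fields}


# Step 1: the user starts with a full-text search.
results = search(DOCS, "running")

# Step 2: facet counts over the result set suggest refinements.
counts = facet_counts(results, ["category", "brand"])

# Step 3: the user picks a facet value to narrow the results.
refined = [d for d in results if d["category"] == "shoes"]
```

The counts are computed over the current result set, not the whole collection, which is what lets the facets guide the next refinement step.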
  • 3. Faceted Search for People

       Faceted Search for Breakfast

       But Facets are Not a Silver Bullet...
         - Screen real estate is finite.
             - Choose facets wisely.
             - Choose facet values wisely for monster facets.
         - Multiple selection within a facet is powerful, but...
             - Has to be intuitive, especially AND vs. OR.
             - Even trickier for hierarchical facets.
         - Search relevance still matters!
             - Most faceted search applications rank results.
             - Irrelevant results lead to irrelevant facet refinements.
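The AND vs. OR distinction for multiple selection within a facet can be made concrete with a small sketch. The documents, the multivalued `topics` field, and the `refine` helper below are hypothetical; the point is only that the same two selected values give different result sets under the two interpretations.

```python
# Hypothetical documents with a multivalued "topics" facet.
DOCS = [
    {"id": 1, "topics": {"search", "ux"}},
    {"id": 2, "topics": {"search"}},
    {"id": 3, "topics": {"ux", "databases"}},
]


def refine(docs, field, selected, mode="OR"):
    """Keep docs matching any (OR) or all (AND) of the selected facet values."""
    selected = set(selected)
    if mode == "AND":
        # Every selected value must be present on the document.
        return [d for d in docs if selected <= d[field]]
    if mode == "OR":
        # At least one selected value must be present on the document.
        return [d for d in docs if selected & d[field]]
    raise ValueError(f"unknown mode: {mode}")


or_ids = [d["id"] for d in refine(DOCS, "topics", ["search", "ux"], "OR")]
and_ids = [d["id"] for d in refine(DOCS, "topics", ["search", "ux"], "AND")]
```

Here OR widens the selection to all three documents while AND narrows it to one, which is why the interface has to make the chosen semantics obvious to the user.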
  • 4. Exploring Information Science

       Deliver Precision and Recall
       Easier said than done!

       Be Careful with Faceted Search!
       Cameras have artists?!

       Clarify, Then Refine
       Ranking of facet values is an open research topic.
  • 5. Take-Aways
         - Faceted search addresses the subjectivity of relevance and information overload.
         - But deploying faceted search effectively requires that you think about user experience.
         - Recommended reading:
             - My thin book entitled Faceted Search
             - Marti Hearst's book on Search User Interfaces
             - Peter Morville's upcoming book on Search Patterns

       Faceted Search with Lucene & Solr
       Otis Gospodnetić, Sematext

       What is / isn't Lucene
         - Free, ASL, Java IR library, Jar
         - Doug Cutting, ASF, 2001
         - Application agnostic: Indexing & Searching
         - High performance, scalable
         - No dependencies
         - Heavily ported
         - No: crawler, rich doc parser, turn-key solution
         - No: out of the box faceted search-capability... but...
  • 6. What is/isn't Solr
         - Indexing/Search server with HTTP API built on top of Lucene
         - Fast & scalable (distributed search, index replication)
         - XML, JSON, Ruby, Perl, PHP, javabin
         - No: crawler (but Nutch ==> Solr works)
         - Yes: rich text parser
         - Yes: Faceted Search out of the box!

       Solr and Faceted Search
         - 3 types of facets: Field Values (text), Dates, Queries.
         - "Text": return counts for all/top terms in a field for a result set, e.g. categories a la Amazon
         - Dates: return counts for docs in specified date ranges
         - Queries: return counts for docs that also match a given query; handy for number ranges (think prices!)

       Facet Field Requirements
         - Must be indexed
         - Often not tokenized
         - Often not altered (lowercase, punctuation)
         - Storing not required
         - Multivalued fields OK

       Turn It On
         - 0 facets:
             http://host:80/solr/select?q=foo
         - 1 facet:
             http://host:80/solr/select?q=foo&facet=true&facet.field=category
         - N facets:
             http://host:80/solr/select?q=foo&facet=true&facet.field=category&facet.field=inStock
         - facet=true or facet=on
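The "Turn It On" URLs can be assembled programmatically. The following is a minimal sketch using only the Python standard library; the helper name `facet_url` is hypothetical, and `http://host:80/solr/select` is the placeholder endpoint from the slide, not a real server.

```python
from urllib.parse import urlencode


def facet_url(base, query, facet_fields=()):
    """Build a Solr select URL, adding one facet.field parameter per facet."""
    params = [("q", query)]
    if facet_fields:
        # Faceting is off unless facet=true is sent with the request.
        params.append(("facet", "true"))
        params.extend(("facet.field", field) for field in facet_fields)
    return f"{base}?{urlencode(params)}"


BASE = "http://host:80/solr/select"  # placeholder host from the slide

no_facets = facet_url(BASE, "foo")
one_facet = facet_url(BASE, "foo", ["category"])
n_facets = facet_url(BASE, "foo", ["category", "inStock"])
```

Passing a list of pairs to `urlencode` (rather than a dict) is what allows the `facet.field` parameter to repeat, once per requested facet.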
  • 7. Text Facet Response
         <result numFound="4" start="0"/>
         <lst name="facet_counts">
           <lst name="facet_fields">
             <lst name="category">
               <int name="electronics">3</int>
               <int name="copier">0</int>
             </lst>
             <lst name="inStock">
               <int name="false">3</int>
               <int name="true">1</int>
             </lst>
           </lst>
         </lst>
         - facet.mincount=1 to avoid 0-count facet values
         - facet.limit=N to limit to top N facet values
         - facet.missing=true to catch uncategorized
         - lots of other options!

       Date Facets
         - http://.../solr/select/?q=*:*&rows=0&facet=true&facet.date=timestamp&facet.date.start=NOW/DAY-5DAYS&facet.date.end=NOW/DAY%2B1DAY&facet.date.gap=%2B1DAY
         - (%2B1 ==> +1)
         - Solr Date Math Parser syntax: /HOUR, +2YEARS, -1DAY, /DAY+6MONTHS+3DAYS, +6MONTHS+3DAYS/DAY

       Date Facet Response
         <result name="response" numFound="42" start="0"/>
         <lst name="facet_counts">
           <lst name="facet_dates">
             <lst name="timestamp">
               <int name="2007-08-11T00:00:00.000Z">1</int>
               <int name="2007-08-12T00:00:00.000Z">5</int>
               <int name="2007-08-13T00:00:00.000Z">3</int>
               <int name="2007-08-14T00:00:00.000Z">7</int>
               <int name="2007-08-15T00:00:00.000Z">2</int>
               <int name="2007-08-16T00:00:00.000Z">16</int>
               <str name="gap">+1DAY</str>
               <date name="end">2007-08-17T00:00:00Z</date>
             </lst>
           </lst>
         </lst>

       Query Facets
         - http://.../solr/select?q=shoes&rows=0&facet=true&facet.field=inStock&facet.query=price:[*+TO+500]&facet.query=price:[500+TO+*]
         - Avoids the bucket-at-index-time work-around
         - Keep queries disjoint
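A client typically turns the facet section of such a response into a plain dictionary before rendering it. The sketch below parses the text facet response shown on the slide with the standard library's `xml.etree.ElementTree`. Two things are assumptions made for illustration: the fragment is wrapped in a `<response>` root element so that it parses as a single document, and the helper `parse_facet_fields` with its `min_count` argument is hypothetical (it filters on the client, loosely mirroring what `facet.mincount=1` asks the server to do). The last lines show why the date facet URL writes `%2B1DAY`: a literal `+` in a query string would be decoded as a space.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# The slide's fragment, wrapped in an assumed <response> root so it is well-formed.
XML = """<response>
<result numFound="4" start="0"/>
<lst name="facet_counts">
  <lst name="facet_fields">
    <lst name="category">
      <int name="electronics">3</int>
      <int name="copier">0</int>
    </lst>
    <lst name="inStock">
      <int name="false">3</int>
      <int name="true">1</int>
    </lst>
  </lst>
</lst>
</response>"""


def parse_facet_fields(xml_text, min_count=0):
    """Return {field: {value: count}} from the facet_fields section."""
    root = ET.fromstring(xml_text)
    fields = root.find("./lst[@name='facet_counts']/lst[@name='facet_fields']")
    if fields is None:
        return {}
    return {
        field.get("name"): {
            value.get("name"): int(value.text)
            for value in field.findall("int")
            if int(value.text) >= min_count
        }
        for field in fields.findall("lst")
    }


all_counts = parse_facet_fields(XML)
non_empty = parse_facet_fields(XML, min_count=1)

# urlencode percent-encodes the plus sign, giving the %2B seen in the date facet URL.
gap_param = urlencode({"facet.date.gap": "+1DAY"})
```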
  • 8. Query Facet Response
         <result numFound="3" start="0"/>
         <lst name="facet_counts">
           <lst name="facet_queries">
             <int name="price:[* TO 500]">3</int>
             <int name="price:[500 TO *]">1</int>
           </lst>
           <lst name="facet_fields">
             <lst name="inStock">
               <int name="false">3</int>
               <int name="true">1</int>
             </lst>
           </lst>
         </lst>

       UI Integration
         - Use Filter Queries via fq
         - http://.../solr/select?q=shoes&facet=true&facet.field=category&fq=price:[0 TO 300]
         - http://.../solr/select?q=shoes&facet=true&facet.field=category&fq=price:[0 TO 300]&fq=inStock:true
         - Important: single request does it all

       State of Lucene & Solr
         - Super healthy community, exploding development
         - Lucene 3.0 (2009-11-25):
             - Performance, faster range queries, clean API, better Unicode support, more non-English support
         - Solr 1.4 (2009-11-10):
             - Performance, new replication, Db indexing, rich-doc indexing, results clustering, faster response protocol, deduplication...

       Lucene, Solr, Enterprise
         - Free: Community
             - Lucene ~600 emails/month (dev: 2000/month)
             - Solr ~1300 emails/month (dev: 800/month)
         - Commercial: Support Subscriptions
             - Sematext
             - Lucid Imagination
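The UI integration pattern (each facet value the user clicks becomes another `fq` parameter on the same request) might be wired up roughly as follows. This is a sketch under stated assumptions: the helper name `refined_url` is hypothetical, and `http://host:80/solr/select` is a placeholder endpoint rather than the elided host in the slide URLs.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def refined_url(base, query, facet_fields, filters):
    """Build one request carrying the query, the facets, and all active filters."""
    params = [("q", query), ("facet", "true")]
    params += [("facet.field", field) for field in facet_fields]
    # One fq parameter per selected refinement; Solr intersects them.
    params += [("fq", fq) for fq in filters]
    return f"{base}?{urlencode(params)}"


BASE = "http://host:80/solr/select"  # placeholder endpoint

# The user first restricts the price range, then also requires in-stock items.
selected = ["price:[0 TO 300]", "inStock:true"]
url = refined_url(BASE, "shoes", ["category"], selected)

# Decoding the query string recovers the filters exactly as they were selected.
decoded = parse_qs(urlsplit(url).query)
```

Because the facet counts come back in the same response as the filtered results, the interface can redraw both the result list and the remaining refinements from a single round trip.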