SOLR, THE INTELLIGENT SEARCH ENGINE
Benoît Largeau




AGENDA:
Stakes | Introduction | Indexing | Scalability | Searching | Admin tools | Conclusion
WHAT ARE THE STAKES?

AN INTERNAL SEARCH ENGINE IS ESSENTIAL.

Considering:

-   One user in two is a searcher
    one in two will use the internal search engine

-   Searchers convert more often than other visitors

-   Searchers are less patient when browsing
    they need to find products quickly, otherwise they leave for another shop

     SEARCH -> FIND -> ADD TO CART -> PAY
INTRODUCTION TO SOLR.

SOLR PROJECT.

•   Open source enterprise search server
    Initiated by CNET in 2004
    Source code openly published in 2006

•   Apache Lucene is the underlying engine

•   Independent server that communicates over standards
    such as HTTP / XML / JSON
    usable in any web project
    such as those based on Magento

SOME REFERENCES.




 More references here: http://wiki.apache.org/solr/PublicServers

FEATURES OFFERED BY SOLR.



   Indexing data
   - Index the whole site (including files, …)
   - Tolerance (stemming, synonyms, …)

   Scalability

   Searching data
   - Layered navigation
   - Customizable relevance calculation
   - Predictive search (different kinds)
   - Stemming, plurals, synonyms, stop words, …

   Admin tools
   - Display more statistics
     (most frequent requests or searches with no answer)
INDEXING DATA.


SCHEMA & TEXT ANALYSIS.

Schema

•   Defines how to handle the structured data
    sent by Magento (no crawler such as Nutch is involved)
•   Types the data
    price & weight are floats, product name is a string, …
       o Structured data in Solr allows faceted search
           to filter by price range, for example
•   Determined by the intended search behavior
    if we need to filter by price range
    -> prices have to be stored as floats, not strings, to stay comparable
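
The typing point can be illustrated with a minimal Python sketch. It is not Solr code, and the product names and prices are invented for illustration. It shows how prices compared as strings end up in the wrong order, and how numeric prices can be counted per price range, which is the idea behind a price facet.

```python
from collections import Counter

# Hypothetical catalog: names and prices are invented for illustration.
products = [("cable", 9.5), ("mouse", 25.0), ("keyboard", 49.9), ("monitor", 120.0)]

def sort_by_price_as_string(items):
    """Sort on the price's text form, as an untyped (string) field would."""
    return [name for name, price in sorted(items, key=lambda p: str(p[1]))]

def sort_by_price_as_float(items):
    """Sort on the numeric value, as a float field would."""
    return [name for name, price in sorted(items, key=lambda p: p[1])]

def price_facets(items, gap):
    """Count products per price range [k*gap, (k+1)*gap)."""
    counts = Counter()
    for _, price in items:
        start = int(price // gap) * gap
        counts[(start, start + gap)] += 1
    return dict(counts)
```

With string ordering, "120.0" sorts before "25.0", so the monitor would appear to be the cheapest product. Only the float version gives a usable price order and price ranges.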


Text analysis
    Text is split into terms, which are then processed to apply stemming, synonyms, …
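
As a rough illustration of text analysis, the sketch below chains the usual steps in plain Python: splitting into terms, removing stop words, stemming and synonym mapping. The word lists and the suffix-stripping rule are assumptions made for this example. Solr's own analyzers are configured in the schema and are far more complete.

```python
import re

# Illustrative data only: real analyzers use larger, language-specific resources.
STOP_WORDS = {"the", "a", "an", "of", "for", "and"}
SYNONYMS = {"tv": "television", "notebook": "laptop"}

def stem(term):
    """Very naive plural stripping; not a real stemmer such as Porter."""
    if term.endswith("ies") and len(term) > 4:
        return term[:-3] + "y"
    if term.endswith("s") and not term.endswith("ss") and len(term) > 2:
        return term[:-1]
    return term

def analyze(text):
    """Split text into lowercase terms, drop stop words, stem, then map synonyms."""
    terms = re.findall(r"[a-z0-9]+", text.lower())
    terms = [t for t in terms if t not in STOP_WORDS]
    terms = [stem(t) for t in terms]
    return [SYNONYMS.get(t, t) for t in terms]
```

Because the same analysis is applied to indexed text and to the user's query, a search for "tv" can match a product described with "TVs".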

INDEXING FILES.

•   Solr generally indexes structured data
    e.g. products

•   It can also index binary formats
    such as PDF, MS Office, images or music files

•   This uses the Solr Cell interface,
    which is an adapter to Apache Tika

•   Apache Tika is a toolkit that detects and
    extracts metadata and text content from various document types
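
Solr Cell is reached through Solr's extracting request handler, conventionally mapped to /update/extract. The sketch below shows how a file could be posted to it from Python. The base URL, the core name "products" and the document id are assumptions for illustration, and the handler has to be enabled in the Solr configuration.

```python
import urllib.parse
import urllib.request

def extract_url(base_url, core, doc_id, commit=True):
    """Build the URL of an extract request for one document.

    `literal.id` sets the document's id field; `commit` makes it searchable right away.
    """
    params = {"literal.id": doc_id, "commit": "true" if commit else "false"}
    return f"{base_url}/{core}/update/extract?{urllib.parse.urlencode(params)}"

def index_file(base_url, core, doc_id, path, content_type="application/pdf"):
    """POST a binary file to Solr Cell; needs a running Solr server."""
    with open(path, "rb") as f:
        request = urllib.request.Request(
            extract_url(base_url, core, doc_id),
            data=f.read(),
            headers={"Content-Type": content_type},
            method="POST",
        )
    with urllib.request.urlopen(request) as response:
        return response.status
```

For example, index_file("http://localhost:8983/solr", "products", "manual-1", "manual.pdf") would send a hypothetical manual.pdf for extraction and indexing.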
SCALABILITY.


DURABLE SOLUTION.

A scalable solution stays efficient and practical
when applied to large situations


With a bigger index or more visitors,
searches get slower!
Solr performance can be tested with SolrMeter

Solutions to keep good performance with more data:
1. Scale up: optimize a single Solr server
2. Scale horizontally: move to multiple Solr servers with replication
3. Scale deep: combine replication and sharding (for distributed search)
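
The idea behind sharding and distributed search can be sketched in a few lines of Python. This is a conceptual model only, not Solr's own routing or merging code: documents are assigned to shards by a hash of their id, each shard answers with its local best results, and the partial results are merged. The precomputed scores stand in for real relevance scores.

```python
import hashlib
import heapq

def shard_for(doc_id, num_shards):
    """Pick a shard from a stable hash of the document id (illustrative rule)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def build_shards(docs, num_shards):
    """Distribute {doc_id: score} entries over num_shards partial indexes."""
    shards = [dict() for _ in range(num_shards)]
    for doc_id, score in docs.items():
        shards[shard_for(doc_id, num_shards)][doc_id] = score
    return shards

def distributed_top_k(shards, k):
    """Ask each shard for its local top k, then merge into a global top k."""
    partial = []
    for shard in shards:
        partial.extend(heapq.nlargest(k, shard.items(), key=lambda item: item[1]))
    return heapq.nlargest(k, partial, key=lambda item: item[1])
```

Sharding splits the index so that each server searches less data. Replication then copies each shard to several servers so that more visitors can be served in parallel.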
SEARCHING DATA.


SEARCH RELEVANCY.

Factors influencing the score:

1. Term frequency
   the more often a term occurs in a document, the higher the score.
2. Inverse document frequency
   the rarer a term is in the whole index, the higher its score.
3. Co-ordination factor
   the more query clauses a document matches, the higher its score.
4. Field length
   the shorter the matching field, the higher the matching document's score.
5. Boosting
   customized mathematical rules to increase the score.
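
The five factors above can be combined in a small scoring sketch. It is loosely modeled on the classic Lucene TF-IDF similarity and is simplified (no query normalization, a single field). The exact formula depends on the Solr version and the configured similarity, so this should be read as an illustration rather than Solr's actual computation.

```python
import math

def score(query_terms, field_terms, doc_freq, num_docs, boost=1.0):
    """Score one field of one document against the query terms."""
    if not query_terms or not field_terms:
        return 0.0
    length_norm = 1.0 / math.sqrt(len(field_terms))  # shorter field -> higher score
    total, matched = 0.0, 0
    for term in query_terms:
        freq = field_terms.count(term)
        if freq == 0:
            continue
        matched += 1
        tf = math.sqrt(freq)  # term frequency
        # Inverse document frequency: the rarer the term, the higher the value.
        idf = 1.0 + math.log(num_docs / (doc_freq.get(term, 0) + 1))
        total += tf * idf * idf * boost * length_norm
    coord = matched / len(query_terms)  # share of query clauses that matched
    return coord * total
```

Here doc_freq maps each term to the number of documents containing it, and num_docs is the index size. Both are supplied by the caller in this sketch.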


   In Magento, boosting is based on attribute weights
   E.g. name 5 -> manufacturer 4 -> sku 3 -> price 2 -> meta_keywords 1
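
One way such weights could be expressed in a Solr query is the qf (query fields) parameter of the extended DisMax query parser, where each field carries a boost written as field^weight. The slides do not say whether the Magento integration does it this way, so the sketch below is an assumption. The weights are the ones from the example above.

```python
from urllib.parse import urlencode

# Attribute weights taken from the example above.
WEIGHTS = {"name": 5, "manufacturer": 4, "sku": 3, "price": 2, "meta_keywords": 1}

def boosted_query_params(user_query, weights):
    """Build edismax query parameters with one boost per field."""
    qf = " ".join(f"{field}^{weight}" for field, weight in weights.items())
    return {"defType": "edismax", "q": user_query, "qf": qf}

def boosted_query_string(user_query, weights):
    """URL-encode the parameters so they can be appended to a select URL."""
    return urlencode(boosted_query_params(user_query, weights))
```

A match in the product name then counts five times as much as a match in meta_keywords.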
ADMIN TOOLS.


ADMIN FEATURES.

1) Solr ships with an admin tool, but it is developer-oriented
   To check the schema, index, general configuration and Solr server availability, and to view
   technical statistics…

2) Prefer the Magento backend
   To check frequent requests and requests with no answer
   Very helpful to analyse user expectations and then improve the catalog
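
The two statistics mentioned above are simple to compute from a search log. The sketch below assumes a hypothetical log format of (query, number of results) pairs. It is not how the Magento backend stores or reports them.

```python
from collections import Counter

def search_report(log, top_n=3):
    """Summarize a search log given as (query, result_count) pairs.

    Returns the most frequent queries and the queries that returned nothing.
    """
    normalized = [(query.strip().lower(), hits) for query, hits in log]
    frequent = Counter(query for query, _ in normalized).most_common(top_n)
    no_answer = Counter(query for query, hits in normalized if hits == 0)
    # Sort unanswered queries by count, then alphabetically for a stable order.
    unanswered = sorted(no_answer.items(), key=lambda item: (-item[1], item[0]))
    return frequent, unanswered
```

Frequent queries with no answer often point to a missing synonym, a common misspelling or a product that is absent from the catalog.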
CONCLUSION.

INTEGRATE SOLR IN YOUR PROJECT.

Steps:

1. Install and configure Solr
   single or multiple servers
   single or multiple languages, …

2. Adapt the standard Magento product schema
   to your project context

3. Define additional customized data to index
   such as other tables, files, …

4. Influence search relevance
   by defining attribute weights

5. Integrate into the Magento frontend

COMPARISONS.

Features                                      Magento Basic SE   Magento with Solr
Product indexing                                     ▲                   ▲
Document indexing                                                        ▲
Synonyms                                             ▲                   ▲
Stemming                                                                 ▲
Stop words                                                               ▲
Faceted search                                       ▲                   ▲
Relevance calculation                                ▲                   ▲
Customizable relevance calculation                                       ▲
Scalability                                                              ▲
Predictive search                                                        ▲
Admin tools (frequent requests, no answer…)          ▲                   ▲
No extra time needed to integrate                    ▲

Remember: 1 user in 2 is a searcher!

Solr clearly improves the user experience,
which increases your conversion rate.
CUSTOMER RELATIONSHIP MANAGEMENT
ELECTRONIC COMMERCE
ONLINE MARKETING




CS2 AG
PLATINUM MEMBER TYPO3 ASSOCIATION
MAGENTO GOLD PARTNER
SUGAR SILVER PARTNER

Gerbegässlein 1 | CH-4450 Sissach
Feldeggstrasse 55 | CH-8008 Zürich
Telefon: +41 61 333 22 22
Twitter: @CS2switzerland
www.CS2.ch
