Digging into solr
Rails Usergroup Hamburg 13. April 2011
Overview
●   What is solr
●   Solr integration into Rails
●   Challenges for the search
●   Experiences
What is solr
●   Matthew 7:7b / Luke 11:9b
●   (Sermon on the Mount)
●   "Seek and you will find"
What is solr
●   HTTP Request Servlet, Update Servlet (XML update), Admin
●   Different request handlers
●   Solr Core: schema, config, caching, concurrency
●   Lucene
●   Replication
What is solr
●   Unstructured rows
●   Denormalization of data
●   Dynamic fields
●   Schema → Tokenizer, Filters, etc.
●   Tons of XML
What is solr

●   Indexing: Strings → Tokenizer → Token Filter → Index
●   Query: Query → Tokenizer → Filter → Index → Results
What is solr
●   GET requests
hl.fragsize=0
&spellcheck=true
&spellcheck.extendedResults=true
&qf=everything_phonetic_wa^1+display_name_phonetic_wa^2+comment_en_wa^4+review_en_wa^8+everything_en_wa^16+everything_wa^32+display_name_en_wa^64+display_name_wa^128
&spellcheck.collate=true
&wt=ruby
&hl=true
&rows=100
&fl=pk_i,score
&start=0
&q=chipotle+bbq
&spellcheck.dictionary=spell_en
&bf=linear(en_rating_points_i,100,0)
&spellcheck.count=1
&qt=dismax
&fq=closed_b:false+AND+domain_id_s:uki*+AND+(type_s:Place)
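A request like the one above can be assembled from a hash instead of being concatenated by hand. The following is a minimal sketch, assuming a local Solr endpoint (`SOLR_SELECT` is a placeholder, the slides do not name a host) and a hypothetical helper `dismax_url`; only a subset of the parameters is shown.

```ruby
require "uri"

# Hypothetical Solr endpoint; the slides do not name a host or core.
SOLR_SELECT = "http://localhost:8983/solr/select"

# Field boosts as on the slide: each field name maps to its weight.
QUERY_FIELDS = {
  "everything_phonetic_wa"   => 1,
  "display_name_phonetic_wa" => 2,
  "comment_en_wa"            => 4,
  "review_en_wa"             => 8,
  "everything_en_wa"         => 16,
  "everything_wa"            => 32,
  "display_name_en_wa"       => 64,
  "display_name_wa"          => 128
}.freeze

# Build the GET URL for a dismax search. URI.encode_www_form handles the
# escaping, so spaces become "+" and "^" becomes "%5E".
def dismax_url(terms, domain_prefix:, rows: 100, start: 0)
  params = {
    "qt"    => "dismax",
    "q"     => terms,
    "qf"    => QUERY_FIELDS.map { |field, boost| "#{field}^#{boost}" }.join(" "),
    "bf"    => "linear(en_rating_points_i,100,0)",
    "fq"    => "closed_b:false AND domain_id_s:#{domain_prefix}* AND (type_s:Place)",
    "fl"    => "pk_i,score",
    "rows"  => rows,
    "start" => start,
    "wt"    => "ruby"
  }
  "#{SOLR_SELECT}?#{URI.encode_www_form(params)}"
end

puts dismax_url("chipotle bbq", domain_prefix: "uki")
```

Keeping the boosts in one hash makes the doubling scheme (1, 2, 4, ... 128) easy to see and to tune.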
What is solr
●   Response type
    ●   XML
    ●   Ruby
    ●   JSON
    ●   XML + XSLT
    ●   etc.
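As a rough illustration of consuming one of these response types, the sketch below parses a JSON response and extracts the primary keys. The response body is made up for the example (only its overall shape follows Solr's JSON output), and `primary_keys` is a hypothetical helper.

```ruby
require "json"

# Illustrative response body in the shape Solr returns for wt=json;
# the ids, scores and timings are made up.
SAMPLE_BODY = <<~JSON
  {
    "responseHeader": { "status": 0, "QTime": 12 },
    "response": {
      "numFound": 2, "start": 0,
      "docs": [
        { "pk_i": 101, "score": 3.2 },
        { "pk_i": 205, "score": 1.7 }
      ]
    }
  }
JSON

# Extract the primary keys in ranking order.
def primary_keys(json_body)
  docs = JSON.parse(json_body).fetch("response").fetch("docs")
  docs.map { |doc| doc.fetch("pk_i") }
end

p primary_keys(SAMPLE_BODY)
```

With `fl=pk_i,score` the index only returns keys and scores, so the application would then load the full records from the database.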
Solr integration into Rails
●   Sunspot
●   acts_as_solr
●   Qype → acts_as_solr
●   Optimized Queries for solr
    ●   Monkey patching
    ●   Defined queries without dynamic fields
    ●   Names of search fields differ from AR names
Solr integration into Rails
●   Data consistency
    ●   Synchronous
        –   ActiveRecord (AR) writes to MySQL and Solr
        –   Longer response times
        –   Not truly synchronous once replication is involved
    ●   Asynchronous
        –   AR writes to MySQL only
        –   Solr master imports data via MySQL queries
        –   Index is out of sync for a few minutes
        –   Deletion by flag first, physical deletion later
        –   JavaScript preprocessing of data is possible
Challenges - Spellchecking
●   Pool of words for spellchecking: words from real data
●   Beeeeeeer
●   9 languages
●   New → spellchecker for different kinds of data
●   Suggestion → Locator → Facet → best match?
●   Similar word → fuzzy search vs. spellchecking
●   Image credit: JM3 (CC BY-ND 2.0)
Challenges - Spellchecking

●   'Chipotle BBQ'
●   'Chinese Baby'
●   Shingles
●   Image credits: raybdbomb, Meindert Arnold Jacob, joshDubya, michael clarke stuff (all CC BY-ND 2.0)
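Shingles are word-level n-grams, which let a multi-word query such as 'Chipotle BBQ' be handled as one unit rather than word by word. The sketch below shows the idea in plain Ruby; `shingles` is a hypothetical helper, not Solr's own shingle filter.

```ruby
# Build word shingles (word-level n-grams) from a phrase.
def shingles(text, size: 2, separator: " ")
  words = text.downcase.split
  return [] if words.length < size

  words.each_cons(size).map { |group| group.join(separator) }
end

p shingles("Chipotle BBQ Hamburg")
```

For the three-word input this yields the two adjacent word pairs, each of which could be indexed as a single term.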
Challenges – Stemming
●   Stemming vs. Lemmatizing
●   9 Languages
●   Hafen – Hafer (Harbor – Oat)
●   Performance
●   Stemming → Solr SnowballPorterFilterFactory
●   Polish → lemmatizing → OpenOffice
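The Hafen/Hafer problem can be reproduced with a deliberately naive suffix stripper. This is a toy sketch, not the Snowball algorithm: the suffix list and the helper `toy_stem` are assumptions made only to show how two unrelated words can collapse to one stem.

```ruby
# Toy suffix stripper, NOT the Snowball algorithm: it only removes a few
# common German endings to show how distinct words can share one stem.
SUFFIXES = %w[ern en er em es e n s].freeze

def toy_stem(word)
  w = word.downcase
  # Take the first listed suffix that leaves at least three characters.
  suffix = SUFFIXES.find { |s| w.end_with?(s) && w.length - s.length >= 3 }
  suffix ? w[0...-suffix.length] : w
end

%w[Hafen Hafer].each { |word| puts "#{word} -> #{toy_stem(word)}" }
```

Both words end up as `haf`, so a search for one would also match the other. A lemmatizer avoids this by mapping each word to its dictionary form, at the cost of needing a word list per language.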
Challenges – Synonyms
●   9 Languages
●   OpenOffice rules!
●   Not all languages available → Dutch (NL) is missing
Challenges – NGrams
●   Huge index
●   'Tee' matches 'Steeb'
●   EdgeNGrams
●   'Bar' → 'Sofabar' → 'Barmbek'
    ●   The unmatched remainder should itself be a word → performance
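The difference between plain n-grams and edge n-grams can be sketched in a few lines. The helpers `ngrams` and `edge_ngrams` and the gram sizes (3 to 5) are assumptions for illustration, not the configuration used in the talk.

```ruby
# All character n-grams of a word between min and max length.
def ngrams(word, min: 3, max: 5)
  w = word.downcase
  (min..max).flat_map do |n|
    next [] if n > w.length

    (0..w.length - n).map { |i| w[i, n] }
  end.uniq
end

# Edge n-grams: only prefixes of the word.
def edge_ngrams(word, min: 3, max: 5)
  w = word.downcase
  (min..[max, w.length].min).map { |n| w[0, n] }
end

puts ngrams("Steeb").include?("tee")         # plain n-grams: 'Tee' matches 'Steeb'
puts edge_ngrams("Steeb").include?("tee")    # edge n-grams avoid that match
puts edge_ngrams("Barmbek").include?("bar")  # prefix match still works
puts edge_ngrams("Sofabar").include?("bar")  # suffix match is lost
```

Edge n-grams keep the index smaller and remove accidental infix matches, but a word like 'Sofabar' is then no longer found by its ending.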
Challenges – Phrases
●   Boost matching of phrases → whole entry
    ●   'Europa Passage'
●   Boost matching of phrases → left-anchored
    ●   'Galeria Kaufhof in Hamburg'
    ●   'Boutique in Galeria Kaufhof'
    ●   JavaScript preprocessing
●   Boost matching of a phrase anywhere in the entry
●   How to handle matches of only some words of a given phrase?
Challenges – Whitespace in index
●   Index: 'Ping Pong'
●   Search word: 'Pingpong'
●   JavaScript preprocessing
●   Image credits: zimpenfish, Ewan-M (both CC BY-ND 2.0)
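One way to make 'Pingpong' find 'Ping Pong' is to index a whitespace-free variant next to the original name during preprocessing. The talk does this in JavaScript at import time; the sketch below shows the same idea in Ruby, with `index_variants` as a hypothetical helper.

```ruby
# Return the normalized name plus a whitespace-free variant so that both
# 'Ping Pong' and 'Pingpong' can match the same entry.
def index_variants(name)
  normalized = name.strip.downcase
  collapsed = normalized.gsub(/\s+/, "")
  [normalized, collapsed].uniq
end

p index_variants("Ping Pong")
```

Names without whitespace yield a single variant, so the extra index size stays limited to multi-word entries.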
Experiences – server setup
●   Live
    ●   Load balancer distributes Solr queries to three slaves
    ●   Master replicates the index to the slaves
    ●   Master imports from a DB slave
●   Staging
    ●   Slave, master, DB slave
●   Dev
    ●   iMac running Solr and MySQL
Experiences – size of indices
●   Staging System → Sunday evening
●   Places in simple format: 712 MB
●   Previews in simple format: 5.519 GB
●   Places, previews, comments (extended): 3.5 GB
●   Big spellchecker: 16 GB
●   New combined index: 15 GB
    ●   Index: 14 GB
    ●   Spellchecker: 1 GB
Experiences – server setup
●   Live Servers
●   2 x 8 Cores, 2 x 16 Cores
●   32 GB RAM
●   Max. CPU usage: up to 500%
●   Solr loves RAM → 32 GB filled with cache
Experiences – Solr loves RAM
●   Dev → 1 GB
●   Staging → 4.5 GB (no load)
●   Import → 11 GB and more
●   Production → 14 GB
Experiences – Solr loves RAM (production slave)
Experiences – accesses
●   More than ~60 requests per second is not recommended
●   Up to 40 requests per second is OK
Experiences – CPU load
●   Last Import → up to 250 %
●   Production (slave):
Experiences – Response times
●   Spellchecking 'pizzt', big index (staging): 1502 / 48 / 47 / 48 / 31 ms
●   Spellchecking 'pizzt', small index (staging): 603 / 12 / 8 / 9 / 9 ms
Experiences – Response times
●   Facet for spellchecking:
●   facet=true&facet.mincount=0&facet.limit=1&wt=ruby&rows=0&fl=pk_i,score&
    facet.query=comment_de_wa:"pizza"+OR+review_de_wa:"pizza"+OR+everything_de_wa:"pizza"+OR+everything_wa:"pizza"+OR+display_name_de_wa:"pizza"+OR+display_name_wa:"pizza"+OR+display_name_ngram:"pizza"&
    facet.query=comment_de_wa:"pizze"+OR+review_de_wa:"pizze"+OR+everything_de_wa:"pizze"+OR+everything_wa:"pizze"+OR+display_name_de_wa:"pizze"+OR+display_name_wa:"pizze"+OR+display_name_ngram:"pizze"&
    facet.query=comment_de_wa:"pizz"+OR+review_de_wa:"pizz"+OR+everything_de_wa:"pizz"+OR+everything_wa:"pizz"+OR+display_name_de_wa:"pizz"+OR+display_name_wa:"pizz"+OR+display_name_ngram:"pizz"&
    facet.query=comment_de_wa:"pizzi"+OR+review_de_wa:"pizzi"+OR+everything_de_wa:"pizzi"+OR+everything_wa:"pizzi"+OR+display_name_de_wa:"pizzi"+OR+display_name_wa:"pizzi"+OR+display_name_ngram:"pizzi"&
    facet.query=comment_de_wa:"pizzs"+OR+review_de_wa:"pizzs"+OR+everything_de_wa:"pizzs"+OR+everything_wa:"pizzs"+OR+display_name_de_wa:"pizzs"+OR+display_name_wa:"pizzs"+OR+display_name_ngram:"pizzs"&
    facet.query=comment_de_wa:"pizzo"+OR+review_de_wa:"pizzo"+OR+everything_de_wa:"pizzo"+OR+everything_wa:"pizzo"+OR+display_name_de_wa:"pizzo"+OR+display_name_wa:"pizzo"+OR+display_name_ngram:"pizzo"&
    facet.query=comment_de_wa:"pizzy"+OR+review_de_wa:"pizzy"+OR+everything_de_wa:"pizzy"+OR+everything_wa:"pizzy"+OR+display_name_de_wa:"pizzy"+OR+display_name_wa:"pizzy"+OR+display_name_ngram:"pizzy"&
    facet.query=comment_de_wa:"pizzn"+OR+review_de_wa:"pizzn"+OR+everything_de_wa:"pizzn"+OR+everything_wa:"pizzn"+OR+display_name_de_wa:"pizzn"+OR+display_name_wa:"pizzn"+OR+display_name_ngram:"pizzn"&
    facet.query=comment_de_wa:"pezzt"+OR+review_de_wa:"pezzt"+OR+everything_de_wa:"pezzt"+OR+everything_wa:"pezzt"+OR+display_name_de_wa:"pezzt"+OR+display_name_wa:"pezzt"+OR+display_name_ngram:"pezzt"&
    facet.query=comment_de_wa:"pizzä"+OR+review_de_wa:"pizzä"+OR+everything_de_wa:"pizzä"+OR+everything_wa:"pizzä"+OR+display_name_de_wa:"pizzä"+OR+display_name_wa:"pizzä"+OR+display_name_ngram:"pizzä"&
    q=*:*&qt=standard&fq=closed_b:false+AND+domain_id_s:de600-hamburg*+AND+(type_s:Place)
●   10 facets: 231 / 5 / 4 / 22 / 3 (→ XML) ms
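The request above repeats one pattern per spelling candidate, so it can be generated rather than typed. The following is a minimal sketch, assuming hypothetical helpers `facet_query`, `facet_params` and `best_candidate`; the hit counts in the usage lines are made up and only stand in for the facet counts a real response would carry.

```ruby
require "uri"

# Fields searched for each candidate, as in the request on the slide.
FACET_FIELDS = %w[
  comment_de_wa review_de_wa everything_de_wa everything_wa
  display_name_de_wa display_name_wa display_name_ngram
].freeze

# One facet.query per spelling candidate: the candidate OR-ed over all fields.
def facet_query(candidate)
  FACET_FIELDS.map { |field| %(#{field}:"#{candidate}") }.join(" OR ")
end

# Build the query string; an array of pairs allows repeated facet.query keys.
def facet_params(candidates, filter)
  pairs = [
    ["q", "*:*"], ["qt", "standard"], ["rows", 0], ["wt", "ruby"],
    ["facet", "true"], ["facet.mincount", 0], ["fq", filter]
  ]
  pairs += candidates.map { |c| ["facet.query", facet_query(c)] }
  URI.encode_www_form(pairs)
end

# Pick the candidate with the highest hit count. `counts` is assumed to be
# the facet query section of the response, keyed by the facet query string.
def best_candidate(candidates, counts)
  candidates.max_by { |c| counts.fetch(facet_query(c), 0) }
end

candidates = %w[pizza pizze pizzi]
counts = { facet_query("pizza") => 120, facet_query("pizze") => 3 } # made-up counts
puts best_candidate(candidates, counts)
```

Because `rows=0` is set, the index only counts matches per candidate and returns no documents, which fits the low timings reported for the repeated requests.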
Experiences – Response times

●   Warming up → Staging vs. Production
●   Staging: slow
●   Production: fast
Experiences – Response times

●   Staging / index schema on prod
●   Standard query 'pizza': 106 / 0 / 0 (9122)
●   Fuzzy (pizza~0.3): 4440 / 663 / 0 (40149)
●   Fuzzy (pizza~0.5): 822 / 0 / 0 (12129)
●   Fuzzy (pizza~0.8): 34 / 1 / 0 (9122)
●   Wildcard (rest*): 39 / 0 / 0 (41031)
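The value after `~` is a minimum similarity, so a lower threshold admits more distant terms and makes the query broader and slower. The sketch below illustrates this with a classic Levenshtein distance; the normalization in `similarity` (one minus distance divided by the shorter length) is an assumption for illustration and may differ from the exact formula Lucene applies.

```ruby
# Classic Levenshtein edit distance (insert, delete, substitute), computed
# row by row with dynamic programming.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Assumed normalization: 1 - distance / length of the shorter term.
def similarity(a, b)
  shorter = [a.length, b.length].min
  return 0.0 if shorter.zero?

  1.0 - levenshtein(a, b).to_f / shorter
end

%w[pizza pizzt pasta].each do |term|
  puts format("%-6s %.2f", term, similarity("pizza", term))
end
```

Under this assumed formula 'pizzt' would pass a 0.8 threshold against 'pizza', while 'pasta' would only pass at 0.3.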
Experiences - Monitoring
●   Munin
●   New Relic
