Faceted Search – the 120 Million Documents Story
Who am I?
Who are Sourcesense?
Committers and Contributors
Who is the customer?
Their story?
The Solution: Apache Solr
How Solr Works
[Diagram sequence: searches are answered by the active index reader, which reads an index snapshot. New content goes to the active index writer. On commit, a new index snapshot is produced and a new index reader is opened on it; that reader then becomes the active one, and the old reader and its snapshot are released.]
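To make the commit sequence in the How Solr Works diagrams concrete, here is a minimal Python model of it. It is a toy sketch, not Solr or Lucene code: the classes `Snapshot`, `Reader` and `ToyIndex` are hypothetical, documents are plain strings, and "search" is a substring match. What it does show is the property the diagrams illustrate: a reader is bound to one snapshot, so new content only becomes searchable once a commit opens a new reader, and a reader opened earlier keeps seeing its old snapshot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Immutable point-in-time view of the index (hypothetical toy class)."""
    docs: tuple[str, ...]


@dataclass(frozen=True)
class Reader:
    """Serves searches from exactly one snapshot for its whole lifetime."""
    snapshot: Snapshot

    def search(self, term: str) -> list[str]:
        # Toy "search": substring match over the snapshot's documents.
        return [doc for doc in self.snapshot.docs if term in doc]


class ToyIndex:
    """One writer plus one active reader, swapped on commit."""

    def __init__(self) -> None:
        self.snapshot = Snapshot(())
        self.pending: list[str] = []  # held by the writer, not yet committed
        self.active_reader = Reader(self.snapshot)

    def add(self, doc: str) -> None:
        # New content goes to the writer only; the active reader cannot see it.
        self.pending.append(doc)

    def commit(self) -> None:
        # 1. Produce a new snapshot: the old documents plus the pending ones.
        self.snapshot = Snapshot(self.snapshot.docs + tuple(self.pending))
        self.pending.clear()
        # 2. Open a new reader on it and make it the active one. The old reader
        #    is left untouched and can be released once nothing references it.
        self.active_reader = Reader(self.snapshot)

    def search(self, term: str) -> list[str]:
        return self.active_reader.search(term)


index = ToyIndex()
index.add("faceted search with solr")
print(index.search("solr"))       # -> [] (not committed yet)
old_reader = index.active_reader
index.commit()
print(index.search("solr"))       # -> ['faceted search with solr']
print(old_reader.search("solr"))  # -> [] (old reader still sees the old snapshot)
```

Keeping readers immutable is what lets searches run concurrently with indexing: nothing a writer does can change the view an in-flight search is using.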
How Solr Distributes
Solr Host Configuration
[Diagram sequence: searches go to a co-ordinator, which queries shard 1, shard 2 and shard 3. A load balancer then sits in front of two identical rows, each made up of shards 1 to 3 and its own co-ordinator.]
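The co-ordinator's part in faceted search can be sketched as follows. In Solr's distributed search, the co-ordinator is the node that receives a request carrying a `shards` parameter listing the shard hosts; it forwards the query to each shard and combines the partial results. For a facet field, combining means adding up each value's per-shard counts. The function `merge_facet_counts` below is a hypothetical simplification that assumes every shard returns its complete count list; Solr itself asks each shard for more than `facet.limit` values and then runs a refinement request for candidates some shards did not report, so treat this as an illustration of the idea rather than of Solr's exact algorithm.

```python
from collections import Counter
from collections.abc import Mapping, Sequence


def merge_facet_counts(
    per_shard: Sequence[Mapping[str, int]], limit: int
) -> list[tuple[str, int]]:
    """Sum each facet value's counts across shards and keep the top `limit`.

    Simplification: assumes each shard returned counts for every value.
    """
    total: Counter = Counter()
    for shard_counts in per_shard:
        total.update(shard_counts)  # Counter.update adds counts, it does not replace them
    # Highest count first; ties broken alphabetically so the order is deterministic.
    ranked = sorted(total.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


# Hypothetical counts for one facet field (e.g. sourceCountryCS) from three shards.
shard_1 = {"UK": 10, "IT": 4}
shard_2 = {"UK": 3, "US": 7}
shard_3 = {"IT": 5, "US": 1}
print(merge_facet_counts([shard_1, shard_2, shard_3], limit=2))
# -> [('UK', 13), ('IT', 9)]
```

The same data shows why Solr over-requests: had each shard reported only its single top value, the co-ordinator would see UK 10, US 7 and IT 5, so the counts are understated and IT and US swap places compared with the true totals (IT 9, US 8).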
Solr at the Customer
Oops. OutOfMemoryError
Solr: a Java web application
How Solr Works
[Diagram sequence: as before, searches go to the active index reader and new content to the active index writer, but now each index reader has its own cache. After a commit, the new index reader opened on the new snapshot starts with an empty cache of its own, and the old reader's cache is discarded along with the old reader.]
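Why a fresh reader hurts can be shown with a second toy model, again hypothetical Python rather than Solr code. `CachingReader` keeps a per-reader result cache and counts how often a query had to be computed from the index instead of served from cache. A reader opened after a commit starts with an empty cache, so the first user to repeat a popular query pays the full cost; replaying a list of queries before the reader goes live (the idea behind Solr's `newSearcher` warming queries) moves that cost out of the user's path.

```python
class CachingReader:
    """Toy reader with its own result cache, as each new Solr searcher has."""

    def __init__(self, docs: tuple[str, ...]) -> None:
        self.docs = docs
        self.cache: dict[str, list[str]] = {}
        self.misses = 0  # queries that had to be computed from the index

    def search(self, term: str) -> list[str]:
        if term not in self.cache:
            self.misses += 1
            self.cache[term] = [doc for doc in self.docs if term in doc]
        return self.cache[term]


def open_reader(docs: tuple[str, ...], warm_queries: list[str]) -> CachingReader:
    """Open a reader on a new snapshot and run warming queries before use."""
    reader = CachingReader(docs)
    for query in warm_queries:
        reader.search(query)  # fills the cache; no user is waiting on this result
    return reader


docs = ("faceted search with solr", "lucene index", "sharding solr")

cold = open_reader(docs, warm_queries=[])
cold.search("solr")   # a user query: cache miss, computed from the index
print(cold.misses)    # -> 1

warm = open_reader(docs, warm_queries=["solr"])
misses_before = warm.misses          # 1, paid during warming
warm.search("solr")                  # the same user query is now a cache hit
print(warm.misses - misses_before)   # -> 0
```

In a real deployment the cost being moved is large: with heavy faceting over a big index, the first uncached facet request can take far longer than a cached one, which is why the warming list should mirror the queries users actually send.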
Optimisation #1: autowarm
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">solr</str>
      <str name="relf">4</str>
      <str name="facet.field">sourceCountryCS</str>
      <str name="facet.field">entityCSPerson</str>
      <str name="facet.field">entityCSCompany</str>
      <str name="facet.field">entityCSProduct</str>
      <str name="facet.field">sourceCS</str>
      <str name="facet.field">authorCS</str>
      <str name="facet.field">stockTickerCS</str>
      <str name="facet.field">feedClassCS</str>
      <str name="facet.field">entityCSOrganization</str>
      <str name="facet.field">platformCS</str>
      <str name="facet.field">eventOrFactCS</str>
      <str name="facet.field">sourceRank</str>
      <str name="facet">true</str>
      <str name="facet.date">harvestDate</str>
      <str name="facet.date.start">NOW-1MONTH</str>
      <str name="facet.date.end">NOW</str>
      <str name="facet.date.gap">+24HOURS</str>
      <str name="qt">/duplicate</str>
      <str name="duplicateOrder">latest</str>
      <str name="collapseFields">duplicateGroup titleForDuplicates</str>
    </lst>
  </arr>
</listener>
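When tuning a warming query like this, it can help to send the same request by hand and time it. The sketch below turns the listener's parameters into a query string with Python's standard `urllib.parse.urlencode`: the list of pairs keeps the repeated `facet.field` entries, and encoding turns the `+` in `+24HOURS` into `%2B` (a raw `+` in a URL is read as a space). The base URL `http://localhost:8983/solr/select` is a hypothetical host. The parameter values are copied from the listener; `relf`, `duplicateOrder` and `collapseFields` appear to belong to the customer's custom `/duplicate` handler rather than to stock Solr.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical host; point this at the Solr instance being tested.
BASE_URL = "http://localhost:8983/solr/select"

FACET_FIELDS = [
    "sourceCountryCS", "entityCSPerson", "entityCSCompany", "entityCSProduct",
    "sourceCS", "authorCS", "stockTickerCS", "feedClassCS",
    "entityCSOrganization", "platformCS", "eventOrFactCS", "sourceRank",
]

# Same parameters as the newSearcher listener, in the same order.
params = [("q", "solr"), ("relf", "4")]
params += [("facet.field", name) for name in FACET_FIELDS]
params += [
    ("facet", "true"),
    ("facet.date", "harvestDate"),
    ("facet.date.start", "NOW-1MONTH"),
    ("facet.date.end", "NOW"),
    ("facet.date.gap", "+24HOURS"),
    ("qt", "/duplicate"),
    ("duplicateOrder", "latest"),
    ("collapseFields", "duplicateGroup titleForDuplicates"),
]

warm_url = BASE_URL + "?" + urlencode(params)
print(warm_url)

# Round-trip check: decoding restores the literal "+24HOURS".
print(parse_qs(urlsplit(warm_url).query)["facet.date.gap"])  # -> ['+24HOURS']
```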
#2: Garbage collection
#3: Profiling
Managing So Many Hosts
Solr Host Configuration
[Diagram sequence: the load balancer in front of two rows, each of shards 1 to 3 plus a co-ordinator; each shard index is labelled 35Gb, and a row as a whole is annotated "Entire row: 40 minutes".]
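Putting the diagram's labels together with the title gives a rough sense of scale. This assumes the 120 million documents of the title are split evenly across the three shards of a row (the slides do not say how documents are assigned to shards) and takes 35Gb as the index size of each shard:

```latex
\begin{aligned}
\text{documents per shard} &= \frac{120\,000\,000}{3} = 40\,000\,000 \\
\text{index size per row} &= 3 \times 35\,\text{Gb} = 105\,\text{Gb} \\
\text{index size across both rows} &= 2 \times 105\,\text{Gb} = 210\,\text{Gb}
\end{aligned}
```

Since both rows appear to hold the same three shards, the second row adds search capacity and redundancy rather than room for more documents.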
Content Archiving
Being Dynamic
Solr Host Configuration
[Diagram sequence: starting from the load balancer and two rows of shards 1 to 3 with their co-ordinators, a third row of shards 1 to 3 with a co-ordinator is added and labelled "archive ingestion"; the final frame shows two rows again.]
Conclusion
thank you [email_address]