How SOLR Search Works
Rajat Jain - 20th Dec, 2016
Agenda
• What do you mean by Search?
• Search Requirements
• Comparison of SOLR with SQL/NoSQL
• SOLR Architecture
• SOLR Usage in Trellis
• How Google Search Works
• Other Search Technologies
What do you mean by Search?
Search Requirements
• Text Search – e.g. “Architects”
• Filters – e.g. “In New Delhi”, “iOS”
• Sorting – e.g. “Best Match”, “Highest Rating”, etc.
• And more:
• Facets
• Stemming
• Fuzzy Matching
• Image Search, etc.
Search Requirements
• Full Text Search
• Fast reads (writes can be slower)
• Various Combinations of Filters
• Various Combinations of Sorting
• Non-features:
• Real-time – some staleness is usually not a problem
• Data Integrity – the search index is usually not the primary data store, so it can be ‘lossy’
Search Requirements – Faceted Search
• A type of filtering with suggestions
• In most cases, the suggestions are sorted by the number of matching results
• Helps the user narrow down the search without having to ‘guess’ how to narrow it
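To make the idea concrete, here is a minimal sketch of facet counting over an already-matched result set. The documents and the field names (`city`, `platform`) are invented for illustration; a real engine computes these counts from its index rather than by scanning results.

```python
from collections import Counter

# Hypothetical result set for some text query; field names are illustrative.
results = [
    {"name": "A", "city": "New Delhi", "platform": "iOS"},
    {"name": "B", "city": "New Delhi", "platform": "Android"},
    {"name": "C", "city": "Mumbai", "platform": "iOS"},
    {"name": "D", "city": "New Delhi", "platform": "iOS"},
]

def facet_counts(docs, field):
    """Count matching documents per value of `field`, highest count first."""
    return Counter(doc[field] for doc in docs).most_common()

city_facets = facet_counts(results, "city")
platform_facets = facet_counts(results, "platform")
```

Each (value, count) pair can be shown next to the results as a suggested filter, which is what lets the user narrow the search without guessing.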
Conventional Storage for Search
• SQL (MySQL)
• Relational Tables
• Normalized Data
• Assumes keys / indexes are used for reads & writes
• Optimized for reads, writes & transactional data (ACID transactions)
• Lots of security, etc.
• Table data stored in the file system
• Indexing – individual columns or sets of columns
• Full-text search – a recent addition (full-text index)
Conventional Storage for Search
• NoSQL (think MongoDB)
• Key-Value Pairs
• De-normalized Data
• Unstructured Data
• Optimized for reads – writes can be slightly slower (in the transactional case)
• Data stored in the file system
• Indexing – individual fields
• Full-text search – has built-in support
Advantages of SOLR over MySQL/NoSQL
• Inverted index
• Rich text analysis: stemming, scoring, fuzziness
• Weighting fields / boosting – custom scoring functions
• Single document concept – no relations (in general)
• Faceting support out of the box
• Optimized for search and search alone (at scale, without a performance drop)
SOLR Architecture – Indexing
• Take a ‘document’ / field, etc.
• For each field, apply a set of filters / tokenizers
• Convert to individual tokens
• Update the ‘inverted’ index based on the tokens
• In general, the index keeps track of stats, etc. for the various terms
• Different indexes per field
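The indexing steps above can be sketched in a few lines. This is a toy illustration, not how Lucene stores its index: the `analyze` function stands in for a filter/tokenizer chain, and the sample documents and field names are made up.

```python
import re
from collections import defaultdict

def analyze(text):
    """Lowercase and split on non-alphanumerics; a stand-in for an analyzer chain."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def build_index(docs, fields):
    """Return {field: {token: {doc_id: term_frequency}}}, one inverted index per field."""
    index = {f: defaultdict(dict) for f in fields}
    for doc_id, doc in docs.items():
        for f in fields:
            for token in analyze(doc.get(f, "")):
                postings = index[f][token]
                # Term frequency is one of the per-term stats kept in the index.
                postings[doc_id] = postings.get(doc_id, 0) + 1
    return index

# Hypothetical documents for illustration.
docs = {
    1: {"title": "Architects in New Delhi", "body": "Residential architects and planners"},
    2: {"title": "iOS developers", "body": "Developers in New Delhi"},
}
index = build_index(docs, ["title", "body"])
```

Looking up `index["title"]["delhi"]` goes straight from a token to the documents that contain it, which is why reads are fast even though building the index costs extra work at write time.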
SOLR Architecture - Indexing
[Diagram: Solr indexing pipeline]
• Update handlers (HTTP POST): XML Update Handler (/update), CSV Update Handler (/update/csv), XML Update with custom processor chain (/update/xml), Extracting RequestHandler for PDF, Word, … (/update/extract)
• Data Import Handler: database pull (SQL DB), RSS pull (RSS feed), simple transforms
• Update Processor Chain (per handler): Remove Duplicates, Logging, Custom Transform and Index processors
• Text index analyzers feed the Lucene index
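As a rough illustration of what gets POSTed to the XML update handler, the sketch below builds an `<add><doc>…</doc></add>` message with the standard library. The field names and values are invented, and nothing is sent over the network; the host, port and core name of a real Solr instance would be deployment-specific.

```python
import xml.etree.ElementTree as ET

def build_add_message(doc):
    """Build a Solr-style XML <add> message from a flat dict of field values."""
    root = ET.Element("add")
    doc_el = ET.SubElement(root, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)
    return ET.tostring(root, encoding="unicode")

# Hypothetical document; "id" and "title" are illustrative field names.
payload = build_add_message({"id": "1", "title": "How SOLR Search Works"})
```

The resulting string would be the body of an HTTP POST to the `/update` handler, after which the update processor chain and analyzers take over.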
SOLR Architecture – Searching
• User enters query
• Parse the query, i.e. apply the required filters and tokenizers
• Converted to tokens
• Parallel search across multiple indexes (per field)
• Score all the documents
• Sort in async fashion
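The search path can be sketched the same way. The example below uses a tiny hand-written per-field index and a simplified TF-IDF score with per-field weights; it is an illustration under those assumptions, not Lucene's actual scoring formula, and it runs sequentially rather than in parallel.

```python
import math
import re

def analyze(text):
    """Apply the same analysis to the query as was applied at index time."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

# Hypothetical per-field inverted index: {field: {token: {doc_id: term_frequency}}}
INDEX = {
    "title": {"architects": {1: 1}, "delhi": {1: 1}, "developers": {2: 1}},
    "body": {"architects": {1: 2}, "delhi": {2: 1, 3: 1}},
}
NUM_DOCS = 3

def search(query, field_weights):
    """Score documents by weighted TF-IDF summed over query tokens and fields."""
    scores = {}
    for token in analyze(query):
        for field, weight in field_weights.items():
            postings = INDEX[field].get(token, {})
            if not postings:
                continue
            # Rarer terms get a higher inverse document frequency.
            idf = math.log(1 + NUM_DOCS / len(postings))
            for doc_id, tf in postings.items():
                scores[doc_id] = scores.get(doc_id, 0.0) + weight * tf * idf
    # Best match first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

hits = search("Architects Delhi", {"title": 2.0, "body": 1.0})
```

Giving `title` a higher weight than `body` is a simple form of field boosting: a match in the title counts for more than the same match in the body.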
SOLR Architecture - Full
SOLR Architecture – Updating Index
• Types of Index Updates
• Instant Index
• Incremental Indexing
• Full Indexing
• Index Update Strategies
• Instant / Incremental Index cannot happen continuously
• Too much causes performance degradation
• Full Index periodically to optimize the index
SOLR Architecture – Scalability
• Sharding
• Splitting collections across servers – search in parallel
• Replication
• More than one copy of the data for failover
• SolrCloud
• Using Zookeeper for managing clusters
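A minimal sketch of the sharding idea: route each document to a shard by hashing its id, then merge the per-shard results into a global top-k. The shard count, the hash choice and the sample hits are assumptions for illustration; this is not SolrCloud's actual document router.

```python
import hashlib
import heapq

NUM_SHARDS = 3  # illustrative shard count

def shard_for(doc_id):
    """Pick a shard deterministically from a hash of the document id."""
    digest = hashlib.md5(str(doc_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def merge_top_k(per_shard_hits, k):
    """Merge per-shard lists of (score, doc_id) into the global top k by score."""
    all_hits = (hit for hits in per_shard_hits for hit in hits)
    return heapq.nlargest(k, all_hits)

# Hypothetical results returned by three shards that were searched in parallel.
shard_results = [
    [(0.9, "doc-7"), (0.4, "doc-2")],
    [(0.8, "doc-5")],
    [(0.95, "doc-1"), (0.1, "doc-9")],
]
top3 = merge_top_k(shard_results, 3)
```

Each shard only has to score its own slice of the collection, and the coordinator's merge step is cheap compared with the search itself.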
SOLR Architecture – Other Features
• Stemming
• Identify the root word and variations of the word, e.g. "stems", "stemmer", "stemming", "stemmed" are all based on "stem"
• Fuzzy Matching
• Similar Words / Misspellings
• Edit Distance
• NLP
• Identify Entities / Nouns in Search Query
• OpenNLP Plugin for SOLR
• And much more…
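Fuzzy matching by edit distance can be sketched with the classic Levenshtein dynamic-programming recurrence. The vocabulary and the `max_edits` threshold below are illustrative assumptions; a real engine avoids comparing the query against every term.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def fuzzy_matches(term, vocabulary, max_edits=2):
    """Return vocabulary words within `max_edits` edits of `term`."""
    return [w for w in vocabulary if edit_distance(term, w) <= max_edits]

# Hypothetical vocabulary; "architcet" is a deliberate misspelling.
matches = fuzzy_matches("architcet", ["architect", "archive", "stem"])
```

Swapping two adjacent letters counts as two edits under plain Levenshtein distance, so a threshold of 2 is enough to catch that kind of typo.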
SOLR Usage in Trellis
• Architecture
• Data-in from MySQL
• Index Update Strategy
• AutoComplete
• Basic Search
• Advanced Search
• Filters / Sorting / Facets & More
• Demo (Incl. Config Files)
How Google Search Works
• Crawling
• Robots.txt
• Indexing
• Multiple Indexes – Instant / Daily / Weekly / Long Tail
• Searching
• NLP, Stemming, Auto-correct, etc.
• Ranking – PageRank
• Video - https://www.youtube.com/watch?v=BNHR6IQJGZs
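For the ranking step, the basic PageRank idea can be sketched as a power iteration over a link graph. The three-page graph, the damping factor of 0.85 and the iteration count are illustrative assumptions; this is the textbook formulation, not a description of Google's production ranking.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    `links` maps each page to the pages it links to; every link target
    is assumed to also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: a -> b, c; b -> c; c -> a
ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```

Page `c` ends up ranked above `b` because it receives links from both `a` and `b`, while `b` only receives half of `a`'s rank.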
Other Search Technologies
• ElasticSearch
• Much newer than Solr
• Built-in scalability
• Uses same Lucene as the base
• JSON instead of XML
• Good for Analytical querying
• Others
• Splunk
• Sphinx
That’s All Folks
References
• SOLR Home Page - http://lucene.apache.org/solr/
• Tutorials
• http://www.solrtutorial.com/index.html
• https://lucene.apache.org/solr/4_10_0/tutorial.html
• Just Google the rest!!

2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 

How Solr Search Works

  • 1. How SOLR Search Works Rajat Jain - 20th Dec, 2016
  • 2. Agenda • What do you mean by Search? • Search Requirements • Comparison of SOLR with SQL/NoSQL • SOLR Architecture • SOLR Usage in Trellis • How Google Search Works • Other Search Technologies
  • 3. What do you mean by Search?
  • 4. What do you mean by Search?
  • 5. What do you mean by Search?
  • 6. Search Requirements • Text Search – e.g. “Architects” • Filters – e.g. “In New Delhi”, “iOS” • Sorting – e.g. “Best Match”, “Highest Rating”, etc. • And more: • Facets • Stemming • Fuzzy Matching • Image Search, etc.
  • 7. Search Requirements • Full Text Search • Fast reads (writes can be slower) • Various Combinations of Filters • Various Combinations of Sorting • Non Features: • Real-time – usually staleness is not a problem • Data Integrity – usually not a source of storage – can be ‘lossy’
  • 8. Search Requirements – Faceted Search • A type of filtering that comes with suggestions • In most cases the suggestions are sorted by the number of matching results • Helps the user narrow down the search without having to ‘guess’ how to narrow it
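A minimal sketch of what a facet computes, assuming documents are plain Python dicts. The `facet_counts` helper and the sample data are hypothetical and are not part of Solr: for one field, count how many matching documents carry each value and list the values by that count.

```python
from collections import Counter

def facet_counts(docs, field):
    """Return (value, count) pairs for `field`, most frequent value first."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return counts.most_common()

# Hypothetical result set for a search such as "Architects".
docs = [
    {"name": "A", "city": "New Delhi"},
    {"name": "B", "city": "Mumbai"},
    {"name": "C", "city": "New Delhi"},
]
facets = facet_counts(docs, "city")
print(facets)
```

Each pair can be shown to the user as a clickable filter, which is how facets let people narrow a search without guessing.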
  • 9. Conventional Storage for Search • SQL (MySQL) • Relational Tables • Normalized Data • Assumes keys / indexes are used for reads & writes • Optimized for reads and writes & transactional data (ACID transactions) • Lots of security, etc. • Table data stored in the file system • Indexing – individual columns or sets of columns • Full-text search – recent addition (full-text index)
  • 10. Conventional Storage for Search • NoSQL (think MongoDB) • Key-Value Pairs • De-normalized Data • Unstructured Data • Optimized for reads – writes can be slightly slower (in case of transactional) • Data stored in the file system • Indexing – individual fields • Full-text search – has built-in support
  • 11. Advantages of SOLR over MySQL/NoSQL • Inverted Index • Mind-blowing text analysis / stemming / scoring / fuzziness • Weighting fields / boosting – custom scoring functions • Single document concept – no relations (in general) • Faceting support out of the box • Optimized for search and search alone (at scale without performance drop)
  • 12. SOLR Architecture – Indexing • Take a ‘document’ / field, etc. • For each field apply set of filters / tokenizers • Convert to individual tokens • Update the ‘inverted’ index based on the tokens • In general in the Index keep track of stats, etc. for the various terms • Different indexes per field
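The indexing steps on this slide can be sketched as a toy per-field inverted index. This is only an illustration under simple assumptions: the `analyze` function stands in for Solr's configurable tokenizer and filter chain, and `build_index` and the sample documents are hypothetical names, not Solr or Lucene APIs.

```python
import re
from collections import defaultdict

def analyze(text):
    """Toy analyzer: lowercase the text and split it on non-alphanumerics."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def build_index(docs):
    """Build field -> token -> set of ids of the documents containing it."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, doc in docs.items():
        for field, value in doc.items():
            for token in analyze(value):
                index[field][token].add(doc_id)
    return index

docs = {
    1: {"title": "Architects in New Delhi", "skills": "iOS design"},
    2: {"title": "Delhi iOS developers", "skills": "Swift iOS"},
}
index = build_index(docs)
print(sorted(index["title"]["delhi"]))
```

Looking up a token is then a dictionary access instead of a scan over every document, which is the point of inverting the index.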
  • 13. SOLR Architecture – Indexing [Diagram: documents arrive by HTTP POST at the update handlers: XML Update Handler (/update), CSV Update Handler (/update/csv), XML Update with custom processor chain (/update/xml), Extracting RequestHandler for PDF, Word, … (/update/extract); the Data Import Handler pulls from a SQL DB or RSS feed (database pull, RSS pull, simple transforms); each handler has an Update Processor Chain (Remove Duplicates, Logging, Custom Transform and Index processors) that feeds the text index analyzers and the Lucene index]
  • 14. SOLR Architecture – Searching • User enters query • Parse the query, i.e. apply the required filters and tokenizers • Converted to tokens • Parallel search across multiple indexes (per field) • Score all the documents • Sort in async fashion
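A rough sketch of the search steps above: tokenize the query, find the documents that contain each token, score them, and sort by score. The scoring here is a simple TF-IDF sum chosen for illustration. It is not Lucene's actual similarity formula, and `tokenize`, `search` and the sample documents are assumptions of this sketch.

```python
import math
from collections import Counter

def tokenize(text):
    """Toy query/document analyzer: lowercase and split on whitespace."""
    return text.lower().split()

def search(query, docs):
    """Return (doc_id, score) pairs, best match first."""
    doc_tokens = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    scores = Counter()
    for term in tokenize(query):
        matching = [d for d, toks in doc_tokens.items() if term in toks]
        if not matching:
            continue
        # Terms that occur in fewer documents contribute more to the score.
        idf = math.log(1 + n_docs / len(matching))
        for d in matching:
            scores[d] += doc_tokens[d][term] * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

docs = {
    1: "architects in new delhi",
    2: "ios developers in mumbai",
    3: "delhi architects and delhi designers",
}
results = search("delhi architects", docs)
print(results)
```

Document 3 ranks first because "delhi" occurs in it twice, and document 2 is absent from the results because it matches no query term.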
  • 16. SOLR Architecture – Updating Index • Types of Index Updates • Instant Index • Incremental Indexing • Full Indexing • Index Update Strategies • Instant / Incremental Index cannot happen continuously • Too much causes performance degradation • Full Index periodically to optimize the index
  • 17. SOLR Architecture – Scalability • Sharding • Splitting collections across servers – search in parallel • Replication • More than one copy of the data for failover • SolrCloud • Using Zookeeper for managing clusters
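A hedged sketch of the sharding idea: route each document to a shard by hashing its id, query every shard, then merge the per-shard hits into one global top-k list. The `shard_for` and `merge_top_k` helpers are hypothetical, and the CRC32 hash is only a stand-in for whatever routing a real SolrCloud cluster uses.

```python
import heapq
import zlib

def shard_for(doc_id, num_shards):
    """Pick a shard by hashing the document id (stable across runs)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards

def merge_top_k(per_shard_results, k):
    """Merge (score, doc_id) lists from each shard into a global top k."""
    all_hits = (hit for hits in per_shard_results for hit in hits)
    return heapq.nlargest(k, all_hits)

# Hypothetical hits returned by three shards that were searched in parallel.
shard_results = [
    [(0.9, "doc-1"), (0.4, "doc-7")],
    [(0.8, "doc-3")],
    [(0.95, "doc-5"), (0.1, "doc-2")],
]
top = merge_top_k(shard_results, 3)
print(top)
```

Replication is orthogonal to this: each shard would additionally be copied to other servers so that a replica can answer if one server fails.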
  • 18. SOLR Architecture – Other Features • Stemming • Identify the root word and its variations, e.g. "stems", "stemmer", "stemming" and "stemmed" all reduce to "stem" • Fuzzy Matching • Similar Words / Misspellings • Edit Distance • NLP • Identify Entities / Nouns in Search Query • OpenNLP Plugin for SOLR • And much more…
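Fuzzy matching by edit distance can be illustrated with the classic Levenshtein dynamic program: the distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one word into another. The `fuzzy_matches` helper, its `max_edits` default and the vocabulary are assumptions of this sketch. A real engine avoids comparing the query against every indexed term.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def fuzzy_matches(term, vocabulary, max_edits=2):
    """Return the vocabulary words within max_edits of term."""
    return [w for w in vocabulary if edit_distance(term, w) <= max_edits]

vocabulary = ["architect", "artist", "archive"]
matches = fuzzy_matches("architct", vocabulary, max_edits=1)
print(matches)
```

The misspelling "architct" is one insertion away from "architect", so it is the only word returned.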
  • 19. SOLR Usage in Trellis • Architecture • Data-in from MySQL • Index Update Strategy • AutoComplete • Basic Search • Advanced Search • Filters / Sorting / Facets & More • Demo (Incl. Config Files)
  • 20. How Google Search Works • Crawling • Robots.txt • Indexing • Multiple Indexes – Instant / Daily / Weekly / Long Tail • Searching • NLP, Stemming, Auto-correct, etc. • Ranking – PageRank • Video - https://www.youtube.com/watch?v=BNHR6IQJGZs
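The PageRank idea mentioned on this slide can be sketched with textbook power iteration over a tiny link graph. This is the published algorithm in its simplest form, not a description of how Google ranks pages in production. The `pagerank` function, the damping factor of 0.85, the iteration count and the four-page graph are all illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))
```

Page "c" ends up with the highest rank because three of the four pages link to it, while "d", which nothing links to, keeps only the baseline share.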
  • 21. Other Search Technologies • ElasticSearch • Much newer than Solr • Built-in scalability • Uses same Lucene as the base • JSON instead of XML • Good for Analytical querying • Others • Splunk • Sphinx
  • 22. That’s All Folks • References • SOLR Home Page - http://lucene.apache.org/solr/ • Tutorials • http://www.solrtutorial.com/index.html • https://lucene.apache.org/solr/4_10_0/tutorial.html • Just Google the rest!!