SolrCloud and Shard Splitting
Shalin Shekhar Mangar
Bangalore Lucene/Solr Meetup
8th June 2013
Who am I?
● Apache Lucene/Solr Committer and PMC member
● Contributor since January 2008
● Currently: Engineer at LucidWorks
● Formerly with AOL
● Email: shalin@apache.org
● Twitter: shalinmangar
● Blog: http://shal.in
SolrCloud: Overview
● Distributed searching/indexing
● No single points of failure
● Near Real Time Friendly (push replication)
● Transaction logs for durability and recovery
● Real-time get
● Atomic Updates
● Optimistic Concurrency
● Request forwarding from any node in cluster
● A strong contender for your NoSQL needs as well
Document Routing
[Diagram: compositeId routing in a collection with four shards]
● numShards=4, router=compositeId
● Hash ranges of the four shards (shard1 to shard4): 00000000-3fffffff, 40000000-7fffffff, 80000000-bfffffff, c0000000-ffffffff
● id = BigCo!doc5 → (MurmurHash3) → 1f27 3c71
● q=my_query with shard.keys=BigCo! → hash range 1f270000 to 1f27ffff → shard1
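The routing arithmetic on this slide can be sketched in a few lines of Python. This is only an illustration, not Solr's code: the function names are hypothetical, the 16-bit halves (1f27 and 3c71) are taken from the slide rather than computed with MurmurHash3, and the lookup returns a hash range rather than a shard name because the diagram's name-to-range pairing is not recoverable here.

```python
from typing import Tuple

# The four hash ranges shown on the slide for numShards=4.
SHARD_RANGES = [
    (0x00000000, 0x3FFFFFFF),
    (0x40000000, 0x7FFFFFFF),
    (0x80000000, 0xBFFFFFFF),
    (0xC0000000, 0xFFFFFFFF),
]


def composite_hash(prefix_bits: int, doc_bits: int) -> int:
    """Join 16 bits from the shard-key hash with 16 bits from the doc-id hash."""
    return ((prefix_bits & 0xFFFF) << 16) | (doc_bits & 0xFFFF)


def prefix_range(prefix_bits: int) -> Tuple[int, int]:
    """Range of all hashes that share one 16-bit shard-key prefix."""
    low = (prefix_bits & 0xFFFF) << 16
    return low, low | 0xFFFF


def owning_range(hash_value: int) -> Tuple[int, int]:
    """Find the shard hash range that contains a 32-bit hash."""
    for low, high in SHARD_RANGES:
        if low <= hash_value <= high:
            return low, high
    raise ValueError(f"hash {hash_value:#010x} is outside the 32-bit space")


doc_hash = composite_hash(0x1F27, 0x3C71)   # id = BigCo!doc5, values from the slide
query_low, query_high = prefix_range(0x1F27)  # shard.keys=BigCo!
print(f"{doc_hash:08x} lives in range {owning_range(doc_hash)}")
```

Because every document with the BigCo! prefix shares the same upper 16 bits, a query restricted with shard.keys=BigCo! only needs to visit the shard (or shards) whose range overlaps 1f270000 to 1f27ffff.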
SolrCloud Collections API
● /admin/collections?action=CREATE&name=mycollection
– &numShards=3
– &replicationFactor=4
– &maxShardsPerNode=2
– &createNodeSet=node1:8080,node2:8080,node3:8080,...
– &collection.configName=myconfigset
● /admin/collections?action=DELETE&name=mycollection
● /admin/collections?action=RELOAD&name=mycollection
● /admin/collections?action=CREATEALIAS&name=south
– &collections=KA,TN,AP,KL,...
● Coming soon: Shard aliases
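As a rough sketch, the Collections API calls above are plain HTTP GET URLs and can be assembled with the Python standard library. The base URL `http://localhost:8983/solr` and the helper name `collections_url` are assumptions for illustration; the parameter names and values come from the slide, with the elided node and collection lists left out or shortened.

```python
from urllib.parse import urlencode

# Assumed base URL of one Solr node; adjust to your deployment.
SOLR_BASE = "http://localhost:8983/solr"


def collections_url(action: str, params: dict) -> str:
    """Build a Collections API URL for the given action (hypothetical helper)."""
    query = {"action": action}
    query.update(params)
    # safe="," keeps comma-separated lists readable in the URL.
    return f"{SOLR_BASE}/admin/collections?{urlencode(query, safe=',')}"


create_url = collections_url("CREATE", {
    "name": "mycollection",
    "numShards": 3,
    "replicationFactor": 4,
    "maxShardsPerNode": 2,
    "collection.configName": "myconfigset",
})

alias_url = collections_url("CREATEALIAS", {
    "name": "south",
    "collections": "KA,TN,AP,KL",
})

print(create_url)
print(alias_url)
```

The sketch only builds the URLs; sending them (for example with `urllib.request.urlopen`) is left out so that it runs without a live cluster.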
Shard Splitting: Background
● Before Solr 4.3, the number of shards had to be fixed at the time of collection creation
● This forced people to start with a large number of shards
● If a shard ran too hot, the only fix was to re-index and thereby re-balance the collection
● Each shard is assigned a hash range
● Each shard also has a state, which defaults to 'ACTIVE'
Shard Splitting: Features
● Seamless on-the-fly splitting – no downtime required
● Retried on failures
● /admin/collections?action=SPLITSHARD&collection=mycollection
– &shard=shardId
● A lower-level CoreAdmin API comes free!
– /admin/cores?action=SPLIT&core=core0&targetCore=core1&targetCore=core2
– /admin/cores?action=SPLIT&core=core0&path=/path/to/index/1&path=/path/to/index/2
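The CoreAdmin form repeats the `targetCore` or `path` parameter once per sub-index, which a plain dictionary cannot express. A minimal sketch of building both kinds of split URL follows; the base URL and the helper names `split_shard_url` and `split_core_url` are assumptions, while the parameters mirror the slide.

```python
from urllib.parse import urlencode

# Assumed base URL of one Solr node; adjust to your deployment.
SOLR_BASE = "http://localhost:8983/solr"


def split_shard_url(collection: str, shard: str) -> str:
    """Collections API: split one shard of a SolrCloud collection."""
    query = {"action": "SPLITSHARD", "collection": collection, "shard": shard}
    return f"{SOLR_BASE}/admin/collections?{urlencode(query)}"


def split_core_url(core: str, target_cores=(), paths=()) -> str:
    """CoreAdmin API: split a core into existing cores or into index paths."""
    # A list of pairs lets the same parameter name appear several times.
    query = [("action", "SPLIT"), ("core", core)]
    query += [("targetCore", name) for name in target_cores]
    query += [("path", path) for path in paths]
    # safe="/" keeps file-system paths readable in the URL.
    return f"{SOLR_BASE}/admin/cores?{urlencode(query, safe='/')}"


print(split_shard_url("mycollection", "shard2"))
print(split_core_url("core0", target_cores=["core1", "core2"]))
print(split_core_url("core0", paths=["/path/to/index/1", "/path/to/index/2"]))
```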
Shard Splitting
[Diagram: Shard1, Shard2, and Shard3, each with a leader and a replica; Shard2 has the sub-shards Shard2_0 and Shard2_1; an arrow is labelled "update".]
Shard Splitting: Mechanism
● New sub-shards created in “construction” state
● Leader starts forwarding applicable updates, which are buffered by the sub-shards
● Leader index is split and installed on the sub-shards
● Sub-shards apply buffered updates
● Replicas are created for sub-shards and brought up to speed
● Sub-shard becomes “active” and old shard becomes “inactive”
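The state changes in the steps above can be modelled with a small in-memory sketch. It is not Solr's implementation: the `Shard` class and `split_shard` function are hypothetical, the parent's hash range is assumed to be cut into two equal halves, and the middle steps (buffering, index split, replay, replica creation) are reduced to a comment.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Shard:
    low: int               # first hash in the shard's range
    high: int              # last hash in the shard's range
    state: str = "active"  # "construction", "active", or "inactive"


def split_shard(shards: Dict[str, Shard], name: str) -> Tuple[str, str]:
    """Split one shard into two sub-shards and switch their states."""
    parent = shards[name]
    mid = (parent.low + parent.high) // 2  # assumption: two equal halves
    first, second = f"{name}_0", f"{name}_1"

    # Step 1: sub-shards start out in the "construction" state.
    shards[first] = Shard(parent.low, mid, "construction")
    shards[second] = Shard(mid + 1, parent.high, "construction")

    # Steps 2 to 5 (buffer forwarded updates, split and install the index,
    # replay the buffer, create replicas) would happen here.

    # Step 6: sub-shards take over and the parent is retired.
    shards[first].state = "active"
    shards[second].state = "active"
    parent.state = "inactive"
    return first, second


cluster = {"shard2": Shard(0x40000000, 0x7FFFFFFF)}
split_shard(cluster, "shard2")
for shard_name, shard in cluster.items():
    print(f"{shard_name}: {shard.low:08x}-{shard.high:08x} {shard.state}")
```

Note that the parent shard stays in the cluster state as “inactive” after the split; it is not deleted.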
Shard Splitting: Tips and Gotchas
● Supports collections with a hash-based router, i.e. the “plain” or “compositeId” routers
● The operation is executed by the Overseer node, not by the node you sent the request to
● The HTTP request is synchronous but the operation is async. A read timeout does not mean failure!
● The operation is retried on failure. Check the parent leader's logs before you re-issue the command or you may end up with more shards than you want
Shard Splitting: Tips and Gotchas
● The Solr Admin GUI is not aware of shard states yet, so the inactive parent shard is also shown in “green”
● The CoreAdmin split command can be used against non-cloud deployments. It will spread docs alternately among the sub-indexes
● Inactive shards have to be cleaned up manually. Solr 4.4 will have a delete shard API
● Shard splitting in the 4.3 release is buggy. Wait for 4.3.1
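To picture what spreading docs alternately means for the non-cloud CoreAdmin split, here is a tiny sketch that deals documents out round-robin. Reading "alternately" as strict round-robin is an assumption, and `split_alternately` is a hypothetical name, not a Solr API.

```python
from typing import List, Sequence


def split_alternately(docs: Sequence[str], parts: int) -> List[List[str]]:
    """Deal documents out to `parts` sub-indexes in round-robin order."""
    return [list(docs[i::parts]) for i in range(parts)]


sub_indexes = split_alternately(["d1", "d2", "d3", "d4", "d5"], 2)
print(sub_indexes)  # [['d1', 'd3', 'd5'], ['d2', 'd4']]
```

Unlike the hash-range split used in SolrCloud, this kind of split says nothing about which sub-index a given document id ends up in.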
Shard Splitting: Looking towards the future
● GUI integration and better progress reporting/monitoring
● Better support for custom sharding use-cases
● More flexibility towards number of sub-shards, hash ranges, number of replicas etc.
● Store replication factor per shard
● Suggest splits to admins based on cluster state and load
Confidential and Proprietary © 2012 LucidWorks
About LucidWorks
• Intro to LucidWorks (formerly Lucid Imagination)
– Follow: @lucidworks, @lucidimagineer
– Learn: http://www.lucidworks.com
• Check out SearchHub: http://www.searchhub.org
• Solr 4.1 Reference Guide: http://bit.ly/11KSiMN
– Older versions: http://bit.ly/12t1Egq
• Our Products
– LucidWorks Search
– LucidWorks Big Data
• Lucene Revolution
– http://www.lucenerevolution.com
Thank you
Shalin Shekhar Mangar
LucidWorks

SolrCloud and Shard Splitting

  • 1. SolrCloud and Shard Splitting Shalin Shekhar Mangar
  • 2. Bangalore Lucene/Solr Meetup, 8th June 2013. Who am I?
      ● Apache Lucene/Solr Committer and PMC member
      ● Contributor since January 2008
      ● Currently: Engineer at LucidWorks
      ● Formerly with AOL
      ● Email: shalin@apache.org
      ● Twitter: shalinmangar
      ● Blog: http://shal.in
  • 3. SolrCloud: Overview
      ● Distributed searching/indexing
      ● No single points of failure
      ● Near Real Time Friendly (push replication)
      ● Transaction logs for durability and recovery
      ● Real-time get
      ● Atomic Updates
      ● Optimistic Concurrency
      ● Request forwarding from any node in cluster
      ● A strong contender for your NoSQL needs as well
  • 5. Document Routing [diagram]
      ● Shards: shard1, shard2, shard3, shard4
      ● Hash ranges: 00000000-3fffffff, 40000000-7fffffff, 80000000-bfffffff, c0000000-ffffffff
      ● Collection settings: numShards=4, router=compositeId
      ● Indexing: id = BigCo!doc5, hashed with MurmurHash3 to 1f273c71, goes to shard1
      ● Searching: q=my_query with shard.keys=BigCo! covers hashes 1f270000 to 1f27ffff, which also map to shard1
  • 6. SolrCloud Collections API
      ● /admin/collections?action=CREATE&name=mycollection
          – &numShards=3
          – &replicationFactor=4
          – &maxShardsPerNode=2
          – &createNodeSet=node1:8080,node2:8080,node3:8080,...
          – &collection.configName=myconfigset
      ● /admin/collections?action=DELETE&name=mycollection
      ● /admin/collections?action=RELOAD&name=mycollection
      ● /admin/collections?action=CREATEALIAS&name=south
          – &collections=KA,TN,AP,KL,...
      ● Coming soon: Shard aliases
  • 7. Shard Splitting: Background
      ● Before Solr 4.3, the number of shards had to be fixed at the time of collection creation
      ● This forced people to start with a large number of shards
      ● If a shard ran too hot, the only fix was to re-index and thereby re-balance the collection
      ● Each shard is assigned a hash range
      ● Each shard also has a state, which defaults to 'ACTIVE'
  • 8. Shard Splitting: Features
      ● Seamless on-the-fly splitting – no downtime required
      ● Retried on failures
      ● /admin/collections?action=SPLITSHARD&collection=mycollection
          – &shard=shardId
      ● A lower-level CoreAdmin API comes free!
          – /admin/cores?action=SPLIT&core=core0&targetCore=core1&targetCore=core2
          – /admin/cores?action=SPLIT&core=core0&path=/path/to/index/1&path=/path/to/index/2
  • 9. Shard Splitting [diagram]
      ● Shard1, Shard2 and Shard3, each with a leader and a replica
      ● Sub-shards of Shard2: Shard2_0 and Shard2_1
      ● Arrow labelled "update"
  • 10. Shard Splitting: Mechanism
      ● New sub-shards created in “construction” state
      ● Leader starts forwarding applicable updates, which are buffered by the sub-shards
      ● Leader index is split and installed on the sub-shards
      ● Sub-shards apply buffered updates
      ● Replicas are created for sub-shards and brought up to speed
      ● Sub-shard becomes “active” and old shard becomes “inactive”
  • 11. Shard Splitting: Tips and Gotchas
      ● Supports collections with a hash-based router, i.e. the “plain” or “compositeId” routers
      ● The operation is executed by the Overseer node, not by the node that received the request
      ● The HTTP request is synchronous but the operation is async. A read timeout does not mean failure!
      ● The operation is retried on failure. Check the parent leader's logs before you re-issue the command, or you may end up with more shards than you want
  • 12. Shard Splitting: Tips and Gotchas
      ● Solr Admin GUI is not aware of shard states yet, so the inactive parent shard is also shown in “green”
      ● The CoreAdmin split command can be used against non-cloud deployments. It will spread docs alternately among the sub-indexes
      ● Inactive shards have to be cleaned up manually. Solr 4.4 will have a delete shard API
      ● Shard splitting in the 4.3 release is buggy. Wait for 4.3.1
  • 13. Shard Splitting: Looking towards the future
      ● GUI integration and better progress reporting/monitoring
      ● Better support for custom sharding use-cases
      ● More flexibility towards number of sub-shards, hash ranges, number of replicas etc.
      ● Store replication factor per shard
      ● Suggest splits to admins based on cluster state and load
  • 14. About LucidWorks
      • Intro to LucidWorks (formerly Lucid Imagination)
          – Follow: @lucidworks, @lucidimagineer
          – Learn: http://www.lucidworks.com
      • Check out SearchHub: http://www.searchhub.org
      • Solr 4.1 Reference Guide: http://bit.ly/11KSiMN
          – Older versions: http://bit.ly/12t1Egq
      • Our Products
          – LucidWorks Search
          – LucidWorks Big Data
      • Lucene Revolution
          – http://www.lucenerevolution.com
  • 15. Thank you. Shalin Shekhar Mangar, LucidWorks
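
The hash ranges on the Document Routing slide and the index split described under Shard Splitting: Mechanism can be illustrated with a short Python sketch. This is not Solr code: the helper names `partition`, `split_range` and `fmt` are hypothetical, hashes are treated as unsigned 32-bit values for simplicity (Solr's internal ordering may differ), and the sketch assumes a split divides the parent's range into two equal halves.

```python
from typing import List, Tuple

# An inclusive (start, end) range of 32-bit hash values.
HashRange = Tuple[int, int]


def partition(num_shards: int) -> List[HashRange]:
    """Divide the 32-bit hash space into num_shards contiguous ranges."""
    total = 1 << 32
    bounds = [i * total // num_shards for i in range(num_shards + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(num_shards)]


def split_range(parent: HashRange) -> Tuple[HashRange, HashRange]:
    """Split a parent range into two halves (assumed sub-shard ranges)."""
    start, end = parent
    if start >= end:
        raise ValueError("range is too small to split")
    mid = start + (end - start) // 2
    return (start, mid), (mid + 1, end)


def fmt(r: HashRange) -> str:
    """Render a range in the hex notation used on the slides."""
    return f"{r[0]:08x}-{r[1]:08x}"


if __name__ == "__main__":
    ranges = partition(4)
    print([fmt(r) for r in ranges])
    first_half, second_half = split_range(ranges[1])
    print(fmt(first_half), fmt(second_half))
```

For four shards, `partition` yields the same four ranges listed on the routing slide. Every hash in the parent range falls in exactly one of the two halves, which is what would let a parent leader decide which sub-shard an incoming update applies to.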
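
The Collections API calls shown on the slides are plain HTTP GET URLs, so they can be assembled with the Python standard library. The sketch below only builds the URLs and does not send them. The base URL `http://localhost:8983/solr` and the shard name `shard2` are assumptions for illustration; the action and parameter names are the ones that appear on the slides.

```python
from typing import Dict, Union
from urllib.parse import urlencode

# Assumption: a Solr node reachable at this address; adjust for your cluster.
BASE_URL = "http://localhost:8983/solr"


def collections_api_url(action: str, params: Dict[str, Union[str, int]]) -> str:
    """Build a Collections API URL for the given action and parameters."""
    query = urlencode({"action": action, **params})
    return f"{BASE_URL}/admin/collections?{query}"


# CREATE with the parameters from the Collections API slide.
create_url = collections_api_url(
    "CREATE",
    {
        "name": "mycollection",
        "numShards": 3,
        "replicationFactor": 4,
        "maxShardsPerNode": 2,
        "collection.configName": "myconfigset",
    },
)

# SPLITSHARD for one shard of that collection ("shard2" is illustrative).
split_url = collections_api_url(
    "SPLITSHARD",
    {"collection": "mycollection", "shard": "shard2"},
)

if __name__ == "__main__":
    print(create_url)
    print(split_url)
```

When such a URL is actually requested, the caveat from the tips slide applies: a read timeout on the HTTP call does not mean the split failed, so check the parent leader's logs before re-issuing the command.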