MongoDB Sharding

How queries work in sharded environments
  
One small server. We want more capacity. What to do?

Traditionally, we would scale vertically with a bigger box.

With sharding we instead scale horizontally to achieve the same computational/storage/memory footprint from smaller servers.

(diagram label: m = 10)
  
We will show the vertically scaled db and the horizontally scaled db side-by-side for comparison.
  
A sharded MongoDB collection has a shard key. The collection is partitioned in an order-preserving manner using this key. In this example a is our shard key:

{ a : …, b : …, c : … }    a is declared shard key for the collection
  
{ a : …, b : …, c : … }    a is shard key

Metadata is maintained on chunks, which are represented by shard key ranges. Each chunk is assigned to a particular shard.

Range                Shard
a in [-∞, 2000)      2
a in [2000, 2100)    8
a in [2100, 5500)    3
…                    …
a in [88700, ∞)      0

When a chunk becomes too large, MongoDB automatically splits it, and the balancer will later migrate chunks as necessary.
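
The chunk table and the split step can be sketched in a few lines of plain JavaScript. This is only an illustration under stated assumptions: the `chunks` array, `shardForKey`, and `splitChunk` are hypothetical names, not MongoDB APIs, and the rows elided by "…" in the table are simply left out.

```javascript
// Hypothetical chunk metadata mirroring the table; the elided rows ("…") are
// omitted, so keys in [5500, 88700) have no entry in this sketch.
const chunks = [
  { min: -Infinity, max: 2000, shard: 2 },
  { min: 2000, max: 2100, shard: 8 },
  { min: 2100, max: 5500, shard: 3 },
  { min: 88700, max: Infinity, shard: 0 },
];

// Return the shard that owns shard-key value `a`, or null if no listed chunk
// covers it. Each chunk covers the half-open interval [min, max).
function shardForKey(a) {
  const chunk = chunks.find((c) => a >= c.min && a < c.max);
  return chunk ? chunk.shard : null;
}

// Split the chunk that strictly contains `splitPoint` into two chunks.
// Both halves stay on the original shard; moving one of them elsewhere
// would be a separate, later migration step.
function splitChunk(table, splitPoint) {
  const i = table.findIndex((c) => splitPoint > c.min && splitPoint < c.max);
  if (i === -1) return table; // nothing to split
  const c = table[i];
  return [
    ...table.slice(0, i),
    { min: c.min, max: splitPoint, shard: c.shard },
    { min: splitPoint, max: c.max, shard: c.shard },
    ...table.slice(i + 1),
  ];
}

console.log(shardForKey(100));   // 2
console.log(shardForKey(2000));  // 8
console.log(shardForKey(90000)); // 0
console.log(splitChunk(chunks, 1000).length); // 5
```

Because the partitioning is order-preserving, a lookup only needs to compare the key against range boundaries.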
  
{ a : …, b : …, c : … }    a is shard key

find( { a : { $gt : 333, $lt : 400 } } )

Range                Shard
a in [-∞, 2000)      2
a in [2000, 2100)    8
a in [2100, 5500)    3
…                    …
a in [88700, ∞)      0

The mongos process routes a query to the correct shard(s). For the query above, all data possibly relevant is on shard 2, so the query is sent to that node only, and processed there.
  
{ a : …, b : …, c : … }    a is declared shard key

find( { a : { $gt : 333, $lt : 2012 } } )

Range                Shard
a in [-∞, 2000)      2
a in [2000, 2100)    8
a in [2100, 5500)    3
…                    …
a in [88700, ∞)      0

Sometimes a query range spans more than one shard (here shards 2 and 8), but still very few in total. This is reasonably efficient.
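
As a rough sketch of how a range on the shard key narrows the set of shards, the check below tests each chunk for overlap with the queried interval. The `chunks` array and `shardsForRange` are hypothetical names for illustration, the elided table rows are omitted, and keys are treated as continuous values, so the result is a conservative "possibly relevant" set.

```javascript
// Hypothetical chunk metadata mirroring the table (elided rows omitted).
const chunks = [
  { min: -Infinity, max: 2000, shard: 2 },
  { min: 2000, max: 2100, shard: 8 },
  { min: 2100, max: 5500, shard: 3 },
  { min: 88700, max: Infinity, shard: 0 },
];

// Shards whose chunks overlap the open interval (lo, hi), matching a query of
// the form { a: { $gt: lo, $lt: hi } }. A chunk [min, max) overlaps when it
// starts below hi and ends above lo.
function shardsForRange(lo, hi) {
  const hits = chunks
    .filter((c) => c.min < hi && c.max > lo)
    .map((c) => c.shard);
  return [...new Set(hits)]; // one entry per shard, even if it owns several chunks
}

console.log(shardsForRange(333, 400));  // [ 2 ]
console.log(shardsForRange(333, 2012)); // [ 2, 8 ]
```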
  
{ a : …, b : …, c : … }    non-shard key query, no index

find( { b : 99 } )

Range                Shard
a in [-∞, 2000)      2
a in [2000, 2100)    8
a in [2100, 5500)    3
…                    …
a in [88700, ∞)      0

Queries not involving the shard key will be sent to all shards as a "scatter/gather" operation. This is sometimes OK. Here, on both our traditional machine and the shards, we do a table scan, which is (roughly) equally expensive on both.
  
{ a : …, b : …, c : … }    Scatter / gather with secondary index

ensureIndex({b:1})    (createIndex in current MongoDB versions)
find( { b : 99 } )

Range                Shard
a in [-∞, 2000)      2
a in [2000, 2100)    8
a in [2100, 5500)    3
…                    …
a in [88700, ∞)      0

Once again a query without the shard key results in a scatter/gather operation. However, at each shard we can use the {b:1} index to make the operation efficient for that shard. Compared with the vertical configuration, we pay a little extra overhead for the communication from mongos to each shard. That is not too bad if the number of shards is small (10), but quite substantial for, say, a 1000-shard system.
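
The trade-off can be illustrated with a toy model in which each shard keeps its own index on b and the router contacts every shard. Everything here is an assumption made for illustration: `makeShard`, `findByB`, the sample documents, and the `requests` counter are hypothetical and only mimic the behavior described, not how mongos is implemented.

```javascript
// Hypothetical in-memory shard: holds its own documents and builds its own
// local index on b (a Map from b value to the matching documents).
function makeShard(docs) {
  const indexB = new Map();
  for (const doc of docs) {
    if (!indexB.has(doc.b)) indexB.set(doc.b, []);
    indexB.get(doc.b).push(doc);
  }
  return { docs, indexB };
}

// Scatter/gather: send the query to every shard, let each one answer from its
// local index, then concatenate the partial results. `requests` counts the
// router-to-shard round trips.
function findByB(shards, b) {
  let requests = 0;
  const results = [];
  for (const shard of shards) {
    requests += 1;
    results.push(...(shard.indexB.get(b) || []));
  }
  return { results, requests };
}

// Made-up sample data spread over three shards.
const shards = [
  makeShard([{ a: 100, b: 99 }, { a: 350, b: 7 }]),
  makeShard([{ a: 2050, b: 99 }]),
  makeShard([{ a: 3000, b: 5 }]),
];

const { results, requests } = findByB(shards, 99);
console.log(results.map((d) => d.a)); // [ 100, 2050 ]
console.log(requests);                // 3
```

The per-shard lookup is cheap thanks to the index, but the request count grows with the number of shards even when most of them hold no matching documents.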
  
{ a : …, b : …, c : … }    Shard key plus non-shard key query, secondary index

find( { b : 99, a : 100 } )

Range                Shard
a in [-∞, 2000)      2
a in [2000, 2100)    8
a in [2100, 5500)    3
…                    …
a in [88700, ∞)      0

The a term involves the shard key and allows mongos to intelligently route the query to shard 2. Once the query reaches shard 2, the { b : 1 } index can be used to efficiently process the query.
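
The routing decision itself (target one shard when the shard key is present, otherwise broadcast) might look like the sketch below. `targetShards`, `chunks`, and `allShards` are hypothetical names; only equality on a is handled, and the elided table rows are omitted.

```javascript
// Hypothetical chunk metadata mirroring the table (elided rows omitted).
const chunks = [
  { min: -Infinity, max: 2000, shard: 2 },
  { min: 2000, max: 2100, shard: 8 },
  { min: 2100, max: 5500, shard: 3 },
  { min: 88700, max: Infinity, shard: 0 },
];

// Every shard that owns at least one listed chunk.
const allShards = [...new Set(chunks.map((c) => c.shard))];

// Decide which shards must see the query. Only an equality match on the shard
// key `a` is handled in this sketch; a query without `a` falls back to
// scatter/gather across all shards.
function targetShards(query) {
  if (typeof query.a === "number") {
    const chunk = chunks.find((c) => query.a >= c.min && query.a < c.max);
    return chunk ? [chunk.shard] : [];
  }
  return allShards;
}

console.log(targetShards({ b: 99, a: 100 })); // [ 2 ]
console.log(targetShards({ b: 99 }));         // [ 2, 8, 3, 0 ]
```

The b term plays no part in routing; it only matters once the query reaches the chosen shard, where the local index on b can serve it.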
  
When sorting is specified, the relevant shards sort locally, and then mongos merges the results. Thus the mongos resource usage is not terribly high.

                    client
   Adam  Bob  David  Julie  Sue  Tim  Zack

                    mongos

   Bob              Sue              Adam
   David            Tim              Zack
   Julie
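
The merge step can be sketched as a simple k-way merge over the three locally sorted lists from the diagram. `mergeSorted` is a hypothetical helper written for illustration, not MongoDB's actual implementation.

```javascript
// Each inner array stands for one shard's locally sorted result,
// taken from the diagram.
const shardResults = [
  ["Bob", "David", "Julie"],
  ["Sue", "Tim"],
  ["Adam", "Zack"],
];

// Merge already-sorted lists by repeatedly taking the smallest head element.
// The router never re-sorts; it only compares the current head of each list.
function mergeSorted(lists) {
  const positions = lists.map(() => 0); // next unread index in each list
  const merged = [];
  for (;;) {
    let best = -1;
    for (let i = 0; i < lists.length; i++) {
      if (positions[i] >= lists[i].length) continue; // this list is exhausted
      if (best === -1 || lists[i][positions[i]] < lists[best][positions[best]]) {
        best = i;
      }
    }
    if (best === -1) return merged; // every list is exhausted
    merged.push(lists[best][positions[best]]);
    positions[best] += 1;
  }
}

console.log(mergeSorted(shardResults));
// [ 'Adam', 'Bob', 'David', 'Julie', 'Sue', 'Tim', 'Zack' ]
```

Each output element costs one comparison per shard in this sketch, which is why the merging side stays cheap compared with sorting the full result set.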
  
When using replication (typically a replica set), we simply have more than one node per shard.

(arrows below indicate replication, traditional vs. sharded environments)
  
