SlideShare a Scribd company logo
1 of 52
Riak
Search
the next generation
Tuesday, September 17, 13
tsantero
@basho.com
Tuesday, September 17, 13
Tuesday, September 17, 13
Tuesday, September 17, 13
2.0coming soon..
Tuesday, September 17, 13
the history of
Riak Search
Tuesday, September 17, 13
home grown
full-text search
Tuesday, September 17, 13
lucene
Tuesday, September 17, 13
SCALETuesday, September 17, 13
NODE # = HASH(KEY) % NUM_NODES
NH(Ka) = 0
NH(Kb) = 1
NH(Kc) = 2
NH(Kd) = 0
...
Naive Hashing
Tuesday, September 17, 13
NODE 0 NODE 1 NODE 2
Ka Kb KcKd Ke KfKg Kh Ki
Kj KkKm KlKp Kn KoKq Kr
Naive Hashing
Tuesday, September 17, 13
NODE 0 NODE 1 NODE 2
Ka Kb Kc KdKgKi
NODE 3
Ke Kf KhKj Kk Kl
Km Kn Ko KpKq Kr
Naive Hashing
Tuesday, September 17, 13
K * (NN - 1) / NN => K
• K = # OF KEYS
• NN = # OF NODES
• AS NN GROWS FACTOR ESSENTIALLY BECOMES 1, THUS
ALL KEYS MOVE
Naive Hashing
Tuesday, September 17, 13
PARTITION # = HASH(KEY) % PARTITIONS
• # PARTITIONS REMAINS CONSTANT
• KEY ALWAYS MAPS TO SAME PARTITION
• NODES OWN PARTITIONS
• PARTITIONS CONTAIN KEYS
• EXTRA LEVEL OF INDIRECTION
Consistent Hashing
Tuesday, September 17, 13
P9P6P3P8P5P2P7P4P1
NODE 0 NODE 1 NODE 2
Ka Kb KcKd Ke KfKg Kh Ki
Kj KkKm KlKp Kn KoKq Kr
Consistent Hashing
Tuesday, September 17, 13
P9P6P3P8 P5P2P7P4 P1
NODE 0 NODE 1 NODE 2
KaKb KcKd KeKfKg Kh Ki
KjKkKm KlKp KnKoKq Kr
NODE 3
Consistent Hashing
Tuesday, September 17, 13
NN * K/Q => K/Q
• K = # OF KEYS
• NN = # OF NODES
• Q = # OF PARTITIONS
• AS K GROWS NN BECOMES CONSTANT,THUS K/Q KEYS
MOVE
Consistent Hashing
Tuesday, September 17, 13
uniform distribution
Consistent
Hashing
{logical vs physical
partitioning scheme
even division of keys
Tuesday, September 17, 13
the future of
Riak Search
Tuesday, September 17, 13
Tuesday, September 17, 13
persistence
distributing Solr
querying
indexing
Tuesday, September 17, 13
Each Riak node runs an instance of Solr
Tuesday, September 17, 13
Solr index = riak bucket
document = RObj value
plaintext, JSON, XML
Tuesday, September 17, 13
Distributed Searching in Solr
query
faceting
highlighting
stats
spell check
term vectors
Tuesday, September 17, 13
SolrCloud
Tuesday, September 17, 13
SolrCloud
Tuesday, September 17, 13
Harvest vs Yield
Tuesday, September 17, 13
A better measure of
Availability
Tuesday, September 17, 13
Queries Issues
Queries OfferedYield =
Tuesday, September 17, 13
Harvest =
Data Available
Total Dataset
Tuesday, September 17, 13
Harvest
Yield
Tuesday, September 17, 13
Manage Harvest by storing Index Replicas
Tuesday, September 17, 13
Term vs Document
Partitioning Schemes
Tuesday, September 17, 13
Node 0
Node 1
Node 2
Term Based Partitioning
Tuesday, September 17, 13
Node 0
Node 1
Node 2
Document Based Partitioning
Tuesday, September 17, 13
Replicas
Node 0
Node 1
Node 2
Tuesday, September 17, 13
Quorums
Tuesday, September 17, 13
Concurrency
=>
Siblings
Tuesday, September 17, 13
Read Repair
(Anti-Entropy)
Tuesday, September 17, 13
replica replica replica
Tuesday, September 17, 13
replica replica replica
X
Tuesday, September 17, 13
replica replica replica
replica replica replica
Tuesday, September 17, 13
Active Anti-Entropy
(self healing clusters)
Tuesday, September 17, 13
real-time updates
persistent
non-blocking
disk-based
Tuesday, September 17, 13
Tuesday, September 17, 13
= hashes marked “dirty”
Tuesday, September 17, 13
Tuesday, September 17, 13
Tuesday, September 17, 13
Tuesday, September 17, 13
Tuesday, September 17, 13
= keys to read-repair
Tuesday, September 17, 13
Questions?
make it so!
Tuesday, September 17, 13

More Related Content

More from Caserta

Not Your Father's Database by Databricks
Not Your Father's Database by DatabricksNot Your Father's Database by Databricks
Not Your Father's Database by Databricks
Caserta
 
Mastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkMastering Customer Data on Apache Spark
Mastering Customer Data on Apache Spark
Caserta
 
Moving Past Infrastructure Limitations
Moving Past Infrastructure LimitationsMoving Past Infrastructure Limitations
Moving Past Infrastructure Limitations
Caserta
 

More from Caserta (20)

Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
 
General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017
 
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
 
Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)
 
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
 
The Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's EnterpriseThe Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's Enterprise
 
Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics
 
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
 
You're the New CDO, Now What?
You're the New CDO, Now What?You're the New CDO, Now What?
You're the New CDO, Now What?
 
The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation
 
Making Big Data Easy for Everyone
Making Big Data Easy for EveryoneMaking Big Data Easy for Everyone
Making Big Data Easy for Everyone
 
Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure Cloud
 
Big Data Analytics on the Cloud
Big Data Analytics on the CloudBig Data Analytics on the Cloud
Big Data Analytics on the Cloud
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on Hadoop
 
The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data Lake
 
Not Your Father's Database by Databricks
Not Your Father's Database by DatabricksNot Your Father's Database by Databricks
Not Your Father's Database by Databricks
 
Mastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkMastering Customer Data on Apache Spark
Mastering Customer Data on Apache Spark
 
Moving Past Infrastructure Limitations
Moving Past Infrastructure LimitationsMoving Past Infrastructure Limitations
Moving Past Infrastructure Limitations
 
Balancing Data Governance and Innovation
Balancing Data Governance and InnovationBalancing Data Governance and Innovation
Balancing Data Governance and Innovation
 

Recently uploaded

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Recently uploaded (20)

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 

Riak Search - The Next Generation