SlideShare une entreprise Scribd logo
1  sur  23
Eventual Consistency 
@WalmartLabs with Kafka, 
SolrCloud and Hadoop 
Ayon Sinha 
asinha@walmartlabs.com
Introductions 
• @WalmartLabs – Building Walmart Global eCommerce from the 
2 
ground up 
• Data Foundation Team – Build, manage and provide tools for all OLTP 
operations
Large Scale eCommerce problems 
• Our customers love to shop online 24X7 and we love them for that 
• Reads are many orders of magnitude more than writes, and reads 
3 
have to be blazing fast (every millisecond has a monetary value attached to it, according to 
some studies) 
• Scaling up only takes you so far, you have to scale out 
• Low latency analytics absolutely canNOT be on OLTP data stores 
• No full table scans 
• Too many RDBMS column indexes leading to slow writes
Data Foundation Architecture 
4 
End
Very large scale and always available means.. 
• There is really NO way around Brewer’s CAP theorem 
Source: http://blog.mccrory.me/2010/11/03/cap-theorem- 
and-the-clouds/ 
• Embrace “eventual” consistency and asynchrony 
• Clearly articulate “eventual” to business stakeholders. Computer 
5 
“eventual” and human “eventual” are different scales entirely.
EC Use cases 
6
Typical data flow into EC data stores 
IC Web Service 
Web Service Client 
7 
Client 
Web Service Client 
EC Web Service 
Web Service Client 
Orchestrator Service 
Client 
Resource Tier Resource Tier Resource Tier 
Batch layer (processes data on 
Hadoop and loads into faster serving 
Kafka 
Event driven 
updater 
Kafka Consumer 
for Solr 
datastore) 
Fire job and pull results 
Kafka Consumer 
for Hadoop 
SolrCloud Hadoop 
Web Service Client 
70-80% of 
total load 
read 
write write
Challenges 
• Messaging System: Kafka was already being used and supported by 
8 
our Big Fast Data team 
• Virtualization 
– Shared CPU and memory among compute tenants generally bad 
for Search engine infrastructure. If your use-case takes off, you will 
eventually move to dedicated hardware. 
– We started with big dedicated bare-metal hardware 
– Virtualization requires complete lifecycle management 
• Serialization format 
– Our choice Avro (Schema + Data) 
• Hierarchical Object to Flat 
– If you are familiar with ElasticSearch, you’d say “No 
problem..maybe” 
– If you are already using HBase or Cassandra or similar, you’d say 
“No problem..maybe” 
– For Solr people, lets talk about schema.xml and plugin based 
flattening
SolrCloud 101 
• Solr is the web app wrapper on Lucene 
• SolrCloud is the distributed search where a bunch of Solr nodes 
9 
coordinate using ZooKeeper 
Source: SolrCloud Wiki
Solr schema.xml choices 
• Let each team build their own schema.xml from scratch 
10 
– This would require each customer team to intimately learn search 
engines, Solr etc. 
– This would also mean that each time there is a change in 
schema.xml, everything must be re-indexed. 
• Leverage Solr’s dynamic fields and create a naming convention 
– this gives the customer a kick-start 
– Schema.xml doesn’t need to change often and can be mostly used 
unchanged team to team
Best possible (unrealistic) scenario 
• No writes 
• No scoring, sorting, faceting 
• 100% document cache hit ratio 
• 99.6% of 192GB physical memory usage 
• 2000+ select/sec 
• 0.3 ms/query 
11
We even got.. 
12
Initial Solr Settings 
13
Getting Worse.. 
• Hundreds of ms/query with close to 100% Doc cache hit ratio 
14
Most common causes of slowdowns 
• GC pauses. Cure: trial-and-error with help from experts 
15
More naïve mistakes.. 
• Zookeeper in the same Solr machine 
16 
– We did not experience this, as we knew this going in 
• Frequent commits (in our case was DB-style, 1 doc/update + commit) 
– DON’T commit after every update. Solr commit is very different 
from DBMS commit. It opens up a new searcher and warms it up in 
the background. “Too many on-deck searchers” warning is a telltale 
sign 
– Batch as many docs as your application can tolerate in a single 
update post 
– We chose batching docs for 1 sec 
• IO contention (Log level too high) 
– Easy fix
Zookeeper 
• Prefer odd number of nodes for the ensemble as quorum is N/2 + 1 
• More nodes are not necessarily better 
17 
– 3 nodes is too low as you can handle only 1 failure 
– 5 nodes is good balance between HA and write speed. More nodes 
creates slower writes and slower quorums. 
– We had to go with 9 = 3 nodes in each of 3 protects us from a 
complete outage in one cloud. 
• Pay good attention to Zookeeper availability as SolrCloud will only 
function for a little while after ZK is dead 
• CloudSolrServer (SolrJ client) completely relies on Zookeeper for 
talking to SolrCloud
How do you do Disaster Recovery? 
• SolrCloud is CP model (CAP theorem) 
• You should not add replica from another data center. Every write will 
18 
get excruciatingly slow 
• Use Kafka or other messaging system to send data cross-DC 
• Get used to cross-DC eventual consistency. Monitor for tolerance 
thresholds
Metrics Monitoring 
• We poll metrics from Mbeans and push to Graphite servers 
19
Real Query Performance 
20
Real Update Performance 
21
Real Customer Results 
22
23 
Q&A

Contenu connexe

Tendances

Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2DataStax Academy
 
Putting Kafka Into Overdrive
Putting Kafka Into OverdrivePutting Kafka Into Overdrive
Putting Kafka Into OverdriveTodd Palino
 
Papers we love realtime at facebook
Papers we love   realtime at facebookPapers we love   realtime at facebook
Papers we love realtime at facebookGwen (Chen) Shapira
 
From 100s to 100s of Millions
From 100s to 100s of MillionsFrom 100s to 100s of Millions
From 100s to 100s of MillionsErik Onnen
 
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...Lightbend
 
Make 2016 your year of SMACK talk
Make 2016 your year of SMACK talkMake 2016 your year of SMACK talk
Make 2016 your year of SMACK talkDataStax Academy
 
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Amazon Web Services
 
Deploying Kafka at Dropbox, Mark Smith, Sean Fellows
Deploying Kafka at Dropbox, Mark Smith, Sean FellowsDeploying Kafka at Dropbox, Mark Smith, Sean Fellows
Deploying Kafka at Dropbox, Mark Smith, Sean Fellowsconfluent
 
Disaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache KafkaDisaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache Kafkaconfluent
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First ClusterDataStax Academy
 
Kafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka MeetupKafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka MeetupGwen (Chen) Shapira
 
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache KafkaStrata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafkaconfluent
 
Introduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperIntroduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperRahul Jain
 
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Helena Edelson
 
Event Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaEvent Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaDataWorks Summit
 

Tendances (20)

Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
 
SolrCloud on Hadoop
SolrCloud on HadoopSolrCloud on Hadoop
SolrCloud on Hadoop
 
Putting Kafka Into Overdrive
Putting Kafka Into OverdrivePutting Kafka Into Overdrive
Putting Kafka Into Overdrive
 
Papers we love realtime at facebook
Papers we love   realtime at facebookPapers we love   realtime at facebook
Papers we love realtime at facebook
 
From 100s to 100s of Millions
From 100s to 100s of MillionsFrom 100s to 100s of Millions
From 100s to 100s of Millions
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
 
Make 2016 your year of SMACK talk
Make 2016 your year of SMACK talkMake 2016 your year of SMACK talk
Make 2016 your year of SMACK talk
 
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
 
Deploying Kafka at Dropbox, Mark Smith, Sean Fellows
Deploying Kafka at Dropbox, Mark Smith, Sean FellowsDeploying Kafka at Dropbox, Mark Smith, Sean Fellows
Deploying Kafka at Dropbox, Mark Smith, Sean Fellows
 
kafka for db as postgres
kafka for db as postgreskafka for db as postgres
kafka for db as postgres
 
Disaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache KafkaDisaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache Kafka
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
How to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOSHow to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOS
 
Kafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka MeetupKafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka Meetup
 
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache KafkaStrata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
 
Kafka at scale facebook israel
Kafka at scale   facebook israelKafka at scale   facebook israel
Kafka at scale facebook israel
 
Introduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperIntroduction to Kafka and Zookeeper
Introduction to Kafka and Zookeeper
 
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
 
Event Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaEvent Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache Kafka
 

Similaire à Eventual Consistency @WalmartLabs with Kafka, Avro, SolrCloud and Hadoop

Writing Scalable Software in Java
Writing Scalable Software in JavaWriting Scalable Software in Java
Writing Scalable Software in JavaRuben Badaró
 
Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesDavid Martínez Rego
 
Building Asynchronous Applications
Building Asynchronous ApplicationsBuilding Asynchronous Applications
Building Asynchronous ApplicationsJohan Edstrom
 
Architecting for the cloud elasticity security
Architecting for the cloud elasticity securityArchitecting for the cloud elasticity security
Architecting for the cloud elasticity securityLen Bass
 
Hardware Provisioning
Hardware ProvisioningHardware Provisioning
Hardware ProvisioningMongoDB
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInLinkedIn
 
Strata London 2019 Scaling Impala
Strata London 2019 Scaling ImpalaStrata London 2019 Scaling Impala
Strata London 2019 Scaling ImpalaManish Maheshwari
 
Strata London 2019 Scaling Impala.pptx
Strata London 2019 Scaling Impala.pptxStrata London 2019 Scaling Impala.pptx
Strata London 2019 Scaling Impala.pptxManish Maheshwari
 
What's inside the black box? Using ML to tune and manage Kafka. (Matthew Stum...
What's inside the black box? Using ML to tune and manage Kafka. (Matthew Stum...What's inside the black box? Using ML to tune and manage Kafka. (Matthew Stum...
What's inside the black box? Using ML to tune and manage Kafka. (Matthew Stum...confluent
 
How Optimizely (Safely) Maximizes Database Concurrency.pdf
How Optimizely (Safely) Maximizes Database Concurrency.pdfHow Optimizely (Safely) Maximizes Database Concurrency.pdf
How Optimizely (Safely) Maximizes Database Concurrency.pdfScyllaDB
 
Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsJonas Bonér
 
M6d cassandrapresentation
M6d cassandrapresentationM6d cassandrapresentation
M6d cassandrapresentationEdward Capriolo
 
HBASE by Nicolas Liochon - Meetup HUGFR du 22 Sept 2014
HBASE by  Nicolas Liochon - Meetup HUGFR du 22 Sept 2014HBASE by  Nicolas Liochon - Meetup HUGFR du 22 Sept 2014
HBASE by Nicolas Liochon - Meetup HUGFR du 22 Sept 2014Modern Data Stack France
 
Maria DB Galera Cluster for High Availability
Maria DB Galera Cluster for High AvailabilityMaria DB Galera Cluster for High Availability
Maria DB Galera Cluster for High AvailabilityOSSCube
 
MariaDB Galera Cluster
MariaDB Galera ClusterMariaDB Galera Cluster
MariaDB Galera ClusterAbdul Manaf
 
Mtc learnings from isv & enterprise (dated - Dec -2014)
Mtc learnings from isv & enterprise (dated - Dec -2014)Mtc learnings from isv & enterprise (dated - Dec -2014)
Mtc learnings from isv & enterprise (dated - Dec -2014)Govind Kanshi
 
Mtc learnings from isv & enterprise interaction
Mtc learnings from isv & enterprise  interactionMtc learnings from isv & enterprise  interaction
Mtc learnings from isv & enterprise interactionGovind Kanshi
 
High scale flavour
High scale flavourHigh scale flavour
High scale flavourTomas Doran
 
Best Practice for Achieving High Availability in MariaDB
Best Practice for Achieving High Availability in MariaDBBest Practice for Achieving High Availability in MariaDB
Best Practice for Achieving High Availability in MariaDBMariaDB plc
 
Building High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in KafkaBuilding High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in Kafkaconfluent
 

Similaire à Eventual Consistency @WalmartLabs with Kafka, Avro, SolrCloud and Hadoop (20)

Writing Scalable Software in Java
Writing Scalable Software in JavaWriting Scalable Software in Java
Writing Scalable Software in Java
 
Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming Architectures
 
Building Asynchronous Applications
Building Asynchronous ApplicationsBuilding Asynchronous Applications
Building Asynchronous Applications
 
Architecting for the cloud elasticity security
Architecting for the cloud elasticity securityArchitecting for the cloud elasticity security
Architecting for the cloud elasticity security
 
Hardware Provisioning
Hardware ProvisioningHardware Provisioning
Hardware Provisioning
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
 
Strata London 2019 Scaling Impala
Strata London 2019 Scaling ImpalaStrata London 2019 Scaling Impala
Strata London 2019 Scaling Impala
 
Strata London 2019 Scaling Impala.pptx
Strata London 2019 Scaling Impala.pptxStrata London 2019 Scaling Impala.pptx
Strata London 2019 Scaling Impala.pptx
 
What's inside the black box? Using ML to tune and manage Kafka. (Matthew Stum...
What's inside the black box? Using ML to tune and manage Kafka. (Matthew Stum...What's inside the black box? Using ML to tune and manage Kafka. (Matthew Stum...
What's inside the black box? Using ML to tune and manage Kafka. (Matthew Stum...
 
How Optimizely (Safely) Maximizes Database Concurrency.pdf
How Optimizely (Safely) Maximizes Database Concurrency.pdfHow Optimizely (Safely) Maximizes Database Concurrency.pdf
How Optimizely (Safely) Maximizes Database Concurrency.pdf
 
Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability Patterns
 
M6d cassandrapresentation
M6d cassandrapresentationM6d cassandrapresentation
M6d cassandrapresentation
 
HBASE by Nicolas Liochon - Meetup HUGFR du 22 Sept 2014
HBASE by  Nicolas Liochon - Meetup HUGFR du 22 Sept 2014HBASE by  Nicolas Liochon - Meetup HUGFR du 22 Sept 2014
HBASE by Nicolas Liochon - Meetup HUGFR du 22 Sept 2014
 
Maria DB Galera Cluster for High Availability
Maria DB Galera Cluster for High AvailabilityMaria DB Galera Cluster for High Availability
Maria DB Galera Cluster for High Availability
 
MariaDB Galera Cluster
MariaDB Galera ClusterMariaDB Galera Cluster
MariaDB Galera Cluster
 
Mtc learnings from isv & enterprise (dated - Dec -2014)
Mtc learnings from isv & enterprise (dated - Dec -2014)Mtc learnings from isv & enterprise (dated - Dec -2014)
Mtc learnings from isv & enterprise (dated - Dec -2014)
 
Mtc learnings from isv & enterprise interaction
Mtc learnings from isv & enterprise  interactionMtc learnings from isv & enterprise  interaction
Mtc learnings from isv & enterprise interaction
 
High scale flavour
High scale flavourHigh scale flavour
High scale flavour
 
Best Practice for Achieving High Availability in MariaDB
Best Practice for Achieving High Availability in MariaDBBest Practice for Achieving High Availability in MariaDB
Best Practice for Achieving High Availability in MariaDB
 
Building High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in KafkaBuilding High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in Kafka
 

Dernier

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfOverkill Security
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 

Dernier (20)

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 

Eventual Consistency @WalmartLabs with Kafka, Avro, SolrCloud and Hadoop

  • 1. Eventual Consistency @WalmartLabs with Kafka, SolrCloud and Hadoop Ayon Sinha asinha@walmartlabs.com
  • 2. Introductions • @WalmartLabs – Building Walmart Global eCommerce from the 2 ground up • Data Foundation Team – Build, manage and provide tools for all OLTP operations
  • 3. Large Scale eCommerce problems • Our customers love to shop online 24X7 and we love them for that • Reads are many orders of magnitude more than writes, and reads 3 have to be blazing fast (every millisecond has a monetary value attached to it, according to some studies) • Scaling up only takes you so far, you have to scale out • Low latency analytics absolutely canNOT be on OLTP data stores • No full table scans • Too many RDBMS column indexes leading to slow writes
  • 5. Very large scale and always available means.. • There is really NO way around Brewer’s CAP theorem Source: http://blog.mccrory.me/2010/11/03/cap-theorem- and-the-clouds/ • Embrace “eventual” consistency and asynchrony • Clearly articulate “eventual” to business stakeholders. Computer 5 “eventual” and human “eventual” are different scales entirely.
  • 7. Typical data flow into EC data stores IC Web Service Web Service Client 7 Client Web Service Client EC Web Service Web Service Client Orchestrator Service Client Resource Tier Resource Tier Resource Tier Batch layer (processes data on Hadoop and loads into faster serving Kafka Event driven updater Kafka Consumer for Solr datastore) Fire job and pull results Kafka Consumer for Hadoop SolrCloud Hadoop Web Service Client 70-80% of total load read write write
  • 8. Challenges • Messaging System: Kafka was already being used and supported by 8 our Big Fast Data team • Virtualization – Shared CPU and memory among compute tenants generally bad for Search engine infrastructure. If your use-case takes off, you will eventually move to dedicated hardware. – We started with big dedicated bare-metal hardware – Virtualization requires complete lifecycle management • Serialization format – Our choice Avro (Schema + Data) • Hierarchical Object to Flat – If you are familiar with ElasticSearch, you’d say “No problem..maybe” – If you are already using HBase or Cassandra or similar, you’d say “No problem..maybe” – For Solr people, lets talk about schema.xml and plugin based flattening
  • 9. SolrCloud 101 • Solr is the web app wrapper on Lucene • SolrCloud is the distributed search where a bunch of Solr nodes 9 coordinate using ZooKeeper Source: SolrCloud Wiki
  • 10. Solr schema.xml choices • Let each team build their own schema.xml from scratch 10 – This would require each customer team to intimately learn search engines, Solr etc. – This would also mean that each time there is a change in schema.xml, everything must be re-indexed. • Leverage Solr’s dynamic fields and create a naming convention – this gives the customer a kick-start – Schema.xml doesn’t need to change often and can be mostly used unchanged team to team
  • 11. Best possible (unrealistic) scenario • No writes • No scoring, sorting, faceting • 100% document cache hit ratio • 99.6% of 192GB physical memory usage • 2000+ select/sec • 0.3 ms/query 11
  • 14. Getting Worse.. • Hundreds of ms/query with close to 100% Doc cache hit ratio 14
  • 15. Most common causes of slowdowns • GC pauses. Cure: trial-and-error with help from experts 15
  • 16. More naïve mistakes.. • Zookeeper in the same Solr machine 16 – We did not experience this, as we knew this going in • Frequent commits (in our case was DB-style, 1 doc/update + commit) – DON’T commit after every update. Solr commit is very different from DBMS commit. It opens up a new searcher and warms it up in the background. “Too many on-deck searchers” warning is a telltale sign – Batch as many docs as your application can tolerate in a single update post – We chose batching docs for 1 sec • IO contention (Log level too high) – Easy fix
  • 17. Zookeeper • Prefer odd number of nodes for the ensemble as quorum is N/2 + 1 • More nodes are not necessarily better 17 – 3 nodes is too low as you can handle only 1 failure – 5 nodes is good balance between HA and write speed. More nodes creates slower writes and slower quorums. – We had to go with 9 = 3 nodes in each of 3 protects us from a complete outage in one cloud. • Pay good attention to Zookeeper availability as SolrCloud will only function for a little while after ZK is dead • CloudSolrServer (SolrJ client) completely relies on Zookeeper for talking to SolrCloud
  • 18. How do you do Disaster Recovery? • SolrCloud is CP model (CAP theorem) • You should not add replica from another data center. Every write will 18 get excruciatingly slow • Use Kafka or other messaging system to send data cross-DC • Get used to cross-DC eventual consistency. Monitor for tolerance thresholds
  • 19. Metrics Monitoring • We poll metrics from Mbeans and push to Graphite servers 19