SlideShare une entreprise Scribd logo
1  sur  32
Télécharger pour lire hors ligne
Cloud Database Engineering
Making Non-Distributed Databases, Distributed
- Shailesh Birari
- Ioannis Papapanagiotou, PhD
Dynomite Ecosystem
● Dynomite
● Dynomite-manager
● Dyno client
Cloud Database Engg (CDE) Team
● Develop and operate data stores in AWS
- Cassandra, Dynomite, Elastic Search,
RDS, S3
● Ensure availability, scalability, durability and
latency SLAs
● Database expertise, client libraries, tools and
best practices
● Cassandra not a speed demon (reads)
● Needed a data store:
o Scalable & highly available
o High throughput, low latency
o Active-active multi datacenter replication
● Usage of Redis increasing:
o Netflix use case is active-active, highly available
o Does not have bi-directional replication
o Cannot withstand a Monkey attack
Problems & Observations
What is Dynomite?
● A framework that makes non-distributed data
stores, distributed.
o Can be used with many key-value storage engines
like Redis, Memcached, LMDB, etc.
o Focus: performance, cross-datacenter active-active
replication and high availability
o Features: node warmup (cold bootstrapping),
tunable consistency, S3 backups/restores
Dynomite @ Netflix
● Running around 1.5 years in PROD
● ~1000 customer facing nodes
● 1M OPS at peak
● Largest cluster: 6TB
● Quarterly upgrades in PROD
Dynomite Overview
● Layer on top of a non-distributed key value
data store
○ Peer-peer, Shared Nothing
○ Auto Sharding
○ Multi-datacenter
○ Linear scale
○ Replication(Encrypted)
○ Gossiping
● Each rack contains one
copy of data, partitioned
across multiple nodes in
that rack
● Multiple Racks == Higher
Availability (HA)
Topology
Replication
● A client can connect to any node on
the Dynomite cluster when sending
requests.
o If node owns the data,
▪ data are written in local
data-store and
asynchronously replicated.
o If node does not own the data
▪ node acts as a coordinator
and sends the data in the
same rack & replicates to
other nodes in other racks
and DC.
The Dynomite Ecosystem
RESP = Redis
Serialization Protocol
Consistency
● DC_ONE
o Reads and writes are propagated synchronously only to the
node in local rack and asynchronously replicated to other racks
and data centers
● DC_QUORUM
o Reads and writes are propagated synchronously to quorum
number of nodes in the local region and asynchronously to the
rest
● Consistency can be configured dynamically for read or write
operations separately (cluster-wide)
Performance Setup
● Instance Type:
○ Dynomite: r3.2xlarge (1Gbps)
○ Pappy/Dyno: m2.2xls (typical of an app@Netflix)
● Replication factor: 3
○ Deployed Dynomite in 3 zones in us-east-1
○ Every zone had the same number of servers
● Demo app used simple workloads key/value pairs
○ Redis: GET and SET
● Payload
○ Size: 1024 Bytes
○ 80%/20% reads over writes
Performance (Dynomite Speed)
● Throughput scales linearly with number of nodes.
● Dynomite can reach >1Million Client requests with ~24 nodes.
Performance (Latency - average/P50)
● Dynomite’s latency on average is 0.16ms.
● Client side latency is 0.6ms and does not increase as the cluster scales
up/down
Performance (Latency - P99)
● The major contributor to latency at P99 is the network.
● Dynomite affects <10%
Dynomite-manager
● Token management for multi-region deployments
● Support AWS environment
● Automated security group update in multi-region environment
● Monitoring of Dynomite and the underlying storage engine
● Node cold bootstrap (warm up)
● S3 backups and restores
● REST API
Dynomite-manager: warm up
1. Dynomite-manager identifies which node has the same token in the
same DC
2. Sets Redis to “Slave” mode of that node
3. Checks for peer syncing
a. difference between master and slave offset
4. Once master and slave are in sync, Dynomite is set to allow write only
5. Dynomite is set back to normal state
6. Checks for health of the node - Done!
Warm up (node terminated)
Warm up (auto-scale)
Warm up (node with same token)
Warm up (Redis replication)
Warm up (Streaming data)
Warm up (Nodes in sync)
Dynomite: S3 backups/restores
● Why?
o Disaster recovery
o Data corruption
● How?
o Redis dumps data on the instance drive
o Dynomite-manager sends data to S3 buckets
● Data per node are not large so no need for incrementals.
● Use case:
o clusters that use Dynomite as a storage layer
o Not enabled in clusters that have short TTL or use Dynomite as a
cache
Dynomite S3 backups (operation)
1. Perform backup
a. Dynomite-manager performs it on a pre-defined interval
b. Dynomite-manger REST call:
i. curl http://localhost:8080/REST/v1/admin/s3backup
2. Perform a Redis BGREWRITEAOF or BGSAVE.
a. Check the size of the persisted file. If the size is zero, which means that there
was an issue with Redis or no data are there, then we do not perform S3
backups
3. S3 backup key: backup/region/clustername-ASG/token/date
Dynomite S3 restores
1. Perform restore:
a. Dynomite-manager performs once it starts if configuration is enabled
b. Dynomite-manger REST call:
i. curl http://localhost:8080/REST/v1/admin/s3backup
2. Stop Dynomite process:
a. We perform this to notify Discovery that Dynomite is not accessible
b. Stop Redis process
3. Restore the data from a specific date
a. provided in the configuration
4. Start Redis process and check if the data has been loaded.
5. Start Dynomite and check if process is up
Dyno Client - Java API
● Connection Pooling
● Load Balancing
● Effective failover
● Pipelining
● Scatter/Gather
● Metrics, e.g. Netflix Insights
Dyno Load Balancing
● Dyno client employs token
aware load balancing.
● Dyno client is aware of the
cluster topology of Dynomite
within the region,
can write to specific node
using consistent
hashing.
Dyno Failover
● Dyno will route
requests to
different racks in
failure scenarios.
Roadmap
● Multi-threaded support for Dynomite
● Data reconciliation & repair v2
● Dynomite-spark connector
● Investigation for persistent stores
● Async Dyno Client
● Others….
More information
● Netflix OSS:
o https://github.com/Netflix/dynomite
o https://github.com/Netflix/dyno
● Contact:
o email: {sbirari, ipapapanagiotou}@netflix.com
Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix

Contenu connexe

Tendances

A Step By Step Guide To Put DB2 On Amazon Cloud
A Step By Step Guide To Put DB2 On Amazon CloudA Step By Step Guide To Put DB2 On Amazon Cloud
A Step By Step Guide To Put DB2 On Amazon Cloud
Deepak Rao
 

Tendances (20)

Akka: Simpler Scalability, Fault-Tolerance, Concurrency & Remoting through Ac...
Akka: Simpler Scalability, Fault-Tolerance, Concurrency & Remoting through Ac...Akka: Simpler Scalability, Fault-Tolerance, Concurrency & Remoting through Ac...
Akka: Simpler Scalability, Fault-Tolerance, Concurrency & Remoting through Ac...
 
Introduction to Amazon Aurora
Introduction to Amazon AuroraIntroduction to Amazon Aurora
Introduction to Amazon Aurora
 
Delivering Data Democratization in the Cloud with Snowflake
Delivering Data Democratization in the Cloud with SnowflakeDelivering Data Democratization in the Cloud with Snowflake
Delivering Data Democratization in the Cloud with Snowflake
 
9. Document Oriented Databases
9. Document Oriented Databases9. Document Oriented Databases
9. Document Oriented Databases
 
Data Modeling Fundamentals
Data Modeling FundamentalsData Modeling Fundamentals
Data Modeling Fundamentals
 
Bloomfilter
BloomfilterBloomfilter
Bloomfilter
 
Comparison of MQTT and DDS as M2M Protocols for the Internet of Things
Comparison of MQTT and DDS as M2M Protocols for the Internet of ThingsComparison of MQTT and DDS as M2M Protocols for the Internet of Things
Comparison of MQTT and DDS as M2M Protocols for the Internet of Things
 
Démonstration : Comment la plateforme Denodo permet d'accélérer l'analyse de ...
Démonstration : Comment la plateforme Denodo permet d'accélérer l'analyse de ...Démonstration : Comment la plateforme Denodo permet d'accélérer l'analyse de ...
Démonstration : Comment la plateforme Denodo permet d'accélérer l'analyse de ...
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
Migrating Oracle to PostgreSQL
Migrating Oracle to PostgreSQLMigrating Oracle to PostgreSQL
Migrating Oracle to PostgreSQL
 
System Design Interviews.pdf
System Design Interviews.pdfSystem Design Interviews.pdf
System Design Interviews.pdf
 
Gobblin' Big Data With Ease @ QConSF 2014
Gobblin' Big Data With Ease @ QConSF 2014Gobblin' Big Data With Ease @ QConSF 2014
Gobblin' Big Data With Ease @ QConSF 2014
 
Migration from Oracle to PostgreSQL: NEED vs REALITY
Migration from Oracle to PostgreSQL: NEED vs REALITYMigration from Oracle to PostgreSQL: NEED vs REALITY
Migration from Oracle to PostgreSQL: NEED vs REALITY
 
Redis vs. MongoDB: Comparing In-Memory Databases with Percona Memory Engine
Redis vs. MongoDB: Comparing In-Memory Databases with Percona Memory EngineRedis vs. MongoDB: Comparing In-Memory Databases with Percona Memory Engine
Redis vs. MongoDB: Comparing In-Memory Databases with Percona Memory Engine
 
Process and Thread
Process and ThreadProcess and Thread
Process and Thread
 
SeaweedFS introduction
SeaweedFS introductionSeaweedFS introduction
SeaweedFS introduction
 
GIT: Content-addressable filesystem and Version Control System
GIT: Content-addressable filesystem and Version Control SystemGIT: Content-addressable filesystem and Version Control System
GIT: Content-addressable filesystem and Version Control System
 
Case Study: Stream Processing on AWS using Kappa Architecture
Case Study: Stream Processing on AWS using Kappa ArchitectureCase Study: Stream Processing on AWS using Kappa Architecture
Case Study: Stream Processing on AWS using Kappa Architecture
 
From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...
From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...
From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...
 
A Step By Step Guide To Put DB2 On Amazon Cloud
A Step By Step Guide To Put DB2 On Amazon CloudA Step By Step Guide To Put DB2 On Amazon Cloud
A Step By Step Guide To Put DB2 On Amazon Cloud
 

En vedette

Production high-performance networking with Snabb and LuaJIT (Linux.conf.au 2...
Production high-performance networking with Snabb and LuaJIT (Linux.conf.au 2...Production high-performance networking with Snabb and LuaJIT (Linux.conf.au 2...
Production high-performance networking with Snabb and LuaJIT (Linux.conf.au 2...
Igalia
 

En vedette (20)

What's new with enterprise Redis - Leena Joshi, Redis Labs
What's new with enterprise Redis - Leena Joshi, Redis LabsWhat's new with enterprise Redis - Leena Joshi, Redis Labs
What's new with enterprise Redis - Leena Joshi, Redis Labs
 
Dynomite @ Redis Conference 2016
Dynomite @ Redis Conference 2016Dynomite @ Redis Conference 2016
Dynomite @ Redis Conference 2016
 
Troubleshooting Redis- DaeMyung Kang, Kakao
Troubleshooting Redis- DaeMyung Kang, KakaoTroubleshooting Redis- DaeMyung Kang, Kakao
Troubleshooting Redis- DaeMyung Kang, Kakao
 
Leveraging Probabilistic Data Structures for Real Time Analytics with Redis M...
Leveraging Probabilistic Data Structures for Real Time Analytics with Redis M...Leveraging Probabilistic Data Structures for Real Time Analytics with Redis M...
Leveraging Probabilistic Data Structures for Real Time Analytics with Redis M...
 
Managing 50K+ Redis Databases Over 4 Public Clouds ... with a Tiny Devops Team
Managing 50K+ Redis Databases Over 4 Public Clouds ... with a Tiny Devops TeamManaging 50K+ Redis Databases Over 4 Public Clouds ... with a Tiny Devops Team
Managing 50K+ Redis Databases Over 4 Public Clouds ... with a Tiny Devops Team
 
Highly scalable caching service on cloud - Redis
Highly scalable caching service on cloud - RedisHighly scalable caching service on cloud - Redis
Highly scalable caching service on cloud - Redis
 
Redis cluster
Redis clusterRedis cluster
Redis cluster
 
Salvatore Sanfilippo – How Redis Cluster works, and why - NoSQL matters Barce...
Salvatore Sanfilippo – How Redis Cluster works, and why - NoSQL matters Barce...Salvatore Sanfilippo – How Redis Cluster works, and why - NoSQL matters Barce...
Salvatore Sanfilippo – How Redis Cluster works, and why - NoSQL matters Barce...
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
 
WAN Replication: Hazelcast Enterprise Lightning Talk
WAN Replication: Hazelcast Enterprise Lightning TalkWAN Replication: Hazelcast Enterprise Lightning Talk
WAN Replication: Hazelcast Enterprise Lightning Talk
 
Production high-performance networking with Snabb and LuaJIT (Linux.conf.au 2...
Production high-performance networking with Snabb and LuaJIT (Linux.conf.au 2...Production high-performance networking with Snabb and LuaJIT (Linux.conf.au 2...
Production high-performance networking with Snabb and LuaJIT (Linux.conf.au 2...
 
Redis Functions, Data Structures for Web Scale Apps
Redis Functions, Data Structures for Web Scale AppsRedis Functions, Data Structures for Web Scale Apps
Redis Functions, Data Structures for Web Scale Apps
 
Getting Started with Redis
Getting Started with RedisGetting Started with Redis
Getting Started with Redis
 
This is redis - feature and usecase
This is redis - feature and usecaseThis is redis - feature and usecase
This is redis - feature and usecase
 
Netflix OSS Meetup Season 4 Episode 4
Netflix OSS Meetup Season 4 Episode 4Netflix OSS Meetup Season 4 Episode 4
Netflix OSS Meetup Season 4 Episode 4
 
Honest performance testing with NDBench
Honest performance testing with NDBenchHonest performance testing with NDBench
Honest performance testing with NDBench
 
Get more than a cache back! The Microsoft Azure Redis Cache (NDC Oslo)
Get more than a cache back! The Microsoft Azure Redis Cache (NDC Oslo)Get more than a cache back! The Microsoft Azure Redis Cache (NDC Oslo)
Get more than a cache back! The Microsoft Azure Redis Cache (NDC Oslo)
 
Redis Developers Day 2015 - Secondary Indexes and State of Lua
Redis Developers Day 2015 - Secondary Indexes and State of LuaRedis Developers Day 2015 - Secondary Indexes and State of Lua
Redis Developers Day 2015 - Secondary Indexes and State of Lua
 
Getting Started with Redis
Getting Started with RedisGetting Started with Redis
Getting Started with Redis
 
Use Redis in Odd and Unusual Ways
Use Redis in Odd and Unusual WaysUse Redis in Odd and Unusual Ways
Use Redis in Odd and Unusual Ways
 

Similaire à Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix

AWS Big Data Demystified #1.2 | Big Data architecture lessons learned
AWS Big Data Demystified #1.2 | Big Data architecture lessons learned AWS Big Data Demystified #1.2 | Big Data architecture lessons learned
AWS Big Data Demystified #1.2 | Big Data architecture lessons learned
Omid Vahdaty
 

Similaire à Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix (20)

Dynomite - PerconaLive 2017
Dynomite  - PerconaLive 2017Dynomite  - PerconaLive 2017
Dynomite - PerconaLive 2017
 
NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1
 
Challenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan LambrightChallenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan Lambright
 
Dynomite @ RedisConf 2017
Dynomite @ RedisConf 2017Dynomite @ RedisConf 2017
Dynomite @ RedisConf 2017
 
RedisConf17 - Dynomite - Making Non-distributed Databases Distributed
RedisConf17 - Dynomite - Making Non-distributed Databases DistributedRedisConf17 - Dynomite - Making Non-distributed Databases Distributed
RedisConf17 - Dynomite - Making Non-distributed Databases Distributed
 
Data has a better idea the in-memory data grid
Data has a better idea   the in-memory data gridData has a better idea   the in-memory data grid
Data has a better idea the in-memory data grid
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned
 
Big data nyu
Big data nyuBig data nyu
Big data nyu
 
Backing up Wikipedia Databases
Backing up Wikipedia DatabasesBacking up Wikipedia Databases
Backing up Wikipedia Databases
 
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
 
Cloud arch patterns
Cloud arch patternsCloud arch patterns
Cloud arch patterns
 
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB AtlasMongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
 
AWS Database Services
AWS Database ServicesAWS Database Services
AWS Database Services
 
Presto Summit 2018 - 04 - Netflix Containers
Presto Summit 2018 - 04 - Netflix ContainersPresto Summit 2018 - 04 - Netflix Containers
Presto Summit 2018 - 04 - Netflix Containers
 
Security sizing meetup
Security sizing meetupSecurity sizing meetup
Security sizing meetup
 
Effectively deploying hadoop to the cloud
Effectively  deploying hadoop to the cloudEffectively  deploying hadoop to the cloud
Effectively deploying hadoop to the cloud
 
AWS Big Data Demystified #1.2 | Big Data architecture lessons learned
AWS Big Data Demystified #1.2 | Big Data architecture lessons learned AWS Big Data Demystified #1.2 | Big Data architecture lessons learned
AWS Big Data Demystified #1.2 | Big Data architecture lessons learned
 
EQUNIX - PPT 11DB-Postgres™.pdf
EQUNIX - PPT 11DB-Postgres™.pdfEQUNIX - PPT 11DB-Postgres™.pdf
EQUNIX - PPT 11DB-Postgres™.pdf
 
Megastore by Google
Megastore by GoogleMegastore by Google
Megastore by Google
 
Druid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutes
Druid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutesDruid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutes
Druid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutes
 

Plus de Redis Labs

SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020
SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020
SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020
Redis Labs
 
Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...
Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...
Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...
Redis Labs
 
RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020
RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020
RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020
Redis Labs
 
RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020
RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020
RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020
Redis Labs
 

Plus de Redis Labs (20)

Redis Day Bangalore 2020 - Session state caching with redis
Redis Day Bangalore 2020 - Session state caching with redisRedis Day Bangalore 2020 - Session state caching with redis
Redis Day Bangalore 2020 - Session state caching with redis
 
Protecting Your API with Redis by Jane Paek - Redis Day Seattle 2020
Protecting Your API with Redis by Jane Paek - Redis Day Seattle 2020Protecting Your API with Redis by Jane Paek - Redis Day Seattle 2020
Protecting Your API with Redis by Jane Paek - Redis Day Seattle 2020
 
The Happy Marriage of Redis and Protobuf by Scott Haines of Twilio - Redis Da...
The Happy Marriage of Redis and Protobuf by Scott Haines of Twilio - Redis Da...The Happy Marriage of Redis and Protobuf by Scott Haines of Twilio - Redis Da...
The Happy Marriage of Redis and Protobuf by Scott Haines of Twilio - Redis Da...
 
SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020
SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020
SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020
 
Rust and Redis - Solving Problems for Kubernetes by Ravi Jagannathan of VMwar...
Rust and Redis - Solving Problems for Kubernetes by Ravi Jagannathan of VMwar...Rust and Redis - Solving Problems for Kubernetes by Ravi Jagannathan of VMwar...
Rust and Redis - Solving Problems for Kubernetes by Ravi Jagannathan of VMwar...
 
Redis for Data Science and Engineering by Dmitry Polyakovsky of Oracle
Redis for Data Science and Engineering by Dmitry Polyakovsky of OracleRedis for Data Science and Engineering by Dmitry Polyakovsky of Oracle
Redis for Data Science and Engineering by Dmitry Polyakovsky of Oracle
 
Practical Use Cases for ACLs in Redis 6 by Jamie Scott - Redis Day Seattle 2020
Practical Use Cases for ACLs in Redis 6 by Jamie Scott - Redis Day Seattle 2020Practical Use Cases for ACLs in Redis 6 by Jamie Scott - Redis Day Seattle 2020
Practical Use Cases for ACLs in Redis 6 by Jamie Scott - Redis Day Seattle 2020
 
Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020
Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020
Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020
 
Leveraging Redis for System Monitoring by Adam McCormick of SBG - Redis Day S...
Leveraging Redis for System Monitoring by Adam McCormick of SBG - Redis Day S...Leveraging Redis for System Monitoring by Adam McCormick of SBG - Redis Day S...
Leveraging Redis for System Monitoring by Adam McCormick of SBG - Redis Day S...
 
JSON in Redis - When to use RedisJSON by Jay Won of Coupang - Redis Day Seatt...
JSON in Redis - When to use RedisJSON by Jay Won of Coupang - Redis Day Seatt...JSON in Redis - When to use RedisJSON by Jay Won of Coupang - Redis Day Seatt...
JSON in Redis - When to use RedisJSON by Jay Won of Coupang - Redis Day Seatt...
 
Highly Available Persistent Session Management Service by Mohamed Elmergawi o...
Highly Available Persistent Session Management Service by Mohamed Elmergawi o...Highly Available Persistent Session Management Service by Mohamed Elmergawi o...
Highly Available Persistent Session Management Service by Mohamed Elmergawi o...
 
Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...
Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...
Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...
 
Building a Multi-dimensional Analytics Engine with RedisGraph by Matthew Goos...
Building a Multi-dimensional Analytics Engine with RedisGraph by Matthew Goos...Building a Multi-dimensional Analytics Engine with RedisGraph by Matthew Goos...
Building a Multi-dimensional Analytics Engine with RedisGraph by Matthew Goos...
 
RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020
RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020
RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020
 
RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020
RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020
RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020
 
RedisTimeSeries 1.2 by Pieter Cailliau - Redis Day Bangalore 2020
RedisTimeSeries 1.2 by Pieter Cailliau - Redis Day Bangalore 2020RedisTimeSeries 1.2 by Pieter Cailliau - Redis Day Bangalore 2020
RedisTimeSeries 1.2 by Pieter Cailliau - Redis Day Bangalore 2020
 
RedisAI 0.9 by Sherin Thomas of Tensorwerk - Redis Day Bangalore 2020
RedisAI 0.9 by Sherin Thomas of Tensorwerk - Redis Day Bangalore 2020RedisAI 0.9 by Sherin Thomas of Tensorwerk - Redis Day Bangalore 2020
RedisAI 0.9 by Sherin Thomas of Tensorwerk - Redis Day Bangalore 2020
 
Rate-Limiting 30 Million requests by Vijay Lakshminarayanan and Girish Koundi...
Rate-Limiting 30 Million requests by Vijay Lakshminarayanan and Girish Koundi...Rate-Limiting 30 Million requests by Vijay Lakshminarayanan and Girish Koundi...
Rate-Limiting 30 Million requests by Vijay Lakshminarayanan and Girish Koundi...
 
Three Pillars of Observability by Rajalakshmi Raji Srinivasan of Site24x7 Zoh...
Three Pillars of Observability by Rajalakshmi Raji Srinivasan of Site24x7 Zoh...Three Pillars of Observability by Rajalakshmi Raji Srinivasan of Site24x7 Zoh...
Three Pillars of Observability by Rajalakshmi Raji Srinivasan of Site24x7 Zoh...
 
Solving Complex Scaling Problems by Prashant Kumar and Abhishek Jain of Myntr...
Solving Complex Scaling Problems by Prashant Kumar and Abhishek Jain of Myntr...Solving Complex Scaling Problems by Prashant Kumar and Abhishek Jain of Myntr...
Solving Complex Scaling Problems by Prashant Kumar and Abhishek Jain of Myntr...
 

Dernier

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Dernier (20)

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix

  • 1. Cloud Database Engineering Making Non-Distributed Databases, Distributed - Shailesh Birari - Ioannis Papapanagiotou, PhD
  • 2. Dynomite Ecosystem ● Dynomite ● Dynomite-manager ● Dyno client
  • 3. Cloud Database Engg (CDE) Team ● Develop and operate data stores in AWS - Cassandra, Dynomite, Elastic Search, RDS, S3 ● Ensure availability, scalability, durability and latency SLAs ● Database expertise, client libraries, tools and best practices
  • 4. ● Cassandra not a speed demon (reads) ● Needed a data store: o Scalable & highly available o High throughput, low latency o Active-active multi datacenter replication ● Usage of Redis increasing: o Netflix use case is active-active, highly available o Does not have bi-directional replication o Cannot withstand a Monkey attack Problems & Observations
  • 5. What is Dynomite? ● A framework that makes non-distributed data stores, distributed. o Can be used with many key-value storage engines like Redis, Memcached, LMDB, etc. o Focus: performance, cross-datacenter active-active replication and high availability o Features: node warmup (cold bootstrapping), tunable consistency, S3 backups/restores
  • 6. Dynomite @ Netflix ● Running around 1.5 years in PROD ● ~1000 customer facing nodes ● 1M OPS at peak ● Largest cluster: 6TB ● Quarterly upgrades in PROD
  • 7. Dynomite Overview ● Layer on top of a non-distributed key value data store ○ Peer-peer, Shared Nothing ○ Auto Sharding ○ Multi-datacenter ○ Linear scale ○ Replication(Encrypted) ○ Gossiping
  • 8. ● Each rack contains one copy of data, partitioned across multiple nodes in that rack ● Multiple Racks == Higher Availability (HA) Topology
  • 9. Replication ● A client can connect to any node on the Dynomite cluster when sending requests. o If node owns the data, ▪ data are written in local data-store and asynchronously replicated. o If node does not own the data ▪ node acts as a coordinator and sends the data in the same rack & replicates to other nodes in other racks and DC.
  • 10. The Dynomite Ecosystem RESP = Redis Serialization Protocol
  • 11. Consistency ● DC_ONE o Reads and writes are propagated synchronously only to the node in local rack and asynchronously replicated to other racks and data centers ● DC_QUORUM o Reads and writes are propagated synchronously to quorum number of nodes in the local region and asynchronously to the rest ● Consistency can be configured dynamically for read or write operations separately (cluster-wide)
  • 12. Performance Setup ● Instance Type: ○ Dynomite: r3.2xlarge (1Gbps) ○ Pappy/Dyno: m2.2xls (typical of an app@Netflix) ● Replication factor: 3 ○ Deployed Dynomite in 3 zones in us-east-1 ○ Every zone had the same number of servers ● Demo app used simple workloads key/value pairs ○ Redis: GET and SET ● Payload ○ Size: 1024 Bytes ○ 80%/20% reads over writes
  • 13. Performance (Dynomite Speed) ● Throughput scales linearly with number of nodes. ● Dynomite can reach >1Million Client requests with ~24 nodes.
  • 14. Performance (Latency - average/P50) ● Dynomite’s latency on average is 0.16ms. ● Client side latency is 0.6ms and does not increase as the cluster scales up/down
  • 15. Performance (Latency - P99) ● The major contributor to latency at P99 is the network. ● Dynomite affects <10%
  • 16. Dynomite-manager ● Token management for multi-region deployments ● Support AWS environment ● Automated security group update in multi-region environment ● Monitoring of Dynomite and the underlying storage engine ● Node cold bootstrap (warm up) ● S3 backups and restores ● REST API
  • 17. Dynomite-manager: warm up 1. Dynomite-manager identifies which node has the same token in the same DC 2. Sets Redis to “Slave” mode of that node 3. Checks for peer syncing a. difference between master and slave offset 4. Once master and slave are in sync, Dynomite is set to allow write only 5. Dynomite is set back to normal state 6. Checks for health of the node - Done!
  • 18. Warm up (node terminated)
  • 20. Warm up (node with same token)
  • 21. Warm up (Redis replication)
  • 23. Warm up (Nodes in sync)
  • 24. Dynomite: S3 backups/restores ● Why? o Disaster recovery o Data corruption ● How? o Redis dumps data on the instance drive o Dynomite-manager sends data to S3 buckets ● Data per node are not large so no need for incrementals. ● Use case: o clusters that use Dynomite as a storage layer o Not enabled in clusters that have short TTL or use Dynomite as a cache
  • 25. Dynomite S3 backups (operation) 1. Perform backup a. Dynomite-manager performs it on a pre-defined interval b. Dynomite-manger REST call: i. curl http://localhost:8080/REST/v1/admin/s3backup 2. Perform a Redis BGREWRITEAOF or BGSAVE. a. Check the size of the persisted file. If the size is zero, which means that there was an issue with Redis or no data are there, then we do not perform S3 backups 3. S3 backup key: backup/region/clustername-ASG/token/date
  • 26. Dynomite S3 restores 1. Perform restore: a. Dynomite-manager performs once it starts if configuration is enabled b. Dynomite-manger REST call: i. curl http://localhost:8080/REST/v1/admin/s3backup 2. Stop Dynomite process: a. We perform this to notify Discovery that Dynomite is not accessible b. Stop Redis process 3. Restore the data from a specific date a. provided in the configuration 4. Start Redis process and check if the data has been loaded. 5. Start Dynomite and check if process is up
  • 27. Dyno Client - Java API ● Connection Pooling ● Load Balancing ● Effective failover ● Pipelining ● Scatter/Gather ● Metrics, e.g. Netflix Insights
  • 28. Dyno Load Balancing ● Dyno client employs token aware load balancing. ● Dyno client is aware of the cluster topology of Dynomite within the region, can write to specific node using consistent hashing.
  • 29. Dyno Failover ● Dyno will route requests to different racks in failure scenarios.
  • 30. Roadmap ● Multi-threaded support for Dynomite ● Data reconciliation & repair v2 ● Dynomite-spark connector ● Investigation for persistent stores ● Async Dyno Client ● Others….
  • 31. More information ● Netflix OSS: o https://github.com/Netflix/dynomite o https://github.com/Netflix/dyno ● Contact: o email: {sbirari, ipapapanagiotou}@netflix.com