
ShareChat's Journey Migrating 100TB of Data to ScyllaDB with NO Downtime

We at ShareChat want to describe our journey of migrating our NoSQL use cases from several different databases to ScyllaDB, a cloud-agnostic, performant, and cost-effective database. We had the mammoth task of migrating all the data from the existing databases to ScyllaDB with NO downtime. We built a framework that has minimal impact on the application in terms of latency and downtime during the migration, and that can fall back to the source table in case of any inconsistency detected during migration. The framework covers multiple use cases, both counter and non-counter, across the three languages used at ShareChat (NodeJS, GoLang, Java) through a shared library. We also built recovery and auditing mechanisms to maintain consistency across the source and destination tables.

There are three main components. For the existing data, an analytical job based on Apache Beam moves the data between the source and destination tables. For live traffic, a generic driver lets applications perform dual writes to both the source and destination tables. For failure scenarios, whether in the dual writes or in any consistency check, a job re-syncs the inconsistent data. During this process we faced many challenges and ended up pushing ScyllaDB to its limits. We tried different consistency levels and compaction strategies, optimised our data models, and were finally able to move 100 TB of raw data from multiple databases to ScyllaDB with minimal inconsistency.
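To make the third component concrete, here is a minimal sketch of the re-sync idea under stated assumptions: it is not ShareChat's actual code, and Store, Row, and ResyncJob are hypothetical names. It compares a key in the source and destination tables and rewrites it from the source on mismatch, which is also what keeps the source-table fallback safe.

```java
// A minimal sketch of the re-sync idea, not ShareChat's actual code: all names
// (Store, Row, ResyncJob) are hypothetical. The audit step flags keys whose
// source and destination rows differ; this job rewrites them from the source.
import java.util.Objects;

public class ResyncJob {
    /** Thin abstraction over either database; a real one would wrap a driver session. */
    interface Store {
        Row read(String key);
        void write(String key, Row row);
    }

    /** A simplified row: a value plus its original write time in microseconds. */
    record Row(String value, long writeTimeMicros) {}

    private final Store source;
    private final Store destination;

    ResyncJob(Store source, Store destination) {
        this.source = source;
        this.destination = destination;
    }

    /** Re-syncs one key flagged as inconsistent by the audit step. */
    void resync(String key) {
        Row src = source.read(key);
        Row dst = destination.read(key);
        if (!Objects.equals(src, dst)) {
            destination.write(key, src);   // the source table stays the fallback truth
        }
    }
}
```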



  1. ShareChat's Journey Migrating 100TB of Data to ScyllaDB with NO Downtime. Chinmoy Mahapatra, Software Engineer, Platforms; Anuraj Jain, Software Engineer, Platforms
  2. Presentation Agenda: ■ Live Migration Framework Overview ■ Dual Writes Deep Dive ■ Handling Conflicts ■ Handling Counters ■ Export Job Overview ■ Export Job Deep Dive ■ Challenges / Learnings ■ Results
  3. Live Migration Architecture
  4. DB Driver?
  5. Dual Writes?
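A hedged sketch of the dual-write path inside such a driver, assuming for brevity that both sides speak CQL via the DataStax 3.x driver; class, table, and column names are illustrative, not ShareChat's shared library. It preserves the property from the abstract: the source DB stays the system of record, and a failed ScyllaDB write is queued for the re-sync job instead of failing the application call.

```java
// Hypothetical dual-write wrapper, not ShareChat's shared library.
import com.datastax.driver.core.Session;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class DualWriteDriver {
    private final Session sourceSession;   // source DB: still the system of record
    private final Session scyllaSession;   // destination: ScyllaDB
    private final Queue<String> failedKeys = new ConcurrentLinkedQueue<>();

    public DualWriteDriver(Session sourceSession, Session scyllaSession) {
        this.sourceSession = sourceSession;
        this.scyllaSession = scyllaSession;
    }

    public void put(String key, String value) {
        // 1. The write to the source DB must succeed; the app still reads from it.
        sourceSession.execute(
                "UPDATE ks.src_table SET value = ? WHERE key = ?", value, key);

        // 2. Mirror the write to ScyllaDB. Failures are queued for the re-sync
        //    job rather than surfaced to the application, so added latency and
        //    error exposure stay minimal during migration.
        try {
            scyllaSession.execute(
                    "UPDATE ks.dst_table SET value = ? WHERE key = ?", value, key);
        } catch (RuntimeException e) {
            failedKeys.add(key);
        }
    }

    /** Keys the re-sync job should pick up. */
    public Queue<String> failedKeys() {
        return failedKeys;
    }
}
```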
  6. Resolving Conflicts
  7. Migration Stages

     Stage               | Read: Source DB | Read: ScyllaDB | Write: Source DB | Write: ScyllaDB
     ------------------- | --------------- | -------------- | ---------------- | ---------------
     Dual writes enabled | ✅              | ❌             | ✅               | ✅
     Export job running  | ✅              | ❌             | ✅               | ✅
     Audit               | ✅              | ❌             | ✅               | ✅
     Validation          | ✅              | ❌             | ✅               | ✅
     Read Switch         | ❌              | ✅             | ✅               | ✅
     Write Switch        | ❌              | ✅             | ❌               | ✅
  8. What About Counters?
  9. What About Counters?
  10. Migration Stages (Counters)

      Stage                   | Read: Source DB | Read: ScyllaDB | Write: Source DB | Write: ScyllaDB
      ----------------------- | --------------- | -------------- | ---------------- | ---------------
      Write to dirty keystore | ✅              | ❌             | ✅               | ❌
      Export job running      | ✅              | ❌             | ✅               | ❌
      Dual writes             | ✅              | ❌             | ✅               | ✅
      Migrate dirty keys      | ✅              | ❌             | ✅               | ✅
      Audit                   | ✅              | ❌             | ✅               | ✅
      Validation              | ✅              | ❌             | ✅               | ✅
      Read Switch             | ❌              | ✅             | ✅               | ✅
      Write Switch            | ❌              | ✅             | ❌               | ✅
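One subtlety behind the "Migrate dirty keys" stage above: CQL counter columns can only be incremented or decremented, never set, so a dirty key has to be migrated by applying the delta between the source value and whatever ScyllaDB currently holds. A minimal sketch, assuming both stores are reachable over CQL; table, column, and class names are hypothetical.

```java
// Hypothetical dirty-key migrator for counters, not ShareChat's actual code.
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class CounterMigrator {
    private final Session source;
    private final Session scylla;

    public CounterMigrator(Session source, Session scylla) {
        this.source = source;
        this.scylla = scylla;
    }

    /** Re-syncs one key recorded in the dirty keystore while the export ran. */
    public void migrateDirtyKey(String key) {
        long srcValue = readCount(source, "ks.src_counters", key);
        long dstValue = readCount(scylla, "ks.dst_counters", key);
        long delta = srcValue - dstValue;   // counters cannot be SET, only adjusted
        if (delta != 0) {
            scylla.execute(
                    "UPDATE ks.dst_counters SET count = count + ? WHERE key = ?",
                    delta, key);
        }
    }

    private long readCount(Session session, String table, String key) {
        Row row = session.execute(
                "SELECT count FROM " + table + " WHERE key = ?", key).one();
        return row == null ? 0L : row.getLong("count");
    }
}
```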
  11. Exporting the Existing Data From Source to Destination Table
  12. Let’s See Some Code. Our job has three main parts: Read, Transform, and Write (a sketch of this shape follows below).
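A minimal sketch of that Read, Transform, Write shape using Beam's CassandraIO, which speaks CQL and therefore also works against ScyllaDB. Hosts, keyspace, tables, and the SourceRow/DestRow entities are placeholders, not ShareChat's actual job.

```java
// Hypothetical Beam export job: read from the source table, remodel each row,
// write to the destination table.
import com.datastax.driver.mapping.annotations.PartitionKey;
import com.datastax.driver.mapping.annotations.Table;
import java.io.Serializable;
import java.util.Arrays;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.SerializableCoder;
import org.apache.beam.sdk.io.cassandra.CassandraIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.SimpleFunction;

public class ExportJob {

    @Table(keyspace = "ks", name = "src_table")
    public static class SourceRow implements Serializable {
        @PartitionKey private String key;
        private String value;
        public String getKey() { return key; }
        public void setKey(String k) { this.key = k; }
        public String getValue() { return value; }
        public void setValue(String v) { this.value = v; }
    }

    @Table(keyspace = "ks", name = "dst_table")
    public static class DestRow implements Serializable {
        @PartitionKey private String key;
        private String value;
        public String getKey() { return key; }
        public void setKey(String k) { this.key = k; }
        public String getValue() { return value; }
        public void setValue(String v) { this.value = v; }
    }

    public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        p.apply("Read", CassandraIO.<SourceRow>read()
                .withHosts(Arrays.asList("source-host"))
                .withPort(9042)
                .withKeyspace("ks")
                .withTable("src_table")
                .withEntity(SourceRow.class)
                .withCoder(SerializableCoder.of(SourceRow.class)))
         .apply("Transform", MapElements.via(new SimpleFunction<SourceRow, DestRow>() {
             @Override
             public DestRow apply(SourceRow in) {
                 DestRow out = new DestRow();   // remodel the row for the new schema
                 out.setKey(in.getKey());
                 out.setValue(in.getValue());
                 return out;
             }
         }))
         .apply("Write", CassandraIO.<DestRow>write()
                .withHosts(Arrays.asList("scylla-host"))
                .withPort(9042)
                .withKeyspace("ks")
                .withEntity(DestRow.class));

        p.run().waitUntilFinish();
    }
}
```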
  13. How To Do a Timestamped Query? We provide a CustomMapperFactory to our CassandraIO PTransform, which then uses a CustomObjectMapper (an implementation of the Mapper interface) and writes to ScyllaDB with a custom timestamped query using an accessor.
  14. How To Do a Timestamped Query?
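A hedged sketch of what such an accessor can look like with the DataStax 3.x object mapper; the table and column names are placeholders. The USING TIMESTAMP mechanism is the point: export-job writes carry the source row's original write time, so ScyllaDB's last-write-wins resolution never lets exported data overwrite a newer dual-written value.

```java
// Hypothetical accessor for timestamped upserts, not ShareChat's actual code.
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.mapping.annotations.Accessor;
import com.datastax.driver.mapping.annotations.Param;
import com.datastax.driver.mapping.annotations.Query;

@Accessor
public interface TimestampedWrites {
    // ts is the source row's original write time in microseconds.
    @Query("UPDATE ks.dst_table USING TIMESTAMP :ts SET value = :value WHERE key = :key")
    ResultSet upsert(@Param("key") String key,
                     @Param("value") String value,
                     @Param("ts") long writeTimeMicros);
}
```

An instance comes from MappingManager.createAccessor(TimestampedWrites.class); in the Beam job it would live inside the custom Mapper supplied through CassandraIO's withMapperFactoryFn, which is presumably how the CustomObjectMapper on the previous slide plugs in.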
  15. Challenges/Learnings
      1. Write consistency: Quorum vs. One (see the sketch after this list)
      2. Partial data export with Apache Beam
      3. Segregation of the base table and materialized views
      4. Choosing the right compaction strategy: Incremental vs. Null
      5. Validation of the migrated data across the entire source and destination tables
      6. Migrating counters while ensuring consistency
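For the first challenge, one concrete knob: CassandraIO lets the export job choose its write consistency level, trading acknowledgement guarantees per write for throughput. A hedged sketch reusing the DestRow placeholder from the export-job sketch above.

```java
// Hypothetical helper showing the consistency-level choice; "ONE" acks on a
// single replica per write, "QUORUM" on a majority.
import java.util.Arrays;
import org.apache.beam.sdk.io.cassandra.CassandraIO;

public class ConsistencyChoice {
    static CassandraIO.Write<ExportJob.DestRow> writeWith(String consistencyLevel) {
        return CassandraIO.<ExportJob.DestRow>write()
                .withHosts(Arrays.asList("scylla-host"))
                .withPort(9042)
                .withKeyspace("ks")
                .withEntity(ExportJob.DestRow.class)
                .withConsistencyLevel(consistencyLevel);   // e.g. "ONE" vs "QUORUM"
    }
}
```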
  16. ShareChat’s Usage of ScyllaDB ■ We have already migrated around 65+ TB of data, with plans to onboard an additional 50 TB ■ Almost 35-40 services are using ScyllaDB at scale ■ Our biggest cluster is around 28 TB, and the max throughput for one of our clusters is 1.5 million ops/sec
  17. Thank You. Stay in Touch. Chinmoy Mahapatra chinmoymahapatra@sharechat.co https://github.com/ChinmoyMahapatra https://www.linkedin.com/in/chinmoy-s-mahapatra-5458b517b Anuraj Jain anurajjain@sharechat.co https://github.com/anuraj381 https://www.linkedin.com/in/anuraj-jain-3101
