Declarative Benchmarking of Cassandra and Its Data Models
Monal Daxini @monaldax
ApacheCon, Las Vegas, 11/11/2019
https://www.linkedin.com/in/monaldaxini
Profile
● Cloud Data Engineering @ Netflix, work on many data stores
● Help engineers build scalable solutions
● Built scalable data platforms using Apache Flink / Kafka / Docker
● Working with distributed systems for 18+ years
Netflix Cassandra Footprint
• 100s of applications using Cassandra (several unique data models / configs)
• Tens of thousands of instances
• 100s of global C* clusters
• > 6 PB of data
• Millions of requests per second
Structure Of The Talk
• Challenges developing a scalable data model (Cassandra)
• Declarative Cassandra benchmarking tool in action
• Tool’s philosophy, how it works, & how it can apply to other data stores
Developing a Scalable Cassandra Data Model
For each application:
1. Design data model & schema
2. Design application queries
3. Identify application load & query distribution
4. Prepare test data
5. Prepare query parameter values to run queries efficiently
6. Code an app to execute queries, and instrument to capture metrics
7. Generate load against application to run queries with desired distribution
8. Analyze results (build dashboard)
9. If results unsatisfactory, iterate from step 1
In addition, we may need to test the application workload on different versions of Cassandra and/or different data models.
That’s a lot of steps and duplicated effort, and it’s cumbersome!
We want it to be easy, quick, and ergonomic!
Developing a Scalable Cassandra Data Model
With tooling, for each application:
1. Design data model & schema
2. Design application queries
3. Identify the application load & query distribution
4. Prepare test data (generate)
9. Configure the tool and run the test; if results are unsatisfactory, iterate from step 1
Heavy Lifting in a Tool:
5. Prepare query parameter values to run queries efficiently
6. Code an app to execute queries, and instrument to capture metrics
7. Generate load against application to run queries with desired distribution
8. Analyze results (build dashboard)
What is NDBench?
● Generic benchmarking tool
● Support different data stores via plugin (available plugins)
● Dynamically tunable RPS and configuration
● Load patterns - random, time window, zipfian
NDBench In Action
[Diagram labels: NDBench nodes (EC2 instances); NDBench app UI; test Cassandra cluster (schema & test data); reads / writes; record metrics]
Cassandra NDBench CQL Plugin
• Emulates application query logic; runs against real or generated data
• Specify the traffic % distribution
• Basic data type coercion for using a query result in another query
• Run any CQL statement (SELECT, UPDATE, INSERT, DELETE) & support all CQL types
• Support any Cassandra version with CQL support
What Do We Use It For / Plan To Use It For
• Validate scalability of data model and application query workload
• Compare the performance of a data model on Cassandra versions 3.x & 2.x
• Help certify Cassandra updates / upgrades - test different data models and application workloads
• Use for data generation for a given schema before running queries
Walkthrough of NDBench CQL Plugin In Action
Steps 1-4, 9
Cassandra Schema Of Sample Application (step 1)
Application CQL Queries For API 1 (steps 2, 3)
Query Group 1: 70%
SELECT user_id, profile_id FROM user WHERE user_id = ?;
SELECT foreign_keys FROM user_index WHERE type = 'profile_id' AND value = ?;
Application CQL Queries For API 2 (steps 2, 3)
Query Group 2: 30%
SELECT user_id, profile_id, acc_guid FROM user WHERE user_id = ?;
BEGIN BATCH
  INSERT INTO user_index (create_time, foreign_keys, type, value) VALUES (?, [ ?, ? ], 'profile_id', ?);
  INSERT INTO user_index (create_time, foreign_keys, type, value) VALUES (?, [ ? ], 'acc_guid', ?);
APPLY BATCH;
INSERT INTO map_test (id, uid_pid) VALUES ('1', {user_id : ?, profile_id: ?});
INSERT INTO set_test (id, uid_pid) VALUES ('2', {?});
NDBench CQL Plugin Overview
[Diagram labels: NDBench nodes with CQL plugin (EC2 instances); NDBench app UI; test Cassandra cluster (schema & test data); ndb_perf_queries (perf test profile); run queries; record metrics]
NDBench CQL Plugin Perf-Test-Profile Schema (step 9)
• var_* columns point to different sources for query parameter values; only one is used
• id orders the CQL statements within a group
Modified App Query With Parameter Reference - Group 1 (70%)
SELECT user_id, profile_id FROM user WHERE user_id = ?user_id?;
SELECT foreign_keys FROM user_index WHERE type = 'profile_id' AND value = ?profile_id?;
Modified App Query With Parameter Reference - Group 2 (30%)
SELECT user_id, profile_id, acc_guid FROM user WHERE user_id = ?user_id?;
BEGIN BATCH
  INSERT INTO user_index (create_time, foreign_keys, type, value) VALUES (?:TS?, ?[user_id, profile_id]?, 'profile_id', ?profile_id?);
  INSERT INTO user_index (create_time, foreign_keys, type, value) VALUES (?:TS?, ?[user_id]?, 'acc_guid', ?acc_guid?);
APPLY BATCH;
INSERT INTO map_test (id, uid_pid) VALUES ('1', ?{user_id : user_id, profile_id: profile_id}?);
INSERT INTO set_test (id, uid_pid) VALUES ('2', ?s{user_id}s?);
Callout on slide: Type Coercion
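One plausible reading of the `?name?` references in the modified queries: each names a value from the parameter source, and the tool swaps it for a plain `?` bind marker before preparing the statement. The Python sketch below illustrates only that rewriting step, under that assumption. `to_bind_markers` and its regular expression are hypothetical and are not taken from the NDBench source; collection references such as `?[user_id, profile_id]?` are kept as one opaque reference rather than expanded.

```python
import re

# Matches one ?...? reference: a run of characters without '?' between two '?'.
# Hypothetical pattern for illustration; the real plugin's grammar may differ.
REFERENCE = re.compile(r"\?([^?]+)\?")


def to_bind_markers(cql: str) -> tuple[str, list[str]]:
    """Replace each ?ref? with a bare ? and return the references in order."""
    references: list[str] = []

    def swap(match: re.Match) -> str:
        references.append(match.group(1).strip())
        return "?"

    return REFERENCE.sub(swap, cql), references


statement, refs = to_bind_markers(
    "SELECT foreign_keys FROM user_index "
    "WHERE type = 'profile_id' AND value = ?profile_id?;"
)
print(statement)
print(refs)
```

The ordered list of references is what a runner would then resolve, one value per bind marker, from the chosen parameter source.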
00:00 (mm:ss)
NDBench CQL Plugin Perf Test Profile - 2 Query Groups
NDBench CQL Plugin Perf Test Profile - Select source
NDBench CQL Plugin Perf Test Profile - Source Precedence
Summary - Ergonomic Perf Test Profile, & Comprehensive Validation
• Total traffic % of query groups must add up to 100
• Support different consistency level for each statement
• Columns in CQL statement inferred, and available from the parameter source
• Parameter source - table, previous query results, SELECT statement
• Support large number of parameters to perf test CQL queries
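The first rule in this summary is easy to check before a run starts. A minimal sketch, assuming the profile has already been read into a mapping from query group id to traffic percentage; `validate_traffic` and that in-memory shape are hypothetical, not the plugin's actual API.

```python
def validate_traffic(group_percent: dict[int, int]) -> None:
    """Raise ValueError unless the query group percentages add up to 100."""
    total = sum(group_percent.values())
    if total != 100:
        raise ValueError(f"query group traffic adds up to {total}, expected 100")


# Query groups from the walkthrough: group 1 at 70%, group 2 at 30%.
validate_traffic({1: 70, 2: 30})
```

Failing fast here keeps a mistyped percentage from silently skewing the measured workload.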
Run Load Test - Spinnaker Pipeline
Run Load Test - Spinnaker Pipeline: Manual Judgement, Test Specific Link
NDBenchUI-CQLPlugin
CassCQLPlugin
30:00 (mm:ss): 25 min to enter the perf test profile table entries, 5 min to run the test
Run Load Test - Spinnaker Pipeline: Manual Judgement, Test Specific Link
Dashboard
Dashboard - CQL Plugin Specific
Dashboard - Query Execution Latency Per Group
Testing C* Data Model For A Critical Service On 2.x & 3.x
• Test scale: up to 1.2 million ops / second (1.2 billion parameter rows)
• 96 i3.8xl nodes, LCS (compaction), LZ4, mostly read heavy
• Found a data model bug that was slowly leading to wide rows
• Found client wrapper bugs - slow memory leak, metrics, prepared statement caching not working
We Would Like To Use Plugin To Test Cassandra @ Netflix
Use restores from prod data backups and definitions of CQL Perf Test Profiles, exercised by the NDBench CQL plugin and triggered by Cassandra builds
Under The Hood Of The CQL Plugin
NDBench CQL Plugin Architecture
[Diagram labels: NDBench nodes with CQL plugin (EC2 instances); NDBench app UI; test Cassandra cluster (schema & test data); ndb_perf_queries (perf test profile); run queries; record metrics]
High-level Architecture
Components: NDBench UI; NDBench nodes, each with a SQLite param store; Cassandra cluster (schema & test data, ndb_perf_queries). Metadata could live on any Cassandra cluster.
0. NDBench UI calls /init/ on all nodes (REST)
1. Parse metadata
2. Load from user & store on node in SQLite
3. NDBench UI calls /start/ on all nodes (REST); randomize start
4. Run queries with param values from SQLite & record metrics
High-level Architecture (optimized)
Components: NDBench UI; NDBench nodes, each with a SQLite param store; Cassandra cluster (schema & test data, ndb_perf_queries). Metadata could live on any Cassandra cluster.
0. NDBench UI calls /init/ on a node
1. Parse metadata
2. If user param is not on S3, load from user & store on 1 node in SQLite
3. Upload SQLite file
4. NDBench UI calls /init/ on all nodes (REST)
5. Download SQLite file from each node
6. NDBench UI calls /start/ on all nodes (REST); randomize start
7. Run queries with param values from SQLite & record metrics
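The node-local param store in these steps can be pictured as a small SQLite table of parameter rows that load threads read by key. A minimal sketch, assuming a `params` table with `user_id` and `profile_id` columns; the table name, column layout, and `random_params` helper are illustrative assumptions rather than the plugin's real schema, and an in-memory database stands in for the file on each node.

```python
import random
import sqlite3

# In-memory database stands in for the SQLite file kept on each NDBench node.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE params (row_id INTEGER PRIMARY KEY, user_id TEXT, profile_id TEXT)"
)

# Pretend these rows were loaded from the parameter source.
rows = [(i, f"user-{i}", f"profile-{i}") for i in range(1, 1001)]
conn.executemany("INSERT INTO params VALUES (?, ?, ?)", rows)
conn.commit()


def random_params(connection: sqlite3.Connection, row_count: int) -> dict[str, str]:
    """Pick one parameter row by primary key, a cheap indexed lookup."""
    row_id = random.randint(1, row_count)  # inclusive on both ends
    user_id, profile_id = connection.execute(
        "SELECT user_id, profile_id FROM params WHERE row_id = ?", (row_id,)
    ).fetchone()
    return {"user_id": user_id, "profile_id": profile_id}


print(random_params(conn, len(rows)))
```

Keeping the values local means the benchmark's own parameter lookups do not add read load to the Cassandra cluster under test.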
Dashboard - Parameters Values Uploaded and Shared
Lock-free Randomized Deterministic % Query Distribution On Each Node
Query Group ID 1: 70%, Query Group ID 2: 30%
(1) Build a 100-element array: 70 1s for Query Group 1, 30 2s for Query Group 2
1 1 1 1 1 1 1 2 2 2 2
(2) Apply a one-time Fisher-Yates shuffle
1 2 1 1 2 1 2 1 2 1 1
(3) Each thread (Thread 1 ... Thread n) keeps its own ThreadLocal array index into the shuffled array
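The three steps above can be sketched in Python as follows. This is an illustration of the idea on the slides, not the plugin's code: `build_schedule`, `next_group`, and the fixed seed are assumptions made so the sketch is reproducible, and `threading.local` plays the role of Java's ThreadLocal.

```python
import random
import threading


def build_schedule(group_percent: dict[int, int], seed: int) -> list[int]:
    """Build a 100-slot array of group ids and shuffle it once (Fisher-Yates)."""
    slots = [gid for gid, pct in group_percent.items() for _ in range(pct)]
    if len(slots) != 100:
        raise ValueError("percentages must add up to 100")
    rng = random.Random(seed)
    # Fisher-Yates: walk from the end, swapping each slot with a randomly
    # chosen slot at or before it.
    for i in range(len(slots) - 1, 0, -1):
        j = rng.randint(0, i)
        slots[i], slots[j] = slots[j], slots[i]
    return slots


SCHEDULE = build_schedule({1: 70, 2: 30}, seed=42)
_cursor = threading.local()  # each thread gets its own index, so no lock is needed


def next_group() -> int:
    """Return the next query group id for the calling thread."""
    index = getattr(_cursor, "index", 0)
    _cursor.index = (index + 1) % len(SCHEDULE)
    return SCHEDULE[index]


picks = [next_group() for _ in range(100)]
print(picks.count(1), picks.count(2))
```

The final line prints `70 30`: the order is random, but every full pass of 100 picks by a thread hits the configured percentages exactly, which is the "randomized deterministic" property. The shared array is read-only after the shuffle, so threads only ever write their own index.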
Data Generators And Generating Test Data
• ?:TS? - replaced by a timestamp
• Add more generators (future)
• Generation of non-collection (bigint, text, uuid, etc.) and collection types
• Use generators in INSERT to generate data for a new schema
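A generator reference such as `?:TS?` can be thought of as a lookup in a small registry instead of a lookup in the parameter row. A minimal sketch under that assumption; `GENERATORS` and `resolve` are hypothetical names, only `:TS` appears in the slides, and epoch milliseconds is an assumed representation of the timestamp.

```python
import time
from typing import Callable

# Generator name -> function producing a fresh value. Only :TS appears in the
# slides; epoch milliseconds is an assumed representation.
GENERATORS: dict[str, Callable[[], object]] = {
    ":TS": lambda: int(time.time() * 1000),
}


def resolve(reference: str, row: dict[str, object]) -> object:
    """Return a generated value for generator references, else look up the row."""
    if reference in GENERATORS:
        return GENERATORS[reference]()
    return row[reference]


row = {"user_id": "u1", "profile_id": "p1"}
print(resolve(":TS", row), resolve("profile_id", row))
```

Adding a generator then amounts to registering one more entry, which fits the "add more generators" item on the slide.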
Wrap Up
Summary
• Declarative benchmarking significantly reduces the overhead of iterating over schema and Cassandra config to achieve scale
• Used to test and benchmark against curated data sets and perf-test-profiles
• Supports all data types; LWT support (beta)
• Randomized deterministic percentage distribution of queries
Future Enhancements (Lazily)
• Open source NDBench CQL plugin (WIP)
• Add more generators
• Load sharded query parameter data on each NDBench node
• UDT support in dynamic collections
• Build support for other data stores - leverage same philosophy & reuse code
End of Season 1
Q & A

Declarative benchmarking of cassandra and it's data models

  • 1. Monal Daxini @ monaldax 11/11/2019 ApacheCon, Las Vegas, 2019 https://www.linkedin.com/in/monaldaxini Declarative Benchmarking of Cassandra and It's Data Models
  • 2. ● Cloud Data Engineering @ Netflix, work on many data stores ● Help engineers build scalable solutions ● Built scalable data platforms using Apache Flink / Kafka / Docker ● Working with distributed systems for 18+ years Profile @monaldax
  • 3. • 100’s of applications using Cassandra • (several unique data models / config) • 10’s of thousands instances • 100’s of global C* clusters • > 6 PB of data • Millions of requests/ seconds Netflix Cassandra Footprint @monaldax
  • 4. • Challenges developing a scalable data model (Cassandra) • Declarative Cassandra benchmarking tool in action • Tool’s philosophy, how it works, & how it can apply to other data stores Structure Of The Talk @monaldax
  • 5. 1. Design data model & schema 2. Design application queries 3. Identify application load & query distribution 4. Prepare test data 5. Prepare query parameter values to run queries efficiently Developing a Scalable Cassandra Data Model For each application: 6. Code an app to execute queries, and instrument to capture metrics 7. Generate load against application to run queries with desired distribution 8. Analyze results (build dashboard) 9. If results unsatisfactory, iterate from step 1 @monaldax
  • 6. In addition, We may need to test application workload on different versions of Cassandra and or data models. @monaldax
  • 7. That’s a lot of steps, duplicate effort, and its cumbersome! @monaldax We want it to be easy, quick, and ergonomic!
  • 8. 1. Design data model & schema 2. Design application queries 3. Identify the application load & query distribution 4. Prepare test data (generate) 9. Config tool, run test, if results unsatisfactory, iterate from step 1 Developing a Scalable Cassandra Data Model With tooling for each application: 5. Prepare query parameter values to run queries efficiently 6. Code an app to execute queries, and instrument to capture metrics 7. Generate load against application to run queries with desired distribution 8. Analyze results (build dashboard) Heavy Lifting in a Tool @monaldax
  • 9. ● Generic benchmarking tool ● Support different data stores via plugin (available plugins) ● Dynamically tunable RPS and configuration ● Load patterns - random, time window, zipfian What is NDBench? @monaldax
  • 10. NDBench In Action NDBench NodeNDBench Node (EC2 Instance) NDBench Node NDBench Node (EC2 Instance) Test Cassandra Cluster Schema & Test Data reads / writes Record Metrics NDBench NodeNDBench APP UI @monaldax
  • 11. • Emulate application query logic runs against real or generated data • Specify the traffic % distribution • Basic data type coalescing for using query result in another query • Run any CQL statement (Select, Update, Insert, Delete) & support all CQL types • Support any Cassandra version with CQL support Cassandra NDBench CQL plugin @monaldax
  • 12. • Validate scalability of data model and application query workload • Compare the performance of data model for Cassandra version 3.x & 2.x • Help certify Cassandra updates / upgrades - test different data models and application workloads • Use for data generation for given schema before running queries What Do We Use It For / Plan To Use It For @monaldax
  • 13. Walkthrough of NDBench CQL Plugin In Action Steps 1-4, 9 @monaldax
  • 14. Cassandra Schema Of Sample Application (step 1) @monaldax
  • 15. Application CQL Queries For API 1 (steps 2, 3) Query Group 1: 70% SELECT user_id, profile_id FROM user WHERE user_id = ?; SELECT foreign_keys FROM user_index WHERE type = 'profile_id' AND value = ?; @monaldax
  • 16. Application CQL Queries For API 2 (steps 2, 3) Query Group 2: 30% SELECT user_id, profile_id, acc_guid FROM user WHERE user_id = ?; BEGIN BATCH INSERT INTO user_index (create_time, foreign_keys, type, value) VALUES (?, [ ?, ? ], ''profile_id'', ?); INSERT INTO user_index (create_time, foreign_keys, type, value) VALUES (?, [ ? ], ''acc_guid'', ?); APPLY BATCH; INSERT INTO map_test (id, uid_pid) VALUES (''1'', {user_id : ?, profile_id: ?}); INSERT INTO set_test(id, uid_pid) VALUES (''2'', {?}); @monaldax
  • 17. NDBench CQL Plugin Overview Test Cassandra Cluster Schema & Test Data Run Queries ndb_perf_queries Perf Test Profile NDBench NodeNDBench NodeNDBench Node With CQL Plugin (EC2 Instance) Record Metrics NDBench NodeNDBench APP UI @monaldax
  • 18. NDBench CQL Plugin Perf-Test-Profile Schema (step 9). Annotations: var_* columns point to different sources for query parameter values; only one is used. CQL statements in a group are ordered (id). @monaldax
  • 19. Modified App Query With Parameter Reference - Group 1 (70%) SELECT user_id, profile_id FROM user WHERE user_id = ?user_id?; SELECT foreign_keys FROM user_index WHERE type = 'profile_id' AND value = ?profile_id?; @monaldax
  • 20. Modified App Query With Parameter Reference - Group 2 (30%) SELECT user_id, profile_id, acc_guid FROM user WHERE user_id = ?user_id?; BEGIN BATCH INSERT INTO user_index (create_time, foreign_keys, type, value) VALUES (?:TS?, ?[user_id, profile_id]?, ''profile_id'', ?profile_id?); INSERT INTO user_index (create_time, foreign_keys, type, value) VALUES (?:TS?, ?[user_id]?, ''acc_guid'', ?acc_guid?); APPLY BATCH; INSERT INTO map_test (id, uid_pid) VALUES (''1'', ?{user_id : user_id, profile_id: profile_id}?); INSERT INTO set_test(id, uid_pid) VALUES (''2'', ?s{user_id}s?); Annotation: Type Coercion @monaldax
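The `?name?` references on slides 19 and 20 can be illustrated with a small sketch. This is not the plugin's code: the function name `extract_references` and the regular expression are assumptions made here, and the sketch only shows one way such references might be turned back into plain CQL bind markers while recording which parameter each marker needs.

```python
import re

# Hypothetical pattern: anything between two question marks is a reference.
REFERENCE = re.compile(r"\?([^?]+)\?")


def extract_references(cql):
    """Return (cql_with_bind_markers, reference_tokens).

    Each `?token?` reference becomes a single `?` bind marker; the raw
    tokens (for example `user_id`, `:TS`, `[user_id, profile_id]`) are
    returned in statement order so a caller could look up their values.
    """
    tokens = [match.group(1).strip() for match in REFERENCE.finditer(cql)]
    bound = REFERENCE.sub("?", cql)
    return bound, tokens


bound, tokens = extract_references(
    "SELECT user_id, profile_id FROM user WHERE user_id = ?user_id?;"
)
print(bound)
print(tokens)
```

Interpreting collection tokens such as `[user_id, profile_id]` or `s{user_id}s` would need further parsing that this sketch leaves out.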
  • 22. NDBench CQL Plugin Perf Test Profile - 2 Query Groups @monaldax
  • 23. NDBench CQL Plugin Perf Test Profile - Select source @monaldax
  • 24. NDBench CQL Plugin Perf Test Profile - Source Precedence
  • 25. • Total traffic % of query groups must add up to 100 • Support a different consistency level for each statement • Columns in a CQL statement are inferred, and available from the parameter source • Parameter source - Table, Previous query results, SELECT statement • Support a large number of parameters to perf test CQL queries Summary - Ergonomic Perf Test Profile, & Comprehensive Validation @monaldax
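The first rule in the summary above (traffic percentages must add up to 100) is easy to check up front. The following is a minimal sketch, assuming a profile is represented as a mapping from query group id to integer percentage; that representation and the name `validate_traffic` are assumptions for illustration, not the plugin's API.

```python
def validate_traffic(percent_by_group):
    """Raise ValueError unless the group percentages are positive and sum to 100."""
    for group_id, percent in percent_by_group.items():
        if not isinstance(percent, int) or percent <= 0:
            raise ValueError(f"group {group_id}: percentage must be a positive integer")
    total = sum(percent_by_group.values())
    if total != 100:
        raise ValueError(f"traffic percentages add up to {total}, expected 100")


# The two query groups from the walkthrough: 70% and 30%.
validate_traffic({1: 70, 2: 30})
```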
  • 26. Run Load Test Spinnaker Pipeline @monaldax
  • 27. Run Load Test Spinnaker Pipeline @monaldax
  • 28. Run Load Test Spinnaker Pipeline Manual Judgement @monaldax Test Specific Link
  • 32. 30:00 (mm:ss): 25 min for the perf test profile table entry, 5 min to run the test @monaldax
  • 33. Run Load Test Spinnaker Pipeline Manual Judgement @monaldax Test Specific Link
  • 35. Dashboard - CQL Plugin Specific @monaldax
  • 36. Dashboard - Query Execution Latency Per Group @monaldax
  • 37. • Test scale up to 1.2 million ops / second (1.2 billion parameter rows) • 96 nodes i3.8xl, LCS (compaction), LZ4, mostly read heavy • Found data model bug, slowly leading to wide rows • Client wrapper bugs - slow memory leak, metrics, prepared statement caching not working Testing C* Data Model For A Critical Service On 2.x & 3.x @monaldax
  • 38. We Would Like To Use Plugin To Test Cassandra @ Netflix Use restores from prod data backups and definitions of CQL Perf Test Profiles, exercised by the NDBench CQL plugin, and triggered by Cassandra builds @monaldax
  • 39. Under The Hood Of The CQL Plugin @monaldax
  • 40. NDBench CQL Plugin Architecture [Diagram: NDBench app UI; NDBench nodes with CQL plugin (EC2 instances); perf test profile (ndb_perf_queries); run queries against the test Cassandra cluster (schema & test data); record metrics] @monaldax
  • 41. High-level Architecture [Diagram: NDBench UI; NDBench nodes, each with a SQLite param store; Cassandra cluster holding ndb_perf_queries, schema & test data. Metadata could live on any Cassandra cluster.] Steps: (0) NDBench UI /init/ all nodes; (1) parse metadata; (2) load from user & store on node in SQLite; (3) REST /start/ all nodes; (4) run queries with param values from SQLite & record metrics. Randomize start. @monaldax
  • 42. High-level Architecture (optimized) [Diagram: NDBench UI; NDBench nodes, each with a SQLite param store; Cassandra cluster holding ndb_perf_queries, schema & test data. Metadata could live on any Cassandra cluster.] Steps: (0) /init/ a node; (1) parse metadata; (2) if ! param on S3, load from user & store on 1 node in SQLite; (3) upload SQLite file; (4) NDBench UI /init/ all nodes; (5) download SQLite file from each node; (6) REST /start/ all nodes; (7) run queries with param values from SQLite & record metrics. Randomize start. @monaldax
  • 43. Dashboard - Parameters Values Uploaded and Shared @monaldax
  • 44. Lock-free Randomized Deterministic % Query Distribution On Each Node (1) Query Group ID 1: 70%, Query Group ID 2: 30% → 100-element array: 1 1 1 1 1 1 1 ... 2 2 2 2 (70 1s for Query Group 1, 30 2s for Query Group 2) @monaldax
  • 45. Lock-free Randomized Deterministic % Query Distribution On Each Node (2) Query Group ID 1: 70%, Query Group ID 2: 30% → Fisher-Yates shuffle of the array; order of execution over time: 1 2 1 1 2 1 2 1 2 1 1 1 ... @monaldax
  • 46. Lock-free Randomized Deterministic % Query Distribution On Each Node (3) Query Group ID 1: 70%, Query Group ID 2: 30% → shuffled array 1 2 1 1 2 1 2 1 2 1 1 ...; each thread (Thread 1 ... Thread n) keeps its own ThreadLocal array index @monaldax
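The three slides above describe the distribution scheme: fill a 100-element array with group ids in proportion to their percentages, shuffle it with Fisher-Yates, and let each thread walk the array with its own index so no lock is needed. The following Python sketch illustrates that idea under stated assumptions: the names `build_distribution` and `GroupPicker`, the explicit seed, and the use of `threading.local` in place of a Java `ThreadLocal` are all choices made here, not details taken from the plugin.

```python
import random
import threading


def build_distribution(percent_by_group, seed):
    """Build a shuffled 100-slot tuple of group ids, e.g. 70 ones and 30 twos."""
    slots = [group for group, pct in percent_by_group.items() for _ in range(pct)]
    if len(slots) != 100:
        raise ValueError("group percentages must add up to 100")
    rng = random.Random(seed)  # seeded, so the order is reproducible
    # Fisher-Yates shuffle: swap each position with a random earlier-or-equal one.
    for i in range(len(slots) - 1, 0, -1):
        j = rng.randint(0, i)
        slots[i], slots[j] = slots[j], slots[i]
    return tuple(slots)  # immutable, safe to share across threads


class GroupPicker:
    """Hands out group ids; each thread advances its own private index."""

    def __init__(self, slots):
        self._slots = slots
        self._local = threading.local()

    def next_group(self):
        index = getattr(self._local, "index", 0)
        self._local.index = (index + 1) % len(self._slots)
        return self._slots[index]


picker = GroupPicker(build_distribution({1: 70, 2: 30}, seed=42))
first_hundred = [picker.next_group() for _ in range(100)]
print(first_hundred.count(1), first_hundred.count(2))
```

Because every thread cycles through the same fixed array, any full pass of 100 picks yields exactly the configured split, which is what makes the distribution deterministic rather than merely approximate.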
  • 47. Data Generators And Generating Test Data • ?:TS? - This is replaced by a timestamp. • Add more generators (future) • generation of non-collection (bigint, text, uuid, etc.) and collection types • Use generators in INSERT to generate data for new schema @monaldax
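As a rough illustration of the `?:TS?` generator, the sketch below resolves a reference token either from a generator registry (tokens starting with `:`) or from a row of parameter values. The registry `GENERATORS`, the function `resolve`, and the choice of epoch milliseconds for the timestamp are assumptions made for this example; the slides only state that `?:TS?` is replaced by a timestamp.

```python
import time

# Hypothetical generator registry; only TS is mentioned in the slides.
GENERATORS = {
    "TS": lambda: int(time.time() * 1000),  # assumption: epoch milliseconds
}


def resolve(token, param_row):
    """Return the value for one reference token.

    Tokens starting with ':' name a generator; any other token is looked
    up as a column in the parameter row.
    """
    if token.startswith(":"):
        return GENERATORS[token[1:]]()
    return param_row[token]


row = {"user_id": "u-1", "profile_id": "p-9"}
print(resolve("user_id", row))
print(resolve(":TS", row))
```

Adding the future generators listed above (bigint, text, uuid, collections) would then amount to registering more entries in the same table.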
  • 49. • Declarative benchmarking significantly reduces the overhead of iterating over schema and Cassandra config to achieve scale • Used to test and benchmark against curated data sets and perf-test-profiles • Supports all data types & LWT (beta) • Randomized deterministic percentage distribution of queries Summary @monaldax
  • 50. • Open source NDBench CQL plugin (WIP) • Add more generators • Load sharded query parameter data on each NDBench node • UDT Support in dynamic collections • Build support for other data stores - leverage same philosophy & reuse code Future Enhancements (Lazily) @monaldax
  • 51. End of Season 1 Q & A @monaldax