Designing & Optimizing micro batching systems with
Cassandra, Spark and
Kafka
Accenture
Copyright © 2014 Accenture. All rights reserved.
Copyright © 2012 Accenture All rights reserved. 3
• Ananth Ram
– Big Data & Oracle Solution Architect, Accenture
– (Accenture Enkitec Group)
– Ananth.ram@Accenture.com
• Rumeel Kazi
– Big Data Solution Architect, Accenture
– (Accenture Federal)
– rumeel.k.kazi@accenturefederal.com
• Rich Rein
– Solution Architect, DataStax
– Rich.rein@datastax.com
Speaker Details and Contact
• Data Acceleration and Micro Batching
• Big Data Architecture
– Technical Architecture
– Application Architecture
– Data Supply Chain Approach & Framework
• Application Design & Operations
– Design Considerations
– Data Flow
– Optimizations and Operations
• Application Access Patterns
– The Problems and Physics
– Idempotency
– Partition per Read
• Takeaways
Agenda
• Data as Value Chain
• Data Acceleration
– Movement
– Processing
– Insights
• High throughput with Micro Batch
Data Acceleration & Micro Batch!
Big Data Architecture
IV Hardware Architecture
Technical Architecture – Sample
[Diagram labels: Big Data (Cassandra, Spark, Solr, Hadoop, Kafka); Interfaces (NAS, clustered MQ, files, external databases); Oracle 12c RAC with Prod (A), Prod (B), Prod (C), and Prod (D); "12 blades, 288 cores, 6 TB RAM" shown four times; 4 x 10G RAC interconnect]
[Diagram labels, in extraction order: Data Enrichment, 44 nodes, RAC, 4 nodes, Data Ingest, 16 nodes, 23 nodes, 112 nodes, Interfaces, 12 nodes, Java]
Technical Architecture – Additional Details
• Separate data centers for Cassandra and Solr.
• Spark runs on the same nodes as Cassandra for data locality.
• Kafka, Java Spring Batch, and Spark Streaming are used to enrich billions of records a day.
Application Architecture
• Data is enriched by Java Spring Batch and Spark Streaming, with Kafka as a temporary staging area.
• Cassandra is used for fast lookups, summary views, and persistent storage.
[Diagram labels: External System Interfaces (TXN data via MQ, files, DB link; operational events via MQ; reference data via MQ, files); Data Ingestion & Business Rules (Java Spring Batch, Spark Streaming & Kafka, Enrichment Process 1, Enrichment Process 2, Application Cache); DataStore (enriched data, aggregated views, reference data, events data, in-memory tables); Reporting (web portal with canned reports and push alerts, ad hoc queries); Cassandra, Solr; HDFS, Hive, Spark, SparkR]
• Cassandra
– 400K reads/writes per second.
– Read latency of 1 ms to 3 ms; write latency of 0.2 ms to 0.3 ms.
• Spark
– Spark Streaming processes 200K events/sec.
– Spark Streaming runs on the same hosts as Cassandra for data locality.
• Kafka
– 800K messages/sec in total, processed through 30 brokers.
– Kafka broker throughput is 30K messages/sec per broker.
– Snappy compression gives up to 5x throughput in benchmarks. Not yet tested in our apps.
• Java apps
– Java Spring Batch processes 400K records/sec using thousands of threads in the app server.
– A 32 GB JVM with the G1 garbage collector and an application cache gives this throughput.
Cassandra, Spark and Kafka Metrics
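As a rough sanity check on the figures above, the per-broker share and the daily volume can be derived from the stated rates. This is a minimal sketch that assumes the rates are sustained around the clock, which the slides do not state; the function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_broker_rate(total_per_sec: int, brokers: int) -> float:
    """Average messages/sec each broker handles if load is spread evenly."""
    return total_per_sec / brokers


def records_per_day(rate_per_sec: int) -> int:
    """Daily volume, assuming the rate is sustained for 24 hours (an assumption)."""
    return rate_per_sec * SECONDS_PER_DAY


if __name__ == "__main__":
    # 800K messages/sec over 30 brokers, as quoted on the slide.
    print(round(per_broker_rate(800_000, 30)))   # 26667
    # 400K records/sec from the Java apps, if sustained all day.
    print(records_per_day(400_000))              # 34560000000
```

The even-spread figure lands a little under the quoted 30K per broker, and the sustained daily figure is consistent with "billions of records a day".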
Big Data Architecture Approach
Accenture-Data-Acceleration-Architecture-Modern-Data-Supply-Chain.pdf
*Accenture Labs Paper – Carl Dukatz
Big Data Architecture Design Considerations - Criteria
Sample
Big Data Design Considerations - Approach
Design Considerations &
Use Cases
Big Data Design Considerations
Application Design and
Operations
High Level Design Pattern
[Diagram: three pipeline stages, each labeled "Partial Data Enrichment". In each stage, a Kafka cluster topic with partitions 0, 1, ..., N feeds a DSE Cassandra / Spark cluster whose executors 0, 1, ..., N each hold a cache.]
• Pipeline Stage 0: Topic A
• Pipeline Stage 1: Topic B
• Pipeline Stage 1 (as labeled on the slide): Topic C
Data Processing Pipeline
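The staged flow in the diagram can be sketched in plain Python. This is only an illustration under stated assumptions: topics are modeled as dictionaries of partitions, `partition_for` is a toy partitioner (not Kafka's), and `run_stage`, `lookup`, and the field names are hypothetical. It shows each stage adding part of the enrichment, with one cache per partition worker, and writing to the next topic.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Record = Dict[str, str]
Topic = Dict[int, List[Record]]  # partition id -> records (stand-in for a Kafka topic)


def partition_for(key: str, num_partitions: int) -> int:
    """Toy, stable partitioner; real Kafka clients use their own hashing."""
    return sum(key.encode("utf-8")) % num_partitions


def run_stage(source: Topic, num_partitions: int,
              lookup: Callable[[str], str], field: str) -> Topic:
    """Partially enrich every record in `source` and write it to a new topic."""
    sink: Topic = defaultdict(list)
    for _partition_id, records in source.items():
        cache: Dict[str, str] = {}  # one cache per executor/partition worker
        for record in records:
            key = record["id"]
            if key not in cache:          # hit the lookup store only on a cache miss
                cache[key] = lookup(key)
            enriched = {**record, field: cache[key]}
            sink[partition_for(key, num_partitions)].append(enriched)
    return dict(sink)


if __name__ == "__main__":
    topic_a: Topic = {0: [{"id": "a"}, {"id": "b"}], 1: [{"id": "c"}]}
    topic_b = run_stage(topic_a, 2, lambda k: k.upper(), "region")  # stage 0
    topic_c = run_stage(topic_b, 2, lambda k: k * 2, "tier")        # stage 1
    print(topic_c)
```

Keeping each stage small and chaining them through topics lets every stage be scaled and restarted independently.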
• Application metric collection / diagnostic logging
– Include application-level operational metrics as part of the design. Collect Cassandra and Kafka processing metrics, including response times, at the object level.
– Executors report application-specific throughput and backlog metrics to the driver, which keeps an aggregated, point-in-time count of the metrics for the process.
• Kafka / Cassandra data partitioning strategies
– Distribute partitioning keys evenly across the Cassandra nodes and Kafka brokers. Where this is hard because the data is skewed toward certain entities that must be part of the partitioning key, add a time window to the key so that data does not concentrate on a few nodes.
• Spark executor configuration
– Set the number of Spark executors to match the number of partitions on the topic. An executor can handle more than one partition, depending on throughput and latency needs; keep the count low for reduced latency.
• Web / Solr interface considerations
– Where consistency is required, write at consistency level ALL on the Solr data centers; if that fails, write at LOCAL_QUORUM. Consider the additional sub-second overhead based on functional needs.
Application Design Considerations
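The time-window idea from the partitioning bullet can be illustrated with a small sketch. It assumes a one-hour window and a composite key of `(entity_id, bucket)`; the function name and window size are hypothetical choices, not values from the talk.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def bucketed_key(entity_id: str, event_time: datetime,
                 window_minutes: int = 60) -> Tuple[str, int]:
    """Composite partition key: the entity plus the time window of the event."""
    epoch_minutes = int(event_time.timestamp()) // 60
    return (entity_id, epoch_minutes // window_minutes)


if __name__ == "__main__":
    start = datetime(2016, 1, 1, tzinfo=timezone.utc)
    # Six hours of traffic for one hot entity map to six distinct partition keys
    # instead of a single ever-growing partition.
    keys = {bucketed_key("acct-1", start + timedelta(hours=h)) for h in range(6)}
    print(len(keys))
```

The trade-off is that a read covering a long time range has to query one partition per window, so the window size should follow the read pattern.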
• Compaction strategies
– Date-tiered vs. size-tiered compaction: size-tiered compaction on high-velocity data showed high resource utilization on tables over 50 TB. Consider date-tiered compaction for time-series data.
• "Hot spot" monitoring and actions
– Choose partition keys so that hot data is distributed evenly over the nodes.
– Application logs that record the query, keys, and duration whenever an SLA is exceeded can reveal problems with specific keys.
– Instrument the application to rerun the query with CQL tracing enabled to see where the time was spent.
– OpsCenter table metrics can show which nodes contain hot spots.
– nodetool toppartitions also shows the hot partition keys on a node.
Performance Considerations
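The application-log approach to finding hot keys might look like the sketch below. The `QueryLog` record and the `hot_keys` helper are hypothetical; the only idea taken from the slide is logging query, key, and duration, then looking at which keys exceed the SLA most often.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class QueryLog(NamedTuple):
    query: str
    partition_key: str
    duration_ms: float


def hot_keys(logs: Iterable[QueryLog], sla_ms: float,
             top_n: int = 3) -> List[Tuple[str, int]]:
    """Partition keys ranked by how often their queries exceeded the SLA."""
    breaches = Counter(log.partition_key for log in logs if log.duration_ms > sla_ms)
    return breaches.most_common(top_n)


if __name__ == "__main__":
    logs = [
        QueryLog("SELECT ...", "cust-7", 9.5),
        QueryLog("SELECT ...", "cust-7", 12.0),
        QueryLog("SELECT ...", "cust-2", 1.1),
        QueryLog("SELECT ...", "cust-9", 4.2),
    ]
    print(hot_keys(logs, sla_ms=2.0))
```

Keys that surface here are candidates for a rerun with tracing enabled or for a partition key redesign.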
• Spark batch window optimization and max messages per partition
– Tune the batch duration so that no batch processing time is wasted.
– Define the max messages per partition when an executor spans multiple partitions. This prevents OOM exceptions and keeps the batch processing rate balanced.
– Dynamically change the max rate based on wasted batch processing time.
• DataStax driver settings
– Separate transaction data and search data into right-sized data centers.
– Read and write search data in the same DC.
– Use a local-data-center-aware strategy in conjunction with token awareness.
• In-memory tables and local caching
– Limit in-memory tables to small, constantly changing tables that are accessed very frequently.
– Consider local application caching for frequently accessed data.
Performance Considerations
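One way to adjust the max rate from wasted batch time is a simple proportional rule. The sketch below is an assumption, not the presenters' implementation: the function name, the 80% utilization target, and the clamping bounds are all hypothetical. In Spark Streaming's Kafka direct stream the cap being tuned would likely correspond to `spark.streaming.kafka.maxRatePerPartition`.

```python
def next_max_rate(current_rate: int, batch_interval_s: float,
                  processing_time_s: float, target_utilization: float = 0.8,
                  min_rate: int = 100, max_rate: int = 100_000) -> int:
    """Scale the per-partition rate so processing fills a target share of the batch window."""
    if processing_time_s <= 0:
        return current_rate  # no measurement yet, keep the current cap
    utilization = processing_time_s / batch_interval_s
    proposed = int(current_rate * target_utilization / utilization)
    # Clamp so one unusual batch cannot swing the rate to an extreme.
    return max(min_rate, min(max_rate, proposed))


if __name__ == "__main__":
    # Batch finished in half the window: idle time was wasted, so raise the cap.
    print(next_max_rate(1000, batch_interval_s=2.0, processing_time_s=1.0))
    # Batch overran the window: backlog is building, so lower the cap.
    print(next_max_rate(1000, batch_interval_s=2.0, processing_time_s=4.0))
```

A target below 100% leaves headroom for bursts, so a slightly slow batch does not immediately create backlog.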
• Latency & throughput monitoring
– The application, not the technology stack, should drive the data.
– Use Splunk or ELK to aggregate and correlate data across nodes.
• Correlating errors
– Use tools like Splunk or ELK.
– Build custom tools.
– For Cassandra: nodetool and data from OpsCenter.
– JMX from Kafka.
– Aggregated data in a metrics table.
• Use a Java profiler such as YourKit
– For Cassandra latency debugging.
– Java memory, CPU, and contention.
– Identify bottlenecks caused by specific methods/calls.
Application Operations
Access Patterns
High Speed, Never Stop
1. The pipeline should never stop or wait
2. No stopping to upgrade software or hardware
3. No time for rollback. Roll forward.
4. No delays that will disrupt the write pipeline or read
throughput.
5. No time for locks, slow reads, large reads, joins, or
read-modify-write.
6. All frequent operations are short.
• Cost prohibits work that is both frequent and unnecessary.
• No unnecessary frequently read data.
• No unnecessary frequently written data.
Affordable
No
• Long operations – Use the correct access patterns
• Client congestion
– Threads, sockets, heap, CPU, Memory, NUMA Cache
• Node congestion
– Threads, sockets, heap, CPU, Memory, NUMA Cache
– Storage channels
– Un-tuned or inconsistently tuned Cassandra nodes
• Network and NIC congestion
Pipeline Delays
If 2 ms is your target
• Think about how many requests a node can process
in that time window without congesting the client or
the node.
• Web and IoT traffic tends to be evenly distributed over time,
which avoids time-slot contention.
• What batch size can be processed in the time slot?
• Careful parallelization may be needed.
Physics of the SLA Time Slot
Hot Partitions
Physics of Partitions
Hot Batch or Traffic
• Correct table partition keys and access patterns
– Scale from 6 nodes to 1000’s
• Incorrect
– Does not scale by adding nodes
– Will not handle more load
Get the Partition Access Patterns Right
Physics of a single Partition
Microseconds   Operation
0.1            Read and write RAM
100            Write partition
100            Read partition from memory
2,000          Read: hash access to partition in memory and read SSD
20,000         Read: hash access to partition in memory and read spindle
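To connect these costs to the 2 ms target mentioned earlier, the sketch below counts how many operations of each kind fit back to back in one time slot. It assumes a single worker running operations sequentially with no queueing or network time, which is a simplification; the names are illustrative. Costs are held in nanoseconds so the arithmetic stays in integers.

```python
# Per-operation costs from the table above, converted to nanoseconds.
OPERATION_COST_NS = {
    "read/write RAM": 100,
    "write partition": 100_000,
    "read partition from memory": 100_000,
    "read partition from SSD": 2_000_000,
    "read partition from spindle": 20_000_000,
}


def ops_per_slot(slot_ns: int, cost_ns: int) -> int:
    """Whole operations one sequential worker can finish inside the time slot."""
    return slot_ns // cost_ns


if __name__ == "__main__":
    slot_ns = 2_000_000  # the 2 ms target
    for name, cost in OPERATION_COST_NS.items():
        print(f"{name}: {ops_per_slot(slot_ns, cost)}")
```

Under these assumptions a single SSD-backed read uses the entire 2 ms slot and a spindle read cannot fit at all, which is why partitions that must be read within the SLA need to be served from memory or read in parallel.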
• Avoid
– Lists (collection)
– Read-modify-write Updates
– Counters
– GUID-only identification of real-world objects or actions
• Allows client retry (roll-forward)
• Allows pipelining of updates without waits
Idempotency
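Why the avoided operations break client retry can be shown with a small simulation. This is a plain-Python sketch, not Cassandra code: the `state` dictionary stands in for a row, and each function stands in for one kind of write that a client might send twice after a timeout.

```python
from typing import Callable, Dict


def apply_with_retry(operation: Callable[[Dict], None], state: Dict,
                     retries: int = 1) -> Dict:
    """Apply the write once, then once more per simulated client retry."""
    for _ in range(1 + retries):
        operation(state)
    return state


def append_to_list(state: Dict) -> None:      # not idempotent: duplicates on retry
    state["items_list"].append("order-42")


def increment_counter(state: Dict) -> None:   # not idempotent: double counts on retry
    state["count"] += 1


def add_to_set(state: Dict) -> None:          # idempotent: same element, same set
    state["items_set"].add("order-42")


def overwrite_value(state: Dict) -> None:     # idempotent: same value written again
    state["status"] = "shipped"


if __name__ == "__main__":
    state = {"items_list": [], "count": 0, "items_set": set(), "status": "new"}
    for op in (append_to_list, increment_counter, add_to_set, overwrite_value):
        apply_with_retry(op, state)
    print(state)
```

After one retry each, the list holds a duplicate and the counter reads 2, while the set and the overwritten value are unchanged by the second attempt.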
• Replace read-modify-write operations
– Counters
– Updated aggregates
– Lists (collection)
• With
– Incremental data values that are aggregated in
micro-batches
– Cassandra 3.0 Aggregates
– Sets (collection)
Replace Read-Modify-Write
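Replacing a counter with increment rows can be sketched as follows. This is an illustration under assumptions: the dictionary `store` stands in for a table keyed by event id, and `record_increment` and `aggregate` are hypothetical names. Each increment is written as its own row, so a retried write overwrites the same row, and totals are computed later by the micro-batch.

```python
from collections import defaultdict
from typing import Dict, Tuple

IncrementStore = Dict[str, Tuple[str, int]]  # event_id -> (account, delta)


def record_increment(store: IncrementStore, event_id: str,
                     account: str, delta: int) -> None:
    """Idempotent upsert: writing the same event twice leaves one row."""
    store[event_id] = (account, delta)


def aggregate(store: IncrementStore) -> Dict[str, int]:
    """Micro-batch step: sum the stored increments per account."""
    totals: Dict[str, int] = defaultdict(int)
    for account, delta in store.values():
        totals[account] += delta
    return dict(totals)


if __name__ == "__main__":
    store: IncrementStore = {}
    record_increment(store, "evt-1", "acct-1", 5)
    record_increment(store, "evt-2", "acct-1", 3)
    record_increment(store, "evt-2", "acct-1", 3)  # client retry of the same event
    record_increment(store, "evt-3", "acct-2", 7)
    print(aggregate(store))
```

The write path never reads before writing, so it can be pipelined and retried freely; only the aggregation step reads, and it does so in bulk.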
• Reads must wait
• API Reads are 25-50x slower than writes
• Reads consume 5x the resource bandwidth of a write
• Disk is far cheaper than RAM, CPU, and Rack
• So
– Design writes for reads
– Denormalize, the same as for relational databases
• Multiple materialized views and temp tables
• Summary tables
Denormalize
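"Design writes for reads" can be sketched as writing each record to one table per read pattern. The class below is a plain-Python stand-in with hypothetical table and field names; each dictionary plays the role of a query table whose key is the partition key for that read.

```python
from collections import defaultdict
from typing import Dict


class DenormalizedStore:
    """One write fans out to a table per read pattern (in-memory stand-in)."""

    def __init__(self) -> None:
        # partition key -> {order_id -> row}; keying rows by order_id keeps
        # retried writes idempotent.
        self.orders_by_customer: Dict[str, Dict[str, dict]] = defaultdict(dict)
        self.orders_by_day: Dict[str, Dict[str, dict]] = defaultdict(dict)

    def write_order(self, order_id: str, customer: str, day: str, amount: int) -> None:
        row = {"order_id": order_id, "customer": customer, "day": day, "amount": amount}
        self.orders_by_customer[customer][order_id] = row
        self.orders_by_day[day][order_id] = row

    def read_by_customer(self, customer: str) -> list:
        """Single-partition read: no join, no scan."""
        return list(self.orders_by_customer[customer].values())

    def read_by_day(self, day: str) -> list:
        return list(self.orders_by_day[day].values())


if __name__ == "__main__":
    store = DenormalizedStore()
    store.write_order("o-1", "cust-1", "2016-09-07", 40)
    store.write_order("o-2", "cust-1", "2016-09-08", 15)
    store.write_order("o-1", "cust-1", "2016-09-07", 40)  # retry, no duplicate
    print(len(store.read_by_customer("cust-1")), len(store.read_by_day("2016-09-07")))
```

The extra copies cost disk, which the slide argues is far cheaper than the RAM, CPU, and rack space that slower reads would consume.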
Nesting Rows in the Partitions – 1 of 3
Write nested data to further reduce the read to 1 partition
Nesting Rows in the Partitions – 2 of 3
Cassandra allows 3 levels to be nested in a single partition
Nesting Rows in the Partitions – 3 of 3
Summary / Takeaways
• Treat the data pipeline as a value chain and accelerate movement
using a fit-for-purpose big data stack.
• Design your apps to drive latency/throughput visibility.
• Micro-batch in every layer possible to get high throughput.
• Enrich data in Kafka using Spark/Spark Streaming as the processing
engine.
• Cache frequently accessed data closer to the code to get the best
throughput.
• Focus on the data model and access patterns.
• Review the distinct features of the big data technology platform for
data acceleration (Accenture Approach white paper).
Summary / Take Away

Contenu connexe

Tendances

Processing 50,000 events per second with Cassandra and Spark
Processing 50,000 events per second with Cassandra and SparkProcessing 50,000 events per second with Cassandra and Spark
Processing 50,000 events per second with Cassandra and SparkBen Slater
 
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...DataStax
 
Everyday I’m scaling... Cassandra
Everyday I’m scaling... CassandraEveryday I’m scaling... Cassandra
Everyday I’m scaling... CassandraInstaclustr
 
An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian...
An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian...An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian...
An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian...DataStax
 
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...DataStax
 
Deploying, Backups, and Restore w Datastax + Azure at Albertsons/Safeway (Gur...
Deploying, Backups, and Restore w Datastax + Azure at Albertsons/Safeway (Gur...Deploying, Backups, and Restore w Datastax + Azure at Albertsons/Safeway (Gur...
Deploying, Backups, and Restore w Datastax + Azure at Albertsons/Safeway (Gur...DataStax
 
Cassandra Tuning - above and beyond
Cassandra Tuning - above and beyondCassandra Tuning - above and beyond
Cassandra Tuning - above and beyondMatija Gobec
 
C* Keys: Partitioning, Clustering, & CrossFit (Adam Hutson, DataScale) | Cass...
C* Keys: Partitioning, Clustering, & CrossFit (Adam Hutson, DataScale) | Cass...C* Keys: Partitioning, Clustering, & CrossFit (Adam Hutson, DataScale) | Cass...
C* Keys: Partitioning, Clustering, & CrossFit (Adam Hutson, DataScale) | Cass...DataStax
 
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...DataStax
 
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...DataStax
 
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...DataStax
 
Cassandra CLuster Management by Japan Cassandra Community
Cassandra CLuster Management by Japan Cassandra CommunityCassandra CLuster Management by Japan Cassandra Community
Cassandra CLuster Management by Japan Cassandra CommunityHiromitsu Komatsu
 
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016DataStax
 
Scalable Data Modeling by Example (Carlos Alonso, Job and Talent) | Cassandra...
Scalable Data Modeling by Example (Carlos Alonso, Job and Talent) | Cassandra...Scalable Data Modeling by Example (Carlos Alonso, Job and Talent) | Cassandra...
Scalable Data Modeling by Example (Carlos Alonso, Job and Talent) | Cassandra...DataStax
 
Load testing Cassandra applications
Load testing Cassandra applicationsLoad testing Cassandra applications
Load testing Cassandra applicationsBen Slater
 
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016DataStax
 
Large partition in Cassandra
Large partition in CassandraLarge partition in Cassandra
Large partition in CassandraShogo Hoshii
 
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...DataStax
 
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetupDataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetupVictor Coustenoble
 
Monitoring Cassandra at Scale (Jason Cacciatore, Netflix) | C* Summit 2016
Monitoring Cassandra at Scale (Jason Cacciatore, Netflix) | C* Summit 2016Monitoring Cassandra at Scale (Jason Cacciatore, Netflix) | C* Summit 2016
Monitoring Cassandra at Scale (Jason Cacciatore, Netflix) | C* Summit 2016DataStax
 

Tendances (20)

Processing 50,000 events per second with Cassandra and Spark
Processing 50,000 events per second with Cassandra and SparkProcessing 50,000 events per second with Cassandra and Spark
Processing 50,000 events per second with Cassandra and Spark
 
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
 
Everyday I’m scaling... Cassandra
Everyday I’m scaling... CassandraEveryday I’m scaling... Cassandra
Everyday I’m scaling... Cassandra
 
An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian...
An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian...An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian...
An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian...
 
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
 
Deploying, Backups, and Restore w Datastax + Azure at Albertsons/Safeway (Gur...
Deploying, Backups, and Restore w Datastax + Azure at Albertsons/Safeway (Gur...Deploying, Backups, and Restore w Datastax + Azure at Albertsons/Safeway (Gur...
Deploying, Backups, and Restore w Datastax + Azure at Albertsons/Safeway (Gur...
 
Cassandra Tuning - above and beyond
Cassandra Tuning - above and beyondCassandra Tuning - above and beyond
Cassandra Tuning - above and beyond
 
C* Keys: Partitioning, Clustering, & CrossFit (Adam Hutson, DataScale) | Cass...
C* Keys: Partitioning, Clustering, & CrossFit (Adam Hutson, DataScale) | Cass...C* Keys: Partitioning, Clustering, & CrossFit (Adam Hutson, DataScale) | Cass...
C* Keys: Partitioning, Clustering, & CrossFit (Adam Hutson, DataScale) | Cass...
 
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
 
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
 
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
 
Cassandra CLuster Management by Japan Cassandra Community
Cassandra CLuster Management by Japan Cassandra CommunityCassandra CLuster Management by Japan Cassandra Community
Cassandra CLuster Management by Japan Cassandra Community
 
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
 
Scalable Data Modeling by Example (Carlos Alonso, Job and Talent) | Cassandra...
Scalable Data Modeling by Example (Carlos Alonso, Job and Talent) | Cassandra...Scalable Data Modeling by Example (Carlos Alonso, Job and Talent) | Cassandra...
Scalable Data Modeling by Example (Carlos Alonso, Job and Talent) | Cassandra...
 
Load testing Cassandra applications
Load testing Cassandra applicationsLoad testing Cassandra applications
Load testing Cassandra applications
 
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
 
Large partition in Cassandra
Large partition in CassandraLarge partition in Cassandra
Large partition in Cassandra
 
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...
 
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetupDataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
 
Monitoring Cassandra at Scale (Jason Cacciatore, Netflix) | C* Summit 2016
Monitoring Cassandra at Scale (Jason Cacciatore, Netflix) | C* Summit 2016Monitoring Cassandra at Scale (Jason Cacciatore, Netflix) | C* Summit 2016
Monitoring Cassandra at Scale (Jason Cacciatore, Netflix) | C* Summit 2016
 

En vedette

A look at the CQL changes in 3.x (Benjamin Lerer, Datastax) | Cassandra Summi...
A look at the CQL changes in 3.x (Benjamin Lerer, Datastax) | Cassandra Summi...A look at the CQL changes in 3.x (Benjamin Lerer, Datastax) | Cassandra Summi...
A look at the CQL changes in 3.x (Benjamin Lerer, Datastax) | Cassandra Summi...DataStax
 
Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...
Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...
Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...DataStax
 
Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...
Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...
Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...DataStax
 
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...DataStax
 
Optimizing Cassandra in AWS
Optimizing Cassandra in AWSOptimizing Cassandra in AWS
Optimizing Cassandra in AWSgreggulrich
 
KillrVideo: Data Modeling Evolved (Patrick McFadin, Datastax) | Cassandra Sum...
KillrVideo: Data Modeling Evolved (Patrick McFadin, Datastax) | Cassandra Sum...KillrVideo: Data Modeling Evolved (Patrick McFadin, Datastax) | Cassandra Sum...
KillrVideo: Data Modeling Evolved (Patrick McFadin, Datastax) | Cassandra Sum...DataStax
 
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...DataStax
 
Light Weight Transactions Under Stress (Christopher Batey, The Last Pickle) ...
Light Weight Transactions Under Stress  (Christopher Batey, The Last Pickle) ...Light Weight Transactions Under Stress  (Christopher Batey, The Last Pickle) ...
Light Weight Transactions Under Stress (Christopher Batey, The Last Pickle) ...DataStax
 
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...DataStax
 

En vedette (9)

A look at the CQL changes in 3.x (Benjamin Lerer, Datastax) | Cassandra Summi...
A look at the CQL changes in 3.x (Benjamin Lerer, Datastax) | Cassandra Summi...A look at the CQL changes in 3.x (Benjamin Lerer, Datastax) | Cassandra Summi...
A look at the CQL changes in 3.x (Benjamin Lerer, Datastax) | Cassandra Summi...
 
Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...
Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...
Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...
 
Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...
Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...
Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...
 
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
 
Optimizing Cassandra in AWS
Optimizing Cassandra in AWSOptimizing Cassandra in AWS
Optimizing Cassandra in AWS
 
KillrVideo: Data Modeling Evolved (Patrick McFadin, Datastax) | Cassandra Sum...
KillrVideo: Data Modeling Evolved (Patrick McFadin, Datastax) | Cassandra Sum...KillrVideo: Data Modeling Evolved (Patrick McFadin, Datastax) | Cassandra Sum...
KillrVideo: Data Modeling Evolved (Patrick McFadin, Datastax) | Cassandra Sum...
 
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...
 
Light Weight Transactions Under Stress (Christopher Batey, The Last Pickle) ...
Light Weight Transactions Under Stress  (Christopher Batey, The Last Pickle) ...Light Weight Transactions Under Stress  (Christopher Batey, The Last Pickle) ...
Light Weight Transactions Under Stress (Christopher Batey, The Last Pickle) ...
 
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
 

Similaire à Designing & Optimizing Micro Batching Systems Using 100+ Nodes (Ananth Ram, Rumeel Kazi, Accenture / Rich Rein, DataStax) | C* Summit 2016

Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform WebinarCloudera, Inc.
 
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
 Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov... Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...Databricks
 
The Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache SparkThe Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache SparkCloudera, Inc.
 
Big Data_Architecture.pptx
Big Data_Architecture.pptxBig Data_Architecture.pptx
Big Data_Architecture.pptxbetalab
 
Designing & Optimizing Micro Batching Systems Using 100+ Nodes (Ananth Ram, Rumeel Kazi, Accenture / Rich Rein, DataStax) | C* Summit 2016

  • 1. Designing & Optimizing micro batching systems with Cassandra, Spark and Kafka Accenture
  • 2. Copyright © 2014 Accenture. All rights reserved. Designing & Optimizing micro batching systems with Cassandra, Spark and Kafka
  • 3. Speaker Details and Contact
      • Ananth Ram, Big Data & Oracle Solution Architect, Accenture (Accenture Enkitec Group), Ananth.ram@Accenture.com
      • Rumeel Kazi, Big Data Solution Architect, Accenture (Accenture Federal), rumeel.k.kazi@accenturefederal.com
      • Rich Rein, Solution Architect, DataStax, Rich.rein@datastax.com
  • 4. Agenda
      • Data Acceleration and Micro Batching
      • Big Data Architecture
          – Technical Architecture
          – Application Architecture
          – Data Supply Chain Approach & Framework
      • Application Design & Operations
          – Design Considerations
          – Data Flow
          – Optimizations and Operations
      • Application Access Patterns
          – The Problems and Physics
          – Idempotency
          – Partition per Read
      • Takeaways
  • 5. Data Acceleration & Micro Batch
      • Data as Value Chain
      • Data Acceleration
          – Movement
          – Processing
          – Insights
      • High throughput with Micro Batch
  • 6. Big Data Architecture
  • 7. Technical Architecture – Sample (diagram: IV Hardware Architecture)
      • Big data platform: Cassandra, Spark, Solr, Hadoop, Kafka
      • Big Data Interfaces: NAS, Clustered MQ, Files, External Databases
      • Oracle 12c: Prod (A), Prod (B), Prod (C), Prod (D)
      • Four hardware blocks, each labeled 12 Blades, 288 Cores, 6 TB RAM
      • 4 x 10G RAC Interconnect
  • 8. Technical Architecture – Additional Details
      • Diagram labels: Data Enrichment; 44 nodes; RAC; 4 nodes; Data Ingest; 16 nodes; 23 nodes; 112 nodes; Interfaces; 12 nodes; Java
      • Separate datacenters for Cassandra and Solr.
      • Spark runs on the same nodes as Cassandra for data locality.
      • Kafka, Java Spring Batch and Spark Streaming are used to enrich billions of records a day.
  • 9. Application Architecture
      • Data is enriched using Java Spring Batch and Spark Streaming, with Kafka as a temporary staging area.
      • Cassandra is used for faster lookups, summary views and persistent storage.
      • Diagram labels: Data Ingestion & Business Rules; Application Cache; External System Interfaces; TXN DATA (MQ, FILES, DB LINK); OPERATIONAL EVENTS (MQ); REFERENCE DATA (MQ, FILES); Java Spring Batch; Spark Streaming & Kafka; Enrichment Process 1; Enrichment Process 2; DataStore; Enriched Data; Aggregated Views; Reference Data; Events Data; IN-MEMORY TABLES; Reporting; WEB PORTAL (CANNED REPORTS) & PUSH ALERTS; AD-HOC QUERIES; Cassandra, Solr; HDFS, Hive, Spark, SparkR
  • 10. Cassandra, Spark and Kafka Metrics
      • Cassandra
          – 400K reads/writes per second.
          – 1 ms to 3 ms read latency, 0.2 ms to 0.3 ms write latency.
      • Spark
          – Spark Streaming processes 200K events/sec.
          – Spark Streaming runs on the same host as Cassandra for data locality.
      • Kafka
          – 800K messages/sec in total, processed through 30 brokers.
          – Per-broker throughput is 30K messages/sec.
          – Snappy compression gives up to 5X throughput in benchmarks. Yet to be tested in our apps.
      • Java Apps
          – Java Spring Batch processes 400K records/sec using thousands of threads in the app server.
          – A 32 GB JVM with G1 garbage collection and an application cache gives this throughput.
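The figures on the metrics slide can be sanity-checked with a little arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the daily figure assumes the quoted Spark Streaming rate is sustained for a full 24 hours, which the slides do not state.

```python
# Figures quoted on the metrics slide (per second).
KAFKA_TOTAL_PER_SEC = 800_000   # total Kafka messages/sec
KAFKA_BROKERS = 30              # number of brokers
SPARK_EVENTS_PER_SEC = 200_000  # Spark Streaming events/sec

SECONDS_PER_DAY = 24 * 60 * 60


def per_broker_rate(total_per_sec: int, brokers: int) -> float:
    """Average messages/sec each broker handles if load is spread evenly."""
    return total_per_sec / brokers


def daily_volume(rate_per_sec: int) -> int:
    """Events per day, ASSUMING the rate is sustained around the clock."""
    return rate_per_sec * SECONDS_PER_DAY


if __name__ == "__main__":
    print(round(per_broker_rate(KAFKA_TOTAL_PER_SEC, KAFKA_BROKERS)))
    print(daily_volume(SPARK_EVENTS_PER_SEC))
```

An even spread works out to roughly 27K messages/sec per broker, in line with the quoted 30K per broker, and a sustained 200K events/sec is on the order of 17 billion events a day, consistent with "billions of records a day".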
  • 11. Big Data Architecture Approach. Source: Accenture-Data-Acceleration-Architecture-Modern-Data-Supply-Chain.pdf (Accenture Labs paper, Carl Dukatz)
  • 12. Big Data Architecture Design Considerations - Criteria Sample
  • 13. Big Data Design Considerations - Approach
  • 14. Big Data Design Considerations: Design Considerations & Use Cases
  • 15. Application Design and Operations
  • 16. High Level Design Pattern
  • 17. Data Processing Pipeline (diagram)
      • Three pipeline stages, labeled Stage 0, Stage 1, and Stage 1 on the slide; each performs partial data enrichment.
      • Each stage reads from its own topic on the Kafka cluster (Topic A, Topic B, Topic C), with partitions 0 to N.
      • Each stage runs on the DSE Cassandra / Spark cluster with executors 0 to N; every executor has its own cache.
  • 18. Application Design Considerations
      • Application metric collection / diagnostic logging
          – Include application-level operational metrics as part of the design. Collect Cassandra and Kafka processing metrics, including response times, at the object level.
          – Executors report throughput and backlog metrics for each application function to the driver, which keeps an aggregated point-in-time count for the process.
      • Kafka / Cassandra data partitioning strategies
          – Distribute partitioning keys evenly across the Cassandra nodes and Kafka brokers.
          – Where data is skewed toward certain entities that must be part of the partitioning key, add a time window to the key so the data does not pile up on a few nodes.
      • Spark executor configuration
          – Set the number of Spark executors to match the number of partitions on the topic. An executor can take more than one partition depending on the throughput/latency need; keep the count low for reduced latency.
      • Web / Solr interface considerations
          – For consistency requirements, write at consistency level ALL to the Solr datacenters; if that fails, write at LOCAL_QUORUM. Account for the additional sub-second overhead based on functional needs.
  • 19. Performance Considerations
      • Compaction strategies
          – Date-tiered vs. size-tiered compaction: size-tiered compaction on tables over 50 TB with high-velocity data showed high resource utilization; consider date-tiered compaction for time-series data.
      • "Hot spot" monitoring and actions
          – Choose partition keys so that hot data is distributed evenly over the nodes.
          – Log the query, keys, and duration whenever an SLA is exceeded, so that problems with specific keys become visible.
          – Instrument the application to rerun the query with CQL tracing enabled to see where the time was spent.
          – OpsCenter table metrics can show which nodes contain hot spots.
          – nodetool toppartitions also shows the hot partition keys on a node.
  • 20. Performance Considerations
      • Spark batch window optimization and max messages per partition
          – Tune the batch duration so that no batch processing time is wasted.
          – Define max messages per partition when an executor spans multiple partitions. This prevents OOM exceptions and keeps the batch processing rate balanced.
          – Dynamically change the max rate based on wasted batch processing time.
      • DataStax driver settings
          – Separate transaction data and searched data into right-sized datacenters.
          – Read and write search data in the same DC.
          – Use a local-datacenter-aware strategy in conjunction with token awareness.
      • In-memory tables and local caching
          – Limit in-memory tables to small, constantly changing tables that are accessed very frequently.
          – Consider local application caching for frequently accessed data.
  • 21. Application Operations
      • Latency and throughput monitoring
          – The application, not the technology stack, should drive the monitoring data.
          – Use Splunk or ELK to aggregate and correlate data across nodes.
      • Correlating errors
          – Use tools like Splunk or ELK.
          – Build custom tools on top of nodetool and OpsCenter data for Cassandra, JMX metrics from Kafka, and aggregated data in a metrics table.
      • Use a Java profiler such as YourKit
          – For debugging Cassandra latency.
          – For Java memory, CPU, and contention.
          – To identify bottlenecks caused by specific methods or calls.
  • 22. Access Patterns
  • 23. High Speed, Never Stop
      1. The pipeline should never stop or wait.
      2. No stopping to upgrade software or hardware.
      3. No time for rollback. Roll forward.
      4. No delays that will disrupt the write pipeline or read throughput.
      5. No time for locks, slow reads, large reads, joins, or read-modify-write.
      6. All frequent operations are short.
  • 24. Affordable
      • Cost prohibits anything that is both frequent and unnecessary.
      • No unnecessary frequently read data.
      • No unnecessary frequently written data.
  • 25. Pipeline Delays
      No:
      • Long operations
          – Use the correct access patterns.
      • Client congestion
          – Threads, sockets, heap, CPU, memory, NUMA cache.
      • Node congestion
          – Threads, sockets, heap, CPU, memory, NUMA cache.
          – Storage channels.
          – Untuned or inconsistently tuned Cassandra nodes.
      • Network and NIC congestion
  • 26. Physics of the SLA Time Slot
      If 2 ms is your target:
      • Think about how many requests a node can process in that time window without congesting the client or the node.
      • Web and IoT traffic tends to be evenly distributed over time, which avoids time-slot contention.
      • What batch size can be processed in the time slot?
      • Careful parallelization may be needed.
  • 27. Physics of Partitions (diagram; labels: Hot Partitions, Hot Batch or Traffic)
  • 28. Get the Partition Access Patterns Right
      • Correct table partition keys and access patterns
          – Scale from 6 nodes to thousands.
      • Incorrect
          – Does not scale by adding nodes.
          – Will not handle more load.
  • 29. Physics of a single Partition
      Microseconds   Operation
      0.1            Read and write RAM
      100            Write partition
      100            Read partition from memory
      2,000          Read: hash access to partition in memory, then read from SSD
      20,000         Read: hash access to partition in memory, then read from spindle
  • 30. Idempotency
      • Avoid
          – Lists (collection).
          – Read-modify-write updates.
          – Counters.
          – GUIDs, except to identify real-world objects or actions.
      • Allows client retry (roll-forward).
      • Allows pipelining of updates without waits.
  • 31. Replace Read-Modify-Write
      • Replace read-modify-write operations
          – Counters.
          – Updated aggregates.
          – Lists (collection).
      • With
          – Data increment values that get aggregated in micro-batches.
          – Cassandra 3.0 aggregates.
          – Sets (collection).
  • 32. Denormalize
      • Reads must wait.
      • API reads are 25-50x slower than writes.
      • A read consumes 5x the resource bandwidth of a write.
      • Disk is far cheaper than RAM, CPU, and rack space.
      • So
          – Design writes for reads.
          – Denormalize the same way as for relational databases: multiple materialized views and temp tables, and summary tables.
  • 33. Nesting Rows in the Partitions – 1 of 3
  • 34. Nesting Rows in the Partitions – 2 of 3
      • Write nested data to further reduce the read to 1 partition.
  • 35. Nesting Rows in the Partitions – 3 of 3
      • Cassandra allows 3 levels to be nested in a single partition.
  • 36. Summary / Takeaways
  • 37. Summary / Takeaways
      • Treat the data pipeline as a value chain and accelerate data movement using a fit-for-purpose Big Data stack.
      • Design your apps to provide latency and throughput visibility.
      • Micro-batch in every layer possible to get high throughput.
      • Enrich data in Kafka using Spark / Spark Streaming as the processing engine.
      • Cache frequently accessed data close to the code to get the best throughput.
      • Focus on the data model and access patterns.
      • Review the distinct features of the Big Data technology platform for data acceleration (Accenture approach white paper).
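
The partitioning advice on slide 18 (add a time window to the partitioning key when data is skewed toward a few entities) can be illustrated with a minimal Python sketch. The function name `bucketed_partition_key`, the one-hour default window, and the `hot-account` entity are assumptions made for illustration; they do not come from the deck, and the right window size depends on the workload.

```python
from datetime import datetime, timedelta, timezone


def bucketed_partition_key(entity_id, event_time, window_minutes=60):
    """Build a composite partition key of (entity_id, window start).

    event_time must be a timezone-aware datetime. All events for one
    entity inside the same window share a key; later windows get a new
    key, so a hot entity is spread over several partitions over time.
    """
    epoch_minutes = int(event_time.timestamp()) // 60
    # Round down to the start of the enclosing window.
    window_start = epoch_minutes - (epoch_minutes % window_minutes)
    bucket = datetime.fromtimestamp(window_start * 60, tz=timezone.utc)
    return (entity_id, bucket.strftime("%Y-%m-%dT%H:%M"))


# Hypothetical hot entity emitting one event every 30 minutes for 3 hours.
start = datetime(2015, 1, 1, 10, 15, tzinfo=timezone.utc)
events = [start + timedelta(minutes=30 * i) for i in range(6)]
keys = {bucketed_partition_key("hot-account", t) for t in events}
print(sorted(keys))
```

The trade-off is on the read side: a query for one entity over a long time range has to visit one partition per window, so the window should be chosen against the read patterns as well as the write skew.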

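Slides 30 and 31 recommend replacing counters and other read-modify-write updates with increment values that are aggregated in micro-batches. The sketch below models that idea in plain Python, with a dict standing in for a table whose primary key is (counter key, event id). The names `write_increment` and `aggregate` and the sample ids are hypothetical, and this is not Cassandra driver code.

```python
def write_increment(table, counter_key, event_id, delta):
    """Record one increment as its own row.

    The row is keyed by (counter_key, event_id), so writing it again on
    a client retry overwrites the same row instead of adding to a total.
    """
    table[(counter_key, event_id)] = delta


def aggregate(table):
    """Sum the increment rows per counter key, as a micro-batch job would."""
    totals = {}
    for (counter_key, _event_id), delta in table.items():
        totals[counter_key] = totals.get(counter_key, 0) + delta
    return totals


table = {}
write_increment(table, "acct-1", "evt-a", 5)
write_increment(table, "acct-1", "evt-b", 2)
write_increment(table, "acct-1", "evt-a", 5)  # retry of evt-a: no double count
totals = aggregate(table)
print(totals)
```

Because each write is a blind upsert, the writer never has to read the current value first, which is what allows retries and pipelined updates without waits.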
Editor's notes

  1. Based on Accenture Lab Research Paper: http://www.accenture.com/SiteCollectionDocuments/PDF/Accenture-Data-Acceleration-Architecture-Modern-Data-Supply-Chain.pdf