© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS re:INVENT
ProtectWise optimizes performance of
Cassandra and Kafka workloads with
Amazon EBS
Gene Stevens, CTO & Co-Founder, ProtectWise
Robert Tarrall, Director of DevOps, ProtectWise
Andrey Zaychikov, Sr. Solutions Architect, AWS
STG329
November 28, 2017
AGENDA
• Intro to NoSQL on AWS
• Intro to Apache Cassandra and Apache Kafka on AWS
• Best practices for Cassandra and Kafka deployments on AWS
• ProtectWise Use Case
• Use Case & Optimizations for Kafka
• Use Case & Optimizations for Cassandra
• Use Case & Optimizations for Amazon S3
• Lessons Learned
NoSQL as a technology
Database per Workload
Pentaho
Talend
Vertica
Aerospike
Cassandra
MongoDB
Database options on AWS
Complete set of data building blocks
• Storage: Amazon EFS, Amazon EBS, Amazon S3, Amazon Glacier
• Data movement (online and offline): AWS Snow family, AWS Storage Gateway family, AWS Direct Connect, Amazon EFS File Sync, Amazon S3 Transfer Acceleration, storage partners, Amazon Kinesis Data Streams, Amazon Kinesis Video Streams
• Data security and management: AWS KMS, AWS IAM, Amazon CloudWatch, AWS CloudTrail, AWS CloudFormation, AWS Lambda, Amazon Macie, Amazon QuickSight
What is Cassandra and why use it?
• Apache Cassandra is an open-source database based on the Dynamo model
• It is a massively scalable, geo-distributed, high-performance key-value database
• Common use cases for Cassandra include:
• Time-series data
• Social media
• User sessions (shopping carts, etc.)
How Cassandra works on a cluster level
How Cassandra works on a node level
Application interactions with Cassandra
Best practices for Cassandra on AWS
What is Apache Kafka and why use it?
• Apache Kafka is an open-source distributed
streaming platform
• It allows you to:
• Publish and subscribe to streams of records
• Store streams in a fault-tolerant way
• Process streams of records
• Most common use cases for Kafka are:
• Build data pipelines to capture and transfer
data between systems & applications
• Build real-time apps which react to the
streams of data
• Kafka is often used to capture fast-arriving data and buffer it
in front of a database such as Cassandra. Such a setup can
reduce the pressure on the database.
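The buffering role described above can be illustrated with a toy, in-memory model of a log. This is only a sketch of the idea, not Kafka's API: the `MiniLog` class, its method names, and the group names are hypothetical.

```python
from collections import defaultdict


class MiniLog:
    """Toy append-only log: producers append, consumer groups read at their own pace."""

    def __init__(self):
        self.records = []                   # the retained stream of records
        self.offsets = defaultdict(int)     # next offset to read, per consumer group

    def publish(self, record):
        self.records.append(record)
        return len(self.records) - 1        # offset assigned to the new record

    def poll(self, group, max_records=100):
        start = self.offsets[group]
        batch = self.records[start:start + max_records]
        self.offsets[group] = start + len(batch)
        return batch

    def lag(self, group):
        """Records published but not yet read by this group."""
        return len(self.records) - self.offsets[group]


log = MiniLog()
for i in range(1000):                       # a burst arrives faster than the database absorbs it
    log.publish({"id": i})

first_batch = log.poll("db-writer", max_records=100)   # the writer drains at its own rate
backlog = log.lag("db-writer")                          # the rest stays buffered in the log
other = log.poll("profiling", max_records=10)           # a second group reads independently
```

The point of the sketch is that the burst is absorbed by the log, while the database writer keeps consuming at whatever rate it can sustain.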
How Kafka works
Best practices for Kafka on AWS
Choosing the proper instance and storage types
Always consider the database implementation, data schema, and access patterns.
Compute and storage types should be adapted to the particular
situation and can change during the database's lifetime.
Cassandra Kafka
CASE STUDY: PROTECTWISE
GENE STEVENS, CTO & CO-FOUNDER, PROTECTWISE
ROBERT TARRALL, DIRECTOR OF DEVOPS, PROTECTWISE
PROTECTWISE OVERVIEW
GOALS
• Very low end-to-end latency (~1 second)
• Very high availability
• Over 1 billion writes per hour
• High tolerance for bursts (10x-100x
normal volume)
• Trillions of records per year
• Less than 10-second response time to
searches
• Arbitrary queries: “all non-HTTP traffic on
port 80”
INITIAL ARCHITECTURE
SOLUTION: Amazon S3, Kafka, Amazon EBS
KAFKA
Our Kafka clusters:
• 1,000 topics
• Up to 200 partitions per topic
• 45 c4.2xlarge
• 2x 1 TB gp2 EBS volumes
• Peak consumption > 100
MB/sec/server
KAFKA: WINS
• Retention: 24 hours of “buffer”
• Pub/sub with “at least once”
guarantee
• Fanout means we can test in
production:
• New engines publish to
“profiling” topic, confirm useful
detections
• Significant code changes can be
performance-tested
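Because delivery is "at least once", a consumer may see the same record again after a restart or rebalance. One common way to tolerate this is to make the consumer idempotent. The following is a minimal sketch under assumptions: each record carries a unique `id` field, and the `apply_once` helper, the in-memory `seen_ids` set, and the list used as a sink are all hypothetical stand-ins for real storage.

```python
def apply_once(records, seen_ids, sink):
    """Apply each record to the sink at most once, keyed by a unique record id."""
    applied = 0
    for record in records:
        record_id = record["id"]
        if record_id in seen_ids:
            continue                    # duplicate delivery: skip it
        sink.append(record)
        seen_ids.add(record_id)
        applied += 1
    return applied


seen, sink = set(), []
first = apply_once([{"id": 1}, {"id": 2}], seen, sink)
# Redelivery after a consumer restart: record 2 arrives again, followed by record 3.
second = apply_once([{"id": 2}, {"id": 3}], seen, sink)
```

In a real system the set of seen ids would need to be bounded or persisted; writes that are naturally idempotent (for example, upserts keyed by record id) avoid the bookkeeping entirely.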
KAFKA: NOTES
• Partition is your fundamental unit of
scaling – use lots of partitions
• Use round robin partition
assignment, not range
• Be sure to test “edge” cases:
recovery times, backlog
• As broker recovers each
partition, consumers
rebalance
• “At least once” = “sometimes more
than once”
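The advice to prefer round-robin over range assignment can be seen in a simplified model of the two strategies. This sketch is not Kafka's implementation: it assumes every consumer subscribes to every topic, and the function names, topic names, and consumer names are hypothetical.

```python
def range_assign(topics, consumers):
    """Per-topic range assignment: each topic's partitions are split into
    contiguous ranges, and the first consumers absorb any remainder."""
    members = sorted(consumers)
    result = {c: [] for c in members}
    for topic, n_partitions in sorted(topics.items()):
        base, extra = divmod(n_partitions, len(members))
        start = 0
        for i, consumer in enumerate(members):
            count = base + (1 if i < extra else 0)
            result[consumer].extend((topic, p) for p in range(start, start + count))
            start += count
    return result


def round_robin_assign(topics, consumers):
    """Lay out all topic-partitions in one list and deal them out in turn."""
    members = sorted(consumers)
    result = {c: [] for c in members}
    all_partitions = [(t, p) for t, n in sorted(topics.items()) for p in range(n)]
    for i, topic_partition in enumerate(all_partitions):
        result[members[i % len(members)]].append(topic_partition)
    return result


topics = {f"topic-{i}": 2 for i in range(6)}    # six topics with 2 partitions each
consumers = ["c1", "c2", "c3", "c4"]

range_load = {c: len(tps) for c, tps in range_assign(topics, consumers).items()}
rr_load = {c: len(tps) for c, tps in round_robin_assign(topics, consumers).items()}
```

In this model, range assignment gives all twelve partitions to the first two consumers and leaves the other two idle, because the remainder of every topic lands on the same consumers. Round-robin spreads the same partitions evenly.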
KAFKA: WARNINGS/CAVEATS
• Beware cross-AZ replication costs!
• Kafka has only limited “rack
awareness”
• Producers and consumers talk to
the “leader” of a partition
• With RF=2, data may cross AZs 3
(or more) times
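The "3 (or more) times" figure can be reproduced with a simple worst-case count. This is a sketch under assumptions that are not stated on the slide: replicas of a partition sit in different AZs, and the producer and each consuming group sit in a different AZ from the partition leader unless flagged otherwise. The function and parameter names are hypothetical.

```python
def cross_az_crossings(replication_factor, consumer_groups,
                       producer_in_leader_az=False, consumers_in_leader_az=False):
    """Worst-case number of times one record crosses an AZ boundary."""
    crossings = 0 if producer_in_leader_az else 1      # producer -> partition leader
    crossings += replication_factor - 1                # leader -> each follower replica
    if not consumers_in_leader_az:
        crossings += consumer_groups                   # leader -> each consuming group
    return crossings


single_consumer = cross_az_crossings(replication_factor=2, consumer_groups=1)
with_fanout = cross_az_crossings(replication_factor=2, consumer_groups=3)
```

With RF=2 and one consuming group the count is three; each additional group that reads the same topic adds another crossing.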
KAFKA – Cross-AZ Traffic
KAFKA: WARNINGS/CAVEATS
• Monthly costs in perspective:
• Instances: $8,000
• gp2 EBS volumes: $8,000
• Network traffic: $40,000
KAFKA: WARNINGS/CAVEATS
• Single broker failure impacts the
whole cluster
• “Let’s bump that timeout” often
has unexpected consequences
• Mostly trust default settings (but
use round-robin!)
DATABASE
• Sustain 250K writes/sec (bursts > 1 million/sec)
• 1 year of data
• Support arbitrary queries
• < 10-second response time for search
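As a quick sanity check, the sustained write rate above can be converted into hourly and yearly volumes. This is plain arithmetic on the figures from the slide; the variable names are only for illustration.

```python
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_YEAR = 365 * 24 * SECONDS_PER_HOUR

sustained_writes_per_sec = 250_000
burst_writes_per_sec = 1_000_000

writes_per_hour = sustained_writes_per_sec * SECONDS_PER_HOUR    # 900 million
writes_per_year = sustained_writes_per_sec * SECONDS_PER_YEAR    # about 7.9 trillion
burst_ratio = burst_writes_per_sec / sustained_writes_per_sec    # bursts are at least 4x sustained
```

At 250K writes/sec the system takes in roughly 0.9 billion writes per hour and about 7.9 trillion per year, which is the same order of magnitude as the "trillions of records per year" goal stated earlier.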
DATABASE (v1)
• We use DataStax Enterprise
Search
• Supports over 1 TB of data +
index on an i2.2xlarge
• Handled the load, but one
month of data = 100x
i2.2xlarge
• We keep a year of data…
• Sharded by time, migrated
older data to r4.2xlarge with
gp2 EBS volumes
DATABASE (v1) – Lessons Learned
• Use DSE 5.0 or later – much better indexing throughput
• Don’t use vnodes
• Do use large heap (20-30 GB is fine) and G1GC
• Beware outdated blog posts! (Amazon EBS has come a LONG
way)
• High write throughput + search leads to high operational burden
• Use Amazon EBS if at all possible
Amazon EBS!
Migrating data to Amazon EBS taught us that
we really want to use Amazon EBS:
• Instances without ephemeral storage fail less
frequently
• Amazon EBS volumes do fail, but very rarely
• Decoupling state from compute is a huge
win:
• Need more CPU in your Cassandra
cluster? Stop one AZ, change instance
type, start; repeat for all AZs
• Modify Amazon EBS volume to expand
storage
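The resize procedure above could be scripted roughly as follows. This is a sketch, not ProtectWise's tooling: it assumes `ec2` is a boto3 EC2 client (for example `boto3.client("ec2")`), that the instances are EBS-backed, and that Cassandra has already been drained and stopped on each node. The helper names are hypothetical.

```python
def resize_instances(ec2, instance_ids, new_type):
    """Stop, retype, and restart a group of EBS-backed instances (e.g. one AZ)."""
    ec2.stop_instances(InstanceIds=instance_ids)
    ec2.get_waiter("instance_stopped").wait(InstanceIds=instance_ids)
    for instance_id in instance_ids:
        # The instance type can only be changed while the instance is stopped.
        ec2.modify_instance_attribute(
            InstanceId=instance_id, InstanceType={"Value": new_type}
        )
    ec2.start_instances(InstanceIds=instance_ids)
    ec2.get_waiter("instance_running").wait(InstanceIds=instance_ids)


def grow_volume(ec2, volume_id, new_size_gib):
    """Request a larger size for an EBS volume.
    The filesystem on the volume still has to be grown separately."""
    return ec2.modify_volume(VolumeId=volume_id, Size=new_size_gib)
```

Calling `resize_instances` once per AZ, and waiting for the cluster to become healthy between calls, mirrors the "repeat for all AZs" step. The data stays on the EBS volumes throughout, which is what makes the in-place change possible.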
Amazon S3 for Full-text Search
Today’s architecture (write path):
• Data written to Cassandra with
a TTL
• Once final (a few hours), a
Spark job on Amazon EMR:
• Reads data from C*
• Writes Parquet files to
Amazon S3
• Writes Bloom filters to
Solr
Amazon S3 for Full-text Search
Read path:
• Very recent data
answered from C*/Solr
• Bloom filters tell us
which parquet files to
lift from Amazon S3 for
older data
• Spark on Amazon EMR
reads the Parquet files –
highly parallel
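The role of the Bloom filters in the read path can be sketched as follows: one small filter per Parquet file, consulted before any file is fetched. This is an illustrative, in-memory version, not the production design (the slides store the filters in Solr). The class, the filter sizes, the file names, and the indexed values are hypothetical.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, a small chance of false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0                       # a Python int used as a bit array

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def candidate_files(filters, term):
    """Return only the files whose filter says the term may be present."""
    return [name for name, bloom in filters.items() if bloom.might_contain(term)]


# Hypothetical file contents used to build one filter per file.
contents = {
    "part-0001.parquet": ["10.0.0.1", "10.0.0.2"],
    "part-0002.parquet": ["10.0.0.3"],
}
filters = {}
for name, values in contents.items():
    bloom = BloomFilter()
    for value in values:
        bloom.add(value)
    filters[name] = bloom

hits = candidate_files(filters, "10.0.0.3")   # files worth reading from S3
```

A file that contains the term is always returned. A file that does not contain it is usually skipped, so only a small subset of objects has to be read and scanned.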
LESSONS LEARNED - KAFKA
Overall, Kafka is a very valuable part of our platform and works great
on Amazon EBS.
If expecting massive scale, keep the following in mind:
• Cross-AZ replication cost adds up. 2 GB/sec for 1 month is 5
petabytes.
• A single broker can cause availability problems (not data loss) for
the whole cluster.
• Small clusters are very easy to operate; larger clusters have more
issues and higher mean time to recovery.
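The "2 GB/sec for 1 month is 5 petabytes" figure follows from simple arithmetic. The sketch below assumes a 30-day month and decimal units (1 PB = 1,000,000 GB); the function name is hypothetical.

```python
def monthly_transfer_pb(gb_per_sec, days=30):
    """Total data moved in a month at a constant rate, in decimal petabytes."""
    seconds = days * 24 * 60 * 60
    total_gb = gb_per_sec * seconds
    return total_gb / 1_000_000


volume_pb = monthly_transfer_pb(2)   # a bit over 5 PB
```

Since cross-AZ traffic is billed per gigabyte, a steady rate of this size is what turns network transfer into the largest line item.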
LESSONS LEARNED - CASSANDRA
• Cassandra and Amazon EBS have
come a long way in a short time
• Ignore most of what was written
before 2016!
• 2014: “unless you want to
add more complexity for
your operations team…
choose ephemeral”
(DataStax blog)
• Today: “gp2 volumes… best
choice for most workloads”
LESSONS LEARNED - CASSANDRA
• Cassandra is REALLY good at handling bursts
• Take the time to run benchmarks matching your expected
workload:
• Run long enough to reach “steady state” (hours to days)
• Object sizes, read/write ratios, key distribution
• Compaction strategy
• Watch for:
• Pending compactions
• Blocked native transport requests
LESSONS LEARNED – Amazon EBS &
Amazon S3
• Higher latency (vs. ephemeral) doesn’t
mean lower throughput!
• Mitigate latency impact by increasing
parallelism
• Major operational wins:
• Much higher reliability (both
storage and compute)
• Decoupling state from compute
allows each to be independently
adjusted
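The claim that higher latency need not mean lower throughput follows from Little's law: throughput equals the number of operations in flight divided by the latency of each one. The sketch below uses illustrative latencies, not measurements from the talk; the function names and numbers are hypothetical.

```python
def throughput(concurrency, latency_sec):
    """Little's law rearranged: completed operations per second."""
    return concurrency / latency_sec


def required_concurrency(target_ops_per_sec, latency_sec):
    """In-flight operations needed to sustain a target rate at a given latency."""
    return target_ops_per_sec * latency_sec


# Illustrative (hypothetical) per-operation latencies.
local_latency = 0.0002     # 0.2 ms, e.g. a local device
remote_latency = 0.001     # 1 ms, e.g. network-attached storage

local_rate = throughput(8, local_latency)                   # rate with 8 operations in flight
needed = required_concurrency(local_rate, remote_latency)   # in-flight operations to match it
```

With five times the latency, about five times as many operations in flight are needed to reach the same rate, provided the device and the network path can actually serve that many in parallel.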
LESSONS LEARNED – Amazon S3
Planning for high Amazon S3 request rate:
• Add random prefix to avoid hotspots:
• s3://bucket/ApD4J. <object_name>
• If you have sufficient randomness, you’re
not going to run into Amazon S3 limits…
• We’ve had over 1 billion objects and
5 petabytes in a bucket
• We made 1600 API calls/sec against
that bucket for 2 weeks on top of
regular production workload with
zero impact
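One way to generate such a prefix is to derive it from a hash of the object name, so that keys spread evenly but can still be recomputed from the name. This is a sketch under assumptions: the slide does not say how the prefix was produced, and the helper name, the five-character length, the "." separator, and the example object name are hypothetical.

```python
import hashlib


def prefixed_key(object_name, prefix_len=5):
    """Prepend a short, deterministic, hash-derived prefix to an object name."""
    digest = hashlib.sha256(object_name.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}.{object_name}"


key = prefixed_key("events/2017-11-28/part-0001.parquet")
```

Because the prefix is a function of the name, a reader that knows the object name can rebuild the full key without a lookup table. A purely random prefix would spread load just as well but would need to be stored somewhere.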
Thank you!

Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Plus de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

STG329_ProtectWise optimizes performance of Cassandra and Kafka workloads with Amazon EBS

  • 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS re:INVENT ProtectWise optimizes performance of Cassandra and Kafka workloads with Amazon EBS. Gene Stevens, CTO & Co-Founder, ProtectWise; Robert Tarrall, Director of DevOps, ProtectWise; Andrey Zaychikov, Sr. Solutions Architect, AWS. STG329. November 28, 2017
  • 2. AGENDA
      • Intro to NoSQL on AWS
      • Intro to Apache Cassandra and Apache Kafka on AWS
      • Best practices for Cassandra and Kafka deployments on AWS
      • ProtectWise Use Case
      • Use Case & Optimizations for Kafka
      • Use Case & Optimizations for Cassandra
      • Use Case & Optimizations for Amazon S3
      • Lessons Learned
  • 3. NoSQL as a technology
  • 4. Database per Workload
      • Products shown: Pentaho, Talend, Vertica, Aerospike, Cassandra, MongoDB
  • 5. Database options on AWS
  • 6. Complete set of data building blocks
      • Diagram labels: Data movement (Online, Offline); Data security and management
      • Services shown: Amazon EFS, Amazon EBS, AWS Snow family, AWS Storage Gateway Family, AWS Direct Connect, Amazon EFS File Sync, Amazon S3 Transfer Acceleration, Storage Partners, Amazon Kinesis Data Streams, Amazon Kinesis Video Streams, Amazon S3, Amazon Glacier, AWS KMS, AWS IAM, AWS CloudWatch, AWS CloudTrail, AWS CloudFormation, AWS Lambda, Amazon Macie, AWS QuickSight
  • 7. What is Cassandra and why use it?
      • Apache Cassandra is an open-source database based on the Dynamo model
      • It is a massively scalable, geo-distributed, high-performance key-value database
      • Common use cases for Cassandra include:
          • Time-series data
          • Social media
          • User sessions (shopping carts, etc.)
  • 8. How Cassandra works on a cluster level
  • 9. How Cassandra works on a node level
  • 10. Application interactions with Cassandra
  • 11. Best practices for Cassandra on AWS
  • 12. What is Apache Kafka and why use it?
      • Apache Kafka is an open-source distributed streaming platform
      • It allows you to:
          • Publish and subscribe to streams of records
          • Store streams in a fault-tolerant way
          • Process streams of records
      • The most common use cases for Kafka are:
          • Building data pipelines to capture and transfer data between systems and applications
          • Building real-time apps that react to streams of data
      • Kafka is often placed in front of a database such as Cassandra to capture fast-arriving data. Such a setup can reduce the pressure on the database.
  • 13. How Kafka works
  • 14. Best practices for Kafka on AWS
  • 15. Choosing proper instance and storage types
      • Database implementation, data schema, and access patterns should always be considered.
      • Compute and storage types should always be adapted to the particular situation and can change during the database's lifetime.
      • Diagram labels: Cassandra, Kafka
  • 16. CASE STUDY: PROTECTWISE
      • GENE STEVENS, CTO & CO-FOUNDER, PROTECTWISE
      • ROBERT TARRALL, DIRECTOR OF DEVOPS, PROTECTWISE
  • 17. PROTECTWISE OVERVIEW
  • 18.
  • 19. GOALS
      • Very low end-to-end latency (~1 second)
      • Very high availability
      • Over 1 billion writes per hour
      • High tolerance for bursts (10x-100x normal volume)
      • Trillions of records per year
      • Less than 10-second response time to searches
      • Arbitrary queries: “all non-HTTP traffic on port 80”
  • 20. INITIAL ARCHITECTURE
  • 21. SOLUTION: Amazon S3, Kafka, Amazon EBS
  • 22. SOLUTION: Amazon S3, Kafka, Amazon EBS
  • 23. SOLUTION: Amazon S3, Kafka, Amazon EBS
  • 24. KAFKA
      • Our Kafka clusters:
          • 1,000 topics
          • Up to 200 partitions per topic
          • 45 c4.2xlarge
          • 2x 1 TB gp2 EBS volumes
          • Peak consumption > 100 MB/sec/server
  • 25. KAFKA: WINS
      • Retention: 24 hours of “buffer”
      • Pub/sub with “at least once” guarantee
      • Fanout means we can test in production:
          • New engines publish to “profiling” topic, confirm useful detections
          • Significant code changes can be performance-tested
  • 26. KAFKA: NOTES
      • Partition is your fundamental unit of scaling – use lots of partitions
      • Use round robin partition assignment, not range
      • Be sure to test “edge” cases: recovery times, backlog
      • As broker recovers each partition, consumers rebalance
      • “At least once” = “sometimes more than once”
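The advice on slide 26 to prefer round-robin over range assignment can be illustrated with a simplified model of the two strategies. This is a sketch, not Kafka's own code: it assumes every consumer subscribes to every topic, and the topic names, consumer names, and counts are made up for illustration. In the Java consumer the strategy is chosen through the `partition.assignment.strategy` setting.

```python
from collections import defaultdict


def range_assign(topics, consumers):
    """Range-style: split each topic separately into contiguous ranges.

    Leftover partitions of every topic go to the first consumers in
    sorted order, so the same consumers pick up the extras each time.
    """
    members = sorted(consumers)
    load = defaultdict(list)
    for topic, n_partitions in sorted(topics.items()):
        base, extra = divmod(n_partitions, len(members))
        start = 0
        for i, member in enumerate(members):
            count = base + (1 if i < extra else 0)
            load[member].extend((topic, p) for p in range(start, start + count))
            start += count
    return load


def round_robin_assign(topics, consumers):
    """Round-robin-style: deal all topic-partitions out one at a time."""
    members = sorted(consumers)
    load = defaultdict(list)
    all_partitions = [(t, p) for t, n in sorted(topics.items()) for p in range(n)]
    for i, tp in enumerate(all_partitions):
        load[members[i % len(members)]].append(tp)
    return load


# Hypothetical workload: 10 topics with 5 partitions each, 4 consumers.
topics = {f"topic-{i:02d}": 5 for i in range(10)}
consumers = ["c1", "c2", "c3", "c4"]

range_load = range_assign(topics, consumers)
rr_load = round_robin_assign(topics, consumers)
range_counts = {c: len(range_load[c]) for c in consumers}
rr_counts = {c: len(rr_load[c]) for c in consumers}
print("range:      ", range_counts)
print("round robin:", rr_counts)
```

With many topics whose partition counts do not divide evenly by the number of consumers, the range-style split piles the leftovers onto the same consumers, while the round-robin split keeps the counts within one partition of each other.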
  • 27. KAFKA: WARNINGS/CAVEATS
      • Beware cross-AZ replication costs!
      • Kafka has only limited “rack awareness”
      • Producers and consumers talk to the “leader” of a partition
      • With RF=2, data may cross AZs 3 (or more) times
  • 28. KAFKA – Cross-AZ Traffic
  • 29. KAFKA – Cross-AZ Traffic
  • 30. KAFKA: WARNINGS/CAVEATS
      • Monthly costs in perspective:
          • Instances: $8,000
          • gp2 EBS volumes: $8,000
          • Network traffic: $40,000
  • 31. KAFKA: WARNINGS/CAVEATS
      • Single broker failure impacts the whole cluster
      • “Let’s bump that timeout” often has unexpected consequences
      • Mostly trust default settings (but use round-robin!)
  • 32. DATABASE
      • Sustain 250K writes/sec (bursts > 1 million/sec)
      • 1 year of data
      • Support arbitrary queries
      • < 10-second response time for search
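As a rough cross-check, the sustained write rate on slide 32 can be compared with the hourly and yearly figures in the goals on slide 19. The sketch below is plain arithmetic on the numbers quoted in the slides; the function and constant names are illustrative only.

```python
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_YEAR = 365 * 24 * SECONDS_PER_HOUR


def per_second(writes_per_hour):
    """Convert an hourly write volume to an average per-second rate."""
    return writes_per_hour / SECONDS_PER_HOUR


def per_year(writes_per_second):
    """Total records written in a year at a constant per-second rate."""
    return writes_per_second * SECONDS_PER_YEAR


hourly_goal = 1_000_000_000   # "Over 1 billion writes per hour" (slide 19)
sustained_rate = 250_000      # "Sustain 250K writes/sec" (slide 32)

print(f"1 billion writes/hour is about {per_second(hourly_goal):,.0f} writes/sec")
print(f"250K writes/sec is about {per_year(sustained_rate):,} records/year")
```

The two figures land in the same range: a billion writes per hour averages a little under 280K writes per second, and 250K writes per second sustained for a year comes to several trillion records, in line with the "trillions of records per year" goal.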
  • 33. DATABASE (v1)
      • We use DataStax Enterprise Search
      • Supports over 1 TB of data + index on an i2.2xlarge
      • Handled the load, but one month of data = 100x i2.2xlarge
      • We keep a year of data…
      • Sharded by time, migrated older data to r4.2xlarge with gp2 EBS volumes
  • 34. DATABASE (v1) – Lessons Learned
      • Use DSE 5.0 or later – much better indexing throughput
      • Don’t use vnodes
      • Do use large heap (20-30 GB is fine) and G1GC
      • Beware outdated blog posts! (Amazon EBS has come a LONG way)
      • High write throughput + search leads to high operational burden
      • Use Amazon EBS if at all possible
  • 35. Amazon EBS!
      • Migrating data to Amazon EBS taught us that we really want to use Amazon EBS:
          • Instances without ephemeral storage fail less frequently
          • Amazon EBS volumes do fail, but very rarely
          • Decoupling state from compute is a huge win:
              • Need more CPU in your Cassandra cluster? Stop one AZ, change instance type, start; repeat for all AZs
              • Modify Amazon EBS volume to expand storage
  • 36. Amazon S3 for Full-text Search
      • Today’s architecture (write path):
          • Data written to Cassandra with a TTL
          • Once final (a few hours), a Spark job on Amazon EMR:
              • Reads data from C*
              • Writes Parquet files to Amazon S3
              • Writes Bloom filters to Solr
  • 37. Amazon S3 for Full-text Search
      • Read path:
          • Very recent data answered from C*/Solr
          • Bloom filters tell us which Parquet files to lift from Amazon S3 for older data
          • Spark on Amazon EMR reads the Parquet files – highly parallel
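The read path on slide 37 relies on Bloom filters to decide which Parquet files are worth fetching from Amazon S3. The slides do not describe how the filters are built or sized, so the following is only a minimal sketch of the general technique: one small filter per file, queried before any file is read. The class, the file names, and the indexed terms are all hypothetical.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def candidate_files(filters, term):
    """Return the files whose filter says the term may be present."""
    return sorted(name for name, bf in filters.items() if bf.might_contain(term))


# Hypothetical index: one filter per Parquet file, built at write time.
contents = {
    "part-0001.parquet": ["10.0.0.5", "example.com"],
    "part-0002.parquet": ["10.0.0.9", "example.org"],
}
filters = {}
for name, terms in contents.items():
    bf = BloomFilter()
    for term in terms:
        bf.add(term)
    filters[name] = bf

hits = candidate_files(filters, "10.0.0.5")
print(hits)
```

A file that contains the term is always returned. A file that does not contain it is usually skipped, with a small false-positive rate that depends on the filter size and the number of hash functions, so only the candidate files need to be read by the Spark job.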
  • 38. LESSONS LEARNED - KAFKA
      • Overall, Kafka is a very valuable part of our platform and works great on EBS. If you expect massive scale, keep the following in mind:
          • Cross-AZ replication cost adds up. 2 GB/sec for 1 month is 5 petabytes.
          • A single broker can cause availability problems (not data loss) for the whole cluster.
          • Small clusters are very easy to operate; larger clusters have more issues and higher mean time to recovery.
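The "2 GB/sec for 1 month is 5 petabytes" figure on slide 38 can be reproduced directly. The sketch below assumes a 30-day month and decimal units (1 PB = 1,000,000 GB); the function name is illustrative.

```python
def monthly_transfer_pb(gb_per_sec, days=30):
    """Data volume in petabytes moved at a constant rate over `days` days."""
    seconds = days * 24 * 60 * 60
    total_gb = gb_per_sec * seconds
    return total_gb / 1_000_000  # decimal GB -> PB


volume_pb = monthly_transfer_pb(2)
print(f"2 GB/sec for 30 days is about {volume_pb:.2f} PB")
```

Multiplying the resulting volume by the per-GB transfer price that applies to your account gives the monthly cross-AZ bill, which is why the network line on slide 30 can exceed the instance and volume lines.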
  • 39. LESSONS LEARNED - CASSANDRA
      • Cassandra and Amazon EBS have come a long way in a short time
      • Ignore most of what was written before 2016!
      • 2014: “unless you want to add more complexity for your operations team… choose ephemeral” (DataStax blog)
      • Today: “gp2 volumes… best choice for most workloads”
  • 40. LESSONS LEARNED - CASSANDRA
      • Cassandra is REALLY good at handling bursts
      • Take the time to run benchmarks matching your expected workload:
          • Run long enough to reach “steady state” (hours to days)
          • Object sizes, read/write ratios, key distribution
          • Compaction strategy
      • Watch for:
          • Pending compactions
          • Blocked native transport requests
  • 41. LESSONS LEARNED – Amazon EBS & Amazon S3
      • Higher latency (vs. ephemeral) doesn’t mean lower throughput!
      • Mitigate latency impact by increasing parallelism
      • Major operational wins:
          • Much higher reliability (both storage and compute)
          • Decoupling state from compute allows each to be independently adjusted
  • 42. LESSONS LEARNED – Amazon S3
      • Planning for high Amazon S3 request rate:
          • Add random prefix to avoid hotspots:
              • s3://bucket/ApD4J. <object_name>
          • If you have sufficient randomness, you’re not going to run into Amazon S3 limits…
          • We’ve had over 1 billion objects and 5 petabytes in a bucket
          • We made 1600 API calls/sec against that bucket for 2 weeks on top of regular production workload with zero impact
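Slide 42 recommends a random-looking key prefix but does not say how the prefix is generated. One common approach, shown here purely as an assumption, is to derive the prefix from a hash of the object name, which spreads keys evenly while still letting a reader recompute the key from the name alone. The function name, the prefix length, and the sample object name are hypothetical.

```python
import hashlib


def prefixed_key(object_name, prefix_len=5):
    """Build an S3 key of the form '<hash-prefix>.<object_name>'.

    The prefix comes from a SHA-256 digest of the name, so it looks
    random across objects but is reproducible for any given object.
    """
    digest = hashlib.sha256(object_name.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}.{object_name}"


key = prefixed_key("2017/11/28/flows-000123.parquet")
print(f"s3://bucket/{key}")
```

A truly random prefix would distribute keys just as well, but the writer would then have to record the full key somewhere so that readers can find the object again.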
  • 43. Thank you!