SlideShare une entreprise Scribd logo
1  sur  30
Télécharger pour lire hors ligne
Cassandra @ eBay
Jay Patel
Architect, Platform Systems
@pateljay3001
eBay Marketplaces
Thousands of servers
Petabytes of data
Billions of SQLs/day24x7x365
99.98+% Availability
turning over a TBevery second
Multiple Datacenters
Near-Real-time
Always online
400+ million items for sale
$75 billion+ per year in goods are sold on eBay
Big Data
112 million active users
Billions of page views/day
3
eBay Site Data Infrastructure
Don’t force!
One size does not fit all.
It’s a mixture of
multiple SQL &
NoSQL databases.
We use the right
database for the
right problem.
eBay Site Data Infrastructure
A heterogeneous mixture
Thousands of nodes
> 2K sharded logical host
> 16K tables
> 27K indexes
> 140 billion SQLs/day
> 5 PB provisioned
Hundreds of nodes
Persistent & in-memory
> 40 billion SQLs/day
10+ clusters, 100+ nodes
> 250 TB provisioned
(local HDD + shared SSD)
> 9 billion writes/day
> 5 billion reads/day
Hundreds of nodes
> 50 TB
> 2 billion ops/day
Thousands of nodes
The world largest
cluster with 2K+ nodes
Dozens of nodes
How do we scale RDBMS?
 Shard
– Patterns: Modulus, lookup-based, range, etc.
– Application sees only logical shard/database
 Replicate
– Disaster recovery, read availability & read scalability
 Big NOs
– No transactions
– No joins
– No referential integrity constraints 5
Why Cassandra?
 Multi-datacenter (active-active)
 Always Available - No SPOF
 Easy to scale up & down
6
 Write performance
 Distributed counters
 Hadoop support
Not replacing RDBMS, but complementing!
 Some use cases don’t fit well in RDBMS - sparse data, big data,
flexible schema, real-time analytics, …
 Many use cases don’t need top-tier set-ups.
Cassandra Growth
Aug,2011
Aug,2012
ay,2013
1
2
3
4
5
6
7
Billions
(per day)
writes
async. reads
sync. site reads
Terabytes
50
100
200
250
300
350
storage capacity
Doesn’t predict
business
7
eBay Use Cases on Cassandra
 Time-series data, real-time insights & immediate actions
• Fraud detection & prevention
• Quality Click Pricing for affiliates
• Order & shipment tracking and insights
• Mobile notification logging & tracking
• Cloud CMS change history storage
• RedLaser server logs and analytics
 Server metrics collection for monitoring & alerting
 Taste graph based next-gen recommendation system
 Personalization Data Service
 Social Signals on eBay Product & Item pages
 Milo’s store-item availability inventory (evaluation phase) 8
Real-time insights & actions for
9
Fraud Prevention Reporting
Quality Click Pricing More…
10
System Overview
Business Event Stream
Checkout Shipping Refund & Recoup …
Order placed
(bin/bid)
Paid Shipped Refunded
Rawdata
Simple in-memory aggregations +/
Complex Event Processing +/
Cassandra’s distributed counters
Label printed per day per user
User segmentation for affiliate pricing
Orders per hour, …
Multiple Cassandra clusters
Payment
Actinreal-time
Fraud Prevention
Affiliate Pricing Engine
(eBay Partner Network)
Order tracking
Real-time reporting
…
(Kept from several months to years)
A glimpse on Data Model
11
Historic & real-time insights per user per carrier.
Sudden & drastic change might be suspicious.
User bucketing based on historic
& real-time buying activity.
A glimpse on Data Model
12
Fraud Detection & Prevention
13
Shop with Confidence
System Overview
14
Cassandra
Fraud Detection & Prevention System
Sign-ininfo
Business events
(checkout, sell,…)
StaaSOracle
Checkout Shipping …PaymentSelling
Real-time
Beacons data
Real-time
Insights
Other data
Machine
Learned Models
15
A glimpse on Data Model
Collected at sign-in
& stored as key-value.
Pulled periodically to StaaS for
training machine learned models.
Metrics collection for monitoring & alerting
16
System Overview
17
Transport (HTTP, …)
Scalable NIO
servers based
on Netty
Thousands of
production
machines
Cassandra
Stats for CPU, Memory, Disk, ..
…
agent agent agent agent …
Server Server Server Server Server
In-memory grid (hazelcast) for rollups
A glimpse on Data Model
18
Granular data points
Rolled up metrics
for various time intervals
Taste graph based recommendation system
19
Data Model
20
TasteGraph
TasteVector
50 billion+ edges, 600 million+ writes, 3 billion+ reads, 30TB+ of data on SSD
System Overview
21
Business Event Stream
Recommendation system
Taste GraphTaste Vector
1. Item purchased.
2a. Write purchase edge.
2b. Read other edges for this user & item.
4. Req. recommendations.
5. Finds other items close to
user’s coordinates.
6. Reco. shown to user
More, http://www.slideshare.net/planetcassandra/e-bay-nyc
Real-time Personalization Data Service
22
User performs search using keyword User gets personalized pages based on
implicit/explicit profile
System Overview
23
Personalization Data Service
CacheMesh
(write-back cache)
Heavy writes
eBay site pages (personalized)
Every few mins
in-memory
MySQL
& XMP DB
CassandraOracle
(scaled out) Heavyreads
Cache miss
user profiles
Application SOA services (multiple)
Data
Warehouse
Data Model
24
• Keep column names short.
• Don’t overload one CF with all the data:
- Split hot & cold data in separate CF.
- Splitting & sharding can help compaction.
Static column families
25
Served by
Cassandra
Social Signals
Manage signals via “Your Favorites”
26
Whole page is
served by
Cassandra
More, http://www.slideshare.net/jaykumarpatel/cassandra-at-ebay-13920376
Multi-Datacenter Deployment
27
Topology - NTS
RF - 1:1 or 2:2 or 3:3
Read CL - ONE/QUORUM
Write CL - ONE
Data is backed up periodically
to protect against human or
software error
User request has no datacenter affinity
Non-sticky load balancing
Multi-Datacenter Deployment
Topology - NTS
RF – 1:1:1 or
2:2:2
Lessons & Best Practices
• One size does not fit all
– Use Cassandra for the right use cases.
• Choose proper Replication Factor and Consistency Level
– They alter latency, availability, durability, consistency and cost.
– Cassandra supports tunable consistency, but remember strong consistency is not free.
• Many ways to model data in Cassandra
– The best way depends on your use case and query patterns.
• De-normalize and duplicate for read performance
– But don’t de-normalize if you don’t need to.
http://www.slideshare.net/jaykumarpatel/cassandra-data-modeling-best-practices
29
Are you excited? Come Join Us!
30
Thank You
@pateljay3001
#cassandra13

Contenu connexe

Tendances

AWS re:Invent 2016: Best practices for running enterprise workloads on AWS (E...
AWS re:Invent 2016: Best practices for running enterprise workloads on AWS (E...AWS re:Invent 2016: Best practices for running enterprise workloads on AWS (E...
AWS re:Invent 2016: Best practices for running enterprise workloads on AWS (E...Amazon Web Services
 
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of ThingsDay 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of ThingsAmazon Web Services
 
Migrating On-Premises Databases to Cloud
Migrating On-Premises Databases to CloudMigrating On-Premises Databases to Cloud
Migrating On-Premises Databases to CloudAmazon Web Services
 
AWS re:Invent 2016| HLC301 | Data Science and Healthcare: Running Large Scale...
AWS re:Invent 2016| HLC301 | Data Science and Healthcare: Running Large Scale...AWS re:Invent 2016| HLC301 | Data Science and Healthcare: Running Large Scale...
AWS re:Invent 2016| HLC301 | Data Science and Healthcare: Running Large Scale...Amazon Web Services
 
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMRBDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMRAmazon Web Services
 
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...Amazon Web Services
 
Continuous Integration with Amazon ECS and Docker
Continuous Integration with Amazon ECS and DockerContinuous Integration with Amazon ECS and Docker
Continuous Integration with Amazon ECS and DockerAmazon Web Services
 
Migrate from SQL Server or Oracle into Amazon Aurora using AWS Database Migra...
Migrate from SQL Server or Oracle into Amazon Aurora using AWS Database Migra...Migrate from SQL Server or Oracle into Amazon Aurora using AWS Database Migra...
Migrate from SQL Server or Oracle into Amazon Aurora using AWS Database Migra...Amazon Web Services
 
大數據運算媒體業案例分享 (Big Data Compute Case Sharing for Media Industry)
大數據運算媒體業案例分享 (Big Data Compute Case Sharing for Media Industry)大數據運算媒體業案例分享 (Big Data Compute Case Sharing for Media Industry)
大數據運算媒體業案例分享 (Big Data Compute Case Sharing for Media Industry)Amazon Web Services
 
Get the Most Out of Amazon EC2: A Deep Dive on Reserved, On-Demand, and Spot ...
Get the Most Out of Amazon EC2: A Deep Dive on Reserved, On-Demand, and Spot ...Get the Most Out of Amazon EC2: A Deep Dive on Reserved, On-Demand, and Spot ...
Get the Most Out of Amazon EC2: A Deep Dive on Reserved, On-Demand, and Spot ...Amazon Web Services
 
Migrating to Amazon RDS with Database Migration Service
Migrating to Amazon RDS with Database Migration ServiceMigrating to Amazon RDS with Database Migration Service
Migrating to Amazon RDS with Database Migration ServiceAmazon Web Services
 
2017 AWS DB Day | AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?
2017 AWS DB Day |  AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?2017 AWS DB Day |  AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?
2017 AWS DB Day | AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?Amazon Web Services Korea
 
AWS Innovate: Running Databases in AWS- Russell Nash
AWS Innovate: Running Databases in AWS- Russell NashAWS Innovate: Running Databases in AWS- Russell Nash
AWS Innovate: Running Databases in AWS- Russell NashAmazon Web Services Korea
 
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)Amazon Web Services
 
Getting Started with AWS Security
Getting Started with AWS SecurityGetting Started with AWS Security
Getting Started with AWS SecurityAmazon Web Services
 
NoSQL Migration Technical Pitch Deck
NoSQL Migration Technical Pitch DeckNoSQL Migration Technical Pitch Deck
NoSQL Migration Technical Pitch DeckNicholas Vossburg
 
AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...
AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...
AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...Amazon Web Services
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)Amazon Web Services
 

Tendances (20)

AWS re:Invent 2016: Best practices for running enterprise workloads on AWS (E...
AWS re:Invent 2016: Best practices for running enterprise workloads on AWS (E...AWS re:Invent 2016: Best practices for running enterprise workloads on AWS (E...
AWS re:Invent 2016: Best practices for running enterprise workloads on AWS (E...
 
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of ThingsDay 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
 
Migrating On-Premises Databases to Cloud
Migrating On-Premises Databases to CloudMigrating On-Premises Databases to Cloud
Migrating On-Premises Databases to Cloud
 
AWS re:Invent 2016| HLC301 | Data Science and Healthcare: Running Large Scale...
AWS re:Invent 2016| HLC301 | Data Science and Healthcare: Running Large Scale...AWS re:Invent 2016| HLC301 | Data Science and Healthcare: Running Large Scale...
AWS re:Invent 2016| HLC301 | Data Science and Healthcare: Running Large Scale...
 
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMRBDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
 
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...
 
Continuous Integration with Amazon ECS and Docker
Continuous Integration with Amazon ECS and DockerContinuous Integration with Amazon ECS and Docker
Continuous Integration with Amazon ECS and Docker
 
Cost Optimisation on AWS
Cost Optimisation on AWSCost Optimisation on AWS
Cost Optimisation on AWS
 
Migrate from SQL Server or Oracle into Amazon Aurora using AWS Database Migra...
Migrate from SQL Server or Oracle into Amazon Aurora using AWS Database Migra...Migrate from SQL Server or Oracle into Amazon Aurora using AWS Database Migra...
Migrate from SQL Server or Oracle into Amazon Aurora using AWS Database Migra...
 
大數據運算媒體業案例分享 (Big Data Compute Case Sharing for Media Industry)
大數據運算媒體業案例分享 (Big Data Compute Case Sharing for Media Industry)大數據運算媒體業案例分享 (Big Data Compute Case Sharing for Media Industry)
大數據運算媒體業案例分享 (Big Data Compute Case Sharing for Media Industry)
 
Get the Most Out of Amazon EC2: A Deep Dive on Reserved, On-Demand, and Spot ...
Get the Most Out of Amazon EC2: A Deep Dive on Reserved, On-Demand, and Spot ...Get the Most Out of Amazon EC2: A Deep Dive on Reserved, On-Demand, and Spot ...
Get the Most Out of Amazon EC2: A Deep Dive on Reserved, On-Demand, and Spot ...
 
Migrating to Amazon RDS with Database Migration Service
Migrating to Amazon RDS with Database Migration ServiceMigrating to Amazon RDS with Database Migration Service
Migrating to Amazon RDS with Database Migration Service
 
2017 AWS DB Day | AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?
2017 AWS DB Day |  AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?2017 AWS DB Day |  AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?
2017 AWS DB Day | AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?
 
AWS Innovate: Running Databases in AWS- Russell Nash
AWS Innovate: Running Databases in AWS- Russell NashAWS Innovate: Running Databases in AWS- Russell Nash
AWS Innovate: Running Databases in AWS- Russell Nash
 
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
 
Getting Started with AWS Security
Getting Started with AWS SecurityGetting Started with AWS Security
Getting Started with AWS Security
 
NoSQL Migration Technical Pitch Deck
NoSQL Migration Technical Pitch DeckNoSQL Migration Technical Pitch Deck
NoSQL Migration Technical Pitch Deck
 
AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...
AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...
AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
 

En vedette

Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012Jay Patel
 
The Cassandra Distributed Database
The Cassandra Distributed DatabaseThe Cassandra Distributed Database
The Cassandra Distributed DatabaseEric Evans
 
Cassandra Summit 2014: Apache Cassandra Best Practices at Ebay
Cassandra Summit 2014: Apache Cassandra Best Practices at EbayCassandra Summit 2014: Apache Cassandra Best Practices at Ebay
Cassandra Summit 2014: Apache Cassandra Best Practices at EbayDataStax Academy
 
Arabian literature
Arabian literatureArabian literature
Arabian literatureGenevieve Oh
 
Presentacion mercy angulo
Presentacion  mercy anguloPresentacion  mercy angulo
Presentacion mercy angulomercynatalia1
 
Azure Machine Learning
Azure Machine LearningAzure Machine Learning
Azure Machine LearningChase Aucoin
 
What does Bob really want? Recommenders in the Cloud
What does Bob really want? Recommenders in the CloudWhat does Bob really want? Recommenders in the Cloud
What does Bob really want? Recommenders in the CloudOlivia Klose
 
3. los derechos auxilares del acreedor[1]
3. los derechos auxilares del acreedor[1]3. los derechos auxilares del acreedor[1]
3. los derechos auxilares del acreedor[1]Clara Miranda
 
Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...
Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...
Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...DataStax
 
Cassandra at Instagram (August 2013)
Cassandra at Instagram (August 2013)Cassandra at Instagram (August 2013)
Cassandra at Instagram (August 2013)Rick Branson
 
Cloud Architecture best practices
Cloud Architecture best practicesCloud Architecture best practices
Cloud Architecture best practicesOmid Vahdaty
 
Enterprise Cloud Architecture Best Practices
Enterprise Cloud Architecture Best PracticesEnterprise Cloud Architecture Best Practices
Enterprise Cloud Architecture Best PracticesDavid Veksler
 
Developers summit cassandraで見るNoSQL
Developers summit cassandraで見るNoSQLDevelopers summit cassandraで見るNoSQL
Developers summit cassandraで見るNoSQLRyu Kobayashi
 
C* Summit 2013: How Not to Use Cassandra by Axel Liljencrantz
C* Summit 2013: How Not to Use Cassandra by Axel LiljencrantzC* Summit 2013: How Not to Use Cassandra by Axel Liljencrantz
C* Summit 2013: How Not to Use Cassandra by Axel LiljencrantzDataStax Academy
 
Cassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patternsCassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patternsDave Gardner
 
Creation of cloud application using microsoft azure by vaishali sahare [katkar]
Creation of cloud application using microsoft azure by vaishali sahare [katkar]Creation of cloud application using microsoft azure by vaishali sahare [katkar]
Creation of cloud application using microsoft azure by vaishali sahare [katkar]vaishalisahare123
 
Cassandra Data Model
Cassandra Data ModelCassandra Data Model
Cassandra Data Modelebenhewitt
 
Selena gomez power point
Selena gomez power pointSelena gomez power point
Selena gomez power pointAzulTomas
 

En vedette (20)

Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012
 
The Cassandra Distributed Database
The Cassandra Distributed DatabaseThe Cassandra Distributed Database
The Cassandra Distributed Database
 
Cassandra Summit 2014: Apache Cassandra Best Practices at Ebay
Cassandra Summit 2014: Apache Cassandra Best Practices at EbayCassandra Summit 2014: Apache Cassandra Best Practices at Ebay
Cassandra Summit 2014: Apache Cassandra Best Practices at Ebay
 
Arabian literature
Arabian literatureArabian literature
Arabian literature
 
Presentacion mercy angulo
Presentacion  mercy anguloPresentacion  mercy angulo
Presentacion mercy angulo
 
Azure Machine Learning
Azure Machine LearningAzure Machine Learning
Azure Machine Learning
 
What does Bob really want? Recommenders in the Cloud
What does Bob really want? Recommenders in the CloudWhat does Bob really want? Recommenders in the Cloud
What does Bob really want? Recommenders in the Cloud
 
Cassandra queuing
Cassandra queuingCassandra queuing
Cassandra queuing
 
3. los derechos auxilares del acreedor[1]
3. los derechos auxilares del acreedor[1]3. los derechos auxilares del acreedor[1]
3. los derechos auxilares del acreedor[1]
 
Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...
Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...
Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...
 
Cassandra at Instagram (August 2013)
Cassandra at Instagram (August 2013)Cassandra at Instagram (August 2013)
Cassandra at Instagram (August 2013)
 
Cloud Architecture best practices
Cloud Architecture best practicesCloud Architecture best practices
Cloud Architecture best practices
 
Enterprise Cloud Architecture Best Practices
Enterprise Cloud Architecture Best PracticesEnterprise Cloud Architecture Best Practices
Enterprise Cloud Architecture Best Practices
 
Developers summit cassandraで見るNoSQL
Developers summit cassandraで見るNoSQLDevelopers summit cassandraで見るNoSQL
Developers summit cassandraで見るNoSQL
 
C* Summit 2013: How Not to Use Cassandra by Axel Liljencrantz
C* Summit 2013: How Not to Use Cassandra by Axel LiljencrantzC* Summit 2013: How Not to Use Cassandra by Axel Liljencrantz
C* Summit 2013: How Not to Use Cassandra by Axel Liljencrantz
 
Cassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patternsCassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patterns
 
Creation of cloud application using microsoft azure by vaishali sahare [katkar]
Creation of cloud application using microsoft azure by vaishali sahare [katkar]Creation of cloud application using microsoft azure by vaishali sahare [katkar]
Creation of cloud application using microsoft azure by vaishali sahare [katkar]
 
Cassandra Data Model
Cassandra Data ModelCassandra Data Model
Cassandra Data Model
 
Microservices in Clojure
Microservices in ClojureMicroservices in Clojure
Microservices in Clojure
 
Selena gomez power point
Selena gomez power pointSelena gomez power point
Selena gomez power point
 

Similaire à Cassandra scales eBay's petabytes of data serving billions daily

Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...DataStax
 
Creating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital TransformationCreating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital TransformationMongoDB
 
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOT
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOTAWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOT
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOTAmazon Web Services
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924Amazon Web Services
 
MinneBar 2013 - Scaling with Cassandra
MinneBar 2013 - Scaling with CassandraMinneBar 2013 - Scaling with Cassandra
MinneBar 2013 - Scaling with CassandraJeff Smoley
 
Immersion Day - Como simplificar o acesso ao seu ambiente analítico
Immersion Day - Como simplificar o acesso ao seu ambiente analíticoImmersion Day - Como simplificar o acesso ao seu ambiente analítico
Immersion Day - Como simplificar o acesso ao seu ambiente analíticoAmazon Web Services LATAM
 
Amazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian MeyersAmazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian Meyershuguk
 
C* Summit 2013: Optimizing the Public Cloud for Cost and Scalability with Cas...
C* Summit 2013: Optimizing the Public Cloud for Cost and Scalability with Cas...C* Summit 2013: Optimizing the Public Cloud for Cost and Scalability with Cas...
C* Summit 2013: Optimizing the Public Cloud for Cost and Scalability with Cas...DataStax Academy
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Getting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWSGetting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWSAmazon Web Services
 
Low-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
Low-Latency Analytics with NoSQL – Introduction to Storm and CassandraLow-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
Low-Latency Analytics with NoSQL – Introduction to Storm and CassandraCaserta
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudAmazon Web Services
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Amazon Web Services
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Amazon Web Services
 
(DAT202) Managed Database Options on AWS
(DAT202) Managed Database Options on AWS(DAT202) Managed Database Options on AWS
(DAT202) Managed Database Options on AWSAmazon Web Services
 
Unlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data LakeUnlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data LakeMongoDB
 

Similaire à Cassandra scales eBay's petabytes of data serving billions daily (20)

Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
 
Creating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital TransformationCreating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital Transformation
 
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOT
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOTAWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOT
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOT
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
 
MinneBar 2013 - Scaling with Cassandra
MinneBar 2013 - Scaling with CassandraMinneBar 2013 - Scaling with Cassandra
MinneBar 2013 - Scaling with Cassandra
 
Immersion Day - Como simplificar o acesso ao seu ambiente analítico
Immersion Day - Como simplificar o acesso ao seu ambiente analíticoImmersion Day - Como simplificar o acesso ao seu ambiente analítico
Immersion Day - Como simplificar o acesso ao seu ambiente analítico
 
Amazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian MeyersAmazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian Meyers
 
C* Summit 2013: Optimizing the Public Cloud for Cost and Scalability with Cas...
C* Summit 2013: Optimizing the Public Cloud for Cost and Scalability with Cas...C* Summit 2013: Optimizing the Public Cloud for Cost and Scalability with Cas...
C* Summit 2013: Optimizing the Public Cloud for Cost and Scalability with Cas...
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Getting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWSGetting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWS
 
Low-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
Low-Latency Analytics with NoSQL – Introduction to Storm and CassandraLow-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
Low-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS Cloud
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
 
(DAT202) Managed Database Options on AWS
(DAT202) Managed Database Options on AWS(DAT202) Managed Database Options on AWS
(DAT202) Managed Database Options on AWS
 
Unlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data LakeUnlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data Lake
 

Dernier

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 

Dernier (20)

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

Cassandra scales eBay's petabytes of data serving billions daily

  • 1. Cassandra @ eBay Jay Patel Architect, Platform Systems @pateljay3001
  • 2. eBay Marketplaces Thousands of servers Petabytes of data Billions of SQLs/day24x7x365 99.98+% Availability turning over a TBevery second Multiple Datacenters Near-Real-time Always online 400+ million items for sale $75 billion+ per year in goods are sold on eBay Big Data 112 million active users Billions of page views/day
  • 3. 3 eBay Site Data Infrastructure Don’t force! One size does not fit all. It’s a mixture of multiple SQL & NoSQL databases. We use the right database for the right problem.
  • 4. eBay Site Data Infrastructure A heterogeneous mixture Thousands of nodes > 2K sharded logical host > 16K tables > 27K indexes > 140 billion SQLs/day > 5 PB provisioned Hundreds of nodes Persistent & in-memory > 40 billion SQLs/day 10+ clusters, 100+ nodes > 250 TB provisioned (local HDD + shared SSD) > 9 billion writes/day > 5 billion reads/day Hundreds of nodes > 50 TB > 2 billion ops/day Thousands of nodes The world largest cluster with 2K+ nodes Dozens of nodes
  • 5. How do we scale RDBMS?  Shard – Patterns: Modulus, lookup-based, range, etc. – Application sees only logical shard/database  Replicate – Disaster recovery, read availability & read scalability  Big NOs – No transactions – No joins – No referential integrity constraints 5
  • 6. Why Cassandra?  Multi-datacenter (active-active)  Always Available - No SPOF  Easy to scale up & down 6  Write performance  Distributed counters  Hadoop support Not replacing RDBMS, but complementing!  Some use cases don’t fit well in RDBMS - sparse data, big data, flexible schema, real-time analytics, …  Many use cases don’t need top-tier set-ups.
  • 7. Cassandra Growth Aug,2011 Aug,2012 ay,2013 1 2 3 4 5 6 7 Billions (per day) writes async. reads sync. site reads Terabytes 50 100 200 250 300 350 storage capacity Doesn’t predict business 7
  • 8. eBay Use Cases on Cassandra  Time-series data, real-time insights & immediate actions • Fraud detection & prevention • Quality Click Pricing for affiliates • Order & shipment tracking and insights • Mobile notification logging & tracking • Cloud CMS change history storage • RedLaser server logs and analytics  Server metrics collection for monitoring & alerting  Taste graph based next-gen recommendation system  Personalization Data Service  Social Signals on eBay Product & Item pages  Milo’s store-item availability inventory (evaluation phase) 8
  • 9. Real-time insights & actions for 9 Fraud Prevention Reporting Quality Click Pricing More…
  • 10. 10 System Overview Business Event Stream Checkout Shipping Refund & Recoup … Order placed (bin/bid) Paid Shipped Refunded Rawdata Simple in-memory aggregations +/ Complex Event Processing +/ Cassandra’s distributed counters Label printed per day per user User segmentation for affiliate pricing Orders per hour, … Multiple Cassandra clusters Payment Actinreal-time Fraud Prevention Affiliate Pricing Engine (eBay Partner Network) Order tracking Real-time reporting … (Kept from several months to years)
  • 11. A glimpse on Data Model 11 Historic & real-time insights per user per carrier. Sudden & drastic change might be suspicious. User bucketing based on historic & real-time buying activity.
  • 12. A glimpse on Data Model 12
  • 13. Fraud Detection & Prevention 13 Shop with Confidence
  • 14. System Overview 14 Cassandra Fraud Detection & Prevention System Sign-ininfo Business events (checkout, sell,…) StaaSOracle Checkout Shipping …PaymentSelling Real-time Beacons data Real-time Insights Other data Machine Learned Models
  • 15. 15 A glimpse on Data Model Collected at sign-in & stored as key-value. Pulled periodically to StaaS for training machine learned models.
  • 16. Metrics collection for monitoring & alerting 16
  • 17. System Overview 17 Transport (HTTP, …) Scalable NIO servers based on Netty Thousands of production machines Cassandra Stats for CPU, Memory, Disk, .. … agent agent agent agent … Server Server Server Server Server In-memory grid (hazelcast) for rollups
  • 18. A glimpse on Data Model 18 Granular data points Rolled up metrics for various time intervals
  • 19. Taste graph based recommendation system 19
  • 20. Data Model 20 TasteGraph TasteVector 50 billion+ edges, 600 million+ writes, 3 billion+ reads, 30TB+ of data on SSD
  • 21. System Overview 21 Business Event Stream Recommendation system Taste GraphTaste Vector 1. Item purchased. 2a. Write purchase edge. 2b. Read other edges for this user & item. 4. Req. recommendations. 5. Finds other items close to user’s coordinates. 6. Reco. shown to user More, http://www.slideshare.net/planetcassandra/e-bay-nyc
  • 22. Real-time Personalization Data Service 22 User performs search using keyword User gets personalized pages based on implicit/explicit profile
  • 23. System Overview 23 Personalization Data Service CacheMesh (write-back cache) Heavy writes eBay site pages (personalized) Every few mins in-memory MySQL & XMP DB CassandraOracle (scaled out) Heavyreads Cache miss user profiles Application SOA services (multiple) Data Warehouse
  • 24. Data Model 24 • Keep column names short. • Don’t overload one CF with all the data: - Split hot & cold data in separate CF. - Splitting & sharding can help compaction. Static column families
  • 26. Manage signals via “Your Favorites” 26 Whole page is served by Cassandra More, http://www.slideshare.net/jaykumarpatel/cassandra-at-ebay-13920376
  • 27. Multi-Datacenter Deployment 27 Topology - NTS RF - 1:1 or 2:2 or 3:3 Read CL - ONE/QUORUM Write CL - ONE Data is backed up periodically to protect against human or software error User request has no datacenter affinity Non-sticky load balancing
  • 28. Multi-Datacenter Deployment Topology - NTS RF – 1:1:1 or 2:2:2
  • 29. Lessons & Best Practices • One size does not fit all – Use Cassandra for the right use cases. • Choose proper Replication Factor and Consistency Level – They alter latency, availability, durability, consistency and cost. – Cassandra supports tunable consistency, but remember strong consistency is not free. • Many ways to model data in Cassandra – The best way depends on your use case and query patterns. • De-normalize and duplicate for read performance – But don’t de-normalize if you don’t need to. http://www.slideshare.net/jaykumarpatel/cassandra-data-modeling-best-practices 29
  • 30. Are you excited? Come Join Us! 30 Thank You @pateljay3001 #cassandra13