GumGum relies heavily on Cassandra for storing different kinds of metadata. GumGum currently reaches more than 1 billion unique visitors per month, served by three Cassandra datacenters in Amazon Web Services spread across the globe.
This presentation will detail how we scaled out from one local Cassandra datacenter to a multi-datacenter Cassandra cluster and all the problems we encountered and choices we made while implementing it.
How did we architect multi-region Cassandra in AWS? What were our experiences in implementing multi-datacenter Cassandra? How did we achieve low latency with multi-region Cassandra and the Datastax Driver? What are the different Cassandra use cases at GumGum? How did we integrate our Cassandra with Spark?
GumGum: Multi-Region Cassandra in AWS
1. CASSANDRA SUMMIT 2015
Mario Lazaro
September 24th 2015
#CassandraSummit2015
MULTI-REGION CASSANDRA IN AWS
2. WHOAMI
Mario Cerdan Lazaro
Big Data Engineer
Born and raised in Spain
Joined GumGum 18 months ago
About a year and a half of experience with Cassandra
3. #5 Ad Platform in the U.S.
10B impressions / month
2,000 brand-safe premium
publisher partners
1B+ global unique visitors
Daily inventory Impressions
processed - 213M
Monthly Image Impressions
processed - 2.6B
123 employees in seven offices
5. 25 Classic nodes cluster
1 Region / 1 Rack hosted in
AWS EC2 US East
Version 2.0.8
Datastax CQL driver
GumGum's metadata including
visitors, images, pages, and ad
performance
Usage: realtime data access
and analytics (MR jobs)
OLD C* CLUSTER - MARCH 2015
6. OLD C* CLUSTER - REALTIME USE CASE
Billions of rows
Heavy read workload - 60/40 read/write split
TTLs everywhere - Tombstones
Heavy and critical use of counters
RTB - Read Latency constraints
(total execution time ~50 ms)
7. OLD C* CLUSTER - ANALYTICS USE CASE
Daily ETL jobs to extract / join data
from C*
Hadoop MR jobs
Ad-hoc queries with Presto
9. FIRST STEPS
Start C* test datacenters in US
East & EU West and test how C*
multi region works in AWS
Run capacity/performance tests. We expect 3x more traffic in Q4 2015
FIRST THOUGHTS
Use AWS Virtual Private
Cloud (VPC)
Cassandra & VPC present some connectivity challenges
Replicate all data with the same number of replicas in each DC
10. TOO GOOD TO BE TRUE ...
11. CHALLENGES
Problems between Cassandra in EC2
Classic / VPC and Datastax Java
driver
EC2MultiRegionSnitch uses public IPs, but EC2-Classic instances have no network interface bound to their public IP address - instances in the same region cannot connect to each other over public IPs
import java.net.InetSocketAddress;
import java.util.List;

import javax.annotation.PostConstruct;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.amazonaws.services.ec2.model.Instance;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.AddressTranslater;

/**
 * Implementation of {@link AddressTranslater} used by the driver to
 * translate external (public) IPs to internal (private) IPs.
 * @author Mario <mario@gumgum.com>
 */
public class Ec2ClassicTranslater implements AddressTranslater {
    private static final Logger LOGGER = LoggerFactory.getLogger(Ec2ClassicTranslater.class);

    private ClusterService clusterService; // internal GumGum service that lists the cluster's EC2 instances
    private Cluster cluster;
    private List<Instance> publicDnss;

    @PostConstruct
    public void build() {
        publicDnss = clusterService.getInstances(cluster);
    }

    /**
     * Translates a Cassandra {@code rpc_address} to another address if applicable.
     *
     * @param address the address of a node as returned by Cassandra.
     * @return the private IP address of the matching EC2 instance, or {@code null} if none matches.
     */
    public InetSocketAddress translate(InetSocketAddress address) {
        for (final Instance server : publicDnss) {
            if (server.getPublicIpAddress().equals(address.getHostString())) {
                LOGGER.info("IP address: {} translated to {}", address.getHostString(), server.getPrivateIpAddress());
                return new InetSocketAddress(server.getPrivateIpAddress(), address.getPort());
            }
        }
        return null;
    }

    public void setClusterService(ClusterService clusterService) {
        this.clusterService = clusterService;
    }

    public void setCluster(Cluster cluster) {
        this.cluster = cluster;
    }
}
Region to Region connectivity will use
public IPs - Trust those IPs or use
software/hardware VPN
Your application needs to connect to
C* using private IPs - Custom EC2
translator
12. Datastax Java Driver Load Balancing
Multiple choices
DCAware + TokenAware + ?
CHALLENGES
“ Clients in one AZ attempt to always communicate with C* nodes in the same AZ. We call this zone-aware connections. This feature is built into Astyanax, Netflix's C* Java client library.
13. Zone Aware Connection:
Webapps in 3 different AZs: 1A, 1B, and 1C
C* datacenter spanning 3 AZs with 3 replicas
CHALLENGES
[Diagram: webapps in AZs 1A, 1B, and 1C, each connecting to the C* replica in its own AZ]
14. We added it! - Rack/AZ awareness to
TokenAware Policy
CHALLENGES
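GumGum's rack/AZ-aware policy is custom code and is not reproduced here; below is a minimal sketch, assuming the 2.1 Datastax Java driver, of the standard pieces it builds on (DCAwareRoundRobinPolicy wrapped in TokenAwarePolicy) wired together with the EC2 address translator from the earlier slide. The contact points and local DC name are illustrative.
// Sketch only - standard DC-aware + token-aware load balancing plus the
// custom address translator; IPs and DC name below are illustrative.
Cluster cluster = Cluster.builder()
    .addContactPoints("10.0.0.10", "10.0.0.11")         // private IPs of local C* nodes
    .withLoadBalancingPolicy(
        new TokenAwarePolicy(new DCAwareRoundRobinPolicy("us-east-va-realtime")))
    .withAddressTranslater(new Ec2ClassicTranslater())  // translator shown above, wired via Spring in practice
    .build();
Session session = cluster.connect();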
15. CHALLENGES
Third Datacenter: Analytics
Do not impact realtime data access
Spark on top of Cassandra
Spark-Cassandra Datastax connector
Replicate specific keyspaces
Fewer nodes with larger disk space
Some settings differ, e.g. bloom_filter_fp_chance
17. CHALLENGES
Upgrade from 2.0.8 to 2.1.5
Counters implementation is buggy in
pre-2.1 versions
“ My code never has bugs. It just develops random
unexpected features
18. CHALLENGES
“ To choose, or not to choose VNodes. That is the question.
(M. Lazaro, 1990 - 2500)
Previous DC using Classic Nodes
Works with MR jobs
Complexity when adding/removing nodes
Token ranges must be managed manually
New DCs will use VNodes
Apache Spark + Spark Cassandra Datastax
connector
Easy to add/remove new nodes as traffic
increases
20. TESTING
Testing requires creating and modifying
many C* nodes
Creating and configuring a C* cluster is a time-consuming, repetitive task
Create fully automated process for
creating/modifying/destroying
Cassandra clusters with Ansible
# Ansible settings for provisioning the EC2 instance
---
ec2_instance_type: r3.2xlarge
ec2_count:
- 0 # How many in us-east-1a ?
- 7 # How many in us-east-1b ?
ec2_vpc_subnet:
- undefined
- subnet-c51241b2
- undefined
- subnet-80f085d9
- subnet-f9138cd2
ec2_sg:
- va-ops
- va-cassandra-realtime-private
21. TESTING - PERFORMANCE
Performance tests using new
Cassandra 2.1 Stress Tool:
Recreate GumGum metadata /
schemas
Recreate workload and make
it 3 times bigger
Try to find limits / Saturate
clients
# Keyspace Name
keyspace: stresscql
#keyspace_definition: |
#  CREATE KEYSPACE stresscql WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

### Column Distribution Specifications ###
columnspec:
  - name: visitor_id
    size: gaussian(32..32)        # visitor ids are a fixed 32 characters
    population: uniform(1..999M)  # ~1B possible visitors to pick from
  - name: bidder_code
    cluster: fixed(5)
  - name: bluekai_category_id
  - name: bidder_custom
    size: fixed(32)
  - name: bidder_id
    size: fixed(32)
  - name: bluekai_id
    size: fixed(32)
  - name: dt_pd
  - name: rt_exp_dt
  - name: rt_opt_out

### Batch Ratio Distribution Specifications ###
insert:
  partitions: fixed(1)   # Our partition key is the visitor_id
  select: fixed(1)/5     # 5 bidder_code rows per partition
  batchtype: UNLOGGED    # Unlogged batches

#
# A list of queries you wish to run against the schema
#
22. TESTING - PERFORMANCE
Main worry:
Latency and replication overseas
Use LOCAL_X consistency levels in your client
Only one C* node (the local coordinator) contacts a single node in each remote DC to forward replicas/mutations
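As an illustration of the LOCAL_X point, and assuming the 2.1 Java driver, a read and a write pinned to the local DC might look like this (table and column names are only illustrative, borrowed from the stress profile above):
// LOCAL_ONE read: satisfied entirely inside the local DC, so cross-region
// latency never lands on the RTB path.
Statement read = new SimpleStatement(
        "SELECT * FROM visitors WHERE visitor_id = ?", visitorId)
    .setConsistencyLevel(ConsistencyLevel.LOCAL_ONE);
session.execute(read);

// LOCAL_QUORUM write: acknowledged by local replicas only; remote DCs
// receive the mutation asynchronously via a single forwarding node.
Statement write = new SimpleStatement(
        "UPDATE visitors SET rt_opt_out = ? WHERE visitor_id = ?", optOut, visitorId)
    .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
session.execute(write);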
24. TESTING - INSTANCE TYPE
Tested all kinds of instance types. We decided to go with r3.2xlarge machines for our cluster:
60 GB RAM
8 Cores
160GB Ephemeral SSD Storage for commit logs and
saved caches
RAID 0 over 4 SSD EBS Volumes for data
Performance / cost and the GumGum use case make r3.2xlarge the best option
Disclosure: I2 instance family is the best if
you can afford it
25. TESTING - UPGRADE
Upgrade C* Datacenter from 2.0.8 to 2.1.5
Both versions can cohabit in the same DC
New settings and features tried
DateTieredCompactionStrategy:
Compaction for Time Series Data
Incremental repairs
New counters architecture
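For example, a TTL'd time-series table could be switched to DateTieredCompactionStrategy after the upgrade; a minimal sketch, with an illustrative table name and thresholds rather than GumGum's actual settings:
// Illustrative only - DTCS on a time-series table after moving to 2.1.
session.execute(
    "ALTER TABLE ad_performance WITH compaction = " +
    "{'class': 'DateTieredCompactionStrategy', " +
    " 'base_time_seconds': 3600, 'max_sstable_age_days': 10}");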
27. MODUS OPERANDI
Sum up
From: One cluster / One DC in US East
To: One cluster / Two DCs in US East and one DC
in EU West
28. MODUS OPERANDI
First step:
Upgrade old cluster snitch from EC2Snitch to
EC2MultiRegionSnitch
Upgrade clients to handle it (aka translators)
Make sure your clients do not lose connection to upgraded C* nodes (DataStax JIRA JAVA-809)
29. MODUS OPERANDI
Second step:
Upgrade old datacenter from 2.0.8 to 2.1.5
nodetool upgradesstables (multiple nodes at a time)
It is not possible to rebuild a 2.1.x C* node from a 2.0.x C* datacenter - nodetool rebuild fails with:
WARN [Thread-12683] 2015-06-17 10:17:22,845 IncomingTcpConnection.java:91 -
UnknownColumnFamilyException reading from socket;
closing org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=XXX
30. MODUS OPERANDI
Third step:
Start EU West and new US East DCs within the same
cluster
Replication factor in new DCs: 0
Use dc_suffix to differentiate new Virginia DC from old one
Clients do not talk to new DCs. Only C* knows they exist
Raise the replication factor to 3 in every new DC except analytics, which gets 1
Start receiving new data
nodetool rebuild <old-datacenter> to stream the old data
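With NetworkTopologyStrategy the replication factor is set per DC, so this step is essentially an ALTER KEYSPACE followed by a rebuild on each new node; a sketch with an illustrative keyspace name and dc_suffix-style DC names (not necessarily GumGum's exact ones):
// Sketch - raise RF to 3 in the realtime DCs and 1 in analytics (RF 3:3:3:1),
// then run "nodetool rebuild <old-datacenter>" on every node in the new DCs.
session.execute(
    "ALTER KEYSPACE visitors WITH replication = " +
    "{'class': 'NetworkTopologyStrategy', " +
    " 'us-east': 3, 'us-east-va-realtime': 3, " +
    " 'eu-west-realtime': 3, 'us-east-va-analytics': 1}");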
31. MODUS OPERANDI
[Diagram: clients keep talking to the original US East DC while its keyspaces go from RF 3:0:0:0 to RF 3:3:3:1; the new US East Realtime, EU West Realtime, and US East Analytics DCs each pull the old data via nodetool rebuild]
36. TIPS - AUTOMATED MAINTENANCE
Maintenance in a multi-region C* cluster:
Ansible + Cassandra maintenance keyspace +
email report = zero human intervention!
CREATE TABLE maintenance.history (
dc text,
op text,
ts timestamp,
ip text,
PRIMARY KEY ((dc, op), ts)
) WITH CLUSTERING ORDER BY (ts ASC) AND
bloom_filter_fp_chance=0.010000 AND
caching='{"keys":"ALL", "rows_per_partition":"NONE"}' AND
comment='' AND
dclocal_read_repair_chance=0.100000 AND
gc_grace_seconds=864000 AND
read_repair_chance=0.000000 AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'LZ4Compressor'};
CREATE INDEX history_kscf_idx ON maintenance.history (kscf);
3-133-65:/opt/scripts/production/groovy$ groovy CassandraMaintenanceCheck.groovy -dc us-east-va-realtime -op compaction -e mari
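A hedged sketch of the kind of record such a maintenance run could write to the history table before the email report goes out (the DC, operation, and IP values are illustrative):
// Illustrative - log a completed maintenance operation for this DC and node.
session.execute(
    "INSERT INTO maintenance.history (dc, op, ts, ip) VALUES (?, ?, ?, ?)",
    "us-east-va-realtime", "compaction", new Date(), "10.0.1.12");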
37. TIPS - SPARK
Number of Spark workers above the total number of C* nodes in the analytics DC
Each worker uses:
1/4 number of cores of each instance
1/3 total available RAM of each instance
Cassandra-Spark connector
spanBy
.joinWithCassandraTable(keyspace, table)
spark.cassandra.output.batch.size.bytes
spark.cassandra.output.concurrent.writes
38. val conf = new SparkConf()
.set("spark.cassandra.connection.host", cassandraNodes)
.set("spark.cassandra.connection.local_dc", "us-east-va-analytics")
.set("spark.cassandra.connection.factory", "com.gumgum.spark.bluekai.DirectLinkConnectionFactory")
.set("spark.driver.memory","4g")
.setAppName("Cassandra presidential candidates app")
TIPS - SPARK
Create "translator" if using EC2MultiRegionSnitch
spark.cassandra.connection.factory
39. SINCE C* IN EU WEST ...
US West Datacenter!
[Diagram: EU West DC, US East DC, Analytics DC, and the new US West DC]