MySQL Cluster at PayPal Optimizes Global Data Distribution

<Insert Picture Here> MySQL Cluster @ PayPal

Henrique Leandro
henrique.leandro@oracle.com
Consultor Senior MySQL

Dec-2012

Wednesday, December 12, 12

Big Data is a Big Scam (Most of the Time)

Daniel Austin, PayPal Technical Staff

MySQL Connect Conference Brazil
December 04, 2012 v1.3


Today’s Agenda

Big Myths about Big Data

YESQL: A Counterexample

Q&A

Confidential and Proprietary 3


THE FUNDAMENTAL PROBLEM IN
DISTRIBUTED DATA SYSTEMS
“How Do We Manage Reliable Distribution
of Data Across Geographical Distances?”

Confidential and Proprietary


Big Data Myth #1: Big Data = NoSQL

• ‘Big Data’ Refers to a Common Set of Problems
– Large Volumes
– High Rates of Change
• Of Data
• Of Data Models
• Of Data Presentation and Output
– Often Require ‘Fast Data’ as well as ‘Big’
• Near-real Time Analytics
• Mapping Complex Structures

Takeaway: Big Data is the problem, NoSQL is one
(proposed) solution


The NoSQL Solution

• NoSQL Systems provide a solution that relaxes
many of the common constraints of typical RDBMS
systems
– Slow - RDBMS has not scaled with CPUs
– Often require complex data management (SOX,
SOR)
– Costly to build and maintain, slow to change and
adapt
– Intolerant of CAP models (more on this later)
• Non-relational models, usually key-value
• May be batched or streaming
• Not necessarily distributed geographically



3 Kinds of Big Data Systems

1. Columnar K-V Systems
– Hadoop, Hbase, Cassandra, PNUTs
2. Document-Based
– MongoDB, TerraCotta
3. Graph-Based
– FlockDB, Voldemort

Takeaway: These were originally designed as
solutions to specific problems because no
commercial solution would work.



Big Data Myth #2: The CAP Theorem Doesn’t Say
What You Think It Does

• Consistency, Availability, (Network) Partition
• The Real Story: These are not Independent
Variables
• AP =CP (Um, what? But…A != C )
• Variations:
– PACELC (adds latency tolerance)

Takeaway: the real story here is about the tradeoffs
made by designers of different systems, and the
main tradeoff is between consistency and
availability, usually in favor of the latter.



Big Data Hype Cycle: Where Are We Now?

There are currently more than 120+ NoSQL
databases listed at nosql-databases.com!

You Are Here ?

As the pace of new technology solutions has slowed, some clear winners have emerged.



BIG DATA MYTH #3: BIG DATA AND NOSQL ARE
NEW IDEAS

• The first and most successful
such system is DNS, created in
1983.
• Began with flat files
• Currently serves the entire
Internet (!)
• DNS is an AP system, availability
is #1
• Many extensions complicate a
simple design
• Suggests a new term for CAP-like
ideas: variability
• DNS variability is very high,
often 2-3x the mean



Big Data Myth: You Need A Big Data System

Well, Maybe….But Before You Go There…
There are essentially two ‘Big Data Problems’:
“I have too much data and it’s coming in too fast to
handle with any RDBMS.”

“I have a lot of data distributed geographically and
need to be able to read and write from anywhere in
near real-time.”

Takeaway: if you have one of these Big Data
problems, a NoSQL solution might work for you.
But there are also other alternatives…


Today’s Agenda

Big Myths About Big Data

YESQL: A Counterexample

Q&A



Mission YESQL

“Develop a globally distributed DB For
user-related data.”
• Must Not Fail (99.999%)
• Must Not Lose Data. Period.
• Must Support Transactions
• Must Support (some) SQL
• Must WriteRead 32-bit integer globally in
1000ms
• Maximum Data Volume: 100 TB
• Must Scale Linearly with Costs


What about “High Performance”?

• Maximum lightspeed distance on Earth’s
Surface: ~67 ms
• Target: data available worldwide in < 1000 ms

Sound Easy?

Think Again!



WHY MYSQL CLUSTER?
Pro Con
• True HA by design • Some semantic
– Fast recovery limitations on fields
• Supports (some) X- • Size constraints (2
actions TB?)
• Relational Model Hardware limits also
• In-memory • Higher cost/byte
architecture = high • Requires reasonable
performance data partitioning
• Disk storage for • Higher complexity
non-indexed data
(since 5.1)
• APIs, APIs, APIs


How MySQL Cluster Works in One Slide

Graphics courtesy dev.mysql.com



CIRCULAR REPLICATION/FAILOVER

Graphics courtesy O’Reilly OnLamp.com



• Click to edit Master text styles



AWS Meets MySQL Cluster

• Why AWS?
– Cheap and easy infrastructure-in-a-box
(Or so I thought! Ha!)
• Services Used:
– EC2 (Centos 5.3, small instances for mgm
& query nodes, XL for data
– Elastic IPs/ELB
– EBS Volumes
– S3
– Cloudwatch



ARCHITECTURAL TILES
AWS Availability Zones
Tiling Rules
A B
• Never separate NDB & SQL
• Ndb:2-SQL:1-MGM:1
• Scale by adding more tiles
• Failover 1st to nearest AZ
• Then to nearest DC
• At least 1 replica/AZ
C ELB
• Don’t share nodes
• Mgmt nodes are redundant
Limitations Unused (not present in all locations)
• AWS is network-bound @ 250
MBPS – ouch!
• Need specific ACL across AZ
boundaries
• AZs not uniform!
• No GSLB NDB MGM SQL



Architecture Stack

Scale by Tiling

A B A B A B
A B
A B

A B A B

5 AWS Data Centers:
US-E, US-W, TK, EU, AS



SYSTEM READ/WRITE PERFORMANCE (!)

What we tested:
• 32 & 256 byte char fields In-region replication tests
• Reads, writes, query speed vs. volume
• Data replication speeds
Results:
• Global replication < 350 ms
• 256 byte read < 1000 ms worldwide

06/19/2011 06/20/2011 06/21/2011 06/22/2011 06/23/2011



Data Models and Query Optimization for NDB

• Network Latency is an obvious issue
• Data model requires all segments present
in each geo-region
• Parameterized (Linked) Joins
– SPJ technique from Clustra (see Clement
Frazer’s blog for details)



Hard Lessons, Shared

• Be Careful…
– With “Eventual Consistency”-related concepts
– ACID, CAP are not really as well-defined as we’d like
considering how often we invoke them
• MySQL Cluster is a good solution
– Real HA, real SQL
– Notable limitations around fields, datatypes
– Successfully competes with NoSQL systems for most use
cases – better in many cases
• NoSQL Systems
– All have relatively low levels of maturity
– More suitable for simpler key-value models
– Victim of Tech Fashion



MySQL Enterprise Edition

28

MySQL Enterprise Edition

Mais produtividade, menores riscos e maior capacidade
para o MySQL.

Oracle Premier
Lifetime Support
MySQL Enterprise Oracle Product
Security Certifications/Integrations
MySQL Enterprise MySQL Enterprise
Audit Monitor/Query Analyzer
MySQL Enterprise MySQL Enterprise
Scalability Backup
MySQL Enterprise
MySQL Workbench
High Availability

28

Future Directions

• Alternate solution using Pacemaker,
Heartbeat
– Uses InnoDB, not NDB
• Implement Memcached plugin
– To test NoSQL functionality from MySQL
• Add simple connection-based persistence to
preserve connections during failover
• Better data node distribution
• Better testing & monitoring



Summing Up On YESQL v0.85

• It works! Far better than expected.
• Very fast, very reliable
• Reduced complexity since v0.7
• AWS poses challenges that private data
centers may not experience
• You can achieve high performance and
availability without giving up relational
models and read consistency!



The Big Picture on Big Data

• Only use Big Data solutions when you have a real
Big Data problem.
– Don’t be a Dedicated Follower of Tech Fashion!
• Not all Big Data solutions are created equal
– What tradeoffs are most important to you?
– Consistency, Fault Tolerance, Availability,
Performance, Variability
• Is your data model a fit for NoSQL?
– You don’t have to give up the relational model in
most cases, so don’t!
• You can achieve high performance and availability
without giving up relational models and read
consistency! Just say YESQL!


“In the long run, we are all dead
eventually consistent.”
Maynard Keynes on NoSQL Databases

Twitter: @daniel_b_austin
Emai: daaustin@paypal.com

With apologies and thanks to the real DB experts, Andrew Goodman, Yves
Trudeau, Clement Frazer, Daniel Abadi, Kent Beck, and everyone else who
contributed. It really works!


<Insert Picture Here> Obrigado

Henrique Leandro
Consultor Senior MySQL

Dec-2012


MySQL Cluster at PayPal Optimizes Global Data Distribution

Recommandé

Recommandé

Contenu connexe

Tendances

Tendances (20)

Similaire à MySQL Cluster at PayPal Optimizes Global Data Distribution

Similaire à MySQL Cluster at PayPal Optimizes Global Data Distribution (20)

Plus de MySQL Brasil

Plus de MySQL Brasil (20)

MySQL Cluster at PayPal Optimizes Global Data Distribution