SlideShare une entreprise Scribd logo
1  sur  34
Télécharger pour lire hors ligne
<Insert Picture Here>   MySQL Cluster @ PayPal




                                                      Henrique Leandro
                                           henrique.leandro@oracle.com
                                               Consultor Senior MySQL
                                                  henrique.leandro@oracle.com



                                                                      Dec-2012

Wednesday, December 12, 12
Big Data is a Big Scam (Most of the Time)

      Daniel Austin, PayPal Technical Staff


      MySQL Connect Conference Brazil
      December 04, 2012 v1.3




Wednesday, December 12, 12
Today’s Agenda


                     Big Myths about Big Data


                       YESQL: A Counterexample


                                Q&A



                                          Confidential and Proprietary   3

Wednesday, December 12, 12
THE FUNDAMENTAL PROBLEM IN
       DISTRIBUTED DATA SYSTEMS
       “How Do We Manage Reliable Distribution
       of Data Across Geographical Distances?”




                                     Confidential and Proprietary



Wednesday, December 12, 12
Big Data Myth #1: Big Data = NoSQL


     • ‘Big Data’ Refers to a Common Set of Problems
          – Large Volumes
          – High Rates of Change
             • Of Data
             • Of Data Models
             • Of Data Presentation and Output
          – Often Require ‘Fast Data’ as well as ‘Big’
             • Near-real Time Analytics
             • Mapping Complex Structures


     Takeaway: Big Data is the problem, NoSQL is one
     (proposed) solution
                                                  Confidential and Proprietary



Wednesday, December 12, 12
The NoSQL Solution


     • NoSQL Systems provide a solution that relaxes
       many of the common constraints of typical RDBMS
       systems
          – Slow - RDBMS has not scaled with CPUs
          – Often require complex data management (SOX,
            SOR)
          – Costly to build and maintain, slow to change and
            adapt
          – Intolerant of CAP models (more on this later)
     • Non-relational models, usually key-value
     • May be batched or streaming
     • Not necessarily distributed geographically

                                                Confidential and Proprietary



Wednesday, December 12, 12
3 Kinds of Big Data Systems


     1. Columnar K-V Systems
          – Hadoop, Hbase, Cassandra, PNUTs
     2. Document-Based
          – MongoDB, TerraCotta
     3. Graph-Based
          – FlockDB, Voldemort

     Takeaway: These were originally designed as
     solutions to specific problems because no
     commercial solution would work.

                                        Confidential and Proprietary



Wednesday, December 12, 12
Big Data Myth #2: The CAP Theorem Doesn’t Say
  What You Think It Does


     • Consistency, Availability, (Network) Partition
     • The Real Story: These are not Independent
       Variables
     • AP =CP (Um, what? But…A != C )
     • Variations:
          – PACELC (adds latency tolerance)


     Takeaway: the real story here is about the tradeoffs
     made by designers of different systems, and the
     main tradeoff is between consistency and
     availability, usually in favor of the latter.

                                              Confidential and Proprietary



Wednesday, December 12, 12
Big Data Hype Cycle: Where Are We Now?

     There are currently more than 120+ NoSQL
     databases listed at nosql-databases.com!



                                        You Are Here ?




             As the pace of new technology solutions has slowed, some clear winners have emerged.


                                                                            Confidential and Proprietary



Wednesday, December 12, 12
BIG DATA MYTH #3: BIG DATA AND NOSQL ARE
       NEW IDEAS

       •    The first and most successful
            such system is DNS, created in
            1983.
       •    Began with flat files
       •    Currently serves the entire
            Internet (!)
       •    DNS is an AP system, availability
            is #1
       •    Many extensions complicate a
            simple design
       •    Suggests a new term for CAP-like
            ideas: variability
             • DNS variability is very high,
                often 2-3x the mean


                                                Confidential and Proprietary



Wednesday, December 12, 12
Big Data Myth: You Need A Big Data System


     Well, Maybe….But Before You Go There…
     There are essentially two ‘Big Data Problems’:
     “I have too much data and it’s coming in too fast to
     handle with any RDBMS.”

     “I have a lot of data distributed geographically and
     need to be able to read and write from anywhere in
     near real-time.”

     Takeaway: if you have one of these Big Data
     problems, a NoSQL solution might work for you.
     But there are also other alternatives…
                                            Confidential and Proprietary



Wednesday, December 12, 12
Today’s Agenda


                        Big Myths About Big Data


                   YESQL: A Counterexample


                                 Q&A



                                            Confidential and Proprietary   12

Wednesday, December 12, 12
Mission YESQL

     “Develop a globally distributed DB For
     user-related data.”
     • Must Not Fail (99.999%)
     • Must Not Lose Data. Period.
     • Must Support Transactions
     • Must Support (some) SQL
     • Must WriteRead 32-bit integer globally in
       1000ms
     • Maximum Data Volume: 100 TB
     • Must Scale Linearly with Costs
                                      Confidential and Proprietary



Wednesday, December 12, 12
What about “High Performance”?

       • Maximum lightspeed distance on Earth’s
         Surface: ~67 ms
       • Target: data available worldwide in < 1000 ms

                             Sound Easy?

                             Think Again!




                                            Confidential and Proprietary



Wednesday, December 12, 12
WHY MYSQL CLUSTER?
                     Pro                       Con
       •    True HA by design     •   Some semantic
             – Fast recovery          limitations on fields
       •    Supports (some) X-    •   Size constraints (2
            actions                   TB?)
       •    Relational Model          Hardware limits also
       •    In-memory             •   Higher cost/byte
            architecture = high   •   Requires reasonable
            performance               data partitioning
       •    Disk storage for      •   Higher complexity
            non-indexed data
             (since 5.1)
        •   APIs, APIs, APIs
                                              Confidential and Proprietary



Wednesday, December 12, 12
How MySQL Cluster Works in One Slide




           Graphics courtesy dev.mysql.com

                                             Confidential and Proprietary



Wednesday, December 12, 12
Confidential and Proprietary   17

Wednesday, December 12, 12
CIRCULAR REPLICATION/FAILOVER




       Graphics courtesy O’Reilly OnLamp.com


                                               Confidential and Proprietary



Wednesday, December 12, 12
•   Click to edit Master text styles




                                                Confidential and Proprietary   19

Wednesday, December 12, 12
AWS Meets MySQL Cluster

     • Why AWS?
          – Cheap and easy infrastructure-in-a-box
             (Or so I thought! Ha!)
     • Services Used:
          – EC2 (Centos 5.3, small instances for mgm
            & query nodes, XL for data
          – Elastic IPs/ELB
          – EBS Volumes
          – S3
          – Cloudwatch

                                          Confidential and Proprietary



Wednesday, December 12, 12
ARCHITECTURAL TILES
                                                AWS Availability Zones
       Tiling Rules
                                       A                        B
       • Never separate NDB & SQL
       • Ndb:2-SQL:1-MGM:1
       • Scale by adding more tiles
       • Failover 1st to nearest AZ
       • Then to nearest DC
       • At least 1 replica/AZ
                                       C                 ELB
       • Don’t share nodes
       • Mgmt nodes are redundant
       Limitations                         Unused (not present in all locations)
       • AWS is network-bound @ 250
          MBPS – ouch!
       • Need specific ACL across AZ
          boundaries
       • AZs not uniform!
       • No GSLB                             NDB               MGM                 SQL




                                                        Confidential and Proprietary



Wednesday, December 12, 12
Architecture Stack

          Scale by Tiling




                             A           B               A   B       A        B
                                 A           B
                                     A           B




                                                 A   B           A   B




                                                                          5 AWS Data Centers:
                                                                         US-E, US-W, TK, EU, AS

                                                                             Confidential and Proprietary



Wednesday, December 12, 12
SYSTEM READ/WRITE PERFORMANCE (!)

       What we tested:
       • 32 & 256 byte char fields                                            In-region replication tests
       • Reads, writes, query speed vs. volume
       • Data replication speeds
       Results:
       • Global replication < 350 ms
       • 256 byte read < 1000 ms worldwide




                             06/19/2011 06/20/2011 06/21/2011 06/22/2011 06/23/2011

                                                                        Confidential and Proprietary



Wednesday, December 12, 12
Data Models and Query Optimization for NDB


     • Network Latency is an obvious issue
     • Data model requires all segments present
       in each geo-region
     • Parameterized (Linked) Joins
          – SPJ technique from Clustra (see Clement
            Frazer’s blog for details)




                                         Confidential and Proprietary



Wednesday, December 12, 12
Hard Lessons, Shared

     • Be Careful…
        – With “Eventual Consistency”-related concepts
        – ACID, CAP are not really as well-defined as we’d like
          considering how often we invoke them
     • MySQL Cluster is a good solution
        – Real HA, real SQL
        – Notable limitations around fields, datatypes
        – Successfully competes with NoSQL systems for most use
          cases – better in many cases
     • NoSQL Systems
        – All have relatively low levels of maturity
        – More suitable for simpler key-value models
        – Victim of Tech Fashion


                                                Confidential and Proprietary



Wednesday, December 12, 12
Confidential and Proprietary   26

Wednesday, December 12, 12
Confidential and Proprietary   27

Wednesday, December 12, 12
MySQL Enterprise Edition




                             28
Wednesday, December 12, 12
MySQL Enterprise Edition

        Mais produtividade, menores riscos e maior capacidade
        para o MySQL.


                                             Oracle Premier
                                            Lifetime Support
                             MySQL Enterprise                 Oracle Product
                                 Security                Certifications/Integrations
                    MySQL Enterprise                                MySQL Enterprise
                             Audit                               Monitor/Query Analyzer
              MySQL Enterprise                                           MySQL Enterprise
                   Scalability                                                 Backup
          MySQL Enterprise
                                                                            MySQL Workbench
            High Availability



                                                    28
Wednesday, December 12, 12
Future Directions

     • Alternate solution using Pacemaker,
       Heartbeat
          – Uses InnoDB, not NDB
     • Implement Memcached plugin
          – To test NoSQL functionality from MySQL
     • Add simple connection-based persistence to
       preserve connections during failover
     • Better data node distribution
     • Better testing & monitoring


                                      Confidential and Proprietary



Wednesday, December 12, 12
Summing Up On YESQL v0.85


     • It works! Far better than expected.
     • Very fast, very reliable
     • Reduced complexity since v0.7
     • AWS poses challenges that private data
       centers may not experience
     • You can achieve high performance and
       availability without giving up relational
       models and read consistency!


                                       Confidential and Proprietary



Wednesday, December 12, 12
The Big Picture on Big Data

     • Only use Big Data solutions when you have a real
       Big Data problem.
          – Don’t be a Dedicated Follower of Tech Fashion!
     • Not all Big Data solutions are created equal
          – What tradeoffs are most important to you?
          – Consistency, Fault Tolerance, Availability,
            Performance, Variability
     • Is your data model a fit for NoSQL?
          – You don’t have to give up the relational model in
            most cases, so don’t!
     • You can achieve high performance and availability
       without giving up relational models and read
       consistency! Just say YESQL!
                                                 Confidential and Proprietary



Wednesday, December 12, 12
“In the long run, we are all dead
      eventually consistent.”
      Maynard Keynes on NoSQL Databases


      Twitter: @daniel_b_austin
      Emai: daaustin@paypal.com




      With apologies and thanks to the real DB experts, Andrew Goodman, Yves
      Trudeau, Clement Frazer, Daniel Abadi, Kent Beck, and everyone else who
      contributed. It really works!




Wednesday, December 12, 12
<Insert Picture Here>   Obrigado




                                                   Henrique Leandro
                                        henrique.leandro@oracle.com
                                            Consultor Senior MySQL
                                               henrique.leandro@oracle.com



                                                                   Dec-2012

Wednesday, December 12, 12

Contenu connexe

Tendances

Apache Cassandra @Geneva JUG 2013.02.26
Apache Cassandra @Geneva JUG 2013.02.26Apache Cassandra @Geneva JUG 2013.02.26
Apache Cassandra @Geneva JUG 2013.02.26Benoit Perroud
 
Webinar: The Future of SQL
Webinar: The Future of SQLWebinar: The Future of SQL
Webinar: The Future of SQLCrate.io
 
SQL/NoSQL How to choose ?
SQL/NoSQL How to choose ?SQL/NoSQL How to choose ?
SQL/NoSQL How to choose ?Venu Anuganti
 
Minnebar 2013 - Scaling with Cassandra
Minnebar 2013 - Scaling with CassandraMinnebar 2013 - Scaling with Cassandra
Minnebar 2013 - Scaling with CassandraJeff Bollinger
 
Sql vs NO-SQL database differences explained
Sql vs NO-SQL database differences explainedSql vs NO-SQL database differences explained
Sql vs NO-SQL database differences explainedSatya Pal
 
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,..."Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...lisapaglia
 
Hybrid my sql_hadoop_datawarehouse
Hybrid my sql_hadoop_datawarehouseHybrid my sql_hadoop_datawarehouse
Hybrid my sql_hadoop_datawarehouseLaine Campbell
 
NoSQL A brief look at Apache Cassandra Distributed Database
NoSQL A brief look at Apache Cassandra Distributed DatabaseNoSQL A brief look at Apache Cassandra Distributed Database
NoSQL A brief look at Apache Cassandra Distributed DatabaseJoe Alex
 
Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databasesjbellis
 
What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web Applications
What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web ApplicationsWhat Should I Do? Choosing SQL, NoSQL or Both for Scalable Web Applications
What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web ApplicationsTodd Hoff
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQLDon Demcsak
 
M6d cassandrapresentation
M6d cassandrapresentationM6d cassandrapresentation
M6d cassandrapresentationEdward Capriolo
 
MySQL NDB Cluster 101
MySQL NDB Cluster 101MySQL NDB Cluster 101
MySQL NDB Cluster 101Bernd Ocklin
 
Things Every Oracle DBA Needs to Know About the Hadoop Ecosystem (c17lv version)
Things Every Oracle DBA Needs to Know About the Hadoop Ecosystem (c17lv version)Things Every Oracle DBA Needs to Know About the Hadoop Ecosystem (c17lv version)
Things Every Oracle DBA Needs to Know About the Hadoop Ecosystem (c17lv version)Zohar Elkayam
 
Chapter1: NoSQL: It’s about making intelligent choices
Chapter1: NoSQL: It’s about making intelligent choicesChapter1: NoSQL: It’s about making intelligent choices
Chapter1: NoSQL: It’s about making intelligent choicesMaynooth University
 
NoSQL-Database-Concepts
NoSQL-Database-ConceptsNoSQL-Database-Concepts
NoSQL-Database-ConceptsBhaskar Gunda
 
Cassandra and Spark
Cassandra and SparkCassandra and Spark
Cassandra and Sparknickmbailey
 
Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12mark madsen
 

Tendances (20)

Apache Cassandra @Geneva JUG 2013.02.26
Apache Cassandra @Geneva JUG 2013.02.26Apache Cassandra @Geneva JUG 2013.02.26
Apache Cassandra @Geneva JUG 2013.02.26
 
Webinar: The Future of SQL
Webinar: The Future of SQLWebinar: The Future of SQL
Webinar: The Future of SQL
 
SQL/NoSQL How to choose ?
SQL/NoSQL How to choose ?SQL/NoSQL How to choose ?
SQL/NoSQL How to choose ?
 
RDBMS vs NoSQL
RDBMS vs NoSQLRDBMS vs NoSQL
RDBMS vs NoSQL
 
Minnebar 2013 - Scaling with Cassandra
Minnebar 2013 - Scaling with CassandraMinnebar 2013 - Scaling with Cassandra
Minnebar 2013 - Scaling with Cassandra
 
Sql vs NO-SQL database differences explained
Sql vs NO-SQL database differences explainedSql vs NO-SQL database differences explained
Sql vs NO-SQL database differences explained
 
NoSQL
NoSQLNoSQL
NoSQL
 
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,..."Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...
 
Hybrid my sql_hadoop_datawarehouse
Hybrid my sql_hadoop_datawarehouseHybrid my sql_hadoop_datawarehouse
Hybrid my sql_hadoop_datawarehouse
 
NoSQL A brief look at Apache Cassandra Distributed Database
NoSQL A brief look at Apache Cassandra Distributed DatabaseNoSQL A brief look at Apache Cassandra Distributed Database
NoSQL A brief look at Apache Cassandra Distributed Database
 
Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databases
 
What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web Applications
What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web ApplicationsWhat Should I Do? Choosing SQL, NoSQL or Both for Scalable Web Applications
What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web Applications
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQL
 
M6d cassandrapresentation
M6d cassandrapresentationM6d cassandrapresentation
M6d cassandrapresentation
 
MySQL NDB Cluster 101
MySQL NDB Cluster 101MySQL NDB Cluster 101
MySQL NDB Cluster 101
 
Things Every Oracle DBA Needs to Know About the Hadoop Ecosystem (c17lv version)
Things Every Oracle DBA Needs to Know About the Hadoop Ecosystem (c17lv version)Things Every Oracle DBA Needs to Know About the Hadoop Ecosystem (c17lv version)
Things Every Oracle DBA Needs to Know About the Hadoop Ecosystem (c17lv version)
 
Chapter1: NoSQL: It’s about making intelligent choices
Chapter1: NoSQL: It’s about making intelligent choicesChapter1: NoSQL: It’s about making intelligent choices
Chapter1: NoSQL: It’s about making intelligent choices
 
NoSQL-Database-Concepts
NoSQL-Database-ConceptsNoSQL-Database-Concepts
NoSQL-Database-Concepts
 
Cassandra and Spark
Cassandra and SparkCassandra and Spark
Cassandra and Spark
 
Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12
 

Similaire à MySQL Cluster at PayPal Optimizes Global Data Distribution

PayPal Big Data and MySQL Cluster
PayPal Big Data and MySQL ClusterPayPal Big Data and MySQL Cluster
PayPal Big Data and MySQL ClusterMat Keep
 
Yes sql08 inmemorydb
Yes sql08 inmemorydbYes sql08 inmemorydb
Yes sql08 inmemorydbDaniel Austin
 
Big Data - architectural concerns for the new age
Big Data - architectural concerns for the new ageBig Data - architectural concerns for the new age
Big Data - architectural concerns for the new ageDebasish Ghosh
 
Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)
Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)
Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)Daniel Austin
 
A Global In-memory Data System for MySQL
A Global In-memory Data System for MySQLA Global In-memory Data System for MySQL
A Global In-memory Data System for MySQLDaniel Austin
 
Morning with MongoDB Paris 2012 - Accueil et Introductions
Morning with MongoDB Paris 2012 - Accueil et IntroductionsMorning with MongoDB Paris 2012 - Accueil et Introductions
Morning with MongoDB Paris 2012 - Accueil et IntroductionsMongoDB
 
The Rules of Scalable database
The Rules of Scalable databaseThe Rules of Scalable database
The Rules of Scalable databaseDahui Feng
 
SQL? NoSQL? NewSQL?!? What’s a Java developer to do? - JDC2012 Cairo, Egypt
SQL? NoSQL? NewSQL?!? What’s a Java developer to do? - JDC2012 Cairo, EgyptSQL? NoSQL? NewSQL?!? What’s a Java developer to do? - JDC2012 Cairo, Egypt
SQL? NoSQL? NewSQL?!? What’s a Java developer to do? - JDC2012 Cairo, EgyptChris Richardson
 
Where Does Big Data Meet Big Database - QCon 2012
Where Does Big Data Meet Big Database - QCon 2012Where Does Big Data Meet Big Database - QCon 2012
Where Does Big Data Meet Big Database - QCon 2012Ben Stopford
 
Navigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesNavigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesshnkr_rmchndrn
 
A Morning with MongoDB Barcelona: Introduction
A Morning with MongoDB Barcelona: IntroductionA Morning with MongoDB Barcelona: Introduction
A Morning with MongoDB Barcelona: IntroductionMongoDB
 
Intro to NoSQL and MongoDB
 Intro to NoSQL and MongoDB Intro to NoSQL and MongoDB
Intro to NoSQL and MongoDBMongoDB
 
Nashville analytics summit aug9 no sql mike king dell v1.5
Nashville analytics summit aug9 no sql mike king dell v1.5Nashville analytics summit aug9 no sql mike king dell v1.5
Nashville analytics summit aug9 no sql mike king dell v1.5Mike King
 
NoSQL and NewSQL: Tradeoffs between Scalable Performance & Consistency
NoSQL and NewSQL: Tradeoffs between Scalable Performance & ConsistencyNoSQL and NewSQL: Tradeoffs between Scalable Performance & Consistency
NoSQL and NewSQL: Tradeoffs between Scalable Performance & ConsistencyScyllaDB
 
1. Lecture1_NOSQL_Introduction.pdf
1. Lecture1_NOSQL_Introduction.pdf1. Lecture1_NOSQL_Introduction.pdf
1. Lecture1_NOSQL_Introduction.pdfShaimaaMohamedGalal
 
Solr cloud the 'search first' nosql database extended deep dive
Solr cloud the 'search first' nosql database   extended deep diveSolr cloud the 'search first' nosql database   extended deep dive
Solr cloud the 'search first' nosql database extended deep divelucenerevolution
 
MySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion QueriesMySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion QueriesBernd Ocklin
 

Similaire à MySQL Cluster at PayPal Optimizes Global Data Distribution (20)

PayPal Big Data and MySQL Cluster
PayPal Big Data and MySQL ClusterPayPal Big Data and MySQL Cluster
PayPal Big Data and MySQL Cluster
 
Yes sql08 inmemorydb
Yes sql08 inmemorydbYes sql08 inmemorydb
Yes sql08 inmemorydb
 
Big Data - architectural concerns for the new age
Big Data - architectural concerns for the new ageBig Data - architectural concerns for the new age
Big Data - architectural concerns for the new age
 
Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)
Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)
Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)
 
A Global In-memory Data System for MySQL
A Global In-memory Data System for MySQLA Global In-memory Data System for MySQL
A Global In-memory Data System for MySQL
 
Morning with MongoDB Paris 2012 - Accueil et Introductions
Morning with MongoDB Paris 2012 - Accueil et IntroductionsMorning with MongoDB Paris 2012 - Accueil et Introductions
Morning with MongoDB Paris 2012 - Accueil et Introductions
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
The Rules of Scalable database
The Rules of Scalable databaseThe Rules of Scalable database
The Rules of Scalable database
 
SQL? NoSQL? NewSQL?!? What’s a Java developer to do? - JDC2012 Cairo, Egypt
SQL? NoSQL? NewSQL?!? What’s a Java developer to do? - JDC2012 Cairo, EgyptSQL? NoSQL? NewSQL?!? What’s a Java developer to do? - JDC2012 Cairo, Egypt
SQL? NoSQL? NewSQL?!? What’s a Java developer to do? - JDC2012 Cairo, Egypt
 
Where Does Big Data Meet Big Database - QCon 2012
Where Does Big Data Meet Big Database - QCon 2012Where Does Big Data Meet Big Database - QCon 2012
Where Does Big Data Meet Big Database - QCon 2012
 
Navigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesNavigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skies
 
A Morning with MongoDB Barcelona: Introduction
A Morning with MongoDB Barcelona: IntroductionA Morning with MongoDB Barcelona: Introduction
A Morning with MongoDB Barcelona: Introduction
 
Intro to NoSQL and MongoDB
 Intro to NoSQL and MongoDB Intro to NoSQL and MongoDB
Intro to NoSQL and MongoDB
 
Nashville analytics summit aug9 no sql mike king dell v1.5
Nashville analytics summit aug9 no sql mike king dell v1.5Nashville analytics summit aug9 no sql mike king dell v1.5
Nashville analytics summit aug9 no sql mike king dell v1.5
 
How and when to use NoSQL
How and when to use NoSQLHow and when to use NoSQL
How and when to use NoSQL
 
NoSQL and NewSQL: Tradeoffs between Scalable Performance & Consistency
NoSQL and NewSQL: Tradeoffs between Scalable Performance & ConsistencyNoSQL and NewSQL: Tradeoffs between Scalable Performance & Consistency
NoSQL and NewSQL: Tradeoffs between Scalable Performance & Consistency
 
1. Lecture1_NOSQL_Introduction.pdf
1. Lecture1_NOSQL_Introduction.pdf1. Lecture1_NOSQL_Introduction.pdf
1. Lecture1_NOSQL_Introduction.pdf
 
Solr cloud the 'search first' nosql database extended deep dive
Solr cloud the 'search first' nosql database   extended deep diveSolr cloud the 'search first' nosql database   extended deep dive
Solr cloud the 'search first' nosql database extended deep dive
 
MySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion QueriesMySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion Queries
 
No sql
No sqlNo sql
No sql
 

Plus de MySQL Brasil

MySQL como Document Store PHP Conference 2017
MySQL como Document Store PHP Conference 2017MySQL como Document Store PHP Conference 2017
MySQL como Document Store PHP Conference 2017MySQL Brasil
 
MySQL no Paypal Tesla e Uber
MySQL no Paypal Tesla e UberMySQL no Paypal Tesla e Uber
MySQL no Paypal Tesla e UberMySQL Brasil
 
Alta disponibilidade com MySQL Enterprise
Alta disponibilidade com MySQL EnterpriseAlta disponibilidade com MySQL Enterprise
Alta disponibilidade com MySQL EnterpriseMySQL Brasil
 
MySQL Roadmap NoSQL HA Fev17
MySQL Roadmap NoSQL HA Fev17MySQL Roadmap NoSQL HA Fev17
MySQL Roadmap NoSQL HA Fev17MySQL Brasil
 
Segurança no MySQL
Segurança no MySQLSegurança no MySQL
Segurança no MySQLMySQL Brasil
 
5 razões estratégicas para usar MySQL
5 razões estratégicas para usar MySQL5 razões estratégicas para usar MySQL
5 razões estratégicas para usar MySQLMySQL Brasil
 
Alta disponibilidade no MySQL 5.7 GUOB 2016
Alta disponibilidade no MySQL 5.7 GUOB 2016Alta disponibilidade no MySQL 5.7 GUOB 2016
Alta disponibilidade no MySQL 5.7 GUOB 2016MySQL Brasil
 
MySQL 5.7 como Document Store
MySQL 5.7 como Document StoreMySQL 5.7 como Document Store
MySQL 5.7 como Document StoreMySQL Brasil
 
Enabling digital transformation with MySQL
Enabling digital transformation with MySQLEnabling digital transformation with MySQL
Enabling digital transformation with MySQLMySQL Brasil
 
Alta Disponibilidade no MySQL 5.7 para aplicações em PHP
Alta Disponibilidade no MySQL 5.7 para aplicações em PHPAlta Disponibilidade no MySQL 5.7 para aplicações em PHP
Alta Disponibilidade no MySQL 5.7 para aplicações em PHPMySQL Brasil
 
Alta Disponibilidade no MySQL 5.7
Alta Disponibilidade no MySQL 5.7Alta Disponibilidade no MySQL 5.7
Alta Disponibilidade no MySQL 5.7MySQL Brasil
 
NoSQL no MySQL 5.7
NoSQL no MySQL 5.7NoSQL no MySQL 5.7
NoSQL no MySQL 5.7MySQL Brasil
 
10 Razões para Usar MySQL em Startups
10 Razões para Usar MySQL em Startups10 Razões para Usar MySQL em Startups
10 Razões para Usar MySQL em StartupsMySQL Brasil
 
Novidades do MySQL para desenvolvedores ago15
Novidades do MySQL para desenvolvedores ago15Novidades do MySQL para desenvolvedores ago15
Novidades do MySQL para desenvolvedores ago15MySQL Brasil
 
Estratégias de Segurança e Gerenciamento para MySQL
Estratégias de Segurança e Gerenciamento para MySQLEstratégias de Segurança e Gerenciamento para MySQL
Estratégias de Segurança e Gerenciamento para MySQLMySQL Brasil
 
Novidades do Universo MySQL julho-15
Novidades do Universo MySQL julho-15Novidades do Universo MySQL julho-15
Novidades do Universo MySQL julho-15MySQL Brasil
 
Serviços Escaláveis e de Alta Performance com MySQL e Java
Serviços Escaláveis e de Alta Performance com MySQL e JavaServiços Escaláveis e de Alta Performance com MySQL e Java
Serviços Escaláveis e de Alta Performance com MySQL e JavaMySQL Brasil
 
MySQL The State of the Dolphin - jun15
MySQL The State of the Dolphin - jun15MySQL The State of the Dolphin - jun15
MySQL The State of the Dolphin - jun15MySQL Brasil
 

Plus de MySQL Brasil (20)

MySQL como Document Store PHP Conference 2017
MySQL como Document Store PHP Conference 2017MySQL como Document Store PHP Conference 2017
MySQL como Document Store PHP Conference 2017
 
MySQL no Paypal Tesla e Uber
MySQL no Paypal Tesla e UberMySQL no Paypal Tesla e Uber
MySQL no Paypal Tesla e Uber
 
MySQL 8.0.1 DMR
MySQL 8.0.1 DMRMySQL 8.0.1 DMR
MySQL 8.0.1 DMR
 
Alta disponibilidade com MySQL Enterprise
Alta disponibilidade com MySQL EnterpriseAlta disponibilidade com MySQL Enterprise
Alta disponibilidade com MySQL Enterprise
 
MySQL Roadmap NoSQL HA Fev17
MySQL Roadmap NoSQL HA Fev17MySQL Roadmap NoSQL HA Fev17
MySQL Roadmap NoSQL HA Fev17
 
Segurança no MySQL
Segurança no MySQLSegurança no MySQL
Segurança no MySQL
 
5 razões estratégicas para usar MySQL
5 razões estratégicas para usar MySQL5 razões estratégicas para usar MySQL
5 razões estratégicas para usar MySQL
 
Alta disponibilidade no MySQL 5.7 GUOB 2016
Alta disponibilidade no MySQL 5.7 GUOB 2016Alta disponibilidade no MySQL 5.7 GUOB 2016
Alta disponibilidade no MySQL 5.7 GUOB 2016
 
MySQL 5.7 como Document Store
MySQL 5.7 como Document StoreMySQL 5.7 como Document Store
MySQL 5.7 como Document Store
 
Enabling digital transformation with MySQL
Enabling digital transformation with MySQLEnabling digital transformation with MySQL
Enabling digital transformation with MySQL
 
Alta Disponibilidade no MySQL 5.7 para aplicações em PHP
Alta Disponibilidade no MySQL 5.7 para aplicações em PHPAlta Disponibilidade no MySQL 5.7 para aplicações em PHP
Alta Disponibilidade no MySQL 5.7 para aplicações em PHP
 
Alta Disponibilidade no MySQL 5.7
Alta Disponibilidade no MySQL 5.7Alta Disponibilidade no MySQL 5.7
Alta Disponibilidade no MySQL 5.7
 
NoSQL no MySQL 5.7
NoSQL no MySQL 5.7NoSQL no MySQL 5.7
NoSQL no MySQL 5.7
 
OpenStack & MySQL
OpenStack & MySQLOpenStack & MySQL
OpenStack & MySQL
 
10 Razões para Usar MySQL em Startups
10 Razões para Usar MySQL em Startups10 Razões para Usar MySQL em Startups
10 Razões para Usar MySQL em Startups
 
Novidades do MySQL para desenvolvedores ago15
Novidades do MySQL para desenvolvedores ago15Novidades do MySQL para desenvolvedores ago15
Novidades do MySQL para desenvolvedores ago15
 
Estratégias de Segurança e Gerenciamento para MySQL
Estratégias de Segurança e Gerenciamento para MySQLEstratégias de Segurança e Gerenciamento para MySQL
Estratégias de Segurança e Gerenciamento para MySQL
 
Novidades do Universo MySQL julho-15
Novidades do Universo MySQL julho-15Novidades do Universo MySQL julho-15
Novidades do Universo MySQL julho-15
 
Serviços Escaláveis e de Alta Performance com MySQL e Java
Serviços Escaláveis e de Alta Performance com MySQL e JavaServiços Escaláveis e de Alta Performance com MySQL e Java
Serviços Escaláveis e de Alta Performance com MySQL e Java
 
MySQL The State of the Dolphin - jun15
MySQL The State of the Dolphin - jun15MySQL The State of the Dolphin - jun15
MySQL The State of the Dolphin - jun15
 

MySQL Cluster at PayPal Optimizes Global Data Distribution

  • 1. <Insert Picture Here> MySQL Cluster @ PayPal Henrique Leandro henrique.leandro@oracle.com Consultor Senior MySQL henrique.leandro@oracle.com Dec-2012 Wednesday, December 12, 12
  • 2. Big Data is a Big Scam (Most of the Time) Daniel Austin, PayPal Technical Staff MySQL Connect Conference Brazil December 04, 2012 v1.3 Wednesday, December 12, 12
  • 3. Today’s Agenda Big Myths about Big Data YESQL: A Counterexample Q&A Confidential and Proprietary 3 Wednesday, December 12, 12
  • 4. THE FUNDAMENTAL PROBLEM IN DISTRIBUTED DATA SYSTEMS “How Do We Manage Reliable Distribution of Data Across Geographical Distances?” Confidential and Proprietary Wednesday, December 12, 12
  • 5. Big Data Myth #1: Big Data = NoSQL • ‘Big Data’ Refers to a Common Set of Problems – Large Volumes – High Rates of Change • Of Data • Of Data Models • Of Data Presentation and Output – Often Require ‘Fast Data’ as well as ‘Big’ • Near-real Time Analytics • Mapping Complex Structures Takeaway: Big Data is the problem, NoSQL is one (proposed) solution Confidential and Proprietary Wednesday, December 12, 12
  • 6. The NoSQL Solution • NoSQL Systems provide a solution that relaxes many of the common constraints of typical RDBMS systems – Slow - RDBMS has not scaled with CPUs – Often require complex data management (SOX, SOR) – Costly to build and maintain, slow to change and adapt – Intolerant of CAP models (more on this later) • Non-relational models, usually key-value • May be batched or streaming • Not necessarily distributed geographically Confidential and Proprietary Wednesday, December 12, 12
  • 7. 3 Kinds of Big Data Systems 1. Columnar K-V Systems – Hadoop, Hbase, Cassandra, PNUTs 2. Document-Based – MongoDB, TerraCotta 3. Graph-Based – FlockDB, Voldemort Takeaway: These were originally designed as solutions to specific problems because no commercial solution would work. Confidential and Proprietary Wednesday, December 12, 12
  • 8. Big Data Myth #2: The CAP Theorem Doesn’t Say What You Think It Does • Consistency, Availability, (Network) Partition • The Real Story: These are not Independent Variables • AP =CP (Um, what? But…A != C ) • Variations: – PACELC (adds latency tolerance) Takeaway: the real story here is about the tradeoffs made by designers of different systems, and the main tradeoff is between consistency and availability, usually in favor of the latter. Confidential and Proprietary Wednesday, December 12, 12
  • 9. Big Data Hype Cycle: Where Are We Now? There are currently more than 120+ NoSQL databases listed at nosql-databases.com! You Are Here ? As the pace of new technology solutions has slowed, some clear winners have emerged. Confidential and Proprietary Wednesday, December 12, 12
  • 10. BIG DATA MYTH #3: BIG DATA AND NOSQL ARE NEW IDEAS • The first and most successful such system is DNS, created in 1983. • Began with flat files • Currently serves the entire Internet (!) • DNS is an AP system, availability is #1 • Many extensions complicate a simple design • Suggests a new term for CAP-like ideas: variability • DNS variability is very high, often 2-3x the mean Confidential and Proprietary Wednesday, December 12, 12
  • 11. Big Data Myth: You Need A Big Data System Well, Maybe….But Before You Go There… There are essentially two ‘Big Data Problems’: “I have too much data and it’s coming in too fast to handle with any RDBMS.” “I have a lot of data distributed geographically and need to be able to read and write from anywhere in near real-time.” Takeaway: if you have one of these Big Data problems, a NoSQL solution might work for you. But there are also other alternatives… Confidential and Proprietary Wednesday, December 12, 12
  • 12. Today’s Agenda Big Myths About Big Data YESQL: A Counterexample Q&A Confidential and Proprietary 12 Wednesday, December 12, 12
  • 13. Mission YESQL “Develop a globally distributed DB For user-related data.” • Must Not Fail (99.999%) • Must Not Lose Data. Period. • Must Support Transactions • Must Support (some) SQL • Must WriteRead 32-bit integer globally in 1000ms • Maximum Data Volume: 100 TB • Must Scale Linearly with Costs Confidential and Proprietary Wednesday, December 12, 12
  • 14. What about “High Performance”? • Maximum lightspeed distance on Earth’s Surface: ~67 ms • Target: data available worldwide in < 1000 ms Sound Easy? Think Again! Confidential and Proprietary Wednesday, December 12, 12
  • 15. WHY MYSQL CLUSTER? Pro Con • True HA by design • Some semantic – Fast recovery limitations on fields • Supports (some) X- • Size constraints (2 actions TB?) • Relational Model Hardware limits also • In-memory • Higher cost/byte architecture = high • Requires reasonable performance data partitioning • Disk storage for • Higher complexity non-indexed data (since 5.1) • APIs, APIs, APIs Confidential and Proprietary Wednesday, December 12, 12
  • 16. How MySQL Cluster Works in One Slide Graphics courtesy dev.mysql.com Confidential and Proprietary Wednesday, December 12, 12
  • 17. Confidential and Proprietary 17 Wednesday, December 12, 12
  • 18. CIRCULAR REPLICATION/FAILOVER Graphics courtesy O’Reilly OnLamp.com Confidential and Proprietary Wednesday, December 12, 12
  • 19. Click to edit Master text styles Confidential and Proprietary 19 Wednesday, December 12, 12
  • 20. AWS Meets MySQL Cluster • Why AWS? – Cheap and easy infrastructure-in-a-box (Or so I thought! Ha!) • Services Used: – EC2 (Centos 5.3, small instances for mgm & query nodes, XL for data – Elastic IPs/ELB – EBS Volumes – S3 – Cloudwatch Confidential and Proprietary Wednesday, December 12, 12
  • 21. ARCHITECTURAL TILES AWS Availability Zones Tiling Rules A B • Never separate NDB & SQL • Ndb:2-SQL:1-MGM:1 • Scale by adding more tiles • Failover 1st to nearest AZ • Then to nearest DC • At least 1 replica/AZ C ELB • Don’t share nodes • Mgmt nodes are redundant Limitations Unused (not present in all locations) • AWS is network-bound @ 250 MBPS – ouch! • Need specific ACL across AZ boundaries • AZs not uniform! • No GSLB NDB MGM SQL Confidential and Proprietary Wednesday, December 12, 12
  • 22. Architecture Stack Scale by Tiling A B A B A B A B A B A B A B 5 AWS Data Centers: US-E, US-W, TK, EU, AS Confidential and Proprietary Wednesday, December 12, 12
  • 23. SYSTEM READ/WRITE PERFORMANCE (!) What we tested: • 32 & 256 byte char fields In-region replication tests • Reads, writes, query speed vs. volume • Data replication speeds Results: • Global replication < 350 ms • 256 byte read < 1000 ms worldwide 06/19/2011 06/20/2011 06/21/2011 06/22/2011 06/23/2011 Confidential and Proprietary Wednesday, December 12, 12
  • 24. Data Models and Query Optimization for NDB • Network Latency is an obvious issue • Data model requires all segments present in each geo-region • Parameterized (Linked) Joins – SPJ technique from Clustra (see Clement Frazer’s blog for details) Confidential and Proprietary Wednesday, December 12, 12
  • 25. Hard Lessons, Shared • Be Careful… – With “Eventual Consistency”-related concepts – ACID, CAP are not really as well-defined as we’d like considering how often we invoke them • MySQL Cluster is a good solution – Real HA, real SQL – Notable limitations around fields, datatypes – Successfully competes with NoSQL systems for most use cases – better in many cases • NoSQL Systems – All have relatively low levels of maturity – More suitable for simpler key-value models – Victim of Tech Fashion Confidential and Proprietary Wednesday, December 12, 12
  • 26. Confidential and Proprietary 26 Wednesday, December 12, 12
  • 27. Confidential and Proprietary 27 Wednesday, December 12, 12
  • 28. MySQL Enterprise Edition 28 Wednesday, December 12, 12
  • 29. MySQL Enterprise Edition Mais produtividade, menores riscos e maior capacidade para o MySQL. Oracle Premier Lifetime Support MySQL Enterprise Oracle Product Security Certifications/Integrations MySQL Enterprise MySQL Enterprise Audit Monitor/Query Analyzer MySQL Enterprise MySQL Enterprise Scalability Backup MySQL Enterprise MySQL Workbench High Availability 28 Wednesday, December 12, 12
  • 30. Future Directions • Alternate solution using Pacemaker, Heartbeat – Uses InnoDB, not NDB • Implement Memcached plugin – To test NoSQL functionality from MySQL • Add simple connection-based persistence to preserve connections during failover • Better data node distribution • Better testing & monitoring Confidential and Proprietary Wednesday, December 12, 12
  • 31. Summing Up On YESQL v0.85 • It works! Far better than expected. • Very fast, very reliable • Reduced complexity since v0.7 • AWS poses challenges that private data centers may not experience • You can achieve high performance and availability without giving up relational models and read consistency! Confidential and Proprietary Wednesday, December 12, 12
  • 32. The Big Picture on Big Data • Only use Big Data solutions when you have a real Big Data problem. – Don’t be a Dedicated Follower of Tech Fashion! • Not all Big Data solutions are created equal – What tradeoffs are most important to you? – Consistency, Fault Tolerance, Availability, Performance, Variability • Is your data model a fit for NoSQL? – You don’t have to give up the relational model in most cases, so don’t! • You can achieve high performance and availability without giving up relational models and read consistency! Just say YESQL! Confidential and Proprietary Wednesday, December 12, 12
  • 33. “In the long run, we are all dead eventually consistent.” Maynard Keynes on NoSQL Databases Twitter: @daniel_b_austin Emai: daaustin@paypal.com With apologies and thanks to the real DB experts, Andrew Goodman, Yves Trudeau, Clement Frazer, Daniel Abadi, Kent Beck, and everyone else who contributed. It really works! Wednesday, December 12, 12
  • 34. <Insert Picture Here> Obrigado Henrique Leandro henrique.leandro@oracle.com Consultor Senior MySQL henrique.leandro@oracle.com Dec-2012 Wednesday, December 12, 12