- 1. MySQL Cluster Carrier Grade Edition
Alexander Yu
Principal Sales Consultant | MySQL Asia Pacific & Japan
2011-07-20
- 2. Agenda / Topics
• Oracle MySQL Strategy
• MySQL Server
Pluggable Storage Engine Architecture
• High Availability Solutions
• MySQL Cluster Carrier Grade Edition
– Internals
– Geographical Replication
– Scale Out
– Backup & Restore
• Q&A
- 3. About MySQL
• Founded, first release in 1995
• MySQL Acquired by Sun Microsystems Feb 2008
• Oracle Acquires Sun Microsystems Jan 2010
• +12M Product Installations
• 65K+ Downloads Per Day
• Part of the rapidly growing open source LAMP stack
Customers across every major operating system, hardware
vendor, geography, industry, and application type
High Performance ▪ Reliable ▪ Easy to Use
- 4. Oracle’s Strategy:
Complete. Open. Integrated.
• Built together
• Tested together
• Managed together
• Serviced together
• Based on open standards
• Lower cost
• Lower risk
• More reliable
- 5. Complete. Open. Integrated.
MySQL Completes The Stack
• Oracle never settles for being second
best at any level of the stack
• “Complete” means we meet most
customer requirements at every level
That’s why MySQL matters to
Oracle and Oracle customers
- 6. The “M” in the LAMP Stack
L – Operating System (Linux)
A – Application Server (Apache)
M – Database (MySQL)
P – Scripting (PHP / Perl / Python)
- 7. Investment in MySQL
Rapid Innovation
• Make MySQL a Better MySQL
• #1 Open Source Database for Web Applications
• Most Complete LAMP Stack
• Telecom & Embedded
• Develop, Promote and Support MySQL
• Improve engineering, consulting and support
• Leverage 24x7, World-Class Oracle Support
• MySQL Community Edition
• Source and binary releases
• GPL license
- 8. Oracle + MySQL Customers
• Product Integration
• Oracle GoldenGate (Complete!)
• Oracle Enterprise Linux + Oracle VM (Complete!)
HA Template Available
• Oracle Secure Backup (CY 2011)
• Oracle Audit Vault (CY 2011)
• Oracle Enterprise Manager (CY 2011)
• Support
• Leverage 24x7, World-Class Oracle Support
• MyOracle Support
- 9. Serving Key Markets and Industry Leaders
Powering Data Management on the Web & in the Network
Web • OEM / ISVs • SaaS & Hosting • Telecommunications • Enterprise 2.0
- 10. MySQL in Communications
http://www.mysql.com/industry/communications/resources.html#customer_case_studies
- 11. MySQL Server
Pluggable Storage Engine Architecture
- 12. Pluggable Storage Engine Architecture
[Architecture diagram] MySQL Server layers:
• Connectors: Native C API, JDBC, ODBC, .NET, PHP, Ruby, Python, VB, Perl – clients and apps
• Connection Pool: authentication, thread reuse, connection limits, check memory, caches
• SQL Interface (DDL, DML, stored procedures, views, triggers, etc.), Parser (query translation, object privileges), Optimizer (access paths, statistics), Caches & Buffers (global and engine-specific)
• Pluggable Storage Engines (memory, index and storage management): InnoDB, MyISAM, Cluster, etc., plus partner and community engines
• Filesystems, Files and Logs: redo, undo, data, index, binary, error, query and slow
• Enterprise Management Services and Utilities: Backup & Recovery, Security, Replication, Cluster, Partitioning, Instance Manager, Information_Schema, MySQL Workbench
- 13. MySQL Cluster Architecture
Shared-nothing distributed database with no single point of failure:
high read & write performance and 99.999% uptime
[Architecture diagram] Clients connect through the MySQL Cluster Application Nodes – SQL Nodes (MySQL), ClusterJ (Java), JDBC (Java), OpenJPA (Java), PHP/Perl/ODBC, OpenLDAP and NDB API (C++) applications – all of which access the MySQL Cluster Data Nodes via the NDB API. Two MGM (management) Nodes are reached via the MGM Client / MGM API (C).
- 14. Workload Qualification InnoDB vs MySQL Cluster
Workload | InnoDB | MySQL Cluster
Packaged Applications (i.e. standard business applications) | Yes | No, unless mainly PK access
Custom Applications | Yes | Yes
OLTP Applications | Yes | Yes
DSS Applications (i.e. Data Marts, Analytics, etc.) | Yes | No
Content Management | Yes | Limited Support
In-Network Telecoms Applications (HLR, HSS, SDP, etc.) | No | Yes
Web Session Management | Yes | Yes
User Profile Management & AAA | Yes | Yes
eCommerce Databases | Yes | Yes
- 15. Feature Comparison InnoDB vs MySQL Cluster
Feature Qualification | InnoDB | MySQL Cluster
Latest MySQL 5.5 & InnoDB 1.1 Performance Enhancements | Yes | No
Storage Limits | 64TB | 2TB (a)
Foreign Keys | Yes | No
MVCC Non-Blocking Reads | Yes | No
Optimized for Complex Multi-Table JOINs with Thousands of Accesses | Yes | No (b)
Hash Indexes | No | Yes
Compressed Data | Yes | No
Support for 8KB+ Row Sizes | Yes | Only via BLOBs (c)
Built-in Clustering Support for 99.999% HA | No | Yes
Minimum Number of Physical Hosts for Redundancy | 2 (Active / Passive) | 2 + 1 (A/A & Mgmt) (d)
Time to Recovery After Node Failure | 30s – hours | Sub-Second
Real-Time Performance | No | Yes
Option for In-Memory Storage of Tables with Disk Persistence | No | Yes
Non-SQL Access Methods to Data (i.e. NDB API) | No | Yes
Write Scalability without Application Partitioning | No | Yes (e)
Max Number of Nodes for Parallel Write Performance | 1 | 48 (f)
Conflict Resolution & Detection across Multiple Replication Masters | No | Yes
Virtualization Support | Yes | No
- 16. Storage Engines
Feature | MyISAM | NDB | Archive | InnoDB | Memory
Storage limits | No | Yes | No | 64TB | Yes
Transactions | No | Yes | No | Yes | No
Locking granularity | Table | Row | Row | Row | Table
MVCC snapshot read | No | No | No | Yes | No
Geospatial support | Yes | No | Yes | Yes | No
Data caches | No | Yes | No | Yes | NA
Index caches | Yes | Yes | No | Yes | NA
Compressed data | Yes | No | Yes | No | No
Storage cost (relative to other engines) | Small | Med | Small | Med | NA
Memory cost (relative to other engines) | Low | High | Low | High | High
Bulk insert speed | High | High | Highest | Med | High
Replication support | Yes | Yes | Yes | Yes | Yes
Foreign Key support | No | No | No | Yes | No
Built-in Cluster/High-availability support | No | Yes | No | No | No
Dynamically add and remove storage engines. Change the storage engine on a table with “ALTER TABLE …”
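For example (a minimal sketch – the table name t1 is illustrative, not from the deck):
-- Move an existing table onto the NDB (Cluster) storage engine
ALTER TABLE t1 ENGINE=NDBCLUSTER;
-- Confirm which engine the table now uses
SHOW CREATE TABLE t1;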
- 17. Why Users Adopt MySQL Cluster
• MySQL Already in Use
• High Read/Write Throughput
• 99.999% Availability
• Real-Time Performance
• Scale-Out, On-Demand
- 18. Why Users Buy MySQL Cluster CGE
• Standardized on Open Source
• Blend of Web & Telecoms Capabilities
• Deploying Mission Critical HA MySQL Applications
• Management & Monitoring Tools
• Global 24x7 Support
• Embedding MySQL Cluster
• Real-Time, High Read/Write Performance
• Scale-Out, Shared Nothing
- 21. Mapping HA Architecture to Applications
Architecture options (columns): Data Replication | Clustered / Virtualized | Shared-Nothing, Geo-Replicated Cluster
Applications (rows):
E-Commerce / Trading
Session Management
User Authentication / Accounting
Feeds, Blogs, Wikis
Data Refinery
OLTP
Data Warehouse/BI
Content Management
CRM / SCM
Collaboration
Packaged Software
Telco Apps (HLR/HSS/SDP…)
- 22. MySQL High Availability Solutions
95.000% • MySQL Replication
99.000% • MySQL Replication with Clustering Software
99.900% • DRBD with Clustering Software
99.900% • Shared Storage with Clustering Software (A/P – A/A)
99.990% • DRBD and Replication with Clustering Software
99.990% • Shared Storage and Replication with Clustering SW
99.990% • Shared Storage Replication
99.990% • Virtualised Environment
99.999% • MySQL Cluster
99.999% • MySQL Cluster & Replication
99.999% • MySQL Cluster Carrier Grade Edition
- 23. MySQL Replication
• Native in MySQL
• Used for Scalability and HA
• Asynchronous as standard
• Semi-Synchronous support
added in MySQL 5.5
• Each slave adds minimal load on master
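A hedged sketch of the basic setup (the host name, credentials and binlog coordinates are illustrative; the statements are standard MySQL 5.5-era replication commands):
-- On the slave: point it at the master and start replicating
CHANGE MASTER TO
  MASTER_HOST='master.example.com',
  MASTER_USER='repl',
  MASTER_PASSWORD='secret',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=4;
START SLAVE;
-- Semi-synchronous replication (MySQL 5.5) is enabled via a plugin on the master:
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;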
- 24. Replication Topologies
Single • Chain • Circular • Multiple • Multi-Master • Multi-Circular
- 25. MySQL Replication
Read Scalability
• Used by leading web properties for scale-out
• Reads are directed to slaves, writes to master
• Delivers higher performance & scale with efficient resource utilization
- 26. MySQL Replication
Failure Detection & Failover
• Linux Heartbeat implements heartbeat protocol between nodes
• Failover initiated by the Cluster Resource Manager (Pacemaker) if a heartbeat message is not received
• Virtual IP address failed over to ensure failover is transparent to apps
- 27. Shared Disk Clusters
A/P - A/A
[Diagram: applications issue reads/writes through a virtual IP to an active/passive server pair backed by shared storage]
• Reliability – commonly used solution
• Fault Tolerance – no single point of failure with appropriate hardware
• High Availability – data handled by a SAN or NAS and always available; automatic fail-over
• Simplified Management
- 28. Distributed Replicated Block Device
• DRBD creates transaction-safe hot standby configuration
• MySQL updates written to block device on the Active Server
• DRBD synchronously replicates updates to the Passive Server
• Linux Heartbeat fails over from Active to Passive in event of failure
- 29. Sharding aka Application Partitioning
[Diagram: clients send writes through partitioning logic to one of five shards (1–5); each shard is a master replicating to slaves, and reads are served from the slaves]
- 30. Oracle VM Template for MySQL
Integrated & Tested OS, VM and Database Stack
Fastest, simplest & most reliable way to deploy virtualized, cloud-ready MySQL instances, certified for production use
• Rapid DEPLOYMENT
• Increased RELIABILITY
• Higher AVAILABILITY
• Lower COST
[Diagram: Oracle VM guests running on an Oracle VM Server Pool]
- 31. Template Components
Certified for Production Deployment
• Oracle Linux 5 Update 6 with the Unbreakable Enterprise Kernel
• Oracle VM 2.2.1 – secure live migration (SSL), automatic fault detection & recovery
• Oracle VM Manager 2.1.5
• Oracle Cluster File System 2 (OCFS2)
• MySQL Database 5.5.10 (Enterprise Edition)
Pre-installed & pre-configured • Full integration & QA testing • Single point of support
[Diagram: Oracle VM Server Pool managed by Oracle VM Manager, with shared ocfs2 storage on SAN / iSCSI]
- 32. Positioning Current Solutions
Requirement | MySQL Replication | Heartbeat + DRBD | Oracle VM Template | MySQL Cluster
Availability
Platform Support | All Supported by MySQL Server | Linux | Oracle Linux | All Supported by MySQL Cluster
Automated IP Failover | No | Yes | Yes | Depends on Connector and Configuration
Automated Database Failover | No | Yes | Yes | Yes
Automatic Data Resynchronization | No | Yes | N/A – Shared Storage | Yes
Typical Failover Time | User / Script Dependent | Configuration Dependent, 60 Seconds and Above | Configuration Dependent, 60 Seconds and Above | 1 Second and Less
Synchronous Replication | No, Asynchronous and Semi-Synchronous | Yes | N/A – Shared Storage | Yes
Geographic Redundancy Support | Yes | Yes, via MySQL Replication | Yes, via MySQL Replication | Yes, via MySQL Replication
Scalability
Number of Nodes | One Master, Multiple Slaves | One Active (primary), One Passive (secondary) Node | One Active (primary), One Passive (secondary) Node | 255
Built-in Load Balancing | Reads, via MySQL Replication | Reads, via MySQL Replication | Reads, via MySQL Replication & During Failover | Yes, Reads and Writes
Read-Intensive Workloads | Yes | Yes | Yes | Yes
Write-Intensive Workloads | Yes, via Application-Level Sharding | Yes, via Application-Level Sharding to Multiple Active/Passive Pairs | Yes, via Application-Level Sharding to Multiple Active/Passive Pairs | Yes, via Auto-Sharding
Scale On-Line (add nodes, repartition, etc.) | No | No | No | Yes
- 33. MySQL Cluster
Real-time Carrier Grade Database
- 34. Customers & Applications
• Web
– User profile management
– Session stores
– eCommerce
– On-Line Gaming
– Application Servers
• Telecoms
– Subscriber Databases (HLR/HSS)
– Service Delivery Platforms
– VoIP, IPTV & VoD
– Mobile Content Delivery
– On-Line app stores and portals
– IP Management
– Payment Gateways
http://www.mysql.com/industry/telecom/
- 36. MySQL Cluster Architecture
Shared-nothing distributed database with no single point of failure:
high read & write performance and 99.999% uptime
[Architecture diagram, as on slide 13: clients → application nodes (SQL Nodes, ClusterJ, JDBC, OpenJPA, PHP/Perl/ODBC, OpenLDAP, NDB API) → data nodes via the NDB API, with two MGM nodes reached via the MGM Client / MGM API (C)]
- 37. MySQL Cluster Nodes
[Diagram] SQL-based applications connect via JDBC/ODBC to MySQL (SQL) Nodes; API Nodes embed the NDB API directly; a management client uses the MGM API (C) to talk to the Management Nodes; all application nodes reach the MySQL Cluster Data Nodes through the NDB API.
- 38. MySQL Cluster Nodes
• SQL Node (MySQL)
– Standard SQL interface
– Scale-out for performance
– Enables replication
• NDB API (Application)
– High performance
– C, C++ & Java, LDAP, HTTP APIs
– Developer's Guide
• Data Node (NDB Storage Engine)
– Data storage (memory/disk)
– Automatic & user-defined partitioning
– Local & global checkpoints
– Scale-out or scale-up for capacity & redundancy
– Scale dynamically with on-line add node
• Management Node
– Administration and configuration
– Arbitration
– Use two for redundancy
- 39. Replication Flexibility
• Synchronous replication within a Cluster node group for HA
• Bi-directional asynchronous replication to a remote Cluster (Cluster 1 ↔ Cluster 2) for geographic redundancy
• Asynchronous replication to non-Cluster databases (e.g. MyISAM, InnoDB) for specialised activities such as report generation
• Mix and match replication types
- 40. MySQL Cluster Loads
• The MySQL Cluster software (Management & Data Nodes) included with MySQL Community Server should not be used – deploy the MySQL Cluster (GPL) or MySQL Cluster CGE loads instead
• The MySQL Server included with the MySQL Cluster loads is different to the regular MySQL Server
– Always use this special version of MySQL Server when accessing MySQL Cluster data
• MySQL Cluster CGE is downloaded from oem.mysql.com
• GA GPL Community versions are downloaded from www.mysql.com/downloads
• In-development GPL Community versions are downloaded from dev.mysql.com/downloads/
- 41. MySQL Cluster System Requirements
System Component | Requirement
Hosts | Maximum of 255 total nodes (48 Data Nodes)
Hardware | COTS – Advanced TCA; 32- & 64-bit x86 & SPARC
Memory | Varies with size of database, # of hosts, # of replicas
Storage | Shared-nothing – memory & disk data; SCSI or RAID for I/O performance
Network | >1 Gigabit recommended, SCI supported
Operating System | Linux (Red Hat, SuSE), Solaris, HP-UX, Mac OSX, Windows, others…
- 43. MySQL Cluster 6.3
http://dev.mysql.com/doc/mysql-cluster-excerpt/5.1/en/mysql-cluster-changes-5-1-ndb-6-3.html
- 44. MySQL Cluster 7.0 – GA April 2009
http://www.mysql.com/why-mysql/white-papers/mysql_wp_cluster7_architecture.php
- 45. Scale out – multi core environments
- 46. MySQL Cluster vs MySQL MEMORY:
30x Higher Throughput / 1/3rd the Latency on a single node
• Table-level locking inhibits MEMORY scalability beyond a single client connection
• Even with check-pointing & logging enabled, MySQL Cluster still delivers durability
• 4-socket server, 64GB RAM, running Linux
- 47. Scale-Out Reads & Writes on Commodity Hardware
• NDB API performance: 4.33M queries per second!
• 8 Intel servers, dual 6-core CPUs @ 2.93 GHz, 24GB RAM
• 2 Data Nodes per server
• flexAsync benchmark
– 16 parallel threads, each issuing 256 simultaneous transactions
– Read / write 100KB attribute
• Interim results from 2 days of testing – watch this space: mikaelronstrom.blogspot.com
- 48. MySQL Cluster CGE 7.1 – Key Enhancements
http://www.mysql.com/why-mysql/white-papers/mysql_wp_cluster7_architecture.php
- 49. MySQL Cluster 7.1 Momentum
• 1,000 downloads per day
• Windows GA
• Pro-active Cluster monitoring
• Fully automated management
• 10x higher Java performance
"MySQL Cluster 7.1 gave us the perfect combination of extreme levels of transaction throughput, low latency & carrier-grade availability, while reducing TCO"
– Phani Naik, Pyro Group
- 50. MySQL Cluster 7.1: ndbinfo
• New database (ndbinfo) which presents real-time metric data in the form of tables
• Exposes new information together with providing a simpler, more consistent way to access existing data
• Examples include:
– Resource usage (memory, buffers)
– Event counters (such as number of READ operations since last restart)
– Data node status and connection status
mysql> use ndbinfo
mysql> show tables;
+-------------------+
| Tables_in_ndbinfo |
+-------------------+
| blocks            |
| config_params     |
| counters          |
| logbuffers        |
| logspaces         |
| memoryusage       |
| nodes             |
| resources         |
| transporters      |
+-------------------+
- 51. MySQL Cluster 7.1: ndbinfo
• Example 1: Check memory usage/availability
mysql> select * from ndbinfo.memoryusage;
+---------+--------------+--------+------------+-----------+-------------+
| node_id | memory_type | used | used_pages | total | total_pages |
+---------+--------------+--------+------------+-----------+-------------+
| 3 | Data memory | 917504 | 28 | 104857600 | 3200 |
| 3 | Index memory | 221184 | 27 | 11010048 | 1344 |
| 4 | Data memory | 917504 | 28 | 104857600 | 3200 |
| 4 | Index memory | 221184 | 27 | 11010048 | 1344 |
+---------+--------------+--------+------------+-----------+-------------+
• Note that there is a Data memory and an Index memory row for each data node in the cluster
• If the Cluster is nearing the configured limit then increase the DataMemory and/or IndexMemory parameters in config.ini and then perform a rolling restart
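A hedged sketch of that config.ini change (the values are illustrative; DataMemory and IndexMemory are the real parameter names, set in the [ndbd default] section):
[ndbd default]
# Raise the per-data-node memory limits, then rolling-restart the cluster
DataMemory=512M
IndexMemory=64M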
- 52. MySQL Cluster 7.1: ndbinfo
• Example 2: Check how many table scans performed on each data node since the last restart
mysql> select node_id as 'data node', val as 'Table Scans' from ndbinfo.counters
where counter_name='TABLE_SCANS';
+-----------+-------------+
| data node | Table Scans |
+-----------+-------------+
| 3 | 3 |
| 4 | 4 |
+-----------+-------------+
• You might check this if your database performance is lower than anticipated
• If this figure is rising faster than you expected then examine your application to understand why there are so many table scans
- 53. Latest news on MySQL Cluster 7.1
• As of MySQL Cluster 7.1.9a:
• InnoDB plugin included
• New view in ndbinfo:
mysql> SELECT node_id,
    ->        page_requests_direct_return AS hit,
    ->        page_requests_wait_io AS miss,
    ->        100*page_requests_direct_return/(page_requests_direct_return+page_requests_wait_io) AS hit_rate
    -> FROM ndbinfo.diskpagebuffer;
+---------+------+------+----------+
| node_id | hit  | miss | hit_rate |
+---------+------+------+----------+
|       3 |    6 |    3 |  66.6667 |
|       4 |   10 |    3 |  76.9231 |
+---------+------+------+----------+
• MySQL Enterprise Monitor (MEM) 2.3 includes new Cluster Advisor/graphs
- 55. Online Operations
• Scale the cluster for throughput or capacity
– Data and SQL Nodes
• Repartition tables
• Recover failed nodes
• Upgrade / patch servers & OS
• Upgrade / patch MySQL Cluster
• Back-Up
• Evolve the schema on-line, in real-time
- 56. Real-Time, On-Line Schema Changes
• Fully online – transaction response times unchanged
• Add and remove indexes, add new columns and tables
• No temporary table creation
• No recreation of data or deletion required
• Faster and better performing table maintenance operations
• Less memory and disk requirements
CREATE OFFLINE INDEX b ON t1(b);
Query OK, 1356 rows affected (2.20 sec)
DROP OFFLINE INDEX b ON t1;
Query OK, 1356 rows affected (2.03 sec)
CREATE ONLINE INDEX b ON t1(b);
Query OK, 0 rows affected (0.58 sec)
DROP ONLINE INDEX b ON t1;
Query OK, 0 rows affected (0.46 sec)
ALTER ONLINE TABLE t1 ADD COLUMN d INT;
Query OK, 0 rows affected (0.36 sec)
- 57. Performance I Flexibility I Simplification
• SQL and NoSQL Access Methods to tables
– SQL: complex queries, rich ecosystem of apps & expertise
– Simple Key/Value interfaces bypassing SQL layer for blazing fast reads & writes
– Real-time interfaces for micro-second latency
– Developers free to work in their preferred environment
- 58. Scaling Distributed Joins 7.2DM
Adaptive Query Localization
• ‘Complex’ joins traditionally slower in MySQL Cluster
– Complex = lots of levels and interim results in the JOIN
• JOIN was implemented in the MySQL Server (mysqld):
– Nested-loop join
– When data is needed, it must be fetched over the network from the Data Nodes, row by row
– This causes latency and consumes resources
• AQL can now push the execution down into the data nodes, greatly reducing the network trips
• 25x–40x performance gain in a customer PoC!
The existence, content and timing of future releases described here is included for information only and may be changed at Oracle's discretion.
http://www.mysql.com/news-and-events/on-demand-webinars/display-od-583.html
- 59. Adaptive Query Localization: Current Limitations
• Columns to be joined:
– must use exactly the same data type
– cannot be any of the BLOB or TEXT types
– must be part of a table index or primary key
• AQL can be disabled using the ndb_join_pushdown server system variable (enabled by default)
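A minimal sketch of working with that variable (ndb_join_pushdown is the real system variable; the tables in the query are illustrative):
-- Check and, if needed, disable AQL for the current session
SHOW VARIABLES LIKE 'ndb_join_pushdown';
SET ndb_join_pushdown = OFF;
-- EXPLAIN's Extra column indicates whether a join is pushed down to the data nodes
EXPLAIN SELECT t1.a FROM t1 JOIN t2 ON t2.a = t1.b;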
Early Adopter Speaks!
“Testing of Adaptive Query Localization has yielded over 20x
higher performance on complex queries within our application,
enabling Docudesk to expand our use of MySQL Cluster into a
broader range of highly dynamic web services.”
Casey Brown
Manager, Development & DBA Services, Docudesk
- 61. MySQL Cluster: SQL & NoSQL Combined
Mix & match! The same data can be accessed simultaneously through the SQL & NoSQL interfaces
• NoSQL – Multiple ways to bypass SQL, and maximize performance:
• NDB API. C++ for highest performance, lowest latency
• Cluster/J for optimized access in Java
• NEW! Memcached. Use all your existing clients/applications
- 64. NoSQL With NDB API
Best possible performance
• Application embeds the NDB API C++ interface library
• The NDB API makes intelligent decisions (where possible) about which data node to send queries to
– With a little planning in the schema design, achieve linear scalability
• Used by all of the other application nodes (MySQL, LDAP, ClusterJ, …)
• Best possible performance, but requires greater development skill
• Favourite API for real-time network applications
• Foundation for all interfaces
[Diagram: clients → applications with embedded NDB API library → MySQL Cluster Data Nodes]
- 65. NoSQL with memcached 7.2DM
• Memcached is a distributed memory-based hash-key/value store with no persistence to disk
– NoSQL, simple API, popular with developers
• MySQL Cluster already provides scalable, in-memory performance with NoSQL (hashed) access as well as persistence
• Provides the Memcached protocol but maps it to NDB API calls
• Writes-in-place, so no need to invalidate cache
• Simplifies architecture as caching & database are integrated into one tier
• Access data from existing relational tables
- 66. NoSQL with Memcached 7.2DM
Pre-GA version available from labs.mysql.com
Flexible:
• Deployment options
• Multiple Clusters
• Simultaneous SQL access
• Can still cache in the Memcached server
• Flat key-value store or map to multiple tables/columns
Simple:
set maidenhead 0 0 3
SL6
STORED
get maidenhead
VALUE maidenhead 0 3
SL6
END
- 67. MySQL Cluster Manager 1.1 Features
Delivered as part of MySQL Cluster CGE 7.1
- 68. How Does MySQL Cluster Manager Help ?
Example: Initiating an upgrade from MySQL Cluster 6.3 to 7.1
Before MySQL Cluster Manager (total: 46 commands, 2.5 hours of attended operation):
• 1 x preliminary check of cluster state
• 8 x ssh commands per server
• 8 x per-process stop commands
• 4 x scp of configuration files (2 x mgmd & 2 x mysqld)
• 8 x per-process start commands
• 8 x checks for started and re-joined processes
• 8 x process completion verifications
• 1 x verify completion of the whole cluster
• Excludes manual editing of each configuration file
With MySQL Cluster Manager (total: 1 command, unattended operation):
upgrade cluster --package=7.1 mycluster;
Results:
• Reduces the overhead and complexity of managing database clusters
• Reduces the risk of downtime resulting from administrator error
• Automates best practices in database cluster management
- 69. Terms used by MySQL Cluster Manager
• Site: the set of physical hosts which are to run Cluster processes to be managed by MySQL Cluster Manager. A site can include 1 or more clusters.
• Cluster: represents a MySQL Cluster deployment. A cluster contains 1 or more processes running on 1 or more hosts.
• Host: physical machine, running the MySQL Cluster Manager agent.
• Agent: the MySQL Cluster Manager process running on each host.
• Process: an individual MySQL Cluster node; one of: ndb_mgmd, ndbd, ndbmtd, mysqld & ndbapi*
• Package: a copy of a MySQL Cluster installation directory as downloaded from mysql.com, stored on each host.
*ndbapi is a special case, representing a slot for an external application process to connect to the cluster using the NDB API
- 70. Example configuration
• MySQL Cluster Manager agent runs on each physical host
• No central process for Cluster Manager – agents co-operate, each one responsible for its local nodes
• Together, the agents are responsible for managing all nodes in the cluster
• Management responsibilities:
– Starting, stopping & restarting nodes
– Configuration changes
– Upgrades
– Host & node status reporting
– Recovering failed nodes
[Diagram: four hosts (192.168.0.10–192.168.0.13), each running an agent; mysqld processes 7 & 8 and ndb_mgmd processes 1 & 2 on hosts .10/.11; ndbd data node processes 3–6 on hosts .12/.13; a mysql client connects to the agents]
- 71. Creating & Starting a Cluster
1. Define the site:
mysql> create site --hosts=192.168.0.10,192.168.0.11,
    -> 192.168.0.12,192.168.0.13 mysite;
2. Expand the MySQL Cluster tar-ball(s) from mysql.com to a known directory
3. Define the package(s):
mysql> add package --basedir=/usr/local/mysql_6_3_26 6.3;
mysql> add package --basedir=/usr/local/mysql_7_0_7 7.0;
Note that the basedir should match the directory used in Step 2.
4. Create the Cluster:
mysql> create cluster --package=6.3
    -> --processhosts=ndb_mgmd@192.168.0.10,ndb_mgmd@192.168.0.11,
    -> ndbd@192.168.0.12,ndbd@192.168.0.13,ndbd@192.168.0.12,
    -> ndbd@192.168.0.13,mysqld@192.168.0.10,mysqld@192.168.0.11
    -> mycluster;
This is where you define what nodes/processes make up the Cluster and where they should run.
5. Start the Cluster:
mysql> start cluster mycluster;
- 72. Upgrade Cluster
• Upgrade from MySQL Cluster 6.3.26 to 7.0.7:
mysql> upgrade cluster --package=7.0 mycluster;
• Automatically upgrades each node and restarts the process – in the correct order to avoid any loss of service
• Without MySQL Cluster Manager, the administrator must stop each process in turn, start the process with the new version and wait for the node to restart before moving on to the next one
- 73. MySQL Cluster Manager
GA 1st November 2010
[Diagram: on-line add-node – the cluster grows from data nodes 31 & 32 and mysqld nodes 33 & 34 to also include data nodes 35 & 36 and additional mysqld processes]
• On-line add-node:
mysql> add hosts --hosts=192.168.0.35,192.168.0.36 mysite;
mysql> add package --basedir=/usr/local/mysql_7_0_7
    -> --hosts=192.168.0.35,192.168.0.36 7.0;
mysql> add process
    -> --processhosts=mysqld@192.168.0.33,mysqld@192.168.0.34,ndbd@192.168.0.35,ndbd@192.168.0.36 mycluster;
mysql> start process --added mycluster;
• Restart optimizations
– Fewer nodes restarted on some parameter changes
- 74. General Design Considerations
• MySQL Cluster is designed for
– Short transactions
– Many parallel transactions
• Utilize simple access patterns to fetch data
– Use efficient scans and batching interfaces
• Analyze what your most typical use cases are – optimize for those
Overall design goal: minimize network roundtrips for your most important requests!
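One way to cut roundtrips is batching – a hedged sketch (the table and rows are illustrative, borrowed from the later distribution-awareness example):
-- One multi-row INSERT is one network roundtrip instead of three
INSERT INTO towns (town, country, population) VALUES
  ('Maidenhead', 'UK', 78000),
  ('Boston', 'UK', 58124),
  ('Boston', 'USA', 617594);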
- 75. Best Practice : Primary Keys
• To avoid problems with:
– Cluster-to-Cluster (geographic) replication
– Recovery
– Application behaviour (KEY NOT FOUND, etc.)
• ALWAYS DEFINE A PRIMARY KEY ON THE TABLE!
• A hidden PRIMARY KEY is added if no PK is specified, BUT this is NOT recommended
– The hidden primary key is, for example, not replicated (between Clusters)!
– There are problems in this area, so avoid them!
• So always have at least: id BIGINT AUTO_INCREMENT PRIMARY KEY
– Even if you don't “need” it for your applications
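A minimal sketch of that advice (the table name and columns are illustrative):
-- Every NDB table gets an explicit primary key, even a surrogate one
CREATE TABLE sessions (
  id   BIGINT AUTO_INCREMENT PRIMARY KEY,
  data VARBINARY(1024)
) ENGINE=NDBCLUSTER;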
- 76. Best Practice: Distribution Aware Apps
• Partition selected using a hash on the Partition Key
– Primary Key by default
– User can override in the table definition
• MySQL Server (or NDB API) will attempt to send the transaction to the correct data node
• If all data for the transaction is in the same partition, less messaging -> faster
• Aim to have all rows for high-running queries in the same partition
Example table (Partition Key = Primary Key: town, country):
town | country | population
Maidenhead | UK | 78000
Paris | France | 2193031
Boston | UK | 58124
Boston | USA | 617594
SELECT SUM(population) FROM towns WHERE country='UK';
SELECT SUM(population) FROM towns WHERE town='Boston';
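A hedged sketch of overriding the partition key in the table definition (the schema is illustrative; PARTITION BY KEY is the real MySQL syntax, and the partition key must be part of the primary key):
-- Hash-partition on town only, so all 'Boston' rows share one partition
CREATE TABLE towns (
  town       VARCHAR(64) NOT NULL,
  country    VARCHAR(64) NOT NULL,
  population INT,
  PRIMARY KEY (town, country)
) ENGINE=NDBCLUSTER
PARTITION BY KEY (town);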
- 77. Best Practice: Distribution Aware – Multiple Tables
• Extend partition awareness over multiple tables
• Same rule – aim to have all data for an instance of high-running transactions in the same partition
Table 1 (Partition Key = Primary Key: sub_id):
sub_id | age | gender
19724 | 25 | male
84539 | 43 | female
19724 | 16 | female
74574 | 21 | female
Table service_ids (Partition Key: sub_id):
service | sub_id | svc_id
twitter | 19724 | 76325732
twitter | 84539 | 67324782
facebook | 19724 | 83753984
facebook | 73642 | 87324793
ALTER TABLE service_ids
PARTITION BY KEY(sub_id);
- 79. Automatic Data Partitioning
4 Partitions * 2 Replicas = 8 Fragments
[Diagram] Table T1 (partitions P1–P4) across four data nodes:
– Data Node 1: F1 (primary), F3 (secondary); Data Node 2: F3 (primary), F1 (secondary) → Node Group 1
– Data Node 3: F2 (primary), F4 (secondary); Data Node 4: F4 (primary), F2 (secondary) → Node Group 2
– Node groups are created automatically
– # of groups = # of data nodes / # of replicas
- 80. Automatic Data Partitioning
4 Partitions * 2 Replicas = 8 Fragments
• A fragment is a copy of a partition (aka fragment replica)
• Number of fragments = # of partitions * # of replicas
- 81.–90. Automatic Data Partitioning (build-up)
[Diagram sequence: the eight fragments are placed one at a time – F1 and F3 on Data Nodes 1 & 2 (Node Group 1), F2 and F4 on Data Nodes 3 & 4 (Node Group 2); each partition is stored as a primary fragment on one node of the group and a secondary fragment (fragment replica) on the other]
• Node groups are created automatically
• # of groups = # of data nodes / # of replicas
- 91.–93. Automatic Data Partitioning
• As long as one data node in each node group is running we have a complete copy of the data
- 94. Automatic Data Partitioning
• If both data nodes of a node group are lost:
– No complete copy of the data
– Cluster shuts down automatically
- 95. Data Partitioning
• Automatic distribution/partitioning
– Primary Key hash value (partitioning by key)
• Transparent load balancing
– Distribution awareness: Data Node chosen based on PK hash value, or proximity (SQL Node – shared memory, localhost, remote host)
• Support for user-defined partitioning
• Key concepts (e.g. 4 Partitions * 2 Replicas = 8 Fragments):
– Partition: horizontal; # of partitions = # of data nodes
– Fragment: copy of a partition
– Replica: complete copy of the data
– Node Group: groups data nodes (automatically); determined by the order in the configuration file; # of groups = # of data nodes / # of replicas
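A hedged sketch for inspecting the distribution (the towns table is the earlier example; EXPLAIN PARTITIONS and INFORMATION_SCHEMA.PARTITIONS are standard MySQL features of this era):
-- Which partition does a high-running query touch?
EXPLAIN PARTITIONS SELECT SUM(population) FROM towns WHERE town='Boston';
-- How are rows spread across the partitions?
SELECT partition_name, table_rows
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE table_name = 'towns';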
- 96. Internal Replication
• Replication between Data Nodes
• Synchronous Replication
– To ensure minimal failover time
– Data Nodes have the same information at the same point in time
– Achieved by Two-phase commit protocol
• Two-phase commit
– 1. Prepare/update phase: all fragments (primary/secondary) get updated
– 2. Commit phase: the changes are committed
– Every Data Node has a Transaction Coordinator (TC); one is elected to coordinate a given transaction
– The information goes from the Transaction Coordinator to the primary fragments and further to the secondary fragments
- 97. Internal Replication: Prepare Phase
insert into T1 values (...)
[Diagram: two data nodes, each with a Transaction Coordinator and a Local Query Handler (LQH) over ACC/TUP blocks, index memory and data memory; one holds fragments F1/F2, the other F2/F1]
1. Calculate hash on the PK
2. Forward the request to the LQH where the primary fragment is
3. Prepare the secondary fragment
4. Prepare phase done
- 98. Internal Replication: Commit Phase
insert into T1 values (...)
[Diagram: the commit phase flows through the same path – Transaction Coordinator → LQH on the primary fragment → LQH on the secondary fragment → back to the coordinator – after which the changes are committed]
- 99. Transactions
• Transaction Coordinator
– The elected TC starts the transaction
– TC calculates a hash on the primary key
– Each transaction contains one or more Read/Insert/Update or Delete operations
– Operations are forwarded to the LQH of the Data Node having the data for the operation
• Isolation Level
– Committed Read
• Read from both primary and secondary fragments
• No lock required
• Update/Insert/Delete
– Locks on the index entry in ACC
– Both primary and secondary fragments
• Read exclusive/Read shared
– Locks the index entry in ACC on primary and secondary fragments
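A hedged sketch of how these isolation levels surface in SQL (the towns table is the earlier example; LOCK IN SHARE MODE and FOR UPDATE are standard MySQL syntax):
BEGIN;
-- Committed Read: a plain SELECT takes no lock
SELECT population FROM towns WHERE town='Boston' AND country='UK';
-- Read shared: locks the index entry in shared mode
SELECT population FROM towns WHERE town='Boston' AND country='UK' LOCK IN SHARE MODE;
-- Read exclusive: locks the index entry exclusively, as an UPDATE would
SELECT population FROM towns WHERE town='Boston' AND country='UK' FOR UPDATE;
COMMIT;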