- 1. MySQL Cluster Carrier Grade Edition
Alexander Yu
Principal Sales Consultant | MySQL Asia Pacific & Japan
2011-07-20
- 2. Agenda / Topics
• Oracle MySQL Strategy
• MySQL Server
Pluggable Storage Engine Architecture
• High Availability Solutions
• MySQL Cluster Carrier Grade Edition
– Internals
– Geographical Replication
– Scale Out
– Backup & Restore
• Q&A
- 3. About MySQL
• Founded, first release in 1995
• MySQL Acquired by Sun Microsystems Feb 2008
• Oracle Acquires Sun Microsystems Jan 2010
• +12M Product Installations
• 65K+ Downloads Per Day
• Part of the rapidly growing open source LAMP stack
Customers across every major operating system, hardware
vendor, geography, industry, and application type
High Performance ▪ Reliable ▪ Easy to Use
- 4. Oracle’s Strategy:
Complete. Open. Integrated.
• Built together
• Tested together
• Managed together
• Serviced together
• Based on open standards
• Lower cost
• Lower risk
• More reliable
- 5. Complete. Open. Integrated.
MySQL Completes The Stack
• Oracle never settles for being second
best at any level of the stack
• “Complete” means we meet most
customer requirements at every level
That’s why MySQL matters to
Oracle and Oracle customers
- 6. The “M” in the LAMP Stack
L – Operating System (Linux)
A – Application Server (Apache)
M – Database (MySQL)
P – Scripting (PHP / Perl / Python)
- 7. Investment in MySQL
Rapid Innovation
• Make MySQL a Better MySQL
• #1 Open Source Database for Web Applications
• Most Complete LAMP Stack
• Telecom & Embedded
• Develop, Promote and Support MySQL
• Improve engineering, consulting and support
• Leverage 24x7, World-Class Oracle Support
• MySQL Community Edition
• Source and binary releases
• GPL license
- 8. Oracle + MySQL Customers
• Product Integration
• Oracle GoldenGate (Complete!)
• Oracle Enterprise Linux + Oracle VM (Complete!)
HA Template Available
• Oracle Secure Backup (CY 2011)
• Oracle Audit Vault (CY 2011)
• Oracle Enterprise Manager (CY 2011)
• Support
• Leverage 24x7, World-Class Oracle Support
• MyOracle Support
- 9. Serving Key Markets and Industry Leaders
Powering Data Management on the Web & in the Network
Web • OEM / ISVs • SaaS & Hosting • Telecommunications • Enterprise 2.0
- 10. MySQL in Communications
http://www.mysql.com/industry/communications/resources.html#customer_case_studies
- 11. MySQL Server
Pluggable Storage Engine Architecture
- 12. Pluggable Storage Engine Architecture
[Architecture diagram] MySQL Server layers:
• Connectors: Native C API, JDBC, ODBC, .NET, PHP, Ruby, Python, VB, Perl – clients and apps
• Connection Pool: authentication, thread reuse, connection limits, check memory, caches
• SQL Interface (DDL, DML, stored procedures, views, triggers, etc.), Parser (query translation, object privileges), Optimizer (access paths, statistics), Caches & Buffers (global and engine-specific)
• Pluggable Storage Engines (memory, index and storage management): InnoDB, MyISAM, Cluster, etc., plus partner and community engines
• Filesystems, Files and Logs: redo, undo, data, index, binary, error, query and slow
• Enterprise Management Services and Utilities: Backup & Recovery, Security, Replication, Cluster, Partitioning, Instance Manager, Information_Schema, MySQL Workbench
- 13. MySQL Cluster Architecture
Shared-nothing distributed database with no single point of failure:
high read & write performance and 99.999% uptime
[Architecture diagram] Clients connect through the MySQL Cluster Application Nodes – SQL Nodes (MySQL), ClusterJ (Java), JDBC (Java), OpenJPA (Java), PHP/Perl/ODBC, OpenLDAP and NDB API (C++) applications – all of which access the MySQL Cluster Data Nodes via the NDB API. Two MGM (management) Nodes are reached via the MGM Client / MGM API (C).
- 14. Workload Qualification InnoDB vs MySQL Cluster
Workload | InnoDB | MySQL Cluster
Packaged Applications (i.e. standard business applications) | Yes | No, unless mainly PK access
Custom Applications | Yes | Yes
OLTP Applications | Yes | Yes
DSS Applications (i.e. Data Marts, Analytics, etc.) | Yes | No
Content Management | Yes | Limited Support
In-Network Telecoms Applications (HLR, HSS, SDP, etc.) | No | Yes
Web Session Management | Yes | Yes
User Profile Management & AAA | Yes | Yes
eCommerce Databases | Yes | Yes
- 15. Feature Comparison InnoDB vs MySQL Cluster
Feature Qualification | InnoDB | MySQL Cluster
Latest MySQL 5.5 & InnoDB 1.1 Performance Enhancements | Yes | No
Storage Limits | 64TB | 2TB (a)
Foreign Keys | Yes | No
MVCC Non-Blocking Reads | Yes | No
Optimized for Complex Multi-Table JOINs with Thousands of Accesses | Yes | No (b)
Hash Indexes | No | Yes
Compressed Data | Yes | No
Support for 8KB+ Row Sizes | Yes | Only via BLOBs (c)
Built-in Clustering Support for 99.999% HA | No | Yes
Minimum Number of Physical Hosts for Redundancy | 2 (Active / Passive) | 2 + 1 (A/A & Mgmt) (d)
Time to Recovery After Node Failure | 30s – hours | Sub-Second
Real-Time Performance | No | Yes
Option for In-Memory Storage of Tables with Disk Persistence | No | Yes
Non-SQL Access Methods to Data (i.e. NDB API) | No | Yes
Write Scalability without Application Partitioning | No | Yes (e)
Max Number of Nodes for Parallel Write Performance | 1 | 48 (f)
Conflict Resolution & Detection across Multiple Replication Masters | No | Yes
Virtualization Support | Yes | No
- 16. Storage Engines
Feature | MyISAM | NDB | Archive | InnoDB | Memory
Storage limits | No | Yes | No | 64TB | Yes
Transactions | No | Yes | No | Yes | No
Locking granularity | Table | Row | Row | Row | Table
MVCC snapshot read | No | No | No | Yes | No
Geospatial support | Yes | No | Yes | Yes | No
Data caches | No | Yes | No | Yes | NA
Index caches | Yes | Yes | No | Yes | NA
Compressed data | Yes | No | Yes | No | No
Storage cost (relative to other engines) | Small | Med | Small | Med | NA
Memory cost (relative to other engines) | Low | High | Low | High | High
Bulk insert speed | High | High | Highest | Med | High
Replication support | Yes | Yes | Yes | Yes | Yes
Foreign Key support | No | No | No | Yes | No
Built-in Cluster/High-availability support | No | Yes | No | No | No
Dynamically add and remove storage engines. Change the storage engine on a table with “ALTER TABLE …”
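For example (a minimal sketch – the table name t1 is illustrative, not from the deck):
-- Move an existing table onto the NDB (Cluster) storage engine
ALTER TABLE t1 ENGINE=NDBCLUSTER;
-- Confirm which engine the table now uses
SHOW CREATE TABLE t1;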
- 17. Why Users Adopt MySQL Cluster
• MySQL Already in Use
• High Read/Write Throughput
• 99.999% Availability
• Real-Time Performance
• Scale-Out, On-Demand
- 18. Why Users Buy MySQL Cluster CGE
• Standardized on Open Source
• Blend of Web & Telecoms Capabilities
• Deploying Mission Critical HA MySQL Applications
• Management & Monitoring Tools
• Global 24x7 Support
• Embedding MySQL Cluster
• Real-Time, High Read/Write Performance
• Scale-Out, Shared Nothing
- 21. Mapping HA Architecture to Applications
Architecture options (columns): Data Replication | Clustered / Virtualized | Shared-Nothing, Geo-Replicated Cluster
Applications (rows):
E-Commerce / Trading
Session Management
User Authentication / Accounting
Feeds, Blogs, Wikis
Data Refinery
OLTP
Data Warehouse/BI
Content Management
CRM / SCM
Collaboration
Packaged Software
Telco Apps (HLR/HSS/SDP…)
- 22. MySQL High Availability Solutions
95.000% • MySQL Replication
99.000% • MySQL Replication with Clustering Software
99.900% • DRBD with Clustering Software
99.900% • Shared Storage with Clustering Software (A/P – A/A)
99.990% • DRBD and Replication with Clustering Software
99.990% • Shared Storage and Replication with Clustering SW
99.990% • Shared Storage Replication
99.990% • Virtualised Environment
99.999% • MySQL Cluster
99.999% • MySQL Cluster & Replication
99.999% • MySQL Cluster Carrier Grade Edition
- 23. MySQL Replication
• Native in MySQL
• Used for Scalability and HA
• Asynchronous as standard
• Semi-Synchronous support
added in MySQL 5.5
• Each slave adds minimal load on master
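A hedged sketch of the basic setup (the host name, credentials and binlog coordinates are illustrative; the statements are standard MySQL 5.5-era replication commands):
-- On the slave: point it at the master and start replicating
CHANGE MASTER TO
  MASTER_HOST='master.example.com',
  MASTER_USER='repl',
  MASTER_PASSWORD='secret',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=4;
START SLAVE;
-- Semi-synchronous replication (MySQL 5.5) is enabled via a plugin on the master:
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;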
- 24. Replication Topologies
Single • Chain • Circular • Multiple • Multi-Master • Multi-Circular
- 25. MySQL Replication
Read Scalability
• Used by leading web properties for scale-out
• Reads are directed to slaves, writes to master
• Delivers higher performance & scale with efficient resource utilization
- 26. MySQL Replication
Failure Detection & Failover
• Linux Heartbeat implements heartbeat protocol between nodes
• Failover initiated by the Cluster Resource Manager (Pacemaker) if a heartbeat message is not received
• Virtual IP address failed over to ensure failover is transparent to apps
- 27. Shared Disk Clusters
A/P - A/A
[Diagram: applications issue reads/writes through a virtual IP to an active/passive server pair backed by shared storage]
• Reliability – commonly used solution
• Fault Tolerance – no single point of failure with appropriate hardware
• High Availability – data handled by a SAN or NAS and always available; automatic fail-over
• Simplified Management
- 28. Distributed Replicated Block Device
• DRBD creates transaction-safe hot standby configuration
• MySQL updates written to block device on the Active Server
• DRBD synchronously replicates updates to the Passive Server
• Linux Heartbeat fails over from Active to Passive in event of failure
- 29. Sharding aka Application Partitioning
[Diagram: clients send writes through partitioning logic to one of five shards (1–5); each shard is a master replicating to slaves, and reads are served from the slaves]
- 30. Oracle VM Template for MySQL
Integrated & Tested OS, VM and Database Stack
Fastest, simplest & most reliable way to deploy virtualized, cloud-ready MySQL instances, certified for production use
• Rapid DEPLOYMENT
• Increased RELIABILITY
• Higher AVAILABILITY
• Lower COST
[Diagram: Oracle VM guests running on an Oracle VM Server Pool]
- 31. Template Components
Certified for Production Deployment
• Oracle Linux 5 Update 6 with the Unbreakable Enterprise Kernel
• Oracle VM 2.2.1 – secure live migration (SSL), automatic fault detection & recovery
• Oracle VM Manager 2.1.5
• Oracle Cluster File System 2 (OCFS2)
• MySQL Database 5.5.10 (Enterprise Edition)
Pre-installed & pre-configured • Full integration & QA testing • Single point of support
[Diagram: Oracle VM Server Pool managed by Oracle VM Manager, with shared ocfs2 storage on SAN / iSCSI]
- 32. Positioning Current Solutions
Requirement | MySQL Replication | Heartbeat + DRBD | Oracle VM Template | MySQL Cluster
Availability
Platform Support | All Supported by MySQL Server | Linux | Oracle Linux | All Supported by MySQL Cluster
Automated IP Failover | No | Yes | Yes | Depends on Connector and Configuration
Automated Database Failover | No | Yes | Yes | Yes
Automatic Data Resynchronization | No | Yes | N/A – Shared Storage | Yes
Typical Failover Time | User / Script Dependent | Configuration Dependent, 60 Seconds and Above | Configuration Dependent, 60 Seconds and Above | 1 Second and Less
Synchronous Replication | No, Asynchronous and Semi-Synchronous | Yes | N/A – Shared Storage | Yes
Geographic Redundancy Support | Yes | Yes, via MySQL Replication | Yes, via MySQL Replication | Yes, via MySQL Replication
Scalability
Number of Nodes | One Master, Multiple Slaves | One Active (primary), One Passive (secondary) Node | One Active (primary), One Passive (secondary) Node | 255
Built-in Load Balancing | Reads, via MySQL Replication | Reads, via MySQL Replication | Reads, via MySQL Replication & During Failover | Yes, Reads and Writes
Read-Intensive Workloads | Yes | Yes | Yes | Yes
Write-Intensive Workloads | Yes, via Application-Level Sharding | Yes, via Application-Level Sharding to Multiple Active/Passive Pairs | Yes, via Application-Level Sharding to Multiple Active/Passive Pairs | Yes, via Auto-Sharding
Scale On-Line (add nodes, repartition, etc.) | No | No | No | Yes
- 33. MySQL Cluster
Real-time Carrier Grade Database
- 34. Customers & Applications
• Web
– User profile management
– Session stores
– eCommerce
– On-Line Gaming
– Application Servers
• Telecoms
– Subscriber Databases (HLR/HSS)
– Service Delivery Platforms
– VoIP, IPTV & VoD
– Mobile Content Delivery
– On-Line app stores and portals
– IP Management
– Payment Gateways
http://www.mysql.com/industry/telecom/
- 36. MySQL Cluster Architecture
Shared-nothing distributed database with no single point of failure:
high read & write performance and 99.999% uptime
[Architecture diagram, as on slide 13: clients → application nodes (SQL Nodes, ClusterJ, JDBC, OpenJPA, PHP/Perl/ODBC, OpenLDAP, NDB API) → data nodes via the NDB API, with two MGM nodes reached via the MGM Client / MGM API (C)]
- 37. MySQL Cluster Nodes
[Diagram] SQL-based applications connect via JDBC/ODBC to MySQL (SQL) Nodes; API Nodes embed the NDB API directly; a management client uses the MGM API (C) to talk to the Management Nodes; all application nodes reach the MySQL Cluster Data Nodes through the NDB API.
- 38. MySQL Cluster Nodes
• SQL Node (MySQL)
– Standard SQL interface
– Scale-out for performance
– Enables replication
• NDB API (Application)
– High performance
– C, C++ & Java, LDAP, HTTP APIs
– Developer's Guide
• Data Node (NDB Storage Engine)
– Data storage (memory/disk)
– Automatic & user-defined partitioning
– Local & global checkpoints
– Scale-out or scale-up for capacity & redundancy
– Scale dynamically with on-line add node
• Management Node
– Administration and configuration
– Arbitration
– Use two for redundancy
- 39. Replication Flexibility
• Synchronous replication within a Cluster node group for HA
• Bi-directional asynchronous replication to a remote Cluster (Cluster 1 ↔ Cluster 2) for geographic redundancy
• Asynchronous replication to non-Cluster databases (e.g. MyISAM, InnoDB) for specialised activities such as report generation
• Mix and match replication types
- 40. MySQL Cluster Loads
• The MySQL Cluster software (Management & Data Nodes) included with MySQL Community Server should not be used – deploy the MySQL Cluster (GPL) or MySQL Cluster CGE loads instead
• The MySQL Server included with the MySQL Cluster loads is different to the regular MySQL Server
– Always use this special version of MySQL Server when accessing MySQL Cluster data
• MySQL Cluster CGE is downloaded from oem.mysql.com
• GA GPL Community versions are downloaded from www.mysql.com/downloads
• In-development GPL Community versions are downloaded from dev.mysql.com/downloads/
- 41. MySQL Cluster System Requirements
System Component | Requirement
Hosts | Maximum of 255 total nodes (48 Data Nodes)
Hardware | COTS – Advanced TCA; 32- & 64-bit x86 & SPARC
Memory | Varies with size of database, # of hosts, # of replicas
Storage | Shared-nothing – memory & disk data; SCSI or RAID for I/O performance
Network | >1 Gigabit recommended, SCI supported
Operating System | Linux (Red Hat, SuSE), Solaris, HP-UX, Mac OSX, Windows, others…
- 43. MySQL Cluster 6.3
http://dev.mysql.com/doc/mysql-cluster-excerpt/5.1/en/mysql-cluster-changes-5-1-ndb-6-3.html
- 44. MySQL Cluster 7.0 – GA April 2009
http://www.mysql.com/why-mysql/white-papers/mysql_wp_cluster7_architecture.php
- 45. Scale out – multi core environments
- 46. MySQL Cluster vs MySQL MEMORY:
30x Higher Throughput / 1/3rd the Latency on a single node
• Table-level locking inhibits MEMORY scalability beyond a single client connection
• Even with check-pointing & logging enabled, MySQL Cluster still delivers durability
• 4-socket server, 64GB RAM, running Linux
- 47. Scale-Out Reads & Writes on Commodity Hardware
• NDB API performance: 4.33M queries per second!
• 8 Intel servers, dual 6-core CPUs @ 2.93 GHz, 24GB RAM
• 2 Data Nodes per server
• flexAsync benchmark
– 16 parallel threads, each issuing 256 simultaneous transactions
– Read / write 100KB attribute
• Interim results from 2 days of testing – watch this space: mikaelronstrom.blogspot.com
- 48. MySQL Cluster CGE 7.1 – Key Enhancements
http://www.mysql.com/why-mysql/white-papers/mysql_wp_cluster7_architecture.php
- 49. MySQL Cluster 7.1 Momentum
• 1,000 downloads per day
• Windows GA
• Pro-active Cluster monitoring
• Fully automated management
• 10x higher Java performance
"MySQL Cluster 7.1 gave us the perfect combination of extreme levels of transaction throughput, low latency & carrier-grade availability, while reducing TCO"
– Phani Naik, Pyro Group
- 50. MySQL Cluster 7.1: ndbinfo
• New database (ndbinfo) which presents real-time metric data in the form of tables
• Exposes new information together with providing a simpler, more consistent way to access existing data
• Examples include:
– Resource usage (memory, buffers)
– Event counters (such as number of READ operations since last restart)
– Data node status and connection status
mysql> use ndbinfo
mysql> show tables;
+-------------------+
| Tables_in_ndbinfo |
+-------------------+
| blocks            |
| config_params     |
| counters          |
| logbuffers        |
| logspaces         |
| memoryusage       |
| nodes             |
| resources         |
| transporters      |
+-------------------+
- 51. MySQL Cluster 7.1: ndbinfo
• Example 1: Check memory usage/availability
mysql> select * from ndbinfo.memoryusage;
+---------+--------------+--------+------------+-----------+-------------+
| node_id | memory_type | used | used_pages | total | total_pages |
+---------+--------------+--------+------------+-----------+-------------+
| 3 | Data memory | 917504 | 28 | 104857600 | 3200 |
| 3 | Index memory | 221184 | 27 | 11010048 | 1344 |
| 4 | Data memory | 917504 | 28 | 104857600 | 3200 |
| 4 | Index memory | 221184 | 27 | 11010048 | 1344 |
+---------+--------------+--------+------------+-----------+-------------+
• Note that there is a Data memory and an Index memory row for each data node in the cluster
• If the Cluster is nearing the configured limit then increase the DataMemory and/or IndexMemory parameters in config.ini and then perform a rolling restart
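A hedged sketch of that config.ini change (the values are illustrative; DataMemory and IndexMemory are the real parameter names, set in the [ndbd default] section):
[ndbd default]
# Raise the per-data-node memory limits, then rolling-restart the cluster
DataMemory=512M
IndexMemory=64M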
- 52. MySQL Cluster 7.1: ndbinfo
• Example 2: Check how many table scans performed on each data node since the last restart
mysql> select node_id as 'data node', val as 'Table Scans' from ndbinfo.counters
where counter_name='TABLE_SCANS';
+-----------+-------------+
| data node | Table Scans |
+-----------+-------------+
| 3 | 3 |
| 4 | 4 |
+-----------+-------------+
• You might check this if your database performance is lower than anticipated
• If this figure is rising faster than you expected then examine your application to understand why there are so many table scans
- 53. Latest news on MySQL Cluster 7.1
• As of MySQL Cluster 7.1.9a:
• InnoDB plugin included
• New view in ndbinfo:
mysql> SELECT node_id,
    ->        page_requests_direct_return AS hit,
    ->        page_requests_wait_io AS miss,
    ->        100*page_requests_direct_return/(page_requests_direct_return+page_requests_wait_io) AS hit_rate
    -> FROM ndbinfo.diskpagebuffer;
+---------+------+------+----------+
| node_id | hit  | miss | hit_rate |
+---------+------+------+----------+
|       3 |    6 |    3 |  66.6667 |
|       4 |   10 |    3 |  76.9231 |
+---------+------+------+----------+
• MySQL Enterprise Monitor (MEM) 2.3 includes new Cluster Advisor/graphs
- 55. Online Operations
• Scale the cluster for throughput or capacity
– Data and SQL Nodes
• Repartition tables
• Recover failed nodes
• Upgrade / patch servers & OS
• Upgrade / patch MySQL Cluster
• Back-Up
• Evolve the schema on-line, in real-time
- 56. Real-Time, On-Line Schema Changes
• Fully online – transaction response times unchanged
• Add and remove indexes, add new columns and tables
• No temporary table creation
• No recreation of data or deletion required
• Faster and better performing table maintenance operations
• Less memory and disk requirements
CREATE OFFLINE INDEX b ON t1(b);
Query OK, 1356 rows affected (2.20 sec)
DROP OFFLINE INDEX b ON t1;
Query OK, 1356 rows affected (2.03 sec)
CREATE ONLINE INDEX b ON t1(b);
Query OK, 0 rows affected (0.58 sec)
DROP ONLINE INDEX b ON t1;
Query OK, 0 rows affected (0.46 sec)
ALTER ONLINE TABLE t1 ADD COLUMN d INT;
Query OK, 0 rows affected (0.36 sec)
- 57. Performance I Flexibility I Simplification
• SQL and NoSQL Access Methods to tables
– SQL: complex queries, rich ecosystem of apps & expertise
– Simple Key/Value interfaces bypassing SQL layer for blazing fast reads & writes
– Real-time interfaces for micro-second latency
– Developers free to work in their preferred environment
- 58. Scaling Distributed Joins 7.2DM
Adaptive Query Localization
• ‘Complex’ joins traditionally slower in MySQL Cluster
– Complex = lots of levels and interim results in the JOIN
• JOIN was implemented in the MySQL Server (mysqld):
– Nested-loop join
– When data is needed, it must be fetched over the network from the Data Nodes, row by row
– This causes latency and consumes resources
• AQL can now push the execution down into the data nodes, greatly reducing the network trips
• 25x–40x performance gain in a customer PoC!
The existence, content and timing of future releases described here is included for information only and may be changed at Oracle's discretion.
http://www.mysql.com/news-and-events/on-demand-webinars/display-od-583.html
- 59. Adaptive Query Localization: Current Limitations
• Columns to be joined:
– must use exactly the same data type
– cannot be any of the BLOB or TEXT types
– must be part of a table index or primary key
• AQL can be disabled using the ndb_join_pushdown server system variable (enabled by default)
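A minimal sketch of working with that variable (ndb_join_pushdown is the real system variable; the tables in the query are illustrative):
-- Check and, if needed, disable AQL for the current session
SHOW VARIABLES LIKE 'ndb_join_pushdown';
SET ndb_join_pushdown = OFF;
-- EXPLAIN's Extra column indicates whether a join is pushed down to the data nodes
EXPLAIN SELECT t1.a FROM t1 JOIN t2 ON t2.a = t1.b;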
Early Adopter Speaks!
“Testing of Adaptive Query Localization has yielded over 20x
higher performance on complex queries within our application,
enabling Docudesk to expand our use of MySQL Cluster into a
broader range of highly dynamic web services.”
Casey Brown
Manager, Development & DBA Services, Docudesk
- 61. MySQL Cluster: SQL & NoSQL Combined
Mix & match! The same data can be accessed simultaneously through the SQL & NoSQL interfaces
• NoSQL – Multiple ways to bypass SQL, and maximize performance:
• NDB API. C++ for highest performance, lowest latency
• Cluster/J for optimized access in Java
• NEW! Memcached. Use all your existing clients/applications
- 64. NoSQL With NDB API
Best possible performance
• Application embeds the NDB API C++ interface library
• The NDB API makes intelligent decisions (where possible) about which data node to send queries to
– With a little planning in the schema design, achieve linear scalability
• Used by all of the other application nodes (MySQL, LDAP, ClusterJ, …)
• Best possible performance, but requires greater development skill
• Favourite API for real-time network applications
• Foundation for all interfaces
[Diagram: clients → applications with embedded NDB API library → MySQL Cluster Data Nodes]
- 65. NoSQL with memcached 7.2DM
• Memcached is a distributed memory-based hash-key/value store with no persistence to disk
– NoSQL, simple API, popular with developers
• MySQL Cluster already provides scalable, in-memory performance with NoSQL (hashed) access as well as persistence
• Provides the Memcached protocol but maps it to NDB API calls
• Writes-in-place, so no need to invalidate cache
• Simplifies architecture as caching & database are integrated into one tier
• Access data from existing relational tables
- 66. NoSQL with Memcached 7.2DM
Pre-GA version available from labs.mysql.com
Flexible:
• Deployment options
• Multiple Clusters
• Simultaneous SQL access
• Can still cache in the Memcached server
• Flat key-value store or map to multiple tables/columns
Simple:
set maidenhead 0 0 3
SL6
STORED
get maidenhead
VALUE maidenhead 0 3
SL6
END
- 67. MySQL Cluster Manager 1.1 Features
Delivered as part of MySQL Cluster CGE 7.1
- 68. How Does MySQL Cluster Manager Help ?
Example: Initiating an upgrade from MySQL Cluster 6.3 to 7.1
Before MySQL Cluster Manager (total: 46 commands, 2.5 hours of attended operation):
• 1 x preliminary check of cluster state
• 8 x ssh commands per server
• 8 x per-process stop commands
• 4 x scp of configuration files (2 x mgmd & 2 x mysqld)
• 8 x per-process start commands
• 8 x checks for started and re-joined processes
• 8 x process completion verifications
• 1 x verify completion of the whole cluster
• Excludes manual editing of each configuration file
With MySQL Cluster Manager (total: 1 command, unattended operation):
upgrade cluster --package=7.1 mycluster;
Results:
• Reduces the overhead and complexity of managing database clusters
• Reduces the risk of downtime resulting from administrator error
• Automates best practices in database cluster management
- 69. Terms used by MySQL Cluster Manager
• Site: the set of physical hosts which are to run Cluster processes to be managed by MySQL Cluster Manager. A site can include 1 or more clusters.
• Cluster: represents a MySQL Cluster deployment. A cluster contains 1 or more processes running on 1 or more hosts.
• Host: physical machine, running the MySQL Cluster Manager agent.
• Agent: the MySQL Cluster Manager process running on each host.
• Process: an individual MySQL Cluster node; one of: ndb_mgmd, ndbd, ndbmtd, mysqld & ndbapi*
• Package: a copy of a MySQL Cluster installation directory as downloaded from mysql.com, stored on each host.
*ndbapi is a special case, representing a slot for an external application process to connect to the cluster using the NDB API
- 70. Example configuration
• MySQL Cluster Manager agent runs on each physical host
• No central process for Cluster Manager – agents co-operate, each one responsible for its local nodes
• Together, the agents are responsible for managing all nodes in the cluster
• Management responsibilities:
– Starting, stopping & restarting nodes
– Configuration changes
– Upgrades
– Host & node status reporting
– Recovering failed nodes
[Diagram: four hosts (192.168.0.10–192.168.0.13), each running an agent; mysqld processes 7 & 8 and ndb_mgmd processes 1 & 2 on hosts .10/.11; ndbd data node processes 3–6 on hosts .12/.13; a mysql client connects to the agents]
- 71. Creating & Starting a Cluster
1. Define the site:
mysql> create site --hosts=192.168.0.10,192.168.0.11,
    -> 192.168.0.12,192.168.0.13 mysite;
2. Expand the MySQL Cluster tar-ball(s) from mysql.com to a known directory
3. Define the package(s):
mysql> add package --basedir=/usr/local/mysql_6_3_26 6.3;
mysql> add package --basedir=/usr/local/mysql_7_0_7 7.0;
Note that the basedir should match the directory used in Step 2.
4. Create the Cluster:
mysql> create cluster --package=6.3
    -> --processhosts=ndb_mgmd@192.168.0.10,ndb_mgmd@192.168.0.11,
    -> ndbd@192.168.0.12,ndbd@192.168.0.13,ndbd@192.168.0.12,
    -> ndbd@192.168.0.13,mysqld@192.168.0.10,mysqld@192.168.0.11
    -> mycluster;
This is where you define what nodes/processes make up the Cluster and where they should run.
5. Start the Cluster:
mysql> start cluster mycluster;
- 72. Upgrade Cluster
• Upgrade from MySQL Cluster 6.3.26 to 7.0.7:
mysql> upgrade cluster --package=7.0 mycluster;
• Automatically upgrades each node and restarts the process – in the correct order to avoid any loss of service
• Without MySQL Cluster Manager, the administrator must stop each process in turn, start the process with the new version and wait for the node to restart before moving on to the next one
- 73. MySQL Cluster Manager
GA 1st November 2010
[Diagram: on-line add-node – the cluster grows from data nodes 31 & 32 and mysqld nodes 33 & 34 to also include data nodes 35 & 36 and additional mysqld processes]
• On-line add-node:
mysql> add hosts --hosts=192.168.0.35,192.168.0.36 mysite;
mysql> add package --basedir=/usr/local/mysql_7_0_7
    -> --hosts=192.168.0.35,192.168.0.36 7.0;
mysql> add process
    -> --processhosts=mysqld@192.168.0.33,mysqld@192.168.0.34,ndbd@192.168.0.35,ndbd@192.168.0.36 mycluster;
mysql> start process --added mycluster;
• Restart optimizations
– Fewer nodes restarted on some parameter changes
- 74. General Design Considerations
• MySQL Cluster is designed for
– Short transactions
– Many parallel transactions
• Utilize simple access patterns to fetch data
– Use efficient scans and batching interfaces
• Analyze what your most typical use cases are – optimize for those
Overall design goal: minimize network roundtrips for your most important requests!
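One way to cut roundtrips is batching – a hedged sketch (the table and rows are illustrative, borrowed from the later distribution-awareness example):
-- One multi-row INSERT is one network roundtrip instead of three
INSERT INTO towns (town, country, population) VALUES
  ('Maidenhead', 'UK', 78000),
  ('Boston', 'UK', 58124),
  ('Boston', 'USA', 617594);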
- 75. Best Practice : Primary Keys
• To avoid problems with:
– Cluster-to-Cluster (geographic) replication
– Recovery
– Application behaviour (KEY NOT FOUND, etc.)
• ALWAYS DEFINE A PRIMARY KEY ON THE TABLE!
• A hidden PRIMARY KEY is added if no PK is specified, BUT this is NOT recommended
– The hidden primary key is, for example, not replicated (between Clusters)!
– There are problems in this area, so avoid them!
• So always have at least: id BIGINT AUTO_INCREMENT PRIMARY KEY
– Even if you don't “need” it for your applications
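A minimal sketch of that advice (the table name and columns are illustrative):
-- Every NDB table gets an explicit primary key, even a surrogate one
CREATE TABLE sessions (
  id   BIGINT AUTO_INCREMENT PRIMARY KEY,
  data VARBINARY(1024)
) ENGINE=NDBCLUSTER;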
- 76. Best Practice: Distribution Aware Apps
• Partition selected using a hash on the Partition Key
– Primary Key by default
– User can override in the table definition
• MySQL Server (or NDB API) will attempt to send the transaction to the correct data node
• If all data for the transaction is in the same partition, less messaging -> faster
• Aim to have all rows for high-running queries in the same partition
Example table (Partition Key = Primary Key: town, country):
town | country | population
Maidenhead | UK | 78000
Paris | France | 2193031
Boston | UK | 58124
Boston | USA | 617594
SELECT SUM(population) FROM towns WHERE country='UK';
SELECT SUM(population) FROM towns WHERE town='Boston';
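A hedged sketch of overriding the partition key in the table definition (the schema is illustrative; PARTITION BY KEY is the real MySQL syntax, and the partition key must be part of the primary key):
-- Hash-partition on town only, so all 'Boston' rows share one partition
CREATE TABLE towns (
  town       VARCHAR(64) NOT NULL,
  country    VARCHAR(64) NOT NULL,
  population INT,
  PRIMARY KEY (town, country)
) ENGINE=NDBCLUSTER
PARTITION BY KEY (town);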
- 77. Best Practice: Distribution Aware – Multiple Tables
• Extend partition awareness over multiple tables
• Same rule – aim to have all data for an instance of high-running transactions in the same partition
Table 1 (Partition Key = Primary Key: sub_id):
sub_id | age | gender
19724 | 25 | male
84539 | 43 | female
19724 | 16 | female
74574 | 21 | female
Table service_ids (Partition Key: sub_id):
service | sub_id | svc_id
twitter | 19724 | 76325732
twitter | 84539 | 67324782
facebook | 19724 | 83753984
facebook | 73642 | 87324793
ALTER TABLE service_ids
PARTITION BY KEY(sub_id);
- 79. Automatic Data Partitioning
4 Partitions * 2 Replicas = 8 Fragments
[Diagram] Table T1 (partitions P1–P4) across four data nodes:
– Data Node 1: F1 (primary), F3 (secondary); Data Node 2: F3 (primary), F1 (secondary) → Node Group 1
– Data Node 3: F2 (primary), F4 (secondary); Data Node 4: F4 (primary), F2 (secondary) → Node Group 2
– Node groups are created automatically
– # of groups = # of data nodes / # of replicas
- 80. Automatic Data Partitioning
4 Partitions * 2 Replicas = 8 Fragments
• A fragment is a copy of a partition (aka fragment replica)
• Number of fragments = # of partitions * # of replicas
- 81.–90. Automatic Data Partitioning (build-up)
[Diagram sequence: the eight fragments are placed one at a time – F1 and F3 on Data Nodes 1 & 2 (Node Group 1), F2 and F4 on Data Nodes 3 & 4 (Node Group 2); each partition is stored as a primary fragment on one node of the group and a secondary fragment (fragment replica) on the other]
• Node groups are created automatically
• # of groups = # of data nodes / # of replicas
- 91.–93. Automatic Data Partitioning
• As long as one data node in each node group is running we have a complete copy of the data
- 94. Automatic Data Partitioning
• If both data nodes of a node group are lost:
– No complete copy of the data
– Cluster shuts down automatically
- 95. Data Partitioning
• Automatic distribution/partitioning
– Primary Key hash value (partitioning by key)
• Transparent load balancing
– Distribution awareness: Data Node chosen based on PK hash value, or proximity (SQL Node – shared memory, localhost, remote host)
• Support for user-defined partitioning
• Key concepts (e.g. 4 Partitions * 2 Replicas = 8 Fragments):
– Partition: horizontal; # of partitions = # of data nodes
– Fragment: copy of a partition
– Replica: complete copy of the data
– Node Group: groups data nodes (automatically); determined by the order in the configuration file; # of groups = # of data nodes / # of replicas
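A hedged sketch for inspecting the distribution (the towns table is the earlier example; EXPLAIN PARTITIONS and INFORMATION_SCHEMA.PARTITIONS are standard MySQL features of this era):
-- Which partition does a high-running query touch?
EXPLAIN PARTITIONS SELECT SUM(population) FROM towns WHERE town='Boston';
-- How are rows spread across the partitions?
SELECT partition_name, table_rows
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE table_name = 'towns';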
- 96. Internal Replication
• Replication between Data Nodes
• Synchronous Replication
– To ensure minimal failover time
– Data Nodes have the same information at the same point in time
– Achieved by Two-phase commit protocol
• Two-phase commit
– 1. Prepare/update phase: all fragments (primary/secondary) get updated
– 2. Commit phase: the changes are committed
– Every Data Node has a Transaction Coordinator (TC); one is elected to coordinate a given transaction
– The information goes from the Transaction Coordinator to the primary fragments and further to the secondary fragments
- 97. Internal Replication: Prepare Phase
insert into T1 values (...)
[Diagram: two data nodes, each with a Transaction Coordinator and a Local Query Handler (LQH) over ACC/TUP blocks, index memory and data memory; one holds fragments F1/F2, the other F2/F1]
1. Calculate hash on the PK
2. Forward the request to the LQH where the primary fragment is
3. Prepare the secondary fragment
4. Prepare phase done
- 98. Internal Replication: Commit Phase
insert into T1 values (...)
[Diagram: the commit phase flows through the same path – Transaction Coordinator → LQH on the primary fragment → LQH on the secondary fragment → back to the coordinator – after which the changes are committed]
- 99. Transactions
• Transaction Coordinator
– The elected TC starts the transaction
– TC calculates a hash on the primary key
– Each transaction contains one or more Read/Insert/Update or Delete operations
– Operations are forwarded to the LQH of the Data Node having the data for the operation
• Isolation Level
– Committed Read
• Read from both primary and secondary fragments
• No lock required
• Update/Insert/Delete
– Locks on the index entry in ACC
– Both primary and secondary fragments
• Read exclusive/Read shared
– Locks the index entry in ACC on primary and secondary fragments
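A hedged sketch of how these isolation levels surface in SQL (the towns table is the earlier example; LOCK IN SHARE MODE and FOR UPDATE are standard MySQL syntax):
BEGIN;
-- Committed Read: a plain SELECT takes no lock
SELECT population FROM towns WHERE town='Boston' AND country='UK';
-- Read shared: locks the index entry in shared mode
SELECT population FROM towns WHERE town='Boston' AND country='UK' LOCK IN SHARE MODE;
-- Read exclusive: locks the index entry exclusively, as an UPDATE would
SELECT population FROM towns WHERE town='Boston' AND country='UK' FOR UPDATE;
COMMIT;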