Easy MySQL Database Sharding
with CUBRID SHARD
Esen Sagynov
April 24, 2013
Today
1. About NHN
2. Sharding in Production
3. Why CUBRID SHARD
4. How to shard MySQL databases
5. DEMO
6. CUBRID SHARD in Ndrive
About me
• Esen Sagynov (NHN Corp.)
– @CUBRID
– fb.com/cubrid
esen@cubrid.org
About NHN
Sharding in Production
• Uses RDBMS with Sharding
• Data is stored as simple Key-Value.
Sharding Solutions
Name              | Type                        | Requirements: DB          | Requirements: etc. | Interface
Hibernate Shards  | AS framework                | DBMS w/ Hibernate support | Hibernate, JVM     | Java
HiveDB            | AS framework                | MySQL                     | Hibernate, JVM     | Java
dbShards          | AS & Middleware             | MySQL                     |                    | Java, C, PHP, Python, Ruby
Gizzard (Twitter) | Middleware                  | Any storage               | JVM                | Java
Spider for MySQL  | Middleware & Storage Engine | MySQL                     |                    | Any
Spock Proxy       | Middleware                  | MySQL                     |                    | Any
Shard-Query       | Middleware                  | MySQL                     |                    | PHP, RESTful API
CUBRID SHARD      | Middleware                  | CUBRID, MySQL, Oracle     |                    | Any
Sharding Solution Categories
• Application layer
• Storage layer
• Heavy middleware
• Lightweight middleware
Application & Storage Layers
Application Layer
• Hibernate Shards
• HiveDB
Disadvantages
• Requires Hibernate/Java
• Uses many XML files for configuration
• Not suitable for services already in production
Storage Layer
• Spider for MySQL
Disadvantages
• Requires changing the storage engine
• Not suitable for services already in production
Heavy Middleware
dbShards, Gizzard
• Requires changing application code
• Requires agents to be installed on each DB server
• Not suitable for services already in production
• Not active
Lightweight Middleware
• Spock Proxy
– Active project
– Lightweight
– Flexible
– Easy to configure
– No application change
Spock Proxy
• Sharding rule storage: Database
• Sharding strategy: Modulo
• Determine sharding key: Full SQL parsing
• Strength: No need to change SQL
• Weakness:
– Performance degradation (extra SQL parsing, resultset merging)
– Not all MySQL SQL is supported
– Single threaded
Blog post: http://www.cubrid.org/blog/dev-platform/database-sharding-platform-at-nhn/
Spock Proxy Performance
• Single threaded
• Parses and rewrites SQL
[Chart: execution time vs. number of concurrent clients (1 to 400) for App Sharding, Spock Proxy, and CUBRID SHARD]
Spock Proxy
✓ Active project
✓ Lightweight
✓ Flexible
✓ Easy to configure
✓ No application change
✕ No performance impact
Lightweight, Easy to Configure
Sharding Middleware
CUBRID SHARD
Spock Proxy vs. CUBRID SHARD
Spock Proxy
• Sharding rule storage: Database
• Sharding strategy: Modulo
• Determine sharding key: Full SQL parsing
• Strength: No need to change SQL
• Weakness:
– Performance degradation (extra SQL parsing, resultset merging)
– Supports MySQL only
– Not all MySQL SQL is supported
– Single threaded
CUBRID SHARD
• Sharding rule storage: Configuration file
• Sharding strategy: Modulo, or a user-defined hash function
• Determine sharding key: SQL hint search
• Strength:
– Supports CUBRID and MySQL
– Full MySQL SQL support
– Higher performance
– No SQL parsing
– Multi-threaded
– Connection pooling
– Load balancing
– Custom sharding strategy
– Easy configuration
• Weakness: Requires changing SQL queries to insert the sharding hint
Blog post: http://www.cubrid.org/blog/dev-platform/database-sharding-platform-at-nhn/
CUBRID Facts
• RDBMS
• True Open Source @ www.cubrid.org
• Optimized for Web services
• High performance
• Large DB support
• High-Availability feature
• DB Sharding support
• 90+% MySQL compatible SQL syntax + Oracle analytical functions
• ACID Transactions
• Online Backup
• Supported by NHN Corporation
CUBRID SHARD Architecture
[Diagram: single database view]
SHARD Environment
Installing CUBRID SHARD is easy!
Easy Installation
http://www.cubrid.org/downloads
apt-get
yum
chef ⭐
VM
EC2 AMI
cloud service
Doc page: http://www.cubrid.org/wiki_tutorials/entry/cubrid-installation-instructions
Configuring is very easy and intuitive!
Configuration Steps
1. Create shards
2. Create database users
3. Create database schema
4. Configure CUBRID SHARD
– shard database information
– backend shards connection information
– sharding strategy
5. Start CUBRID SHARD
6. Change application code
– connection URL
– shard hint
# 1. Create Shards
• Host 1..N:
$> mysql -ushard -ppassword -hnode1
mysql> CREATE DATABASE sharddb;
# 2. Create Users
• Host 1..N:
$> mysql -ushard -ppassword -hnode1
mysql> USE mysql;
mysql> GRANT ALL PRIVILEGES ON sharddb.* TO 'shard'@'localhost' IDENTIFIED BY 'shard123';
mysql> GRANT ALL PRIVILEGES ON sharddb.* TO 'shard'@'shardBrokerNode' IDENTIFIED BY 'shard123';
# 3. Create same tables
• Host 1..N:
$> mysql -ushard -ppassword -hnode1
mysql> USE sharddb;
mysql> CREATE TABLE tbl_users (id BIGINT PRIMARY KEY, name VARCHAR(20), age SMALLINT);
$> mysql -ushard -ppassword -hnode2
…
# 4. Simple Configuration
• shard.conf
– Main configuration file for CUBRID SHARD.
• shard_connection.txt
– Predefined list of shard IDs, database and host names for CUBRID/MySQL.
• shard_keys.txt
– A list of shard_key_columns and their mapping with shard_id
Doc page: http://www.cubrid.org/manual/91/en/shard.html#configuration-and-setup
shard.conf
Set:
1. SHARD_DB_NAME
2. SHARD_DB_USER
3. SHARD_DB_PASSWORD
4. APPL_SERVER
…
SHARD_DB_NAME = sharddb
SHARD_DB_USER = shard
SHARD_DB_PASSWORD = shard123
APPL_SERVER = CAS_MYSQL
…
Doc page: http://www.cubrid.org/manual/91/en/shard.html#default-configuration-file-shard-conf
shard_connection.txt
Set:
1. Shard ID
2. Real database name
3. Remote/local host name
# shard-id real-db-name connection-info
0 sharddb mysqlA:3306
1 sharddb mysqlB:3306
2 sharddb mysqlC:3306
…
** Host names must be identical to the output of the hostname command on every node.
Doc page: http://www.cubrid.org/manual/91/en/shard.html#setting-shard-metadata
shard_keys.txt
Set:
1. Min shard key
2. Max shard key
3. Shard ID
[%student_no]
# min max shard_id
0 63 0
64 127 1
128 191 2
192 255 3
** The default sharding strategy is to apply modulo 256 (SHARD_KEY_MODULAR in shard.conf).
Doc page: http://www.cubrid.org/manual/91/en/shard.html#setting-shard-metadata
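The default strategy above can be sketched in C as follows. This is only an illustration of the idea (take the key modulo 256, then find the range that contains the result), not CUBRID SHARD's actual implementation. The names shard_range, STUDENT_NO_RANGES and lookup_shard_id are hypothetical, and the handling of negative keys is an assumption made for this sketch.

```c
#include <stddef.h>

/* One row of shard_keys.txt: an inclusive range of hash results and its shard.
   (Hypothetical type, introduced for this sketch only.) */
struct shard_range {
    int min;
    int max;
    int shard_id;
};

/* The four ranges listed on the slide for [%student_no]. */
static const struct shard_range STUDENT_NO_RANGES[] = {
    {0, 63, 0},
    {64, 127, 1},
    {128, 191, 2},
    {192, 255, 3},
};

/* Hypothetical helper: returns the shard ID for a key, or -1 if the modulus
   is invalid or no range matches. */
int lookup_shard_id(long key, int modular)
{
    if (modular <= 0)
        return -1;

    /* Assumption of this sketch: normalize so negative keys land in [0, modular). */
    long hash = ((key % modular) + modular) % modular;

    size_t n = sizeof(STUDENT_NO_RANGES) / sizeof(STUDENT_NO_RANGES[0]);
    for (size_t i = 0; i < n; i++) {
        if (hash >= STUDENT_NO_RANGES[i].min && hash <= STUDENT_NO_RANGES[i].max)
            return STUDENT_NO_RANGES[i].shard_id;
    }
    return -1;
}
```

With the table above, lookup_shard_id(100, 256) gives shard 1, because 100 falls in the 64 to 127 range, and a key of 256 wraps around to hash 0 and lands on shard 0.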
Custom Library
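/* SHARD_U_TYPE_* and ERROR_ON_* are not defined on the slide; they are
   assumed to come from the CUBRID SHARD headers. */
int fn_shard_key_udf(int type, void *val)
{
    int mod = 2;  /* number of distinct shard keys this function produces */

    if (val == NULL)
    {
        return ERROR_ON_ARGUMENT;
    }

    switch (type)
    {
    case SHARD_U_TYPE_INT:
    {
        /* Integer shard key: hash it by taking the remainder modulo mod. */
        int ival = *(int *) val;
        return ival % mod;
    }
    case SHARD_U_TYPE_STRING:
        /* String shard keys are not handled by this example. */
        return ERROR_ON_MAKE_SHARD_KEY;
    default:
        return ERROR_ON_ARGUMENT;
    }

    return ERROR_ON_MAKE_SHARD_KEY;
}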
shard.conf
1. SHARD_KEY_LIBRARY_NAME
2. SHARD_KEY_FUNCTION_NAME
[%student_no]
SHARD_KEY_LIBRARY_NAME = $CUBRID/conf/shard_key_udf.so
SHARD_KEY_FUNCTION_NAME = fn_shard_key_udf
Doc page: http://www.cubrid.org/manual/91/en/shard.html#setting-user-defined-hash-function
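The slide's function rejects string shard keys. As a rough sketch of what a string branch might compute, the helper below hashes a NUL-terminated string with the well-known djb2 scheme and reduces it to a range of shard keys. The function name, the error constant and the choice of djb2 are all assumptions of this sketch; none of them come from the slides or from CUBRID.

```c
#include <stddef.h>

/* Placeholder error code for this sketch only; not a CUBRID constant. */
#define DEMO_ERROR_ON_ARGUMENT (-1)

/* Hypothetical helper: maps a string to a shard key in [0, modular). */
int demo_string_shard_key(const char *s, int modular)
{
    if (s == NULL || modular <= 0)
        return DEMO_ERROR_ON_ARGUMENT;

    unsigned long hash = 5381;  /* djb2 starting value */
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++)
        hash = hash * 33u + *p;  /* unsigned arithmetic wraps on overflow */

    return (int)(hash % (unsigned long)modular);
}
```

Whatever hash is chosen, it has to be deterministic and stay unchanged once data has been written, because the deck notes that CUBRID SHARD does not rebalance data.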
# 5. Start CUBRID SHARD
$> cubrid shard start
@ cubrid shard start
++ cubrid shard start: success
# 6. Connection URL
connectionURL = "jdbc:cubrid:localhost:45511:sharddb:shard:shard123:?althosts=node2:port2,node3:port3&loadBalance=true";
Querying Shards
SELECT name FROM student WHERE
student_no = /*+ shard_key */ ?;
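To see why searching for a hint is cheaper than full SQL parsing, consider the sketch below, which locates the hint with a single substring search. It is a simplified illustration, not how CUBRID SHARD is implemented: the helper name is hypothetical, and it assumes the hint is spelled exactly as on the slide (same case and spacing).

```c
#include <string.h>

/* The hint text shown on the slide. */
#define SHARD_KEY_HINT "/*+ shard_key */"

/* Hypothetical helper: returns the byte offset just past the first shard_key
   hint in sql, or -1 when the query carries no hint. */
long find_shard_key_hint(const char *sql)
{
    if (sql == NULL)
        return -1;

    /* One linear scan over the text; no tokenizing or grammar is involved. */
    const char *hit = strstr(sql, SHARD_KEY_HINT);
    if (hit == NULL)
        return -1;

    return (long)(hit - sql) + (long)strlen(SHARD_KEY_HINT);
}
```

For the query on the slide the function returns the position right after the hint, which is where the bind parameter carrying the shard key follows. A query without the hint yields -1.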
Types of SQL Hints
String query = "SELECT name FROM student WHERE student_no = /*+ shard_key */ ?;";
PreparedStatement query_stmt = connection.prepareStatement(query);
query_stmt.setInt(1, 100);
ResultSet rs = query_stmt.executeQuery();
// fetch resultset
key_column | range min (hash result) | range max (hash result) | shard_id
student_no | 0                       | 63                      | 0
student_no | 64                      | 127                     | 1
student_no | 128                     | 191                     | 2
student_no | 192                     | 255                     | 3
MySQL Sharding DEMO
Requirements:
• 1GB free RAM
• 3GB free space for 2 VMs
• VirtualBox
• Vagrant
https://github.com/kadishmal/cubrid-shard-demo
CUBRID SHARD
• Easy
– No configuration hassle
– No “moving parts”
• Reliable
– High performance
– No SPOF
• Open source
– Supported by NHN
CUBRID SHARD Disadvantages
Need to alter SQL to add Hints
No Data Rebalancing
Need to carefully plan the sharding strategy in advance.
No GUI monitoring tool. Only command line.
CUBRID SHARD is great when…
• Services are already running and stable
• But data is growing fast
• And you need a stable solution
• Quick installation and easy configuration
• Time constraints
Ndrive cloud storage service
• User file metadata
• Sharding strategy by user ID
• 24 master shards
– Intel(R) Xeon(R) L5640 @ 2.27GHz * 8, 16G RAM, 820G HDD
• 10TB data
• Load pattern:
– 75~80% SELECT vs. 20~25% INSERT
– Avg. ~3000 QPS/shard
– Avg. ~5% CPU load/shard
Ndrive cloud storage service
• 1 SHARD BROKER
• 4 Proxies per Broker
• 50 CAS per proxy
• No performance degradation after CUBRID SHARD is used
[Chart: x-axis Vuser, 64 to 320]
CUBRID SHARD Next
Auto-rebalancing in CUBRID SHARD
CM shard monitoring
Aggregation feature
Questions?
• Esen Sagynov (NHN Corp.)
– @CUBRID
– fb.com/cubrid
esen@cubrid.org

Easy MySQL Database Sharding with CUBRID SHARD - 2013 Percona

Editor's Notes

  1. To keep this presentation simple, I will focus on three main things.
  2. Ask questions as I go through the presentation. Hold longer questions until the end. Do not hesitate to talk to me during the conference. Follow up by email.
  3. We have researched how other companies manage big data. They still rely on relational databases and manage their data through sharding. At NHN we ended up doing the same.
  4. These are the existing sharding solutions.
  5. Among open source RDBMS solutions, MySQL doesn’t provide database sharding out of the box. Google had to significantly change MySQL replication to make it work similarly. But at the time Sun, the former owner of MySQL, didn’t accept Google’s changes, resulting in a fork from mainstream without mainstream support. Twitter has recently opened their MySQL fork. http://www.oracle.com/technetwork/database/features/availability/300461-132370.pdf
  6. Spider for MySQL requires changing the storage engine. That is not an option: we have running services and don’t want to change anything.
  7. The application-aware architecture of dbShards is explained here: http://www.dbshards.com/blog/2010/09/black-box-vs-application-aware-sharding/.
  8. NHN has many large services in production. A critical requirement was that any sharding middleware we add must not decrease the overall performance of the service.
  9. Spock Proxy and CUBRID SHARD have somewhat similar architectures. Both are lightweight and flexible.
  10. No additional SQL parsing is needed because of the hint.