Growing in the Wild. The story by
CUBRID Database Developers.
Esen Sagynov (@CUBRID), NHN Corporation, Service Platform Development Center
Eugen Stoianovici, NHN Corporation, CUBRID Development Lab
Monday, April 2, 2012
Who are we?
• Eugen Stoianovici
– CUBRID Engine Team
– eugen.stoianovici@cubrid.org
• Esen Sagynov @CUBRID
– CUBRID Project Manager
– esen.sagynov@nhn.com
Purpose of this presentation
This is what I remember from every presentation that
I’ve attended. Not the details.
1. “Some guys talked about some cool stuff they
encountered in applications (don't remember what)”
2. “There's a database that they use for this type of
applications, it's open source and saves from a lot of
trouble (don't remember what trouble exactly).”
3. “They're really keen on doing things right.”
You will learn…
Reasons behind CUBRID development.
What CUBRID has to offer. Benefits & advantages.
What we have learnt so far.
Where we are heading to.
CUBRID Facts
✓ RDBMS
✓ True Open Source @ www.cubrid.org
✓ Optimized for Web services
✓ High performance 3-tier architecture
✓ Large DB support
✓ High-Availability feature
✓ DB Sharding support
✓ MySQL compatible SQL syntax
✓ ACID Transactions
✓ Online Backup
Reasons Behind CUBRID Development
[Map: 150+ Web Services across Korea, Japan, China, and the USA (web plus iOS & Android apps), running on a mix of Oracle, MySQL, MSSQL, NoSQL, and CUBRID, alongside a Monitoring & Logging System.]
Disadvantages of existing solutions
1. High License Cost
1. Over 10,000 servers @ NHN
2. Third-party solution
1. No ownership of the code base
2. Additional $$$ for customizations
3. Branch tech support is not enough
4. Communication barriers w/ vendors
5. Slow updates & fixes
Fork or Start from Scratch?
Fork an existing database:
• No full ownership
• Time to learn the code base
• Fixed architecture
• Understand the design philosophy
Start from scratch:
• Full ownership
• Time to develop
• Custom, more advanced architecture and design
Benefits of in-house solution
Existing solutions:
1. High License Cost
1. Over 10,000 servers
@ NHN
2. Third-party solution
1. No ownership of the
code base
2. Additional $$$ for
customizations
3. Communication
barriers w/ vendors
4. Slow updates &
fixes
In-house solution:
1. No License Cost
2. Core Technological Asset
1. Complete control of the code base
2. No additional $$$ for customizations
3. No communication barriers
4. Fast updates & fixes
3. Key Storage Technology Skills
1. Grow our developers
2. Export developers
4. New Database Solution Service
1. Provide CUBRID service to other
platforms
2. Instant reaction to customer issues
5. Recurring Key Technology
1. High-Availability
2. Sharding
3. Rebalancing
4. Cluster
5. etc.
CUBRID Goal: Stability, Performance, Scalability, Ease of Use
• Human vs. DB Errors
• # of customers
• Smart Index Optimizations
• Shared Query Caching
• Web Optimized Features
• Load Balancer
• High-Availability w/ auto fail-over
• Sharding
• Data Rebalancer
• Cluster
• SQL & API Compatibility
• Native Migration Tool
• Native GUI DB Management Tools
• Monitoring Tools
#1
Performance
Client Requests → Performance UP!
Types of Web Services — Main operations | Example:
• READ > 95% | News, Wiki, Blog, etc.
• READ:WRITE = 70:30 | SNS, Push services, etc.
• WRITE > 90% | Log monitoring, Analytics

90% of Web Services — CRUD | WHY?
• SELECT | Fast searching, avoid sequential scan and ORDER BY
• INSERT | Concurrent WRITE performance, reduce I/O, and fast searching
• UPDATE | Fast searching, improve lock mechanism
• DELETE | Fast searching

How & What to improve
Phase 1 (v1.0~2.0) → Phase 2 (v8.2.2) → Phase 3 (v8.4.0) → Phase 4 (v8.4.1) → Phase 5 (Apricot) → Phase 6 (Banana)
Performance focus by phase: SELECT Performance+ → INSERT & DELETE Performance+ → SELECT Performance++ → INSERT & UPDATE Performance++ → INSERT Performance+++ → SELECT Performance++++
Improvements along the way: Shared Query Plan Caching; Space Reusability Improvement; Covering Index, Key limit, etc.; Memory Buffer Mgmt. Improvements; Filter index, Skip index, etc.; Optimize JOINs; DB & Index Volume Optimizations; API Performance+; Windows Performance+
TPS gains: 15%, 10%, 270%, 70%
Smart Indexing:
MySQL SELECT performance < CUBRID SELECT performance
MySQL INSERT performance < CUBRID INSERT performance
CREATE TABLE forum_posts(
user_id INTEGER,
post_moment INTEGER,
post_text VARCHAR(64)
);
CREATE INDEX i_forum_posts_post_moment ON forum_posts (post_moment);
CREATE INDEX i_forum_posts_post_moment_user_id
ON forum_posts (post_moment, user_id);
Random INSERT Performance
SELECT username FROM users WHERE id = ?;
INSERT INTO forum_posts(user_id, post_moment, post_text)
VALUES (?, ?, ?);
UPDATE users SET last_posted = ? WHERE id = ?;
CREATE TABLE users(
id INTEGER UNIQUE,
username VARCHAR(255),
last_posted INTEGER
);
Random INSERT Performance
• Users
– 100,000 rows prepopulated
• Test
– CUBRID vNext (code name Apricot)
– MySQL 5.5.21
– 40 workers
– 1 hour
– Record QPS every 2 minutes
Random INSERT Performance
[Chart: CUBRID QPS decrease with DataSet size (queries per second)]
CUBRID: Average = 3685, Max = 4469, Min = 2821
Random INSERT Performance
[Chart: MySQL QPS decrease with DataSet size, up to ~6.3 million rows (queries per second)]
MySQL: Average = 1796, Max = 8951, Min = 1122
Random INSERT Performance
[Chart: CUBRID vs. MySQL QPS decrease with DataSet size — CUBRID QPS vs. MySQL QPS]
CUBRID Optimizations
• Index Features: Reverse Index, Prefix Index, Function Index, Filter Index, Unique Index, Primary Key, Foreign Key
• Query Features: Multi-range key limit, Index skip scan, Skip order by, Skip group by, Range scan optimizations, Query rewrites, Covering Index (sketched below), Descending Index
• Server level optimizations: Log compression, Shared Query Plan cache, Locking optimizations, Transaction concurrency
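A minimal sketch of the Covering Index item above, reusing the forum_posts table from the Random INSERT Performance slides (the index name and query are illustrative, not from the deck): when the index holds every column a query touches, the engine can answer the query from the index alone.

-- Hypothetical covering index: (user_id, post_moment) contains both the
-- filter column and the selected column, so no table rows need to be read.
CREATE INDEX i_forum_posts_user_moment ON forum_posts (user_id, post_moment);

-- Served entirely from the index (predicate, projection, and ordering):
SELECT post_moment
FROM forum_posts
WHERE user_id = 123
ORDER BY post_moment DESC
LIMIT 10;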
Filter Index
• Interesting (open) tickets fit into a very small index.
• No overhead for INSERT/UPDATE
• Very fast results for open tickets
CREATE INDEX ON tickets(component, assignee)
WHERE status = 'open';
SELECT title, component, assignee FROM users
WHERE register_date > '2008-01-01' AND status = 'open';
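Note that the CREATE INDEX above targets tickets while the sample query reads from users. A self-contained sketch of the same idea, using a hypothetical tickets table and an explicit index name, might look like this:

-- Hypothetical ticket table; only 'open' rows are queried frequently.
CREATE TABLE tickets (
    id        INTEGER,
    title     VARCHAR(255),
    component VARCHAR(64),
    assignee  VARCHAR(64),
    status    VARCHAR(16)
);

-- Filter (partial) index: only open tickets are indexed, so the index stays
-- small and closed tickets add no index-maintenance overhead.
CREATE INDEX i_tickets_open ON tickets (component, assignee)
WHERE status = 'open';

-- A query whose predicate matches the index filter can be answered from the
-- small filtered index instead of a full index or a sequential scan.
SELECT title, component, assignee
FROM tickets
WHERE status = 'open' AND component = 'engine';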
QPS Filter vs. Full index
[Chart: QPS with a Full Index vs. QPS with a Filter Index as the table grows from 0 to 10,000,000 rows]
CUBRID Architecture
• API: CCI, JDBC, ADO.NET, OLEDB, ODBC, PHP, Perl, Python, Ruby
• Broker: Query Parser, Query Optimizer, Query Planner
• Server: Query Manager, Query Executor, Transaction Manager, Lock Manager, Log Manager, Storage Manager, File Manager
Parameterized Queries & Filter Index
• PostgreSQL: will not use a partial index
• MS SQL Server: provides a workaround
• Oracle: less flexible, has to be the exact expression
• CUBRID: “Shared” Query Plan Cache
SELECT title, component, assignee FROM users
WHERE register_date > ? AND status = ?;
SELECT name, email FROM users
WHERE register_date > ? AND age < ? AND age < 18;
Query Plan Cache
• PostgreSQL: caches a plan for the lifespan of a driver-level prepared statement
• MySQL: no query plan cache
• CUBRID: “Shared” Query Plan Cache
Query Plan Cache
Query execution without plan cache: Parse SQL → Name Resolving → Semantic Check → Query Optimize → Query Plan → Query Execution
Query execution with plan cache: Parse SQL → Get Cached Plan → Query Execution
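A minimal sketch of how the cached-plan path is reached from SQL, assuming the MySQL-style PREPARE/EXECUTE syntax that CUBRID's MySQL-compatible dialect is expected to cover (application drivers reach the same path through prepared statements); the statement name and values are illustrative only.

-- The plan is parsed and optimized once at PREPARE time; each EXECUTE goes
-- straight to the cached plan instead of repeating the full pipeline.
PREPARE open_by_date FROM
  'SELECT title, component, assignee FROM users
   WHERE register_date > ? AND status = ?';

EXECUTE open_by_date USING '2008-01-01', 'open';  -- first use: plan built and cached
EXECUTE open_by_date USING '2012-01-01', 'open';  -- later uses: cached plan reused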
Auto Parameterization
SELECT title, component, assignee FROM users
WHERE register_date > '2008-01-01' AND status = 'open';
SELECT title, component, assignee FROM users
WHERE register_date > ? AND status = ?;
#2
Scalability
Scalability challenges
• How to synchronize?
– Async
• Load balancing?
– Third-party solution
• Who handles Fail-over?
– Application
– Third-party solution
• Cost?
HA solutions
DBMS | Cost | Disk-shared | Replication | Consistency | Auto-Failover
Oracle RAC | +++++ | Shared everything | N/A | N/A | O
MS-SQL Cluster | +++ | Shared everything | N/A | N/A | O
MySQL Cluster | ++ | Shared nothing | Log based | Async, Sync | O
MySQL Replication | + Third-party, Free | Shared nothing | Statement based | Async | O
CUBRID | Free | Shared nothing | Log based | Sync, Semi-sync, Async | O
Client
Requests
1. Non-stop 24/7 service uptime
2. No missing data between nodes
Phase 1 (v8.1.0) → Phase 2 (v8.2.x) → Phase 4 (v8.3.x) → Phase 5 (v8.4.x) → Phase 6 (Apricot)
Replication → HA Support → Extended HA features → HA Monitoring + Easy Admin Scripts → HA Performance + Reduced Replication Delay Time
Features along the way: Async Auto Fail-over, HA Status Monitoring, CUBRID Heartbeat, HA + Replica, Admin Scripts, Read-Write Service during DB maintenance, Async / Semi-sync / Sync replication, Broker Modes (RW, RO)
Master:Slave configurations: 1:1 (M:S), 1:N (M:S), 1:1:N (M:S:R), N:N (M:S), N:1 (M:S)
http://www.cubrid.org/cubrid_ha_oscon
CUBRID HA: Benefits
• Non-stop maintenance
• Auto Fail-over
• Large Installations are Easy
• Load balancing
• Accurate and reliable Failure detection
• Various Master-Slave Configurations:
– 3 replication modes
– 3 broker modes
Database Sharding
• Partitioning
Divide the data between
multiple tables within one
Database Instance
• Sharding
Divide the data between
multiple tables created in
separate Database Instances
[Diagram: one database instance holding X, Y, Z vs. the data split across separate shard databases for X, Y, and Z]
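To make the contrast concrete, here is a minimal sketch of the partitioning side (one instance, internal partitions), using a hypothetical user_events table and the MySQL-compatible hash-partition syntax assumed above; sharding has no single CREATE statement because the rows live in separate database instances routed by the SHARD broker.

-- Partitioning: one logical table inside one database instance, split into
-- internal partitions by hash of the key (syntax assumed MySQL-compatible).
CREATE TABLE user_events (
    user_id    INTEGER,
    event_type VARCHAR(32),
    created_at DATETIME
)
PARTITION BY HASH (user_id) PARTITIONS 4;

-- The application queries the logical table as usual; with sharding the same
-- statement would instead be routed by the broker to the database instance
-- that owns the shard key.
SELECT event_type, created_at
FROM user_events
WHERE user_id = 123;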
Without Database Sharding
[Diagram: App → Broker → one DB holding Tbl1, Tbl2, Tbl3, Tbl4]
With Database Sharding
[Diagram: App → Broker → multiple shard DB instances, each holding Tbl1, Tbl2, Tbl3, Tbl4]
CUBRID SHARD
Phase 1 (Apricot) → Phase 2 (Banana)
Features: Unlimited Shards, Data Rebalancing, Multiple Shard ID Generation Algorithms, Connection & Statement Pooling, Load Balancing, HA Support, CUBRID / MySQL / Oracle Support
Sharding: Benefits
• Developer friendly
– Single database view
– No sharding logic in the application
– No application changes
• Multiple sharding strategies
• Native scale-out support
• Load balancing
• Support for heterogeneous databases
#3
Ease of Use
Phase 1 (v8.2.x) → Phase 2 (v8.3.x) → Phase 4 (v8.4.x) → Phase 6 (Apricot)
Compatibility targets by phase: Oracle → MySQL → MySQL → MySQL, Oracle
Features along the way: Hierarchical Query (sketched below); SQL: 60+ / PHP: 20+; SQL: 70+ / PHP: 20+; Currency SQL; LOB; API++; Implicit Type Conversion+; Usability+; Usability+++; RegExpr
Win-backs: MSSQL win-back; MySQL, Oracle win-back (Monitoring system); Oracle (Ads, Shopping)
Client
Requests
SQL Compatibility
> 90% MySQL SQL Compatibility
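A minimal sketch of the Hierarchical Query feature noted above, using a hypothetical employees table; the Oracle-style START WITH / CONNECT BY form is assumed here, matching the Oracle-compatibility direction of that phase.

-- Hypothetical org chart traversed top-down with a hierarchical query.
CREATE TABLE employees (
    id         INTEGER,
    name       VARCHAR(64),
    manager_id INTEGER
);

SELECT LEVEL, id, name, manager_id
FROM employees
START WITH manager_id IS NULL        -- roots: employees with no manager
CONNECT BY PRIOR id = manager_id;    -- children: rows whose manager is the prior row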
Client
Requests
1. API Support
2. Ease of Migration
3. Usability
Phase 1 (v8.1.x) → Phase 2 (v8.3.x) → Phase 3 (v8.4.x) → Phase 3 (Apricot)
Tools by phase: CM → CM, CQB, CMT → CUNITOR, Web manager → CM Monitoring++
Phase 1 (v8.1.x) → Phase 2 (v8.2.x) → Phase 3 (v8.3.x) → Phase 4 (v8.4.x)
Drivers by phase: CCI, JDBC, OLEDB → PHP, Python, Ruby → ODBC → Perl, ADO.NET
MSSQL Win-Back in 2010
[Step 1] Dual Write: Application → Dual Read/Writer → MS SQL + CUBRID (writes duplicated to both)
[Step 2] Dual Write and Read: Application → Dual Read/Writer → MS SQL + CUBRID (reads and writes on both)
[Step 3] Win-back Complete: Application → CUBRID only (read and write)
• 16 Master/Slave servers and 1 Archive server
• DB size:
  – 0.4~0.5 billion records/DB, Total 4 billion records
  – Total 3.2 TB
  – Total 4,000 ~ 5,000 QPS
• Save money for MSSQL License and SAN Storage
Oracle Win-Back in 2011
[Diagram: Oracle Enterprise (1 server) and Oracle Standard databases (40 servers) migrated to CUBRID (25 servers)]
• DB size:
  – 1.5 ~ 2.0 TB/DB, Total 40 TB
  – 10~100K Inserts per second
• Save money for Oracle License and SAN Storage
System Monitoring
Service
What have we learnt so far, and where are we heading?
What we have learnt so far
• Not easy to break users’ habits.
• Need time.
• Technical support is the key to
acceptance!
• Some services don’t deserve Oracle.
CUBRID Deployment in NHN
[Chart: ∑ services and ∑ deployments by quarter, ~2009 through 2012-1Q; both counts rise steadily each quarter, reaching 500 deployments by 2012-1Q]
CUBRID Achievements: ✓ Stability ✓ Performance ✓ Scalability ✓ Ease of Use
• Human vs. DB Errors
• # of customers
• Smart Index Optimizations
• Shared Query Caching
• Web Optimized Features
• Load Balancer
• High-Availability w/ auto fail-over
• Sharding
• Data Rebalancer
• Cluster
• > 90% MySQL SQL Compatibility
• Native Migration Tool
• Native GUI DB Management Tools
• Monitoring Tools
CUBRID Roadmap
8.4.x:
✓ Performance++ (Covering index, Key limit, Range scan)
✓ SQL Compatibility+ (70+ new syntax)
✓ HA++ (Monitoring tools)
✓ I18N, L10N (2~3 European charsets)
Next:
✓ SQL Compatibility++ (Cursor holdability, Mass table UPDATE & DELETE)
✓ I18N, L10N+ (more charsets)
✓ Performance+++
✓ SQL monitoring performance+
✓ SQL Compatibility+++
✓ Table Partitioning Improvements
✓ DB SHARDING+
✓ Performance++++
✓ CUBRID Lite
✓ SQL Compatibility++++
✓ DB Monitoring Improvements
✓ Arcus Caching Integration
CUBRID is Big now.
What can you do?
1. Keep watching it
2. Consider using it
3. Discuss, talk, write about CUBRID
4. Support CUBRID in your apps
5. Contribute to CUBRID
6. Provide CUBRID service
esen.sagynov@nhn.com
kadishmal@gmail.com
eugen.stoianovici@arnia.ro
www.cubrid.org
www.facebook.com/cubrid www.twitter.com/cubrid
. . .
• How do CUBRID developers cope with
stress?
– Join MySQL issue tracker ;)
• Want more?
– Follow us to the next room. We’ll have more
discussions!
Contenu connexe

Tendances

Percona Live London 2014: Serve out any page with an HA Sphinx environment
Percona Live London 2014: Serve out any page with an HA Sphinx environmentPercona Live London 2014: Serve out any page with an HA Sphinx environment
Percona Live London 2014: Serve out any page with an HA Sphinx environmentspil-engineering
 
Cassandra nyc 2011 ilya maykov - ooyala - scaling video analytics with apac...
Cassandra nyc 2011   ilya maykov - ooyala - scaling video analytics with apac...Cassandra nyc 2011   ilya maykov - ooyala - scaling video analytics with apac...
Cassandra nyc 2011 ilya maykov - ooyala - scaling video analytics with apac...ivmaykov
 
Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks Databricks
 
Dynamodb Presentation
Dynamodb PresentationDynamodb Presentation
Dynamodb Presentationadvaitdeo
 
Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
Go Reactive: Event-Driven, Scalable, Resilient & Responsive SystemsGo Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
Go Reactive: Event-Driven, Scalable, Resilient & Responsive SystemsJonas Bonér
 
2017 OWASP SanFran March Meetup - Hacking SQL Server on Scale with PowerShell
2017 OWASP SanFran March Meetup - Hacking SQL Server on Scale with PowerShell2017 OWASP SanFran March Meetup - Hacking SQL Server on Scale with PowerShell
2017 OWASP SanFran March Meetup - Hacking SQL Server on Scale with PowerShellScott Sutherland
 
Thinking Functionally with Clojure
Thinking Functionally with ClojureThinking Functionally with Clojure
Thinking Functionally with ClojureJohn Stevenson
 
Full-stack Web Development with MongoDB, Node.js and AWS
Full-stack Web Development with MongoDB, Node.js and AWSFull-stack Web Development with MongoDB, Node.js and AWS
Full-stack Web Development with MongoDB, Node.js and AWSMongoDB
 
Testing Big Data in AWS - Sept 2021
Testing Big Data in AWS - Sept 2021Testing Big Data in AWS - Sept 2021
Testing Big Data in AWS - Sept 2021Michael98364
 
Tango Database & MySQL Cluster
Tango Database & MySQL ClusterTango Database & MySQL Cluster
Tango Database & MySQL Clusterelliando dias
 
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...Databricks
 
Optimal Strategies for Large Scale Batch ETL Jobs with Emma Tang
Optimal Strategies for Large Scale Batch ETL Jobs with Emma TangOptimal Strategies for Large Scale Batch ETL Jobs with Emma Tang
Optimal Strategies for Large Scale Batch ETL Jobs with Emma TangDatabricks
 
Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018Jonathan Katz
 
A Developer’s View into Spark's Memory Model with Wenchen Fan
A Developer’s View into Spark's Memory Model with Wenchen FanA Developer’s View into Spark's Memory Model with Wenchen Fan
A Developer’s View into Spark's Memory Model with Wenchen FanDatabricks
 
Using apache spark for processing trillions of records each day at Datadog
Using apache spark for processing trillions of records each day at DatadogUsing apache spark for processing trillions of records each day at Datadog
Using apache spark for processing trillions of records each day at DatadogVadim Semenov
 

Tendances (16)

Percona Live London 2014: Serve out any page with an HA Sphinx environment
Percona Live London 2014: Serve out any page with an HA Sphinx environmentPercona Live London 2014: Serve out any page with an HA Sphinx environment
Percona Live London 2014: Serve out any page with an HA Sphinx environment
 
Cassandra nyc 2011 ilya maykov - ooyala - scaling video analytics with apac...
Cassandra nyc 2011   ilya maykov - ooyala - scaling video analytics with apac...Cassandra nyc 2011   ilya maykov - ooyala - scaling video analytics with apac...
Cassandra nyc 2011 ilya maykov - ooyala - scaling video analytics with apac...
 
Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks
 
Dynamodb Presentation
Dynamodb PresentationDynamodb Presentation
Dynamodb Presentation
 
Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
Go Reactive: Event-Driven, Scalable, Resilient & Responsive SystemsGo Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
 
2017 OWASP SanFran March Meetup - Hacking SQL Server on Scale with PowerShell
2017 OWASP SanFran March Meetup - Hacking SQL Server on Scale with PowerShell2017 OWASP SanFran March Meetup - Hacking SQL Server on Scale with PowerShell
2017 OWASP SanFran March Meetup - Hacking SQL Server on Scale with PowerShell
 
Thinking Functionally with Clojure
Thinking Functionally with ClojureThinking Functionally with Clojure
Thinking Functionally with Clojure
 
Full-stack Web Development with MongoDB, Node.js and AWS
Full-stack Web Development with MongoDB, Node.js and AWSFull-stack Web Development with MongoDB, Node.js and AWS
Full-stack Web Development with MongoDB, Node.js and AWS
 
Testing Big Data in AWS - Sept 2021
Testing Big Data in AWS - Sept 2021Testing Big Data in AWS - Sept 2021
Testing Big Data in AWS - Sept 2021
 
Tango Database & MySQL Cluster
Tango Database & MySQL ClusterTango Database & MySQL Cluster
Tango Database & MySQL Cluster
 
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...
 
Optimal Strategies for Large Scale Batch ETL Jobs with Emma Tang
Optimal Strategies for Large Scale Batch ETL Jobs with Emma TangOptimal Strategies for Large Scale Batch ETL Jobs with Emma Tang
Optimal Strategies for Large Scale Batch ETL Jobs with Emma Tang
 
Dynamo db
Dynamo dbDynamo db
Dynamo db
 
Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018
 
A Developer’s View into Spark's Memory Model with Wenchen Fan
A Developer’s View into Spark's Memory Model with Wenchen FanA Developer’s View into Spark's Memory Model with Wenchen Fan
A Developer’s View into Spark's Memory Model with Wenchen Fan
 
Using apache spark for processing trillions of records each day at Datadog
Using apache spark for processing trillions of records each day at DatadogUsing apache spark for processing trillions of records each day at Datadog
Using apache spark for processing trillions of records each day at Datadog
 

Similaire à Growing in the wild. The story by cubrid database developers (Esen Sagynov, Eugene Stoyanovic)

Growing in the Wild. The story by CUBRID Database Developers.
Growing in the Wild. The story by CUBRID Database Developers.Growing in the Wild. The story by CUBRID Database Developers.
Growing in the Wild. The story by CUBRID Database Developers.CUBRID
 
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The SequelSilicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The SequelDaniel Coupal
 
Pinterest hadoop summit_talk
Pinterest hadoop summit_talkPinterest hadoop summit_talk
Pinterest hadoop summit_talkKrishna Gade
 
50 Billion pins and counting: Using Hadoop to build data driven Products
50 Billion pins and counting: Using Hadoop to build data driven Products50 Billion pins and counting: Using Hadoop to build data driven Products
50 Billion pins and counting: Using Hadoop to build data driven ProductsDataWorks Summit
 
Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Crate.io
 
EM12c: Capacity Planning with OEM Metrics
EM12c: Capacity Planning with OEM MetricsEM12c: Capacity Planning with OEM Metrics
EM12c: Capacity Planning with OEM MetricsMaaz Anjum
 
Lessons learned while building Omroep.nl
Lessons learned while building Omroep.nlLessons learned while building Omroep.nl
Lessons learned while building Omroep.nltieleman
 
AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...Amazon Web Services
 
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...Amazon Web Services
 
Lessons learned while building Omroep.nl
Lessons learned while building Omroep.nlLessons learned while building Omroep.nl
Lessons learned while building Omroep.nlbartzon
 
How to Make Norikra Perfect
How to Make Norikra PerfectHow to Make Norikra Perfect
How to Make Norikra PerfectSATOSHI TAGOMORI
 
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_SummaryHiram Fleitas León
 
PPWT2019 - EmPower your BI architecture
PPWT2019 - EmPower your BI architecturePPWT2019 - EmPower your BI architecture
PPWT2019 - EmPower your BI architectureRiccardo Perico
 
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAs
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAsOracle Database Performance Tuning Advanced Features and Best Practices for DBAs
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAsZohar Elkayam
 
いそがしいひとのための Microsoft Ignite 2018 + 最新情報 Data & AI 編
いそがしいひとのための Microsoft Ignite 2018 + 最新情報 Data & AI 編いそがしいひとのための Microsoft Ignite 2018 + 最新情報 Data & AI 編
いそがしいひとのための Microsoft Ignite 2018 + 最新情報 Data & AI 編Miho Yamamoto
 
6 tips for improving ruby performance
6 tips for improving ruby performance6 tips for improving ruby performance
6 tips for improving ruby performanceEngine Yard
 
The Autobahn Has No Speed Limit - Your XPages Shouldn't Either!
The Autobahn Has No Speed Limit - Your XPages Shouldn't Either!The Autobahn Has No Speed Limit - Your XPages Shouldn't Either!
The Autobahn Has No Speed Limit - Your XPages Shouldn't Either!Teamstudio
 
Jss 2015 in memory and operational analytics
Jss 2015   in memory and operational analyticsJss 2015   in memory and operational analytics
Jss 2015 in memory and operational analyticsDavid Barbarin
 
[JSS2015] In memory and operational analytics
[JSS2015] In memory and operational analytics[JSS2015] In memory and operational analytics
[JSS2015] In memory and operational analyticsGUSS
 

Similaire à Growing in the wild. The story by cubrid database developers (Esen Sagynov, Eugene Stoyanovic) (20)

Growing in the Wild. The story by CUBRID Database Developers.
Growing in the Wild. The story by CUBRID Database Developers.Growing in the Wild. The story by CUBRID Database Developers.
Growing in the Wild. The story by CUBRID Database Developers.
 
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The SequelSilicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
 
Pinterest hadoop summit_talk
Pinterest hadoop summit_talkPinterest hadoop summit_talk
Pinterest hadoop summit_talk
 
50 Billion pins and counting: Using Hadoop to build data driven Products
50 Billion pins and counting: Using Hadoop to build data driven Products50 Billion pins and counting: Using Hadoop to build data driven Products
50 Billion pins and counting: Using Hadoop to build data driven Products
 
Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?
 
Serverless SQL
Serverless SQLServerless SQL
Serverless SQL
 
EM12c: Capacity Planning with OEM Metrics
EM12c: Capacity Planning with OEM MetricsEM12c: Capacity Planning with OEM Metrics
EM12c: Capacity Planning with OEM Metrics
 
Lessons learned while building Omroep.nl
Lessons learned while building Omroep.nlLessons learned while building Omroep.nl
Lessons learned while building Omroep.nl
 
AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...
 
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...
 
Lessons learned while building Omroep.nl
Lessons learned while building Omroep.nlLessons learned while building Omroep.nl
Lessons learned while building Omroep.nl
 
How to Make Norikra Perfect
How to Make Norikra PerfectHow to Make Norikra Perfect
How to Make Norikra Perfect
 
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
 
PPWT2019 - EmPower your BI architecture
PPWT2019 - EmPower your BI architecturePPWT2019 - EmPower your BI architecture
PPWT2019 - EmPower your BI architecture
 
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAs
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAsOracle Database Performance Tuning Advanced Features and Best Practices for DBAs
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAs
 
いそがしいひとのための Microsoft Ignite 2018 + 最新情報 Data & AI 編
いそがしいひとのための Microsoft Ignite 2018 + 最新情報 Data & AI 編いそがしいひとのための Microsoft Ignite 2018 + 最新情報 Data & AI 編
いそがしいひとのための Microsoft Ignite 2018 + 最新情報 Data & AI 編
 
6 tips for improving ruby performance
6 tips for improving ruby performance6 tips for improving ruby performance
6 tips for improving ruby performance
 
The Autobahn Has No Speed Limit - Your XPages Shouldn't Either!
The Autobahn Has No Speed Limit - Your XPages Shouldn't Either!The Autobahn Has No Speed Limit - Your XPages Shouldn't Either!
The Autobahn Has No Speed Limit - Your XPages Shouldn't Either!
 
Jss 2015 in memory and operational analytics
Jss 2015   in memory and operational analyticsJss 2015   in memory and operational analytics
Jss 2015 in memory and operational analytics
 
[JSS2015] In memory and operational analytics
[JSS2015] In memory and operational analytics[JSS2015] In memory and operational analytics
[JSS2015] In memory and operational analytics
 

Plus de Ontico

One-cloud — система управления дата-центром в Одноклассниках / Олег Анастасье...
One-cloud — система управления дата-центром в Одноклассниках / Олег Анастасье...One-cloud — система управления дата-центром в Одноклассниках / Олег Анастасье...
One-cloud — система управления дата-центром в Одноклассниках / Олег Анастасье...Ontico
 
Масштабируя DNS / Артем Гавриченков (Qrator Labs)
Масштабируя DNS / Артем Гавриченков (Qrator Labs)Масштабируя DNS / Артем Гавриченков (Qrator Labs)
Масштабируя DNS / Артем Гавриченков (Qrator Labs)Ontico
 
Создание BigData-платформы для ФГУП Почта России / Андрей Бащенко (Luxoft)
Создание BigData-платформы для ФГУП Почта России / Андрей Бащенко (Luxoft)Создание BigData-платформы для ФГУП Почта России / Андрей Бащенко (Luxoft)
Создание BigData-платформы для ФГУП Почта России / Андрей Бащенко (Luxoft)Ontico
 
Готовим тестовое окружение, или сколько тестовых инстансов вам нужно / Алекса...
Готовим тестовое окружение, или сколько тестовых инстансов вам нужно / Алекса...Готовим тестовое окружение, или сколько тестовых инстансов вам нужно / Алекса...
Готовим тестовое окружение, или сколько тестовых инстансов вам нужно / Алекса...Ontico
 
Новые технологии репликации данных в PostgreSQL / Александр Алексеев (Postgre...
Новые технологии репликации данных в PostgreSQL / Александр Алексеев (Postgre...Новые технологии репликации данных в PostgreSQL / Александр Алексеев (Postgre...
Новые технологии репликации данных в PostgreSQL / Александр Алексеев (Postgre...Ontico
 
PostgreSQL Configuration for Humans / Alvaro Hernandez (OnGres)
PostgreSQL Configuration for Humans / Alvaro Hernandez (OnGres)PostgreSQL Configuration for Humans / Alvaro Hernandez (OnGres)
PostgreSQL Configuration for Humans / Alvaro Hernandez (OnGres)Ontico
 
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...Ontico
 
Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...
Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...
Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...Ontico
 
ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)
ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)
ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)Ontico
 
MySQL Replication — Advanced Features / Петр Зайцев (Percona)
MySQL Replication — Advanced Features / Петр Зайцев (Percona)MySQL Replication — Advanced Features / Петр Зайцев (Percona)
MySQL Replication — Advanced Features / Петр Зайцев (Percona)Ontico
 
Внутренний open-source. Как разрабатывать мобильное приложение большим количе...
Внутренний open-source. Как разрабатывать мобильное приложение большим количе...Внутренний open-source. Как разрабатывать мобильное приложение большим количе...
Внутренний open-source. Как разрабатывать мобильное приложение большим количе...Ontico
 
Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...
Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...
Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...Ontico
 
Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...
Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...
Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...Ontico
 
Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)
Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)
Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)Ontico
 
И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)
И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)
И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)Ontico
 
Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)
Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)
Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)Ontico
 
Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)
Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)
Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)Ontico
 
100500 способов кэширования в Oracle Database или как достичь максимальной ск...
100500 способов кэширования в Oracle Database или как достичь максимальной ск...100500 способов кэширования в Oracle Database или как достичь максимальной ск...
100500 способов кэширования в Oracle Database или как достичь максимальной ск...Ontico
 
Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...
Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...
Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...Ontico
 
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...Ontico
 

Plus de Ontico (20)

One-cloud — система управления дата-центром в Одноклассниках / Олег Анастасье...
One-cloud — система управления дата-центром в Одноклассниках / Олег Анастасье...One-cloud — система управления дата-центром в Одноклассниках / Олег Анастасье...
One-cloud — система управления дата-центром в Одноклассниках / Олег Анастасье...
 
Масштабируя DNS / Артем Гавриченков (Qrator Labs)
Масштабируя DNS / Артем Гавриченков (Qrator Labs)Масштабируя DNS / Артем Гавриченков (Qrator Labs)
Масштабируя DNS / Артем Гавриченков (Qrator Labs)
 
Создание BigData-платформы для ФГУП Почта России / Андрей Бащенко (Luxoft)
Создание BigData-платформы для ФГУП Почта России / Андрей Бащенко (Luxoft)Создание BigData-платформы для ФГУП Почта России / Андрей Бащенко (Luxoft)
Создание BigData-платформы для ФГУП Почта России / Андрей Бащенко (Luxoft)
 
Готовим тестовое окружение, или сколько тестовых инстансов вам нужно / Алекса...
Готовим тестовое окружение, или сколько тестовых инстансов вам нужно / Алекса...Готовим тестовое окружение, или сколько тестовых инстансов вам нужно / Алекса...
Готовим тестовое окружение, или сколько тестовых инстансов вам нужно / Алекса...
 
Новые технологии репликации данных в PostgreSQL / Александр Алексеев (Postgre...
Новые технологии репликации данных в PostgreSQL / Александр Алексеев (Postgre...Новые технологии репликации данных в PostgreSQL / Александр Алексеев (Postgre...
Новые технологии репликации данных в PostgreSQL / Александр Алексеев (Postgre...
 
PostgreSQL Configuration for Humans / Alvaro Hernandez (OnGres)
PostgreSQL Configuration for Humans / Alvaro Hernandez (OnGres)PostgreSQL Configuration for Humans / Alvaro Hernandez (OnGres)
PostgreSQL Configuration for Humans / Alvaro Hernandez (OnGres)
 
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...
 
Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...
Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...
Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...
 
ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)
ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)
ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)
 
MySQL Replication — Advanced Features / Петр Зайцев (Percona)
MySQL Replication — Advanced Features / Петр Зайцев (Percona)MySQL Replication — Advanced Features / Петр Зайцев (Percona)
MySQL Replication — Advanced Features / Петр Зайцев (Percona)
 
Внутренний open-source. Как разрабатывать мобильное приложение большим количе...
Внутренний open-source. Как разрабатывать мобильное приложение большим количе...Внутренний open-source. Как разрабатывать мобильное приложение большим количе...
Внутренний open-source. Как разрабатывать мобильное приложение большим количе...
 
Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...
Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...
Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...
 
Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...
Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...
Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...
 
Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)
Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)
Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)
 
И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)
И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)
И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)
 
Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)
Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)
Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)
 
Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)
Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)
Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)
 
100500 способов кэширования в Oracle Database или как достичь максимальной ск...
100500 способов кэширования в Oracle Database или как достичь максимальной ск...100500 способов кэширования в Oracle Database или как достичь максимальной ск...
100500 способов кэширования в Oracle Database или как достичь максимальной ск...
 
Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...
Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...
Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...
 
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...
 

Dernier

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 

Dernier (20)

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 

Growing in the wild. The story by cubrid database developers (Esen Sagynov, Eugene Stoyanovic)

  • 1. Growing in the Wild. The story by CUBRID Database Developers. Esen Sagynov (@CUBRID), NHN Corporation Service Platform Development Center Monday, April 2, 2012 Eugen Stoianovici, NHN Corporation CUBRID Development Lab
  • 2. Who are we? • Eugen Stoianovici – CUBRID Engine Team – eugen.stoianovici@cubrid.org • Esen Sagynov @CUBRID – CUBRID Project Manager – esen.sagynov@nhn.com
  • 3. Purpose of this presentation This is what I remember from every presentation that I’ve attended. Not the details. 1. “Some guys talked about some cool stuff they encountered in applications (don't remember what)” 2. “There's a database that they use for this type of applications, it's open source and saves from a lot of trouble (don't remember what trouble exactly).” 3. “They're really keen on doing things right.”
  • 4. You will learn… Reasons behind CUBRID development. What CUBRID has to offe r. Benefits & advantages. What we have learnt so fa r. Where we are heading t o.
  • 5. CUBRID Facts  RDBMS  True Open Source @ www.cubrid.org  Optimized for Web services  High performance 3-tier architecture  Large DB support  High-Availability feature  DB Sharding support  MySQL compatible SQL syntax  ACID Transactions  Online Backup
  • 6. Reasons Behind CUBRID Development
  • 7.
  • 9. 150+ Web Services Korea Japan USA USA Korea Korea Japan iOS & Android Japan MySQL Oracle, MySQL, CUBRID MySQL NoSQL Oracle, MSSQL MSSQL, Oracle, MySQL CUBRID Monitoring & Logging System
  • 10. Disadvantages of existing solutions 1. High License Cost 1. Over 10,000 servers @ NHN 2. Third-party solution 1. No ownership of the code base 2. Additional $$$ for customizations 3. Branch tech support is not enough 4. Communication barriers w/ vendors 5. Slow updates & fixes
  • 11. Fork or Start from Scratch? • No full ownership • Time to learn the code base • Fixed architecture • Understand the design philosophy • Full ownership • Time to develop • Custom more advanced architecture and design
  • 12. Benefits of in-house solution 1. High License Cost 1. Over 10,000 servers @ NHN 2. Third-party solution 1. No ownership of the code base 2. Additional $$$ for customizations 3. Communication barriers w/ vendors 4. Slow updates & fixes 1. No License Cost 2. Core Technological Asset 1. Complete control of the code base 2. No additional $$$ for customizations 3. No communication barriers 4. Fast updates & fixes 3. Key Storage Technology Skills 1. Grow our developers 2. Export developers 4. New Database Solution Service 1. Provide CUBRID service to other platforms 2. Instant reaction to customer issues 5. Recurring Key Technology 1. High-Availability 2. Sharding 3. Rebalancing 4. Cluster 5. etc.
  • 13. CUBRID Stability Performance Scalability Ease of Use Goal • Human vs. DB Errors • # of customers • Smart Index Optimizations • Shared Query Caching • Web Optimized Features • Load Balancer • High-Availability w/ auto fail-over • Sharding • Data Rebalancer • Cluster • SQL & API Compatibility • Native Migration Tool • Native GUI DB Management Tools • Monitoring Tools
  • 15. Client Requests Performance UP! Types of Web Services Main operations Example READ > 95% News, Wiki, Blog, etc. READ:WRITE = 70:30% SNS, Push services, etc. WRITE > 90% Log monitoring, Analytics. 90% of Web Services CRUD WHY? SELECT Fast searching, avoid sequential scan and ORDER BY INSERT Concurrent WRITE performance, reduce I/O, and Fast searching UPDATE Fast searching, improve lock mechanism DELET E Fast searching How & What to improve
  • 16. Phase 1 v1.0 ~ 2.0 Phase 2 v8.2.2 Phase 3 v8.4.0 Phase 4 v8.4.1 Phase 5 Apricot Phase 6 Banana SELECT Performance + INSERT & DELETE Performance + SELECT Performance ++ INSERT & UPDATE Performance ++ INSERT Performance +++ SELECT Performance ++++ Shared Quer y Plan Cachi ng Space Reusability Improvement Covering Index, Key limit, etc. Memory Buffer Mgmt. Improvement s Filter index, Skip index, etc. Optimize JOINs DB & Index Volume Optimization s API Performance + Windows Performance + TPS 15% 10% 270% 70% Smart Indexing MySQL SELECT performance CUBRID SELECT performance < MySQL INSERT performance CUBRID INSERT performance <
  • 17. CREATE TABLE forum_posts( user_id INTEGER, post_moment INTEGER, post_text VARCHAR(64) ); INDEX i_forum_posts_post_moment ON forum_posts (post_moment); INDEX i_forum_posts_post_moment_user_id ON forum_posts (post_moment, user_id); Random INSERT Performance SELECT username FROM users WHERE id = ?; INSERT INTO forum_posts(user_id, post_moment, post_text) VALUES (?, ?, ?); UPDATE users SET last_posted = ? WHERE id = ?; CREATE TABLE users( id INTEGER UNIQUE, username VARCHAR(255), last_posted INTEGER, );
  • 18. Random INSERT Performance • Users – 100,000 rows prepopulated • Test – CUBRID vNext (code name Apricot) – MySQL 5.5.21 – 40 workers – 1 hour – Record QPS every 2 minutes
  • 19. 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 Queries per second CUBRID QPS decrease with DataSet size Random INSERT Performance Average = 3685 Max = 4469 Min = 2821
  • 21. Random INSERT Performance 0 2000 4000 6000 8000 10000 12000 Queries per second CUBRID vs MySQL QPS decrease with DataSet size CUBRID QPS MySQL QPS
  • 22. CUBRID Optimizations Index Features Reverse Index Prefix Index Function Index Filter Index Unique Index Primary Key Foreign Key Query Features Multi-range key limit Index skip scan Skip order by Skip group by Range Scan optimizations Query rewrites Covering Index Descending Index Server level optimizations Log compression Shared Query Plan cache Locking Optimizations Transaction concurrency
  • 23. Filter Index • Interesting (open) tickets fit into a very small index. • No overhead for INSERT/UPDATE • Very fast results for open tickets CREATE INDEX ON tickets(component, assignee) WHERE status = ‘open’; SELECT title, component, assignee FROM users WHERE register_date > ‘2008-01-01’ AND status = ‘open’;
  • 24. QPS Filter vs. Full index 0 1000 2000 3000 4000 5000 6000 7000 0 500,000 1,000,000 1,500,000 2,000,000 2,500,000 3,000,000 3,500,000 4,000,000 4,500,000 5,000,000 5,500,000 6,000,000 6,500,000 7,000,000 7,500,000 8,000,000 8,500,000 9,000,000 9,500,000 10,000,000 Queries per second QPS Full Index QPS Filter Index
  • 25. CUBRID Architecture: API (CCI, JDBC, ADO.NET, OLEDB, ODBC, PHP, Perl, Python, Ruby) → Broker (Query Parser, Query Optimizer, Query Planner) → Server (Query Manager, Query Executor, Transaction Manager, Lock Manager, Log Manager, Storage Manager, File Manager)
  • 26. Parameterized Queries & Filter Index
• PostgreSQL: will not use the partial index
• MS SQL Server: provides a workaround
• Oracle: less flexible; the expression has to appear in exactly the same form
• CUBRID: "shared" query plan cache
SELECT title, component, assignee FROM tickets WHERE register_date > ? AND status = ?;
SELECT name, email FROM users WHERE register_date > ? AND age < ? AND age < 18;
  • 27. Query Plan Cache
• PostgreSQL: caches a plan only for the lifespan of a driver-level prepared statement
• MySQL: no query plan cache
• CUBRID: "shared" query plan cache
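A hedged sketch of why the shared plan cache pays off: once the first execution compiles a plan for a parameterized statement text, later executions of the same text, from any session, can reuse it. The PREPARE/EXECUTE form below follows CUBRID's MySQL-style syntax; the statement name and values are illustrative:
PREPARE st FROM 'SELECT title, component, assignee FROM tickets WHERE register_date > ? AND status = ?';
EXECUTE st USING '2008-01-01', 'open';
EXECUTE st USING '2010-06-15', 'open';  -- same cached plan, different parameter values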
  • 28. Query Plan Cache. Query execution without plan cache: Parse SQL → Name Resolving → Semantic Check → Query Optimization → Query Plan → Query Execution. Query execution with plan cache: Parse SQL → Get Cached Plan → Query Execution.
  • 29. Auto Parameterization
SELECT title, component, assignee FROM tickets WHERE register_date > '2008-01-01' AND status = 'open';
becomes
SELECT title, component, assignee FROM tickets WHERE register_date > ? AND status = ?;
  • 31. Scalability challenges • How to synchronize? – Async • Load balancing? – Third-party solution • Who handles Fail-over? – Application – Third-party solution • Cost?
  • 32. HA solutions
DBMS | Cost | Disk-shared | Replication | Consistency | Auto-Failover
Oracle RAC | +++++ | Shared everything | N/A | N/A | O
MS-SQL Cluster | +++ | Shared everything | N/A | N/A | O
MySQL Cluster | ++ | Shared nothing | Log based | Async / Sync | O
MySQL Replication + third-party | Free | Shared nothing | Statement based | Async | O
CUBRID | Free | Shared nothing | Log based | Sync / Semi-sync / Async | O
  • 33. Client Requests 1. Non-stop 24/7 service uptime 2. No missing data between nodes Phase 1 v8.1.0 Phase 2 v8.2.x Phase 4 v8.3.x Phase 5 v8.4.x Phase 6 Apricot Replication HA Support Extended HA features HA Monitoring + Easy Admin Scripts Async Auto Fail-over HA Status Monitoring HA Performance + Reduce Replication Delay Time CUBRID Heartbeat HA + Replica Admin Scripts Read-Write Service during DB maintenance Async, Semi-sync, Sync Broker Modes (RW, RO)
  • 35. CUBRID HA: Benefits • Non-stop maintenance • Auto Fail-over • Large Installations are Easy • Load balancing • Accurate and reliable Failure detection • Various Master-Slave Configurations: – 3 replication modes – 3 broker modes
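For orientation only, a very rough sketch of how a 1:1 master-slave pair is typically declared for CUBRID HA. The parameter names below (ha_mode in cubrid.conf; ha_node_list, ha_db_list, ha_copy_sync_mode in cubrid_ha.conf) are recalled from the CUBRID manual rather than taken from this deck, so treat them as assumptions and verify them against the version in use:
# cubrid.conf (on every node) -- assumed parameter name
ha_mode=on
# cubrid_ha.conf (master node1, slave node2) -- assumed parameter names
ha_node_list=mycluster@node1:node2
ha_db_list=testdb
ha_copy_sync_mode=sync:sync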
  • 36. Database Sharding • Partitioning: divide the data between multiple tables within one database instance • Sharding: divide the data between multiple tables created in separate database instances [diagram: one DB containing tables X, Y, Z vs. separate shard DBs X, Y and Z]
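As a hedged illustration of the modulo strategy behind the "Multiple Shard ID Gen. Algorithm" item on the CUBRID SHARD slide further down: with a fixed number of shards, the shard that owns a row can be derived from its shard key. This is only a conceptual sketch of the hashing idea, not CUBRID SHARD syntax, and the shard count of 4 is made up:
-- user_id acts as the shard key; all posts by the same user land on the same shard
SELECT user_id, MOD(user_id, 4) AS shard_id FROM forum_posts;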
  • 37. Without Database Sharding [diagram: App, Broker, DB, Tbl1, Tbl2, Tbl3, Tbl4]
  • 38. With Database Sharding [diagram: App, Broker, sharded DBs, Tbl1, Tbl2, Tbl3, Tbl4]
  • 39. CUBRID SHARD Phase 1 Apricot Phase 2 Banana Unlimited Shards Data Rebalancing Multiple Shard ID Gen. Algorithm Connection & Statement Pooling Load Balancing HA Support CUBRID, MySQL, Oracle Support
  • 40. Sharding: Benefits • Developer friendly – Single database view – No sharding logic in the application – No application changes • Multiple sharding strategies • Native scale-out support • Load balancing • Support for heterogeneous databases
  • 42. Phase 1 v.8.2.x Phase 2 v.8.3.x Phase 4 v8.4.x Phase 6 Apricot Oracle MySQL MySQL MySQL, Oracle Hierarchical Query SQL: 60+ PHP: 20+ SQL: 70+ PHP: 20+ Currency SQL LOB, API++ Implicit Type Conversion + Usability + Usability+++ RegExpr MSSQL win-back MySQL, Oracle win-back: Monitoring system Oracle: Ads, Shopping Client Requests SQL Compatibility > 90% MySQL SQL Compatibility
  • 43. Client Requests 1. API Support 2. Ease of Migration 3. Usability Phase 1 v.8.1.x Phase 2 v.8.3.x Phase 3 v.8.4.x Phase 4 Apricot CM CM, CQB, CMT CUNITOR Web manager CM Monitoring ++ Phase 1 v.8.1.x Phase 2 v.8.2.x Phase 3 v.8.3.x Phase 4 v.8.4.x CCI, JDBC, OLEDB PHP, Python, Ruby ODBC Perl, ADO.NET
  • 44. MSSQL Win-Back in 2010 [diagram: Step 1, Dual Write; Step 2, Dual Write and Read; Step 3, Win-back Complete (CUBRID handles all reads and writes)] • 16 Master/Slave servers and 1 Archive server • DB size: 0.4~0.5 billion records/DB, Total 4 billion records; Total 3.2 TB; Total 4,000 ~ 5,000 QPS • Saved money on the MSSQL license and SAN storage
  • 45. Oracle Win-Back in 2011: System Monitoring Service [diagram: 1 Oracle Enterprise server plus Oracle Standard servers (40 servers in total) replaced by 25 CUBRID servers] • DB size: 1.5 ~ 2.0 TB/DB, Total 40 TB; 10~100K inserts per second • Saved money on the Oracle license and SAN storage
  • 46. What we have learnt so far, and where we are heading
  • 47. What we have learnt so far • Not easy to break users’ habits. • Need time. • Technical support is the key to acceptance! • Some services don’t deserve Oracle.
  • 48. CUBRID Deployment in NHN [chart, ~2009 to 2012-1Q: cumulative services grew from 42 to 117; cumulative deployments grew from 166 to 500]
  • 49. CUBRID Achievements: Stability • Performance • Scalability • Ease of Use • Human vs. DB Errors • # of customers • Smart Index Optimizations • Shared Query Caching • Web Optimized Features • Load Balancer • High-Availability w/ auto fail-over • Sharding • Data Rebalancer • Cluster • > 90% MySQL SQL Compatibility • Native Migration Tool • Native GUI DB Management Tools • Monitoring Tools
  • 50. CUBRID Roadmap 8.4.x • Performance++: Covering index, Key limit, Range scan • SQL Compatibility+: 70+ new syntax • HA++: Monitoring tools • I18N, L10N: 2~3 European charsets • SQL Compatibility++: Cursor holdability, Mass table UPDATE & DELETE • I18N, L10N+: more charsets • Performance+++ • SQL monitoring performance+ • SQL Compatibility+++ • Table Partitioning Improvements • DB SHARDING+ • Performance++++ • CUBRID Lite • SQL Compatibility++++ • DB Monitoring Improvements • Arcus Caching Integration
  • 51. CUBRID is Big now. What can you do? 1. Keep watching it 2. Consider using it 3. Discuss, talk, write about CUBRID 4. Support CUBRID in your apps 5. Contribute to CUBRID 6. Provide CUBRID service
  • 53. . . . • How do CUBRID developers cope with stress? – Join MySQL issue tracker ;) • Want more? – Follow us to the next room. We’ll have more discussions!

Editor's notes

  1. Self introduction.
  2. Eugen: When I started thinking about this presentation, this is the outcome that I wanted from it. For the experienced guys in the audience, these are the thoughts I want you to have at the end of this presentation. I want you to think that: Some guys talked about some cool stuff they encountered in applications (don't remember what). There's a database that they use for this type of applications, it's open source and saves a lot of trouble (don't remember what trouble exactly). They're really keen on doing things right. This is what I remember from every presentation that I've attended. Not the details. So I don't expect you to remember the technical details. What I want is for you to grasp the concept of what we will talk about.
  3. CUBRID is a fully-featured Relational Database Management System.
  4. When we were invited to speak at a Russian IT conference, the committee members asked us to explain WHY NHN has started CUBRID development. And this is what I'm going to do now. We've never actually explained this at other conferences.
  5. Why not use existing solutions? Why start from scratch? Why not fork existing solutions? Why not co-develop? These are the common questions asked by users. To answer these questions, we first need to understand who NHN is and what resources it possesses that let it pull off a project like CUBRID.
  6. Some of these services, such as online games and billing systems, already use Oracle databases, both Standard and Enterprise, and Microsoft SQL Server. Other services use MySQL. CUBRID is used in WRITE-intensive services such as logging and spam filter services, as well as in READ-intensive services such as commenting and monitoring systems. Oracle is a super reliable DBMS. We all know this. MySQL is great, too. Our DBAs love it very much. But as a service provider, we have certain problems with all of them.
  7. They are all commercial. At NHN we have over 10,000 servers. Annually we pay several million dollars to extend the license and service support. No matter how much we earn from our mainstream business, it's a big chunk of expense we have to pay every year, and we want to cut our expenses. The second disadvantage of existing solutions for us is that they are third-party, and NHN has no control over the course of core development. For this reason we spend quite a lot of money on customizations to serve our needs. There is also another big problem: the communication problem we have with vendors. Most of them are located overseas. Many developers at NHN do not speak English well. They have problems conveying their requirements to the vendors. That slows the entire development process.
  8. Of course NHN has considered these two options before developing CUBRID: whether to fork MySQL or other open source solutions, or to start from scratch. To start with, in 2006 NHN hired the best database experts and architects in Korea and built a team of 20 developers. They all analyzed which was the best option: study every line of the existing open source product (e.g. MySQL), understand its philosophy, the reasons behind its architecture, etc., or create a new DBMS from scratch with an architecture optimized for Web services with native support for HA, sharding, load balancing, etc. After this, NHN and the entire crew came to the conclusion that starting from scratch and optimizing for Web services is easier and cheaper in terms of time than studying the existing solution.
  9. As you will see now, there are many reasons why NHN has decided to create a new relational database solution. First, with an in-house solution NHN would cut the cost of ownership significantly. This is important, but it's not the main reason. The key reason is that the CUBRID Database is a core technological asset for NHN. By owning this technology, NHN controls the code base completely. No additional expenses are required for customizations. No communication problems with the database developers. Lastly, services wouldn't suffer delayed updates and security and bug fixes. Possessing such technology would allow NHN to grow its developers, their skills and knowledge of storage technology. By training the developers, NHN can "export" these skills to other services at NHN, thereby improving the staff quality of those services. I will tell you what: NHN has invested 100M dollars to establish a Software Engineering Institution in Korea. Thus NHN is very serious when it comes to nurturing engineering skills. Obviously, by developing CUBRID, NHN can provide a database solution to other platforms and service departments within NHN. Those services would become so-called internal customers. They would benefit from fast updates and fixes. There is a synergy effect. However, this is not all. After NHN developed a relational database management system, the company gained the knowledge of how to develop other recurring solutions such as high-availability, sharding, rebalancing, and cluster. And actually, after CUBRID, NHN has developed its own owner-based file system, has started the Cluster database solution, and even a distributed database system for petabyte data. At this moment, I hope, you understand why NHN has decided to create its own RDBMS.
  10. When we started to develop CUBRID, we set a goal that our database solution should be: stable, fast, scalable, and easy to use. Thereby, no matter what new feature we add, these four things should not be broken, just as ACID should not be violated in relational databases. In the next slides I will explain how at CUBRID we try to meet these criteria.
  11. First, let’s see what we’ve done in CUBRID to meet the performance demands.
  12. What every customer wants is the performance boost. They don't care how you do it, but they want it to be fast. At NHN we've identified 3 types of Web services. One, which is READ-intensive. Examples are news services, wiki sites, blogs, and other user-generated content providers. In this kind of service, READ operations account for almost 99%. On the other hand there are services like SNS sites and push services where around 70% of operations are READs and 30% are WRITEs. And the last type of Web services are INSERT-heavy services, where such operations account for over 90%. Examples are log monitoring systems and analytics. Over 90% of all services are READ-heavy services. Then we thought about what we could do to provide a satisfactory level of performance. There are 4 CRUD operations commonly used by developers. In CUBRID we needed to provide fast searching, and avoid table scans and ORDER BY. We should increase the performance of concurrent write operations and improve the locking mechanism. What all these have in common is that they can be achieved by optimizing the indexing algorithm, because indexes affect how fast the data can be searched, which, as you can see, affects all operations. This is the approach we've taken in CUBRID. With super fast indexes we can satisfy our clients.
  13. So we first focused on improving the READ operations, because this is what 90% of all services need. We introduced a new concept of shared query plan caches. The first customers were very happy. Further, the customer base started to grow, and the new clients asked for more WRITE performance. In the next version we improved the algorithm to achieve I/O load balancing when INSERT operations are concentrated at a certain point in time. Besides, increased space reusability was another performance enhancement in the second phase. Phase 3 was the most promising for us. We had a few big services who said they would replace their MySQL deployments with CUBRID if we further improved our READ operations. We redesigned how indexes are stored in CUBRID, which allowed us to reduce the data and index volume size by 70%; all of a sudden your database is half the size. The performance of the database engine was increased by almost 3 times. Moreover, we significantly improved our Windows binaries by better handling concurrent requests. Thus, we migrated more MySQL-based services to CUBRID. This was the first time the performance of SELECT queries surpassed that of MySQL. Our latest version, CUBRID 8.4.1, is another breakthrough. We received many requests from SNS service providers, which have heavy WRITE operations. They wanted to try CUBRID if we promised to improve the WRITE performance. Therefore, we focused on improving INSERT and UPDATE operations by rethinking how the memory buffer and transaction logs were written to disk. We achieved a 70% performance increase over the previous version. So far it's the best CUBRID ever. In the next version, under the code name Apricot, which is due this summer, we'll have several super smart improvements to indexes in CUBRID. Further, perhaps, we'll improve the JOINs in CUBRID. So this whole performance improvement effort was completely led by our clients.
  14. Over one hour, CUBRID manages an average QPS of 3685 with the maximum being 4469 and the minimum 2821. Both these values are close enough to the average and show a slow decrease of performance for CUBRID on the dataset.
  15. Over one hour, MySQL manages an average QPS of 1796 with the maximum being 8951 and the minimum 1122. The performance of MySQL is very good at the beginning of the test but falls dramatically after the first few minutes.
  16. Even though MySQL performed two times faster than CUBRID before it reached one million rows, by the end of the test CUBRID had inserted twice as much data into the table (~13 million rows for CUBRID versus ~6.5 million rows for MySQL). If you've worked with big data before, you know how important predictable performance is. High performance is good, but predictable performance is the King. CUBRID's INSERT performance is predictable.
  17. Apart from the must-have optimizations that all databases have, we optimize special cases for the Web (see the list above). Each database has its own particular optimizations, so there's not much to talk about here. For example, MySQL optimizes special cases in inner joins, which gives them better performance over us (but worse on complicated joins); we heavily optimize range scans and limits and so on, which gives us better performance for those cases. We can't really go that far with inner joins because we have a much more complicated object model, which makes it so difficult that during plan generation we don't really know if that's a table, or a class, or a class hierarchy, or not even an object store at all but just a derived query. Not to mention hierarchical queries, which complicate things even more. Loosely, all query planners are based on the same algorithms designed in the 1970s by some guys at IBM; all current databases keep adding particular cases to them, and we constantly search for common use cases that can be better handled in our database. The bottom line is that any optimization is a trade-off, because you want to have the best query plan but you don't want to spend too much time generating it, so you have to compromise. To give a clearer example of what I mean by compromise, I'll tell you about the Filter Index feature that will be released in the following version of CUBRID (around mid-July this summer). Take the query plan cache, for example. CUBRID caches 10,000 plans by default. This takes a huge amount of memory; I think it can get to 50+ MB. It's OK for us to do this, but MySQL can't: it's huge considering the fact that, for example, on my website I have 60+ MySQL databases that use different things, and 60 * 50 MB = 3 GB just for caching plans. It is OK for CUBRID because you would only have one DB per machine, so 50 MB is not important. It's OK for Oracle too (they use the same technique) because you only have one instance of Oracle per machine. So again, caching plans is not a problem for us, for Oracle, or for MS SQL Server, but it's mostly unacceptable for PostgreSQL and MySQL. CUBRID is not created for every application. You cannot have 60 or 100 databases like hosting companies have for their shared hosting customers. CUBRID is not designed for hosting companies.
  18. Filter Indexes (or partial indexes, as they're called in PostgreSQL) allow you to create an index on a subset of the data table, the syntax being something like the CREATE INDEX shown on the slide. This index will make sure that only the tuples which have 'open' status will be in the index. With this index, if you want to find out which tickets are open, you will write a query like: SELECT title, component, assignee FROM tickets WHERE register_date > '2008-01-01' AND status = 'open'; You will naturally only look through tickets that are already open, ignoring all the rest. Obviously this is faster (YAY!). You're bound to have only a few tickets open but many, many closed ones. However, the real improvement for this type of index is the performance on insert/update/delete. This is where the trade-off of indexes is: they're fast for searching but not that good when manipulating data in them. Since this index might end up holding several orders of magnitude less data than the whole table, this is going to work magic on insert statements.
  19. This chart shows the penalty normal indexes suffer while the dataset increases. It shows the number of INSERT statements a server can process per second. The data here was generated on my computer; it's not an actual performance test, but it shows the actual trend. You can see that the filter index QPS stays pretty much stable regardless of the dataset size, while the full index QPS slowly starts to decrease as we reach the 10,000,000 mark. OK, so this is a filter index, it's useful in certain situations, and we decided to add support for it in CUBRID. However, there are some pitfalls to this type of index:
  20. The first one is CUBRID-specific and has to do with the 3-tier architecture: the server component does not know how to handle expressions in their raw form (status = 'open' has no meaning to it), so what we had to do was extend the executable binary form of a query to be able to start execution anywhere in the execution plan (thus enabling us to execute only the binary form of status = 'open' rather than the whole SELECT … FROM tickets WHERE status = 'open'). This is like requiring a C compiler to be able to handle printing the value of x + 2 without having the main routine. More than this, we had to come up with a rather wicked way of cloning this section of binary code over multiple simultaneous transactions (rather than reinterpreting its serialized form over and over again, since we think that disk reads are BAD!). We're really happy with the way this implementation turned out; we started applying it to other areas like partition pruning and precomputed columns and so on, which will become much faster with this addition.
  21. The second pitfall is common to all databases: a filter index doesn't really work with parameterized queries: SELECT x FROM tickets WHERE register_date > ? AND status < ?; Obviously, there's no way to know what value '?' will have during plan generation, so we have to assume that the filter index is not enough. We can live with this and do what other databases do in this situation: PostgreSQL says that this will not be using the partial index. Oracle (where you would use a function index) mentions that the expression of the index must appear in the exact same form in the query (so even age < 17 will not be using the filter index, and age < ? is not even on the waiting list). MS SQL Server also says that it will not be using the index, but if you want to have parameterized queries with filter indexes you can write: SELECT name, email FROM users WHERE register_date > ? AND age < ? AND age < 18; This is obvious, but MS makes it their motto to spell some things out for their users. OK, so problem solved, we will just do what other databases do and move on, right? No. It turns out that things are not as easy as this. The reason? The "shared" query plan cache.
  22. PostgreSQL does not have this problem; they only cache a plan for the lifespan of a driver-level prepared statement. MySQL does not cache query plans at all. (Can you guess why? Yes, this is a trade-off decision also <<great for large scale, horrible for small applications, this can be extended into a nice talk too :P>>)
  23. However, CUBRID implements what is called a "shared" query plan cache. A shared query plan means that all compiled query plans get cached in a memory area, and any session running the "same" query does not need to generate a new plan for it; it will just use the cached one. The default limit for cached plans is 10,000 query plans. Considering that most applications do not have this many distinct queries (if they're parameterized), we very rarely generate query plans over the lifespan of an application. We got a huge improvement in overall performance when we added this feature (I think it was added before CUBRID became open source; it was already there when I joined the project).
  24. To optimize things even more, before plan generation we convert any literals or constants into a parameter and use the parameterized printed version as a key for indexing the cache: SELECT name, email FROM users WHERE register_date < '2008-01-01' becomes SELECT name, email FROM users WHERE register_date < ? and we never need to generate the plan for this query again for any value of '?'. OK, so why is this a trap for the filter index? Because caching and decaching plans is a costly process, so we have to be very careful. We simply cannot allow non-parameterized queries to go into the cache because, and I would put my life on the line for this statement, the first programmer that uses a filter index will be very careful to query all the values of age from -5 to 10 million, filling up the cache and complaining that CUBRID is really slow. Just leaving it like SELECT title, component FROM tickets WHERE register_date < ? AND status = 'open' is not an option. SELECT title, component FROM tickets WHERE register_date < ? AND status = ? is not an option either, because this matches any status (status = 'closed'), and if we generated a plan with a filter index for status = 'open', we would return fewer results than expected for status IN ('open', 'closed') (the cached plan would be used). So again, we had to come up with a really clever way of solving this issue without affecting people not using filter indexes (the performance of what was already implemented must not be affected).
  25. Built-in scalability is the second major field we've been working on.
  26. These are the questions our clients asked when they first approached us. How will the nodes be synced? MySQL provides only asynchronous replication, which affects data consistency. How about a load balancer? How do we choose which slave will run a certain query and maintain the load balance? And fail-over? If the master "fails", what will happen? Will the database provide native fail-over? How much will this solution cost?
  27. There are various software products on the market which provide high-availability solutions. There is Oracle RAC, very expensive and very reliable. MySQL Cluster, or MySQL's own replication merged with a third-party solution such as MMM. In CUBRID there is a native transaction-log-based HA feature with 3 replication modes: fully synchronized, semi-synchronized, and asynchronous.
  28. So our clients told us that if we could provide a solution which won't cause an application halt and will not lose any data during the transaction, then they were ready to use CUBRID. The first thing we implemented was one-way replication for READ load balancing. But that was not enough: the application developers were still required to implement the fail-over logic within their applications. To remove the potential for human error, we moved further. Version 8.2.0 was revolutionary. We added native HA support based on transaction log replication. The auto fail-over feature was implemented on top of Linux Heartbeat. Our clients were really happy. At the time we migrated tens of services which used to run on MySQL and Microsoft SQL Server to CUBRID. That period was one of the most successful win-backs for CUBRID, covering more than 50 instances. (With 10 or fewer, it is easy to do manually.) Later on we kept receiving complaints from the users that the fail-over feature was often delayed for a long time and sometimes didn't work at all. We analyzed their applications, created many test scenarios, and found out that Linux Heartbeat was not very stable; its behavior was unpredictable. In the next version we developed CUBRID's native heartbeat technology and improved the failure detection algorithm. Further, the existing clients started asking for HA monitoring tools. We created the monitoring tool for them. Then they asked for more features. We developed them. Currently we are working on a few more important features requested by our clients. For example, when there is a large service with hundreds of database nodes, every day you see some slave nodes go offline. Of course the application developers never know about this, because the service is HA. So if not developers, someone has to fix those slave DB nodes: the DBAs. So far they manually restore the slave nodes, and now they want a built-in script to auto-restore them. We're working on this. But even though we provide a slave auto-rebuilder, the slave nodes still hold several terabytes of data, so no matter what you do, it will still take time to replicate the data. To solve this, we need to reduce the replication time. What we're going to do is introduce multi-threaded replication for slave nodes. A big difference with MySQL: their replication is statement-based, which can lead to data inconsistency. Current: 20-30 slave DBs; rebuild time 14h for 700 GB. Easy admin script: multi-threaded, concurrent replication to make slave rebuild time shorter. Replication delay time reduction.
  29. There are many configurations in CUBRID HA. You can build: standard 1:1 Master-Slave HA systems; extended 1:N Master-Slave systems; 1:1:N Master-Slave-Replica; N:N Master-Slave; or compact N:1 Master-Slave. Last year at the OSCON conference I presented CUBRID HA in its entirety. You can check out the presentation at this link.
  30. Right now CUBRID HA is one of the most stable high-availability solutions on the market. Hundreds of large Web services in Korea use CUBRID HA. At NHN alone we have several big services which monitor over 300 DB instances each. It relies on the native heartbeat technology. It has very stable and predictable fail-over, and a native load balancer through CUBRID's own Broker. So we've developed HA for redundancy. With load balancing provided by the CUBRID Broker, HA provides READ distribution. Great for READ-intensive services. But what about WRITE distribution?
  31. We've developed Database Sharding in CUBRID! The difference between partitioning and sharding is that with partitioning you divide the data between multiple tables with identical schemas within one database, but with sharding you divide data between tables located in different databases. Sometimes the database gets so big that mere table partitioning is not enough; in fact, it will hinder the performance of the entire system. So we'd better add new databases, otherwise called shards. If HA is for READ distribution, sharding is for WRITE distribution, as you can write to different databases simultaneously. This feature is something most developers dream of having on the database side rather than on the application layer. Database sharding doesn't just simplify the developers' life, but also improves both the application and database performance. The application gets rid of the sharding logic. The database reduces the index size. Win-win!
  32. Without built-in sharding, developers need to implement the sharding logic on the application layer using some sort of framework or the like. When the data in the existing databases has already grown too much, the developers need to add a new database to partition the data. Since the sharding logic is implemented on the application layer, the developers also need to add a separate broker for this, as well as update the application code to relay some of the traffic to the new database. This is, in fact, a very common architecture.
  33. With built-in database sharding, there is no work for developers. Everything is on the DBA's side. The developers see only one database. To determine which shard has to be queried, the developers can send the shard_id together with the query. When a new shard is added, all the DBA has to do is update the metadata directory on the Broker level. That's all. The application never knows whether the data is partitioned or not.
  34. One of our clients, who operates the largest news service in Korea, requested this feature, saying that if we provide seamless database sharding, they will be eager to replace their MySQL servers with CUBRID. We've already completed the 1st phase. CUBRID SHARD allows creating an unlimited number of shards. The data can be distributed based on modulo, DATETIME, or hash/range calculations. The developers can even feed in their own library to calculate the SHARD_ID using some complicated custom algorithm. CUBRID SHARD will natively support connection and statement pooling as well as load balancing. These days we are performing QA on HA support for shards. Another unique feature that CUBRID sharding will provide is that you can use it with a heterogeneous database backend, i.e. some data can be stored in CUBRID, some in MySQL or even Oracle. This first version of CUBRID SHARD, which will be released in the coming months, will require DBAs to create the necessary number of shards in advance. That is, this first version doesn't provide dynamic scale-out; otherwise it would become a full cluster solution and we would call it CUBRID Cluster instead. But sharding is not a cluster solution, so at this moment it doesn't allow adding or removing shards dynamically. You have to decide that in advance. You know what? Didn't I tell you at the beginning that one of the reasons NHN started CUBRID development is that this project would allow it to create recurring projects? The developers from other platforms at NHN have already created a stable data rebalancing technology. In the next phase we will merge that technology into CUBRID, which will allow DBAs to add shards and seamlessly redistribute the data among them. It's going to be a revolutionary product. All in one. A fast and stable RDBMS which provides seamless scalability, that is what CUBRID is. This is what you can expect from CUBRID in the coming months. Once we roll out sharding, a few candidate Web services back in our country can try it for the first time as a part of their migration to CUBRID.
  35. Thus, with sharding the developers can eradicate the sharding logic from their applications. All they will see is a single database view. They will no longer need to make changes in the application code to adjust to the growing data. More than that, the developers can define various sharding strategies by feeding in their own libraries to calculate the SHARD_ID. The DBAs will no longer have to manually distribute data to the new shards; everything will be done automatically. DBAs will be able to combine CUBRID with MySQL or Oracle if they prefer to. And the load will be automatically balanced with CUBRID's native Broker.
  36. As I've said several times throughout this presentation, many of our corporate clients ask us to provide a certain solution to a certain problem, but they always ask us to make it easy to use, easier than what other third-party solutions provide. So we always focus on ease of use when developing CUBRID.
  37. SQL compatibility is the key request when it comes to database migration. After we released the first version, one of the services in Korea running their system on Oracle asked us to support hierarchical queries. Now we support them. After that, a few major READ-heavy MySQL services listed their requirements for considering migration to CUBRID. Since then we've implemented several phases of MySQL compatibility, and now we can proudly say that we support over 90% of MySQL's SQL syntax. Phase 2: extended CREATE/ALTER TABLE, INSERT, REPLACE, GROUP BY, ORDER BY; optional parts in SELECT; added LIMIT, TRUNCATE, PREPARE; operator and function extensions. Phase 3: API improvements (added UTF-8 support to the ODBC connector, new server-status-related functions, functions to obtain schema and FK info); usability (perform bulk SQL operations on multiple tables in the CUBRID tools). Phase 4: SQL syntax enhancements; SHOW statements; DATE/TIME functions; STRING functions; aggregate functions; DB-specific functions; implicit type conversion behaving very similarly to MySQL's; usability (all measurement units were changed from pages and kilobytes to total volume and megabytes). Phase 5: MySQL data type MIN/MAX values; RegExpr.
  38. To further improve ease of use, we put much effort into improving our APIs and administration tools. Originally we had only CUBRID Manager, currently the most powerful and complete database administration tool, with server and HA monitoring dashboards. This tool is perfect for DBAs. However, later on we started to receive requests to create a new tool which is light and oriented towards developers, not DBAs. They said CM was too powerful for developers: they didn't need the backup, replication, and HA stuff. All they needed was a tool for easy and fast query execution, testing, and checking the query plan. That's all. So we created CUBRID Query Browser, the light version of CM created with developers in mind. For ease of migration, we developed the CUBRID Migration Tool. It allows migrating a database from MySQL, Oracle, or MSSQL to CUBRID automatically. All you need to do is confirm the target data types and hit the migrate button. Last year one of the big service providers asked for a tool to control an entire farm of database servers. At the time we didn't have such a tool, so we decided to work on CUNITOR, a powerful database monitoring tool. Likewise, starting from the first version we've kept supporting more and more programming languages. First we rolled out C, Java, and OLEDB for Oracle and Microsoft SQL Server clients. Further, we added the most popular scripting languages for general users. At the beginning of this year, in January and February, we announced two more APIs, for Perl and .NET.
  39. Because you can run your service on cheaper hardware using proven open source solutions which are free. Last year we migrated the entire System Monitoring service from Oracle to CUBRID. The key thing to notice here is how many servers were required to run the Oracle-based service and how many CUBRID servers they needed afterwards. They had 40 Standard Oracle servers and 1 Enterprise server. After the migration, we configured the entire system using only 25 CUBRID servers. They came and asked what they should do with the remaining hardware, because they were shocked to see how efficiently they could use CUBRID. After migration the service achieved 10,000 inserts per second with CUBRID. The company saved a lot by staying away from Oracle licensing fees. That was an impressive success case. We've learnt that Oracle is not for every service. Your service can run perfectly well on CUBRID with no need for compromise; instead you will save a lot on license fees and support. If necessary, buy a few cheap servers and use CUBRID's built-in load balancer. There were around 30 databases, each storing some 1.5 to 2 terabytes of logs. The service was mainly INSERT-intensive.
  40. So, what have we learnt so far, and where are we heading?
  41. All developers and project managers have got so used to MySQL, Oracle, and MS SQL Server that it's really difficult to change their behavior. And this can happen to any software vendor that enters an already occupied market. But there is still a way to break this habit: you can achieve acceptance through responsive technical support. Maybe CUBRID is not as powerful as Oracle RAC, maybe we don't support those features which our clients rely on, but with technical support we can solve anything. With technical support you can meet your clients' expectations. The third lesson we've learnt with CUBRID is that some services don't deserve Oracle; they don't even deserve Microsoft SQL Server. Why? Here is why!
  42. So far at NHN we've already deployed CUBRID in over 100 Web services. The red line displays how many CUBRID servers are actually running in these services. Over 70% of the servers are configured in an HA environment.
  43. Four things: Stability, Performance, Scalability, Ease of Use. I wouldn't hesitate to say that we've successfully achieved all of these, though we have a long way to improve further!
  44. What's next? SELECT is already faster than in MySQL, but we will improve it more with more Web-optimized indexes. We'll improve INSERT queries more. More performance: index improvements and optimizations, INSERT improvements. More SQL compatibility: MySQL and Oracle. Better and more powerful tools: CM+, Web administrator. Sharding is coming very soon! Auto Rebalancer.
  45. Eugen: When I started thinking about this presentation, this is the outcome that I wanted from it. For the experienced guys in the audience, these are the thoughts I want you to have at the end of this presentation. I want you to think that: Some guys talked about some cool stuff they encountered in applications (don't remember what). There's a database that they use for this type of applications, it's open source and saves a lot of trouble (don't remember what trouble exactly). They're really keen on doing things right. This is what I remember from every presentation that I've attended. Not the details. So I don't expect you to remember the technical details. What I want is for you to grasp the concept of what we talked about.