Growing in the Wild. The story by
CUBRID Database Developers.
Esen Sagynov (@CUBRID), NHN Corporation, Service Platform Development Center
Eugen Stoianovici, NHN Corporation, CUBRID Development Lab
Monday, April 2, 2012
Who are we?
• Eugen Stoianovici
– CUBRID Engine Team
– eugen.stoianovici@cubrid.org
• Esen Sagynov @CUBRID
– CUBRID Project Manager
– esen.sagynov@nhn.com
Purpose of this presentation
This is what I remember from every presentation that
I’ve attended. Not the details.
1. “Some guys talked about some cool stuff they
encountered in applications (don't remember what)”
2. “There's a database that they use for this type of
applications, it's open source and saves from a lot of
trouble (don't remember what trouble exactly).”
3. “They're really keen on doing things right.”
You will learn…
Reasons behind CUBRID development.
What CUBRID has to offer. Benefits & advantages.
What we have learnt so far.
Where we are heading to.
CUBRID Facts
✓ RDBMS
✓ True Open Source @ www.cubrid.org
✓ Optimized for Web services
✓ High performance 3-tier architecture
✓ Large DB support
✓ High-Availability feature
✓ DB Sharding support
✓ MySQL compatible SQL syntax
✓ ACID Transactions
✓ Online Backup
Reasons Behind CUBRID Development
[Map: 150+ Web Services across Korea, Japan, China, and the USA (web plus iOS & Android apps), running on a mix of Oracle, MySQL, MSSQL, NoSQL, and CUBRID, alongside a Monitoring & Logging System.]
Disadvantages of existing solutions
1. High License Cost
1. Over 10,000 servers @ NHN
2. Third-party solution
1. No ownership of the code base
2. Additional $$$ for customizations
3. Branch tech support is not enough
4. Communication barriers w/ vendors
5. Slow updates & fixes
Fork or Start from Scratch?
Fork an existing database:
• No full ownership
• Time to learn the code base
• Fixed architecture
• Understand the design philosophy
Start from scratch:
• Full ownership
• Time to develop
• Custom, more advanced architecture and design
Benefits of in-house solution
Existing solutions:
1. High License Cost
1. Over 10,000 servers
@ NHN
2. Third-party solution
1. No ownership of the
code base
2. Additional $$$ for
customizations
3. Communication
barriers w/ vendors
4. Slow updates &
fixes
In-house solution:
1. No License Cost
2. Core Technological Asset
1. Complete control of the code base
2. No additional $$$ for customizations
3. No communication barriers
4. Fast updates & fixes
3. Key Storage Technology Skills
1. Grow our developers
2. Export developers
4. New Database Solution Service
1. Provide CUBRID service to other
platforms
2. Instant reaction to customer issues
5. Recurring Key Technology
1. High-Availability
2. Sharding
3. Rebalancing
4. Cluster
5. etc.
CUBRID Goal: Stability, Performance, Scalability, Ease of Use
• Human vs. DB Errors
• # of customers
• Smart Index Optimizations
• Shared Query Caching
• Web Optimized Features
• Load Balancer
• High-Availability w/ auto fail-over
• Sharding
• Data Rebalancer
• Cluster
• SQL & API Compatibility
• Native Migration Tool
• Native GUI DB Management Tools
• Monitoring Tools
#1
Performance
Client Requests → Performance UP!
Types of Web Services — Main operations | Example:
• READ > 95% | News, Wiki, Blog, etc.
• READ:WRITE = 70:30 | SNS, Push services, etc.
• WRITE > 90% | Log monitoring, Analytics

90% of Web Services — CRUD | WHY?
• SELECT | Fast searching, avoid sequential scan and ORDER BY
• INSERT | Concurrent WRITE performance, reduce I/O, and fast searching
• UPDATE | Fast searching, improve lock mechanism
• DELETE | Fast searching

How & What to improve
Phase 1 (v1.0~2.0) → Phase 2 (v8.2.2) → Phase 3 (v8.4.0) → Phase 4 (v8.4.1) → Phase 5 (Apricot) → Phase 6 (Banana)
Performance focus by phase: SELECT Performance+ → INSERT & DELETE Performance+ → SELECT Performance++ → INSERT & UPDATE Performance++ → INSERT Performance+++ → SELECT Performance++++
Improvements along the way: Shared Query Plan Caching; Space Reusability Improvement; Covering Index, Key limit, etc.; Memory Buffer Mgmt. Improvements; Filter index, Skip index, etc.; Optimize JOINs; DB & Index Volume Optimizations; API Performance+; Windows Performance+
TPS gains: 15%, 10%, 270%, 70%
Smart Indexing:
MySQL SELECT performance < CUBRID SELECT performance
MySQL INSERT performance < CUBRID INSERT performance
CREATE TABLE forum_posts(
user_id INTEGER,
post_moment INTEGER,
post_text VARCHAR(64)
);
CREATE INDEX i_forum_posts_post_moment ON forum_posts (post_moment);
CREATE INDEX i_forum_posts_post_moment_user_id
ON forum_posts (post_moment, user_id);
Random INSERT Performance
SELECT username FROM users WHERE id = ?;
INSERT INTO forum_posts(user_id, post_moment, post_text)
VALUES (?, ?, ?);
UPDATE users SET last_posted = ? WHERE id = ?;
CREATE TABLE users(
id INTEGER UNIQUE,
username VARCHAR(255),
last_posted INTEGER
);
Random INSERT Performance
• Users
– 100,000 rows prepopulated
• Test
– CUBRID vNext (code name Apricot)
– MySQL 5.5.21
– 40 workers
– 1 hour
– Record QPS every 2 minutes
Random INSERT Performance
[Chart: CUBRID QPS decrease with DataSet size (queries per second)]
CUBRID: Average = 3685, Max = 4469, Min = 2821
Random INSERT Performance
[Chart: MySQL QPS decrease with DataSet size, up to ~6.3 million rows (queries per second)]
MySQL: Average = 1796, Max = 8951, Min = 1122
Random INSERT Performance
[Chart: CUBRID vs. MySQL QPS decrease with DataSet size — CUBRID QPS vs. MySQL QPS]
CUBRID Optimizations
• Index Features: Reverse Index, Prefix Index, Function Index, Filter Index, Unique Index, Primary Key, Foreign Key
• Query Features: Multi-range key limit, Index skip scan, Skip order by, Skip group by, Range scan optimizations, Query rewrites, Covering Index (sketched below), Descending Index
• Server level optimizations: Log compression, Shared Query Plan cache, Locking optimizations, Transaction concurrency
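A minimal sketch of the Covering Index item above, reusing the forum_posts table from the Random INSERT Performance slides (the index name and query are illustrative, not from the deck): when the index holds every column a query touches, the engine can answer the query from the index alone.

-- Hypothetical covering index: (user_id, post_moment) contains both the
-- filter column and the selected column, so no table rows need to be read.
CREATE INDEX i_forum_posts_user_moment ON forum_posts (user_id, post_moment);

-- Served entirely from the index (predicate, projection, and ordering):
SELECT post_moment
FROM forum_posts
WHERE user_id = 123
ORDER BY post_moment DESC
LIMIT 10;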
Filter Index
• Interesting (open) tickets fit into a very small index.
• No overhead for INSERT/UPDATE
• Very fast results for open tickets
CREATE INDEX ON tickets(component, assignee)
WHERE status = 'open';
SELECT title, component, assignee FROM users
WHERE register_date > '2008-01-01' AND status = 'open';
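Note that the CREATE INDEX above targets tickets while the sample query reads from users. A self-contained sketch of the same idea, using a hypothetical tickets table and an explicit index name, might look like this:

-- Hypothetical ticket table; only 'open' rows are queried frequently.
CREATE TABLE tickets (
    id        INTEGER,
    title     VARCHAR(255),
    component VARCHAR(64),
    assignee  VARCHAR(64),
    status    VARCHAR(16)
);

-- Filter (partial) index: only open tickets are indexed, so the index stays
-- small and closed tickets add no index-maintenance overhead.
CREATE INDEX i_tickets_open ON tickets (component, assignee)
WHERE status = 'open';

-- A query whose predicate matches the index filter can be answered from the
-- small filtered index instead of a full index or a sequential scan.
SELECT title, component, assignee
FROM tickets
WHERE status = 'open' AND component = 'engine';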
QPS Filter vs. Full index
[Chart: QPS with a Full Index vs. QPS with a Filter Index as the table grows from 0 to 10,000,000 rows]
CUBRID Architecture
• API: CCI, JDBC, ADO.NET, OLEDB, ODBC, PHP, Perl, Python, Ruby
• Broker: Query Parser, Query Optimizer, Query Planner
• Server: Query Manager, Query Executor, Transaction Manager, Lock Manager, Log Manager, Storage Manager, File Manager
Parameterized Queries & Filter Index
• PostgreSQL: will not use a partial index
• MS SQL Server: provides a workaround
• Oracle: less flexible, has to be the exact expression
• CUBRID: “Shared” Query Plan Cache
SELECT title, component, assignee FROM users
WHERE register_date > ? AND status = ?;
SELECT name, email FROM users
WHERE register_date > ? AND age < ? AND age < 18;
Query Plan Cache
• PostgreSQL: caches a plan for the lifespan of a driver-level prepared statement
• MySQL: no query plan cache
• CUBRID: “Shared” Query Plan Cache
Query Plan Cache
Query execution without plan cache: Parse SQL → Name Resolving → Semantic Check → Query Optimize → Query Plan → Query Execution
Query execution with plan cache: Parse SQL → Get Cached Plan → Query Execution
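A minimal sketch of how the cached-plan path is reached from SQL, assuming the MySQL-style PREPARE/EXECUTE syntax that CUBRID's MySQL-compatible dialect is expected to cover (application drivers reach the same path through prepared statements); the statement name and values are illustrative only.

-- The plan is parsed and optimized once at PREPARE time; each EXECUTE goes
-- straight to the cached plan instead of repeating the full pipeline.
PREPARE open_by_date FROM
  'SELECT title, component, assignee FROM users
   WHERE register_date > ? AND status = ?';

EXECUTE open_by_date USING '2008-01-01', 'open';  -- first use: plan built and cached
EXECUTE open_by_date USING '2012-01-01', 'open';  -- later uses: cached plan reused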
Auto Parameterization
SELECT title, component, assignee FROM users
WHERE register_date > '2008-01-01' AND status = 'open';
SELECT title, component, assignee FROM users
WHERE register_date > ? AND status = ?;
#2
Scalability
Scalability challenges
• How to synchronize?
– Async
• Load balancing?
– Third-party solution
• Who handles Fail-over?
– Application
– Third-party solution
• Cost?
HA solutions
DBMS | Cost | Disk-shared | Replication | Consistency | Auto-Failover
Oracle RAC | +++++ | Shared everything | N/A | N/A | O
MS-SQL Cluster | +++ | Shared everything | N/A | N/A | O
MySQL Cluster | ++ | Shared nothing | Log based | Async, Sync | O
MySQL Replication | + Third-party, Free | Shared nothing | Statement based | Async | O
CUBRID | Free | Shared nothing | Log based | Sync, Semi-sync, Async | O
Client
Requests
1. Non-stop 24/7 service uptime
2. No missing data between nodes
Phase 1 (v8.1.0) → Phase 2 (v8.2.x) → Phase 4 (v8.3.x) → Phase 5 (v8.4.x) → Phase 6 (Apricot)
Replication → HA Support → Extended HA features → HA Monitoring + Easy Admin Scripts → HA Performance + Reduced Replication Delay Time
Features along the way: Async Auto Fail-over, HA Status Monitoring, CUBRID Heartbeat, HA + Replica, Admin Scripts, Read-Write Service during DB maintenance, Async / Semi-sync / Sync replication, Broker Modes (RW, RO)
Master:Slave configurations: 1:1 (M:S), 1:N (M:S), 1:1:N (M:S:R), N:N (M:S), N:1 (M:S)
http://www.cubrid.org/cubrid_ha_oscon
CUBRID HA: Benefits
• Non-stop maintenance
• Auto Fail-over
• Large Installations are Easy
• Load balancing
• Accurate and reliable Failure detection
• Various Master-Slave Configurations:
– 3 replication modes
– 3 broker modes
Database Sharding
• Partitioning
Divide the data between
multiple tables within one
Database Instance
• Sharding
Divide the data between
multiple tables created in
separate Database Instances
[Diagram: one database instance holding X, Y, Z vs. the data split across separate shard databases for X, Y, and Z]
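To make the contrast concrete, here is a minimal sketch of the partitioning side (one instance, internal partitions), using a hypothetical user_events table and the MySQL-compatible hash-partition syntax assumed above; sharding has no single CREATE statement because the rows live in separate database instances routed by the SHARD broker.

-- Partitioning: one logical table inside one database instance, split into
-- internal partitions by hash of the key (syntax assumed MySQL-compatible).
CREATE TABLE user_events (
    user_id    INTEGER,
    event_type VARCHAR(32),
    created_at DATETIME
)
PARTITION BY HASH (user_id) PARTITIONS 4;

-- The application queries the logical table as usual; with sharding the same
-- statement would instead be routed by the broker to the database instance
-- that owns the shard key.
SELECT event_type, created_at
FROM user_events
WHERE user_id = 123;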
Without Database Sharding
[Diagram: App → Broker → one DB holding Tbl1, Tbl2, Tbl3, Tbl4]
With Database Sharding
[Diagram: App → Broker → multiple shard DB instances, each holding Tbl1, Tbl2, Tbl3, Tbl4]
CUBRID SHARD
Phase 1 (Apricot) → Phase 2 (Banana)
Features: Unlimited Shards, Data Rebalancing, Multiple Shard ID Generation Algorithms, Connection & Statement Pooling, Load Balancing, HA Support, CUBRID / MySQL / Oracle Support
Sharding: Benefits
• Developer friendly
– Single database view
– No sharding logic in the application
– No application changes
• Multiple sharding strategies
• Native scale-out support
• Load balancing
• Support for heterogeneous databases
#3
Ease of Use
Phase 1 (v8.2.x) → Phase 2 (v8.3.x) → Phase 4 (v8.4.x) → Phase 6 (Apricot)
Compatibility targets by phase: Oracle → MySQL → MySQL → MySQL, Oracle
Features along the way: Hierarchical Query (sketched below); SQL: 60+ / PHP: 20+; SQL: 70+ / PHP: 20+; Currency SQL; LOB; API++; Implicit Type Conversion+; Usability+; Usability+++; RegExpr
Win-backs: MSSQL win-back; MySQL, Oracle win-back (Monitoring system); Oracle (Ads, Shopping)
Client
Requests
SQL Compatibility
> 90% MySQL SQL Compatibility
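A minimal sketch of the Hierarchical Query feature noted above, using a hypothetical employees table; the Oracle-style START WITH / CONNECT BY form is assumed here, matching the Oracle-compatibility direction of that phase.

-- Hypothetical org chart traversed top-down with a hierarchical query.
CREATE TABLE employees (
    id         INTEGER,
    name       VARCHAR(64),
    manager_id INTEGER
);

SELECT LEVEL, id, name, manager_id
FROM employees
START WITH manager_id IS NULL        -- roots: employees with no manager
CONNECT BY PRIOR id = manager_id;    -- children: rows whose manager is the prior row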
Client
Requests
1. API Support
2. Ease of Migration
3. Usability
Phase 1 (v8.1.x) → Phase 2 (v8.3.x) → Phase 3 (v8.4.x) → Phase 3 (Apricot)
Tools by phase: CM → CM, CQB, CMT → CUNITOR, Web manager → CM Monitoring++
Phase 1 (v8.1.x) → Phase 2 (v8.2.x) → Phase 3 (v8.3.x) → Phase 4 (v8.4.x)
Drivers by phase: CCI, JDBC, OLEDB → PHP, Python, Ruby → ODBC → Perl, ADO.NET
MSSQL Win-Back in 2010
[Step 1] Dual Write: Application → Dual Read/Writer → MS SQL + CUBRID (writes duplicated to both)
[Step 2] Dual Write and Read: Application → Dual Read/Writer → MS SQL + CUBRID (reads and writes on both)
[Step 3] Win-back Complete: Application → CUBRID only (read and write)
• 16 Master/Slave servers and 1 Archive server
• DB size:
  – 0.4~0.5 billion records/DB, Total 4 billion records
  – Total 3.2 TB
  – Total 4,000 ~ 5,000 QPS
• Save money for MSSQL License and SAN Storage
Oracle Win-Back in 2011
[Diagram: Oracle Enterprise (1 server) and Oracle Standard databases (40 servers) migrated to CUBRID (25 servers)]
• DB size:
  – 1.5 ~ 2.0 TB/DB, Total 40 TB
  – 10~100K Inserts per second
• Save money for Oracle License and SAN Storage
System Monitoring
Service
What have we learnt so far, and where are we heading?
What we have learnt so far
• Not easy to break users’ habits.
• Need time.
• Technical support is the key to
acceptance!
• Some services don’t deserve Oracle.
CUBRID Deployment in NHN
[Chart: ∑ services and ∑ deployments by quarter, ~2009 through 2012-1Q; both counts rise steadily each quarter, reaching 500 deployments by 2012-1Q]
CUBRID Achievements: ✓ Stability ✓ Performance ✓ Scalability ✓ Ease of Use
• Human vs. DB Errors
• # of customers
• Smart Index Optimizations
• Shared Query Caching
• Web Optimized Features
• Load Balancer
• High-Availability w/ auto fail-over
• Sharding
• Data Rebalancer
• Cluster
• > 90% MySQL SQL Compatibility
• Native Migration Tool
• Native GUI DB Management Tools
• Monitoring Tools
CUBRID Roadmap
8.4.x:
✓ Performance++ (Covering index, Key limit, Range scan)
✓ SQL Compatibility+ (70+ new syntax)
✓ HA++ (Monitoring tools)
✓ I18N, L10N (2~3 European charsets)
Next:
✓ SQL Compatibility++ (Cursor holdability, Mass table UPDATE & DELETE)
✓ I18N, L10N+ (more charsets)
✓ Performance+++
✓ SQL monitoring performance+
✓ SQL Compatibility+++
✓ Table Partitioning Improvements
✓ DB SHARDING+
✓ Performance++++
✓ CUBRID Lite
✓ SQL Compatibility++++
✓ DB Monitoring Improvements
✓ Arcus Caching Integration
CUBRID is Big now.
What can you do?
1. Keep watching it
2. Consider using it
3. Discuss, talk, write about CUBRID
4. Support CUBRID in your apps
5. Contribute to CUBRID
6. Provide CUBRID service
esen.sagynov@nhn.com
kadishmal@gmail.com
eugen.stoianovici@arnia.ro
www.cubrid.org
www.facebook.com/cubrid www.twitter.com/cubrid
. . .
• How do CUBRID developers cope with
stress?
– Join MySQL issue tracker ;)
• Want more?
– Follow us to the next room. We’ll have more
discussions!
Contenu connexe

Tendances

Percona Live London 2014: Serve out any page with an HA Sphinx environment
Percona Live London 2014: Serve out any page with an HA Sphinx environmentPercona Live London 2014: Serve out any page with an HA Sphinx environment
Percona Live London 2014: Serve out any page with an HA Sphinx environmentspil-engineering
 
Cassandra nyc 2011 ilya maykov - ooyala - scaling video analytics with apac...
Cassandra nyc 2011   ilya maykov - ooyala - scaling video analytics with apac...Cassandra nyc 2011   ilya maykov - ooyala - scaling video analytics with apac...
Cassandra nyc 2011 ilya maykov - ooyala - scaling video analytics with apac...ivmaykov
 
Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks Databricks
 
Dynamodb Presentation
Dynamodb PresentationDynamodb Presentation
Dynamodb Presentationadvaitdeo
 
Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
Go Reactive: Event-Driven, Scalable, Resilient & Responsive SystemsGo Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
Go Reactive: Event-Driven, Scalable, Resilient & Responsive SystemsJonas Bonér
 
2017 OWASP SanFran March Meetup - Hacking SQL Server on Scale with PowerShell
2017 OWASP SanFran March Meetup - Hacking SQL Server on Scale with PowerShell2017 OWASP SanFran March Meetup - Hacking SQL Server on Scale with PowerShell
2017 OWASP SanFran March Meetup - Hacking SQL Server on Scale with PowerShellScott Sutherland
 
Thinking Functionally with Clojure
Thinking Functionally with ClojureThinking Functionally with Clojure
Thinking Functionally with ClojureJohn Stevenson
 
Full-stack Web Development with MongoDB, Node.js and AWS
Full-stack Web Development with MongoDB, Node.js and AWSFull-stack Web Development with MongoDB, Node.js and AWS
Full-stack Web Development with MongoDB, Node.js and AWSMongoDB
 
Testing Big Data in AWS - Sept 2021
Testing Big Data in AWS - Sept 2021Testing Big Data in AWS - Sept 2021
Testing Big Data in AWS - Sept 2021Michael98364
 
Tango Database & MySQL Cluster
Tango Database & MySQL ClusterTango Database & MySQL Cluster
Tango Database & MySQL Clusterelliando dias
 
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...Databricks
 
Optimal Strategies for Large Scale Batch ETL Jobs with Emma Tang
Optimal Strategies for Large Scale Batch ETL Jobs with Emma TangOptimal Strategies for Large Scale Batch ETL Jobs with Emma Tang
Optimal Strategies for Large Scale Batch ETL Jobs with Emma TangDatabricks
 
Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018Jonathan Katz
 
A Developer’s View into Spark's Memory Model with Wenchen Fan
A Developer’s View into Spark's Memory Model with Wenchen FanA Developer’s View into Spark's Memory Model with Wenchen Fan
A Developer’s View into Spark's Memory Model with Wenchen FanDatabricks
 
Using apache spark for processing trillions of records each day at Datadog
Using apache spark for processing trillions of records each day at DatadogUsing apache spark for processing trillions of records each day at Datadog
Using apache spark for processing trillions of records each day at DatadogVadim Semenov
 

Tendances (16)

Percona Live London 2014: Serve out any page with an HA Sphinx environment
Percona Live London 2014: Serve out any page with an HA Sphinx environmentPercona Live London 2014: Serve out any page with an HA Sphinx environment
Percona Live London 2014: Serve out any page with an HA Sphinx environment
 
Cassandra nyc 2011 ilya maykov - ooyala - scaling video analytics with apac...
Cassandra nyc 2011   ilya maykov - ooyala - scaling video analytics with apac...Cassandra nyc 2011   ilya maykov - ooyala - scaling video analytics with apac...
Cassandra nyc 2011 ilya maykov - ooyala - scaling video analytics with apac...
 
Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks
 
Dynamodb Presentation
Dynamodb PresentationDynamodb Presentation
Dynamodb Presentation
 
Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
Go Reactive: Event-Driven, Scalable, Resilient & Responsive SystemsGo Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
 
2017 OWASP SanFran March Meetup - Hacking SQL Server on Scale with PowerShell
2017 OWASP SanFran March Meetup - Hacking SQL Server on Scale with PowerShell2017 OWASP SanFran March Meetup - Hacking SQL Server on Scale with PowerShell
2017 OWASP SanFran March Meetup - Hacking SQL Server on Scale with PowerShell
 
Thinking Functionally with Clojure
Thinking Functionally with ClojureThinking Functionally with Clojure
Thinking Functionally with Clojure
 
Full-stack Web Development with MongoDB, Node.js and AWS
Full-stack Web Development with MongoDB, Node.js and AWSFull-stack Web Development with MongoDB, Node.js and AWS
Full-stack Web Development with MongoDB, Node.js and AWS
 
Testing Big Data in AWS - Sept 2021
Testing Big Data in AWS - Sept 2021Testing Big Data in AWS - Sept 2021
Testing Big Data in AWS - Sept 2021
 
Tango Database & MySQL Cluster
Tango Database & MySQL ClusterTango Database & MySQL Cluster
Tango Database & MySQL Cluster
 
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...
 
Optimal Strategies for Large Scale Batch ETL Jobs with Emma Tang
Optimal Strategies for Large Scale Batch ETL Jobs with Emma TangOptimal Strategies for Large Scale Batch ETL Jobs with Emma Tang
Optimal Strategies for Large Scale Batch ETL Jobs with Emma Tang
 
Dynamo db
Dynamo dbDynamo db
Dynamo db
 
Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018
 
A Developer’s View into Spark's Memory Model with Wenchen Fan
A Developer’s View into Spark's Memory Model with Wenchen FanA Developer’s View into Spark's Memory Model with Wenchen Fan
A Developer’s View into Spark's Memory Model with Wenchen Fan
 
Using apache spark for processing trillions of records each day at Datadog
Using apache spark for processing trillions of records each day at DatadogUsing apache spark for processing trillions of records each day at Datadog
Using apache spark for processing trillions of records each day at Datadog
 

Similaire à Growing in the wild. The story by cubrid database developers (Esen Sagynov, Eugene Stoyanovic)

Growing in the Wild. The story by CUBRID Database Developers.
Growing in the Wild. The story by CUBRID Database Developers.Growing in the Wild. The story by CUBRID Database Developers.
Growing in the Wild. The story by CUBRID Database Developers.CUBRID
 
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The SequelSilicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The SequelDaniel Coupal
 
Pinterest hadoop summit_talk
Pinterest hadoop summit_talkPinterest hadoop summit_talk
Pinterest hadoop summit_talkKrishna Gade
 
50 Billion pins and counting: Using Hadoop to build data driven Products
50 Billion pins and counting: Using Hadoop to build data driven Products50 Billion pins and counting: Using Hadoop to build data driven Products
50 Billion pins and counting: Using Hadoop to build data driven ProductsDataWorks Summit
 
Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Crate.io
 
EM12c: Capacity Planning with OEM Metrics
EM12c: Capacity Planning with OEM MetricsEM12c: Capacity Planning with OEM Metrics
EM12c: Capacity Planning with OEM MetricsMaaz Anjum
 
Lessons learned while building Omroep.nl
Lessons learned while building Omroep.nlLessons learned while building Omroep.nl
Lessons learned while building Omroep.nltieleman
 
AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...Amazon Web Services
 
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...Amazon Web Services
 
Lessons learned while building Omroep.nl
Lessons learned while building Omroep.nlLessons learned while building Omroep.nl
Lessons learned while building Omroep.nlbartzon
 
How to Make Norikra Perfect
How to Make Norikra PerfectHow to Make Norikra Perfect
How to Make Norikra PerfectSATOSHI TAGOMORI
 
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_SummaryHiram Fleitas León
 
PPWT2019 - EmPower your BI architecture
PPWT2019 - EmPower your BI architecturePPWT2019 - EmPower your BI architecture
PPWT2019 - EmPower your BI architectureRiccardo Perico
 
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAs
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAsOracle Database Performance Tuning Advanced Features and Best Practices for DBAs
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAsZohar Elkayam
 
いそがしいひとのための Microsoft Ignite 2018 + 最新情報 Data & AI 編
いそがしいひとのための Microsoft Ignite 2018 + 最新情報 Data & AI 編いそがしいひとのための Microsoft Ignite 2018 + 最新情報 Data & AI 編
いそがしいひとのための Microsoft Ignite 2018 + 最新情報 Data & AI 編Miho Yamamoto
 
6 tips for improving ruby performance
6 tips for improving ruby performance6 tips for improving ruby performance
6 tips for improving ruby performanceEngine Yard
 
The Autobahn Has No Speed Limit - Your XPages Shouldn't Either!
The Autobahn Has No Speed Limit - Your XPages Shouldn't Either!The Autobahn Has No Speed Limit - Your XPages Shouldn't Either!
The Autobahn Has No Speed Limit - Your XPages Shouldn't Either!Teamstudio
 
Jss 2015 in memory and operational analytics
Jss 2015   in memory and operational analyticsJss 2015   in memory and operational analytics
Jss 2015 in memory and operational analyticsDavid Barbarin
 
[JSS2015] In memory and operational analytics
[JSS2015] In memory and operational analytics[JSS2015] In memory and operational analytics
[JSS2015] In memory and operational analyticsGUSS
 

Similaire à Growing in the wild. The story by cubrid database developers (Esen Sagynov, Eugene Stoyanovic) (20)

Growing in the Wild. The story by CUBRID Database Developers.
Growing in the Wild. The story by CUBRID Database Developers.Growing in the Wild. The story by CUBRID Database Developers.
Growing in the Wild. The story by CUBRID Database Developers.
 
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The SequelSilicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
 
Pinterest hadoop summit_talk
Pinterest hadoop summit_talkPinterest hadoop summit_talk
Pinterest hadoop summit_talk
 
50 Billion pins and counting: Using Hadoop to build data driven Products
50 Billion pins and counting: Using Hadoop to build data driven Products50 Billion pins and counting: Using Hadoop to build data driven Products
50 Billion pins and counting: Using Hadoop to build data driven Products
 
Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?
 
Serverless SQL
Serverless SQLServerless SQL
Serverless SQL
 
EM12c: Capacity Planning with OEM Metrics
EM12c: Capacity Planning with OEM MetricsEM12c: Capacity Planning with OEM Metrics
EM12c: Capacity Planning with OEM Metrics
 
Lessons learned while building Omroep.nl
Lessons learned while building Omroep.nlLessons learned while building Omroep.nl
Lessons learned while building Omroep.nl
 
AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...
 
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...
 
Lessons learned while building Omroep.nl
Lessons learned while building Omroep.nlLessons learned while building Omroep.nl
Lessons learned while building Omroep.nl
 
How to Make Norikra Perfect
How to Make Norikra PerfectHow to Make Norikra Perfect
How to Make Norikra Perfect
 
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
 
PPWT2019 - EmPower your BI architecture
PPWT2019 - EmPower your BI architecturePPWT2019 - EmPower your BI architecture
PPWT2019 - EmPower your BI architecture
 
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAs
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAsOracle Database Performance Tuning Advanced Features and Best Practices for DBAs
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAs
 
いそがしいひとのための Microsoft Ignite 2018 + 最新情報 Data & AI 編
いそがしいひとのための Microsoft Ignite 2018 + 最新情報 Data & AI 編いそがしいひとのための Microsoft Ignite 2018 + 最新情報 Data & AI 編
いそがしいひとのための Microsoft Ignite 2018 + 最新情報 Data & AI 編
 
6 tips for improving ruby performance
6 tips for improving ruby performance6 tips for improving ruby performance
6 tips for improving ruby performance
 
The Autobahn Has No Speed Limit - Your XPages Shouldn't Either!
The Autobahn Has No Speed Limit - Your XPages Shouldn't Either!The Autobahn Has No Speed Limit - Your XPages Shouldn't Either!
The Autobahn Has No Speed Limit - Your XPages Shouldn't Either!
 
Jss 2015 in memory and operational analytics
Jss 2015   in memory and operational analyticsJss 2015   in memory and operational analytics
Jss 2015 in memory and operational analytics
 
[JSS2015] In memory and operational analytics
[JSS2015] In memory and operational analytics[JSS2015] In memory and operational analytics
[JSS2015] In memory and operational analytics
 

Plus de Ontico

One-cloud — система управления дата-центром в Одноклассниках / Олег Анастасье...
One-cloud — система управления дата-центром в Одноклассниках / Олег Анастасье...One-cloud — система управления дата-центром в Одноклассниках / Олег Анастасье...
One-cloud — система управления дата-центром в Одноклассниках / Олег Анастасье...Ontico
 
Масштабируя DNS / Артем Гавриченков (Qrator Labs)
Масштабируя DNS / Артем Гавриченков (Qrator Labs)Масштабируя DNS / Артем Гавриченков (Qrator Labs)
Масштабируя DNS / Артем Гавриченков (Qrator Labs)Ontico
 
Создание BigData-платформы для ФГУП Почта России / Андрей Бащенко (Luxoft)
Создание BigData-платформы для ФГУП Почта России / Андрей Бащенко (Luxoft)Создание BigData-платформы для ФГУП Почта России / Андрей Бащенко (Luxoft)
Создание BigData-платформы для ФГУП Почта России / Андрей Бащенко (Luxoft)Ontico
 
Готовим тестовое окружение, или сколько тестовых инстансов вам нужно / Алекса...
Готовим тестовое окружение, или сколько тестовых инстансов вам нужно / Алекса...Готовим тестовое окружение, или сколько тестовых инстансов вам нужно / Алекса...
Готовим тестовое окружение, или сколько тестовых инстансов вам нужно / Алекса...Ontico
 
Новые технологии репликации данных в PostgreSQL / Александр Алексеев (Postgre...
Новые технологии репликации данных в PostgreSQL / Александр Алексеев (Postgre...Новые технологии репликации данных в PostgreSQL / Александр Алексеев (Postgre...
Новые технологии репликации данных в PostgreSQL / Александр Алексеев (Postgre...Ontico
 
PostgreSQL Configuration for Humans / Alvaro Hernandez (OnGres)
PostgreSQL Configuration for Humans / Alvaro Hernandez (OnGres)PostgreSQL Configuration for Humans / Alvaro Hernandez (OnGres)
PostgreSQL Configuration for Humans / Alvaro Hernandez (OnGres)Ontico
 
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...Ontico
 
Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...
Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...
Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...Ontico
 
ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)
ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)
ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)Ontico
 
MySQL Replication — Advanced Features / Петр Зайцев (Percona)
MySQL Replication — Advanced Features / Петр Зайцев (Percona)MySQL Replication — Advanced Features / Петр Зайцев (Percona)
MySQL Replication — Advanced Features / Петр Зайцев (Percona)Ontico
 
Внутренний open-source. Как разрабатывать мобильное приложение большим количе...
Внутренний open-source. Как разрабатывать мобильное приложение большим количе...Внутренний open-source. Как разрабатывать мобильное приложение большим количе...
Внутренний open-source. Как разрабатывать мобильное приложение большим количе...Ontico
 
Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...
Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...
Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...Ontico
 
Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...
Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...
Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...Ontico
 
Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)
Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)
Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)Ontico
 
И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)
И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)
И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)Ontico
 
Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)
Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)
Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)Ontico
 
Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)
Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)
Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)Ontico
 
100500 способов кэширования в Oracle Database или как достичь максимальной ск...
100500 способов кэширования в Oracle Database или как достичь максимальной ск...100500 способов кэширования в Oracle Database или как достичь максимальной ск...
100500 способов кэширования в Oracle Database или как достичь максимальной ск...Ontico
 
Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...
Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...
Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...Ontico
 
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...Ontico
 

Plus de Ontico (20)

One-cloud — система управления дата-центром в Одноклассниках / Олег Анастасье...
One-cloud — система управления дата-центром в Одноклассниках / Олег Анастасье...One-cloud — система управления дата-центром в Одноклассниках / Олег Анастасье...
One-cloud — система управления дата-центром в Одноклассниках / Олег Анастасье...
 
Масштабируя DNS / Артем Гавриченков (Qrator Labs)
Масштабируя DNS / Артем Гавриченков (Qrator Labs)Масштабируя DNS / Артем Гавриченков (Qrator Labs)
Масштабируя DNS / Артем Гавриченков (Qrator Labs)
 
Создание BigData-платформы для ФГУП Почта России / Андрей Бащенко (Luxoft)
Создание BigData-платформы для ФГУП Почта России / Андрей Бащенко (Luxoft)Создание BigData-платформы для ФГУП Почта России / Андрей Бащенко (Luxoft)
Создание BigData-платформы для ФГУП Почта России / Андрей Бащенко (Luxoft)
 
Готовим тестовое окружение, или сколько тестовых инстансов вам нужно / Алекса...
Готовим тестовое окружение, или сколько тестовых инстансов вам нужно / Алекса...Готовим тестовое окружение, или сколько тестовых инстансов вам нужно / Алекса...
Готовим тестовое окружение, или сколько тестовых инстансов вам нужно / Алекса...
 
Новые технологии репликации данных в PostgreSQL / Александр Алексеев (Postgre...
Новые технологии репликации данных в PostgreSQL / Александр Алексеев (Postgre...Новые технологии репликации данных в PostgreSQL / Александр Алексеев (Postgre...
Новые технологии репликации данных в PostgreSQL / Александр Алексеев (Postgre...
 
PostgreSQL Configuration for Humans / Alvaro Hernandez (OnGres)
PostgreSQL Configuration for Humans / Alvaro Hernandez (OnGres)PostgreSQL Configuration for Humans / Alvaro Hernandez (OnGres)
PostgreSQL Configuration for Humans / Alvaro Hernandez (OnGres)
 
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...
 
Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...
Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...
Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...
 
ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)
ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)
ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)
 
MySQL Replication — Advanced Features / Петр Зайцев (Percona)
MySQL Replication — Advanced Features / Петр Зайцев (Percona)MySQL Replication — Advanced Features / Петр Зайцев (Percona)
MySQL Replication — Advanced Features / Петр Зайцев (Percona)
 
Внутренний open-source. Как разрабатывать мобильное приложение большим количе...
Внутренний open-source. Как разрабатывать мобильное приложение большим количе...Внутренний open-source. Как разрабатывать мобильное приложение большим количе...
Внутренний open-source. Как разрабатывать мобильное приложение большим количе...
 
Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...
Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...
Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...
 
Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...
Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...
Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...
 
Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)
Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)
Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)
 
И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)
И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)
И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)
 
Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)
Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)
Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)
 
Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)
Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)
Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)
 
100500 способов кэширования в Oracle Database или как достичь максимальной ск...
100500 способов кэширования в Oracle Database или как достичь максимальной ск...100500 способов кэширования в Oracle Database или как достичь максимальной ск...
100500 способов кэширования в Oracle Database или как достичь максимальной ск...
 
Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...
Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...
Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...
 
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...
 

Dernier

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 

Dernier (20)

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 

Growing in the wild. The story by cubrid database developers (Esen Sagynov, Eugene Stoyanovic)

  • 1. Growing in the Wild. The story by CUBRID Database Developers. Esen Sagynov (@CUBRID), NHN Corporation Service Platform Development Center Monday, April 2, 2012 Eugen Stoianovici, NHN Corporation CUBRID Development Lab
  • 2. Who are we? • Eugen Stoianovici – CUBRID Engine Team – eugen.stoianovici@cubrid.org • Esen Sagynov @CUBRID – CUBRID Project Manager – esen.sagynov@nhn.com
  • 3. Purpose of this presentation This is what I remember from every presentation that I’ve attended. Not the details. 1. “Some guys talked about some cool stuff they encountered in applications (don't remember what)” 2. “There's a database that they use for this type of applications, it's open source and saves from a lot of trouble (don't remember what trouble exactly).” 3. “They're really keen on doing things right.”
  • 4. You will learn… Reasons behind CUBRID development. What CUBRID has to offe r. Benefits & advantages. What we have learnt so fa r. Where we are heading t o.
  • 5. CUBRID Facts  RDBMS  True Open Source @ www.cubrid.org  Optimized for Web services  High performance 3-tier architecture  Large DB support  High-Availability feature  DB Sharding support  MySQL compatible SQL syntax  ACID Transactions  Online Backup
  • 6. Reasons Behind CUBRID Development
  • 7.
  • 9. 150+ Web Services Korea Japan USA USA Korea Korea Japan iOS & Android Japan MySQL Oracle, MySQL, CUBRID MySQL NoSQL Oracle, MSSQL MSSQL, Oracle, MySQL CUBRID Monitoring & Logging System
  • 10. Disadvantages of existing solutions 1. High License Cost 1. Over 10,000 servers @ NHN 2. Third-party solution 1. No ownership of the code base 2. Additional $$$ for customizations 3. Branch tech support is not enough 4. Communication barriers w/ vendors 5. Slow updates & fixes
  • 11. Fork or Start from Scratch? • No full ownership • Time to learn the code base • Fixed architecture • Understand the design philosophy • Full ownership • Time to develop • Custom more advanced architecture and design
  • 12. Benefits of in-house solution 1. High License Cost 1. Over 10,000 servers @ NHN 2. Third-party solution 1. No ownership of the code base 2. Additional $$$ for customizations 3. Communication barriers w/ vendors 4. Slow updates & fixes 1. No License Cost 2. Core Technological Asset 1. Complete control of the code base 2. No additional $$$ for customizations 3. No communication barriers 4. Fast updates & fixes 3. Key Storage Technology Skills 1. Grow our developers 2. Export developers 4. New Database Solution Service 1. Provide CUBRID service to other platforms 2. Instant reaction to customer issues 5. Recurring Key Technology 1. High-Availability 2. Sharding 3. Rebalancing 4. Cluster 5. etc.
  • 13. CUBRID Stability Performance Scalability Ease of Use Goal • Human vs. DB Errors • # of customers • Smart Index Optimizations • Shared Query Caching • Web Optimized Features • Load Balancer • High-Availability w/ auto fail-over • Sharding • Data Rebalancer • Cluster • SQL & API Compatibility • Native Migration Tool • Native GUI DB Management Tools • Monitoring Tools
  • 15. Client Requests Performance UP! Types of Web Services Main operations Example READ > 95% News, Wiki, Blog, etc. READ:WRITE = 70:30% SNS, Push services, etc. WRITE > 90% Log monitoring, Analytics. 90% of Web Services CRUD WHY? SELECT Fast searching, avoid sequential scan and ORDER BY INSERT Concurrent WRITE performance, reduce I/O, and Fast searching UPDATE Fast searching, improve lock mechanism DELET E Fast searching How & What to improve
  • 16. Phase 1 v1.0 ~ 2.0 Phase 2 v8.2.2 Phase 3 v8.4.0 Phase 4 v8.4.1 Phase 5 Apricot Phase 6 Banana SELECT Performance + INSERT & DELETE Performance + SELECT Performance ++ INSERT & UPDATE Performance ++ INSERT Performance +++ SELECT Performance ++++ Shared Quer y Plan Cachi ng Space Reusability Improvement Covering Index, Key limit, etc. Memory Buffer Mgmt. Improvement s Filter index, Skip index, etc. Optimize JOINs DB & Index Volume Optimization s API Performance + Windows Performance + TPS 15% 10% 270% 70% Smart Indexing MySQL SELECT performance CUBRID SELECT performance < MySQL INSERT performance CUBRID INSERT performance <
  • 17. CREATE TABLE forum_posts( user_id INTEGER, post_moment INTEGER, post_text VARCHAR(64) ); INDEX i_forum_posts_post_moment ON forum_posts (post_moment); INDEX i_forum_posts_post_moment_user_id ON forum_posts (post_moment, user_id); Random INSERT Performance SELECT username FROM users WHERE id = ?; INSERT INTO forum_posts(user_id, post_moment, post_text) VALUES (?, ?, ?); UPDATE users SET last_posted = ? WHERE id = ?; CREATE TABLE users( id INTEGER UNIQUE, username VARCHAR(255), last_posted INTEGER, );
  • 18. Random INSERT Performance • Users – 100,000 rows prepopulated • Test – CUBRID vNext (code name Apricot) – MySQL 5.5.21 – 40 workers – 1 hour – Record QPS every 2 minutes
  • 19. 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 Queries per second CUBRID QPS decrease with DataSet size Random INSERT Performance Average = 3685 Max = 4469 Min = 2821
  • 21. Random INSERT Performance 0 2000 4000 6000 8000 10000 12000 Queries per second CUBRID vs MySQL QPS decrease with DataSet size CUBRID QPS MySQL QPS
  • 22. CUBRID Optimizations Index Features Reverse Index Prefix Index Function Index Filter Index Unique Index Primary Key Foreign Key Query Features Multi-range key limit Index skip scan Skip order by Skip group by Range Scan optimizations Query rewrites Covering Index Descending Index Server level optimizations Log compression Shared Query Plan cache Locking Optimizations Transaction concurrency
  • 23. Filter Index • Interesting (open) tickets fit into a very small index. • No overhead for INSERT/UPDATE • Very fast results for open tickets CREATE INDEX ON tickets(component, assignee) WHERE status = ‘open’; SELECT title, component, assignee FROM users WHERE register_date > ‘2008-01-01’ AND status = ‘open’;
  • 24. QPS Filter vs. Full index 0 1000 2000 3000 4000 5000 6000 7000 0 500,000 1,000,000 1,500,000 2,000,000 2,500,000 3,000,000 3,500,000 4,000,000 4,500,000 5,000,000 5,500,000 6,000,000 6,500,000 7,000,000 7,500,000 8,000,000 8,500,000 9,000,000 9,500,000 10,000,000 Queries per second QPS Full Index QPS Filter Index
  • 25. CUBRID Architecture: API (CCI, JDBC, ADO.NET, OLEDB, ODBC, PHP, Perl, Python, Ruby) → Broker (Query Parser, Query Optimizer, Query Planner) → Server (Query Manager, Query Executor, Transaction Manager, Lock Manager, Log Manager, Storage Manager, File Manager)
  • 26. Parameterized Queries & Filter Index
• PostgreSQL: will not use the partial index
• MS SQL Server: provides a workaround
• Oracle: less flexible; the expression has to appear in exactly the same form
• CUBRID: "shared" query plan cache
SELECT title, component, assignee FROM tickets WHERE register_date > ? AND status = ?;
SELECT name, email FROM users WHERE register_date > ? AND age < ? AND age < 18;
  • 27. Query Plan Cache
• PostgreSQL: caches a plan only for the lifespan of a driver-level prepared statement
• MySQL: no query plan cache
• CUBRID: "shared" query plan cache
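A hedged sketch of why the shared plan cache pays off: once the first execution compiles a plan for a parameterized statement text, later executions of the same text, from any session, can reuse it. The PREPARE/EXECUTE form below follows CUBRID's MySQL-style syntax; the statement name and values are illustrative:
PREPARE st FROM 'SELECT title, component, assignee FROM tickets WHERE register_date > ? AND status = ?';
EXECUTE st USING '2008-01-01', 'open';
EXECUTE st USING '2010-06-15', 'open';  -- same cached plan, different parameter values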
  • 28. Query Plan Cache. Query execution without plan cache: Parse SQL → Name Resolving → Semantic Check → Query Optimization → Query Plan → Query Execution. Query execution with plan cache: Parse SQL → Get Cached Plan → Query Execution.
  • 29. Auto Parameterization
SELECT title, component, assignee FROM tickets WHERE register_date > '2008-01-01' AND status = 'open';
becomes
SELECT title, component, assignee FROM tickets WHERE register_date > ? AND status = ?;
  • 31. Scalability challenges • How to synchronize? – Async • Load balancing? – Third-party solution • Who handles Fail-over? – Application – Third-party solution • Cost?
  • 32. HA solutions
DBMS | Cost | Disk-shared | Replication | Consistency | Auto-Failover
Oracle RAC | +++++ | Shared everything | N/A | N/A | O
MS-SQL Cluster | +++ | Shared everything | N/A | N/A | O
MySQL Cluster | ++ | Shared nothing | Log based | Async / Sync | O
MySQL Replication + third-party | Free | Shared nothing | Statement based | Async | O
CUBRID | Free | Shared nothing | Log based | Sync / Semi-sync / Async | O
  • 33. Client Requests 1. Non-stop 24/7 service uptime 2. No missing data between nodes Phase 1 v8.1.0 Phase 2 v8.2.x Phase 4 v8.3.x Phase 5 v8.4.x Phase 6 Apricot Replication HA Support Extended HA features HA Monitoring + Easy Admin Scripts Async Auto Fail-over HA Status Monitoring HA Performance + Reduce Replication Delay Time CUBRID Heartbeat HA + Replica Admin Scripts Read-Write Service during DB maintenance Async, Semi-sync, Sync Broker Modes (RW, RO)
  • 35. CUBRID HA: Benefits • Non-stop maintenance • Auto Fail-over • Large Installations are Easy • Load balancing • Accurate and reliable Failure detection • Various Master-Slave Configurations: – 3 replication modes – 3 broker modes
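For orientation only, a very rough sketch of how a 1:1 master-slave pair is typically declared for CUBRID HA. The parameter names below (ha_mode in cubrid.conf; ha_node_list, ha_db_list, ha_copy_sync_mode in cubrid_ha.conf) are recalled from the CUBRID manual rather than taken from this deck, so treat them as assumptions and verify them against the version in use:
# cubrid.conf (on every node) -- assumed parameter name
ha_mode=on
# cubrid_ha.conf (master node1, slave node2) -- assumed parameter names
ha_node_list=mycluster@node1:node2
ha_db_list=testdb
ha_copy_sync_mode=sync:sync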
  • 36. Database Sharding • Partitioning: divide the data between multiple tables within one database instance • Sharding: divide the data between multiple tables created in separate database instances [diagram: one DB containing tables X, Y, Z vs. separate shard DBs X, Y and Z]
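As a hedged illustration of the modulo strategy behind the "Multiple Shard ID Gen. Algorithm" item on the CUBRID SHARD slide further down: with a fixed number of shards, the shard that owns a row can be derived from its shard key. This is only a conceptual sketch of the hashing idea, not CUBRID SHARD syntax, and the shard count of 4 is made up:
-- user_id acts as the shard key; all posts by the same user land on the same shard
SELECT user_id, MOD(user_id, 4) AS shard_id FROM forum_posts;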
  • 37. Without Database Sharding [diagram: App, Broker, DB, Tbl1, Tbl2, Tbl3, Tbl4]
  • 38. With Database Sharding [diagram: App, Broker, sharded DBs, Tbl1, Tbl2, Tbl3, Tbl4]
  • 39. CUBRID SHARD Phase 1 Apricot Phase 2 Banana Unlimited Shards Data Rebalancing Multiple Shard ID Gen. Algorithm Connection & Statement Pooling Load Balancing HA Support CUBRID, MySQL, Oracle Support
  • 40. Sharding: Benefits • Developer friendly – Single database view – No sharding logic in the application – No application changes • Multiple sharding strategies • Native scale-out support • Load balancing • Support for heterogeneous databases
  • 42. Phase 1 v.8.2.x Phase 2 v.8.3.x Phase 4 v8.4.x Phase 6 Apricot Oracle MySQL MySQL MySQL, Oracle Hierarchical Query SQL: 60+ PHP: 20+ SQL: 70+ PHP: 20+ Currency SQL LOB, API++ Implicit Type Conversion + Usability + Usability+++ RegExpr MSSQL win-back MySQL, Oracle win-back: Monitoring system Oracle: Ads, Shopping Client Requests SQL Compatibility > 90% MySQL SQL Compatibility
  • 43. Client Requests 1. API Support 2. Ease of Migration 3. Usability Phase 1 v.8.1.x Phase 2 v.8.3.x Phase 3 v.8.4.x Phase 4 Apricot CM CM, CQB, CMT CUNITOR Web manager CM Monitoring ++ Phase 1 v.8.1.x Phase 2 v.8.2.x Phase 3 v.8.3.x Phase 4 v.8.4.x CCI, JDBC, OLEDB PHP, Python, Ruby ODBC Perl, ADO.NET
  • 44. MSSQL Win-Back in 2010 [diagram: Step 1, Dual Write; Step 2, Dual Write and Read; Step 3, Win-back Complete (CUBRID handles all reads and writes)] • 16 Master/Slave servers and 1 Archive server • DB size: 0.4~0.5 billion records/DB, Total 4 billion records; Total 3.2 TB; Total 4,000 ~ 5,000 QPS • Saved money on the MSSQL license and SAN storage
  • 45. Oracle Win-Back in 2011: System Monitoring Service [diagram: 1 Oracle Enterprise server plus Oracle Standard servers (40 servers in total) replaced by 25 CUBRID servers] • DB size: 1.5 ~ 2.0 TB/DB, Total 40 TB; 10~100K inserts per second • Saved money on the Oracle license and SAN storage
  • 46. What we have learnt so far, and where we are heading
  • 47. What we have learnt so far • Not easy to break users’ habits. • Need time. • Technical support is the key to acceptance! • Some services don’t deserve Oracle.
  • 48. CUBRID Deployment in NHN [chart, ~2009 to 2012-1Q: cumulative services grew from 42 to 117; cumulative deployments grew from 166 to 500]
  • 49. CUBRID Achievements: Stability • Performance • Scalability • Ease of Use • Human vs. DB Errors • # of customers • Smart Index Optimizations • Shared Query Caching • Web Optimized Features • Load Balancer • High-Availability w/ auto fail-over • Sharding • Data Rebalancer • Cluster • > 90% MySQL SQL Compatibility • Native Migration Tool • Native GUI DB Management Tools • Monitoring Tools
  • 50. CUBRID Roadmap 8.4.x • Performance++: Covering index, Key limit, Range scan • SQL Compatibility+: 70+ new syntax • HA++: Monitoring tools • I18N, L10N: 2~3 European charsets • SQL Compatibility++: Cursor holdability, Mass table UPDATE & DELETE • I18N, L10N+: more charsets • Performance+++ • SQL monitoring performance+ • SQL Compatibility+++ • Table Partitioning Improvements • DB SHARDING+ • Performance++++ • CUBRID Lite • SQL Compatibility++++ • DB Monitoring Improvements • Arcus Caching Integration
  • 51. CUBRID is Big now. What can you do? 1. Keep watching it 2. Consider using it 3. Discuss, talk, write about CUBRID 4. Support CUBRID in your apps 5. Contribute to CUBRID 6. Provide CUBRID service
  • 53. . . . • How do CUBRID developers cope with stress? – Join MySQL issue tracker ;) • Want more? – Follow us to the next room. We’ll have more discussions!

Editor's notes

  1. Self introduction.
  2. Eugen: When I started thinking about this presentation, this is the outcome that I wanted from it. For the experienced guys in the audience, these are the thoughts I want you to have at the end of this presentation. I want you to think that: Some guys talked about some cool stuff they encountered in applications (don't remember what). There's a database that they use for this type of applications, it's open source and saves a lot of trouble (don't remember what trouble exactly). They're really keen on doing things right. This is what I remember from every presentation that I've attended. Not the details. So I don't expect you to remember the technical details. What I want is for you to grasp the concept of what we will talk about.
  3. CUBRID is a fully-featured Relational Database Management System.
  4. When we were invited to speak at a Russian IT conference, the committee members asked us to explain WHY NHN has started CUBRID development. And this is what I'm going to do now. We've never actually explained this at other conferences.
  5. Why not use existing solutions? Why start from scratch? Why not fork existing solutions? Why not co-develop? These are the common questions asked by users. To answer these questions, we first need to understand who NHN is and what resources it possesses that let it pull off a project like CUBRID.
  6. Some of these services, such as online games and billing systems, already use Oracle databases, both Standard and Enterprise, and Microsoft SQL Server. Other services use MySQL. CUBRID is used in WRITE-intensive services such as logging and spam filter services, as well as in READ-intensive services such as commenting and monitoring systems. Oracle is a super reliable DBMS. We all know this. MySQL is great, too. Our DBAs love it very much. But as a service provider, we have certain problems with all of them.
  7. They are all commercial. At NHN we have over 10,000 servers. Annually we pay several million dollars to extend the license and service support. No matter how much we earn from our mainstream business, it's a big chunk of expense we have to pay every year, and we want to cut our expenses. The second disadvantage of existing solutions for us is that they are third-party, and NHN has no control over the course of core development. For this reason we spend quite a lot of money on customizations to serve our needs. There is also another big problem: the communication problem we have with vendors. Most of them are located overseas. Many developers at NHN do not speak English well. They have problems conveying their requirements to the vendors. That slows the entire development process.
  8. Of course NHN has considered these two options before developing CUBRID: whether to fork MySQL or other open source solutions, or to start from scratch. To start with, in 2006 NHN hired the best database experts and architects in Korea and built a team of 20 developers. They all analyzed which was the best option: study every line of the existing open source product (e.g. MySQL), understand its philosophy, the reasons behind its architecture, etc., or create a new DBMS from scratch with an architecture optimized for Web services with native support for HA, sharding, load balancing, etc. After this, NHN and the entire crew came to the conclusion that starting from scratch and optimizing for Web services is easier and cheaper in terms of time than studying the existing solution.
  9. As you will see now, there are many reasons why NHN has decided to create a new relational database solution. First, with an in-house solution NHN would cut the cost of ownership significantly. This is important, but it's not the main reason. The key reason is that the CUBRID Database is a core technological asset for NHN. By owning this technology, NHN controls the code base completely. No additional expenses are required for customizations. No communication problems with the database developers. Lastly, services wouldn't suffer delayed updates and security and bug fixes. Possessing such technology would allow NHN to grow its developers, their skills and knowledge of storage technology. By training the developers, NHN can "export" these skills to other services at NHN, thereby improving the staff quality of those services. I will tell you what: NHN has invested 100M dollars to establish a Software Engineering Institution in Korea. Thus NHN is very serious when it comes to nurturing engineering skills. Obviously, by developing CUBRID, NHN can provide a database solution to other platforms and service departments within NHN. Those services would become so-called internal customers. They would benefit from fast updates and fixes. There is a synergy effect. However, this is not all. After NHN developed a relational database management system, the company gained the knowledge of how to develop other recurring solutions such as high-availability, sharding, rebalancing, and cluster. And actually, after CUBRID, NHN has developed its own owner-based file system, has started the Cluster database solution, and even a distributed database system for petabyte data. At this moment, I hope, you understand why NHN has decided to create its own RDBMS.
  10. When we started to develop CUBRID, we set a goal that our database solution should be: stable, fast, scalable, and easy to use. Thereby, no matter what new feature we add, these four things should not be broken, just as ACID should not be violated in relational databases. In the next slides I will explain how at CUBRID we try to meet these criteria.
  11. First, let’s see what we’ve done in CUBRID to meet the performance demands.
  12. What every customer wants is the performance boost. They don't care how you do it, but they want it to be fast. At NHN we've identified 3 types of Web services. One, which is READ-intensive. Examples are news services, wiki sites, blogs, and other user-generated content providers. In this kind of service, READ operations account for almost 99%. On the other hand there are services like SNS sites and push services where around 70% of operations are READs and 30% are WRITEs. And the last type of Web services are INSERT-heavy services, where such operations account for over 90%. Examples are log monitoring systems and analytics. Over 90% of all services are READ-heavy services. Then we thought about what we could do to provide a satisfactory level of performance. There are 4 CRUD operations commonly used by developers. In CUBRID we needed to provide fast searching, and avoid table scans and ORDER BY. We should increase the performance of concurrent write operations and improve the locking mechanism. What all these have in common is that they can be achieved by optimizing the indexing algorithm, because indexes affect how fast the data can be searched, which, as you can see, affects all operations. This is the approach we've taken in CUBRID. With super fast indexes we can satisfy our clients.
  13. So we first focused on improving the READ operations, because this is what 90% of all services need. We introduced a new concept of shared query plan caches. The first customers were very happy. Further, the customer base started to grow, and the new clients asked for more WRITE performance. In the next version we improved the algorithm to achieve I/O load balancing when INSERT operations are concentrated at a certain point in time. Besides, increased space reusability was another performance enhancement in the second phase. Phase 3 was the most promising for us. We had a few big services who said they would replace their MySQL deployments with CUBRID if we further improved our READ operations. We redesigned how indexes are stored in CUBRID, which allowed us to reduce the data and index volume size by 70%; all of a sudden your database is half the size. The performance of the database engine was increased by almost 3 times. Moreover, we significantly improved our Windows binaries by better handling concurrent requests. Thus, we migrated more MySQL-based services to CUBRID. This was the first time the performance of SELECT queries surpassed that of MySQL. Our latest version, CUBRID 8.4.1, is another breakthrough. We received many requests from SNS service providers, which have heavy WRITE operations. They wanted to try CUBRID if we promised to improve the WRITE performance. Therefore, we focused on improving INSERT and UPDATE operations by rethinking how the memory buffer and transaction logs were written to disk. We achieved a 70% performance increase over the previous version. So far it's the best CUBRID ever. In the next version, under the code name Apricot, which is due this summer, we'll have several super smart improvements to indexes in CUBRID. Further, perhaps, we'll improve the JOINs in CUBRID. So this whole performance improvement effort was completely led by our clients.
  14. Over one hour, CUBRID manages an average QPS of 3685 with the maximum being 4469 and the minimum 2821. Both these values are close enough to the average and show a slow decrease of performance for CUBRID on the dataset.
  15. Over one hour, MySQL manages an average QPS of 1796 with the maximum being 8951 and the minimum 1122. The performance of MySQL is very good at the beginning of the test but falls dramatically after the first few minutes.
  16. Even though MySQL performed two times faster than CUBRID before it reached one million rows, by the end of the test CUBRID had inserted twice as much data into the table (~13 million rows for CUBRID versus ~6.5 million rows for MySQL). If you've worked with big data before, you know how important predictable performance is. High performance is good, but predictable performance is the King. CUBRID's INSERT performance is predictable.
  17. Apart from the must-have optimizations that all databases have, we optimize special cases for the Web (see the list above). Each database has its own particular optimizations, so there's not much to talk about here. For example, MySQL optimizes special cases in inner joins, which gives them better performance over us (but worse on complicated joins); we heavily optimize range scans and limits and so on, which gives us better performance for those cases. We can't really go that far with inner joins because we have a much more complicated object model, which makes it so difficult that during plan generation we don't really know if that's a table, or a class, or a class hierarchy, or not even an object store at all but just a derived query. Not to mention hierarchical queries, which complicate things even more. Loosely, all query planners are based on the same algorithms designed in the 1970s by some guys at IBM; all current databases keep adding particular cases to them, and we constantly search for common use cases that can be better handled in our database. The bottom line is that any optimization is a trade-off, because you want to have the best query plan but you don't want to spend too much time generating it, so you have to compromise. To give a clearer example of what I mean by compromise, I'll tell you about the Filter Index feature that will be released in the following version of CUBRID (around mid-July this summer). Take the query plan cache, for example. CUBRID caches 10,000 plans by default. This takes a huge amount of memory; I think it can get to 50+ MB. It's OK for us to do this, but MySQL can't: it's huge considering the fact that, for example, on my website I have 60+ MySQL databases that use different things, and 60 * 50 MB = 3 GB just for caching plans. It is OK for CUBRID because you would only have one DB per machine, so 50 MB is not important. It's OK for Oracle too (they use the same technique) because you only have one instance of Oracle per machine. So again, caching plans is not a problem for us, for Oracle, or for MS SQL Server, but it's mostly unacceptable for PostgreSQL and MySQL. CUBRID is not created for every application. You cannot have 60 or 100 databases like hosting companies have for their shared hosting customers. CUBRID is not designed for hosting companies.
  18. Filter Indexes (or partial indexes, as they're called in PostgreSQL) allow you to create an index on a subset of the data table, the syntax being something like the CREATE INDEX shown on the slide. This index will make sure that only the tuples which have 'open' status will be in the index. With this index, if you want to find out which tickets are open, you will write a query like: SELECT title, component, assignee FROM tickets WHERE register_date > '2008-01-01' AND status = 'open'; You will naturally only look through tickets that are already open, ignoring all the rest. Obviously this is faster (YAY!). You're bound to have only a few tickets open but many, many closed ones. However, the real improvement for this type of index is the performance on insert/update/delete. This is where the trade-off of indexes is: they're fast for searching but not that good when manipulating data in them. Since this index might end up holding several orders of magnitude less data than the whole table, this is going to work magic on insert statements.
  19. This chart shows the penalty normal indexes suffer while the dataset increases. It shows the number of INSERT statements a server can process per second. The data here was generated on my computer; it's not an actual performance test, but it shows the actual trend. You can see that the filter index QPS stays pretty much stable regardless of the dataset size, while the full index QPS slowly starts to decrease as we reach the 10,000,000 mark. OK, so this is a filter index, it's useful in certain situations, and we decided to add support for it in CUBRID. However, there are some pitfalls to this type of index:
  20. The first one is CUBRID-specific and has to do with the 3-tier architecture: the server component does not know how to handle expressions in their raw form (status = 'open' has no meaning to it), so what we had to do was extend the executable binary form of a query to be able to start execution anywhere in the execution plan (thus enabling us to execute only the binary form of status = 'open' rather than the whole SELECT … FROM tickets WHERE status = 'open'). This is like requiring a C compiler to be able to handle printing the value of x + 2 without having the main routine. More than this, we had to come up with a rather wicked way of cloning this section of binary code over multiple simultaneous transactions (rather than reinterpreting its serialized form over and over again, since we think that disk reads are BAD!). We're really happy with the way this implementation turned out; we started applying it to other areas like partition pruning and precomputed columns and so on, which will become much faster with this addition.
  21. The second pitfall is common to all databases: a filter index doesn't really work with parameterized queries: SELECT x FROM tickets WHERE register_date > ? AND status < ?; Obviously, there's no way to know what value '?' will have during plan generation, so we have to assume that the filter index is not enough. We can live with this and do what other databases do in this situation: PostgreSQL says that this will not be using the partial index. Oracle (where you would use a function index) mentions that the expression of the index must appear in the exact same form in the query (so even age < 17 will not be using the filter index, and age < ? is not even on the waiting list). MS SQL Server also says that it will not be using the index, but if you want to have parameterized queries with filter indexes you can write: SELECT name, email FROM users WHERE register_date > ? AND age < ? AND age < 18; This is obvious, but MS makes it their motto to spell some things out for their users. OK, so problem solved, we will just do what other databases do and move on, right? No. It turns out that things are not as easy as this. The reason? The "shared" query plan cache.
  22. PostgreSQL does not have this problem; they only cache a plan for the lifespan of a driver-level prepared statement. MySQL does not cache query plans at all. (Can you guess why? Yes, this is a trade-off decision also <<great for large scale, horrible for small applications, this can be extended into a nice talk too :P>>)
  23. However, CUBRID implements what is called a "shared" query plan cache. A shared query plan means that all compiled query plans get cached in a memory area, and any session running the "same" query does not need to generate a new plan for it; it will just use the cached one. The default limit for cached plans is 10,000 query plans. Considering that most applications do not have this many distinct queries (if they're parameterized), we very rarely generate query plans over the lifespan of an application. We got a huge improvement in overall performance when we added this feature (I think it was added before CUBRID became open source; it was already there when I joined the project).
  24. To optimize things even more, before plan generation we convert any literals or constants into a parameter and use the parameterized printed version as a key for indexing the cache: SELECT name, email FROM users WHERE register_date < '2008-01-01' becomes SELECT name, email FROM users WHERE register_date < ? and we never need to generate the plan for this query again for any value of '?'. OK, so why is this a trap for the filter index? Because caching and decaching plans is a costly process, so we have to be very careful. We simply cannot allow non-parameterized queries to go into the cache because, and I would put my life on the line for this statement, the first programmer that uses a filter index will be very careful to query all the values of age from -5 to 10 million, filling up the cache and complaining that CUBRID is really slow. Just leaving it like SELECT title, component FROM tickets WHERE register_date < ? AND status = 'open' is not an option. SELECT title, component FROM tickets WHERE register_date < ? AND status = ? is not an option either, because this matches any status (status = 'closed'), and if we generated a plan with a filter index for status = 'open', we would return fewer results than expected for status IN ('open', 'closed') (the cached plan would be used). So again, we had to come up with a really clever way of solving this issue without affecting people not using filter indexes (the performance of what was already implemented must not be affected).
  25. Built-in scalability is the second major field we've been working on.
  26. These are the questions our clients asked when they first approached us. How will the nodes be synced? MySQL provides only asynchronous replication, which affects data consistency. How about a load balancer? How do we choose which slave will run a certain query and maintain the load balance? And fail-over? If the master "fails", what will happen? Will the database provide native fail-over? How much will this solution cost?
  27. There are various software products on the market which provide high-availability solutions. There is Oracle RAC, very expensive and very reliable. MySQL Cluster, or MySQL's own replication merged with a third-party solution such as MMM. In CUBRID there is a native transaction-log-based HA feature with 3 replication modes: fully synchronized, semi-synchronized, and asynchronous.
  28. So our clients told us that if we could provide a solution which won't cause an application halt and will not lose any data during the transaction, then they were ready to use CUBRID. The first thing we implemented was one-way replication for READ load balancing. But that was not enough: the application developers were still required to implement the fail-over logic within their applications. To remove the potential for human error, we moved further. Version 8.2.0 was revolutionary. We added native HA support based on transaction log replication. The auto fail-over feature was implemented on top of Linux Heartbeat. Our clients were really happy. At the time we migrated tens of services which used to run on MySQL and Microsoft SQL Server to CUBRID. That period was one of the most successful win-backs for CUBRID, covering more than 50 instances. (With 10 or fewer, it is easy to do manually.) Later on we kept receiving complaints from the users that the fail-over feature was often delayed for a long time and sometimes didn't work at all. We analyzed their applications, created many test scenarios, and found out that Linux Heartbeat was not very stable; its behavior was unpredictable. In the next version we developed CUBRID's native heartbeat technology and improved the failure detection algorithm. Further, the existing clients started asking for HA monitoring tools. We created the monitoring tool for them. Then they asked for more features. We developed them. Currently we are working on a few more important features requested by our clients. For example, when there is a large service with hundreds of database nodes, every day you see some slave nodes go offline. Of course the application developers never know about this, because the service is HA. So if not developers, someone has to fix those slave DB nodes: the DBAs. So far they manually restore the slave nodes, and now they want a built-in script to auto-restore them. We're working on this. But even though we provide a slave auto-rebuilder, the slave nodes still hold several terabytes of data, so no matter what you do, it will still take time to replicate the data. To solve this, we need to reduce the replication time. What we're going to do is introduce multi-threaded replication for slave nodes. A big difference with MySQL: their replication is statement-based, which can lead to data inconsistency. Current: 20-30 slave DBs; rebuild time 14h for 700 GB. Easy admin script: multi-threaded, concurrent replication to make slave rebuild time shorter. Replication delay time reduction.
  29. There are many configurations in CUBRID HA. You can build: standard 1:1 Master-Slave HA systems; extended 1:N Master-Slave systems; 1:1:N Master-Slave-Replica; N:N Master-Slave; or compact N:1 Master-Slave. Last year at the OSCON conference I presented CUBRID HA in its entirety. You can check out the presentation at this link.
  30. Right now CUBRID HA is one of the most stable high-availability solutions on the market. Hundreds of large Web services in Korea use CUBRID HA. At NHN alone we have several big services which monitor over 300 DB instances each. It relies on the native heartbeat technology. It has very stable and predictable fail-over, and a native load balancer through CUBRID's own Broker. So we've developed HA for redundancy. With load balancing provided by the CUBRID Broker, HA provides READ distribution. Great for READ-intensive services. But what about WRITE distribution?
  31. We've developed Database Sharding in CUBRID! The difference between partitioning and sharding is that with partitioning you divide the data between multiple tables with identical schemas within one database, but with sharding you divide data between tables located in different databases. Sometimes the database gets so big that mere table partitioning is not enough; in fact, it will hinder the performance of the entire system. So we'd better add new databases, otherwise called shards. If HA is for READ distribution, sharding is for WRITE distribution, as you can write to different databases simultaneously. This feature is something most developers dream of having on the database side rather than on the application layer. Database sharding doesn't just simplify the developers' life, but also improves both the application and database performance. The application gets rid of the sharding logic. The database reduces the index size. Win-win!
  32. Without built-in sharding, developers need to implement the sharding logic on the application layer using some sort of framework or the like. When the data in the existing databases has already grown too much, the developers need to add a new database to partition the data. Since the sharding logic is implemented on the application layer, the developers also need to add a separate broker for this, as well as update the application code to relay some of the traffic to the new database. This is, in fact, a very common architecture.
  33. With built-in database sharding, there is no work for developers. Everything is on the DBA's side. The developers see only one database. To determine which shard has to be queried, the developers can send the shard_id together with the query. When a new shard is added, all the DBA has to do is update the metadata directory on the Broker level. That's all. The application never knows whether the data is partitioned or not.
  34. One of our clients, who operates the largest news service in Korea, requested this feature, saying that if we provide seamless database sharding, they will be eager to replace their MySQL servers with CUBRID. We've already completed the 1st phase. CUBRID SHARD allows creating an unlimited number of shards. The data can be distributed based on modulo, DATETIME, or hash/range calculations. The developers can even feed in their own library to calculate the SHARD_ID using some complicated custom algorithm. CUBRID SHARD will natively support connection and statement pooling as well as load balancing. These days we are performing QA on HA support for shards. Another unique feature that CUBRID sharding will provide is that you can use it with a heterogeneous database backend, i.e. some data can be stored in CUBRID, some in MySQL or even Oracle. This first version of CUBRID SHARD, which will be released in the coming months, will require DBAs to create the necessary number of shards in advance. That is, this first version doesn't provide dynamic scale-out; otherwise it would become a full cluster solution and we would call it CUBRID Cluster instead. But sharding is not a cluster solution, so at this moment it doesn't allow adding or removing shards dynamically. You have to decide that in advance. You know what? Didn't I tell you at the beginning that one of the reasons NHN started CUBRID development is that this project would allow it to create recurring projects? The developers from other platforms at NHN have already created a stable data rebalancing technology. In the next phase we will merge that technology into CUBRID, which will allow DBAs to add shards and seamlessly redistribute the data among them. It's going to be a revolutionary product. All in one. A fast and stable RDBMS which provides seamless scalability, that is what CUBRID is. This is what you can expect from CUBRID in the coming months. Once we roll out sharding, a few candidate Web services back in our country can try it for the first time as a part of their migration to CUBRID.
  35. Thus, with sharding the developers can eradicate the sharding logic from their applications. All they will see is a single database view. They will no longer need to make changes in the application code to adjust to the growing data. More than that, the developers can define various sharding strategies by feeding in their own libraries to calculate the SHARD_ID. The DBAs will no longer have to manually distribute data to the new shards; everything will be done automatically. DBAs will be able to combine CUBRID with MySQL or Oracle if they prefer to. And the load will be automatically balanced with CUBRID's native Broker.
  36. As I've said several times throughout this presentation, many of our corporate clients ask us to provide a certain solution to a certain problem, but they always ask us to make it easy to use, easier than what other third-party solutions provide. So we always focus on ease of use when developing CUBRID.
  37. SQL compatibility is the key request when it comes to database migration. After we released the first version, one of the services in Korea running their system on Oracle asked us to support hierarchical queries. Now we support them. After that, a few major READ-heavy MySQL services listed their requirements for considering migration to CUBRID. Since then we've implemented several phases of MySQL compatibility, and now we can proudly say that we support over 90% of MySQL's SQL syntax. Phase 2: extended CREATE/ALTER TABLE, INSERT, REPLACE, GROUP BY, ORDER BY; optional parts in SELECT; added LIMIT, TRUNCATE, PREPARE; operator and function extensions. Phase 3: API improvements (added UTF-8 support to the ODBC connector, new server-status-related functions, functions to obtain schema and FK info); usability (perform bulk SQL operations on multiple tables in the CUBRID tools). Phase 4: SQL syntax enhancements; SHOW statements; DATE/TIME functions; STRING functions; aggregate functions; DB-specific functions; implicit type conversion behaving very similarly to MySQL's; usability (all measurement units were changed from pages and kilobytes to total volume and megabytes). Phase 5: MySQL data type MIN/MAX values; RegExpr.
  38. To further improve ease of use, we put much effort into improving our APIs and administration tools. Originally we had only CUBRID Manager, currently the most powerful and complete database administration tool, with server and HA monitoring dashboards. This tool is perfect for DBAs. However, later on we started to receive requests to create a new tool which is light and oriented towards developers, not DBAs. They said CM was too powerful for developers: they didn't need the backup, replication, and HA stuff. All they needed was a tool for easy and fast query execution, testing, and checking the query plan. That's all. So we created CUBRID Query Browser, the light version of CM created with developers in mind. For ease of migration, we developed the CUBRID Migration Tool. It allows migrating a database from MySQL, Oracle, or MSSQL to CUBRID automatically. All you need to do is confirm the target data types and hit the migrate button. Last year one of the big service providers asked for a tool to control an entire farm of database servers. At the time we didn't have such a tool, so we decided to work on CUNITOR, a powerful database monitoring tool. Likewise, starting from the first version we've kept supporting more and more programming languages. First we rolled out C, Java, and OLEDB for Oracle and Microsoft SQL Server clients. Further, we added the most popular scripting languages for general users. At the beginning of this year, in January and February, we announced two more APIs, for Perl and .NET.
  39. Because you can run your service on cheaper hardware using proven open source solutions which are free. Last year we migrated the entire System Monitoring service from Oracle to CUBRID. The key thing to notice here is how many servers were required to run the Oracle-based service and how many CUBRID servers they needed afterwards. They had 40 Standard Oracle servers and 1 Enterprise server. After the migration, we configured the entire system using only 25 CUBRID servers. They came and asked what they should do with the remaining hardware, because they were shocked to see how efficiently they could use CUBRID. After migration the service achieved 10,000 inserts per second with CUBRID. The company saved a lot by staying away from Oracle licensing fees. That was an impressive success case. We've learnt that Oracle is not for every service. Your service can run perfectly well on CUBRID with no need for compromise; instead you will save a lot on license fees and support. If necessary, buy a few cheap servers and use CUBRID's built-in load balancer. There were around 30 databases, each storing some 1.5 to 2 terabytes of logs. The service was mainly INSERT-intensive.
  40. So, what have we learnt so far, and where are we heading?
  41. All developers and project managers have got so used to MySQL, Oracle, and MS SQL Server that it's really difficult to change their behavior. And this can happen to any software vendor that enters an already occupied market. But there is still a way to break this habit: you can achieve acceptance through responsive technical support. Maybe CUBRID is not as powerful as Oracle RAC, maybe we don't support those features which our clients rely on, but with technical support we can solve anything. With technical support you can meet your clients' expectations. The third lesson we've learnt with CUBRID is that some services don't deserve Oracle; they don't even deserve Microsoft SQL Server. Why? Here is why!
  42. So far at NHN we've already deployed CUBRID in over 100 Web services. The red line displays how many CUBRID servers are actually running in these services. Over 70% of the servers are configured in an HA environment.
  43. Four things: Stability, Performance, Scalability, Ease of Use. I wouldn't hesitate to say that we've successfully achieved all of these, though we have a long way to improve further!
  44. What's next? SELECT is already faster than in MySQL, but we will improve it more with more Web-optimized indexes. We'll improve INSERT queries more. More performance: index improvements and optimizations, INSERT improvements. More SQL compatibility: MySQL and Oracle. Better and more powerful tools: CM+, Web administrator. Sharding is coming very soon! Auto Rebalancer.
  45. Eugen: When I started thinking about this presentation, this is the outcome that I wanted from it. For the experienced guys in the audience, these are the thoughts I want you to have at the end of this presentation. I want you to think that: Some guys talked about some cool stuff they encountered in applications (don't remember what). There's a database that they use for this type of applications, it's open source and saves a lot of trouble (don't remember what trouble exactly). They're really keen on doing things right. This is what I remember from every presentation that I've attended. Not the details. So I don't expect you to remember the technical details. What I want is for you to grasp the concept of what we talked about.