Scaling API-first – The story of a global engineering organization
Methods of NoSQL database systems benchmarking
1. Methods of benchmarking NoSQL database systems
Ilya Bakulin
webmaster@kibab.com, kibab@FreeBSD.org
SMS Traffic
LVEE 2011
Ilya Bakulin (SMS Traffic) NoSQL benchmarking July 2, 2011 1 / 16
2. 1 Introduction
2 YCSB benchmarking framework
3 YCSB practical usage
4 Results
5 Where to find further information
Ilya Bakulin (SMS Traffic) NoSQL benchmarking July 2, 2011 2 / 16
3. Why benchmarking NoSQL is nessesary
No guides / FAQs about performance are generally available, or are
outdated
NoSQL systems are actively developed
Nobody wants to end up with crashed DB in production right before
2-week vacation
Ilya Bakulin (SMS Traffic) NoSQL benchmarking July 2, 2011 3 / 16
4. Why benchmarking NoSQL is complex
RDBMS use SQL to provide access to data stored in them, while
NOSQL systems don’t
Each NoSQL uses different protocol (Thrift, Memcached-style, own
protocols)
Existing benchmarks require SQL to work with database under
inspection.
Ilya Bakulin (SMS Traffic) NoSQL benchmarking July 2, 2011 4 / 16
5. What is YCSB?
YCSB stands for Yahoo Cloud Serving Benchmark
Developed by Yahoo! Research group
Open Source project, hosted on GitHub (178 watchers, 42 forks)
Ilya Bakulin (SMS Traffic) NoSQL benchmarking July 2, 2011 5 / 16
6. Architecture Benchmark tool
•! Java application
Java application
–! Many systems have Java APIs
Shipped with ready-to-use adapters for several popular Opensource
databases systems via HTTP/REST, JNI or some other solution
–! Other
Command-line parameters
•! DB to use
•! Target throughput
•! Number of threads
•! …
Workload
YCSB client
Cloud DB
parameter file
DB client
•! R/W mix
•! Record size Client
Workload threads
•! Data set
executor
•! …
Stats
Extensible: define new workloads
Extensible: plug in new clients
5
Ilya Bakulin (SMS Traffic) NoSQL benchmarking July 2, 2011 6 / 16
7. More on DB interface
Benchmark tool
Simple operations: INSERT, UPDATE, REPLACE, DELETE, SCAN
•! Does not use SQL
Java application
... –! Many systems have Java APIs
but SQL support is avaible through contributed JDBC driver
... –! Other systems configurations are possible other solution
Even sharding via HTTP/REST, JNI or some
Command-line parameters
•! DB to use
•! Target throughput
•! Number of threads
•! …
Workload
YCSB client
Cloud DB
parameter file
DB client
•! R/W mix
•! Record size Client
Workload threads
•! Data set
executor
•! …
Stats
Extensible: define new workloads
Extensible: plug in new clients
5
Ilya Bakulin (SMS Traffic) NoSQL benchmarking July 2, 2011 7 / 16
8. More on DB interface
Benchmark tool
Simple operations: INSERT, UPDATE, REPLACE, DELETE, SCAN
•! Does not use SQL
Java application
... –! Many systems have Java APIs
but SQL support is avaible through contributed JDBC driver
... –! Other systems configurations are possible other solution
Even sharding via HTTP/REST, JNI or some
Command-line parameters
•! DB to use
•! Target throughput
•! Number of threads
•! …
Workload
YCSB client
Cloud DB
parameter file
DB client
•! R/W mix
•! Record size Client
Workload threads
•! Data set
executor
•! …
Stats
Extensible: define new workloads
Extensible: plug in new clients
5
Ilya Bakulin (SMS Traffic) NoSQL benchmarking July 2, 2011 7 / 16
9. More on DB interface
Benchmark tool
Simple operations: INSERT, UPDATE, REPLACE, DELETE, SCAN
•! Does not use SQL
Java application
... –! Many systems have Java APIs
but SQL support is avaible through contributed JDBC driver
... –! Other systems configurations are possible other solution
Even sharding via HTTP/REST, JNI or some
Command-line parameters
•! DB to use
•! Target throughput
•! Number of threads
•! …
Workload
YCSB client
Cloud DB
parameter file
DB client
•! R/W mix
•! Record size Client
Workload threads
•! Data set
executor
•! …
Stats
Extensible: define new workloads
Extensible: plug in new clients
5
Ilya Bakulin (SMS Traffic) NoSQL benchmarking July 2, 2011 7 / 16
10. More on workload
d by replacing the value of one Uniform:
+(,-./%'&0
ther one randomly chosen field
Specifies what DB
1)))2)))333
operations are used by
order, starting at a randomly !"#$%&'(")(%*$%
4
number of application
records to scan is
Zipfian:
Also defines request
+(,-./%'&0
distribution of scan lengths is
distribution
oad. Thus, the scan() method
number of records to scan.to specify
It is possible Of
y instead specify a scan interval 1)))2)))333 4
record size
February 15th). The number of !"#$%&'(")(%*$%
It’s possible to specify
to control the size of these in- Latest:
termine and number of records and
specify meaningful
+(,-./%'&0
of the database calls, including
tion 5.2.1.) operations
1)))2)))333 4
!"#$%&'(")(%*$%
t make many random choices
Ilya Bakulin (SMS Traffic) NoSQL benchmarking July 2, 2011 8 / 16
11. /* TODO: Remove this crap */
Ilya Bakulin (SMS Traffic) NoSQL benchmarking July 2, 2011 9 / 16
12. SMS Traffic: workload construction
SMS service provider, several gateways, big clients (such as banks)
15% inserts, 65% updates, 15% reads
Request distribution: latest SMS messages are the ”hottest” ones
Evaluated Cassandra and sharded MySQL as DB storage for the next
generation of SMS sending platform
Ilya Bakulin (SMS Traffic) NoSQL benchmarking July 2, 2011 10 / 16
13. Testing process
3 instances of DBMS system on one server (Core Quad Q9400, 4GB
RAM, SATA-II HDD, FreeBSD 8.2-amd64)
Cassandra 0.7.4 (1GB Java heap / instance)
MySQL 5.1 + InnoDB engine (1GB InnoDB buffer pool size /
instance)
Client: separate machine, 1Gb/s connection
Should avoid swapping and disk IO saturation
Ilya Bakulin (SMS Traffic) NoSQL benchmarking July 2, 2011 11 / 16
14. Some resuts: Workload ”A”: 50% read / 50% write
Workload A – Upd
•! 50/50 Read/update
Workload A - Read latency
70 80
60 70
Average read latency (ms)
Update latency (ms)
60
50
50
40
40
30
30
20
20
10 10
0 0
0 5000 10000 15000 0
Throughput (ops/sec)
Cassandra Hbase Sherpa MySQL Cassan
Ilya Bakulin (SMS Traffic) Comment: Cassandra is optimized for writes, and achieves h
NoSQL benchmarking July 2, 2011 12 / 16
15. oad A – Workload ”A”: 50% read / 50% write
Some resuts:
Update heavy
y Workload A - Update latency
80
70
Update latency (ms)
60
50
40
30
20
10
0
15000 0 5000 10000 15000
) Throughput (ops/sec)
MySQL Cassandra Hbase Sherpa MySQL
zed for writes, and achieves higher throughput and lower
Ilya Bakulin (SMS Traffic) NoSQL benchmarking July 2, 2011 13 / 16
16. Some resuts: Workload ”B”: 95% read / 5% write Workload B – Re
•! 95/5 Read/update
Workload B - Read latency
20 40
18 35
Average update latency (ms)
Average read latency (ms)
16
30
14
12 25
10 20
8 15
6
10
4
5
2
0 0
0 2000 4000 6000 8000 10000 0 2
Throughput (operations/sec)
Cassandra HBase Sherpa MySQL Cassan
Ilya Bakulin (SMS Traffic) NoSQL benchmarking July 2, 2011 14 / 16
17. oad resuts: Workload ”B”:heavy 5% write
Some B – Read 95% read /
Workload B - Update latency
40
35
Average update latency (ms)
30
25
20
15
10
5
0
8000 10000 0 2000 4000 6000 8000 10000
ec) Throughput (operations/sec)
MySQL Cassandra Hbase Sherpa MySQL
Ilya Bakulin (SMS Traffic) NoSQL benchmarking July 2, 2011 15 / 16