At Percona Live in April 2016, Red Hat's Kyle Bader reviewed the general architecture of Ceph and then discussed the results of a series of benchmarks done on small to mid-size Ceph clusters, which led to the development of prescriptive guidance around tuning Ceph storage nodes (OSDs).
MySQL on Ceph
1. MySQL and Ceph
1:20pm – 2:10pm, Room 203
MySQL in the Cloud: Head-to-Head Performance Lab
2:20pm – 3:10pm, Room 203
2. WHOIS
Brent Compton and Kyle Bader
Storage Solution Architectures
Red Hat
Yves Trudeau
Principal Architect
Percona
3. AGENDA
MySQL on Ceph
• Why MySQL on Ceph
• Ceph Architecture
• Tuning: MySQL on Ceph
• HW Architectural Considerations
MySQL in the Cloud Head-to-Head Performance Lab
• MySQL on Ceph vs. AWS
• Head-to-head: Performance
• Head-to-head: Price/performance
• IOPS performance nodes for Ceph
5. WHY MYSQL ON CEPH?
MARKET DRIVERS
• Ceph: #1 block storage for OpenStack clouds
• MySQL: #4 workload on OpenStack (#1–3 often use databases too!)
• 70% of apps on OpenStack use the LAMP stack
• Ceph: the leading open-source SDS
• MySQL: the leading open-source RDBMS
6. WHY MYSQL ON CEPH?
OPS EFFICIENCY
• Shared, elastic storage pool
• Dynamic DB placement
• Flexible volume resizing
• Live instance migration
• Backup to object pool
• Read replicas via copy-on-write snapshots (see the RBD sketch below)
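As an illustration of the copy-on-write read-replica item above, a minimal RBD snapshot-and-clone sketch follows; the pool, image, and snapshot names are hypothetical, and a replica instance would then be attached to the cloned volume.

    # Snapshot the master's data volume (quiesce MySQL first for a consistent image)
    rbd snap create mysql-pool/master-vol@replica-base
    rbd snap protect mysql-pool/master-vol@replica-base
    # Copy-on-write clone backs the read replica; no full data copy is made
    rbd clone mysql-pool/master-vol@replica-base mysql-pool/replica1-vol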
7. WHY MYSQL ON CEPH?
PUBLIC CLOUD FIDELITY
• Hybrid Cloud requires familiar platforms
• Developers want platform consistency
• Block storage, like the big kids
• Object storage, like the big kids
• Your hardware, datacenter, staff
8. WHY MYSQL ON CEPH?
HYBRID CLOUD REQUIRES HIGH IOPS
Ceph Provides
• Spinning Block – General Purpose
• Object Storage – Capacity
• SSD Block – High IOPS
10. ARCHITECTURAL COMPONENTS
RGW – A web services gateway for object storage, compatible with S3 and Swift
LIBRADOS – A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)
RADOS – A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors
RBD – A reliable, fully distributed block device with cloud platform integration
CEPHFS – A distributed file system with POSIX semantics and scale-out metadata
(Diagram access paths: APP via LIBRADOS/RGW, HOST/VM via RBD, CLIENT via CEPHFS, all layered on RADOS.)
13. RADOS COMPONENTS
OSDs
• 10s to 10,000s in a cluster
• Typically one per disk
• Serve stored objects to clients
• Intelligently peer for replication & recovery
Monitors
• Maintain cluster membership and state
• Provide consensus for distributed decision-making
• Deployed as a small, odd number
• Do not serve stored objects to clients
25. RADOS ACCESS FOR APPLICATIONS
LIBRADOS
• Direct access to RADOS for applications
• C, C++, Python, PHP, Java, Erlang
• Direct access to storage nodes
• No HTTP overhead
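To make the direct-access path concrete, here is a minimal librados sketch using the Python binding; the pool and object names are illustrative assumptions, and the client needs a reachable cluster plus a valid keyring.

    import rados

    # Connect using the local cluster config and default client credentials
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    # Open an I/O context on a pool (pool name is illustrative)
    ioctx = cluster.open_ioctx('mysql-pool')

    # Read and write objects directly against the OSDs, with no HTTP gateway in the path
    ioctx.write_full('demo-object', b'stored via librados')
    print(ioctx.read('demo-object'))

    ioctx.close()
    cluster.shutdown()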
32. TUNING FOR HARMONY
OVERVIEW
Tuning MySQL
• Buffer pool > 20%
• Flush each Tx or batch?
• Parallel double write-buffer flush
Tuning Ceph
• RHCS 1.3.2, tcmalloc 2.4
• 128M thread cache
• Co-resident journals
• 2-4 OSDs per SSD
(See the configuration sketch below.)
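A minimal sketch of the knobs listed above, assuming MySQL with InnoDB and an RHCS 1.3-era OSD host; the values shown are illustrative, not the benchmark's exact settings.

    # my.cnf (per MySQL instance) -- illustrative values
    [mysqld]
    innodb_buffer_pool_size        = 8G   # size to hold a meaningful share (>20%) of the dataset
    innodb_flush_log_at_trx_commit = 1    # 1 = flush every commit; 2 = write at commit, flush ~once/sec
    innodb_doublewrite             = 1    # keep the doublewrite buffer for crash safety

    # /etc/sysconfig/ceph (OSD hosts) -- raise the tcmalloc thread cache to 128 MB
    TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728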
33. TUNING FOR HARMONY
SAMPLE EFFECT OF MYSQL BUFFER POOL ON TpmC
[Chart: tpmC vs. time in seconds (one data point per minute); 64x MySQL instances on the Ceph cluster, each with 25x TPC-C warehouses; series compare buffer pools sized at 1%, 5%, 25%, 50%, and 75% of the dataset.]
34. TUNING FOR HARMONY
SAMPLE EFFECT OF MYSQL Tx FLUSH ON TpmC
[Chart: tpmC vs. time in seconds (one data point per minute); 64x MySQL instances on the Ceph cluster, each with 25x TPC-C warehouses; series compare batch Tx flush (1 sec) with per-Tx flush.]
35. TUNING FOR HARMONY
SAMPLE EFFECT OF CEPH TCMALLOC VERSION ON TpmC
[Chart: tpmC vs. time in seconds (one data point per minute); 64x MySQL instances on the Ceph cluster, each with 25x TPC-C warehouses; series compare per-Tx flush before and after moving to tcmalloc v2.4.]
36. TUNING FOR HARMONY
CREATING A SEPARATE POOL TO SERVE IOPS WORKLOADS
Creating multiple pools in the CRUSH map
• Distinct branch in OSD tree
• Edit CRUSH map, add SSD rules
• Create pool, set crush_ruleset to SSD rule
• Add Volume Type to Cinder
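A hedged CLI sketch of the steps above, assuming a pre-Luminous Ceph release (which still uses crush_ruleset) and illustrative bucket, pool, and volume-type names.

    # Add a distinct SSD branch to the CRUSH hierarchy (names are illustrative)
    ceph osd crush add-bucket ssd-root root
    ceph osd crush move ssd-node1 root=ssd-root

    # Decompile the CRUSH map, add an SSD rule rooted at ssd-root, recompile, inject
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    #   ...edit crushmap.txt to add a rule "ssd" that takes ssd-root...
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new

    # Create the IOPS pool and bind it to the SSD rule (rule id comes from the edited map)
    ceph osd pool create mysql-ssd 128 128
    ceph osd pool set mysql-ssd crush_ruleset 1

    # Expose it to OpenStack as a Cinder volume type (backend name is an assumption)
    cinder type-create ceph-ssd
    cinder type-key ceph-ssd set volume_backend_name=ceph-ssd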
37. TUNING FOR HARMONY
IF YOU MUST USE MAGNETIC MEDIA
Reducing seeks on magnetic pools
• RBD cache is safe
• RAID Controllers with write-back cache
• SSD Journals
• Software caches
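For the RBD cache item above, a minimal client-side ceph.conf fragment is sketched below; the values are illustrative, and the cache is considered safe because it honors flush requests from the guest.

    # ceph.conf on the client/hypervisor side -- illustrative values
    [client]
    rbd cache = true
    rbd cache writethrough until flush = true   # stay writethrough until the guest issues its first flush
    rbd cache size = 67108864                   # 64 MB per-volume cache (assumption)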
40. NEXT UP
2:20pm – 3:10pm
Room 203
MySQL in the Cloud
Head-to-Head Performance Lab
Editor's notes
MySQL TPC-C (tpmC) benchmark on XS instances
2.5GB dataset (25 warehouses) per instance
64x MySQL instances on a 48x OSD/HDD Ceph cluster with 1024GB total server RAM
MySQL buffer pool vs. dataset ratio varied (1%–100%)