Modeling, estimating, and
predicting Ceph
Lars Marowsky-Brée
Distinguished Engineer, Architect Storage & HA
lmb@suse.com
What is Ceph?
3
From 10,000 Meters
[1] http://www.openstack.org/blog/2013/11/openstack-user-survey-statistics-november-2013/
• Open Source Distributed Storage solution
• Most popular choice of distributed storage for
OpenStack
[1]
• Lots of goodies
‒ Distributed Object Storage
‒ Redundancy
‒ Efficient Scale-Out
‒ Can be built on commodity hardware
‒ Lower operational cost
4
From 1,000 Meters
Unified Data Handling for 3 Purposes
• Object Storage (Like Amazon S3)
‒ RESTful Interface
‒ S3 and SWIFT APIs
• Block Device
‒ Block devices up to 16 EiB
‒ Thin Provisioning
‒ Snapshots
• File System
‒ POSIX Compliant
‒ Separate Data and Metadata
‒ For use e.g. with Hadoop
Autonomous, Redundant Storage Cluster
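All three interfaces sit on top of the same RADOS object store, which can also be exercised directly. A minimal sketch with the python-rados bindings that ship with Ceph, assuming a cluster config at /etc/ceph/ceph.conf and an existing pool named 'data':

import rados

# Minimal sketch: store and read back one object via librados.
# The config path and the pool name 'data' are assumptions for illustration.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('data')           # pool must already exist
ioctx.write_full('greeting', b'hello ceph')  # write one object
print(ioctx.read('greeting'))                # read it back
ioctx.close()
cluster.shutdown()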
Unreasonable expectations
6
Unreasonable questions customers
want answers to
• How to build a storage system that meets my
requirements?
• If I build a system like this, what will its characteristics
be?
• If I change XY in my existing system, how will its
characteristics change?
7
How to design a Ceph cluster today
• Understand your workload (?)
• Make a best guess, based on the desirable properties and
factors
• Build a 1-10% pilot / proof of concept
‒ Preferably using loaner hardware from a vendor to avoid early
commitment. (The outlook of selling a few PB of storage and
compute nodes makes vendors very cooperative.)
• Refine and tune this until desired performance is achieved
• Scale up
‒ Ceph retains most characteristics at scale or even improves (until it
doesn’t), so re-tune
Software Defined Storage
9
Legacy Storage Arrays
• Limits:
‒ Tightly controlled
environment
‒ Limited scalability
‒ Few options
‒ Only certain approved drives
‒ Constrained number of disk
slots
‒ Few memory variations
‒ Only very few networking
choices
‒ Typically fixed controller and
CPU
• Benefits:
‒ Reasonably easy to
understand
‒ Long-term experience and
“gut instincts”
‒ Somewhat deterministic
behavior and pricing
10
Software Defined Storage (SDS)
• Limits:
‒ ?
• Benefits:
‒ Infinite scalability
‒ Infinite adaptability
‒ Infinite choices
‒ Infinite flexibility
‒ ... right.
11
Properties of a SDS System
• Throughput
• Latency
• IOPS
• Single client or aggregate
• Availability
• Reliability
• Safety
• Cost
• Adaptability
• Flexibility
• Capacity
• Density
12
Architecting a SDS system
• These goals often conflict:
‒ Availability versus Density
‒ IOPS versus Density
‒ Latency versus Throughput
‒ Everything versus Cost
• Many hardware options
• Software topology offers many configuration choices
• There is no one size fits all
A multitude of choices
14
Network
• Choose the fastest network you can afford
• Switches should be low latency with fully meshed
backplane
• Separate public and cluster network
• Cluster network should typically be twice the public
bandwidth
‒ Incoming writes are replicated over the cluster network
‒ Re-balancing and re-mirroring utilize the cluster network
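A rough back-of-envelope sketch of why that 2:1 ratio falls out of the replication factor; the 3-way replication and 30% recovery headroom below are illustrative assumptions:

def cluster_network_estimate(client_write_gbps, replicas=3, recovery_headroom=0.3):
    # Each byte a client writes over the public network is copied
    # (replicas - 1) more times across the cluster network; add some
    # headroom for re-balancing and re-mirroring traffic.
    replication_traffic = client_write_gbps * (replicas - 1)
    return replication_traffic * (1 + recovery_headroom)

print(cluster_network_estimate(10))  # -> 26.0 Gbit/s for 10 Gbit/s of incoming writes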
15
Networking (Public and Internal)
• Ethernet (1, 10, 40 GbE)
‒ Can easily be bonded for availability
‒ Use jumbo frames if you can
‒ Various bonding modes to try
• Infiniband
‒ High bandwidth, low latency
‒ Typically more expensive
‒ RDMA support
‒ Replacing Ethernet since the year of the Linux desktop
16
Storage Node
• CPU
‒ Number and speed of cores
‒ One or more sockets?
• Memory
• Storage controller
‒ Bandwidth, performance, cache size
• SSDs for OSD journal
‒ SSD to HDD ratio
• HDDs
‒ Count, capacity, performance
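A hypothetical sizing helper based on commonly cited rules of thumb (roughly one core and 1-2 GB of RAM per OSD, one journal SSD per four to six HDDs); the ratios are assumptions to be validated in a pilot, not figures from this deck:

def size_storage_node(hdd_count, ram_gb_per_osd=2, hdds_per_journal_ssd=5):
    # Rule-of-thumb sketch: one OSD per HDD, ~1 core and ~2 GB RAM per OSD,
    # one journal SSD per ~5 HDDs. All ratios are assumptions.
    return {
        'osds': hdd_count,
        'cpu_cores': hdd_count,
        'ram_gb': hdd_count * ram_gb_per_osd,
        'journal_ssds': -(-hdd_count // hdds_per_journal_ssd),  # ceiling division
    }

print(size_storage_node(12))
# {'osds': 12, 'cpu_cores': 12, 'ram_gb': 24, 'journal_ssds': 3}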
17
Adding More Nodes
• Capacity increases
• Total throughput
increases
• IOPS increase
• Redundancy increases
• Latency unchanged
• Eventually: network
topology limitations
• Temporary impact during
re-balancing
18
Adding More Disks to a Node
• Capacity increases
• Redundancy increases
• Throughput might
increase
• IOPS might increase
• Internal node bandwidth
is consumed
• Higher CPU and memory
load
• Cache contention
• Latency unchanged
19
OSD File System
• btrfs
‒ Typically better write
throughput performance
‒ Higher CPU utilization
‒ Feature rich
‒ Compression, checksums, copy
on write
‒ The choice for the future!
• XFS
‒ Good all around choice
‒ Very mature for data
partitions
‒ Typically lower CPU
utilization
‒ The choice for today!
20
Impact of Caches
• Cache on the client side
‒ Typically, biggest impact on performance
‒ Does not help with write performance
• Server OS cache
‒ Low impact: reads have already been cached on the client
‒ Still, helps with readahead
• Caching controller, battery backed:
‒ Significant benefit for writes
21
Impact of SSD Journals
• SSD journals accelerate bursts and random write IO
• For sustained writes that overflow the journal,
performance degrades to HDD levels
• SSDs help very little with read performance
• SSDs are very costly
‒ ... and consume storage slots -> lower density
• A large battery-backed cache on the storage controller
is highly recommended if not using SSD journals
22
Hard Disk Parameters
• Capacity matters
‒ Often, highest density is not
most cost effective
‒ On-disk cache matters less
• Reliability advantage of
Enterprise drives typically
marginal compared to
cost
‒ Buy more drives instead
‒ Consider validation matrices
for small/medium NAS
servers as a guide
• RPM:
‒ Increase IOPS & throughput
‒ Increases power
consumption
‒ 15k drives quite expensive
still
• SMR/Shingled drives
‒ Density at the cost of write
speed
23
Other parameters
• IO Schedulers
‒ Deadline, CFQ, noop
• Let’s not even talk about
the various mkfs options
• Encryption
24
Impact of Redundancy Choices
• Replication:
‒ n number of exact, full-size
copies
‒ Potentially increased read
performance due to striping
‒ More copies lower
throughput, increase latency
‒ Increased cluster network
utilization for writes
‒ Rebuilds can leverage
multiple sources
‒ Significant capacity impact
• Erasure coding:
‒ Data split into k parts plus m
redundancy codes
‒ Better space efficiency
‒ Higher CPU overhead
‒ Significant CPU and cluster
network impact, especially
during rebuild
‒ Cannot directly be used with
block devices (see next
slide)
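A small sketch of the capacity trade-off between the two columns, expressed as raw bytes stored per usable byte; the k=4, m=2 profile is only an example:

def replication_raw_per_usable(copies):
    # n-way replication stores n full copies of every byte
    return copies

def erasure_raw_per_usable(k, m):
    # k data chunks plus m coding chunks per object
    return (k + m) / k

print(replication_raw_per_usable(3))   # 3x raw capacity per usable byte
print(erasure_raw_per_usable(4, 2))    # 1.5x for a k=4, m=2 profile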
25
Cache Tiering
• Multi-tier storage architecture:
‒ Pool acts as a transparent write-back overlay for another
‒ e.g., SSD 3-way replication over HDDs with erasure coding
‒ Can flush either on relative or absolute dirty levels, or age
‒ Additional configuration complexity and requires workload-
specific tuning
‒ Also available: read-only mode (no write acceleration)
‒ Some downsides (no snapshots), memory consumption for
HitSet
• A good way to combine the advantages of replication
and erasure coding
26
Number of placement groups
● Number of hash buckets
per pool
● Data is chunked &
distributed across nodes
● Typically approx. 100 per
OSD/TB
• Too many:
‒ More peering
‒ More resources used
• Too few:
‒ Large amounts of data per
group
‒ More hotspots, less striping
‒ Slower recovery from failure
‒ Slower re-balancing
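A sketch of the widely used rule of thumb for choosing the PG count (about 100 PGs per OSD, divided by the pool's replica count, rounded up to a power of two); the helper name and the 40-OSD example are mine:

def pg_count(osd_count, pool_size=3, target_pgs_per_osd=100):
    # ~100 PGs per OSD, shared across the pool's replicas,
    # rounded up to the next power of two.
    raw = osd_count * target_pgs_per_osd / pool_size
    power = 1
    while power < raw:
        power *= 2
    return power

print(pg_count(40))   # 40 OSDs, 3-way replication -> 2048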
From components to system
28
More than the sum of its parts
• If this seems straightforward enough, it is because it
isn’t.
• The interactions are not trivial or humanly predictable.
• http://ceph.com/community/ceph-performance-part-2-write-throughput-without-ssd-journals/
29
Hiding in a corner, weeping:
When anecdotes are not enough
31
The data drought of big data
• Isolated benchmarks
• No standardized workloads
• Huge variety of reporting formats (not parsable)
• Published benchmarks usually skewed to highlight
superior performance of product X
• Not enough data points to make sense of the “why”
• Not enough data about the environment
• Rarely from production systems
32
Existing efforts
• ceph-brag
‒ Last update: ~1 year ago
• cephperf
‒ Talk proposal for Vancouver OpenStack Summit, presented at
Ceph Infernalis Developer Summit
‒ Mostly focused on object storage
33
Measuring Ceph performance
(you were in the previous session by Adolfo, right?)
• rados bench
‒ Measures backend performance of the RADOS store
• rados load-gen
‒ Generate configurable load on the cluster
• ceph tell osd.XX
• fio rbd backend
‒ Swiss army knife of IO benchmarking on Linux
‒ Can also compare in-kernel rbd with user-space librados
• rest-bench
‒ Measures S3/radosgw performance
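As a sketch of how such tools could be driven from a benchmark harness, a minimal wrapper around rados bench; it assumes a scratch pool named 'bench' already exists and uses the standard duration/mode/-t/-b arguments:

import subprocess

def rados_write_bench(pool='bench', seconds=30, threads=16, block_size=4194304):
    # Runs `rados bench ... write` against a scratch pool and returns the
    # raw report. --no-cleanup keeps the objects for a later seq/rand read run.
    cmd = ['rados', '-p', pool, 'bench', str(seconds), 'write',
           '-t', str(threads), '-b', str(block_size), '--no-cleanup']
    return subprocess.run(cmd, capture_output=True, text=True).stdout

print(rados_write_bench())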
34
Standard benchmark suite
• Gather all relevant and anonymized data in JSON
format
• Evaluate all components individually, as well as
combined
• Core and optional tests
‒ Must be fast enough to run on a (quiescent) production cluster
‒ Also means: non-destructive tests in core only
• Share data to build up a corpus on big data
performance
35
Block benchmarks – how?
• fio
‒ JSON output, bandwidth, latency
• Standard profiles
‒ Sequential and random IO
‒ 4k, 16k, 64k, 256k block sizes
‒ Read versus write
‒ Mixed profiles
‒ Don’t write zeros!
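A sketch that expands these profiles into fio command lines with JSON output and randomized buffers; the flags are standard fio options, but the target path is an assumption and should point at the device or file system under test. Pass each list to subprocess.run() (or a shell) to execute:

def fio_cmd(rw, bs, target='/var/lib/ceph/benchfile', runtime=60):
    # One fio job per (pattern, block size) combination.
    return ['fio', '--name=%s-%s' % (rw, bs),
            '--filename=%s' % target,
            '--rw=%s' % rw, '--bs=%s' % bs,
            '--size=4G', '--direct=1',
            '--runtime=%d' % runtime, '--time_based',
            '--refill_buffers',            # don't write zeros
            '--output-format=json']

for rw in ('read', 'write', 'randread', 'randwrite'):
    for bs in ('4k', '16k', '64k', '256k'):
        print(' '.join(fio_cmd(rw, bs)))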
36
Block benchmarks – where?
• Individual OSDs
• All OSDs in a node
‒ On top of OSD fs, and/or raw disk?
‒ Cache baseline/destructive tests?
• Journal disks
‒ Read first, write back, for non-destructive tests
• RBD performance: rbd.ko, user-space
• One client, multiple clients
• Identify similar machines and run a random selection
37
Network benchmarks
• Latency (various packet sizes): ping
• Bandwidth: iperf3
• Relevant links:
‒ OSD – OSD, OSD – Monitor
‒ OSD – Rados Gateway, OSD - iSCSI
‒ OSD – Client, Monitor – Client, Client – RADOS Gateway
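A sketch of a bandwidth sweep over those links using iperf3's JSON output (-J); the host names are placeholders, and an iperf3 server (-s) is assumed to be running on each target:

import json
import subprocess

# Placeholder inventory: map link type to the target host for that link.
links = {
    'osd-osd':    'osd02',
    'osd-mon':    'mon01',
    'client-osd': 'osd01',
}

def iperf3_gbps(server, seconds=10):
    # Run a TCP test against a remote `iperf3 -s` and report received Gbit/s.
    out = subprocess.run(['iperf3', '-c', server, '-J', '-t', str(seconds)],
                         capture_output=True, text=True).stdout
    data = json.loads(out)
    return data['end']['sum_received']['bits_per_second'] / 1e9

for name, host in links.items():
    print(name, round(iperf3_gbps(host), 2), 'Gbit/s')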
38
Other benchmark data
• CPU loads:
‒ Encryption
‒ Compression
• Memory access speeds
• Benchmark while re-balancing an OSD?
• CephFS (FUSE/cephfs.ko)
39
Environment description
• Hardware setup for all nodes
‒ Not always trivial to discover
• kernel, glibc, ceph versions
• Network layout metadata
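A small sketch that collects this metadata into JSON; ceph --version and ldd --version are standard commands, though their output formats vary by distribution:

import json
import platform
import subprocess

def run(cmd):
    return subprocess.run(cmd, capture_output=True, text=True).stdout.strip()

# Gather the environment description alongside each benchmark run.
env = {
    'hostname': platform.node(),
    'kernel':   platform.release(),
    'glibc':    run(['ldd', '--version']).splitlines()[0],
    'ceph':     run(['ceph', '--version']),
}
print(json.dumps(env, indent=2))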
40
Different Ceph configurations?
• Different node counts,
• replication sizes,
• erasure coding options,
• PG sizes,
• differently filled pools, ...
• Not necessarily non-destructive, but potentially highly
valuable.
41
Performance monitoring during runs
• Performance Co-Pilot?
• Gather network, CPU, IO, ... metrics
• While not fine-grained enough to debug everything,
helpful to spot anomalies and trends
‒ e.g., how did network utilization of the whole run change over
time? Is CPU maxed out on a MON for long?
• Also, pretty pictures (such as those missing in this
presentation)
• LTTng instrumentation?
42
Summary of goals
• Holistic exploration of Ceph cluster performance
• Answer those pesky customer questions
• Identify regressions and brag about improvements
• Use machine learning to spot correlations that human
brains can’t grasp
‒ Recurrent neural networks?
• Provide students with a data corpus to answer
questions we didn’t even think of yet
Thank you.
43
Questions and Answers?
Corporate Headquarters
Maxfeldstrasse 5
90409 Nuremberg
Germany
+49 911 740 53 0 (Worldwide)
www.suse.com
Join us on:
www.opensuse.org
44
Unpublished Work of SUSE LLC. All Rights Reserved.
This work is an unpublished work and contains confidential, proprietary and trade secret information of SUSE LLC.
Access to this work is restricted to SUSE employees who have a need to know to perform tasks within the scope of
their assignments. No part of this work may be practiced, performed, copied, distributed, revised, modified, translated,
abridged, condensed, expanded, collected, or adapted without the prior written consent of SUSE.
Any use or exploitation of this work without authorization could subject the perpetrator to criminal and civil liability.
General Disclaimer
This document is not to be construed as a promise by any participating company to develop, deliver, or market a
product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making
purchasing decisions. SUSE makes no representations or warranties with respect to the contents of this document,
and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The
development, release, and timing of features or functionality described for SUSE products remains at the sole
discretion of SUSE. Further, SUSE reserves the right to revise this document and to make changes to its content, at
any time, without obligation to notify any person or entity of such revisions or changes. All SUSE marks referenced in
this presentation are trademarks or registered trademarks of Novell, Inc. in the United States and other countries. All
third-party trademarks are the property of their respective owners.