MAKING DISTRIBUTED STORAGE EASY:
USABILITY IN CEPH LUMINOUS AND BEYOND
SAGE WEIL – RED HAT
2018.01.26
2
PLAN
●
Ceph
●
Luminous
●
Simplify
●
Automate
●
Manage
●
Mimic
3
CEPH IS...
●
Object, block, and file storage in a single cluster
●
All components scale horizontally
●
No single point of failure
●
Hardware agnostic, commodity hardware
●
Self-managing whenever possible
●
Free and open source software (LGPL)
4
CEPH IS HARD
5
WE MUST
MAKE IT EASY
6
LUMINOUS
7
●
RADOS
– BlueStore (a new OSD backend)
●
stable and default
●
full data checksums
●
compression
– Erasure coding overwrites
●
now usable by RBD, CephFS
– ceph-mgr
●
scalability
●
prometheus, zabbix, restful
●
new web dashboard
– AsyncMessenger by default
●
RGW (object)
– metadata search
– compression and encryption
– NFS gateway (v3 and v4)
●
RBD (block)
– HA iSCSI (finally!)
– async mirroring improvements
●
CephFS (file)
– multiple MDS daemons
– subtree pinning
– auto directory fragmentation
LUMINOUS GOODNESS
8
CEPH RELEASES
Jewel (LTS) – Spring 2016 – 10.2.z
Kraken – Fall 2016
Luminous – Summer 2017 – 12.2.z ← WE ARE HERE
Mimic – Spring 2018 – 13.2.z
Nautilus – Winter 2019 – 14.2.z
New release cadence
● Named release every 9 months
● Backports for 2 releases
● Upgrade up to 2 releases at a time (e.g., Luminous → Nautilus)
9
SIMPLIFY
10
CEPH -S (BEFORE)
cluster 8ba08162-a390-479c-a698-6d8911c4f451
health HEALTH_OK
monmap e2: 3 mons at
{a=172.21.9.34:6789/0,b=172.21.9.34:6790/0,c=172.21.9.34:6791/0}
election epoch 8, quorum 0,1,2 a,b,c
fsmap e15: 1/1/1 up {0=b=up:active}, 1 up:standby
mgr active: x
osdmap e21: 1 osds: 1 up, 1 in
pgmap v191: 26 pgs, 3 pools, 4145 kB data, 23 objects
494 GB used, 436 GB / 931 GB avail
26 active+clean
11
CEPH -S (AFTER)
cluster:
id: 0554f6f9-6061-425d-a343-f246020f1464
health: HEALTH_OK
services:
mon: 1 daemons, quorum a
mgr: x(active)
mds: cephfs_a-1/1/1 up {[cephfs_a:0]=b=up:active}, 1 up:standby
osd: 3 osds: 3 up, 3 in
data:
pools: 5 pools, 40 pgs
objects: 42 objects, 4492 bytes
usage: 1486 GB used, 1306 GB / 2793 GB avail
pgs: 40 active+clean
12
HEALTH WARNINGS
(before)
health HEALTH_WARN
4 pgs degraded
5 pgs peering
1 pgs recovering
3 pgs recovery_wait
recovery 609/5442 objects degraded (11.191%)
(after)
health: HEALTH_WARN
Degraded data redundancy: 959/4791 objects degraded (20.017%), 5 pgs degraded
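Not on the slide, but the same summary and the per-check codes behind it can be pulled straight from the CLI; a quick illustration using commands that exist in Luminous:
# one-line summary plus detail lines for each active health check (e.g., PG_DEGRADED)
ceph health detail
# the same health section in context of the whole cluster status
ceph -s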
13
CLUSTER LOG (BEFORE)
cluster [INF] osdmap e20: 4 osds: 4 up, 3 in
cluster [INF] pgmap v142: 24 pgs: 3 active+recovery_wait+degraded, 21 active+clean; 5674 kB data, 1647 GB used, 1146 GB / 2793 GB avail; 818 kB/s wr, 230 op/s; 516/2256 objects degraded (22.872%); 0 B/s, 7 keys/s, 1 objects/s recovering
cluster [INF] pgmap v143: 24 pgs: 8 active+recovery_wait+degraded, 16 active+clean; 7719 kB data, 1647 GB used, 1145 GB / 2793 GB avail; 1428 kB/s wr, 577 op/s; 1021/2901 objects degraded (35.195%); 321 kB/s, 65 keys/s, 76 objects/s recovering
cluster [INF] pgmap v144: 24 pgs: 8 active+recovery_wait+degraded, 16 active+clean; 7730 kB data, 1647 GB used, 1145 GB / 2793 GB avail; 1090 kB/s wr, 483 op/s; 1021/3006 objects degraded (33.965%); 244 kB/s, 49 keys/s, 58 objects/s recovering
cluster [INF] pgmap v145: 24 pgs: 8 active+recovery_wait+degraded, 16 active+clean; 7730 kB data, 1647 GB used, 1145 GB / 2793 GB avail; 905 kB/s wr, 401 op/s; 1021/3006 objects degraded (33.965%); 203 kB/s, 41 keys/s, 48 objects/s recovering
cluster [INF] pgmap v146: 24 pgs: 5 active+recovery_wait+degraded, 19 active+clean; 8083 kB data, 1647 GB used, 1145 GB / 2793 GB avail; 0 B/s rd, 959 kB/s wr, 494 op/s; 505/3711 objects degraded (13.608%); 1006 kB/s, 56 keys/s, 90 objects/s recovering
14
CLUSTER LOG (AFTER)
cluster [WRN] Health check failed: Degraded data redundancy: 959/4791 objects degraded (20.017%), 5 pgs degraded (PG_DEGRADED)
cluster [WRN] Health check update: Degraded data redundancy: 474/3399 objects degraded (13.945%), 3 pgs degraded (PG_DEGRADED)
cluster [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 474/3399 objects degraded (13.945%), 3 pgs degraded)
cluster [INF] Cluster is now healthy
15
CONFIGURATION
●
>1400 configuration options
●
minimal documentation
– handful on https://docs.ceph.com
– comments in config_opts.h (sometimes)
●
mix of
– user options
– developer constants
– debugging options to inject errors or debugging behavior
●
difficult to determine relevant set of current options
●
option schema (including docs) now embedded in code (options.cc)
– ceph daemon <name> config help <option>
– min/max, enum, or custom validators
●
option levels: basic, advanced, and dev
●
easy to identify changed options
– ceph daemon <name> config diff
●
configure cache sizes in bytes (not objects)
●
similar levels + descriptions for perf counters
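To make the two admin socket commands above concrete, a small illustration; osd.0 and osd_max_backfills are just placeholders for any daemon and any option:
# show the schema, description, default, and level of a single option
ceph daemon osd.0 config help osd_max_backfills
# show only the options that differ from their defaults on this daemon
ceph daemon osd.0 config diff
# dump the full current configuration of the daemon
ceph daemon osd.0 config show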
16
●
ceph.conf management tedious
and error-prone
– tooling needed to manage at
scale (puppet, chef, etc.)
●
nobody likes ini files any more
●
config stored on monitors
●
new ‘ceph config …’ CLI
●
prevent setting bogus values
●
config changes at runtime
●
“what is option X on daemon Y?”
●
‘assimilate-conf’ to import
existing config files
●
ceph.conf only (maybe) required
for bootstrap
– must identify monitor IPs
– DNS SRV records can also do
that
CENTRAL CONFIG (COMING IN MIMIC)
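A sketch of how the central config CLI is expected to look; since this lands in Mimic (after this talk), the exact syntax could still shift, and the option names below are only examples:
# import an existing ceph.conf into the monitors' configuration database
ceph config assimilate-conf -i /etc/ceph/ceph.conf
# set an option globally, per daemon type, or per daemon
ceph config set global osd_pool_default_size 3
ceph config set osd.3 debug_osd 10
# inspect what is stored centrally / answer "what is option X on daemon Y?"
ceph config dump
ceph config get osd.3 debug_osd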
17
●
cephx capabilities powerful but unfriendly
– users must search docs for cap strings to copy/paste/modify
●
ceph auth add client.foo mon ‘profile rbd’ osd ‘profile rbd’ ...
●
ceph fs authorize <fsname> <entity/user> [rwp]
– automatically applies to any data pools associated (now or later) with the
file system
SIMPLIFY AUTH[NZ] SETUP
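A concrete (illustrative) use of both commands; the client names, pool, and file system name are placeholders:
# block-storage user whose caps come from the rbd profile, limited to one pool
ceph auth get-or-create client.rbd-user mon 'profile rbd' osd 'profile rbd pool=vms'
# CephFS user with read/write access to the root of file system "cephfs_a";
# caps for the file system's data pools are generated automatically
ceph fs authorize cephfs_a client.fs-user / rw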
18
UPGRADES
$ ceph versions
{
"mon": {
"ceph version 12.2.2": 3
},
"mgr": {
"ceph version 12.2.2": 2
},
"osd": {
"ceph version 12.2.2": 7,
"ceph version 12.2.1": 1
},
"mds": {},
"overall": {
"ceph version 12.2.2": 12,
"ceph version 12.2.1": 1
}
}
19
CLIENT COMPATIBILITY
$ ceph features
{
...
"client": [
"group": {
"features": "0x107b84a842aca",
"release": "hammer",
"num": 3
},
"group": {
"features": "0x40107b86a842ada",
"release": "jewel",
"num": 1
},
"group": {
"features": "0x1ffddff8eea4fffb",
"release": "luminous",
"num": 5
}
]
}
●
CRUSH tunables and other new/optional features affect client compat
– often without the admin realizing it
●
new command declares compatibility
– ceph osd set-require-min-compat-client <release>
– prevent settings that break compat
promise
– cannot change compat promise if
current settings do not allow it
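For example (not from the slide), declaring a Luminous-only client population is also what unlocks the pg-upmap based balancing discussed later:
# promise that clients older than luminous no longer need to be supported
ceph osd set-require-min-compat-client luminous
# luminous-only features can now be used, e.g. the upmap balancer mode
ceph balancer mode upmap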
20
AUTOMATE
21
EASY STUFF
●
MTU-sized ping messages between OSDs
– identify network/switch issues early
●
disable auto-out on small clusters
●
different (and sane) default values for HDDs and SSDs
●
ceph-volume replacement for ceph-disk (example below)
– adds support for dm-cache, (soon) VDO
– LVM-based instead of GPT+udev based (reliable)
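A minimal ceph-volume invocation as a point of reference; /dev/sdb is a placeholder for a blank device, and real deployments will add flags (separate DB/WAL devices, dmcrypt, etc.):
# create an LVM-backed BlueStore OSD on a raw device and activate it
ceph-volume lvm create --bluestore --data /dev/sdb
# list the OSDs ceph-volume knows about on this host
ceph-volume lvm list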
22
CEPH-MGR – WHAT
●
a new core RADOS component
– sibling of ceph-mon, ceph-osd
– written in C++ to communicate natively (efficiently) with cluster
●
mandatory
– failed mgr affects reporting, introspection, APIs
– does not affect the data path
●
hosts python modules that implement monitoring/management
●
initially added in Kraken, mandatory in Luminous
23
CEPH-MGR – WHY
●
ceph-mon not a good home for high-level management
– mon stability is very important – no sloppy 3rd party code
– mon performance is important – minimize footprint, maximize scalability
– mon’s state view is synchronous, expensive
●
ceph-mgr has fast, async view of cluster state
– lightweight and efficient
– sufficient for introspection and management
●
ceph-mon shrinks
– drops stats responsibility
– demonstrated scale of >10k OSDs (~40PB)
24
CEPH-MGR ARCHITECTURE
(architecture diagram: the other daemons (OSD, MDS, RGW, etc) send status + stats to the mgr; the mgr reads the latest state from the mons and relies on them to persist health, stats, config, etc; it keeps a local copy of the cluster maps plus a table of the latest daemon stats, and hosts Python modules – dashboard, restful, prometheus, influx, status – consumed by a browser, scripts, prometheus/InfluxDB, and Ceph CLI commands)
25
MODULES ARE EASY AND ROBUST
●
easy
– trivially implement new CLI commands (e.g., status)
– expose cluster state (e.g., prometheus, influx, zabbix)
●
a few 100s of lines of code each
●
robust
– control the cluster (e.g., the restful module implements a full REST API using CherryPy)
– dashboard module is a full web-based GUI
●
Ceph handles the details
– HA, failover, plumbing for cluster state, management hooks, …
– modules ship with ceph itself
– ‘ceph mgr module enable <name>’ to enable
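Enabling and inspecting modules from the CLI; the module names here are just the in-tree examples mentioned above:
# list enabled and available mgr modules
ceph mgr module ls
# turn on the prometheus exporter and the web dashboard
ceph mgr module enable prometheus
ceph mgr module enable dashboard
# show the HTTP endpoints the active modules are serving
ceph mgr services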
26
BALANCER
●
correct for normal variance in
pseudorandom data placement
●
builds and evaluates statistical model of
(current or proposed) PG distribution
●
automatically optimizes placement to
minimize variance in OSD utilization
– adjusts hidden CRUSH weights (backward
compatible) or pg-upmap (luminous+)
– throttles itself to avoid too much data
movement at once
●
‘ceph balancer on’
– commands to evaluate and apply optimization plans manually if automated operation is not trusted (see the example below)
(chart: OSD disk utilization)
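The manual workflow hinted at above looks roughly like this in Luminous (the plan name is arbitrary):
# enable the module, pick a method (upmap needs luminous-only clients)
ceph mgr module enable balancer
ceph balancer mode upmap
# score the current distribution, build a plan, inspect and score it, then apply
ceph balancer eval
ceph balancer optimize myplan
ceph balancer show myplan
ceph balancer eval myplan
ceph balancer execute myplan
# or simply let it run continuously
ceph balancer on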
27
PG_NUM (SHARDING)
●
pg_num controls the shard count for pools
– necessary for good performance
– (used to be) necessary for balanced data distribution
– affects resource utilization – many users end up with too many
– implications for data reliability too
●
picking pg_num for pools is “black magic”
– not easy to provide generically applicable guidance
– web-based tool helps, but...
●
high stakes
– resharding moves data around
– can only be adjusted up
●
This should not be something the typical operator is thinking about! (see the worked example below)
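For context, the "black magic" in question is roughly the rule of thumb behind the community pgcalc tool (cited from memory, so treat the constants as approximate); mypool is a placeholder:
# aim for ~100 PGs per OSD, divided by the replica count, rounded to a power of two
#   e.g. 40 OSDs, 3x replication:  40 * 100 / 3 ≈ 1333  →  pg_num = 1024
ceph osd pool create mypool 1024 1024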
28
MAKING PG_NUM A NON-ISSUE (MIMIC?)
●
RADOS work in progress to allow PG merging
– once pg_num can scale both up and down, most of the risk of automation
goes away
●
plan a mgr module to automatically adjust pg_num
– utilization of pool (actual # of objects or bytes)
– user intent (allow means for the user to hint how much of the cluster the pool or use case is expected to consume)
●
automated but conservative adjustments
– throttle changes, just like the balancer module
29
SERVICEMAP
●
generic facility for daemons to
register with cluster
– metadata (immutable)
●
host, version, etc.
– status (mutable)
●
current task, progress, etc.
●
in-tree users
– radosgw
– rbd-mirror daemon
●
visibility in ‘ceph -s’
●
will enable better insight into rgw
multisite sync, rbd mirroring...
cluster:
id: 0554f6f9-6061-425d-a343-f246020f1464
health: HEALTH_OK
services:
mon: 1 daemons, quorum a
mgr: x(active)
osd: 3 osds: 3 up, 3 in
rgw: 1 daemon active
data:
pools: 5 pools, 40 pgs
objects: 42 objects, 4492 bytes
usage: 1486 GB used, 1306 GB / 2793 GB avail
pgs: 40 active+clean
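These registrations are also visible outside ‘ceph -s’; two commands (present in Luminous) dump the service map directly:
# brief per-service summary
ceph service status
# full map: per-daemon metadata (host, version, ...) and mutable status
ceph service dump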
30
MANAGE
31
DASHBOARD
●
web-based UI for managing ceph
– ‘ceph mgr module enable dashboard’
– luminous version is read-only, no authentication
●
front page similar to ‘ceph -s’ and ‘ceph -w’
●
RBD
– show pools, images
– mirror daemons and mirroring status
●
RGW
– zonegroups, zones, daemons
●
CephFS
– file systems, clients, metadata ops sparklines, etc.
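Turning it on, assuming defaults; in Luminous the dashboard is served by the active mgr (typically on port 7000), and the URL can be read back rather than guessed:
# enable the module on the mgr
ceph mgr module enable dashboard
# list the endpoints published by active modules, including the dashboard URL
ceph mgr services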
32
33
DASHBOARD
●
designed for simplicity
– rivets framework – low barrier to entry for contributors
– shake out internal interfaces to ensure cluster state can be meaningfully surfaced in a UI
●
example: pool tags
– RADOS metadata associated with pools to identify application etc.
– allows dashboard to identify which pools to present on RBD panel, etc.
– will allow CLI and other tools to prevent user mistakes (e.g., reusing RBD
pool for CephFS)
●
out of tree management implementations awkward
– separate tree; overhead of maintaining stable APIs
– deployment complexity (dependencies, HA, etc.)
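Pool tags are managed with the pool "application" commands added in Luminous; the pool names below are placeholders:
# tag pools with the application that uses them
ceph osd pool application enable vms rbd
ceph osd pool application enable cephfs_data cephfs
# read a pool's tags back (this is what lets the dashboard pick its RBD pools)
ceph osd pool application get vms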
34
OPENATTIC → DASHBOARD V2
●
openATTIC is SUSE’s external ceph management tool
– featured, robust, out-of-tree
●
consensus around developing full-featured, in-tree dashboard v2
– cluster management operations (creating pools, file systems, configuring
cluster, etc.)
– embedding rich Grafana metrics dashboards (a la openATTIC, ceph-metrics)
– deployment tasks (expanding cluster, managing OSDs and other ceph
daemons)
●
initial work porting dashboard to angular2 up for review on github
●
openATTIC team porting their backend API to ceph-mgr
●
will become default as soon as superset of functionality is covered
35
PROVISIONING AND DEPLOYMENT
●
dashboard v2 will include ability to orchestrate ceph itself
– in kubernetes/openshift environments, provision OSDs, replace OSDs, etc.
– some subset of functionality on bare metal deployments
●
common tasks
– expanding cluster to a new host or to new storage devices
– replacing/reprovisioning failed OSDs
36
DEPLOYMENT TOOLS
●
traditional ceph-deploy tool is very basic, limited
●
ceph-ansible (Red Hat)
– ansible-based
●
DeepSea (SUSE)
– salt-based
●
(also puppet, chef, ...)
37
WHAT ABOUT CONTAINERS?
●
ceph-ansible has basic container support
– run daemons via docker…
●
(most) people really want a container
orchestrator (e.g., kubernetes)
– stateful services (e.g., OSDs) are super
annoying
– Ceph has lots of stateless services (radosgw,
ceph-mds, rbd-mirror, ceph-mgr. Also
ganesha, samba, …)
●
real value for small, hyperconverged clusters
●
container orchestrators as the new
distributed OS
38
ROOK
●
Kubernetes operator for ceph started by Quantum
– uses native kubernetes interfaces
– deploy ceph clusters
– provision ceph storage (object, block, file)
●
Smart enough to manage ceph daemons properly
– don’t stop/remove mon containers if it breaks quorum
– follow proper upgrade procedure for luminous → mimic
●
Makes Ceph “easy” (for Kubernetes users)
– control storage with kubernetes CRDs
●
Plan to make Rook the recommended/default choice for ceph in
kubernetes
– dashboard will call out to kubernetes/rook to manage cluster
daemons
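A rough sketch of the Rook workflow as it looked in early 2018; the manifest file names are illustrative placeholders (in the style of Rook's examples, not verbatim) and the CRD API was still alpha at the time:
# deploy the rook operator, then declare a ceph cluster via a CRD manifest
kubectl create -f rook-operator.yaml
kubectl create -f rook-cluster.yaml
# watch the operator bring up mons, mgr, and OSDs
kubectl -n rook get pods -w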
39
MIMIC
40
MANAGEMENT
CONTAINERS
PERFORMANCE
41
UX
●
central config management
●
slick deployment in Kubernetes
with Rook
●
vastly improved dashboard
based on ceph-mgr and
openATTIC
– storage management and
cluster management
●
progress bars for recovery etc.
●
PG merging (maybe)
Other
●
QoS beta (RBD)
●
CephFS snapshots
●
cluster-managed NFS CephFS gateways
●
Lots of performance work
– new RGW frontend
– OSD refactoring for ongoing
optimizations for flash
– Seastar, DPDK, SPDK
COMING IN MIMIC
42
●
UX feedback wanted!
●
Mailing list and IRC
– http://ceph.com/IRC
●
Github
– https://github.com/ceph/
●
Ceph Developer Monthly
– first Weds of every month
– video conference (Bluejeans)
– alternating APAC- and EMEA-
friendly times
●
Ceph Days
– http://ceph.com/cephdays/
●
Meetups
– http://ceph.com/meetups
●
Ceph Tech Talks
– http://ceph.com/ceph-tech-talks/
●
‘Ceph’ Youtube channel
– (google it)
●
Twitter
– @ceph
GET INVOLVED
CEPHALOCON APAC
2018.03.22 and 23
BEIJING, CHINA
https://ceph.com/cephalocon
44
Sage Weil
Ceph Project Lead / Red Hat
sage@redhat.com
@liewegas
THANK YOU
●
Free and open source scalable
distributed storage
●
Minimal IT staff training!