SlideShare une entreprise Scribd logo
1  sur  30
Future of CephFS
Sage Weil
APP
APP

LIBRADOS
LIBRADOS

APP
APP

RADOSGW
RADOSGW

AA bucket-based
bucket-based
AA library allowing REST gateway,
library allowing
REST gateway,
apps to directly
compatible with S3
apps to directly
compatible with S3
access RADOS,
access RADOS, and Swift
and Swift
with support for
with support for
C, C++, Java,
C, C++, Java,
Python, Ruby,
Python, Ruby,
and PHP
and PHP

HOST/VM
HOST/VM

CLIENT
CLIENT

RBD
RBD

CEPH FS

AA reliable and fullyreliable and fullydistributed block
distributed block
device, with a a Linux
device, with Linux
kernel client and a a
kernel client and
QEMU/KVM driver
QEMU/KVM driver

A POSIX-compliant
distributed file system,
with a Linux kernel
client and support for
FUSE

RADOS
RADOS
AA reliable, autonomous, distributed object store comprised of self-healing, self-managing,
reliable, autonomous, distributed object store comprised of self-healing, self-managing,
intelligent storage nodes
intelligent storage nodes
CLIENT
CLIENT

metadata

01
01
10
10

data

M
M
M
M

M
M
M
M
M
M

M
M
Metadata Server
• Manages metadata for a
POSIX-compliant shared
filesystem
• Directory hierarchy
• File metadata (owner,
timestamps, mode, etc.)

• Stores metadata in RADOS
• Does not serve file data to
clients
• Only required for shared
filesystem
legacy metadata storage
●

a scaling disaster
●

name → inode → block list →
data

●

no inode table locality

●

fragmentation
–

inode table

–

directory

●

many seeks

●

difficult to partition

etc
home
usr
var
vmlinuz
…

hosts
mtab
passwd
…

bin
include
lib
…
ceph fs metadata storage
●

●

100
1
etc
home
usr
var
vmlinuz
…

hosts
mtab
passwd
…

block lists unnecessary
inode table mostly useless
●

102
bin
include
lib
…

●

●

APIs are path-based, not
inode-based
no random table access,
sloppy caching

embed inodes inside
directories
●

good locality, prefetching

●

leverage key/value object
controlling metadata io
●

view ceph-mds as cache
●

reduce reads
–

●

reduce writes
–

●

dir+inode prefetching
journal

consolidate multiple writes

large journal or log
●

stripe over objects

●

two tiers
–
–

●

journal for short term
per-directory for long term

fast failure recovery

directories
one tree

three metadata servers

??
load distribution
●

coarse (static subtree)
●

●

high management overhead

fine (hash)
●

always balanced

●

less vulnerable to hot spots

●

●

static subtree

preserve locality

●

good locality

destroy hierarchy, locality

can a dynamic approach
capture benefits of both
extremes?

hash directories

good balance

hash files
DYNAMIC SUBTREE PARTITIONING
dynamic subtree partitioning
●

scalable
●

●

arbitrarily partition
metadata

adaptive
●

●

●

move work from busy to
idle servers
replicate hot metadata

efficient
●

●

hierarchical partition
preserve locality

dynamic
●

daemons can join/leave

●

take over for failed nodes
Dynamic partitioning
many directories

same directory
Failure recovery
Metadata replication and availability
Metadata cluster scaling
client protocol
●

highly stateful
●

●

consistent, fine-grained caching

seamless hand-off between ceph-mds daemons
●

●

●

when client traverses hierarchy
when metadata is migrated between servers

direct access to OSDs for file I/O
an example
●

mount -t ceph 1.2.3.4:/ /mnt
●

●

●

3 ceph-mon RT
2 ceph-mds RT (1 ceph-mds to -osd RT)

2 ceph-mds RT (2 ceph-mds to -osd RT)

ls -al
●

open

●

readdir
–

1 ceph-mds RT (1 ceph-mds to -osd RT)

●

stat each file

●

●

ceph-osd

cd /mnt/foo/bar
●

●

ceph-mon

close

cp * /tmp
●

N ceph-osd RT

ceph-mds
recursive accounting
●

ceph-mds tracks recursive directory stats
●

file sizes

●

file and directory counts

●

modification time

●

virtual xattrs present full stats

●

efficient
$ ls ­alSh | head
total 0
drwxr­xr­x 1 root            root      9.7T 2011­02­04 15:51 .
drwxr­xr­x 1 root            root      9.7T 2010­12­16 15:06 ..
drwxr­xr­x 1 pomceph         pg4194980 9.6T 2011­02­24 08:25 pomceph
drwxr­xr­x 1 mcg_test1       pg2419992  23G 2011­02­02 08:57 mcg_test1
drwx­­x­­­ 1 luko            adm        19G 2011­01­21 12:17 luko
drwx­­x­­­ 1 eest            adm        14G 2011­02­04 16:29 eest
drwxr­xr­x 1 mcg_test2       pg2419992 3.0G 2011­02­02 09:34 mcg_test2
drwx­­x­­­ 1 fuzyceph        adm       1.5G 2011­01­18 10:46 fuzyceph
drwxr­xr­x 1 dallasceph      pg275     596M 2011­01­14 10:06 dallasceph
snapshots
●

volume or subvolume snapshots unusable at petabyte
scale
●

●

snapshot arbitrary subdirectories

simple interface
●

hidden '.snap' directory

●

no special tools

$ mkdir foo/.snap/one
$ ls foo/.snap
one
$ ls foo/bar/.snap
_one_1099511627776
$ rm foo/myfile
$ ls -F foo
bar/
$ ls -F foo/.snap/one
myfile bar/
$ rmdir foo/.snap/one

# create snapshot

# parent's snap name is mangled

# remove snapshot
multiple client implementations
●

Linux kernel client
●

●

mount -t ceph 1.2.3.4:/
/mnt
export (NFS), Samba (CIFS)

●

ceph-fuse

●

libcephfs.so
●

your app

●

Ganesha (NFS)

●

Hadoop (map/reduce)

Ganesha
libcephfs

Samba
libcephfs

Hadoop
libcephfs

your app
libcephfs

Samba (CIFS)

●

SMB/CIFS

NFS

ceph

ceph-fuse
fuse
kernel
APP
APP

LIBRADOS
LIBRADOS

APP
APP

HOST/VM
HOST/VM

RADOSGW
RADOSGW

RBD
RBD

AA bucket-based
bucket-based
AA library allowing REST gateway,
library allowing
REST gateway,
apps to directly
compatible with S3
apps to directly
compatible with S3
access RADOS,
access RADOS, and Swift
and Swift
with support for
with support for
C, C++, Java,
C, C++, Java,
Python, Ruby,
Python, Ruby,
AWESOME
and PHP
and PHP

CEPH FS
CEPH FS

AA reliable and fullyreliable and fullydistributed block
distributed block
device, with a a Linux
device, with Linux
kernel client and a a
kernel client and
QEMU/KVM driver
QEMU/KVM driver

AA POSIX-compliant
POSIX-compliant
distributed file system,
distributed file system,
with a a Linux kernel
with Linux kernel
client and support for
client and support for
FUSE
FUSE

AWESOME

AWESOME
RADOS
RADOS

CLIENT
CLIENT

NEARLY
AWESOME

AWESOME

AA reliable, autonomous, distributed object store comprised of self-healing, self-managing,
reliable, autonomous, distributed object store comprised of self-healing, self-managing,
intelligent storage nodes
intelligent storage nodes
Path forward
●

Testing
●

●

●

Various workloads
Multiple active MDSs

Test automation
●

●

●

Simple workload generator scripts
Bug reproducers

Hacking
●

●

●

Bug squashing
Long-tail features

Integrations
●

Ganesha, Samba, *stacks
hard links?
●

rare

●

useful locality properties
●

●

●

intra-directory
parallel inter-directory

on miss, file objects provide per-file
backpointers
●

degenerates to log(n) lookups

●

optimistic read complexity
what is journaled
●

lots of state
●

●

●

journaling is expensive up-front, cheap to recover
non-journaled state is cheap, but complex (and somewhat
expensive) to recover

yes
●

●

●

client sessions
actual fs metadata modifications

no
●

●

●

cache provenance
open files

lazy flush
●

client modifications may not be durable until fsync() or visible by
another client

Contenu connexe

Tendances

What is a Ceph (and why do I care). OpenStack storage - Colorado OpenStack Me...
What is a Ceph (and why do I care). OpenStack storage - Colorado OpenStack Me...What is a Ceph (and why do I care). OpenStack storage - Colorado OpenStack Me...
What is a Ceph (and why do I care). OpenStack storage - Colorado OpenStack Me...Ian Colle
 
Brief introduction to Kafka Streaming Platform
Brief introduction to Kafka Streaming PlatformBrief introduction to Kafka Streaming Platform
Brief introduction to Kafka Streaming PlatformJean-Paul Azar
 
CEPH DAY BERLIN - DEPLOYING CEPH IN KUBERNETES WITH ROOK
CEPH DAY BERLIN - DEPLOYING CEPH IN KUBERNETES WITH ROOKCEPH DAY BERLIN - DEPLOYING CEPH IN KUBERNETES WITH ROOK
CEPH DAY BERLIN - DEPLOYING CEPH IN KUBERNETES WITH ROOKCeph Community
 
Ceph as storage for CloudStack
Ceph as storage for CloudStack Ceph as storage for CloudStack
Ceph as storage for CloudStack Ceph Community
 
Kafka Intro With Simple Java Producer Consumers
Kafka Intro With Simple Java Producer ConsumersKafka Intro With Simple Java Producer Consumers
Kafka Intro With Simple Java Producer ConsumersJean-Paul Azar
 
Ceph Introduction 2017
Ceph Introduction 2017  Ceph Introduction 2017
Ceph Introduction 2017 Karan Singh
 
Ceph Day London 2014 - The current state of CephFS development
Ceph Day London 2014 - The current state of CephFS development Ceph Day London 2014 - The current state of CephFS development
Ceph Day London 2014 - The current state of CephFS development Ceph Community
 
Kafka Tutorial, Kafka ecosystem with clustering examples
Kafka Tutorial, Kafka ecosystem with clustering examplesKafka Tutorial, Kafka ecosystem with clustering examples
Kafka Tutorial, Kafka ecosystem with clustering examplesJean-Paul Azar
 
BlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephBlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephSage Weil
 
9 DevOps Tips for Going in Production with Galera Cluster for MySQL - Slides
9 DevOps Tips for Going in Production with Galera Cluster for MySQL - Slides9 DevOps Tips for Going in Production with Galera Cluster for MySQL - Slides
9 DevOps Tips for Going in Production with Galera Cluster for MySQL - SlidesSeveralnines
 
Red Hat Ceph Storage Roadmap: January 2016
Red Hat Ceph Storage Roadmap: January 2016Red Hat Ceph Storage Roadmap: January 2016
Red Hat Ceph Storage Roadmap: January 2016Red_Hat_Storage
 
Distributed Storage and Compute With Ceph's librados (Vault 2015)
Distributed Storage and Compute With Ceph's librados (Vault 2015)Distributed Storage and Compute With Ceph's librados (Vault 2015)
Distributed Storage and Compute With Ceph's librados (Vault 2015)Sage Weil
 
Ceph and OpenStack - Feb 2014
Ceph and OpenStack - Feb 2014Ceph and OpenStack - Feb 2014
Ceph and OpenStack - Feb 2014Ian Colle
 
Avro Tutorial - Records with Schema for Kafka and Hadoop
Avro Tutorial - Records with Schema for Kafka and HadoopAvro Tutorial - Records with Schema for Kafka and Hadoop
Avro Tutorial - Records with Schema for Kafka and HadoopJean-Paul Azar
 
How is Kafka so Fast?
How is Kafka so Fast?How is Kafka so Fast?
How is Kafka so Fast?Ricardo Paiva
 
Kafka Tutorial - introduction to the Kafka streaming platform
Kafka Tutorial - introduction to the Kafka streaming platformKafka Tutorial - introduction to the Kafka streaming platform
Kafka Tutorial - introduction to the Kafka streaming platformJean-Paul Azar
 
Ceph Client librbd Performance Analysis and Learnings - Mahati Chamarthy
Ceph Client librbd Performance Analysis and Learnings - Mahati ChamarthyCeph Client librbd Performance Analysis and Learnings - Mahati Chamarthy
Ceph Client librbd Performance Analysis and Learnings - Mahati ChamarthyCeph Community
 
Kafka Tutorial - DevOps, Admin and Ops
Kafka Tutorial - DevOps, Admin and OpsKafka Tutorial - DevOps, Admin and Ops
Kafka Tutorial - DevOps, Admin and OpsJean-Paul Azar
 
How Criteo is managing one of the largest Kafka Infrastructure in Europe
How Criteo is managing one of the largest Kafka Infrastructure in EuropeHow Criteo is managing one of the largest Kafka Infrastructure in Europe
How Criteo is managing one of the largest Kafka Infrastructure in EuropeRicardo Paiva
 

Tendances (20)

What is a Ceph (and why do I care). OpenStack storage - Colorado OpenStack Me...
What is a Ceph (and why do I care). OpenStack storage - Colorado OpenStack Me...What is a Ceph (and why do I care). OpenStack storage - Colorado OpenStack Me...
What is a Ceph (and why do I care). OpenStack storage - Colorado OpenStack Me...
 
Brief introduction to Kafka Streaming Platform
Brief introduction to Kafka Streaming PlatformBrief introduction to Kafka Streaming Platform
Brief introduction to Kafka Streaming Platform
 
CEPH DAY BERLIN - DEPLOYING CEPH IN KUBERNETES WITH ROOK
CEPH DAY BERLIN - DEPLOYING CEPH IN KUBERNETES WITH ROOKCEPH DAY BERLIN - DEPLOYING CEPH IN KUBERNETES WITH ROOK
CEPH DAY BERLIN - DEPLOYING CEPH IN KUBERNETES WITH ROOK
 
Ceph as storage for CloudStack
Ceph as storage for CloudStack Ceph as storage for CloudStack
Ceph as storage for CloudStack
 
Kafka Intro With Simple Java Producer Consumers
Kafka Intro With Simple Java Producer ConsumersKafka Intro With Simple Java Producer Consumers
Kafka Intro With Simple Java Producer Consumers
 
Ceph Introduction 2017
Ceph Introduction 2017  Ceph Introduction 2017
Ceph Introduction 2017
 
Ceph Day London 2014 - The current state of CephFS development
Ceph Day London 2014 - The current state of CephFS development Ceph Day London 2014 - The current state of CephFS development
Ceph Day London 2014 - The current state of CephFS development
 
Kafka Tutorial, Kafka ecosystem with clustering examples
Kafka Tutorial, Kafka ecosystem with clustering examplesKafka Tutorial, Kafka ecosystem with clustering examples
Kafka Tutorial, Kafka ecosystem with clustering examples
 
BlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephBlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for Ceph
 
CephFS Update
CephFS UpdateCephFS Update
CephFS Update
 
9 DevOps Tips for Going in Production with Galera Cluster for MySQL - Slides
9 DevOps Tips for Going in Production with Galera Cluster for MySQL - Slides9 DevOps Tips for Going in Production with Galera Cluster for MySQL - Slides
9 DevOps Tips for Going in Production with Galera Cluster for MySQL - Slides
 
Red Hat Ceph Storage Roadmap: January 2016
Red Hat Ceph Storage Roadmap: January 2016Red Hat Ceph Storage Roadmap: January 2016
Red Hat Ceph Storage Roadmap: January 2016
 
Distributed Storage and Compute With Ceph's librados (Vault 2015)
Distributed Storage and Compute With Ceph's librados (Vault 2015)Distributed Storage and Compute With Ceph's librados (Vault 2015)
Distributed Storage and Compute With Ceph's librados (Vault 2015)
 
Ceph and OpenStack - Feb 2014
Ceph and OpenStack - Feb 2014Ceph and OpenStack - Feb 2014
Ceph and OpenStack - Feb 2014
 
Avro Tutorial - Records with Schema for Kafka and Hadoop
Avro Tutorial - Records with Schema for Kafka and HadoopAvro Tutorial - Records with Schema for Kafka and Hadoop
Avro Tutorial - Records with Schema for Kafka and Hadoop
 
How is Kafka so Fast?
How is Kafka so Fast?How is Kafka so Fast?
How is Kafka so Fast?
 
Kafka Tutorial - introduction to the Kafka streaming platform
Kafka Tutorial - introduction to the Kafka streaming platformKafka Tutorial - introduction to the Kafka streaming platform
Kafka Tutorial - introduction to the Kafka streaming platform
 
Ceph Client librbd Performance Analysis and Learnings - Mahati Chamarthy
Ceph Client librbd Performance Analysis and Learnings - Mahati ChamarthyCeph Client librbd Performance Analysis and Learnings - Mahati Chamarthy
Ceph Client librbd Performance Analysis and Learnings - Mahati Chamarthy
 
Kafka Tutorial - DevOps, Admin and Ops
Kafka Tutorial - DevOps, Admin and OpsKafka Tutorial - DevOps, Admin and Ops
Kafka Tutorial - DevOps, Admin and Ops
 
How Criteo is managing one of the largest Kafka Infrastructure in Europe
How Criteo is managing one of the largest Kafka Infrastructure in EuropeHow Criteo is managing one of the largest Kafka Infrastructure in Europe
How Criteo is managing one of the largest Kafka Infrastructure in Europe
 

En vedette

London Ceph Day: Erasure Coding: Purpose and Progress
London Ceph Day: Erasure Coding: Purpose and Progress London Ceph Day: Erasure Coding: Purpose and Progress
London Ceph Day: Erasure Coding: Purpose and Progress Ceph Community
 
London Ceph Day Keynote: Building Tomorrow's Ceph
London Ceph Day Keynote: Building Tomorrow's Ceph London Ceph Day Keynote: Building Tomorrow's Ceph
London Ceph Day Keynote: Building Tomorrow's Ceph Ceph Community
 
London Ceph Day: Deploying Ceph and OpenStack with Juju
London Ceph Day: Deploying Ceph and OpenStack with JujuLondon Ceph Day: Deploying Ceph and OpenStack with Juju
London Ceph Day: Deploying Ceph and OpenStack with JujuCeph Community
 
London Ceph Day: Ceph at CERN
London Ceph Day: Ceph at CERNLondon Ceph Day: Ceph at CERN
London Ceph Day: Ceph at CERNCeph Community
 
London Ceph Day: Unified Cloud Storage with Synnefo + Ceph + Ganeti
London Ceph Day: Unified Cloud Storage with Synnefo + Ceph + GanetiLondon Ceph Day: Unified Cloud Storage with Synnefo + Ceph + Ganeti
London Ceph Day: Unified Cloud Storage with Synnefo + Ceph + GanetiCeph Community
 
London Ceph Day: Ceph for SMBs: Are we there yet?
London Ceph Day: Ceph for SMBs: Are we there yet? London Ceph Day: Ceph for SMBs: Are we there yet?
London Ceph Day: Ceph for SMBs: Are we there yet? Ceph Community
 
London Ceph Day: Ceph in the Echosystem
London Ceph Day: Ceph in the EchosystemLondon Ceph Day: Ceph in the Echosystem
London Ceph Day: Ceph in the EchosystemCeph Community
 
Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising perf...
Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising perf...Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising perf...
Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising perf...Ceph Community
 
DreamObjects - Ceph Day Nov 2012
DreamObjects - Ceph Day Nov 2012DreamObjects - Ceph Day Nov 2012
DreamObjects - Ceph Day Nov 2012Ceph Community
 
Ceph Day New York 2014: Ceph Ecosystem Update
Ceph Day New York 2014: Ceph Ecosystem UpdateCeph Day New York 2014: Ceph Ecosystem Update
Ceph Day New York 2014: Ceph Ecosystem UpdateCeph Community
 
Ceph Day London 2014 - Ceph Ecosystem Overview
Ceph Day London 2014 - Ceph Ecosystem Overview Ceph Day London 2014 - Ceph Ecosystem Overview
Ceph Day London 2014 - Ceph Ecosystem Overview Ceph Community
 
Ceph Day New York 2014: Ceph, a physical perspective
Ceph Day New York 2014: Ceph, a physical perspective Ceph Day New York 2014: Ceph, a physical perspective
Ceph Day New York 2014: Ceph, a physical perspective Ceph Community
 
Ceph Day Beijing: Welcome
Ceph Day Beijing: Welcome Ceph Day Beijing: Welcome
Ceph Day Beijing: Welcome Ceph Community
 
Ceph Day Amsterdam 2015 - Ceph over IPv6
Ceph Day Amsterdam 2015 - Ceph over IPv6 Ceph Day Amsterdam 2015 - Ceph over IPv6
Ceph Day Amsterdam 2015 - Ceph over IPv6 Ceph Community
 
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters Ceph Community
 
Ceph Day Berlin: Measuring and predicting performance of Ceph clusters
Ceph Day Berlin: Measuring and predicting performance of Ceph clustersCeph Day Berlin: Measuring and predicting performance of Ceph clusters
Ceph Day Berlin: Measuring and predicting performance of Ceph clustersCeph Community
 
Ceph Day Berlin: Scaling an Academic Cloud
Ceph Day Berlin: Scaling an Academic CloudCeph Day Berlin: Scaling an Academic Cloud
Ceph Day Berlin: Scaling an Academic CloudCeph Community
 
Ceph Day Berlin: Building Your Own Disaster? The Safe Way to Make Ceph Storag...
Ceph Day Berlin: Building Your Own Disaster? The Safe Way to Make Ceph Storag...Ceph Day Berlin: Building Your Own Disaster? The Safe Way to Make Ceph Storag...
Ceph Day Berlin: Building Your Own Disaster? The Safe Way to Make Ceph Storag...Ceph Community
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community
 
Qué es un virus informático
Qué es un virus informáticoQué es un virus informático
Qué es un virus informáticojoss_24_jvvg
 

En vedette (20)

London Ceph Day: Erasure Coding: Purpose and Progress
London Ceph Day: Erasure Coding: Purpose and Progress London Ceph Day: Erasure Coding: Purpose and Progress
London Ceph Day: Erasure Coding: Purpose and Progress
 
London Ceph Day Keynote: Building Tomorrow's Ceph
London Ceph Day Keynote: Building Tomorrow's Ceph London Ceph Day Keynote: Building Tomorrow's Ceph
London Ceph Day Keynote: Building Tomorrow's Ceph
 
London Ceph Day: Deploying Ceph and OpenStack with Juju
London Ceph Day: Deploying Ceph and OpenStack with JujuLondon Ceph Day: Deploying Ceph and OpenStack with Juju
London Ceph Day: Deploying Ceph and OpenStack with Juju
 
London Ceph Day: Ceph at CERN
London Ceph Day: Ceph at CERNLondon Ceph Day: Ceph at CERN
London Ceph Day: Ceph at CERN
 
London Ceph Day: Unified Cloud Storage with Synnefo + Ceph + Ganeti
London Ceph Day: Unified Cloud Storage with Synnefo + Ceph + GanetiLondon Ceph Day: Unified Cloud Storage with Synnefo + Ceph + Ganeti
London Ceph Day: Unified Cloud Storage with Synnefo + Ceph + Ganeti
 
London Ceph Day: Ceph for SMBs: Are we there yet?
London Ceph Day: Ceph for SMBs: Are we there yet? London Ceph Day: Ceph for SMBs: Are we there yet?
London Ceph Day: Ceph for SMBs: Are we there yet?
 
London Ceph Day: Ceph in the Echosystem
London Ceph Day: Ceph in the EchosystemLondon Ceph Day: Ceph in the Echosystem
London Ceph Day: Ceph in the Echosystem
 
Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising perf...
Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising perf...Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising perf...
Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising perf...
 
DreamObjects - Ceph Day Nov 2012
DreamObjects - Ceph Day Nov 2012DreamObjects - Ceph Day Nov 2012
DreamObjects - Ceph Day Nov 2012
 
Ceph Day New York 2014: Ceph Ecosystem Update
Ceph Day New York 2014: Ceph Ecosystem UpdateCeph Day New York 2014: Ceph Ecosystem Update
Ceph Day New York 2014: Ceph Ecosystem Update
 
Ceph Day London 2014 - Ceph Ecosystem Overview
Ceph Day London 2014 - Ceph Ecosystem Overview Ceph Day London 2014 - Ceph Ecosystem Overview
Ceph Day London 2014 - Ceph Ecosystem Overview
 
Ceph Day New York 2014: Ceph, a physical perspective
Ceph Day New York 2014: Ceph, a physical perspective Ceph Day New York 2014: Ceph, a physical perspective
Ceph Day New York 2014: Ceph, a physical perspective
 
Ceph Day Beijing: Welcome
Ceph Day Beijing: Welcome Ceph Day Beijing: Welcome
Ceph Day Beijing: Welcome
 
Ceph Day Amsterdam 2015 - Ceph over IPv6
Ceph Day Amsterdam 2015 - Ceph over IPv6 Ceph Day Amsterdam 2015 - Ceph over IPv6
Ceph Day Amsterdam 2015 - Ceph over IPv6
 
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters
 
Ceph Day Berlin: Measuring and predicting performance of Ceph clusters
Ceph Day Berlin: Measuring and predicting performance of Ceph clustersCeph Day Berlin: Measuring and predicting performance of Ceph clusters
Ceph Day Berlin: Measuring and predicting performance of Ceph clusters
 
Ceph Day Berlin: Scaling an Academic Cloud
Ceph Day Berlin: Scaling an Academic CloudCeph Day Berlin: Scaling an Academic Cloud
Ceph Day Berlin: Scaling an Academic Cloud
 
Ceph Day Berlin: Building Your Own Disaster? The Safe Way to Make Ceph Storag...
Ceph Day Berlin: Building Your Own Disaster? The Safe Way to Make Ceph Storag...Ceph Day Berlin: Building Your Own Disaster? The Safe Way to Make Ceph Storag...
Ceph Day Berlin: Building Your Own Disaster? The Safe Way to Make Ceph Storag...
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph
 
Qué es un virus informático
Qué es un virus informáticoQué es un virus informático
Qué es un virus informático
 

Similaire à London Ceph Day: The Future of CephFS

Ceph Day New York 2014: Future of CephFS
Ceph Day New York 2014:  Future of CephFS Ceph Day New York 2014:  Future of CephFS
Ceph Day New York 2014: Future of CephFS Ceph Community
 
New features for Ceph with Cinder and Beyond
New features for Ceph with Cinder and BeyondNew features for Ceph with Cinder and Beyond
New features for Ceph with Cinder and BeyondCeph Community
 
New Features for Ceph with Cinder and Beyond
New Features for Ceph with Cinder and BeyondNew Features for Ceph with Cinder and Beyond
New Features for Ceph with Cinder and BeyondOpenStack Foundation
 
Openstack with ceph
Openstack with cephOpenstack with ceph
Openstack with cephIan Colle
 
OSDC 2015: John Spray | The Ceph Storage System
OSDC 2015: John Spray | The Ceph Storage SystemOSDC 2015: John Spray | The Ceph Storage System
OSDC 2015: John Spray | The Ceph Storage SystemNETWAYS
 
Storage Developer Conference - 09/19/2012
Storage Developer Conference - 09/19/2012Storage Developer Conference - 09/19/2012
Storage Developer Conference - 09/19/2012Ceph Community
 
Red Hat Storage 2014 - Product(s) Overview
Red Hat Storage 2014 - Product(s) OverviewRed Hat Storage 2014 - Product(s) Overview
Red Hat Storage 2014 - Product(s) OverviewMarcel Hergaarden
 
Ceph Day NYC: Ceph Fundamentals
Ceph Day NYC: Ceph FundamentalsCeph Day NYC: Ceph Fundamentals
Ceph Day NYC: Ceph FundamentalsCeph Community
 
Ceph Block Devices: A Deep Dive
Ceph Block Devices:  A Deep DiveCeph Block Devices:  A Deep Dive
Ceph Block Devices: A Deep DiveRed_Hat_Storage
 
Ceph Block Devices: A Deep Dive
Ceph Block Devices: A Deep DiveCeph Block Devices: A Deep Dive
Ceph Block Devices: A Deep Divejoshdurgin
 
Ceph Overview for Distributed Computing Denver Meetup
Ceph Overview for Distributed Computing Denver MeetupCeph Overview for Distributed Computing Denver Meetup
Ceph Overview for Distributed Computing Denver Meetupktdreyer
 
The Future of Cloud Software Defined Storage with Ceph: Andrew Hatfield, Red Hat
The Future of Cloud Software Defined Storage with Ceph: Andrew Hatfield, Red HatThe Future of Cloud Software Defined Storage with Ceph: Andrew Hatfield, Red Hat
The Future of Cloud Software Defined Storage with Ceph: Andrew Hatfield, Red HatOpenStack
 
Cloudjiffy vs Open Shift (private cloud)
Cloudjiffy vs Open Shift (private cloud)Cloudjiffy vs Open Shift (private cloud)
Cloudjiffy vs Open Shift (private cloud)Sharma Aashish
 
State of the Container Ecosystem
State of the Container EcosystemState of the Container Ecosystem
State of the Container EcosystemVinay Rao
 
Storage tiering and erasure coding in Ceph (SCaLE13x)
Storage tiering and erasure coding in Ceph (SCaLE13x)Storage tiering and erasure coding in Ceph (SCaLE13x)
Storage tiering and erasure coding in Ceph (SCaLE13x)Sage Weil
 
NoSql day 2019 - Floating on a Raft - Apache HBase durability with Apache Ratis
NoSql day 2019 - Floating on a Raft - Apache HBase durability with Apache RatisNoSql day 2019 - Floating on a Raft - Apache HBase durability with Apache Ratis
NoSql day 2019 - Floating on a Raft - Apache HBase durability with Apache RatisAnkit Singhal
 
Ceph - Desmistificando Software-Define Storage
Ceph - Desmistificando Software-Define StorageCeph - Desmistificando Software-Define Storage
Ceph - Desmistificando Software-Define StorageItalo Santos
 

Similaire à London Ceph Day: The Future of CephFS (20)

Block Storage For VMs With Ceph
Block Storage For VMs With CephBlock Storage For VMs With Ceph
Block Storage For VMs With Ceph
 
XenSummit - 08/28/2012
XenSummit - 08/28/2012XenSummit - 08/28/2012
XenSummit - 08/28/2012
 
Ceph Day New York 2014: Future of CephFS
Ceph Day New York 2014:  Future of CephFS Ceph Day New York 2014:  Future of CephFS
Ceph Day New York 2014: Future of CephFS
 
New features for Ceph with Cinder and Beyond
New features for Ceph with Cinder and BeyondNew features for Ceph with Cinder and Beyond
New features for Ceph with Cinder and Beyond
 
New Features for Ceph with Cinder and Beyond
New Features for Ceph with Cinder and BeyondNew Features for Ceph with Cinder and Beyond
New Features for Ceph with Cinder and Beyond
 
Openstack with ceph
Openstack with cephOpenstack with ceph
Openstack with ceph
 
OSDC 2015: John Spray | The Ceph Storage System
OSDC 2015: John Spray | The Ceph Storage SystemOSDC 2015: John Spray | The Ceph Storage System
OSDC 2015: John Spray | The Ceph Storage System
 
Storage Developer Conference - 09/19/2012
Storage Developer Conference - 09/19/2012Storage Developer Conference - 09/19/2012
Storage Developer Conference - 09/19/2012
 
Red Hat Storage 2014 - Product(s) Overview
Red Hat Storage 2014 - Product(s) OverviewRed Hat Storage 2014 - Product(s) Overview
Red Hat Storage 2014 - Product(s) Overview
 
Ceph Day NYC: Ceph Fundamentals
Ceph Day NYC: Ceph FundamentalsCeph Day NYC: Ceph Fundamentals
Ceph Day NYC: Ceph Fundamentals
 
Ceph Block Devices: A Deep Dive
Ceph Block Devices:  A Deep DiveCeph Block Devices:  A Deep Dive
Ceph Block Devices: A Deep Dive
 
Ceph Block Devices: A Deep Dive
Ceph Block Devices: A Deep DiveCeph Block Devices: A Deep Dive
Ceph Block Devices: A Deep Dive
 
Ceph Overview for Distributed Computing Denver Meetup
Ceph Overview for Distributed Computing Denver MeetupCeph Overview for Distributed Computing Denver Meetup
Ceph Overview for Distributed Computing Denver Meetup
 
The Future of Cloud Software Defined Storage with Ceph: Andrew Hatfield, Red Hat
The Future of Cloud Software Defined Storage with Ceph: Andrew Hatfield, Red HatThe Future of Cloud Software Defined Storage with Ceph: Andrew Hatfield, Red Hat
The Future of Cloud Software Defined Storage with Ceph: Andrew Hatfield, Red Hat
 
Ceph as software define storage
Ceph as software define storageCeph as software define storage
Ceph as software define storage
 
Cloudjiffy vs Open Shift (private cloud)
Cloudjiffy vs Open Shift (private cloud)Cloudjiffy vs Open Shift (private cloud)
Cloudjiffy vs Open Shift (private cloud)
 
State of the Container Ecosystem
State of the Container EcosystemState of the Container Ecosystem
State of the Container Ecosystem
 
Storage tiering and erasure coding in Ceph (SCaLE13x)
Storage tiering and erasure coding in Ceph (SCaLE13x)Storage tiering and erasure coding in Ceph (SCaLE13x)
Storage tiering and erasure coding in Ceph (SCaLE13x)
 
NoSql day 2019 - Floating on a Raft - Apache HBase durability with Apache Ratis
NoSql day 2019 - Floating on a Raft - Apache HBase durability with Apache RatisNoSql day 2019 - Floating on a Raft - Apache HBase durability with Apache Ratis
NoSql day 2019 - Floating on a Raft - Apache HBase durability with Apache Ratis
 
Ceph - Desmistificando Software-Define Storage
Ceph - Desmistificando Software-Define StorageCeph - Desmistificando Software-Define Storage
Ceph - Desmistificando Software-Define Storage
 

Dernier

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

Dernier (20)

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

London Ceph Day: The Future of CephFS

  • 2. APP APP LIBRADOS LIBRADOS APP APP RADOSGW RADOSGW AA bucket-based bucket-based AA library allowing REST gateway, library allowing REST gateway, apps to directly compatible with S3 apps to directly compatible with S3 access RADOS, access RADOS, and Swift and Swift with support for with support for C, C++, Java, C, C++, Java, Python, Ruby, Python, Ruby, and PHP and PHP HOST/VM HOST/VM CLIENT CLIENT RBD RBD CEPH FS AA reliable and fullyreliable and fullydistributed block distributed block device, with a a Linux device, with Linux kernel client and a a kernel client and QEMU/KVM driver QEMU/KVM driver A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE RADOS RADOS AA reliable, autonomous, distributed object store comprised of self-healing, self-managing, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes intelligent storage nodes
  • 5. Metadata Server • Manages metadata for a POSIX-compliant shared filesystem • Directory hierarchy • File metadata (owner, timestamps, mode, etc.) • Stores metadata in RADOS • Does not serve file data to clients • Only required for shared filesystem
  • 6. legacy metadata storage ● a scaling disaster ● name → inode → block list → data ● no inode table locality ● fragmentation – inode table – directory ● many seeks ● difficult to partition etc home usr var vmlinuz … hosts mtab passwd … bin include lib …
  • 7. ceph fs metadata storage ● ● 100 1 etc home usr var vmlinuz … hosts mtab passwd … block lists unnecessary inode table mostly useless ● 102 bin include lib … ● ● APIs are path-based, not inode-based no random table access, sloppy caching embed inodes inside directories ● good locality, prefetching ● leverage key/value object
  • 8. controlling metadata io ● view ceph-mds as cache ● reduce reads – ● reduce writes – ● dir+inode prefetching journal consolidate multiple writes large journal or log ● stripe over objects ● two tiers – – ● journal for short term per-directory for long term fast failure recovery directories
  • 10. load distribution ● coarse (static subtree) ● ● high management overhead fine (hash) ● always balanced ● less vulnerable to hot spots ● ● static subtree preserve locality ● good locality destroy hierarchy, locality can a dynamic approach capture benefits of both extremes? hash directories good balance hash files
  • 11.
  • 12.
  • 13.
  • 14.
  • 16. dynamic subtree partitioning ● scalable ● ● arbitrarily partition metadata adaptive ● ● ● move work from busy to idle servers replicate hot metadata efficient ● ● hierarchical partition preserve locality dynamic ● daemons can join/leave ● take over for failed nodes
  • 19. Metadata replication and availability
  • 21. client protocol ● highly stateful ● ● consistent, fine-grained caching seamless hand-off between ceph-mds daemons ● ● ● when client traverses hierarchy when metadata is migrated between servers direct access to OSDs for file I/O
  • 22. an example ● mount -t ceph 1.2.3.4:/ /mnt ● ● ● 3 ceph-mon RT 2 ceph-mds RT (1 ceph-mds to -osd RT) 2 ceph-mds RT (2 ceph-mds to -osd RT) ls -al ● open ● readdir – 1 ceph-mds RT (1 ceph-mds to -osd RT) ● stat each file ● ● ceph-osd cd /mnt/foo/bar ● ● ceph-mon close cp * /tmp ● N ceph-osd RT ceph-mds
  • 23. recursive accounting ● ceph-mds tracks recursive directory stats ● file sizes ● file and directory counts ● modification time ● virtual xattrs present full stats ● efficient $ ls ­alSh | head total 0 drwxr­xr­x 1 root            root      9.7T 2011­02­04 15:51 . drwxr­xr­x 1 root            root      9.7T 2010­12­16 15:06 .. drwxr­xr­x 1 pomceph         pg4194980 9.6T 2011­02­24 08:25 pomceph drwxr­xr­x 1 mcg_test1       pg2419992  23G 2011­02­02 08:57 mcg_test1 drwx­­x­­­ 1 luko            adm        19G 2011­01­21 12:17 luko drwx­­x­­­ 1 eest            adm        14G 2011­02­04 16:29 eest drwxr­xr­x 1 mcg_test2       pg2419992 3.0G 2011­02­02 09:34 mcg_test2 drwx­­x­­­ 1 fuzyceph        adm       1.5G 2011­01­18 10:46 fuzyceph drwxr­xr­x 1 dallasceph      pg275     596M 2011­01­14 10:06 dallasceph
  • 24. snapshots ● volume or subvolume snapshots unusable at petabyte scale ● ● snapshot arbitrary subdirectories simple interface ● hidden '.snap' directory ● no special tools $ mkdir foo/.snap/one $ ls foo/.snap one $ ls foo/bar/.snap _one_1099511627776 $ rm foo/myfile $ ls -F foo bar/ $ ls -F foo/.snap/one myfile bar/ $ rmdir foo/.snap/one # create snapshot # parent's snap name is mangled # remove snapshot
  • 25. multiple client implementations ● Linux kernel client ● ● mount -t ceph 1.2.3.4:/ /mnt export (NFS), Samba (CIFS) ● ceph-fuse ● libcephfs.so ● your app ● Ganesha (NFS) ● Hadoop (map/reduce) Ganesha libcephfs Samba libcephfs Hadoop libcephfs your app libcephfs Samba (CIFS) ● SMB/CIFS NFS ceph ceph-fuse fuse kernel
  • 26. APP APP LIBRADOS LIBRADOS APP APP HOST/VM HOST/VM RADOSGW RADOSGW RBD RBD AA bucket-based bucket-based AA library allowing REST gateway, library allowing REST gateway, apps to directly compatible with S3 apps to directly compatible with S3 access RADOS, access RADOS, and Swift and Swift with support for with support for C, C++, Java, C, C++, Java, Python, Ruby, Python, Ruby, AWESOME and PHP and PHP CEPH FS CEPH FS AA reliable and fullyreliable and fullydistributed block distributed block device, with a a Linux device, with Linux kernel client and a a kernel client and QEMU/KVM driver QEMU/KVM driver AA POSIX-compliant POSIX-compliant distributed file system, distributed file system, with a a Linux kernel with Linux kernel client and support for client and support for FUSE FUSE AWESOME AWESOME RADOS RADOS CLIENT CLIENT NEARLY AWESOME AWESOME AA reliable, autonomous, distributed object store comprised of self-healing, self-managing, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes intelligent storage nodes
  • 27. Path forward ● Testing ● ● ● Various workloads Multiple active MDSs Test automation ● ● ● Simple workload generator scripts Bug reproducers Hacking ● ● ● Bug squashing Long-tail features Integrations ● Ganesha, Samba, *stacks
  • 28.
  • 29. hard links? ● rare ● useful locality properties ● ● ● intra-directory parallel inter-directory on miss, file objects provide per-file backpointers ● degenerates to log(n) lookups ● optimistic read complexity
  • 30. what is journaled ● lots of state ● ● ● journaling is expensive up-front, cheap to recover non-journaled state is cheap, but complex (and somewhat expensive) to recover yes ● ● ● client sessions actual fs metadata modifications no ● ● ● cache provenance open files lazy flush ● client modifications may not be durable until fsync() or visible by another client

Notes de l'éditeur

  1. {"5":"If you aren’t running Ceph FS, you don’t need to deploy metadata servers.\n","11":"If there’s just one MDS (which is a terrible idea), it manages metadata for the entire tree.\n","12":"When the second one comes along, it will intelligently partition the work by taking a subtree.\n","1":"<number>\n","13":"When the third MDS arrives, it will attempt to split the tree again.\n","2":"Finally, let’s talk about Ceph FS. Ceph FS is a parallel filesystem that provides a massively scalable, single-hierarchy, shared disk. If you use a shared drive at work, this is the same thing except that the same drive could be shared by everyone you’ve ever met (and everyone they’ve ever met).\n","14":"Same with the fourth.\n","3":"Remember all that meta-data we talked about in the beginning? Feels so long ago. It has to be stored somewhere! Something has to keep track of who created files, when they were created, and who has the right to access them. And something has to remember where they live within a tree. Enter MDS, the Ceph Metadata Server. Clients accessing Ceph FS data first make a request to an MDS, which provides what they need to get files from the right OSDs.\n","9":"So how do you have one tree and multiple servers?\n","26":"Ceph FS is feature-complete but still lacks the testing, quality assurance, and benchmarking work we feel it needs to recommend it for production use.\n","15":"A MDS can actually even just take a single directory or file, if the load is high enough. This all happens dynamically based on load and the structure of the data, and it’s called “dynamic subtree partitioning”.\n","4":"There are multiple MDSs!\n"}