2. Growing With Hardware Appliances
First PB:
• Proprietary storage hardware
• Well-known storage vendor
• $14 b’zillion
Second PB:
• Proprietary storage hardware
• Same storage vendor
• Another $14 b’zillion
5. [Diagram: HUMAN (DEVELOPER) beside a grid of C/D pairs]
6. Hard Drives Are Tiny Record Players and They Fail Often
jon_a_ross, Flickr / CC BY 2.0
7. [Diagram: disks (D) x 1 MILLION = 55 times / day]
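The arithmetic behind this slide can be sketched as follows. This assumes that "55 times / day" means drive failures per day across one million drives. The implied annual failure rate is derived from those two numbers only and is not a measured figure.

```python
# Figures taken from the slide; their interpretation as drive failures is an assumption.
DISKS = 1_000_000
FAILURES_PER_DAY = 55

# Failures over a year, and the per-drive annual failure rate this implies.
annual_failures = FAILURES_PER_DAY * 365
afr = annual_failures / DISKS


def expected_daily_failures(disks: int, annual_failure_rate: float) -> float:
    """Expected failures per day for a fleet, given a per-drive annual rate."""
    return disks * annual_failure_rate / 365


print(f"implied annual failure rate: {afr:.2%}")
print(f"failures/day in a 1,000-drive cluster: {expected_daily_failures(1_000, afr):.3f}")
```

At this scale a failed drive is a routine event rather than an emergency, which is why the storage system has to handle it without operator involvement.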
9. Philosophy and design
Philosophy:
• OPEN SOURCE
• COMMUNITY-FOCUSED
Design:
• SCALABLE
• NO SINGLE POINT OF FAILURE
• SOFTWARE BASED
• SELF-MANAGING
10. [Diagram: four interfaces layered on RADOS]
• LIBRADOS (used by APP): A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP
• RADOSGW (used by APP): A bucket-based REST gateway, compatible with S3 and Swift
• RBD (used by HOST/VM): A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver
• CEPH FS (used by CLIENT): A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE
• RADOS: A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes
11. [Diagram: five OSDs, each running on a file system (btrfs, xfs, or ext4) on its own DISK, alongside three monitors (M)]
13. Monitors (M):
• Maintain cluster map
• Provide consensus for distributed decision-making
• Must have an odd number
• Do not serve stored objects to clients
OSDs:
• One per disk (recommended)
• At least three in a cluster
• Serve stored objects to clients
• Intelligently peer to perform replication tasks
• Support object classes
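One way to see why monitors are deployed in odd numbers is to count how many can fail before a majority is lost. The sketch below assumes simple majority voting. The function names are illustrative and are not part of any Ceph API.

```python
def majority(monitors: int) -> int:
    """Smallest number of monitors that forms a strict majority."""
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """How many monitors can be down while a majority is still reachable."""
    return monitors - majority(monitors)


for count in range(1, 8):
    print(f"{count} monitors: majority={majority(count)}, "
          f"tolerates {tolerated_failures(count)} failure(s)")
```

Going from three monitors to four raises the majority from two to three but still tolerates only one failure, so the extra even-numbered monitor adds cost without adding resilience.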
14. [Diagram: an APP facing a grid of C/D pairs, marked "??"]
15. [Diagram: an APP beside a grid of C/D pairs]
16. [Diagram: an APP placing object F on a grid of C/D pairs partitioned by name range: A-G, H-N, O-T, U-Z]
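The name-range scheme pictured on this slide can be sketched as a simple lookup. The four ranges come from the slide. The function name and the use of the first letter of the object name are assumptions made for illustration.

```python
from bisect import bisect_left

# Inclusive upper bound of each name range shown on the slide.
RANGE_ENDS = ["G", "N", "T", "Z"]
RANGE_LABELS = ["A-G", "H-N", "O-T", "U-Z"]


def locate(name: str) -> str:
    """Return the name range (and so the group of nodes) responsible for an object."""
    if not name:
        raise ValueError("object name must not be empty")
    first = name[0].upper()
    if not "A" <= first <= "Z":
        raise ValueError(f"unsupported object name: {name!r}")
    # bisect_left finds the first range whose upper bound is >= the first letter.
    return RANGE_LABELS[bisect_left(RANGE_ENDS, first)]


print(locate("F"))       # object F from the slide
print(locate("photo"))
```

A fixed table like this is easy to reason about, but the ranges have to be redrawn by hand whenever nodes are added or names cluster unevenly.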
26. LIBRADOS
• Provides direct access to RADOS for applications
• C, C++, Python, PHP, Java
• No HTTP overhead
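Direct access might look like the following minimal sketch, which uses the Python `rados` binding. The configuration path, the pool name `demo-pool`, and the object name are assumptions. `main()` needs a reachable cluster and is therefore defined but not called here.

```python
def store_and_fetch(ioctx, name: str, payload: bytes) -> bytes:
    """Write an object through a librados I/O context, then read it back."""
    ioctx.write_full(name, payload)         # replace the whole object
    return ioctx.read(name, len(payload))   # read back exactly what was written


def main() -> None:
    # Imported here so the helper above can be used without the binding installed.
    import rados

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")  # assumed config path
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx("demo-pool")  # hypothetical pool name
        try:
            print(store_and_fetch(ioctx, "greeting", b"hello rados"))
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
```

On a host with cluster credentials, calling `main()` stores and reads the object over the native protocol, with no HTTP layer in between.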
28. [Diagram: APPs speak REST to RADOSGW instances, which use LIBRADOS to speak the native protocol to the cluster (monitors shown as M)]
29. RADOS Gateway:
• REST-based interface to RADOS
• Supports buckets, accounting
• Compatible with S3 and Swift applications
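Because the gateway is S3-compatible, an existing S3 client can in principle be pointed at it. The sketch below uses `boto3` for that. The endpoint URL, credentials, bucket name, and key are placeholders, and `main()` is defined but not called because it needs a running gateway.

```python
def round_trip(s3, bucket: str, key: str, body: bytes) -> bytes:
    """Create a bucket, upload one object, and download it again."""
    s3.create_bucket(Bucket=bucket)
    s3.put_object(Bucket=bucket, Key=key, Body=body)
    return s3.get_object(Bucket=bucket, Key=key)["Body"].read()


def main() -> None:
    # Imported here so the helper above can be used without boto3 installed.
    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="http://rgw.example.com:7480",  # hypothetical gateway address
        aws_access_key_id="ACCESS_KEY",              # placeholder credentials
        aws_secret_access_key="SECRET_KEY",
    )
    print(round_trip(s3, "demo-bucket", "hello.txt", b"hello gateway"))
```

The only gateway-specific part is `endpoint_url`; the rest is the same code an application would use against any S3-compatible service.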
34. RADOS Block Device:
• Storage of virtual disks in RADOS
• Allows decoupling of VMs and containers
• Live migration!
• Images are striped across the cluster
• Thin-provisioning
• Snapshots and cloning
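Striping and thin provisioning can be illustrated with a simplified in-memory model. It assumes fixed 4 MiB objects and plain sequential striping. The `ThinImage` class is hypothetical and does not reflect how RBD is implemented.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # assumed object size: 4 MiB


class ThinImage:
    """A block image split into fixed-size objects that exist only once written."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.objects: dict[int, bytearray] = {}  # object index -> contents

    def write(self, offset: int, data: bytes) -> None:
        if offset < 0 or offset + len(data) > self.size:
            raise ValueError("write outside image bounds")
        pos = 0
        while pos < len(data):
            index, within = divmod(offset + pos, OBJECT_SIZE)
            chunk = min(len(data) - pos, OBJECT_SIZE - within)
            # Allocate the backing object lazily, on first write.
            obj = self.objects.setdefault(index, bytearray(OBJECT_SIZE))
            obj[within:within + chunk] = data[pos:pos + chunk]
            pos += chunk

    def read(self, offset: int, length: int) -> bytes:
        out = bytearray()
        pos = 0
        while pos < length:
            index, within = divmod(offset + pos, OBJECT_SIZE)
            chunk = min(length - pos, OBJECT_SIZE - within)
            obj = self.objects.get(index)
            # Regions that were never written read back as zeros.
            out += obj[within:within + chunk] if obj is not None else bytes(chunk)
            pos += chunk
        return bytes(out)


image = ThinImage(size=1 << 40)            # a 1 TiB virtual disk
image.write(OBJECT_SIZE - 2, b"abcd")      # straddles the first two objects
print(sorted(image.objects), image.read(OBJECT_SIZE - 2, 4))
```

The 1 TiB image consumes space for only the two objects that were touched, and because each object can live on a different node, reads and writes to a large image are spread over many disks.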
40. Old-style VM image creation
[Diagram: Nova compute reads template X from Glance (templates) and writes a copy X' to local disk (VM images)]
• ephemeral
• expensive to create
41. Why use block storage?
• Persistent
• More familiar to users
• Not tied to a single host
• Decouples compute and storage
• Enables live migration
• Extra capabilities of storage system
• Efficient snapshots
• Different types of storage available
• Cloning for fast restore or scaling
42. Cinder volume creation
[Sequence diagram between Cinder API, Cinder volume driver, and Glance (templates); messages in order:]
• create image from X
• locate X
• location of X
• read X
• X
• X'
• reference to X'
Note on the diagram: flexibility in where VM images are stored
43. Efficient volume creation
[Sequence diagram between Cinder API, Cinder volume driver, and Glance (templates); messages in order:]
• create image from X
• locate X
• location of X
• clone X to X'
• X' complete
• reference to X'
Note on the diagram: fast CoW clone of X to X'
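The difference between reading a full copy of X and taking a copy-on-write clone can be sketched with a toy block image. The `Image` class is hypothetical, and the sketch assumes the parent is never modified after it has been cloned.

```python
class Image:
    """A toy block image: a map of block number to data, with an optional parent."""

    def __init__(self, blocks=None, parent=None):
        self.blocks = dict(blocks or {})
        self.parent = parent

    def read(self, number: int) -> bytes:
        # Walk up the parent chain until some image holds the block.
        image = self
        while image is not None:
            if number in image.blocks:
                return image.blocks[number]
            image = image.parent
        return b""  # never written anywhere in the chain

    def write(self, number: int, data: bytes) -> None:
        # Only the written block is stored in this image; the parent is untouched.
        self.blocks[number] = data

    def full_copy(self) -> "Image":
        """Old-style creation: materialise every block up front."""
        merged = {}
        image = self
        while image is not None:
            for number, data in image.blocks.items():
                merged.setdefault(number, data)
            image = image.parent
        return Image(blocks=merged)

    def clone(self) -> "Image":
        """CoW creation: an empty child that refers back to its parent."""
        return Image(parent=self)


template = Image(blocks={0: b"boot", 1: b"root"})   # X
volume = template.clone()                           # X'
volume.write(1, b"mine")
print(volume.read(0), volume.read(1), template.read(1), len(volume.blocks))
```

Creating the clone costs the same no matter how large the template is, whereas `full_copy` has to move every block before the new volume can be used.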