Distributed Block-level Storage Management for OpenStack, by Danile lee
1. Distributed Block-level Storage
Management for OpenStack
OpenStack APAC Conference 2012
Daniel Lee
CCMA/ITRI
Cloud Computing Center for Mobile Applications
Industrial Technology Research Institute
( 雲端運算行動運用科技中心 )
2012/8/11 OpenStack APAC Conference 1 1
2012
Copyright 2009 ITRI 工業技術研究院
2. Outline
• Overview
• Brief on ITRI Cloud OS
• Cloud storage system in ITRI Cloud OS
• Distributed Main Storage System
• Distributed Secondary Storage System
• Integration of Cloud OS Storage System with
OpenStack
• Summary
• Q&A
2012/8/11 OpenStack APAC Conference 2
2012
Copyright 2009 ITRI 工業技術研究院
3. Overview
• Data is growing exponentially and moving on to Could. It
needs to be available at all time, secured and protected.
• 911 in Twin Towers, 311 earthquake in Japan and 420 outage
of Amazon datacenters are disasters.
– Block-level and fully redundant
– Backup / restore + wide-area
– Thin provisioning
– De-duplication
– Others
• Cloud OS Storage System has these functionality and is ready
to integrate to OpenStack
2012/8/11 OpenStack APAC Conference 3
2012
Copyright 2009 ITRI 工業技術研究院
4. ITRI Cloud OS
(An All-in-one IaaS Total Solution)
Virtual Data Center Management
Primary/Secondary
Virtual Data Center
Storage Management Physical Data
Provisioning (VMWare)
(EMC) Center
Management
Physical (IBM)
Resource Security
Management (Checkpoint)
(HP/Dell)
Inter-VM Load
Balancing (F5)
Power Management
(IBM) Inter-PM Load Balancing
and VM Fail-over (VMWare)
*Blue name in bracket indicates
the leading company of the component
2012/8/11 OpenStack APAC Conference 4
2012
Copyright 2009 ITRI 工業技術研究院
5. ITRI Cloud OS
Service Interface
Multiplexing VDC’s in a Physical DC
Multiplexing VDC’s in a Physical DC
Virtual Data Center Management Physical Data Center Management
Virtual Data Center Management Physical Data Center Management
Photo Video Web
Sharing Streami Confere
ng nce
VDC VDC
Provision VDC
and Deploy
Physical
•Cloud Application
Cluster •Cloud Service
Infrastructure
Developer Administrator
•Cloud Service •Carrier
Provider
Monitor and Configure Monitor, Diagnose and
Virtual Resources Configure
Physical Resources
2012/8/11 OpenStack APAC Conference 5
2012
Copyright 2009 ITRI 工業技術研究院
9. Cloud Storage System in Cloud OS
• Cloud storage aims at cloud-scale data
centers, and is designed to be scalable,
available and low-cost.
• Main components
– Distributed Main Storage subsystem – DMS
• Provide current image I/O request processing
– Distributed Secondary Storage subsystem – DSS
• Block-level incremental metadata only backup system
• Take volume snapshot / restore, de-duplication engine,
garbage collection and WADB engine
2012/8/11 OpenStack APAC Conference 9
2012
Copyright 2009 ITRI 工業技術研究院
10. Key Features of Cloud Storage
System from Users’ Perspective
• Use like raw, unformatted block devices
• Support volume size from 1 GB to 2 TB (64TB architecture-wise)
• Can be attached by different instances (one at a time for now)
• Multiple volumes can be mounted to the same instance
• Data is automatically replicated on different physical storage
server (N=3)
• Ability to clone volumes and/or create point-in-time snapshots of
volumes
• Wide-area backup for disaster recovery
• Snapshot can be scheduled by assigning a backup policy with time,
frequency, retention, and WADB option
• Save system image for starting multiple VMs using the same image
2012/8/11 OpenStack APAC Conference 10
2012
Copyright 2009 ITRI 工業技術研究院
11. Key Features of Cloud Storage
System from Providers’ Perspective
• Use of commodity hardware – reducing cost
(JBOD)
• Disk space management for up to multiple
petabytes
• Add disk, remove disk without interruption
• Thin provisioning
• Enables you to provision a specific level of I/O
performance if desired, by setting I/O throttle
• Block-level de-duplication for reducing space
requirement
2012/8/11 OpenStack APAC Conference 11
2012
Copyright 2009 ITRI 工業技術研究院
12. DMS in Cloud OS
• DMS - Distributed Main Storage
• Goal: A cloud scale network storage solution that provide
high capacity, high reliability, high availability and high
performance
• Features:
– Volume operations: Create / Attach / Detach / Delete
– High Reliability: data replication (N=3)
– High Availability: No single point of failure
– High Performance: client-side metadata/data cache for performance
– Thin provisioning is utilized to optimize utilization of available storage.
– Fast Volume Cloning
– Save Image; FastBoot
– Disk I/O Throttle
2012/8/11 OpenStack APAC Conference 12
2012
Copyright 2009 ITRI 工業技術研究院
13. DMS System Architecture
Compute Node
…
VM VM VM Service Node
Volume Volume Volume
Metadata
DMS In-kernel client NameNode
Data RR
Storage Node
…
DMS three components:
1. Namenode
DataNode 2. DMS in-kernel client
3. Datanode
2012/8/11 OpenStack APAC Conference 13
2012
Copyright 2009 ITRI 工業技術研究院
14. DMS Compute Node
Architecture
Compute Node
Domain U User Space Domain U Kernel Space
Applications
RPM
Read/Write
Sectors /dev/hda1 Block Frontend
Driver
Attach/Detach Attach a disk to Dom U
Volume
Xen / KVM I/O Virtualization
Block Backend
User-space Daemon
Driver
NameNode IOCTL
Metadata
In-Kernel Client
Payload
DataNodes Block
Devices
/dev/dms01
/dev/dms02
.
..
..
Domain 0 User Space Domain 0 Kernel Space
2012/8/11 OpenStack APAC Conference 14
2012
Copyright 2009 ITRI 工業技術研究院
15. Read Flow
Compute Node
VM
Volume Service Node
6. Response
1. Read request
(payload) 2. query metadata: (VolID, LBID)
DMS In-kernel Client 3. HBID, N-locations
NameNode
7. Report failed locations
4. Read Data 5. Read Data from another datanode
if previous-try failed
Storage Node Storage Node Storage Node
DataNode DataNode DataNode …
OpenStack APAC Conference 2012
Copyright 2008 ITRI 工業技術研究院 2012/8/11 15
16. Write Flow – Memory-Ack Approach
Compute Node
VM
Volume Service Node
6. Response 1. Write request
2. Request for new space:(VolID, LBID)
DMS In-kernel Client 3. HBID, N-locations
NameNode
7. Commit (Success, or
4. Write Data report failed locations)
5. Memory Ack
6. Disk Ack
• Memory-ack approach:
Storage Node Storage Node Storage Node In client, I/O response to
VM(step6) is right after
DataNode DataNode DataNode … receiving memory ack from
Datanodes (step5) .
OpenStack APAC Conference 2012
Copyright 2008 ITRI 工業技術研究院 2012/8/11 16
17. DSS in Cloud OS
• Share the homogeneous space with DMS
• Block-level Incremental backup
– Copy-on-write (COW) and volume cloning techniques
– Backup is just a snapshot of meta-data
• Support volume-level restore
• Block-level de-duplication
• Garbage collection of expired, un-used and redundant disk
blocks
• Wide-area data backup / restore and near-realtime
replication
• The secondary storage itself is failure-tolerant (HA)
2012/8/11 OpenStack APAC Conference 17
2012
Copyright 2009 ITRI 工業技術研究院
20. DSS – WADB
Backing up the data to a remote location at
the time volume snapshot is taken…
snapshot
Network / restore
VPN
Beijing Shang-Hai
2012/8/11 OpenStack APAC Conference 20
2012
Copyright 2009 ITRI 工業技術研究院
25. Integration of Cloud OS Storage
System with OpenStack
User portal
Web Services
User (OpenStack VDCM LOG PDCM PDCM User
API enabled)
VMM L2 PRM
Nova API Glance
Nova Manager DB DMS
VM1 VM2… VMn Nova Volume volume
…
Nova Compute Nova Network
SECS
RS
SLB
2012/8/11 OpenStack APAC Conference 25
2012
Copyright 2009 ITRI 工業技術研究院
26. Summary of Advantages
Feature Benefits
Unlimited storage highly scalable read/write access
Leverages commodity hardware (JBOD) No lock-in, lower price/GB
Built-in replication Fully redundant (default: N=3)
Used as system volume Save Image, Fast Boot, etc.
Thin Provisioning Optimize utilization of available storage
Highly available (HA) storage servers Server failover for high available
(DMS/DSS)
Snapshot and backup for block volumes Data protection and recovery for VM data
Policy based backup Free of worry – set it and forget about it
Wide-area data backup For disaster recovering
Block-level data de-duplication Store more data in the limited space
Standalone volume API, CLI available Easy integration with other compute systems
2012/8/11 OpenStack APAC Conference 26
2012
Copyright 2009 ITRI 工業技術研究院
27. More Feature in V2.0
Feature Benefits
Volume Cloning Fast cloning without copying data
Share virtual disk within cluster Support sharing of volume and cluster file system
Disk bandwidth QoS guarantee Dynamic adjustment at I/O request level
Storage space optimization N-way replication for write-intensive blocks
N-parity-group for read-intensive blocks
Wide-area data replication Near real-time wide-area replication
Integration with Compute / Volume Fully integrated to Compute for attaching block
volumes and reporting on usage
2012/8/11 OpenStack APAC Conference 27
2012
Copyright 2009 ITRI 工業技術研究院
Why don’t we just use Hadoop as block level storage What are the advantages over iSCSI – RAID + SW for backup restore and WADB When will it be available for OpenStack – 4Q2012? The limitations? Volume / total space, size of metadata, network, … Performance? 12 MB/sec or 49 MB/sec (multiple concurrent clients) How does it compare to EMC, Amazon S3? S3: Object Store; AWS Import/Export Amazon Elastic Block Storage (EBS): 1GB~1TB; ss saved in S3 Cinder, NetApp, Nexenta and SolidFire CIFS, GlusterFS, OCSF2 XCP
CRITICAL CHANGES NO HDFS In-kernal driver “ Native” networking disk driver (not xen back-end driver) Key design Why we Why not use iscii protocol Datanode prioritization Datanode hardware dependency
Provides a global storage abstraction on a large number of distributed disks Each storage block remains available despite failure in switch, server, or disk drive Scales to a very large number of concurrent accesses Exploits distributed object caching Initial thought: Use HDFS as the base for global storage management Support concurrency control among accesses to common storage blocks Support QoS guarantee or performance isolation among virtual clusters