Cinder Enhancements for Replication (and more)
using Stateless Snapshots

Using Stateless Snapshots with Taskflow
• Caitlin.Bestler@nexenta.com
Snapshots
• With Havana, all Cinder Volume Drivers support snapshots.
• But some vendors provide "stateless" volume snapshots:
– Taking the snapshot does not interfere with use of the Volume.
– The Volume remains fully readable and writeable.
• Stateless/low-overhead snapshots are useful for many other activities:
– Replication, Migration, Fail-over, Archiving, Deploying Master Images, …
• What is proposed:
– A set of optional enhancements for Cinder Volume Drivers.
– A pattern of usage for Taskflows to take advantage of stateless snapshots.
Backup, Migration, Replication and Pre-failover Preparation
• Multiple methods, but a common pattern with the same issues:
– Need for NDMP/OST-style direct appliance-to-appliance transfers.
• Volumes are big; transferring them twice is not acceptable.
• Transferring them "through" the volume manager is not acceptable either.
• Low-cost snapshots enable "stateless" methods.
• Volume Drivers must report their capabilities:
– can-snap-stateless, storage-assist, etc.
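The capability check above can be sketched in vendor-neutral code. This is an illustrative stand-in, not the actual Cinder API: the function name and the string-set representation of capabilities are hypothetical; only the capability names come from the proposal.

```python
# Hypothetical sketch: vendor-neutral code inspecting advertised driver
# capabilities before choosing a transfer method. Only the capability
# names (can-snap-stateless, storage-assist) come from the proposal.

def choose_backup_method(capabilities):
    """Pick a transfer strategy based on what the Volume Driver advertises."""
    if "can-snap-stateless" in capabilities and "storage-assist" in capabilities:
        # Snapshot cheaply, then let the backend push data directly
        # appliance-to-appliance (no double transfer through the manager).
        return "direct-snapshot-transfer"
    if "can-snap-stateless" in capabilities:
        # Snapshot cheaply, but stream the data through the volume manager.
        return "snapshot-then-stream"
    # Fall back to the stateful path: the volume is marked Backing Up.
    return "stateful-backup"

print(choose_backup_method({"can-snap-stateless", "storage-assist"}))
# direct-snapshot-transfer
```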
Handful of Stones, Many Birds
• Proposal: Volume Drivers to optionally implement:
– Snapshot Replication.
– Severing the tie to the Volume status.
– Reporting capabilities.
• Variety of ways that Taskflow could use those to:
– Backup, migrate, implement a variety of data protection strategies, enhance automatic failover, improve deployment of cloned images.
– Implement sophisticated snap-retention policies.
– https://review.openstack.org/#/c/53480/
– https://blueprints.launchpad.net/cinder/+spec/volume-backup-createtask-flow
When Cinder manages non-local storage
• This deployment was cited as one of two in the deep dive presentation. But the first implementation of backup does not work acceptably for these deployments.

[Diagram: Nova (VM Instance, /dev/vda, Hypervisor, iSCSI Initiator) and Cinder (Specific Volume Manager) both connect to a Storage Backend (Storage Controller, iSCSI Target).]
Current Cinder Backup
1. Volume Driver fetches content.
2. Volume Driver puts the Backup Object as a client of Object Storage.
• Problem: this doubles network traffic.
– OK, compression reduces the second step.
– But even with 90% compression it would still be 1.1x the cost of transferring the data once.
• What we want is a direct transfer (3 on the diagram), which would match other Cinder backend actions.

[Diagram: Cinder's Specific Volume Manager directs a Storage Backend (Storage Controller, Block Target). Path 1 pulls data via a Block Initiator, path 2 pushes it to Swift (the Backup Target), path 3 is the desired direct backend-to-backup transfer.]
Ongoing Use of Volume with Concurrent Backup/Replication/Etc.
• The existing Cinder pattern for volume migration could be applied:
– Use of an override flag can enable long operations on an attached volume.
– This allows clients to continue using the volume while the backup/replication (or whatever) is in progress.

[Sequence diagram: Client, Volume Manager, Cinder, Storage Controller, Backup Target. Cinder takes a snapshot, replicates it to the Backup Target, and releases the snapshot, while client I/O and snap writes continue to be acknowledged throughout.]
Agenda
• Optional Cinder Enhancements
– Track Status Independent of the Volume Status
– Snapshot Replication
– Volume Driver Attributes
• Taskflow Usage
Volume Status alone blocks 24/7 Volumes
• The problem is that the Volume Status is set to Backing Up
– Or Migrating, or Replicating, etc.
• Other Cinder actions are blocked by this:
– At most one backup/migration/whatever can be in progress at a time.
– You cannot reassign a volume while it is being backed up.
• Proposed Solution: Use a different Status variable
– Allow Backends to modify the Task state, rather than Volume state.
• Backend must declare itself to be “stateless” for this method.
• Progress is reported via the Task state just as it would have been via the Volume state.
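A minimal sketch of the proposed split, assuming hypothetical `Volume` and `Task` objects (these class and field names are illustrative, not Cinder's actual models):

```python
# Illustrative sketch: routing progress to a per-task status instead of
# the shared Volume status when the driver has declared itself
# "stateless". All names here are hypothetical stand-ins.

class Volume:
    def __init__(self):
        self.status = "available"   # stays usable for stateless drivers

class Task:
    def __init__(self):
        self.status = "queued"

def start_backup(volume, task, driver_is_stateless):
    if driver_is_stateless:
        # Only the task state changes; the volume remains attachable,
        # and multiple long-running tasks can run concurrently.
        task.status = "backing-up"
    else:
        # Stateful drivers keep today's behavior: the volume is locked.
        volume.status = "backing-up"
        task.status = "backing-up"

v, t = Volume(), Task()
start_backup(v, t, driver_is_stateless=True)
print(v.status, t.status)   # available backing-up
```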
Impact of allowing Alternate Status
• First, it is optional.
– It allows implementations that can do long-term actions without restricting access to the Volume to do so.
– Stateful implementations are not required to change their code.
• If taking a snapshot is expensive, you don't want Cinder using this as a "shortcut".
• This is safe: there is no reliance on the end user knowing when to override.
• For "stateless" Volume Drivers:
– Cinder understands that launching long-term methods (such as backup or replicate) has no impact on the Volume itself.
– The action is actually being performed on a low-cost snapshot.
States of a Taskflow Using a Cinder Volume (such as Backup)
cinder.backup.manager
cinder.backup.flows.backup_volume_flow

from taskflow import states
...
transitions = FlowObjectsTransitions()
transitions.add_transition(volume, states.IN_PROGRESS, "BACKING-UP")
transitions.add_transition(volume, states.ERROR, "FAILED")
transitions.add_transition(volume, states.SUCCESS, "SUCCESS")
backup_flow = Flow("backup_flow_api", transitions)

https://review.openstack.org/#/c/54590/

[State diagram: In Progress (Backing-up) leads to either Error (Failed) or Success (Success).]
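`FlowObjectsTransitions` is a proposed class (review 54590), not an existing Taskflow API, so the slide's snippet is not runnable as-is. A self-contained stand-in for its intent, mapping generic flow states onto per-object status strings:

```python
# Stand-in for the proposed FlowObjectsTransitions class: a mapping from
# (object, taskflow state) to the status string to report. This is an
# illustration of the idea, not the proposed implementation.

IN_PROGRESS, ERROR, SUCCESS = "IN_PROGRESS", "ERROR", "SUCCESS"

class FlowObjectsTransitions:
    def __init__(self):
        self._map = {}

    def add_transition(self, obj, flow_state, object_status):
        # When the flow enters flow_state, report object_status for obj.
        self._map[(id(obj), flow_state)] = object_status

    def apply(self, obj, flow_state):
        return self._map.get((id(obj), flow_state))

volume = object()
transitions = FlowObjectsTransitions()
transitions.add_transition(volume, IN_PROGRESS, "BACKING-UP")
transitions.add_transition(volume, ERROR, "FAILED")
transitions.add_transition(volume, SUCCESS, "SUCCESS")
print(transitions.apply(volume, IN_PROGRESS))  # BACKING-UP
```

The point is that the flow owns the state machine and the volume (or task) only receives a derived status string, so the Volume's own status never has to change.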
Agenda
• Optional Cinder Enhancements
– Track Status Independent of the Volume Status
– Snapshot Replication
– Volume Driver Attributes
• Taskflow Usage
Proposed Method: Replicate Snapshot
• Why?
– Why not?
– For many/most implementations snapshots can be migrated.
– Certain tasks are simpler with snapshots: snapshots are not volatile.
• Method on an existing snapshot.
– Specifies a different backend as the target.
• Must be under the same Volume Driver.
• Snapshot formats are inherently vendor specific.
– Optionally suppresses incremental transfers, requiring a full copy from scratch.

[Diagram: Cinder's Specific Volume Manager issues "Replicate Snapshot X to Backend Y"; the Storage Controller holding Snapshot X transfers it to the Storage Controller of Storage Backend Y.]
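A sketch of what such a driver method could look like. The method name matches the proposal, but the signature, arguments, and return shape here are illustrative assumptions, not a defined Cinder interface:

```python
# Hypothetical driver-interface sketch of the proposed replicate-snapshot
# method. Signature and return value are illustrative only.

class StatelessVolumeDriver:
    def replicate_snapshot(self, snapshot_id, target_backend,
                           allow_incremental=True):
        """Replicate an existing snapshot to another backend under the
        same Volume Driver (snapshot formats are vendor specific).

        allow_incremental=False suppresses incremental transfer and
        forces a full copy from scratch.
        """
        mode = "incremental" if allow_incremental else "full"
        # A real driver would issue a vendor-specific transfer here,
        # e.g. a ZFS send/receive of the delta since the prior snapshot.
        return {"snapshot": snapshot_id, "target": target_backend,
                "mode": mode}

driver = StatelessVolumeDriver()
print(driver.replicate_snapshot("X", "backend-Y"))
```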
Replicating Snapshots differs from Volume Migration
• Replicates snapshot rather than a volume.
• Original snapshot is not deleted.
• Volume Drivers may use incremental transfer techniques.
– Such as ZFS incremental snapshots.

• Snapshots have vendor specific formats
– So method to replicate them is inherently vendor specific.
– This allows for vendor specific optimization beyond incremental
snapshots:
• Compression.
• Multi-path transfer.
Periodic Incremental Snapshots approaches Continuous Data Replication
• Replicate snapshot can provide Continuous Data Replication if:
– The Volume Driver supports incremental snapshots.
– The snapshots are performed quickly enough.
– Old snapshots are cleaned up automatically.
• The difference between "snapshots" and "remote mirroring" is more a matter of degree than a fundamental difference.
Benefits of Snapshot Replication
• Several tasks where Snapshot Replication helps:
– "Warm Standby" – a pool of servers synchronized at snapshot frequency.
– Enhanced deployment of VM boot images from a common master.
– Disaster Recovery.
– "Backup" to other servers.
– Volume migration.
– Check-in/Check-out of Volumes from a central storage server as a VM is deployed.
Replicated Snapshots are versatile
1. Restore a volume from a Snapshot where the snapshot was replicated.
– Fast restore of a volume, but not at the optimum location.
• Or:
1. Replicate the Snapshot to a preferred location.
2. And clone it there.

[Diagram: a Storage Backend holds Snapshot V.s3 and Volume V; the snapshot is replicated (1, 2) to the Preferred Location, where Volume V is cloned from Snapshot V.s3 (3).]
Other Issues

• Where does Storage Backend come from?
– At least two methods:
• From a backend_id in the DB, as suggested in Avishay’s Volume
Mirroring proposal.
• By querying the Volume Driver for a list of backends that it controls.

• Volume Driver and/or Backend is responsible for tracking
dependencies created by any incremental snapshot feature.
– The delta snapshot must be made a full snapshot before the
referenced prior snapshot can be deleted on a given server.
Agenda
• Optional Cinder Enhancements
– Track Status Independent of the Volume Status
– Snapshot Replication
– Volume Driver Attributes
• Taskflow Usage
Why Volume Driver Attributes
• We do not want to mandate that all snapshots be stateless.
– It’s relatively easy for copy-on-write systems, but not everyone is
copy-on-write.
• My philosophy for building consensus on open-source and standards:
– They should be flexible enough to allow my competition to be
stupid.
– Especially since they think what I’m doing is stupid.
• Volume Driver Attributes let vendor-neutral code decide what will
work well and what will not.
– Taking a snapshot does not optimize replication if it requires
making a copy of the data before making a copy of the data.
Proposed Attributes: Volume Driver Capabilities
• Problem: how to optimize long (bulk-data-intensive) operations on Cinder volumes.
– Vendor-specific algorithms are needed.
– But do we want to require every task to be implemented by each vendor?
• Proposal: Have each Volume Driver advertise when it has certain optional capabilities.
– If the capability is advertised, vendor-independent taskflow code can take advantage of it.
– One method can be useful for many taskflows.
• Publication of these attributes is optional.
– If you don't do X, you don't have to do anything to say you don't do X.
– If you have no optional capabilities then you don't have to say anything.
Suggested Implementation for Volume Driver attributes
• Suggestion: use Python decorators for capabilities.
• Included in source code of the Volume Driver.
– Already used by some Volume Drivers.
• Easily referenced in code.
• https://review.openstack.org/#/c/54803/
• https://blueprints.launchpad.net/cinder/+spec/backendactivity

cinder.volume.drivers.storwize_svc:

from cinder.volume import capabilities

class StorwizeSVCDriver(san.SanDriver):
    ...
    @capabilities.storage_assist
    def migrate_volume(self, ctxt, volume, host):
        ...

cinder.volume.manager:

from cinder import capabilities

class VolumeManager(manager.SchedulerDependentManager):
    ...
    @utils.require_driver_initialized
    def migrate_volume(self, ctxt, volume_id, host, force_host_copy=False):
        ...
        if capabilities.is_supported(self.driver.migrate_volume, 'storage_assist'):
            # Then check that the destination host is the same backend.
        elif capabilities.is_supported(self.driver.migrate_volume, 'local'):
            # Then check that the destination host is the same host.
        ...
Agenda
• Optional Cinder Enhancements
• Taskflow Usage
– Maintain Volume Pool / "Warm Standby"
– Optimized provisioning of master image
– Live Migration with Incremental Snapshots
– Apply policy for Snapshot Retention and Replication
– Check-in/Checkout Volume from Central Repository
Warm Standby – Before the Failover
1. Snapshot Volume V.
2. Fully replicate to the new Backend.
Periodically/Continuously:
3. Take a new snapshot.
4. Transfer the incremental snapshot to the standby Backend.
5. Apply the incremental snapshot to make a new full-image versioned snapshot.

[Diagram: the Backend currently hosting Volume V takes Snapshot V.s1 (1) and fully replicates it (2) to the Backend selected as standby for Volume V. Later, Snapshot V.s2 is taken (3), transferred incrementally (4), and applied on the standby (5).]
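The steps above can be sketched as a toy simulation. The `Backend` class and method shapes are hypothetical; real drivers would do vendor-specific transfers at steps 2 and 4:

```python
# Toy model of the warm-standby cycle. Backends are just named snapshot
# lists here; the real work (full and incremental transfers) is vendor
# specific and only indicated in comments.

class Backend:
    def __init__(self, name):
        self.name = name
        self.snapshots = []

def warm_standby_cycle(primary, standby, n_cycles):
    primary.snapshots.append("V.s1")     # 1. snapshot Volume V
    standby.snapshots.append("V.s1")     # 2. full replication to standby
    for i in range(2, n_cycles + 2):
        snap = f"V.s{i}"
        primary.snapshots.append(snap)   # 3. take a new snapshot
        # 4-5. ship only the delta since the previous snapshot and apply
        # it on the standby, which then holds a full versioned image.
        standby.snapshots.append(snap)
    return standby.snapshots

print(warm_standby_cycle(Backend("A"), Backend("B"), 2))
# ['V.s1', 'V.s2', 'V.s3']
```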
Failing Over to the Warm Standby
1. Current host fails.
2. Clone new Volume V from Snapshot.
3. Select new standby target and repeat prior slide.

[Diagram: the Backend currently hosting Volume V fails (1); on the Backend selected as standby, a new Volume V is cloned from Snapshot V.s1 (2).]
Adjacent to proposed Volume Mirroring solution
• Not fully overlapping, but frequently taken snapshots replicated incrementally begin to resemble Volume Mirroring.
– It cannot match Volume Mirroring's near-instant relay of transactions.
– But it consumes far less network bandwidth, especially at peak.
– It is more flexible operationally: there is no need to set up one-to-one mirror relationships in Cinder.
• We can offer both solutions and let end users decide which is best for their needs.
Agenda
• Optional Cinder Enhancements
• Taskflow Usage
– Maintain Volume Pool / "Warm Standby"
– Optimized provisioning of master image
– Live Migration with Incremental Snapshots
– Apply policy for Snapshot Retention and Replication
– Check-in/Checkout Volume from Central Repository
Proposed New Taskflow:
Provision boot volume with minimal overhead
• Optimize for common boot images that provide a common OS for many VMs.
• Creating two VMs from the same template should not require 2x the bandwidth.

[Diagram: Glance holds Volume Template VT; on the Cinder Backend, Volume V1 and Volume V2 are both based on Template VT.]
Snapshot-optimized Image Provisioning – 1
1. Use Glance to create a reference image.
– Already adapted for a specific deployment format.
2. Take a snapshot of that volume.
3. Clone additional targets from that snapshot.
4. Repeat as more VMs from the same template are launched.

[Diagram: Glance (Volume Template VT) seeds Volume V1 on the Cinder Backend; Snapshot V-Prime is taken from the initial Volume V1 (2), and additional volumes based on Template VT are cloned from it (3, 4).]
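The flow above can be expressed as a short sketch: one bulk pull from Glance, then every further VM volume is a cheap clone. Function and volume names are illustrative, not an actual Cinder/Glance API:

```python
# Illustrative sketch of snapshot-optimized provisioning: one template
# pull from Glance, one snapshot, then clones per VM instead of repeated
# bulk transfers. All names are hypothetical.

def provision_boot_volumes(n_vms):
    transfers = []
    transfers.append(("glance-pull", "VT", "V1"))    # 1. reference image
    transfers.append(("snapshot", "V1", "V-Prime"))  # 2. snapshot it
    for i in range(2, n_vms + 1):                    # 3-4. clone per VM
        transfers.append(("clone", "V-Prime", f"V{i}"))
    return transfers

# Three VMs cost one bulk transfer plus cheap snapshot/clone operations.
print(provision_boot_volumes(3))
```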
Agenda
• Optional Cinder Enhancements
• Taskflow Usage
– Maintain Volume Pool / "Warm Standby"
– Optimized provisioning of master image
– Live Migration with Incremental Snapshots
– Apply policy for Snapshot Retention and Replication
– Check-in/Checkout Volume from Central Repository
Add live migration using incremental snapshots
• This is essentially how Hypervisors live-migrate VMs:
– Loop:
• Make incremental snapshot.
• If empty, break.
• Send incremental snapshot to destination.
– De-activate source volume.
– Clone volume at destination from snapshots.
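The convergence loop above, as a runnable stand-in. The list of delta sizes is a toy model of ongoing writes; the function name is hypothetical:

```python
# Stand-in for the live-migration loop: keep shipping incremental
# snapshots until a delta comes back empty, then cut over (de-activate
# the source, clone at the destination).

def live_migrate(pending_deltas):
    """pending_deltas: successive incremental-snapshot sizes; migration
    converges when a delta is empty (0)."""
    shipped = []
    for delta in pending_deltas:
        if delta == 0:          # empty incremental snapshot
            break               # -> de-activate source, clone destination
        shipped.append(delta)   # send incremental snapshot to destination
    return shipped

print(live_migrate([100, 12, 3, 0]))  # [100, 12, 3]
```

In practice convergence depends on the write rate staying below the transfer rate, just as with hypervisor memory pre-copy.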
Agenda
• Optional Cinder Enhancements
• Taskflow Usage
– Maintain Volume Pool / "Warm Standby"
– Optimized provisioning of master image
– Live Migration with Incremental Snapshots
– Apply policy for Snapshot Retention and Replication
– Check-in/Checkout Volume
Possible Taskflow: Manage retention/replication of snapshots
• Set a policy for retention of snapshots:
– Frequency for taking snapshots.
– Which snapshots to retain.
• Automatically replicate some snapshots to other backend targets.
• Back up some to Object Storage.
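One possible shape for such a policy, as a sketch. The policy knobs (keep the last N, replicate every Mth, back up every Kth) are illustrative assumptions, not a proposed Cinder API:

```python
# Hypothetical retention/replication policy: keep the last N snapshots
# locally, replicate every Mth to another backend, back up every Kth to
# object storage, and delete the rest. The policy shape is illustrative.

def apply_retention(snapshots, keep_last=3, replicate_every=2,
                    backup_every=4):
    keep = snapshots[-keep_last:]
    replicate = [s for i, s in enumerate(snapshots, 1)
                 if i % replicate_every == 0]
    backup = [s for i, s in enumerate(snapshots, 1)
              if i % backup_every == 0]
    delete = [s for s in snapshots if s not in keep]
    return {"keep": keep, "replicate": replicate,
            "backup": backup, "delete": delete}

snaps = [f"V.s{i}" for i in range(1, 6)]
print(apply_retention(snaps))
```

A real taskflow would also have to honor incremental-snapshot dependencies before deleting (see "Other Issues" above): a delta snapshot must be made full before its referenced prior snapshot can be removed.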
Agenda
• Optional Cinder Enhancements
• Taskflow Usage
– Maintain Volume Pool / "Warm Standby"
– Optimized provisioning of master image
– Live Migration with Incremental Snapshots
– Apply policy for Snapshot Retention and Replication
– Check-in/Checkout Volume
Possible taskflow: Check-out and Check-in of Volumes
• Use Case: Persistent disk images for intermittent compute jobs.
– Example: a non-continuous compute job needs its disk image near it whenever it is launched.
– Example: a VDI desktop needs access to a persistent disk image.
• This is especially useful when this is a thin image that relies on the central image for blocks not yet altered or referenced.
– Periodically snapshot and post the delta to the central repository.
– Check-in when done, with a final snapshot.
– Then delete the remote volume and change the status in the central repository to allow a new check-out.
Steps for Check-out/Check-in
1. Snapshot the Volume being checked out.
2. Replicate the Snapshot to a host-adjacent (or co-located) backend.
3. De-activate the Volume on the Master.
4. Clone the Volume on the host-adjacent backend.
5. Periodically snapshot the Volume on the host-adjacent backend.
6. Replicate those Snapshots to the Master storage backend.
7. Snapshot a final time on the host-adjacent backend.
8. Replicate to the Master.
9. Remove the volume on the host-adjacent backend.
10. Clone a new volume on the Master backend from the final snapshot.

[Diagram: the Master Storage Backend holds Volume V and Snapshots V.s1 through V.s5; the Host-Adjacent Storage Backend holds the checked-out Volume V and its snapshots, with the numbered steps showing snapshots flowing back to the Master until the final clone.]
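The ten steps above can be written down as an ordered action list that a taskflow could execute. The action tuples and function name are hypothetical stand-ins, not a defined interface:

```python
# Hypothetical ordering of the check-out/check-in steps as (action,
# location, object) tuples a taskflow could run in sequence.

def checkout_checkin_steps(volume="V"):
    return [
        ("snapshot",   "master",           f"{volume}.s1"),       # 1
        ("replicate",  "master->adjacent", f"{volume}.s1"),       # 2
        ("deactivate", "master",           volume),               # 3
        ("clone",      "adjacent",         volume),               # 4
        ("snapshot",   "adjacent",         f"{volume}.s2"),       # 5 (periodic)
        ("replicate",  "adjacent->master", f"{volume}.s2"),       # 6
        ("snapshot",   "adjacent",         f"{volume}.s-final"),  # 7
        ("replicate",  "adjacent->master", f"{volume}.s-final"),  # 8
        ("remove",     "adjacent",         volume),               # 9
        ("clone",      "master",           volume),               # 10
    ]

for step in checkout_checkin_steps():
    print(step)
```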
Summary
• Taskflow can automate several Cinder-related tasks.
– This logic can be vendor neutral.
• But to do so efficiently it needs a handful of Cinder enhancements:
– Optional separation from Volume Status for long-running activities.
– Snapshot Replication.
– Volume Driver Attributes.
• Wiki: https://wiki.openstack.org/wiki/CERSS
• Questions?
– Caitlin.Bestler@nexenta.com, irc: caitlin56.
– Victor.Rodionov@nexenta.com, irc: vito-ordaz.

New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 

Recently uploaded (20)

Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 

5. When Cinder manages non-local storage
• This deployment was cited as one of two in the deep dive presentation.
• But the first implementation of backup does not work acceptably for these deployments.
[Diagram: Nova → Cinder → Specific Volume Manager; VM Instance → /dev/vda → Hypervisor iSCSI Initiator → Storage Backend (Storage Controller, iSCSI Target)]
6. Current Cinder Backup
1. Volume Driver fetches content.
2. Volume Driver puts the Backup Object as a client of Object Storage.
• Problem: this doubles network traffic.
– OK, compression reduces the second step.
– But even with 90% compression it would still be 1.1x just transferring the data.
• What we want is a direct transfer (3 on the diagram), which would match other Cinder backend actions.
[Diagram: Cinder → Specific Volume Manager; (1) fetch from Storage Backend (Storage Controller, Block Target) via Block Initiator; (2) put to Swift Backup Target; (3) desired direct backend-to-backup-target transfer]
7. Ongoing Use of a Volume with Concurrent Backup/Replication/Etc.
• The existing Cinder pattern for volume migration could be applied:
– Use of an override flag can enable long operations on an attached volume.
– Allows clients to continue to use the volume while the backup/replication (or whatever) is in progress.
[Sequence diagram: Client ↔ Cinder Volume Manager ↔ Storage Controller ↔ Backup Target — Snapshot/Replicate, then interleaved Write I/O and Ack I/O while replication proceeds, then Release Snapshot]
8. Agenda
• Optional Cinder Enhancements
– Track Status Independent of the Volume Status
– Snapshot Replication
– Volume Driver Attributes
• Taskflow Usage
9. Volume Status alone blocks 24/7 Volumes
• The problem is that the Volume Status is set to Backing Up
– Or Migrating, or Replicating, etc.
• Other Cinder actions are blocked by this:
– At most one backup/migration/whatever can be in progress at a time.
– You cannot reassign a volume while it is being backed up.
• Proposed Solution: Use a different status variable
– Allow backends to modify the Task state rather than the Volume state.
• The backend must declare itself to be “stateless” for this method.
• Progress is reported via the Task state just as it would have been via the Volume state.
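The separation described above could be sketched as follows. This is a hypothetical illustration, not Cinder code: the class names, `stateless` flag, and state strings are all invented for the example.

```python
# Hypothetical sketch: a long-running task tracks its own state when the
# driver declares stateless snapshots, so the volume status never changes.
# None of these names are real Cinder APIs.

class Volume:
    def __init__(self, vol_id):
        self.id = vol_id
        self.status = "available"

class BackupTask:
    """Carries its own state so a stateless backend leaves the volume usable."""
    def __init__(self, volume, driver_is_stateless):
        self.volume = volume
        self.stateless = driver_is_stateless
        self.state = "queued"

    def start(self):
        if self.stateless:
            self.state = "backing-up"          # only the task state changes
        else:
            self.volume.status = "backing-up"  # legacy: the volume is blocked

    def finish(self):
        self.state = "success"
        if not self.stateless:
            self.volume.status = "available"

vol = Volume("v1")
task = BackupTask(vol, driver_is_stateless=True)
task.start()
# The volume remains "available" for attach/detach while the backup runs.
```

With `driver_is_stateless=False` the same task would flip the volume status, reproducing the blocking behavior the slide describes.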
10. Impact of allowing an Alternate Status
• First, it is optional.
– It allows implementations that can do long-term actions without restricting access to the Volume to do so.
– Stateful implementations are not required to change their code.
• If taking a snapshot is expensive, you don’t want Cinder using this as a “shortcut”.
• This is safe. There is no reliance on the end user knowing when to override.
• For “stateless” Volume Drivers:
– Cinder understands that launching long-term methods (such as backup or replicate) has no impact on the Volume itself.
– The action is actually being performed on a low-cost snapshot.
11. States of a Taskflow Using a Cinder Volume (such as Backup)

cinder.backup.manager / cinder.backup.flows.backup_volume_flow:

    from taskflow import states
    ...
    transitions = FlowObjectsTransitions()
    transitions.add_transition(volume, states.IN_PROGRESS, "BACKING-UP")
    transitions.add_transition(volume, states.ERROR, "FAILED")
    transitions.add_transition(volume, states.SUCCESS, "SUCCESS")
    backup_flow = Flow("backup_flow_api", transitions)

https://review.openstack.org/#/c/54590/
[State diagram: In Progress (Backing-up) → Error (Failed) | Success (Success)]
12. Agenda
• Optional Cinder Enhancements
– Track Status Independent of the Volume Status
– Snapshot Replication
– Volume Driver Attributes
• Taskflow Usage
13. Proposed Method: Replicate Snapshot
• Why?
– Why not?
– For many/most implementations snapshots can be migrated.
– Certain tasks are simpler with snapshots.
• Snapshots are not volatile.
• Method on an existing snapshot.
– Specifies a different backend as the target.
• Must be under the same Volume Driver.
• Snapshot formats are inherently vendor-specific.
– Optionally suppresses incremental transfers, requiring a full copy from scratch.
[Diagram: Cinder → Specific Volume Manager: “Replicate Snapshot X to Backend Y”; Storage Backend holding Snapshot X → Storage Backend Y, each with its own Storage Controller]
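The proposed method could look roughly like the sketch below. The class, method signature, and in-memory backend model are all illustrative assumptions; the deck does not specify the actual interface.

```python
# Hypothetical sketch of the proposed optional driver method.
# The signature and names are illustrative, not a real Cinder interface.

class SnapshotReplicatingDriver:
    def __init__(self):
        # backend name -> {snapshot_id: payload}; a toy model of storage
        self.backends = {"backend-A": {}, "backend-B": {}}

    def create_snapshot(self, backend, snap_id, payload):
        self.backends[backend][snap_id] = payload

    def replicate_snapshot(self, snap_id, src, dst, allow_incremental=True):
        """Copy a snapshot to another backend under the same Volume Driver.

        The original snapshot is left in place. With allow_incremental=False
        a full copy from scratch would be forced (not modeled here).
        """
        if dst not in self.backends:
            # Snapshot formats are vendor specific, so the target must be
            # a backend controlled by this same driver.
            raise ValueError("target must be a backend of the same driver")
        self.backends[dst][snap_id] = self.backends[src][snap_id]
        return snap_id

drv = SnapshotReplicatingDriver()
drv.create_snapshot("backend-A", "V.s1", b"block-data")
drv.replicate_snapshot("V.s1", "backend-A", "backend-B")
```

Note the contrast with volume migration on the next slide: the source snapshot survives the operation.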
14. Replicating Snapshots differs from Volume Migration
• Replicates a snapshot rather than a volume.
• The original snapshot is not deleted.
• Volume Drivers may use incremental transfer techniques.
– Such as ZFS incremental snapshots.
• Snapshots have vendor-specific formats.
– So the method to replicate them is inherently vendor-specific.
– This allows for vendor-specific optimization beyond incremental snapshots:
• Compression.
• Multi-path transfer.
15. Periodic Incremental Snapshots approach Continuous Data Replication
• Replicate Snapshot can provide Continuous Data Replication if:
– The Volume Driver supports incremental snapshots.
– The snapshots are performed quickly enough.
– Old snapshots are cleaned up automatically.
• The difference between “snapshots” and “remote mirroring” is more a matter of degree than a fundamental difference.
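The incremental idea can be shown with a toy block model: each cycle ships only the blocks changed since the previous snapshot. The helper names are invented for illustration; a real driver would use something like ZFS incremental send.

```python
# Illustrative model of periodic incremental replication: each cycle
# transfers only the delta since the previous snapshot.

def take_snapshot(volume_blocks):
    # A point-in-time copy; cheap in copy-on-write systems.
    return dict(volume_blocks)

def incremental_diff(prev_snap, new_snap):
    # Only the blocks that changed since prev_snap need to travel.
    return {k: v for k, v in new_snap.items() if prev_snap.get(k) != v}

def apply_increment(replica, delta):
    replica.update(delta)

volume = {0: "a", 1: "b"}
snap1 = take_snapshot(volume)
replica = dict(snap1)                   # initial full replication

volume[1] = "B"                         # the volume keeps changing
volume[2] = "c"
snap2 = take_snapshot(volume)
delta = incremental_diff(snap1, snap2)  # only 2 of 3 blocks are shipped
apply_increment(replica, delta)
```

Shrinking the interval between cycles shrinks each delta, which is why frequent incremental snapshots begin to resemble remote mirroring.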
16. Benefits of Snapshot Replication
• Several tasks where Snapshot Replication helps:
– “Warm Standby” – a pool of servers synchronized at the snapshot frequency.
– Enhanced deployment of VM boot images from a common master.
– Disaster Recovery.
– “Backup” to other servers.
– Volume migration.
– Check-in/Check-out of Volumes from a central storage server as a VM is deployed.
17. Replicated Snapshots are versatile
1. Restore a volume from a Snapshot where the snapshot was replicated.
– Fast restore of a volume, but not at the optimum location.
• Or:
1. Replicate the Snapshot to a preferred location.
2. And clone it there.
[Diagram: Storage Backend holding Snapshot (Snapshot V.s3, Volume V) → Preferred Location (Snapshot V.s3 cloned to Volume V), steps 1–3 annotated]
18. Other Issues
• Where does the Storage Backend come from?
– At least two methods:
• From a backend_id in the DB, as suggested in Avishay’s Volume Mirroring proposal.
• By querying the Volume Driver for a list of backends that it controls.
• The Volume Driver and/or Backend is responsible for tracking dependencies created by any incremental snapshot feature.
– A delta snapshot must be made a full snapshot before the referenced prior snapshot can be deleted on a given server.
19. Agenda
• Optional Cinder Enhancements
– Track Status Independent of the Volume Status
– Snapshot Replication
– Volume Driver Attributes
• Taskflow Usage
20. Why Volume Driver Attributes
• We do not want to mandate that all snapshots be stateless.
– It’s relatively easy for copy-on-write systems, but not everyone is copy-on-write.
• My philosophy for building consensus on open source and standards:
– They should be flexible enough to allow my competition to be stupid.
– Especially since they think what I’m doing is stupid.
• Volume Driver Attributes let vendor-neutral code decide what will work well and what will not.
– Taking a snapshot does not optimize replication if it requires making a copy of the data before making a copy of the data.
21. Proposed Attributes: Volume Driver Capabilities
• Problem: how to optimize long (bulk-data-intensive) operations on Cinder volumes.
– Vendor-specific algorithms are needed.
– But do we want to require every task to be implemented by each vendor?
• Proposal: have each Volume Driver advertise when it has certain optional capabilities.
– If the capability is advertised, vendor-independent taskflow code can take advantage of it.
– One method can be useful for many taskflows.
• Publication of these attributes is optional.
– If you don’t do X you don’t have to do anything to say you don’t do X.
– If you have no optional capabilities then you don’t have to say anything.
22. Suggested Implementation for Volume Driver Attributes
• Suggestion: use Python capabilities decorators.
• Included in the source code of the Volume Driver.
– Already used by some Volume Drivers.
• Easily referenced in code.
• https://review.openstack.org/#/c/54803/
• https://blueprints.launchpad.net/cinder/+spec/backendactivity

cinder.volume.drivers.storwize_svc:

    from cinder.volume import capabilities

    class StorwizeSVCDriver(san.SanDriver):
        ...
        @capabilities.storage_assist
        def migrate_volume(self, ctxt, volume, host):
            ...

cinder.volume.manager:

    from cinder import capabilities

    class VolumeManager(manager.SchedulerDependentManager):
        ...
        @utils.require_driver_initialized
        def migrate_volume(self, ctxt, volume_id, host,
                           force_host_copy=False):
            ...
            if capabilities.is_supported(self.driver.migrate_volume,
                                         'storage_assist'):
                # Then check that the destination host is the same backend.
            elif capabilities.is_supported(self.driver.migrate_volume,
                                           'local'):
                # Then check that the destination host is the same host.
            ...
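The decorator-based scheme on this slide could plausibly be implemented as below. This is a minimal sketch of the idea, not the code under review: the decorator tags the function object, and `is_supported()` checks the tag (bound methods forward attribute lookup to the underlying function, so the check works on `driver.migrate_volume` too).

```python
# Minimal sketch of a capabilities module: a decorator marks a driver
# method, and is_supported() inspects the mark. Names are illustrative.

def _capability(name):
    def decorator(func):
        caps = getattr(func, "_capabilities", set())
        func._capabilities = caps | {name}
        return func
    return decorator

storage_assist = _capability("storage_assist")
local = _capability("local")

def is_supported(func, capability):
    # Works for both plain functions and bound methods, since bound
    # methods delegate attribute access to the wrapped function.
    return capability in getattr(func, "_capabilities", set())

class ExampleDriver:
    @storage_assist
    def migrate_volume(self, ctxt, volume, host):
        pass

    def backup_volume(self, ctxt, volume):
        pass   # no optional capabilities advertised
```

A driver that advertises nothing needs no changes at all, which matches the opt-in design on the previous slide.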
23. Agenda
• Optional Cinder Enhancements
• Taskflow Usage
– Maintain Volume Pool / “Warm Standby”
– Optimized provisioning of master image
– Live Migration with Incremental Snapshots
– Apply policy for Snapshot Retention and Replication
– Check-in/Checkout Volume from Central Repository
24. Warm Standby – Before the Failover
1. Snapshot Volume V.
2. Fully replicate to the new Backend.
Periodically/Continuously:
3. Take a new snapshot.
4. Transfer the incremental snapshot to the standby Backend.
5. Apply the incremental snapshot to make a new full-image versioned snapshot.
[Diagram: Backend currently hosting Volume V (Volume V, Snapshots V.s1, V.s2) → Backend selected as standby for Volume V (Snapshots V.s1, V.s2), steps 1–5 annotated]
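The periodic cycle above can be sketched with a toy model; the standby keeps versioned full-image snapshots while the primary stays writable. All names and the dict-based block model are assumptions for illustration.

```python
# Hypothetical sketch of the warm-standby cycle: one full replication,
# then repeated incremental transfers merged into versioned snapshots.

def sync_cycle(volume, standby_snaps, last_snap):
    """Take a new snapshot and ship only the delta to the standby."""
    new_snap = dict(volume)                      # step 3: new snapshot
    delta = {k: v for k, v in new_snap.items()
             if last_snap.get(k) != v}           # step 4: incremental transfer
    merged = dict(standby_snaps[-1])
    merged.update(delta)                         # step 5: full-image version
    standby_snaps.append(merged)
    return new_snap

volume = {"blk0": 1, "blk1": 2}
snap1 = dict(volume)              # steps 1-2: snapshot + full replication
standby = [dict(snap1)]

volume["blk1"] = 20               # writes continue on the primary
latest = sync_cycle(volume, standby, snap1)
```

On failover (next slide), the standby simply clones a new volume from `standby[-1]`.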
25. Failing Over to the Warm Standby
1. The current host fails.
2. Clone a new Volume V from the Snapshot.
3. Select a new standby target and repeat the prior slide.
[Diagram: failed Backend currently hosting Volume V; Backend selected as standby clones a new Volume V from Snapshot V.s1]
26. Adjacent to the proposed Volume Mirroring solution
• Not fully overlapping, but frequently taken snapshots replicated incrementally begin to resemble Volume Mirroring.
– It cannot match Volume Mirroring with near-instant relay of transactions.
– But it consumes far fewer network resources, especially peak network resources.
– It is more flexible operationally. There is no need to set up one-to-one mirror relationships in Cinder.
• We can offer both solutions and let end users decide which is best for their needs.
27. Agenda
• Optional Cinder Enhancements
• Taskflow Usage
– Maintain Volume Pool / “Warm Standby”
– Optimized provisioning of master image
– Live Migration with Incremental Snapshots
– Apply policy for Snapshot Retention and Replication
– Check-in/Checkout Volume from Central Repository
28. Proposed New Taskflow: Provision boot volume with minimal overhead
• Optimize for common boot images that provide a common OS for many VMs.
• Creating two VMs from the same template should not require 2x the bandwidth.
[Diagram: Glance (Volume Template VT) → Cinder Backend (Volume V1 based on Template VT, Volume V2 based on Template VT)]
29. Snapshot-optimized Image Provisioning – 1
1. Use Glance to create a reference image
– already adapted for a specific deployment format.
2. Take a snapshot of that volume.
3. Clone additional targets from that snapshot.
4. Repeat as more VMs from the same template are launched.
[Diagram: Glance (Volume Template VT) → Cinder Backend: Volume V1 based on Template VT, Snapshot V-Prime based on initial Volume V1, additional Volumes V2… cloned in steps 2–4]
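The savings can be made concrete with a toy backend that counts bytes transferred: the template image crosses the network once, and each further volume is a copy-on-write clone of the snapshot. The `Backend` class and its methods are invented for this sketch.

```python
# Illustrative sketch of snapshot-optimized provisioning: the master
# image is transferred once, then clones cost no additional bandwidth.

class Backend:
    def __init__(self):
        self.transfer_bytes = 0   # network cost incurred so far
        self.snapshots = {}
        self.volumes = {}

    def import_image(self, name, image):
        self.transfer_bytes += len(image)   # paid once, for the template
        self.volumes[name] = image

    def snapshot(self, vol_name, snap_name):
        self.snapshots[snap_name] = self.volumes[vol_name]

    def clone_from_snapshot(self, snap_name, vol_name):
        # A COW clone: no bulk data moves over the network.
        self.volumes[vol_name] = self.snapshots[snap_name]

backend = Backend()
backend.import_image("V1", b"x" * 1024)       # step 1: Glance reference image
backend.snapshot("V1", "V-Prime")             # step 2: snapshot it
backend.clone_from_snapshot("V-Prime", "V2")  # step 3: clone a target
backend.clone_from_snapshot("V-Prime", "V3")  # step 4: repeat per VM
```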
30. Agenda
• Optional Cinder Enhancements
• Taskflow Usage
– Maintain Volume Pool / “Warm Standby”
– Optimized provisioning of master image
– Live Migration with Incremental Snapshots
– Apply policy for Snapshot Retention and Replication
– Check-in/Checkout Volume from Central Repository
31. Add live migration using incremental snapshots
• This is essentially how Hypervisors live-migrate VMs:
– Loop:
• Make an incremental snapshot.
• If empty – break.
• Send the incremental snapshot to the destination.
– De-activate the source volume.
– Clone the volume at the destination from the snapshots.
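The loop on this slide translates almost directly into code. Below is a hypothetical sketch over a toy block model; real drivers would ship vendor-specific incremental snapshots rather than dict deltas, and concurrent writes (which keep the loop running longer) are not simulated here.

```python
# Direct sketch of the slide's loop: ship deltas until one comes back
# empty, then de-activate the source and clone at the destination.

def live_migrate(source, destination):
    prev = {}
    while True:
        snap = dict(source["blocks"])        # make an incremental snapshot
        delta = {k: v for k, v in snap.items() if prev.get(k) != v}
        if not delta:                        # if empty: break
            break
        destination.update(delta)            # send increment to destination
        prev = snap
    source["active"] = False                 # de-activate the source volume
    return dict(destination)                 # clone volume at destination

src = {"blocks": {0: "a", 1: "b"}, "active": True}
dst = {}
clone = live_migrate(src, dst)
```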
32. Agenda
• Optional Cinder Enhancements
• Taskflow Usage
– Maintain Volume Pool / “Warm Standby”
– Optimized provisioning of master image
– Live Migration with Incremental Snapshots
– Apply policy for Snapshot Retention and Replication
– Check-in/Checkout Volume
33. Possible Taskflow: Manage retention/replication of snapshots
• Set a policy for retention of snapshots:
– Frequency for taking snapshots.
– Which snapshots to retain.
• Automatically replicate some snapshots to other backend targets.
• Back up some to Object Storage.
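One simple retention policy of the kind the slide envisions could look like this: keep the N most recent snapshots, plus the first snapshot of each day within a window. The policy, function, and parameters are all hypothetical; the deck does not prescribe any specific scheme.

```python
# Hypothetical sketch of a snapshot-retention policy: keep the most
# recent snapshots plus one snapshot per day inside a retention window.

from datetime import datetime, timedelta

def apply_retention(snapshots, now, keep_recent=3, keep_days=7):
    """Return the ids of the snapshots to retain under a simple policy."""
    ordered = sorted(snapshots, key=lambda s: s["taken"], reverse=True)
    keep = set(s["id"] for s in ordered[:keep_recent])   # N most recent
    cutoff = now - timedelta(days=keep_days)
    seen_days = set()
    for s in ordered:
        day = s["taken"].date()
        if s["taken"] >= cutoff and day not in seen_days:
            keep.add(s["id"])          # newest snapshot seen for each day
            seen_days.add(day)
    return keep

now = datetime(2013, 11, 8, 12, 0)
# Ten snapshots, six hours apart, newest first by id.
snaps = [{"id": i, "taken": now - timedelta(hours=6 * i)} for i in range(10)]
retained = apply_retention(snaps, now)
```

A taskflow applying such a policy would then delete the non-retained snapshots and trigger Replicate Snapshot or an object-store backup for a retained subset.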
34. Agenda
• Optional Cinder Enhancements
• Taskflow Usage
– Maintain Volume Pool / “Warm Standby”
– Optimized provisioning of master image
– Live Migration with Incremental Snapshots
– Apply policy for Snapshot Retention and Replication
– Check-in/Checkout Volume
35. Possible taskflow: Check-out and Check-in of Volumes
• Use Case: persistent disk images for intermittent compute jobs.
– Example: a non-continuous compute job needs a disk image near it whenever it is launched.
– Example: a VDI desktop needs access to a persistent disk image.
• This is especially useful when this is a thin image that relies on the central image for blocks not yet altered or referenced.
– Periodically snapshot and post the delta to the central repository.
– Check in when done, with a final snapshot.
– Then delete the remote volume and change the status in the central repository to allow a new check-out.
36. Steps for Check-out/Check-in
1. Snapshot the Volume being checked out.
2. Replicate the Snapshot to a host-adjacent (or co-located) backend.
3. De-activate the Volume on the Master.
4. Clone the Volume on the host-adjacent backend.
5. Periodically snapshot the Volume on the host-adjacent backend.
6. Replicate those Snapshots to the Master storage backend.
7. Snapshot a final time on the Storage Backend.
8. Replicate to the Master.
9. Remove the volume on the host-adjacent backend.
10. Clone a new volume on the master backend from the final snapshot.
[Diagram: Host-Adjacent Storage Backend (Volume V, Snapshots V.s3–V.s5) ↔ Master Storage Backend (Volume V, Snapshots V.s1–V.s5), steps 1–10 annotated]
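The ten steps above can be walked through over two toy backends. Everything here is an illustrative assumption; the dict-based backends and `replicate` helper do not correspond to any real API.

```python
# The check-out/check-in sequence, sketched over two toy backends.

def replicate(snap, src, dst):
    # Ship a snapshot between backends (steps 2, 6, 8).
    dst["snaps"][snap] = dict(src["snaps"][snap])

master = {"snaps": {}, "volume": {"blk": 1}, "active": True}
adjacent = {"snaps": {}, "volume": None}

master["snaps"]["V.s1"] = dict(master["volume"])      # 1. snapshot
replicate("V.s1", master, adjacent)                   # 2. replicate out
master["active"] = False                              # 3. de-activate master
adjacent["volume"] = dict(adjacent["snaps"]["V.s1"])  # 4. clone near the host

adjacent["volume"]["blk"] = 2                         # the VM works on it
adjacent["snaps"]["V.s2"] = dict(adjacent["volume"])  # 5. periodic snapshot
replicate("V.s2", adjacent, master)                   # 6. ship back to master

adjacent["snaps"]["V.s3"] = dict(adjacent["volume"])  # 7. final snapshot
replicate("V.s3", adjacent, master)                   # 8. replicate final
adjacent["volume"] = None                             # 9. remove remote volume
master["volume"] = dict(master["snaps"]["V.s3"])      # 10. clone on the master
master["active"] = True                               # ready for a new check-out
```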
37. Summary
• Taskflow can automate several Cinder-related tasks.
– This logic can be vendor-neutral.
• But to do so efficiently it needs a handful of Cinder enhancements:
– Optional separation from Volume Status for long-running activities.
– Snapshot Replication.
– Volume Driver Attributes.
• Wiki: https://wiki.openstack.org/wiki/CERSS
• Questions?
– Caitlin.Bestler@nexenta.com, irc: caitlin56.
– Victor.Rodionov@nexenta.com, irc: vito-ordaz.