One of the requirements for mission-critical systems is reliable volume backup without impacting the running system. The recommended way to take a Cinder backup is to unmount the volume first, so as to avoid a merely crash-consistent backup. Unmounting is intrusive by nature and may not be feasible for mission-critical systems.
This presentation focuses on a strategy for achieving non-intrusive Cinder backup. It was given at the OpenStack Summit in Sydney on 06 Nov 2017.
https://www.openstack.org/videos/sydney-2017/truly-non-intrusive-openstack-cinder-backup-for-mission-critical-systems
Truly non-intrusive OpenStack Cinder backup for mission critical systems
1. Truly non-intrusive Cinder backup for
mission critical systems
Lightning talk by
Dipak Kumar Singh &
Deepak Gupta
on 06 Nov 2017 at OpenStack Summit, Sydney, Australia
2. Table of Contents
Challenges of Backing Up a Live System
Example of a sanity error in data, implications of the OS buffer cache on
sanity at crash or during live backup, etc.
Current Approach & Proposed Approach
The current approach, the idea on which the proposed solution is based, etc.
Proposed Solution & POC
Design information for the POC, results of the POC's validation, known
limitations, next steps, etc.
Appendix
Common Questions, References, Experimental Data, etc.
3. Background
▌Reliability of data and its availability are key requirements for
mission-critical systems.
▌OpenStack ensures data availability by keeping multiple copies of
data on storage nodes. It also facilitates backup for Disaster
Recovery.
▌However, when it comes to Point-in-Time backup of a live system,
OpenStack relies on volume snapshots and VM pause. Those
solutions are not foolproof, because the impact of the OS buffer
cache is not accounted for; and file-system journaling & fsck do
not always work.
▌This presentation discusses the problem scenarios and their
probable solutions for truly non-intrusive backup in OpenStack.
5. Simple example of Sanity of Data (1/3)
Let's first understand sanity of data in the context of backup.
During a backup process, an application makes the two changes shown below.

State0 (initially): PatientsRecord.txt (200 bytes) holds the patient records,
including Rahul (Id: 11755, Blood Test Status: Wait, Blood Test File: none)
and Vicky (Id: 11755).
First change, at T1 (State1): a new file Test_79.pdf (200 bytes) is created;
Rahul's Blood Test Status reads NA.
Second change, at T2 (State2): two lines of PatientsRecord.txt are modified,
so that Rahul's Blood Test Status becomes OK and his Blood Test File points
to Test_79.pdf; the file grows to 209 bytes.

Note that one of the changes at T2 points to the file created at T1.
6. Simple example of Sanity of Data (2/3)
A total of three states (State0 initially, State1 at T1, State2 at T2,
exactly as on the previous slide) are created by the application during the
backup process.
A data restore must bring the system back to one of these three states
created by the application. Any other restored state is a sanity error.
7. Simple example of Sanity of Data (3/3)
What if the second change is saved in the backup but not the first? The
recovered file system (Restored, at T3) looks as follows: Rahul's record
shows Blood Test Status: OK and Blood Test File: Test_79.pdf, and
PatientsRecord.txt is 209 bytes, yet the file Test_79.pdf itself is absent.
The restore has produced a state that was never generated by the application.
8. Point In Time Backup & Snapshot
▌A backup must correspond to some point of time in the history, for
example State0, State1, or State2 in the example on the previous slides.
▌Technically, this concept is called 'Point In Time' (PIT) backup. PIT is
commonly used in the context of Disaster Recovery; backup software
is expected to take PIT backups.
▌A snapshot of a volume ensures PIT data of the volume in use.
The backup is then taken from the snapshot.
▌Since the snapshot is created of the volume, data in the Operating
System's buffer cache is not captured. So a snapshot alone is not
enough for a PIT backup of a live system.
9. Implication of Buffer Cache on sanity of data
▌Data from the buffer cache is written to disk with the goal of better I/O.
Invariably, the order in which an application writes to the OS differs from
the order in which the data is written to disk.
This is technically called an out-of-order write; an example is shown in the
diagram on the next slide. As a result, sanity of data is lost.
▌Journaling is used to ensure sanity of data.
Journaling has its own I/O cost. Therefore, the default ext4 journaling mode
ensures file-system consistency only, not the consistency of file data.
An experiment has shown that on ext4 mounted with default options, new files
created within 30 seconds of a crash end up with zero size.
Refer to the Appendix section for the experimental data.
Note that in an OS, file-system integrity is traded off against performance.
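The window during which data exists only in the buffer cache can be closed explicitly per file with fsync; the sync(1) utility exposes this from the shell. A minimal sketch (the file name is illustrative; passing a file argument to sync requires GNU coreutils 8.24 or later):

```shell
# A successful write() does not mean the data is on disk yet:
# it may sit in the page cache for many seconds before writeback.
echo "critical record" > record.txt

# Force this one file's data out to the volume now. Only after this
# returns is the record guaranteed to survive a crash or to appear
# in a volume-level snapshot.
sync record.txt
```

Applications that cannot afford to lose records do exactly this in code (fsync after write); the backup problem discussed here exists precisely because most applications do not.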
10. Example of Out-of-Order Write of OS buffer cache data
As written by the application, and as visible in the OS, the states are:
S0 at T0: Patients.txt (209 bytes) holds Rahul's record (Id: 11755) with an
empty Test Status and Test File.
S1 at T1: a new file Test_79.pdf (200 bytes) appears alongside Patients.txt.
S2 at T2: Rahul's record is updated to Test Status: OK, Test File: Test_79.pdf.
As in the volume, writeback may reorder these writes: the second application
write can be flushed to the volume before the first. At T2' the volume then
holds BadState1, in which Rahul's record already points to Test_79.pdf but
the file itself is not yet on disk. No such state ever existed in the OS;
what if this goes into a snapshot? Only at T2'', once both writes are
flushed, does the volume reach the application's state S2.
12. Current approach to take backup of a Live system
▌The most common solution for backing up a live volume is a two-step
process:
a) Create a snapshot of the live volume. That might involve momentarily
pausing the volume, and effectively the VM.
b) Take the backup from the snapshot.
▌The pause caused by (a) is usually imperceptible in current implementations
of virtual machines and storage, making it practically non-intrusive.
▌This approach is equivalent to pulling the power cable from a machine and
then taking a backup of the disks attached to it.
▌Journaling ensures sanity of the file system but not of the data,
unless the performance-costly 'metadata+data' journaling mode is used.
13. Idea of Proposed Solution: Briefly stop buffer caching in the OS
▌The proposed solution is based on the very simple idea of
briefly disabling the buffer cache while taking the snapshot.
▌That effectively makes the OS write-through.
▌A simple CLI call on Linux does this job.
▌The proposed design and POC are shared in the subsequent slides
of this presentation.
▌Feedback on its real-world benefit, the proposed design, and the POC is
solicited from the audience, so that this idea can see the light of day.
14. How to disable the buffer cache on Linux?
▌/proc/sys/vm/dirty_bytes
defines the maximum number of dirty bytes in the Linux buffer cache.
▌When this amount of dirty pages is reached, subsequent write()
calls become write-through.
▌dirty_bytes can be changed to a low value, though not zero, on a running
Linux system using the sysctl CLI.
▌A lower dirty_bytes value increases write() latency, to the tune of
milliseconds. Therefore, the original value is reverted as soon
as the snapshot is created.
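The save, lower, and restore sequence can be sketched with sysctl as follows. This is a sketch assuming root on the guest and that dirty_bytes was previously set to an explicit byte count; if it reads 0, vm.dirty_ratio is in effect instead and would need to be restored analogously. The value 8192 corresponds to two 4 KiB pages, the practical minimum noted under Known Limitations.

```shell
# 1. Save the current setting (0 means vm.dirty_ratio is in effect instead).
ORIG=$(sysctl -n vm.dirty_bytes)

# 2. Lower dirty_bytes to the minimum the kernel accepts: two pages,
#    i.e. 8192 bytes on a 4 KiB-page system. From this point on,
#    write() calls are effectively write-through.
sysctl -w vm.dirty_bytes=8192

# 3. Flush data that was already dirty before the change took effect.
sync

# ... the volume snapshot is taken here ...

# 4. Revert to the original value as soon as the snapshot exists.
sysctl -w vm.dirty_bytes="$ORIG"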
16. Design Guidelines
▌Disable the buffer cache of the guest Linux OS temporarily.
▌Use the other standard steps of Cinder.
Cinder's snapshot and Cinder's backup-from-snapshot are used in the solution
so that the code change is minimal.
▌Output file:
The backup file produced by a live-system backup is exactly the same as
Cinder's regular backup files.
Therefore, exactly the same standard OpenStack restore process is used for
restoring the data.
▌Proposed Use Model
Adding a new option to the cinder CLI looks like a good solution:
$ cinder backup-create --livebackup <instanceId> ...
17. Sequence of Steps for Backup
The Live Backup Controller drives the following steps against Cinder and the
Guest OS; access to the Guest OS is required.
1. Retrieve the current buffer cache config from the Guest OS.
2. Set the Guest OS buffer cache to its minimum.
3. Generate a snapshot of the volume(s) attached to the Guest OS.
4. Restore the original buffer cache config on the Guest OS.
5. Take the backup from the snapshot(s).
6. Delete the snapshot(s).
The Guest OS continues to run throughout; the only impact is on write I/O
latency between steps 2 and 4.
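The six steps above might be driven from a controller roughly as sketched below, assuming SSH access to the guest and the standard cinder CLI (the --snapshot-id option of backup-create is available in recent cinderclient releases; the host name, volume ID, and the awk-based ID extraction are all illustrative placeholders):

```shell
#!/bin/sh
# Illustrative driver for the six-step live-backup sequence.
GUEST="user@guest-host"       # placeholder: SSH endpoint of the Guest OS
VOLUME_ID="vol-0001"          # placeholder: Cinder volume attached to the guest

# Steps 1-2: save the guest's buffer cache config, then minimize it.
ORIG=$(ssh "$GUEST" sysctl -n vm.dirty_bytes)
ssh "$GUEST" "sudo sysctl -w vm.dirty_bytes=8192 && sync"

# Step 3: snapshot the in-use volume (--force is needed for attached volumes).
SNAP_ID=$(cinder snapshot-create --force "$VOLUME_ID" | awk '$2 == "id" {print $4}')

# Step 4: revert the guest's buffer cache config immediately.
ssh "$GUEST" "sudo sysctl -w vm.dirty_bytes=$ORIG"

# Step 5: back up from the snapshot; the output is a regular Cinder backup.
cinder backup-create --snapshot-id "$SNAP_ID" "$VOLUME_ID"

# Step 6: remove the temporary snapshot.
cinder snapshot-delete "$SNAP_ID"
```

The write-latency impact window is only between steps 2 and 4, which is why step 4 runs before the (comparatively slow) backup in step 5.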
18. POC – Overview
▌The POC was based on the 'Sequence of Steps for Backup' shown on the
previous slide.
▌Entity relationships (all components on the Controller Node): the User
invokes the POC Code with two inputs, (1) the instance id and (2) the Guest
OS login and password. The POC Code interacts with the Guest OS (Linux),
with Cinder, and with the Database; Cinder writes the backup to SWIFT
storage, so the backup is produced on SWIFT as usual.
19. POC Validation - Using the count of 'new files with zero size'
▌The POC was able to take a backup and restore the data.
▌The POC was also validated by checking the impact of ext4's 'delayed
block allocation' under the new approach.
▌Steps of validation
Take a backup of the Guest OS while the following script is running:
for i in {0..600} # Create a new file every second
do
echo hello > NewFile_$i.txt # Creating a new file of 6 bytes
sleep 1
done
Restore the backup and count the number of zero-size files.
Does the count decrease with the new approach? Observations are shared on
the next slide.
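The zero-size count itself can be taken with find; a small sketch, run inside the restored directory where the NewFile_*.txt files were written:

```shell
# Count how many of the test files came back with zero size after restore.
count=$(find . -maxdepth 1 -name 'NewFile_*.txt' -size 0 | wc -l)
echo "zero-size files: $count"
```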
20. POC Validation – Result of 'new files with zero size'
▌ Number of zero-size files seen after restoring a live backup of the Linux
Guest OS. At most one file of zero size is expected.

                       Run 1  Run 2  Run 3  Run 4  Run a  Run b  Run c  Run d
Zero-size file count     32      6     34     30      1      1      1      1

Runs 1-4 used the standard OpenStack forced snapshot followed by backup;
runs a-d had the POC enabled (dirty_bytes made very low during the snapshot).

Run Environment
Guest OS: Ubuntu 16, kernel 4.4.0-97-generic
File system: default ext4 mount options used (barrier=1, data=ordered)
Load: none on the hypervisor or Guest OS during testing

▌ Result: the number of zero-size files is drastically reduced with the POC.
21. Known Limitation & Next Step
▌Known Limitation
The minimum value supported by dirty_bytes is two pages, not zero; buffer
caching cannot be disabled completely.
This is documented Linux behavior. The reason for the limit is being
explored, to find a way to take the value to absolute zero.
▌Next Step
Based on the feedback received on the idea and the POC results, the next
course of action items will be decided.
23. Common Questions
▌ Can all volumes attached to a Guest OS be backed up together?
The solution does facilitate it. However, the snapshot feature of OpenStack
has to support it too.
▌ Can multiple Guest OSes be backed up together?
Same answer as above.
▌ Is it truly non-intrusive?
Yes; the Guest OS continues to run.
However, write I/O latency goes up for some time.
Depending on Cinder's snapshot feature, the Guest OS might be
momentarily paused. If the SAN's hardware-level snapshot is used,
no pause is involved.
24. References and Useful reading (1/2)
▌Impact of 'Delayed Block Allocation' on ext4's sanity
'Linus Torvalds Upset over Ext3 and Ext4', http://www.linux-magazine.com/Online/News/Linus-Torvalds-Upset-over-Ext3-and-Ext4
'ext4 and data loss' by Jonathan Corbet, https://lwn.net/Articles/322823/
▌File-system journaling
Section '2.1 File-system consistency' of
https://www.usenix.org/system/files/conference/fast12/chidambaram.pdf
ext4 journal options in section '3. Options' of
https://www.kernel.org/doc/Documentation/filesystems/ext4.txt
▌Linux buffer cache
http://www.tldp.org/LDP/sag/html/buffer-cache.html
https://www.kernel.org/doc/Documentation/sysctl/vm.txt
Section '14.3.2 Writeback Parameters' of
https://doc.opensuse.org/documentation/leap/tuning/html/book.sle.tuning/cha.tuning.memory.html#cha.tuning.memory.vm
25. References and Useful reading (2/2)
▌Alternative solutions for backing up live systems
Volume Shadow Copy Service (VSS) on Windows
vmsync on the VMware hypervisor
▌OpenStack's backup and restore
'Back up and restore volumes and snapshots',
https://docs.openstack.org/cinder/latest/admin/blockstorage-volume-backups.html
Cinder CLI: https://docs.openstack.org/python-cinderclient/latest/cli/details.html
▌Code used in the experiments
https://github.com/saurabh0095/Unix-IO-test
26. Experiment – Impact of 'Delayed Block Allocation' on ext4 (1/2)
▌Test Objective:
To demonstrate the magnitude of data loss at OS crash due to delayed block
allocation, an experiment was performed.
▌Test Steps & Observations
Create new files with a small amount of data every second.
Crash the system by removing the power cable.
On system recovery, the 30-35 most recent files are seen with zero size.
The expectation is that at most one file, the one being written at the time
of the crash, should have zero size.
▌Cause
Zero file size is seen because the default journaling mode of ext4 writes
metadata promptly, but the actual data write is delayed by 'Delayed Block
Allocation', leading to inconsistency through loss of the file's data.
27. Experimental Data – 'Delayed Block Allocation' on ext4 (2/2)
▌ Number of zero-size files seen after the Guest OS was crashed by simulating
power-cable removal, on two different virtual machines.
▌ At most one file of zero size is expected; around 30 files were seen.

                       Run 1  Run 2  Run 3  Run 4  Run 5  Run 6  Run 7  Run 8  Run 9  Run 10
Zero-size file count     35     30     32     35     33     36     30     35     33     29

Two Run Environments, same result:
Hypervisor: VirtualBox 4.3.28 on Windows | Microsoft Hyper-V (2016 Standard)
Guest OS: Ubuntu 16, kernel 4.4.0-97-generic | RHEL 7, kernel 3.10.0-123.el7.x86_64
Default ext4 mount options used (barrier=1, data=ordered);
no load on the hypervisor or Guest OS during testing.
28. Contact Information of Authors
Dipak Kumar Singh, Senior Solutions Architect, IT Platforms,
NEC Technologies India Pvt. Ltd.
dipak.singh@india.nec.com, dipak123@gmail.com, http://linkedin.com/in/dipak123
Deepak Gupta, Deputy General Manager, IT Platform,
NEC Technologies India Pvt. Ltd.
deepak.gupta@india.nec.com, dkumargupta@gmail.com, https://www.linkedin.com/in/dkumargupta/
• https://www.openstack.org/summit/sydney-2017/summit-schedule/events/19305/truly-non-intrusive-openstack-cinder-backup-for-mission-critical-systems
• https://www.openstack.org/assets/presentation-media/OpenStack-Truly-non-intrusive-Cinder-backup-1.1.pptx