5. Motivation
• Backup and replication are vital
– We run our own cloud infrastructure
– It must provide high availability of customers’ data
– Using cost-effective commodity hardware and software
[Figure: operating data lives in the primary DC; backup data and replicated data live in the secondary DC]
6. Requirements
• Functionality
– Getting consistent diff data for backup/replication, achieving a small RPO (recovery point objective)
• Performance
– Without the usual full-scans
– Small overhead
• Support for various kinds of data
– Databases
– Blob data
– Full-text-search indexes
– ...
7. A solution: WalB
• A Linux kernel device driver that provides wrapper block devices, plus related userland tools
• Provides consistent diff extraction for efficient backup and replication
• “WalB” means “Block-level WAL” (write-ahead logging)
[Figure: diffs flow from the WalB storage to the backup storage and to the replicated storage]
9. How to get diffs
• Full-scan and compare
• Partial-scan with bitmaps (or indexes)
• Scan logs directly from the WAL storage
[Figure: data at t0 is (a b c d e); data at t1 is (a b’ c d e’); the diffs from t0 to t1 are b’ at address 1 and e’ at address 4. A full-scan compares the two images; a partial-scan consults a bitmap of changed blocks (00000 → 01001); WalB reads the diffs directly from the logs.]
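The three approaches can be contrasted with a small sketch (illustrative Python, not WalB code; the block images, bitmap, and log records are toy data):

```python
# Illustrative sketch of the three diff-extraction strategies.
# Block images are modeled as lists of block contents; addresses are indices.

def full_scan_diff(old, new):
    """Compare every block of two images (expensive: reads both fully)."""
    return [(addr, new[addr]) for addr in range(len(old)) if old[addr] != new[addr]]

def bitmap_diff(new, bitmap):
    """Read only blocks whose dirty bit is set (needs bitmap maintenance)."""
    return [(addr, new[addr]) for addr, dirty in enumerate(bitmap) if dirty]

def log_scan_diff(wal):
    """Scan the write-ahead log directly: each record already is a diff."""
    return [(rec["addr"], rec["data"]) for rec in wal]

old = ["a", "b", "c", "d", "e"]
new = ["a", "b'", "c", "d", "e'"]
bitmap = [0, 1, 0, 0, 1]
wal = [{"addr": 1, "data": "b'"}, {"addr": 4, "data": "e'"}]

# All three strategies recover the same diffs: b' at 1 and e' at 4.
assert full_scan_diff(old, new) == bitmap_diff(new, bitmap) == log_scan_diff(wal)
```

Only the log scan avoids touching unchanged blocks and needs no extra bookkeeping on the write path beyond the WAL itself.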
11. WalB architecture
[Figure: a walb device wraps two block devices: a data device (no special format) and a log device (an original format). Any application (file system, DBMS, etc.) reads from and writes to the walb device; data goes to the data device and logs go to the log device. A walb log extractor reads logs from the log device, and a walb device controller controls the driver.]
13. Log device format
[Figure: the log device holds a ring buffer of log packs, from the oldest logpack to the latest. Each log pack is a logpack header block followed by the written data (1st written data, 2nd written data, ...). The header block contains a checksum, the logpack lsid, the number of records, the total IO size, and one log record per write IO (IO address, IO size, ...).]
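The header fields listed above can be packed and parsed with a short sketch. This is illustrative Python; the field order, widths, and format strings are assumptions made for the example, not WalB's actual on-disk layout:

```python
import struct

# Hypothetical layout for the logpack header block described above.
# Field widths are illustrative assumptions, not WalB's real format.
HEADER_FMT = "<IQHI"   # checksum, logpack lsid, num of records, total IO size
RECORD_FMT = "<QI"     # IO address, IO size

def pack_logpack_header(checksum, lsid, records):
    """Serialize a header block from a list of (IO address, IO size) records."""
    total_io_size = sum(size for _addr, size in records)
    head = struct.pack(HEADER_FMT, checksum, lsid, len(records), total_io_size)
    body = b"".join(struct.pack(RECORD_FMT, addr, size) for addr, size in records)
    return head + body

def parse_logpack_header(buf):
    """Deserialize a header block back into its fields and records."""
    head_len = struct.calcsize(HEADER_FMT)
    checksum, lsid, n, total = struct.unpack_from(HEADER_FMT, buf)
    rec_len = struct.calcsize(RECORD_FMT)
    records = [struct.unpack_from(RECORD_FMT, buf, head_len + i * rec_len)
               for i in range(n)]
    return checksum, lsid, total, records

buf = pack_logpack_header(0xDEADBEEF, lsid=42, records=[(1, 4096), (4, 512)])
checksum, lsid, total, records = parse_logpack_header(buf)
assert (lsid, total, records) == (42, 4608, [(1, 4096), (4, 512)])
```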
14. Redo/undo logs
• Redo logs
– to go forward
• Undo logs
– to go backward
[Figure: redo logs transform the data at t0 forward into the data at t1; undo logs transform the data at t1 back into the data at t0]
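A minimal sketch of how redo and undo logs move an image between t0 and t1 (illustrative Python; log records are modeled as (address, data) pairs):

```python
# Sketch of applying redo and undo logs (illustrative, not WalB code).
# A redo record holds the new block contents; an undo record holds the
# previous contents, which must be read before overwriting.

def apply_redo(image, redo_logs):
    """Roll an image forward by replaying new block contents."""
    for addr, new_data in redo_logs:
        image[addr] = new_data
    return image

def apply_undo(image, undo_logs):
    """Roll an image backward by restoring saved old contents, newest first."""
    for addr, old_data in reversed(undo_logs):
        image[addr] = old_data
    return image

t0 = ["a", "b", "c", "d", "e"]
redo = [(1, "b'"), (4, "e'")]
undo = [(1, "b"), (4, "e")]          # old contents, captured before the writes

t1 = apply_redo(list(t0), redo)
assert t1 == ["a", "b'", "c", "d", "e'"]
assert apply_undo(list(t1), undo) == t0
```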
15. How to create redo/undo logs
• Undo logs require additional read IOs
• WalB devices therefore generate redo logs only
[Figure: generating redo logs: (1) write IOs are submitted, (2) the redo logs are written to the log device. Generating undo logs: (1) write IOs are submitted, (2) the current data is read, (3) the undo logs are written.]
16. Consistent full backup
• The ring buffer must have enough capacity to store the
logs generated from t0 to t1
[Figure: timeline from t0 to t2. (A) at t0, read the online walb volume into a full archive on the backup host (the copy is inconsistent); (B) get the logs from t0 to t1; (C) apply them to the archive to obtain a consistent image at t1.]
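The procedure can be simulated: copy the online volume while writes keep landing, then replay the captured logs to make the fuzzy copy consistent. This is an illustrative Python sketch, not WalB code:

```python
# Simulation of the consistent-full-backup steps (illustrative).
# (A) the online volume is copied block by block while writes keep arriving,
# so the copy is inconsistent; (C) replaying the logs from the same interval
# makes it consistent.

def online_copy_with_writes(volume, writes_during_copy):
    """Copy block by block; writes keyed to a copy position land mid-copy."""
    copy = []
    for addr in range(len(volume)):
        for waddr, data in writes_during_copy.pop(addr, []):
            volume[waddr] = data          # a concurrent write lands here
        copy.append(volume[addr])
    return copy

volume = ["a", "b", "c", "d", "e"]
writes = {3: [(1, "b'"), (4, "e'")]}      # arrive while block 3 is copied
logs = [(1, "b'"), (4, "e'")]             # WalB logs from t0 to t1

fuzzy = online_copy_with_writes(volume, writes)
assert fuzzy == ["a", "b", "c", "d", "e'"]     # block 1 is stale: inconsistent
for addr, data in logs:                        # (C) apply the logs
    fuzzy[addr] = data
assert fuzzy == ["a", "b'", "c", "d", "e'"]    # consistent image at t1
```

The replay is idempotent, which is why it is safe to apply a log record whose write already made it into the copy.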
17. Consistent incremental backup
• To manage multiple backup generations,
defer application or generate undo-logs during application
[Figure: (A) get the logs from the previous backup timestamp t0 up to t1; (B) apply them to the full archive on the backup host. Application can be deferred to keep multiple backup generations.]
18. Backup and replication
[Figure: the backup host gets the logs from the primary host and applies them to its full archive (A, B); the logs are also transferred to a remote host and applied to a second full archive there (A’, B’). The remote copy lags the primary by the replication delay.]
20. Alternative solutions
• DRBD
– Specialized for replication
– DRBD proxy is required for long-distance replication
• Dm-snap
– Snapshot management using COW (copy-on-write)
– Full-scans are required to get diffs
• Dm-thin
– Snapshot management using COW and
reference counters
– Fragmentation is inevitable
22. WalB pros and cons
• Pros
– Small response overhead
– Fragmentation never occurs
– Logs can be retrieved with sequential scans
• Cons
– 2x bandwidth is required for writes
25. Requirements to be a block device
• Read/write consistency
– It must read the latest written data
• Storage state uniqueness
– Replaying logs must not change the history
• Durability of flushed data
– It must make flushed write IOs persistent
• Crash recovery without undo
– It must recover the data using redo-logs only
26. WalB algorithm
• Two IO processing methods
– Easy: very simple, large overhead
– Fast: a bit complex, small overhead
• Overlapped IO serialization
• Flush/fua for durability
• Crash recovery and checkpointing
27. IO processing flow (easy algorithm)
[Figure: write flow: the IO is submitted and packed, then its log IO is submitted; after the log IO completes, WalB waits for the log to be flushed and for overlapped IOs to be done, then submits the data IO; the WalB write IO response is returned after the data IO completes. Read flow: the data IO is submitted and the response is returned when it completes.]
28. IO processing flow (fast algorithm)
[Figure: write flow: the IO is packed and inserted into pdata; the WalB write IO response is returned as soon as the log IO completes; after the log is flushed and overlapped IOs are done, the data IO is submitted, and the pdata entry is deleted when it completes. Read flow: data that overlaps pdata is copied from pdata; the rest is submitted to the data device.]
29. Pending data
• A red-black tree
– provided as a kernel
library
• Sorted by IO address
– to find overlapped IOs
quickly
• A spinlock
– for exclusive access
[Figure: tree nodes Node0, Node1, ..., NodeN, each holding an IO address, a size, and the data]
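A sketch of the pending-data lookup (illustrative Python; WalB uses the kernel's red-black tree library, and a sorted list with `bisect` stands in for it here):

```python
import bisect

# Sketch of pending data (pdata): entries sorted by IO address so that
# overlapped IOs can be found quickly without scanning everything.

class PendingData:
    def __init__(self):
        self.addrs = []    # sorted start addresses
        self.entries = {}  # addr -> (size, data)

    def insert(self, addr, size, data):
        bisect.insort(self.addrs, addr)
        self.entries[addr] = (size, data)

    def delete(self, addr):
        self.addrs.pop(bisect.bisect_left(self.addrs, addr))
        del self.entries[addr]

    def overlapped(self, addr, size):
        """Return entries overlapping the range [addr, addr+size)."""
        return [(a, *self.entries[a]) for a in self.addrs
                if a < addr + size and a + self.entries[a][0] > addr]

pdata = PendingData()
pdata.insert(0, 4, "w0")
pdata.insert(8, 4, "w1")
pdata.insert(16, 4, "w2")
assert pdata.overlapped(6, 4) == [(8, 4, "w1")]   # a read of [6,10) hits w1
pdata.delete(8)
assert pdata.overlapped(6, 4) == []
```

In the fast algorithm, a read first copies any overlapping pdata entries and only submits the remainder to the data device; in the real driver a spinlock guards this structure.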
30. Overlapped IO serialization
• Required for storage state uniqueness
• Oldata (overlapped data)
– similar to pdata
– counter for each IO
– FIFO constraint
[Figure: a write IO is inserted into oldata at submission and waits until notices arrive that all overlapped IOs are done; notices are sent in FIFO order; the data IO is then submitted, and the oldata entry is deleted after completion.]
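The oldata mechanism can be sketched as a per-IO counter of overlapped predecessors, released in FIFO order as those predecessors complete (illustrative Python; names like `Oldata.ready` are invented for the example):

```python
from collections import OrderedDict

# Sketch of overlapped IO serialization (illustrative, not WalB code).
# Each write counts the already-pending writes it overlaps; it may start
# only when that counter reaches zero, and completing IOs notify their
# overlapped successors in submission (FIFO) order.

class Oldata:
    def __init__(self):
        self.ios = OrderedDict()   # io_id -> (addr, size, pending_count)
        self.ready = []            # ids allowed to start, in order

    def insert(self, io_id, addr, size):
        count = sum(1 for a, s, _ in self.ios.values()
                    if a < addr + size and a + s > addr)
        self.ios[io_id] = (addr, size, count)
        if count == 0:
            self.ready.append(io_id)

    def complete(self, io_id):
        addr, size, _ = self.ios.pop(io_id)
        for other, (a, s, c) in self.ios.items():   # send notices, FIFO
            if a < addr + size and a + s > addr:
                c -= 1
                self.ios[other] = (a, s, c)
                if c == 0:
                    self.ready.append(other)

old = Oldata()
old.insert("w1", addr=0, size=8)    # no overlap: may start at once
old.insert("w2", addr=4, size=8)    # overlaps w1: must wait
old.insert("w3", addr=4, size=4)    # overlaps w1 and w2: must wait
assert old.ready == ["w1"]
old.complete("w1")
assert old.ready == ["w1", "w2"]    # w2 is released before w3 (FIFO)
old.complete("w2")
assert old.ready == ["w1", "w2", "w3"]
```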
31. Flush/fua for durability
• The condition for a log to be persistent
– All preceding logs and the log itself are persistent
• Neither FLUSH nor FUA is required for data device IOs
– Data device persistence is guaranteed by checkpointing
Flag      | Property to guarantee                                      | What WalB needs to do
REQ_FLUSH | All write IOs submitted before the flush IO are persistent | Set the FLUSH flag on the corresponding log IOs
REQ_FUA   | The fua IO itself is persistent                            | Set the FLUSH and FUA flags on the corresponding log IOs
32. Crash recovery and checkpointing
• Crash recovery
– A crash may leave recent write IOs not persistent in the data device
– Recovery generates write IOs from the recent logs and executes them on the data device
• Checkpointing
– Sync the data device and update the superblock periodically
– The superblock records information about the recent logs
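A sketch of checkpointing and redo-only crash recovery (illustrative Python; the superblock field name `written_lsid` is an assumption made for the example):

```python
# Sketch of crash recovery via checkpointing (illustrative, not WalB code).
# The superblock remembers the lsid up to which the data device is known to
# be in sync; after a crash, logs from that lsid onward are replayed.
# Replaying an already-applied record is harmless (redo is idempotent).

def checkpoint(superblock, applied_lsid):
    """Sync the data device, then record the applied lsid (done periodically)."""
    superblock["written_lsid"] = applied_lsid

def crash_recover(data, wal, superblock):
    """Redo every log record at or after the checkpointed lsid."""
    for rec in wal:
        if rec["lsid"] >= superblock["written_lsid"]:
            data[rec["addr"]] = rec["data"]
    return data

wal = [{"lsid": 10, "addr": 0, "data": "x"},
       {"lsid": 11, "addr": 2, "data": "y"},
       {"lsid": 12, "addr": 4, "data": "z"}]
superblock = {"written_lsid": 0}
checkpoint(superblock, applied_lsid=11)   # lsids below 11 are known persistent

# Crash: the write for lsid 12 never reached the data device.
data = ["x", "-", "y", "-", "-"]
crash_recover(data, wal, superblock)
assert data == ["x", "-", "y", "-", "z"]
```

Because no undo logs exist, recovery only rolls forward; this is why the data device never needs FLUSH/FUA of its own.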
39. MEM 512B random: response
[Charts: read and write response times]
• Smaller overhead with the bio interface than with the request interface
• WalB’s write IO serialization seems to increase response time as the number of threads increases
40. MEM 512B random: IOPS
[Charts: read and write IOPS vs queue length]
• Large overhead with the request interface
• Pdata search seems to be the overhead of walb-bio
• WalB IOPS decreases as the number of threads increases, due to a lower cache-hit ratio or longer spinlock wait times
41. Pdata and oldata overhead
[Chart: MEM 512B random write, kernel 3.2.24, walb req prototype; overhead of oldata and of pdata vs queue length]
42. HDD 4KB random: IOPS
[Charts: read and write IOPS vs queue length]
• Read: negligible overhead
• Write: an IO scheduling effect was observed, especially with walb-req
43. HDD 256KB sequential: Bps
[Charts: read and write throughput (Bps) vs queue length]
• The request interface is better
• IO size is limited to 32 KiB with the bio interface
• Additional log header blocks decrease throughput
44. SSD 4KB random: IOPS
[Charts: read and write IOPS vs queue length]
• WalB performance is almost the same as that of wrap req
• The fast algorithm is better than the easy algorithm
• IOPS overhead is large with a smaller number of threads
45. SSD 4KB random: IOPS
(two partitions in an SSD)
[Charts: read and write IOPS vs queue length]
• Almost the same result as with two SSD drives
• Half the throughput was observed
• The SSD’s bandwidth is the bottleneck
46. SSD 256KB sequential: Bps
[Charts: read and write throughput (Bps) vs queue length]
• Non-negligible overhead was observed at queue length 1
• Fast is better than easy
• WalB overhead is larger at smaller queue lengths
47. Log-flush effects with HDDs
[Charts: IOPS (4KB random write) and Bps (256KB sequential write) vs queue length]
• IO sorting for the data device is effective for random writes
• Enough memory for pdata is required to minimize the log-flush overhead
48. Log-flush effects with SSDs
[Charts: IOPS (4KB random write) and Bps (32KB sequential write) vs queue length]
• A small flush interval is better
• Log flush increases IOPS with a single thread
• The log-flush effect is negligible
49. Evaluation summary
• WalB overhead
– Non-negligible for writes with small concurrency
– Log-flush overhead is large with HDDs,
negligible with SSDs
• Request vs bio interface
– bio is better except for workloads with large IO size
• Easy vs fast algorithm
– Fast is better
51. WalB summary
• A wrapper block device driver for
– incremental backup
– asynchronous replication
• Small performance overhead with
– No persistent indexes
– No undo-logs
– No fragmentation
52. Current status
• Version 1.0
– For Linux kernel 3.2+ and x86_64 architecture
– Userland tools are minimal
• Improve userland tools
– Faster extraction/application of logs
– Logical/physical compression
– Backup/replication managers
• Submit kernel patches
53. Future work
• Add an all-zero flag to the log record format
– to avoid storing all-zero blocks in the log device
• Add bitmap management
– to avoid full-scans after ring buffer overflow
• (Support snapshot access)
– by implementing pluggable persistent address indexes
• (Support thin provisioning)
– if a clever defragmentation algorithm becomes available
54. Thank you for your attention!
• GitHub repository:
– https://github.com/starpos/walb/
• Contact:
– Email: hoshino AT labs.cybozu.co.jp
– Twitter: @starpoz (hashtag: #walbdev)