HybridStore is an efficient data management system for hybrid flash-based sensor devices. It partitions the data stream into segments, builds an index within each segment, and organizes the segments with an inter-segment index, so queries can skip irrelevant segments and each per-segment index stays small. All NAND pages are fully occupied and written sequentially, in-place updates are avoided, queries over large datasets are processed efficiently, and memory requirements are low enough for sensor motes. It was implemented in TinyOS, and trace-driven simulations over millions of records show it outperforming alternatives.
1. HybridStore: An Efficient Data Management System for
Hybrid Flash-based Sensor Devices
Baobing Wang and John S. Baras
Department of Electrical and Computer Engineering
Institute for Systems Research
University of Maryland, College Park, USA
briankw@umd.edu
10th European Conference on Wireless Sensor Networks (EWSN)
February 14, 2013
Brian (UMD@USA) HybridStore February 14, 2013 1 / 15
3. Motivation
In-situ Data Storage on Sensor Motes
Centralized data collection wastes energy (e.g., TinyDB)
LoCal project¹: 455 nodes, > 900M readings/year
Only aggregated data are required: average noise level, peak power
consumption, usage pattern
Sensors store data locally: sensor database
Flash memory: high capacity, energy efficient
Figure: Per-byte cost: storage, computation and communication [Mathur’06]
¹ http://local.cs.berkeley.edu/
4. Motivation
Design Challenges
Unlike magnetic disks, no in-place updates on flash memories
NOR flash: byte-oriented, random-accessible, low capacity
NAND flash: page-oriented, high capacity, more energy-efficient
Random writes are 100× more expensive than sequential writes
Very limited RAM: 4KB to 10KB
5. Related Work
Flash-based Storage Systems
Only time-window queries: TL-Tree [Li’12], FlashLog [Nath’09]
Large RAM footprint: FlashDB [Nath’07], LA-Tree [Agrawal’09]
Antelope [Tsiftes’11]: NOR flash only, discrete values
MicroHash [Lin’06]: long chain of partial pages, extensive page reads
and writes, complex failure recovery
No efficient support for joint queries; global index
6. Contributions
HybridStore Interface
insert(float key, void* record, uint8_t length)
select(uint32_t t1, uint32_t t2, float k1, float k2)
HybridStore Features
All NAND pages are fully occupied and written purely sequentially
In-place updates and out-of-place writes are completely avoided
Process typical joint queries efficiently, even on large-scale datasets
Data aging without overhead
Sensor-friendly: 16.5KB ROM and 3.2KB RAM in TinyOS 2.1
Potential Applications
Storage layer abstraction: Squirrel [Mottola’10]
9. HybridStore: Overview
Partition the data stream into segments
Create an in-segment index for each segment
Create an inter-segment index to organize segments
Benefits: skip unnecessary segments, small index per segment
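The overview above can be illustrated with a minimal in-memory sketch. The names `Segment` and `SegmentedStore` and the `capacity` parameter are hypothetical, the timestamp is passed explicitly for clarity, and Python lists stand in for flash storage; this is not the TinyOS implementation. Each segment records its time range and key range so that a query can skip segments that cannot contain a match.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    # Per-segment summary (hypothetical layout): time range and key range.
    t_min: int
    t_max: int
    k_min: float
    k_max: float
    records: list = field(default_factory=list)  # (timestamp, key, payload)


class SegmentedStore:
    def __init__(self, capacity=4):
        self.capacity = capacity  # records per segment (assumed fixed)
        self.segments = []        # oldest segment first

    def insert(self, timestamp, key, payload):
        # Start a new segment when there is none or the current one is full.
        if not self.segments or len(self.segments[-1].records) >= self.capacity:
            self.segments.append(Segment(timestamp, timestamp, key, key))
        seg = self.segments[-1]
        seg.t_max = max(seg.t_max, timestamp)
        seg.k_min = min(seg.k_min, key)
        seg.k_max = max(seg.k_max, key)
        seg.records.append((timestamp, key, payload))

    def select(self, t1, t2, k1, k2):
        hits, scanned = [], 0
        for seg in self.segments:
            # Skip segments whose time or key range cannot overlap the query.
            if seg.t_max < t1 or seg.t_min > t2:
                continue
            if seg.k_max < k1 or seg.k_min > k2:
                continue
            scanned += 1
            hits += [r for r in seg.records
                     if t1 <= r[0] <= t2 and k1 <= r[1] <= k2]
        return hits, scanned


store = SegmentedStore(capacity=4)
for t in range(12):
    store.insert(t, float(t % 6), b"x")
hits, scanned = store.select(0, 11, 4.0, 5.0)
```

In this toy run the first of the three segments holds only keys 0 to 3, so the query for keys in [4, 5] never scans it.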
10. HybridStore: Index Management
Inter-segment skip list: addr, tmin; locate segments within [t1, t2]
In-segment β-Tree: locate records within [k1 , k2 ]
In-segment Bloom filter: check the existence of key values if k1 = k2
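As a rough illustration of locating segments within [t1, t2] from their minimum timestamps, the sketch below uses binary search over a sorted Python list as a stand-in for the skip list; it is not a skip list implementation. The function name `locate_segments` is hypothetical, and the sketch assumes that segment i covers the timestamps from its own tmin up to the next segment's tmin.

```python
from bisect import bisect_right


def locate_segments(tmins, t1, t2):
    """Return indexes of segments that may hold timestamps in [t1, t2].

    tmins is the ascending list of per-segment minimum timestamps; segment i
    is assumed to cover [tmins[i], tmins[i + 1]).
    """
    if not tmins or t1 > t2 or t2 < tmins[0]:
        return []
    # Last segment that starts at or before t1 (or the very first segment).
    first = max(bisect_right(tmins, t1) - 1, 0)
    # Last segment that starts at or before t2.
    last = bisect_right(tmins, t2) - 1
    return list(range(first, last + 1))


tmins = [0, 100, 200, 300]
candidates = locate_segments(tmins, 150, 250)
```

Only the candidate segments then need their in-segment index consulted.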
13. HybridStore: In-segment Index
In-segment Bloom filter: check the existence of key values if k1 = k2
v bits, q hash functions, represent n items: $p = \left(1 - \left(1 - \frac{1}{v}\right)^{qn}\right)^{q}$
Must be maintained in RAM: NOR flash is byte-oriented
If q = 3, n = 4096, p ≈ 3.06%, then v = 32768 (i.e., 4KB)
Horizontal partition: fixed small bloom filter sections (e.g., 256B)
Vertical partition: group fragments with the same offset in the same
NAND page
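The numbers on this slide can be checked with a short sketch. It evaluates the false-positive rate p = (1 − (1 − 1/v)^(qn))^q for the stated parameters and builds a small Bloom filter. The class and function names are hypothetical, and the salted SHA-256 hashing is an assumption, since the slides do not say which hash functions are used.

```python
import hashlib


def false_positive_rate(v, q, n):
    """p = (1 - (1 - 1/v)**(q*n))**q for v bits, q hash functions, n items."""
    return (1.0 - (1.0 - 1.0 / v) ** (q * n)) ** q


class BloomFilter:
    def __init__(self, v, q):
        self.v, self.q = v, q          # v is assumed to be a multiple of 8
        self.bits = bytearray(v // 8)

    def _positions(self, key):
        # Derive q bit positions from salted SHA-256 digests (an assumption).
        for i in range(self.q):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.v

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


p = false_positive_rate(v=32768, q=3, n=4096)
bf = BloomFilter(v=32768, q=3)
for k in range(4096):
    bf.add(k)
sections = len(bf.bits) // 256  # number of 256-byte sections in the 4KB filter
```

With v = 32768 the filter occupies 4096 bytes, which splits into sixteen 256-byte sections under the horizontal partitioning described above.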
16. HybridStore: Storage Hierarchy
NOR flash: circular array, fixed segment size
NAND flash: circular array, logical segment (multiple erase blocks)
Index structure: updated in a NOR segment, copied to the NAND
segment later
Header: [T1 , T2 ], [K1 , K2 ], dataAddr , idxAddr , bfAddr , skipList
Figure: (a) Storage hierarchy across RAM, NOR flash, and NAND flash; (b) NAND segment structure
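The circular-array organization of segments can be sketched as follows. This is a minimal in-memory model under assumptions: the class name `CircularSegmentArray` and its methods are hypothetical, and slots in a Python list stand in for flash segments. When the array is full, the oldest segment is reclaimed by advancing the head index, without moving any other segment.

```python
class CircularSegmentArray:
    def __init__(self, num_slots):
        self.slots = [None] * num_slots
        self.head = 0    # index of the oldest segment
        self.count = 0   # number of live segments

    def append(self, segment):
        """Store a segment in the next slot; reclaim the oldest one if full."""
        evicted = None
        if self.count == len(self.slots):
            evicted = self.slots[self.head]
            self.head = (self.head + 1) % len(self.slots)
            self.count -= 1
        tail = (self.head + self.count) % len(self.slots)
        self.slots[tail] = segment
        self.count += 1
        return evicted

    def oldest_to_newest(self):
        return [self.slots[(self.head + i) % len(self.slots)]
                for i in range(self.count)]


arr = CircularSegmentArray(3)
evicted = [arr.append(name) for name in "ABCDE"]
```

After five appends into three slots, the two oldest segments have been reclaimed in insertion order and the remaining three are still readable oldest first.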
17. HybridStore: Operations
Insertion
Update the β-Tree: allocate a new bucket if necessary
Update the Bloom filter buffer: flush it out to NOR flash if necessary
NOR segment is full: copy to the NAND segment, update the skip list,
start a new segment
Querying: t1, t2, k1, k2
           t1 = t2      t1 < t2
k1 = k2    skip list    skip list + Bloom filter + β-Tree
k1 < k2    skip list    skip list + β-Tree
Skip a segment if [K1, K2] ∩ [k1, k2] = ∅
Data Aging: delete the oldest NAND segment
No need to update any pointer
No need to move any data page
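The query cases on this slide can be written as a small dispatch function. This is a sketch under assumptions: the names `plan_query` and `can_skip` are hypothetical, and `can_skip` assumes a segment is skipped when its key range [K1, K2] does not overlap the queried range [k1, k2].

```python
def plan_query(t1, t2, k1, k2):
    """Return the index structures consulted for a query, per the slide."""
    if t1 > t2 or k1 > k2:
        raise ValueError("expected t1 <= t2 and k1 <= k2")
    if t1 == t2:
        # A single timestamp: the skip list alone is listed for this case.
        return ["skip list"]
    if k1 == k2:
        # Equality on the key: the Bloom filter can rule out absent keys.
        return ["skip list", "Bloom filter", "β-Tree"]
    return ["skip list", "β-Tree"]


def can_skip(seg_k_min, seg_k_max, k1, k2):
    """True if the segment key range cannot overlap the queried key range."""
    return seg_k_max < k1 or seg_k_min > k2


plan = plan_query(1, 9, 3.0, 3.0)
```

For a time range with an equality key, the plan consults all three structures; a segment holding keys in [10, 20] is skipped for a query on keys in [21, 30].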
20. HybridStore: Implementation and Evaluation
TinyOS implementation: 16.5KB ROM, 3.2KB RAM
Trace-driven simulation: over 2.6 million weather records in 5 years
Insertion: 13% ∼ 18% improvement
Figure: Performance per insertion, β-Tree vs. static tree, for NOR flash segment sizes of 64, 128, and 256 KB: (a) Latency (ms), (b) Energy (µJ), (c) Space Overhead (%)
21. HybridStore: Value-based Equality Query
Key detection: 26.18ms and 1.5mJ over 0.5 million readings
Nonexistent keys: more than 3× improvement
Figure: Impact of Bloom filter for nonexistent keys, over time ranges from 1 day to 1 year: (a) Latency (ms), (b) Energy (mJ). Series: β-Tree (64KB), β-Tree (128KB), β-Tree (256KB), β-Tree (64KB w/o BF), Static (128KB)
22. HybridStore: Full Query
Retrieve 120K readings in 11.08 seconds from 0.5 million records
[SenSys ’11]: over 20 seconds to get 50% from 50,000 records
Figure: HybridStore performance per query of full queries, over time ranges from 1 day to 1 year: (a) Total latency per query (s), (b) Total energy per query (mJ). Series: 1, 3, 5, 7, and 9 degree
23. Conclusion and Future Work
Conclusion
HybridStore: efficient, light-weight, and sensor-friendly
Process typical joint queries efficiently
Process large-scale dataset efficiently
Future Work²
Failure recovery mechanism
Distributed database system based on HybridStore
Testbed experiments
² B. Wang and J. S. Baras. HybridDB: An Efficient Database System Supporting Incremental epsilon-Approximate Querying for Storage-Centric Sensor Networks. Submitted to the ACM Transactions on Sensor Networks, 2013, pp. 1–35