Paper presented by LEGaTO researcher Christian Göttel, from the University of Neuchâtel, at the 14th ACM International Conference on Distributed and Event-Based Systems (DEBS 2020) on 17 July 2020.
Abstract: The number of connected Internet of Things (IoT) devices continues to grow and will result in a significant increase in network traffic. We propose data compression as an approach to reduce the network traffic generated by IoT devices. Generalized deduplication (GD) is a novel technique that effectively compresses data by identifying similar data chunks, which reduces both the storage cost and the network traffic load.
We introduce Hermes - an application-level protocol for the data-plane that can operate using GD as well as classical deduplication. Hermes significantly reduces the data transmission traffic while effectively decreasing the energy footprint. In this presentation we outline the problem of increasing network traffic and briefly introduce GD. Then we describe Hermes' architecture and share our evaluation results on a small-scale network deployment using Raspberry Pi 4B devices.
Hermes: Enabling Energy-efficient IoT Networks with Generalized Deduplication
1. 14th ACM International Conference on Distributed and Event-based Systems
Montreal, Quebec, Canada
Hermes: Enabling Energy-efficient IoT Networks with Generalized Deduplication
Christian Göttel∗, Lars Nielsen†, Niloofar Yazdani†,
Pascal Felber∗, Daniel E. Lucani†, Valerio Schiavoni∗
July 17, 2020
∗University of Neuchâtel - Computer Science Department
†Aarhus University - Department of Engineering
2. Introduction
Problem: more and more IoT devices
Growing number of IoT devices
Increasing pressure on the network
Solution: data compression
Computationally demanding [L. Peter Deutsch et al. 1996]
High memory requirement
Weak performance on small data chunks [N. Yazdani et al. 2019]
Lightweight, memory-efficient approaches have poor compression [J. Ziv et al. 1977]
Figure 1: Ambient water and energy data set [N. Batra et al. 2013] (higher compression is better). Plot of compression ratio (0 to 6) against chunk length (B), with ticks at 16 and 128, for LZW, DEFLATE, DD, GD-vanilla, GD-reduced, and GD-dual, the GD variants also shown with offset removal.
17.07.2020 − Christian Göttel − Hermes: Enabling Energy-efficient IoT Networks with Generalized Deduplication 1 / 10
3. Background
Generalized deduplication (GD)
Introduced in [Vestergaard et al. 2019]
Reduces Cloud storage cost
Finds equal & similar data chunks
Multiple transformations
Error correcting codes (e.g., Hamming [J. C. Moreira 2006])
Our contribution
Hermes: a data transmission protocol
Operates using GD and DD
Reduces data transmission traffic
Decreases the storage footprint
Figure 2: Data deduplication (DD) over bases (diagram labels: chunks, bases, deviations)
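The split of a chunk into a basis and a deviation can be illustrated with a Hamming code, the example transformation named above. The following is a minimal sketch, not the Hermes implementation: it assumes a Hamming(7,4) code over 7-bit chunks, and the function names `split` and `merge` are hypothetical. For simplicity the basis is kept as the full 7-bit codeword, whereas a real system would likely store only its 4 message bits.

```python
def split(chunk: int) -> tuple[int, int]:
    """Split a 7-bit chunk into (basis, deviation) with a Hamming(7,4) code."""
    if not 0 <= chunk < 128:
        raise ValueError("chunk must fit in 7 bits")
    # With parity-check columns equal to the bit positions 1..7, the syndrome
    # is the XOR of the positions of all set bits.
    deviation = 0
    for pos in range(1, 8):
        if (chunk >> (pos - 1)) & 1:
            deviation ^= pos
    # A non-zero syndrome names the single bit to flip to reach a codeword.
    basis = chunk ^ (1 << (deviation - 1)) if deviation else chunk
    return basis, deviation


def merge(basis: int, deviation: int) -> int:
    """Rebuild the original chunk by flipping the bit named by the deviation."""
    return basis ^ (1 << (deviation - 1)) if deviation else basis


# Every 7-bit chunk maps to one of only 16 bases, so similar chunks
# (those within one bit flip of the same codeword) deduplicate together.
chunks = list(range(128))
pairs = [split(c) for c in chunks]
bases = {basis for basis, _ in pairs}
roundtrip_ok = all(merge(b, d) == c for c, (b, d) in zip(chunks, pairs))
```

In this sketch the 128 possible chunks collapse onto 16 bases, and the 3-bit deviation is what makes the reconstruction lossless.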
4. Architecture
Nodes (diagram): two source nodes (IoT), one intermediate node (edge), and one sink node (cloud)
Node types
Source: introduces data into a Hermes-based system
Sink: is the destination for data
Intermediate: is any node between a source and a sink
A node is not restricted to a single type
Node classification
Basic: performs no data processing
Deduplication: can perform data deduplication
Gen. Deduplication: can perform generalized data deduplication
A node is restricted to a single classification
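The two rules on this slide (several types per node, exactly one classification per node) can be captured in a small data model. This is only an illustrative sketch; the names `NodeType`, `NodeClass`, and `Node` are hypothetical and do not come from the Hermes code base.

```python
from dataclasses import dataclass
from enum import Enum, Flag, auto


class NodeType(Flag):
    """Types can be combined, since a node is not restricted to a single type."""
    SOURCE = auto()
    INTERMEDIATE = auto()
    SINK = auto()


class NodeClass(Enum):
    """Exactly one classification applies to each node."""
    BASIC = "basic"                  # performs no data processing
    DEDUPLICATION = "dd"             # can perform data deduplication
    GEN_DEDUPLICATION = "gd"         # can perform generalized deduplication


@dataclass(frozen=True)
class Node:
    name: str
    types: NodeType
    classification: NodeClass


# Hypothetical edge device that both produces data and relays data of others.
edge = Node("edge-1", NodeType.SOURCE | NodeType.INTERMEDIATE,
            NodeClass.GEN_DEDUPLICATION)
is_source = bool(edge.types & NodeType.SOURCE)
is_sink = bool(edge.types & NodeType.SINK)
```

A flag enumeration fits the node types because they combine freely, while a plain enumeration enforces the single classification.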
5. Architecture
Data transmission mechanism
Figure 3: Example message exchange between two Gen. Deduplication nodes
(a) Fingerprint is available at the sink node: the source node sends the fingerprint and deviation; the sink node checks whether the fingerprint is available and replies with Ack.
(b) Fingerprint is not available at the sink node: the source node sends the fingerprint and deviation; the sink node checks whether the fingerprint is available and replies with Ack. & Request for basis; the source node sends the basis; the sink node replies with Ack.
Note that every source node benefits from all fingerprints in the network, independent of their origin [Yazdani et al. 2019].
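The two cases of the message exchange can be sketched as a small in-memory simulation. This is a sketch under stated assumptions, not the Hermes wire protocol: SHA-1 stands in for the unspecified fingerprint function, the reply strings are invented labels, and the `Sink` class and `send` helper are hypothetical.

```python
import hashlib


def fingerprint(basis: bytes) -> bytes:
    # Assumption: SHA-1 stands in for whatever fingerprint Hermes really uses.
    return hashlib.sha1(basis).digest()


class Sink:
    def __init__(self) -> None:
        self.bases: dict[bytes, bytes] = {}            # fingerprint -> basis
        self.pending: dict[bytes, list[bytes]] = {}    # deviations awaiting a basis
        self.received: list[tuple[bytes, bytes]] = []  # (basis, deviation) pairs

    def on_chunk(self, fp: bytes, deviation: bytes) -> str:
        """Handle a (fingerprint, deviation) message from a source."""
        if fp in self.bases:                           # case (a): basis is known
            self.received.append((self.bases[fp], deviation))
            return "ACK"
        self.pending.setdefault(fp, []).append(deviation)
        return "ACK+REQUEST_BASIS"                     # case (b): ask for the basis

    def on_basis(self, basis: bytes) -> str:
        """Store a requested basis and release the deviations waiting for it."""
        fp = fingerprint(basis)
        self.bases[fp] = basis
        for deviation in self.pending.pop(fp, []):
            self.received.append((basis, deviation))
        return "ACK"


def send(sink: Sink, basis: bytes, deviation: bytes) -> list[str]:
    """Source side: send fingerprint and deviation, then the basis if requested."""
    replies = [sink.on_chunk(fingerprint(basis), deviation)]
    if replies[0] == "ACK+REQUEST_BASIS":
        replies.append(sink.on_basis(basis))
    return replies


sink = Sink()
first = send(sink, b"basis-A", b"\x01")    # unknown fingerprint: case (b)
second = send(sink, b"basis-A", b"\x02")   # known fingerprint: case (a)
```

The second call could come from a different source node than the first; it still avoids sending the basis, which is the sharing effect noted above.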
6. Evaluation
Simulation results
Synthetic data set:
Best case scenario
Parameterizable generator
Applying compression on each chunk
Neither DEFLATE nor LZW is able to compress the data set
DD: compression of 1.6×
GD: compression up to 668×
Figure 4: Comparison of compression schemes. Plot of compression ratio (log scale, 10^-1 to 10^3) against chunk length (B), from 2^1 to 2^11, for LZW, DEFLATE, DD, and GD, with the original size as reference.
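Why a general-purpose compressor struggles when applied to each small chunk separately can be seen in a short experiment. The sketch below is an assumption-laden stand-in, not the simulation from the talk: it uses Python's `zlib` module (a DEFLATE implementation) on seeded random bytes rather than the synthetic data set, and it defines the compression ratio as original size divided by compressed size, so values below 1 mean the data grew.

```python
import random
import zlib


def per_chunk_ratio(data: bytes, chunk_len: int) -> float:
    """Compress every chunk on its own and return original/compressed size."""
    compressed = 0
    for start in range(0, len(data), chunk_len):
        compressed += len(zlib.compress(data[start:start + chunk_len]))
    return len(data) / compressed


# Seeded random bytes as a placeholder input (not the data set of the talk).
rng = random.Random(0)
data = bytes(rng.getrandbits(8) for _ in range(4096))

# Each 16-byte chunk pays the fixed zlib framing overhead, so the total grows.
ratio_16 = per_chunk_ratio(data, 16)
```

Per-chunk framing overhead dominates at small chunk lengths, which is consistent with the slide's observation that DEFLATE cannot compress the data set chunk by chunk.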
7. Evaluation
Experimental settings
Micro-benchmark setup
Raspberry Pi 4B
Alciom PowerSpy2 power analyzer
Auxiliary machine for data collection and monitoring
Macro-benchmark setup
16 Raspberry Pi 4B with PoE HAT
Dell PowerEdge R330
Ubiquiti Networks UniFi USW-48P-750 switch
Auxiliary machine for data collection and monitoring
Clock synchronization with NTP
Figure 5: Raspberry Pi 4B cluster
8. Evaluation
Micro-benchmark
Three plots over chunk size [B], from 1 to 4096, each comparing DD and GD:
Energy [µJ/bit]: energy is comparable; overhead for small chunks
Compression ratio: many repetitive chunks for small chunks; GD compresses orders of magnitude better; a large number of chunks match a single basis
Throughput [Mbit/s]: 35 Mbit/s max; GD overhead is negligible
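An energy-per-bit figure such as the one plotted here can be derived from sampled power readings. The sketch below shows one plausible calculation and is not the authors' measurement pipeline: it assumes timestamped power samples in watts, integrates them with the trapezoidal rule, and divides by the number of payload bits. Whether the talk normalizes by payload bits or by transmitted bits, and whether idle power is subtracted, is not stated, so both choices here are assumptions, as is the function name.

```python
def energy_per_bit_uj(samples: list[tuple[float, float]], payload_bytes: int) -> float:
    """Return energy per payload bit in microjoules.

    samples: (time in seconds, power in watts) pairs in ascending time order.
    """
    if len(samples) < 2 or payload_bytes <= 0:
        raise ValueError("need at least two samples and a positive payload size")
    joules = 0.0
    # Trapezoidal integration of power over time yields energy in joules.
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        joules += 0.5 * (p0 + p1) * (t1 - t0)
    return joules / (payload_bytes * 8) * 1e6


# Made-up example: a constant 5 W draw for 2 s while handling 1 MB of payload.
value = energy_per_bit_uj([(0.0, 5.0), (1.0, 5.0), (2.0, 5.0)], 1_000_000)
```

The example numbers are invented for illustration only; they correspond to 10 J spread over 8,000,000 bits.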
9. Evaluation
Macro-benchmark
Three plots over chunk size [B], from 1 to 4096, each comparing raw, DD, and GD:
Energy [nJ/bit]: similar to micro-bench; significant overhead for GD with large chunks
Bytes sent [MiB]: DD converges to raw for large chunks; GD reduces network traffic greatly
Throughput [Mbit/s]: significant drop for GD with large chunks; GD achieves higher throughput than DD
10. Conclusion & Future Work
Conclusion
We contribute, implement, and evaluate the Hermes protocol on Raspberry Pi 4B devices
No need to compare with pool of values or to use similarity fingerprints
Significant benefits in compression of data
Reduction of data transmission without loss of information
Future work
Developing data-aware transformations
Enhancing overall compression and system performance
Large-scale deployments with computationally limited devices
Operation over unreliable transport protocols
11. Thank you
Thank you for your attention!
The research leading to these results has received funding from the European Union’s Horizon 2020 research and innovation programme under the LEGaTO Project (legato-project.eu), grant agreement No 780681.
This work was partially financed by the SCALE-IoT Project (Grant No. 7026-00042B) granted by the Independent Research Fund Denmark, by the Aarhus Universitets Forskningsfond (AUFF) Starting Grant Project AUFF-2017-FLS-7-1, and Aarhus University’s DIGIT Centre.