With the continuous increase of cloud storage adopters, data deduplication has become a necessity for cloud providers. By storing a unique copy of duplicate data, cloud providers greatly reduce their storage and data transfer costs. Unfortunately, deduplication introduces a number of new security challenges.
We propose PerfectDedup, a novel scheme for secure data deduplication, which takes into account the popularity of the data segments and leverages the properties of Perfect Hashing in order to assure block-level deduplication and data confidentiality at the same time. We show that the client-side overhead is minimal and the main computational load is outsourced to the cloud storage provider.
4. Deduplication vs Encryption
… but it does not work on encrypted data!
[Figure: two users hold the same data D = "Hello World". One encrypts it with key K1, the other with key K2, and the two resulting ciphertexts differ.]
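The effect on this slide can be reproduced with a minimal sketch. It assumes a toy stream cipher built from SHA-256 (the hypothetical helper `keystream_xor`), used only because it needs nothing beyond the standard library; it is not a vetted cipher and not the one used in the talk. Two users encrypt the same data under independent random keys, and a provider comparing ciphertexts finds nothing to deduplicate.

```python
import hashlib
import os


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (illustration only): XOR the data with
    SHA-256(key || offset) blocks. Applying it twice decrypts."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


data = b"Hello World"
k1 = os.urandom(32)  # key chosen independently by the first user
k2 = os.urandom(32)  # key chosen independently by the second user

c1 = keystream_xor(k1, data)
c2 = keystream_xor(k2, data)

# The provider only sees c1 and c2; they differ, so it stores both.
ciphertexts_match = (c1 == c2)
```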
5. Convergent Encryption
• Data Encryption key derived from Data
K = hash(Data)
• Deterministic & Symmetric Encryption
[Figure: two users hold the same data D = "Hello World". Both encrypt it with the key H(D) and obtain identical ciphertexts.]
Douceur, John R., et al. "Reclaiming space from duplicate files in a serverless distributed file system." Distributed Computing Systems, 2002. Proceedings. 22nd International Conference on. IEEE, 2002.
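A minimal sketch of K = hash(Data) followed by deterministic symmetric encryption. The cipher is again a toy SHA-256 keystream (the hypothetical helper `keystream_xor`), chosen only so the example runs with the standard library; a real deployment would use a vetted deterministic cipher. The names `convergent_encrypt` and `convergent_decrypt` are illustrative.

```python
import hashlib
from typing import Tuple


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy deterministic stream cipher (illustration only)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


def convergent_encrypt(data: bytes) -> Tuple[bytes, bytes]:
    """Derive the key from the data itself, then encrypt deterministically."""
    key = hashlib.sha256(data).digest()  # K = hash(Data)
    return key, keystream_xor(key, data)


def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    return keystream_xor(key, ciphertext)


# Two users who never exchanged a key still produce the same ciphertext.
key_a, cipher_a = convergent_encrypt(b"Hello World")
key_b, cipher_b = convergent_encrypt(b"Hello World")
_, cipher_other = convergent_encrypt(b"Hello World!")
```

Because key and encryption are both deterministic, equal plaintexts give equal ciphertexts, which is what lets the provider deduplicate without seeing the data.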
6. Convergent Encryption
MISSING INFORMATION
How to achieve safe Convergent Encryption in the Cloud?
Drew Perttula, Brian Warner, and Zooko Wilcox-O'Hearn, 2008-03-20
https://tahoe-lafs.org/hacktahoelafs/drew_perttula.html
7. Data Popularity
• Different protection based on data-segment popularity
• Popular data -> Not confidential -> To be deduplicated -> Convergent Encryption
• Unpopular data -> Confidential -> To be protected -> Semantically-Secure Encryption
Stanek, Jan, et al. "A secure data deduplication scheme for cloud storage." Financial Cryptography and Data Security. Springer Berlin Heidelberg, 2014. 99-118.
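The policy on this slide comes down to how the encryption key is chosen. The sketch below assumes the client already knows whether the block is popular (how it learns this securely is a separate problem); `choose_key` is a hypothetical helper, not part of any published implementation.

```python
import hashlib
import os


def choose_key(block: bytes, is_popular: bool) -> bytes:
    """Pick the encryption key according to the block's popularity."""
    if is_popular:
        # Popular block: convergent key, identical for every owner,
        # so the resulting ciphertexts can be deduplicated.
        return hashlib.sha256(block).digest()
    # Unpopular block: fresh random key, so the ciphertext reveals
    # nothing that links it to other copies of the same block.
    return os.urandom(32)


block = b"Hello World"
popular_key_1 = choose_key(block, is_popular=True)
popular_key_2 = choose_key(block, is_popular=True)
private_key_1 = choose_key(block, is_popular=False)
private_key_2 = choose_key(block, is_popular=False)
```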
8. How to securely detect popularity ?
[Figure: the CLIENT asks the CSP, which holds a list of blocks including B, "Is block B popular?" and the CSP answers YES / NO.]
• Block B must not be disclosed if it is unpopular (sensitive)
9. PHF-based Lookup
Belazzougui, Djamal, Fabiano C. Botelho, and Martin Dietzfelbinger. "Hash, displace, and compress." Algorithms-ESA 2009. Springer Berlin Heidelberg, 2009. 682-693.
10. PerfectDedup
• Based on «Secure» Perfect Hashing
– One-wayness
• Popular block IDs -> Collision-free hash function (PHF)
• BENEFITS:
– Efficient (linear) generation of a new PHF (outsourced to the Cloud)
– Compact representation of PHF
– Very efficient (constant) evaluation on a block ID
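A minimal sketch of a PHF-based popularity lookup, assuming a simplified hash-and-displace construction over SHA-256. It is not the CMPH implementation and omits the compression step. All names (`h`, `build_phf`, `evaluate`, `is_popular`) are hypothetical. The provider builds the function over the popular block IDs; the client evaluates it on its own block ID, and the ID stored at that index is compared with the client's ID.

```python
import hashlib


def h(seed: int, block_id: bytes, m: int) -> int:
    """One-way hash of a block ID into [0, m), parameterised by a seed."""
    digest = hashlib.sha256(seed.to_bytes(8, "big") + block_id).digest()
    return int.from_bytes(digest[:8], "big") % m


def build_phf(ids, max_seed=1_000_000):
    """Provider side: return (displacements, table) such that every
    popular ID lands in its own slot of the table (no collisions)."""
    ids = list(dict.fromkeys(ids))      # drop duplicate IDs
    m = len(ids)
    r = max(1, m // 2)                  # number of buckets
    buckets = [[] for _ in range(r)]
    for block_id in ids:
        buckets[h(0, block_id, r)].append(block_id)
    table = [None] * m
    disp = [0] * r
    # Place the largest buckets first, while most slots are still free.
    for b in sorted(range(r), key=lambda i: -len(buckets[i])):
        if not buckets[b]:
            continue
        for seed in range(1, max_seed):
            slots = [h(seed, x, m) for x in buckets[b]]
            if len(set(slots)) == len(slots) and all(table[s] is None for s in slots):
                for s, x in zip(slots, buckets[b]):
                    table[s] = x
                disp[b] = seed
                break
        else:
            raise RuntimeError("no displacement found")
    return disp, table


def evaluate(disp, m: int, block_id: bytes) -> int:
    """Client side: two hash evaluations, independent of the set size."""
    return h(disp[h(0, block_id, len(disp))], block_id, m)


# Hypothetical popular block IDs known to the provider.
popular_ids = [hashlib.sha256(b"popular block %d" % i).digest() for i in range(8)]
disp, table = build_phf(popular_ids)


def is_popular(block_id: bytes) -> bool:
    index = evaluate(disp, len(table), block_id)  # only this index is sent
    return table[index] == block_id               # compare with the stored ID


unpopular_id = hashlib.sha256(b"private block").digest()
unpopular_index = evaluate(disp, len(table), unpopular_id)
```

An unpopular ID still maps to some valid index, as do many other IDs, so the index alone does not identify the block.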
Hello everyone, my name’s Pasquale Puzio. I’m a PhD student at EURECOM & SecludIT under the supervision of Refik MOLVA and Sergio LOUREIRO. Today I’m gonna talk about PerfectDedup, which is our latest work on secure data deduplication.
Data Deduplication + Confidentiality
Let’s talk quickly about the agenda. Today I’ll first explain what data deduplication is and why it became interesting for researchers. Then I’ll explain how deduplication can be combined with encryption, in particular convergent encryption. This will bring me to the vulnerabilities of CE. Finally I’ll present our solution based on data popularity and perfect hashing.
Basic idea: store duplicated data only once
Explain
Mention experiments
Key and encryption are deterministic
Researchers noticed that data may need different levels of protection depending on its popularity
This assumption works pretty well in all common scenarios, except for a few extreme cases
However in our scheme the user can skip the protocol and just encrypt his file
Explain when a block becomes popular -> popularity threshold is reached
Mention an example
The problem is shifted to secure popularity detection: if popular do this, if unpopular do that
PIR would not be efficient in the case of block-level deduplication
Explain that different encryption requires the user to know if data is popular
Simple solution -> look for convergent encrypted block -> not secure
Let’s go into more detail
Index does not reveal anything on the block because of collisions
Lookup protocols use hash tables, databases use perfect hashing based indices, we need a secure lookup protocol
We decided to design a new protocol based on perfect hashing
Secure because we added the one-wayness property, which is fundamental for the security of the protocol
No confidentiality issue because block is popular
On the other hand, collisions protect unpopular data
Several pre-images corresponding to the same image
Now let’s have a closer look at the architecture
We need a trusted index service in order to handle the popularity transition, that is, the phase in which a block that was unpopular becomes popular after reaching the popularity threshold
Explain protocol
Focus on CMPH
Mention that we modified CMPH in order to make it secure (one-way)
Upload of a 10MB file in three different scenarios: file was unpopular, triggered a popularity transition, was popular
The take-away from this slide is that all client operations are really lightweight
Example: popularity check -> outperforms PIR by far
Costly operations are outsourced to the cloud
Fix colors