This document proposes a new method called Inner Product Predicate (IPP) for performing private range queries over encrypted data. The IPP method adds perturbations to attribute values and queries through matrix-based encryption to prevent frequency analysis attacks. Experimental results show the transformed query distributions are different from the originals and query processing time is linear in the number of tuples. Open problems remain around reducing computational costs and defending against attacks using aggregate query results.
Private Range Query by Perturbation and Matrix Based Encryption
Junpei Kawamoto and Masatoshi Yoshikawa
Kyoto University, Japan
2. Sep. 27, 2011 Private Range Query by Perturbation and Matrix Based Encryption 2
Cloud database and its security
• Recent research topics about security of cloud computing
• Mainly focusing on service providers
• How to analyze data without privacy problems (PPDM)
• How to share data and manage encryption keys
• How to execute queries over encrypted data
[Figure: User, Client, and Service Provider connected via the web; recent research has focused on the service-provider side.]
• Fewer studies address information leakage from queries
• However, queries (i.e., what a user searched for) carry important information about the user.
• A security model for this problem was introduced only recently.
Purpose and basic notions
• Private (range) query
• We focus on range queries, which include exact-match queries as a special case.
• A private query obtains data without exposing any information about what the user requested to third parties, including service providers.
• We do not fully trust service providers
• Service providers themselves are unlikely to become attackers, but…
• Servers could be compromised by attackers or physically stolen
• Users cannot know what actually happens to their data stored on servers.
We should build a database service that does not require users to trust service providers.
• We assume the schema of the database is (Key, Value)
• Users issue queries over the Key attribute only
Related work
• Encrypted databases
• To avoid leaks, all data are encrypted by clients
• The main topic is how to handle queries over encrypted data
In our method, clients transform queries, too.
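1-to-1 mapping (hash function, etc.)
[Figure: the keys 15:00 and 15:12 are stored as 4hwr2g and teg2b1; the query 15:00 ~ 15:12 becomes "4hwr2g" or "teg2b1".]
many-to-1 mapping (k-anonymizer, etc.)
[Figure: example with the values 14:45, 15:00, and 15:12, several of which map to the same transformed value.]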
They achieve some degree of query privacy, but not enough!
Frequency Analysis Attack (FAA)
• Attackers who know the distribution of queries could
guess plain queries from transformed ones.
[Figure: a mapping takes the distribution of plain queries q to the distribution of transformed queries q*. Panels: 1-to-1 mapping (e.g., hashing) and many-to-1 mapping (e.g., averaging), each showing the resulting distribution of transformed queries.]
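The attack on a 1-to-1 mapping can be sketched in a few lines. This is only an illustration under stated assumptions: the workload, the use of SHA-256 as the deterministic transform, and the helper names `deterministic_transform` and `frequency_attack` are all hypothetical and do not come from the slides.

```python
import hashlib
from collections import Counter

def deterministic_transform(query: str) -> str:
    """1-to-1 mapping: the same plain query always yields the same token."""
    return hashlib.sha256(query.encode("utf-8")).hexdigest()

def frequency_attack(observed_tokens, known_plain_ranking):
    """Guess plain queries by aligning frequency ranks.

    known_plain_ranking lists plain queries from most to least frequent;
    it stands for the attacker's background knowledge (an assumption here).
    """
    token_ranking = [tok for tok, _ in Counter(observed_tokens).most_common()]
    return dict(zip(token_ranking, known_plain_ranking))

# Hypothetical workload: "15:00" is requested far more often than the others.
plain_queries = ["15:00"] * 6 + ["15:12"] * 3 + ["14:45"] * 1
observed = [deterministic_transform(q) for q in plain_queries]

# The attacker never inverts the hash; matching frequencies is enough.
guess = frequency_attack(observed, ["15:00", "15:12", "14:45"])
recovered = [guess[tok] for tok in observed]
print(recovered == plain_queries)
```

Because the transform is deterministic, the histogram of tokens has the same shape as the histogram of plain queries, which is exactly what the attacker exploits.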
Key idea for protecting against FAA
• Use a 1-to-many mapping to make the dist. of transformed queries different from the original distribution
[Figure: the key 15:00 is mapped to several transformed values Tk1(15:00), Tk2(15:00), and the query 15:00 ~ 15:12 to several transformed queries Tq1(15:00-15:12), Tq2(15:00-15:12); the mapping turns the distribution of plain queries q into a different distribution of transformed queries q*.]
To ensure this property, we add perturbations to queries and then encrypt them.
Inner Product Predicate (IPP) method
• Employs polynomials f(k) as queries to add perturbations
• Query [a, b] is described as f(k) ≤ 0 with perturbation r.
[Figure: two plots of f(k) against k with roots at a and b and a third root at -r or -r'; keys with a ≤ k ≤ b match, other keys do NOT match.]
Different r produces a different query.
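• Uses matrix-based encryption
• Matrix-based encryption enables query processing without decryption
• A query f(k) ≤ 0 is expressed with vectors $\mathbf{q}$ and $\mathbf{k}$ as $\mathbf{q} \cdot \mathbf{k} \le 0$
• The encryption key is a regular (invertible) matrix $M$
• $\mathbf{q}$ and $\mathbf{k}$ are encrypted as $M^{T}\mathbf{q}$ and $M^{-1}\mathbf{k}$
• The inner product is computed as $(M^{T}\mathbf{q}) \cdot (M^{-1}\mathbf{k}) = \mathbf{q}^{T} M M^{-1} \mathbf{k} = \mathbf{q} \cdot \mathbf{k}$, because $M M^{-1}$ cancels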
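The cancellation of the key matrix can be checked numerically. The sketch below is a minimal illustration, not the authors' implementation: the vectors are arbitrary stand-ins, the key is a small random integer matrix, and the helpers `invert` and `random_key` are hypothetical names. Exact rationals are used so that the equality holds without rounding error.

```python
from fractions import Fraction
import random

def transpose(m):
    return [list(col) for col in zip(*m)]

def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def invert(m):
    """Gauss-Jordan inversion over exact rationals; raises if m is singular."""
    n = len(m)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def random_key(n, rng):
    """Draw random integer matrices until one is regular (invertible)."""
    while True:
        m = [[rng.randint(-9, 9) for _ in range(n)] for _ in range(n)]
        try:
            return m, invert(m)
        except ValueError:
            continue

rng = random.Random(0)
M, M_inv = random_key(4, rng)

q = [3, -1, 4, 2]   # arbitrary stand-in for a query vector
k = [5, 0, -2, 7]   # arbitrary stand-in for an attribute-value vector

enc_q = mat_vec(transpose(M), q)   # M^T q
enc_k = mat_vec(M_inv, k)          # M^{-1} k

# The server only sees enc_q and enc_k, yet recovers q . k exactly.
print(dot(enc_q, enc_k) == dot(q, k))
```

Any regular matrix works as a key here; the small entry range is chosen only to keep the numbers readable.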
Inner Product Predicate (IPP) method
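• Perturbation-added polynomials $f(k)$
• $f_r(k) = (k - a)(k - b)(k + r)$, where $r$ is the perturbation
• Vector form of attr. values and queries
• Key vector $\mathbf{k} = (k^3, k^2, k, 1)^{T}$
• Query vector $\mathbf{q} = (1,\ r - a - b,\ ab - ar - br,\ abr)^{T}$
• The inner product is $\mathbf{q} \cdot \mathbf{k} = (k - a)(k - b)(k + r)$
• Different r produces a different query.
[Figure: plot of f(k) against k with roots at -r, a, and b.]
• Encrypting both vectors
• With key matrix $M$, the encrypted query $M^{T}\mathbf{q}$ and the encrypted attr. value $M^{-1}\mathbf{k}$ satisfy $(M^{T}\mathbf{q}) \cdot (M^{-1}\mathbf{k}) = \mathbf{q}^{T} M M^{-1} \mathbf{k} = \mathbf{q} \cdot \mathbf{k}$
• The inner product can be computed without decryption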
• IPP method also adds perturbation to attr. values
• For details, please see our paper.
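The vector form of the perturbed polynomial can be sketched as follows. This is an illustration only: the range 900 to 912 (15:00 to 15:12 counted in minutes since midnight) and the values of r are hypothetical, the function names are not from the slides, and the perturbation of attribute values mentioned above is left out.

```python
def key_vector(k):
    """Vector form of an attribute value: (k^3, k^2, k, 1)."""
    return [k**3, k**2, k, 1]

def query_vector(a, b, r):
    """Coefficients of f_r(k) = (k - a)(k - b)(k + r), highest degree first."""
    return [1, r - a - b, a*b - a*r - b*r, a*b*r]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def matches(a, b, r, k):
    """Range test a <= k <= b, evaluated as q . k <= 0 (valid while k > -r)."""
    return dot(query_vector(a, b, r), key_vector(k)) <= 0

a, b = 900, 912                      # hypothetical range in minutes since midnight
q1 = query_vector(a, b, r=7)
q2 = query_vector(a, b, r=1234)

# Different perturbations give different query vectors for the same range ...
print(q1 != q2)

# ... but both select exactly the keys inside [a, b].
hits1 = [k for k in range(0, 1440) if matches(a, b, 7, k)]
hits2 = [k for k in range(0, 1440) if matches(a, b, 1234, k)]
print(hits1 == hits2 == list(range(a, b + 1)))
```

The test is only valid while k + r stays positive, which holds here because keys are non-negative and r is positive.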
Scheme of IPP method
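• Adding tuples
• New tuple: (k, v)
• Transformed tuple: $(Tk_r(k), v)$, where $Tk_r(k) = M^{-1}(k^3, k^2, k, 1)^{T}$
• The client sends the transformed tuple over the web, and the service provider stores $(Tk_r(k), v)$
[Figure: User, Client, and Service Provider connected via the web.]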
• Searching tuples
• Query: a ≤ k ≤ b
• Transformed query: $Tq(a \le k \le b) = M^{T}(1,\ r - a - b,\ ab - ar - br,\ abr)^{T}$
• The service provider computes inner products for all tuples
[Figure: User, Client, and Service Provider connected via the web.]
Server's computational cost is O(n) (n: the number of tuples)
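The two operations can be put together in a small end-to-end sketch. Everything here is illustrative and assumed rather than taken from the authors' code: the class names `Client` and `Server`, the toy key matrix, the keys, and the range of r are hypothetical. The sketch uses the polynomial $f_r(k) = (k - a)(k - b)(k + r)$ with the test $\mathbf{q} \cdot \mathbf{k} \le 0$, and it omits the perturbation of attribute values.

```python
import random

# Toy key for illustration only: M = I + N (ones on the superdiagonal),
# whose inverse is I - N + N^2 - N^3. A real key would be a random regular matrix.
M = [[1, 1, 0, 0],
     [0, 1, 1, 0],
     [0, 0, 1, 1],
     [0, 0, 0, 1]]
M_INV = [[1, -1, 1, -1],
         [0, 1, -1, 1],
         [0, 0, 1, -1],
         [0, 0, 0, 1]]

def mat_vec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

class Client:
    """Holds the key; transforms tuples and queries before they leave the user side."""
    def __init__(self, rng):
        self.rng = rng

    def transform_key(self, k):
        # M^{-1} (k^3, k^2, k, 1)^T
        return mat_vec(M_INV, [k**3, k**2, k, 1])

    def transform_query(self, a, b):
        r = self.rng.randint(1, 10**6)   # fresh perturbation for every query
        q = [1, r - a - b, a*b - a*r - b*r, a*b*r]
        return mat_vec(transpose(M), q)  # M^T q

class Server:
    """Stores transformed tuples; never sees k, a, b, r, or M."""
    def __init__(self):
        self.tuples = []

    def store(self, enc_key, value):
        self.tuples.append((enc_key, value))

    def search(self, enc_query):
        # One inner product per stored tuple: O(n) work per query.
        return [v for enc_key, v in self.tuples if dot(enc_query, enc_key) <= 0]

client, server = Client(random.Random(1)), Server()
for k in [880, 900, 905, 912, 930]:      # hypothetical keys, minutes since midnight
    server.store(client.transform_key(k), f"value@{k}")

result = server.search(client.transform_query(900, 912))
print(result)   # → ['value@900', 'value@905', 'value@912']
```

The server's `search` has no way to skip tuples, which is where the O(n) cost comes from.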
Comparison of necessary memory size
 | Plain | Transformed
Key attribute values | $l_K$ | $12 l_K + 4(l_\phi + 3 l_m + l_{rk})$
Queries | $2 l_K$ | $8 l_K + 4(l_d + l_m + l_{rq})$
• $l_K$: bit length of key attribute values
• $l_\phi$: bit length of perturbations for key attribute values
• $l_d$: bit length of perturbations for queries
• $l_m$: bit length of encryption keys
• $l_{rk}$, $l_{rq}$: bit lengths of random values used for encryption
• Summary
• Attribute values require at least 12 times the storage of the plain case (the $l_K$ term alone is $12 l_K$ versus $l_K$).
• Queries require at least four times the storage of the plain case ($8 l_K$ versus $2 l_K$).
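As a worked example of the two size formulas, the sketch below plugs in 32 bits for every length. That choice is an assumption made for illustration, and the function names are hypothetical.

```python
def transformed_key_bits(l_k, l_phi, l_m, l_rk):
    """Size of a transformed key attribute value: 12 l_K + 4(l_phi + 3 l_m + l_rk)."""
    return 12 * l_k + 4 * (l_phi + 3 * l_m + l_rk)

def transformed_query_bits(l_k, l_d, l_m, l_rq):
    """Size of a transformed query: 8 l_K + 4(l_d + l_m + l_rq)."""
    return 8 * l_k + 4 * (l_d + l_m + l_rq)

L = 32  # assumed: every bit length set to 32

plain_key_bits = L            # l_K
plain_query_bits = 2 * L      # 2 l_K

key_bits = transformed_key_bits(L, L, L, L)
query_bits = transformed_query_bits(L, L, L, L)

print(key_bits, key_bits // plain_key_bits)        # → 1024 32
print(query_bits, query_bits // plain_query_bits)  # → 640 10
```

The factors 12 and 4 in the summary correspond to the $l_K$ terms only ($12 l_K / l_K$ and $8 l_K / 2 l_K$); with the perturbation, key, and random-value terms included, the overall overhead under this assumption is larger.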
Experimental evaluations
• We conducted experiments to evaluate whether
• the correlation between the dist. of plain queries and that of transformed ones is low enough.
• query proc. time is O(n) in the number of tuples n.
• Common conditions
• All programs are implemented in Python (2.6.4).
• Experiments were performed on a virtual machine with one 2.66 GHz processor and 512 MB of memory, running on VirtualBox.
• We chose the parameters of the IPP method as $l_K = l_\phi = l_m = l_{rk} = l_{rq} = 32$.
• This is the default size in many programming languages.
Exp. 1: Correlations of queries
• Query set
• 1,000 queries requesting [a, a + 100] (a = 1, 2, ..., 1000).
[Figure: transformed queries (1st element of the query vectors only) plotted against the left side of the plain range queries; for example, the range query [500, 600] is mapped to $3.0 \times 10^{13}$.]
• Query vectors were distributed over a wide range, independently of the plain values.
• Correlation coefficient: 0.014679
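For readers who want to run a similar check, a correlation coefficient between the left ends of the plain queries and one element of the transformed vectors can be computed as below. This assumes Pearson's coefficient, which the slides do not state, and it does not reproduce the reported value; the second sequence is a made-up, perfectly linear example used only to exercise the function.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)

left_ends = list(range(1, 1001))                  # a = 1, ..., 1000, as in the query set
linear_image = [2 * a + 5 for a in left_ends]     # hypothetical, fully revealing mapping

# A mapping that preserves order and spacing gives a coefficient of about 1;
# a value near 0 means the transformed element says little about a.
print(round(pearson(left_ends, linear_image), 6))
```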
Exp. 2: Query processing time
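• Conditions
• Five databases with different numbers of tuples
• One million random queries were issued to each database
• Query proc. time grows as O(n) in the number of tuples n
[Figure: query processing time against the number of tuples; the ×2 annotations mark the processing time doubling when the number of tuples doubles.]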
Open problems
• Reducing the computational cost of servers.
• O(n) is the minimum cost: if servers could prune candidate tuples, they would somehow know what users requested.
• There is a trade-off between security and computational cost.
• Attackers may guess the plain queries and attribute values by gathering and analyzing query results.
• However, in general, each query result consists of many tuples.
• Gathering the results therefore requires much more storage space.
• We think the effectiveness of attacks based on query results also needs to be discussed.
Conclusion
• We introduced a new private query method.
• Its transformation algorithms are probabilistic.
• They provide a 1-to-many mapping for attribute values and queries.
• The computational cost is O(n).
• The correlation between transformed distributions and plain ones is low.
• The IPP method resists the frequency analysis attack.
• Future work
• Reducing the computational cost of servers.
• Considering other attacks that use query results.
Thank you for your attention!