Despite extensive research on cryptography, secure and efficient query processing over outsourced data remains an open challenge. We develop communication-efficient and information-theoretically secure algorithms for privacy-preserving aggregation queries using multi-party computation (MPC). Specifically, we develop query processing techniques over secret-shared data outsourced by single or multiple database owners. These algorithms allow a user to execute queries on the secret-shared database and also prevent the network and the (adversarial) clouds from learning the user’s queries, the results, or the database. We further develop (non-mandatory) privacy-preserving result verification algorithms that detect malicious behaviors, and we experimentally validate the efficiency of our approach over large datasets, of a size to which prior secret-sharing or MPC systems have not scaled.
OBSCURE: Information-Theoretic Oblivious and Verifiable Aggregation Queries
Shantanu Sharma¹
Joint work with
Peeyush Gupta¹, Yin Li², Sharad Mehrotra¹, Nisha Panwar¹, and Sumaya Almanee¹
¹University of California, Irvine, USA
²Xinyang Normal University, China
45th International Conference on Very Large Data Bases (VLDB), 2019.
2. Secure Data Outsourcing
Use cryptographic mechanisms to protect sensitive data on the cloud.
Can we design an outsourcing solution that is simultaneously
• Efficient: significantly better than downloading the cryptographically secured data, and
• Secure: as secure as downloading the data and processing it locally?
3. Background: Data/Computation Outsourcing
• Computation over encrypted data [IEEE SP00, ACNS04, Crypto08,09, ICDE02, SIGMOD02, VLDB04, Eurocrypt03, SIGMOD04, Crypto11, STOC09, SOSP11, …]
  • Computationally secure, i.e., not secure forever, and slow
  • Example: homomorphic encryption with an access-pattern-hiding technique
• Computation over secret sharing [CACM79, Eurocrypt14,15,17]
  • Information-theoretically secure, i.e., secure forever
  • Independent of the adversary’s computational capabilities
4. Background: Computing over Secret-Shared Data
Secret sharing:
• Communicating servers (Jana and Sharemind)
  • Selection and aggregation queries
  • Significant communication overheads amongst servers
• Non-communicating servers (SSDB)
  • Selection and aggregation queries
  • Reveal data access-patterns and/or require clients to maintain state
5. Problem Statement
• Support database outsourcing using SSS
• Execute complex selections (conjunctive and disjunctive) in an oblivious manner
• No communication among servers
• Minimize work at the database owner’s site
7. Shamir’s Secret-Sharing (SSS) [Shamir79]: Key Idea
• One point → an infinite number of lines pass through it
• Two points → only one line passes through them
• The secret is f(0), the value of the polynomial at x = 0
• Example: Alice wants to share her secret value 5 with Bob and Carl
  • Bob and Carl do not communicate with each other
• Impact of the degree of the polynomial on security
  • If f servers collude, the polynomial degree should be at least f, so that their f shares reveal nothing about f(0)
  • If servers do not collude, a polynomial of degree 1 suffices
• Fault tolerant
  • Due to creating more shares than are needed for reconstruction
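A minimal Python sketch of the key idea, assuming arithmetic modulo a prime P (here 2^61 − 1; the slide does not fix a field) and Python 3.8+ for modular inverses via `pow`. The helpers `share_degree1`, `reconstruct_line`, and `slope_for` are hypothetical names. Two points recover Alice’s secret 5, while Bob’s single point is consistent with every candidate secret.

```python
import secrets

P = 2**61 - 1  # assumption: shares are computed in the prime field Z_P

def share_degree1(secret, xs):
    """Shares of `secret` on a random line f(x) = secret + a*x (mod P)."""
    a = secrets.randbelow(P)  # uniformly random slope
    return {x: (secret + a * x) % P for x in xs}

def reconstruct_line(p1, p2):
    """Recover f(0) from two points on a line (Lagrange interpolation at 0)."""
    (x1, y1), (x2, y2) = p1, p2
    l1 = x2 * pow((x2 - x1) % P, -1, P)  # L1(0) = (0 - x2) / (x1 - x2)
    l2 = x1 * pow((x1 - x2) % P, -1, P)  # L2(0) = (0 - x1) / (x2 - x1)
    return (y1 * l1 + y2 * l2) % P

def slope_for(candidate, x, y):
    """Slope of the unique line through (0, candidate) and (x, y), for x != 0."""
    return (y - candidate) * pow(x, -1, P) % P

shares = share_degree1(5, xs=[1, 2])  # Bob holds x = 1, Carl holds x = 2
print(reconstruct_line((1, shares[1]), (2, shares[2])))  # 5

# Bob alone: every candidate secret lies on some line through his point.
for candidate in range(10):
    a = slope_for(candidate, 1, shares[1])
    assert (candidate + a * 1) % P == shares[1]
```

Because the slope a is uniform over Z_P and x ≠ 0, Bob’s share is itself uniform, so it carries no information about the secret.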
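8. Shamir’s Secret-Sharing (SSS): Secret-Share Creation
• The secret owner chooses a polynomial f(x) = S + ax with a random coefficient a, e.g., under the assumption that no server will collude
• The shares s1, s2, s3, s4 (values of f at distinct nonzero points) are sent to four non-communicating public servers, which perform mathematical operations on them
• Each server cannot learn the secret S from its own share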
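9. Shamir’s Secret-Sharing (SSS): Secret Reconstruction
• The non-communicating public servers hold the shares s1, s2, s3, s4 of the secret S
• The secret S = f(0) is reconstructed from the shares by Lagrange interpolation
• e.g., under the assumption that no server will collude (f(x) = S + ax), any two shares suffice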
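The creation and reconstruction steps can be sketched in Python for a polynomial of any degree. This is an illustration under assumptions, not OBSCURE’s implementation: the prime P, the server points 1 to 4, the secret 42, and the helper names are hypothetical choices.

```python
import secrets

P = 2**61 - 1  # assumption: arithmetic in the prime field Z_P

def make_shares(secret, degree, xs):
    """Secret owner: random polynomial with f(0) = secret, one point per server."""
    coeffs = [secret % P] + [secrets.randbelow(P) for _ in range(degree)]
    return [sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P for x in xs]

def lagrange_at_zero(xs, ys):
    """User: interpolate f(0) from the points (xs[j], ys[j]) over Z_P."""
    total = 0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for m, xm in enumerate(xs):
            if m != j:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        total = (total + yj * num * pow(den, -1, P)) % P
    return total

XS = [1, 2, 3, 4]  # four non-communicating servers holding s1..s4
s = make_shares(42, degree=1, xs=XS)  # degree 1: servers assumed not to collude
print(lagrange_at_zero(XS[:2], s[:2]))  # 42: any two shares suffice
print(lagrange_at_zero(XS, s))          # 42: extra shares give fault tolerance
```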
11. Background: Order-Preserving Secret-Sharing
• Similar to order-preserving encryption (OPE)
• If cleartext values have a relation such as a < b, then their shares satisfy S(a) < S(b)
• Efficient for maximum/minimum and range queries
F. Emekci et al. Dividing secrets to secure data outsourcing. Inf. Sci., 263:198–210, 2014.
12. Simple Aggregation using Secret-Shared Data
EmpID Name Salary Dept
E101 John 1000 Testing
E101 John 100000 Security
E102 Adam 5000 Testing
E103 Eve 2000 Design
SELECT SUM(Salary) FROM Employee
• Outsource the above relation using Shamir’s secret-sharing
• Each server adds all secret-shared values of the Salary attribute, exploiting the additive homomorphic property
• Challenges:
  • Aggregation with complex selection performed obliviously, i.e., hiding access-patterns
  • No communication among servers
  • Minimize work at the database owner’s site
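A sketch of SUM(Salary) over shares, under the same kind of assumptions (a hypothetical prime P, degree-1 polynomials, two servers, hypothetical helper names). Because share addition is additively homomorphic, each server sums its own shares without talking to the other, and the user interpolates the total. This plain sum does not address the challenges listed above, such as oblivious selection.

```python
import secrets

P = 2**61 - 1  # assumption: arithmetic in the prime field Z_P

def make_shares(secret, degree, xs):
    """Random polynomial with f(0) = secret, evaluated at each server point."""
    coeffs = [secret % P] + [secrets.randbelow(P) for _ in range(degree)]
    return [sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P for x in xs]

def lagrange_at_zero(xs, ys):
    """Interpolate f(0) from the points (xs[j], ys[j]) over Z_P."""
    total = 0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for m, xm in enumerate(xs):
            if m != j:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        total = (total + yj * num * pow(den, -1, P)) % P
    return total

XS = [1, 2]  # two non-colluding servers; degree-1 shares (assumption)
salaries = [1000, 100000, 5000, 2000]  # Salary column from the slide
salary_shares = [make_shares(v, 1, XS) for v in salaries]  # one row per tuple

# Each server adds only the shares it holds; no communication between servers.
server_sums = [sum(row[k] for row in salary_shares) % P for k in range(len(XS))]

print(lagrange_at_zero(XS, server_sums))  # 108000
```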
13. Outline of the Talk: OBSCURE Operations
• Count Query and Result Verification
• Maximum Query
• Experimental Results
• Differences from the Previous Techniques
• Appendix
  • String-Matching over Secret-Shares [Dolev’19]
14. Data Outsourcing using OBSCURE
Employee relation:
EmpID Name Salary
E101 John 1000
E101 John 100000
E102 Adam 5000
E103 Eve 1000
• Create shares using SSS → relation E1 (TID and Index are added for verification purposes):
EmpID Name Salary TID Index
E101 John 1000 3 3
E101 John 100000 2 2
E102 Adam 5000 5 5
E103 Eve 1000 4 4
• Create shares using OP-SS → relation E2:
CleartextTID SSTID Salary
5 5 5000
4 4 1000
3 3 1000
2 2 100000
• Only the order of values is revealed, but which row has the highest value is not revealed
• Fast answering of maximum-finding queries
15. OBSCURE: Conjunctive Count Query
select count(*) from Employee where Name = ‘John’ and Salary = 1000
• Step 1: Convert the query predicates to secret-share representation
• Step 2: Send the secret-shared query predicates to the servers
• String matching over secret-shares compares every Name value (John, John, Adam, Eve) with the predicate John → V1 = 1, 1, 0, 0
• String matching over secret-shares compares every Salary value (1000, 100000, 5000, 1000) with the predicate 1000 → V2 = 1, 0, 0, 1
• Multiply V1 and V2 row by row → 1, 0, 0, 0; add all values → final answer to the query = 1
  (values are shown in cleartext for illustration; the servers see only shares)
• Multiplication increases the degree of the polynomial. If we have fewer servers than the desired number, we can still solve the problem by
  1. Increasing communication rounds
  2. Increasing computation time
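A sketch of the multiply-then-add pattern, assuming the per-row match bits V1 (Name = ‘John’) and V2 (Salary = 1000) are already available as degree-1 shares; in OBSCURE they are produced by string matching over secret-shares, which raises the degree further. The prime P, the server points, and the helper names are hypothetical. The product of two degree-1 sharings has degree 2, so three servers are used here.

```python
import secrets

P = 2**61 - 1  # assumption: arithmetic in the prime field Z_P

def share(secret, degree, xs):
    """Random polynomial with f(0) = secret, evaluated at each server point."""
    coeffs = [secret % P] + [secrets.randbelow(P) for _ in range(degree)]
    return [sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P for x in xs]

def lagrange_at_zero(xs, ys):
    """Interpolate f(0) from the points (xs[j], ys[j]) over Z_P."""
    total = 0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for m, xm in enumerate(xs):
            if m != j:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        total = (total + yj * num * pow(den, -1, P)) % P
    return total

XS = [1, 2, 3]  # three servers: the degree-2 product needs 3 points

# Hypothetical per-row match bits (in OBSCURE these come from string matching).
name_is_john = [1, 1, 0, 0]    # V1
salary_is_1000 = [1, 0, 0, 1]  # V2

# Each bit becomes a degree-1 sharing; position k of a row goes to server XS[k].
name_sh = [share(b, 1, XS) for b in name_is_john]
salary_sh = [share(b, 1, XS) for b in salary_is_1000]

# Every server does identical work on every row (oblivious execution):
# multiply its two shares per row (degree 2) and add over all rows.
server_out = [
    sum(name_sh[r][k] * salary_sh[r][k] for r in range(len(name_sh))) % P
    for k in range(len(XS))
]

# User: one value per server, interpolated into the count.
count = lagrange_at_zero(XS, server_out)
print(count)  # 1
```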
16. OBSCURE: Count Query: Security Guarantees
select count(*) from Employee where Name = ‘John’ and Salary = 1000
• Identical operations on each row → oblivious execution
• Hidden access-patterns: the adversary cannot learn which rows satisfy the query
• The adversary cannot learn anything by observing the values of the data and query predicates, since all values are secret-shared
• No output-size attack
17. Impact of #Shares: Conjunctive Count Query
select count(*) from Emp where Name = ‘John’ and Salary = 1000 and Age = 40
• String matching yields V1 = 1, 1, 0, 0 (Name = ‘John’), V2 = 1, 0, 0, 1 (Salary = 1000), and V3 = 1, 1, 0, 1 (Age = 40, with Age values 40, 40, 50, 40)
• Multiply V1, V2, and V3 row by row → 1, 0, 0, 0; add → 1
• Polynomial degree = 3
• The minimum number of shares needed to interpolate a polynomial of degree 3 is four
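The degree argument can be checked with exact rational arithmetic. The sketch multiplies three degree-1 polynomials whose constant terms are the match bits of the first row (John, 1000, 40). The slopes 2, 3, 1 are fixed toy values and the helper names are hypothetical; real shares use random coefficients over a finite field.

```python
from fractions import Fraction

def poly_mul(a, b):
    """Multiply polynomials given as coefficient lists [c0, c1, ...]."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def evaluate(coeffs, x):
    return sum(c * x**k for k, c in enumerate(coeffs))

def lagrange_at_zero(points):
    """Exact Lagrange interpolation at x = 0 from a list of (x, y) points."""
    total = Fraction(0)
    for j, (xj, yj) in enumerate(points):
        term = Fraction(yj)
        for m, (xm, _) in enumerate(points):
            if m != j:
                term *= Fraction(-xm, xj - xm)
        total += term
    return total

# Row 1: every predicate matches, so each polynomial's constant term is 1.
name, salary, age = [1, 2], [1, 3], [1, 1]
product = poly_mul(poly_mul(name, salary), age)  # [1, 6, 11, 6]: degree 3
shares = [(x, evaluate(product, x)) for x in (1, 2, 3, 4)]  # 24, 105, 280, 585

print(lagrange_at_zero(shares))      # 1: four shares recover the product
print(lagrange_at_zero(shares[:3]))  # 37: three shares give a wrong value
```

Three shares determine only a degree-2 polynomial, which is why a three-predicate conjunction needs four shares.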
18. Impact of #Shares: Conjunctive Count Query
select count(*) from Emp where Name = ‘John’ and Salary = 1000 and Age = 40
• What if we have only three shares?
• Compute the result of any two predicates at the servers, e.g., Salary = 1000 and Age = 40: V' = V2 × V3 = 1, 0, 0, 1
• Execute the remaining part of the query (combining V' with V1 = 1, 1, 0, 0 for Name = ‘John’) at the user side
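One possible reading of the three-share workaround, sketched in Python under assumptions: each server multiplies the Salary and Age shares per row (degree 2, so three shares suffice) and returns per-row shares of V' and V1; the user interpolates each row, multiplies, and adds. This trades extra communication (per-row values instead of one value per server) for fewer shares. The slide does not specify the exact protocol, and P and the helper names are hypothetical.

```python
import secrets

P = 2**61 - 1  # assumption: arithmetic in the prime field Z_P

def share(secret, degree, xs):
    """Random polynomial with f(0) = secret, evaluated at each server point."""
    coeffs = [secret % P] + [secrets.randbelow(P) for _ in range(degree)]
    return [sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P for x in xs]

def lagrange_at_zero(xs, ys):
    """Interpolate f(0) from the points (xs[j], ys[j]) over Z_P."""
    total = 0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for m, xm in enumerate(xs):
            if m != j:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        total = (total + yj * num * pow(den, -1, P)) % P
    return total

XS = [1, 2, 3]  # only three servers: a degree-3 product cannot be recovered
name_is_john = [1, 1, 0, 0]    # V1
salary_is_1000 = [1, 0, 0, 1]  # V2
age_is_40 = [1, 1, 0, 1]       # V3

v1_sh = [share(b, 1, XS) for b in name_is_john]
v2_sh = [share(b, 1, XS) for b in salary_is_1000]
v3_sh = [share(b, 1, XS) for b in age_is_40]

# Servers: per row, multiply the V2 and V3 shares (degree 2 fits three shares).
vprime_sh = [[a * b % P for a, b in zip(r2, r3)] for r2, r3 in zip(v2_sh, v3_sh)]

# User: interpolate V1 and V' for each row, then finish the conjunction locally.
count = 0
for r1, rp in zip(v1_sh, vprime_sh):
    count += lagrange_at_zero(XS, r1) * lagrange_at_zero(XS, rp)
print(count)  # 1
```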
19. Count Query Result Verification
• For verification, E1 is outsourced with two additional columns, A and B, each holding the value 1 in SSS form:
EmpID Name Salary TID Index A B
E101 John 1000 3 3 1 1
E101 John 100000 2 2 1 1
E102 Adam 5000 5 5 1 1
E103 Eve 1000 4 4 1 1
20. Count Query Result Verification
Verify the answer of the following query:
select count(*) from Employee where Name = ‘John’ and Salary = 1000
• Count query result for each row: 1, 0, 0, 0
• Multiply each row’s result by A (1, 1, 1, 1) → 1, 0, 0, 0; add all values → 1
• Compute 1 − value for each row → 0, 1, 1, 1; multiply by B (1, 1, 1, 1) → 0, 1, 1, 1; add all values → 3
21. Count Query Result Verification
Verify the answer of the following query:
select count(*) from Employee where Name = ‘John’ and Salary = 1000
• The two computed values are 1 and 3
• The first value matches the result of the count query → the count query result is correct
• The sum of the two values equals the number of rows in the dataset → the server has scanned all the rows to compute the answer
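A sketch of the verification arithmetic, assuming the per-row count results and the A and B columns are degree-1 sharings over a hypothetical prime field with three servers (in OBSCURE the per-row results come from string matching and have a higher degree). A server computes its share of 1 − value as 1 minus its share, since the constant polynomial 1 is a valid sharing of 1 at every point.

```python
import secrets

P = 2**61 - 1  # assumption: arithmetic in the prime field Z_P

def share(secret, degree, xs):
    """Random polynomial with f(0) = secret, evaluated at each server point."""
    coeffs = [secret % P] + [secrets.randbelow(P) for _ in range(degree)]
    return [sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P for x in xs]

def lagrange_at_zero(xs, ys):
    """Interpolate f(0) from the points (xs[j], ys[j]) over Z_P."""
    total = 0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for m, xm in enumerate(xs):
            if m != j:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        total = (total + yj * num * pow(den, -1, P)) % P
    return total

XS = [1, 2, 3]
values = [1, 0, 0, 0]  # per-row count-query result (slide 20)
n = len(values)
val_sh = [share(v, 1, XS) for v in values]
a_sh = [share(1, 1, XS) for _ in range(n)]  # column A: shares of 1
b_sh = [share(1, 1, XS) for _ in range(n)]  # column B: shares of 1

# Server k: sum of value*A and sum of (1 - value)*B over all rows (degree 2).
c1_out, c2_out = [], []
for k in range(len(XS)):
    c1_out.append(sum(val_sh[r][k] * a_sh[r][k] for r in range(n)) % P)
    c2_out.append(sum((1 - val_sh[r][k]) * b_sh[r][k] for r in range(n)) % P)

c1 = lagrange_at_zero(XS, c1_out)
c2 = lagrange_at_zero(XS, c2_out)
claimed_count = 1  # the result returned by the count query
print(c1, c2)                                # 1 3
print(c1 == claimed_count and c1 + c2 == n)  # True
```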
22. OBSCURE: Maximum Query
select * from Employee where Salary in (select max(Salary) from Employee)
E1 (SSS):
EmpID Name Salary Dept TID Index
E101 John 1000 Testing 3 3
E101 John 100000 Security 2 2
E102 Adam 5000 Testing 5 5
E103 Eve 1000 Design 4 4
E2 (OP-SS):
CleartextTID SSTID Salary
5 5 5000
4 4 1000
3 3 1000
2 2 100000
• Find the tuple with the maximum salary in E2 → CleartextTID 2, SSTID 2, Salary 100000
• Based on string matching over TID and SSTID, find the tuple in E1 having the maximum salary
• Output: E101 John 100000 Security 2 2
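The two steps of the maximum query can be sketched in cleartext Python. Plain comparison of salaries stands in for comparing OP-SS values, and the hypothetical `match` stands in for string matching over secret-shares; both substitutions are assumptions for readability. On real shares, the multiply-and-add of step 2 runs over secret-shared values, and Name or Dept would be fetched the same way once unary-encoded.

```python
# E2 rows: (cleartext TID, SSTID, salary). Comparing salaries in the clear
# stands in for comparing OP-SS values, which preserve order (assumption).
E2 = [(5, 5, 5000), (4, 4, 1000), (3, 3, 1000), (2, 2, 100000)]

# E1 rows: (EmpID, Name, Salary, Dept, TID); cleartext here only for illustration.
E1 = [
    ("E101", "John", 1000, "Testing", 3),
    ("E101", "John", 100000, "Security", 2),
    ("E102", "Adam", 5000, "Testing", 5),
    ("E103", "Eve", 1000, "Design", 4),
]

def match(a, b):
    """Stand-in for string matching over secret-shares: 1 if equal, else 0."""
    return 1 if a == b else 0

# Step 1: the order revealed by OP-SS identifies the E2 row with the max salary.
_, target_sstid, _ = max(E2, key=lambda row: row[2])

# Step 2: touch every E1 row identically; multiply each numeric attribute by
# the row's match bit and add, so only the matching row survives the sum.
indicators = [match(row[4], target_sstid) for row in E1]
salary = sum(m * row[2] for m, row in zip(indicators, E1))
tid = sum(m * row[4] for m, row in zip(indicators, E1))
print(salary, tid)  # 100000 2
```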
24. Experimental Results
• Dataset: TPC-H LineItem table with 1M and 6M rows
• Cloud machines: 15 AWS servers, each with 144GB RAM and a 3.0GHz Intel Xeon CPU with 72 cores
• Database owner or user machine: a 16GB RAM machine with one core
26. OBSCURE vs Downloading and Local Processing
Computation time at a resource-constrained user (1GB RAM and a single-core 1.35GHz CPU):
• 1M rows: at most 13 seconds, compared with 26 seconds for downloading
• 6M rows: at most 50 seconds, compared with 385 seconds for downloading
27. Differences from the Previous Work
• SSS-based work
  • Retains each polynomial that was used to create the database shares [Emekci 14]
  • Reveals access-patterns [Emekci 14, Xiang 16]
  • Retains tuple-ids of qualifying tuples [Xiang 16]
  • Cannot perform general aggregations with selection over complex predicates
  • Works only for smaller datasets
• Verification work
  • Requires a trusted third-party verifier [Jiang 08]
  • Meta-database verification checks whether all the desired tuples are scanned, but provides no result verification for all queries [Thompson 09]
28. What Will Not Be Covered Here
• Sum queries and result verification
• Verification of retrieved tuples
• Different types of maximum queries
• Maximum queries over datasets outsourced by multiple DB owners
• Minimum, Top-K, and Group-by queries
• Range queries
See the published version and the full version of the paper.
29. Outline
• Data outsourcing: exploiting SS with order-preserving secret-sharing
• Count Query and result verification
• Experimental Results
• Differences from the previous techniques
• Appendix
  • String-Matching over Secret-Shares [Dolev’19]
30. String Matching over Secret-Shared Data [Dolev 19]
• Step 1: Unary representation: A → 1, 0, 0; B → 0, 1, 0; C → 0, 0, 1
• Step 2: Creating secret-shares of the unary-represented data using polynomials
• Step 3: Outsourcing the secret-shares to the servers
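Step 1 and the matching it enables can be sketched as follows, assuming the toy alphabet {A, B, C} from the slide; `unary`, `letter_match`, and `string_match` are hypothetical names. Two letters match exactly when the inner product of their unary vectors is 1, and a string matches when every position matches. On secret-shares, each of these multiplications raises the polynomial degree.

```python
ALPHABET = "ABC"  # toy alphabet from the slide (assumption: a real domain is larger)

def unary(ch):
    """Unary (one-hot) vector of a letter, e.g., B -> [0, 1, 0]."""
    return [1 if c == ch else 0 for c in ALPHABET]

def letter_match(u, v):
    """Inner product of two unary vectors: 1 iff the letters are equal."""
    return sum(a * b for a, b in zip(u, v))

def string_match(s, t):
    """1 iff equal-length strings s and t agree at every position, else 0."""
    assert len(s) == len(t), "this sketch assumes equal-length strings"
    result = 1
    for a, b in zip(s, t):
        result *= letter_match(unary(a), unary(b))
    return result

print(unary("B"))                                          # [0, 1, 0]
print(string_match("AB", "AB"), string_match("AB", "AC"))  # 1 0
```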
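31. String-Matching over Secret-Shared Data [Dolev 19]
Secret-share creation by the DB owner for B = 0, 1, 0:
• Polynomials: 0 + 5x, 1 + 9x, 0 + 2x
• Shares at x = 1 (server 1): 5, 10, 2
• Shares at x = 2 (server 2): 10, 19, 4
• Shares at x = 3 (server 3): 15, 28, 6
These shares represent B (0, 1, 0) in secret-shared form → the adversary cannot learn the actual value, B.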
32. String-Matching over Secret-Shared Data
• The DB owner’s shares of B: server 1: 5, 10, 2; server 2: 10, 19, 4; server 3: 15, 28, 6
• The user wants to search for B (0, 1, 0) and creates secret-shares using the polynomials 0 + x, 1 + 2x, 0 + 4x:
  server 1: 1, 3, 4; server 2: 2, 5, 8; server 3: 3, 7, 12
• There is no need to share any polynomial between the DB owner and the user
• These shares represent B (0, 1, 0) in secret-shared form → the adversary cannot learn the actual value, B, of either the dataset or the query predicate
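The shares on slides 31 and 32 can be reproduced with plain integers, as on the slides; a real deployment would draw random coefficients over a finite field. `share_vector` is a hypothetical helper, and the slopes are the ones shown on the slides. The owner and the user pick their polynomials independently, which is why no polynomial has to be shared between them.

```python
def share_vector(bits, slopes, xs=(1, 2, 3)):
    """Degree-1 shares of each bit, f_k(x) = bits[k] + slopes[k] * x, per server x."""
    return {x: [b + a * x for b, a in zip(bits, slopes)] for x in xs}

B = [0, 1, 0]
owner_shares = share_vector(B, slopes=[5, 9, 2])  # DB owner's polynomials
user_shares = share_vector(B, slopes=[1, 2, 4])   # user's own, independent polynomials
print(owner_shares)  # {1: [5, 10, 2], 2: [10, 19, 4], 3: [15, 28, 6]}
print(user_shares)   # {1: [1, 3, 4], 2: [2, 5, 8], 3: [3, 7, 12]}
```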
33. String-Matching over Secret-Shared Data
The user wants to search for B.
• Cloud operations: each server multiplies its data shares by its query shares element-wise and adds the products
  • Server 1: 5×1 + 10×3 + 2×4 = 5 + 30 + 8 = 43
  • Server 2: 10×2 + 19×5 + 4×8 = 20 + 95 + 32 = 147
  • Server 3: 15×3 + 28×7 + 6×12 = 45 + 196 + 72 = 313
• Lagrange interpolation at the user: Answer = 1
• This is the multiplication of [0,1,0] and [0,1,0] in secret-shared form, so SSS hides from the adversary whether the answer is 1 or 0
• Each cloud sends only one value to the user, regardless of dataset size → less communication cost
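The cloud and user computations on slide 33 can be replayed from the share values on slides 31 and 32. Each cloud returns a single number, and the user interpolates the degree-2 product polynomial at x = 0. Exact fractions are an assumption for readability; a field implementation would use modular inverses.

```python
from fractions import Fraction

owner_shares = {1: [5, 10, 2], 2: [10, 19, 4], 3: [15, 28, 6]}
user_shares = {1: [1, 3, 4], 2: [2, 5, 8], 3: [3, 7, 12]}

# Each cloud multiplies its two share vectors element-wise and adds the products.
cloud_answers = {
    x: sum(d * q for d, q in zip(owner_shares[x], user_shares[x]))
    for x in owner_shares
}

def lagrange_at_zero(points):
    """Exact Lagrange interpolation at x = 0 from a {x: y} mapping."""
    total = Fraction(0)
    for xj, yj in points.items():
        term = Fraction(yj)
        for xm in points:
            if xm != xj:
                term *= Fraction(-xm, xj - xm)
        total += term
    return total

print(cloud_answers)                    # {1: 43, 2: 147, 3: 313}
print(lagrange_at_zero(cloud_answers))  # 1
```

The three answers lie on 1 + 11x + 31x², the sum of the products of the owner’s and the user’s polynomials, whose constant term is the match result.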
34. Conclusion
• Existing techniques are not enough
  • Existing information-theoretically secure techniques are slower, reveal access-patterns, or do not allow a third party to execute a query
• OBSCURE
  • Information-theoretically secure aggregation queries with result verification
  • A scalable system
  • A tradeoff between the number of shares and the computation time