Hadoop Meetup Jan 2019 - Hadoop Encryption
1. © Cloudera, Inc. All rights reserved.
Hadoop Encryption
Wei-Chiu Chuang, Cloudera
2. Why Encryption
• Information leaks affect tens to hundreds of millions of people
• Personally identifiable information (PII)
• Credit cards, SSNs, account logins
• Encryption would have prevented some of these leaks
• Encryption is a regulatory requirement for many business sectors
• Finance (PCI DSS)
• Government (Data Protection Directive)
• Healthcare (DPD, HIPAA)
3. Related Technologies
Data In-Motion Encryption
• SSL/TLS
• Hadoop Data Transfer Encryption
• Hadoop RPC Encryption
Data At-Rest Encryption
• At Linux volume level
• Transparent Encryption at HDFS level
• HBase Column Family level
• Parquet column-level encryption (Xinli)
4. HDFS Transparent Encryption: In a Nutshell
[Diagram: the HDFS namespace tree (/, /data, /tmp, /data/1, /data/f2). Part of the tree is marked as an encryption zone. The zone has an encryption zone key, and each file has its own data encryption key.]
5. HDFS Transparent Encryption: In a Nutshell
[Diagram: a Client interacting with the NameNode (NN), a DataNode (DN), and the Key Management Server (KMS), including an "in EZ?" check on the file path.]
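The key hierarchy in the two diagrams above (an encryption zone key held by the KMS, and one data encryption key per file) can be sketched as a toy simulation. This is a minimal illustration, not Hadoop code: every class and function name is hypothetical, the choice of /data as the encryption zone is an assumption, and `toy_cipher` is an insecure stand-in for the real cipher so that the example needs only the standard library.

```python
import hashlib
import os


def toy_cipher(key: bytes, iv: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream.

    Toy stand-in for a real cipher so the sketch needs no third-party library.
    NOT secure. Applying it twice with the same key and IV restores the input.
    """
    out = bytearray()
    for block_no in range((len(data) + 31) // 32):
        pad = hashlib.sha256(key + iv + block_no.to_bytes(8, "big")).digest()
        chunk = data[block_no * 32:(block_no + 1) * 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


class ToyKMS:
    """Holds encryption zone (EZ) keys and never hands them out."""

    def __init__(self):
        self._ez_keys = {}

    def create_key(self, name):
        self._ez_keys[name] = os.urandom(32)

    def generate_edek(self, key_name):
        """Create a fresh per-file key and return it only in encrypted form."""
        dek, iv = os.urandom(32), os.urandom(16)
        return iv, toy_cipher(self._ez_keys[key_name], iv, dek)

    def decrypt_edek(self, key_name, iv, edek):
        return toy_cipher(self._ez_keys[key_name], iv, edek)


class ToyNameNode:
    """Tracks zones and encrypted per-file keys; never sees a plaintext key."""

    def __init__(self, kms):
        self.kms = kms
        self.zones = {}      # zone root -> EZ key name
        self.file_info = {}  # file path -> (key name, iv, edek) or None

    def create_zone(self, root, key_name):
        self.zones[root.rstrip("/")] = key_name

    def create_file(self, path):
        """The "in EZ?" check: only files inside a zone get an encrypted key."""
        info = None
        for root, key_name in self.zones.items():
            if path.startswith(root + "/"):
                iv, edek = self.kms.generate_edek(key_name)
                info = (key_name, iv, edek)
                break
        self.file_info[path] = info
        return info


class ToyClient:
    """Encrypts and decrypts on the client side; the DataNode stores bytes as given."""

    def __init__(self, namenode, kms, datanode):
        self.nn, self.kms, self.dn = namenode, kms, datanode

    def _crypt(self, info, data):
        if info is None:
            return data  # file is outside any encryption zone
        key_name, iv, edek = info
        dek = self.kms.decrypt_edek(key_name, iv, edek)
        return toy_cipher(dek, iv, data)

    def write(self, path, data):
        self.dn[path] = self._crypt(self.nn.create_file(path), data)

    def read(self, path):
        return self._crypt(self.nn.file_info[path], self.dn[path])


kms = ToyKMS()
kms.create_key("ez1")
namenode = ToyNameNode(kms)
namenode.create_zone("/data", "ez1")  # assumption: /data is the zone
datanode = {}                         # path -> stored bytes
client = ToyClient(namenode, kms, datanode)
client.write("/data/f2", b"account: 1234")
client.write("/tmp/scratch", b"not secret")
```

In this sketch the DataNode only ever stores ciphertext for /data/f2, and the NameNode only ever stores the encrypted per-file key, which is the separation of duties the diagrams suggest.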
6. Features
• Minor performance impact on HDFS reads and writes
• OpenSSL and AES-NI acceleration
• Overhead: 7.5% for reads, ~0% for writes
• Key ACLs
• Warm-up/Caching (*)
• Key rollover
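The warm-up/caching item can be pictured as a queue of pre-generated encrypted keys per encryption zone key, refilled when it runs low, so that file creation rarely waits on the KMS. The sketch below is one possible shape under stated assumptions, not Hadoop's implementation: `EdekCache`, `fake_kms_batch`, the cache size, and the low-watermark value are all hypothetical, and the refill is done inline where a real cache would more likely refill in the background.

```python
from collections import deque


class EdekCache:
    """Keeps a queue of pre-generated encrypted keys per zone key (hypothetical)."""

    def __init__(self, generate_batch, size=8, low_watermark=0.25):
        self._generate_batch = generate_batch  # callable(key_name, n) -> list
        self._size = size
        self._low = max(1, int(size * low_watermark))
        self._queues = {}

    def warm_up(self, key_name):
        """Fill the queue for key_name up to the configured size."""
        queue = self._queues.setdefault(key_name, deque())
        missing = self._size - len(queue)
        if missing > 0:
            queue.extend(self._generate_batch(key_name, missing))

    def get(self, key_name):
        """Hand out one cached key, refilling first if the queue is running low."""
        queue = self._queues.setdefault(key_name, deque())
        if len(queue) <= self._low:
            self.warm_up(key_name)  # inline here; a real cache could do this async
        return queue.popleft()


kms_calls = []  # records each round trip to the (fake) KMS


def fake_kms_batch(key_name, n):
    """Stand-in for a batched KMS request; returns placeholder strings."""
    kms_calls.append((key_name, n))
    return [f"edek-{key_name}-{len(kms_calls)}-{i}" for i in range(n)]


cache = EdekCache(fake_kms_batch, size=8, low_watermark=0.25)
cache.warm_up("ez1")                         # one KMS round trip up front
handed_out = [cache.get("ez1") for _ in range(7)]
```

Seven file creations here cost two KMS round trips instead of seven. The trade-off is that encrypted keys sit in memory ahead of use, which is why caching may be unacceptable in some environments.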
7. Dev History
• First released in Hadoop 2.6.0 / CDH 5.3 in December 2014
• Many, many bug fixes and enhancements
• Functional bugs, failure handling bugs, scale bugs
• Stable since roughly Hadoop 2.8 / CDH 5.11
8. Lessons Learned
Scale-out is not easy to deploy
Security
Endurance, scale tests are essential
Too little emphasis on KMS as a performance bottleneck
FileSystem#getDelegationToken() API/integration
High throughput REST API layer is hard
9. Status Quo
Among Cloudera’s customers (pre-merger):
• 14% Data Transfer Encryption
• 16% Data at Rest Encryption
• 19% RPC Encryption
• 44% Kerberized
Largest at-rest encryption cluster: ~1,000 nodes, > 50PB
10. Troubleshooting
Performance anomaly
• openssl-devel library
• Entropy
• rng-tools
• Secure Random
• hadoop.security.secure.random.impl = org.apache.hadoop.crypto.random.OpensslSecureRandom
Proxy user configuration
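For the entropy item above, one quick check on a Linux host is to read the kernel's entropy estimate. This is a minimal sketch: the threshold is a hypothetical cut-off rather than a recommendation from the talk, and the value is less informative on newer kernels, where it may stay constant.

```python
ENTROPY_PATH = "/proc/sys/kernel/random/entropy_avail"  # Linux only
LOW_ENTROPY_THRESHOLD = 1000  # hypothetical cut-off; tune for your environment


def read_entropy(path=ENTROPY_PATH):
    """Return the kernel's entropy estimate in bits, or None if unreadable."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def entropy_is_low(bits, threshold=LOW_ENTROPY_THRESHOLD):
    """True only when a reading exists and falls below the threshold."""
    return bits is not None and bits < threshold


current = read_entropy()
if current is None:
    print("entropy estimate not available on this host")
elif entropy_is_low(current):
    print(f"entropy looks low ({current} bits); consider rng-tools")
else:
    print(f"entropy estimate: {current} bits")
```

If the reading stays low while reads and writes are slow, the rng-tools and Secure Random items in the list above are the places to look next.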
11. Bad Practices
• No KMS HA
• KMS enabled, RPC encryption not enabled
• KMS enabled, but no Kerberos
• KMS w/o SSL
• Data transfer encryption is enabled, but using an unoptimized crypto algorithm
• 3DES or RC4 instead of AES, which benefits from AES-NI
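The bad practices above can be turned into a small configuration check. The sketch below is illustrative only: `find_bad_practices` is a hypothetical helper, the host names are made up, and the HA test (more than one host in the key provider URI) is a simplification that would miss HA behind a load balancer. The property names are standard Hadoop settings, but their defaults may vary by version, so treat the assumed defaults as unverified.

```python
def find_bad_practices(conf):
    """Return findings for a dict of Hadoop properties (hypothetical helper)."""
    findings = []
    provider = conf.get("hadoop.security.key.provider.path", "")
    if provider.startswith("kms://"):
        scheme, _, rest = provider[len("kms://"):].partition("@")
        authority = rest.split("/", 1)[0]            # host1;host2:port
        hosts = authority.split(":", 1)[0].split(";")
        if len(hosts) < 2:
            findings.append("No KMS HA")
        if scheme != "https":
            findings.append("KMS w/o SSL")
        protection = conf.get("hadoop.rpc.protection", "authentication")
        if "privacy" not in [p.strip() for p in protection.split(",")]:
            findings.append("KMS enabled, RPC encryption not enabled")
        if conf.get("hadoop.security.authentication", "simple") != "kerberos":
            findings.append("KMS enabled, but no Kerberos")
    transfer_on = conf.get("dfs.encrypt.data.transfer", "false") == "true"
    if transfer_on and not conf.get("dfs.encrypt.data.transfer.cipher.suites"):
        findings.append("Data transfer encryption without an AES cipher suite")
    return findings


# Hypothetical configurations for illustration.
risky = {
    "hadoop.security.key.provider.path": "kms://http@kms1.example.com:9600/kms",
    "hadoop.security.authentication": "simple",
    "dfs.encrypt.data.transfer": "true",
}
safer = {
    "hadoop.security.key.provider.path":
        "kms://https@kms1.example.com;kms2.example.com:9600/kms",
    "hadoop.security.authentication": "kerberos",
    "hadoop.rpc.protection": "privacy",
    "dfs.encrypt.data.transfer": "true",
    "dfs.encrypt.data.transfer.cipher.suites": "AES/CTR/NoPadding",
}

for finding in find_bad_practices(risky):
    print(finding)
```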
12. Challenges
KMS Low Throughput
• NN can sustain > 100 thousand RPC ops/second
• namespace ops, block reports
• KMS: at most a few thousand RPC ops/second
• create, append, read, reencrypt
• 3-4 KMS servers not uncommon
• Jetty
• SSL Handshake
• Impala/Parquet with wide tables (> 100 columns)
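The gap between NameNode and KMS throughput is easier to see with a back-of-the-envelope calculation. Everything numeric below is a hypothetical input chosen for illustration, and the model assumes one KMS decrypt call per column per file opened, which may not match how a given engine actually reads Parquet files.

```python
import math


def kms_servers_needed(decrypt_calls_per_sec, per_server_ops_per_sec):
    """Smallest number of KMS servers that covers the given call rate."""
    return max(1, math.ceil(decrypt_calls_per_sec / per_server_ops_per_sec))


# Hypothetical workload: a scan over a wide table.
files_opened_per_sec = 50
columns_per_table = 120      # "wide table": more than 100 columns
per_kms_ops_per_sec = 2000   # "a few thousand" ops/second per KMS

# Assumption: each column of each opened file costs one decrypt call.
decrypt_calls = files_opened_per_sec * columns_per_table
servers = kms_servers_needed(decrypt_calls, per_kms_ops_per_sec)
print(f"{decrypt_calls} decrypt calls/s -> {servers} KMS servers")
```

Under these assumptions a modest file-open rate already needs several KMS servers, while the same rate would be a small fraction of what a NameNode sustains.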
13. Future
Pluggable KMS ACL Framework (HADOOP-14951)
WebHDFS At Rest Encryption Support (HDFS-12355)
NFS Gateway At Rest Encryption Support (HDFS-13521)
Performance Improvements (HADOOP-15743, HADOOP-15811)
KMS Benchmark Tool (HADOOP-15967)
KMS over Hadoop RPC?
14. Current KMS Transport Layer
[Diagram: KMSClient and the NameNode each use an HTTP client to reach the KMS, which is served by Jetty over a REST API/HTTP.]
15. KMS over Hadoop RPC
Benefit of KMS over Hadoop RPC:
• Proven performance
• Code reuse
[Diagram: KMSClient and the NameNode reach the KMS over Hadoop RPC instead of HTTP.]
Editor's notes: Caching improves performance, but some users’ environments prohibit caching due to security concerns. KMS was designed to be horizontally scalable. However, because Cloudera recommends 2 KMS-HA and 2 Keytrustee Servers for production workloads, the cost of HA is high.