© Cloudera, Inc. All rights reserved.
Hadoop Encryption
Wei-Chiu Chuang, Cloudera
Why Encryption
• Information leaks affect tens to hundreds of millions of people
• Personally identifiable information (PII)
• Credit cards, SSNs, account logins
• Encryption would have prevented some of these leaks
• Encryption is a regulatory requirement for many business sectors
• Finance (PCI DSS)
• Government (Data Protection Directive)
• Healthcare (DPD, HIPAA)
Related Technologies
Data In-Motion Encryption
• SSL/TLS
• Hadoop Data Transfer Encryption
• Hadoop RPC Encryption
Data At-Rest Encryption
• At Linux volume level
• Transparent Encryption at HDFS level
• HBase Column Family level
• Parquet Column level Encryption Xinli
HDFS Transparent Encryption: In a Nutshell
[Diagram: HDFS namespace tree (/, /data, /tmp, /data/1, /data/f2) showing an encryption zone, its encryption zone key, and a Data Encryption Key (per file)]
HDFS Transparent Encryption: In a Nutshell
[Diagram: Client, KMS (Key Management Server), NN (NameNode), and DN (DataNode); the NameNode is asked whether the file is in an encryption zone ("in EZ?")]
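The two "In a Nutshell" diagrams can be read as an envelope-encryption flow. The following is a minimal sketch of that idea, not the Hadoop implementation: the class and function names (`ToyKMS`, `ToyNameNode`, `client_write`, `client_read`) are hypothetical, `/data` is assumed to be the encryption zone, and a SHA-256 keystream stands in for AES-CTR purely so the example runs on the standard library.

```python
import hashlib
import os


def _keystream_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode). NOT AES-CTR; illustration only."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class ToyKMS:
    """Holds encryption zone keys; they never leave this object."""

    def __init__(self):
        self._zone_keys = {}

    def create_key(self, name: str) -> None:
        self._zone_keys[name] = os.urandom(16)

    def generate_edek(self, key_name: str):
        """Create a fresh per-file DEK and return it encrypted (EDEK) with its IV."""
        dek, iv = os.urandom(16), os.urandom(16)
        return iv, _keystream_xor(self._zone_keys[key_name], iv, dek)

    def decrypt_edek(self, key_name: str, iv: bytes, edek: bytes) -> bytes:
        return _keystream_xor(self._zone_keys[key_name], iv, edek)


class ToyNameNode:
    """Tracks encryption zones and stores only the encrypted DEK per file."""

    def __init__(self, kms: ToyKMS):
        self.kms = kms
        self.zones = {}  # zone root -> zone key name
        self.files = {}  # path -> (key name, iv, edek)

    def create_zone(self, root: str, key_name: str) -> None:
        self.zones[root] = key_name

    def zone_key_for(self, path: str):
        """The 'in EZ?' check: return the zone key name, or None outside any zone."""
        for root, key_name in self.zones.items():
            if path == root or path.startswith(root.rstrip("/") + "/"):
                return key_name
        return None

    def create_file(self, path: str):
        key_name = self.zone_key_for(path)
        if key_name is None:
            return None
        iv, edek = self.kms.generate_edek(key_name)
        self.files[path] = (key_name, iv, edek)
        return self.files[path]


def client_write(nn, kms, datanode: dict, path: str, plaintext: bytes) -> None:
    info = nn.create_file(path)
    if info is None:  # not in an encryption zone: stored as-is
        datanode[path] = plaintext
        return
    key_name, iv, edek = info
    dek = kms.decrypt_edek(key_name, iv, edek)  # client asks the KMS, not the NameNode
    datanode[path] = _keystream_xor(dek, iv, plaintext)  # DataNode only sees ciphertext


def client_read(nn, kms, datanode: dict, path: str) -> bytes:
    info = nn.files.get(path)
    if info is None:
        return datanode[path]
    key_name, iv, edek = info
    dek = kms.decrypt_edek(key_name, iv, edek)
    return _keystream_xor(dek, iv, datanode[path])
```

In this toy model the DataNode (a plain dict) never holds plaintext for files under the zone, and the NameNode never holds an unencrypted DEK, which is the separation the diagrams appear to illustrate.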
Features
• Minor performance impact on HDFS reads and writes
• OpenSSL and AES-NI acceleration
• Overhead: about 7.5% for reads, ~0% for writes
• Key ACLs
• Warm-up/Caching (*)
• Key rollover
Dev History
• First released in Hadoop 2.6.0 / CDH5.3 in December 2014
• Many, many bug fixes and enhancements
• Functional bugs, failure handling bugs, scale bugs
• Stable after Hadoop 2.8 / CDH5.11-ish
Lessons Learned
Scale-out is not easy to deploy
Security
Endurance, scale tests are essential
Too little emphasis on KMS as a performance bottleneck
FileSystem#getDelegationToken() API/integration
High throughput REST API layer is hard
Status Quo
Among Cloudera’s customers (pre-merger):
• 14% Data Transfer Encryption
• 16% Data at Rest Encryption
• 19% RPC Encryption
• 44% Kerberized
Largest at-rest encryption cluster: ~1,000 nodes, > 50PB
Troubleshooting
Performance anomaly
• openssl-devel library
• Entropy
• rng-tools
• Secure Random
• hadoop.security.secure.random.impl = org.apache.hadoop.crypto.random.OpensslSecureRandom
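As a rough aid for the entropy and secure-random items above, here is a small sketch with two hypothetical helpers. `entropy_available` reads the Linux kernel's entropy estimate from `/proc/sys/kernel/random/entropy_avail`, and `hadoop_property` renders the property named on the slide as a Hadoop-style `<property>` element. Which configuration file the property belongs in, and what entropy level counts as too low, are left to the reader.

```python
import xml.etree.ElementTree as ET


def entropy_available(path="/proc/sys/kernel/random/entropy_avail"):
    """Return the kernel's entropy estimate, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def hadoop_property(name: str, value: str) -> str:
    """Render one <property> element in the usual Hadoop XML configuration shape."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")
```

For example, `hadoop_property("hadoop.security.secure.random.impl", "org.apache.hadoop.crypto.random.OpensslSecureRandom")` returns a single-line element that can be pasted into a configuration file, and `entropy_available()` can be polled while reproducing a slowdown.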
Proxy user configuration
Bad Practices
• No KMS HA
• KMS enabled, RPC encryption not enabled
• KMS enabled, but no Kerberos
• KMS w/o SSL
• Data transfer encryption is enabled, but using an unoptimized crypto algorithm
• 3DES and RC4 are unoptimized; AES with AES-NI is hardware-accelerated
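The list above can be turned into a simple configuration check. This is a sketch under assumptions, not a Cloudera tool: `find_bad_practices` is a hypothetical helper, it takes a plain dict of already-merged Hadoop properties, and it assumes KMS HA is expressed as several `;`-separated hosts in the `kms://` provider URI. The example host names are made up.

```python
def find_bad_practices(conf: dict) -> list:
    """Return a list of findings for the bad practices named on the slide."""
    findings = []

    provider = conf.get("hadoop.security.key.provider.path", "")
    if provider.startswith("kms://"):
        # Assumed URI shape: kms://https@kms01.example.com;kms02.example.com:9600/kms
        scheme, _, rest = provider[len("kms://"):].partition("@")
        hosts = rest.split(":", 1)[0].split(";")
        if len(hosts) < 2:
            findings.append("No KMS HA: only one KMS host configured")
        if scheme != "https":
            findings.append("KMS without SSL")
        rpc_protection = conf.get("hadoop.rpc.protection", "authentication")
        if "privacy" not in rpc_protection.split(","):
            findings.append("KMS enabled, RPC encryption not enabled")
        if conf.get("hadoop.security.authentication", "simple") != "kerberos":
            findings.append("KMS enabled, but no Kerberos")

    if conf.get("dfs.encrypt.data.transfer", "false") == "true":
        suites = conf.get("dfs.encrypt.data.transfer.cipher.suites", "")
        if "AES" not in suites:
            algorithm = conf.get("dfs.encrypt.data.transfer.algorithm", "default")
            findings.append(
                "Data transfer encryption uses an unoptimized algorithm (%s)" % algorithm
            )

    return findings
```

A configuration with a single `kms://http@...` host, no Kerberos, default RPC protection, and `3des` data transfer encryption would trigger all five findings; the check returns an empty list when none apply.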
Challenges
KMS Low Throughput
• NN can sustain > 100 thousand RPC ops/second
• namespace ops, block reports
• KMS: at most a few thousand RPC ops/second
• create, append, read, reencrypt
• 3-4 KMS servers not uncommon
• Jetty
• SSL Handshake
• Impala/Parquet with wide tables (> 100 columns)
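To make the throughput gap concrete, here is a back-of-the-envelope sizing sketch. The function name, the 50% headroom, and the sample numbers are assumptions for illustration; it also assumes roughly one KMS call per file create or open inside an encryption zone, and keeps a minimum of two servers so that a single failure does not take key access down.

```python
import math


def kms_servers_needed(peak_key_ops_per_sec: float,
                       ops_per_kms_server: float,
                       headroom: float = 0.5,
                       min_servers: int = 2) -> int:
    """Estimate how many KMS instances a given peak key-operation rate needs.

    headroom is the fraction of a server's capacity we are willing to use.
    """
    usable = ops_per_kms_server * headroom
    return max(min_servers, math.ceil(peak_key_ops_per_sec / usable))
```

With an assumed 3,000 ops/second per KMS instance and a peak of 6,000 encrypted-file opens per second, `kms_servers_needed(6000, 3000)` gives 4, in line with the "3-4 KMS servers" observation above. A NameNode handling the same rate would be far below its stated capacity.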
Future
Pluggable KMS ACL Framework (HADOOP-14951)
WebHDFS At Rest Encryption Support (HDFS-12355)
NFS Gateway At Rest Encryption Support (HDFS-13521)
Performance Improvements (HADOOP-15743, HADOOP-15811)
KMS Benchmark Tool (HADOOP-15967)
KMS over Hadoop RPC?
Current KMS Transport Layer
[Diagram: KMSClient and NameNode each use an HTTP client to reach the Jetty-based KMS over REST API/HTTP]
KMS over Hadoop RPC
Benefit of KMS over Hadoop RPC:
• Proven performance
• Code reuse
[Diagram: KMSClient and NameNode reach the KMS over Hadoop RPC]

Hadoop Meetup Jan 2019 - Hadoop Encryption


Editor's notes

  1. Caching improves performance, but some users' environments prohibit caching due to security concerns.
  2. KMS was designed to be horizontally scalable. However, because Cloudera recommends 2 KMS-HA and 2 Keytrustee Servers for production workloads, the cost of HA is high.