1. 1© Cloudera, Inc. All rights reserved.
Security Implementation on Hadoop
Dr. Wei-Chiu Chuang | Software Engineer
2. 2© Cloudera, Inc. All rights reserved.
$ whoami
Software Engineer, Cloudera
Apache Hadoop Committer/PMC
4. 4© Cloudera, Inc. All rights reserved.
Regulatory Compliance
Organizations can be fined up to 4% of annual global turnover or €20 million, whichever is greater, for breaching GDPR.
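The fine ceiling above is simple arithmetic, sketched here for illustration. The function name is hypothetical, and the "whichever is greater" rule reflects the top tier of GDPR administrative fines; actual fines are set case by case by the regulator.

```python
def max_gdpr_fine_eur(annual_global_turnover_eur: float) -> float:
    """Ceiling of a top-tier GDPR fine: the greater of EUR 20 million
    or 4% of annual global turnover (illustrative helper, not legal advice)."""
    return max(20_000_000, annual_global_turnover_eur * 4 / 100)
```

For a company with €1 billion in turnover the 4% term dominates; for one with €100 million the €20 million floor applies.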
6. 7© Cloudera, Inc. All rights reserved.
Disclaimer
This talk serves as a general guideline for security implementation on Hadoop.
The actual implementation procedures and scope of implementation vary on a case-by-case basis, and should be assessed by Cloudera’s Professional Services team or certified Cloudera SI Partners.
8. 9© Cloudera, Inc. All rights reserved.
[Diagram components: Firewall, Datacenter, ActiveDirectory/KDC, Hadoop cluster, Cloudera Manager, Cloudera Navigator, Gateway node, Applications]
10. 11© Cloudera, Inc. All rights reserved.
Identity Management
Simple Authentication
File group ownership
• AD integration via SSSD or Centrify
Consideration in large enterprises.
11. 12© Cloudera, Inc. All rights reserved.
System Diagram #0
[Diagram components: Firewall, ActiveDirectory, two Master nodes, three Worker nodes, Cloudera Manager, SSSD/Centrify]
14. 15© Cloudera, Inc. All rights reserved.
Kerberos
Strong Authentication
KDC
• MIT
• ActiveDirectory (more common)
[Diagram: a user and Hadoop both authenticate against the KDC of realm EXAMPLE.COM. In the principal user@EXAMPLE.COM, "user" is the primary and "EXAMPLE.COM" is the realm.]
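The primary/realm split of a principal name can be sketched as follows. This is a minimal illustration: `Principal` and `parse_principal` are hypothetical names, the optional instance component (as in service principals of the form primary/instance@REALM) is included for completeness, and the escaping rules that real Kerberos libraries support are ignored.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str
    instance: Optional[str]
    realm: str


def parse_principal(name: str) -> Principal:
    """Split 'primary[/instance]@REALM' into its parts.

    Simplification: backslash escaping allowed by Kerberos is not handled.
    """
    local, sep, realm = name.rpartition("@")
    if not sep or not local or not realm:
        raise ValueError(f"not a fully qualified principal: {name!r}")
    primary, _, instance = local.partition("/")
    return Principal(primary, instance or None, realm)
```

For example, `parse_principal("user@EXAMPLE.COM")` yields primary `user`, no instance, and realm `EXAMPLE.COM`.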
15. 16© Cloudera, Inc. All rights reserved.
Kerberos
Considerations in large enterprises
Time synchronization
CM Kerberos Wizard
• Configure AD to create a Kerberos principal for the CM server, and to delegate to CM the ability to create/manage Kerberos principals
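Time synchronization matters because Kerberos rejects requests whose timestamps drift too far from the KDC clock. A minimal sketch of that check, assuming the commonly used 5-minute tolerance (the default `clockskew` in MIT Kerberos); the function name is hypothetical:

```python
from datetime import datetime, timedelta, timezone

# Assumption: 5 minutes is the usual default tolerance; sites may configure another value.
MAX_CLOCK_SKEW = timedelta(minutes=5)


def within_clock_skew(host_time: datetime, kdc_time: datetime,
                      max_skew: timedelta = MAX_CLOCK_SKEW) -> bool:
    """Return True if the host clock is close enough to the KDC clock."""
    return abs(host_time - kdc_time) <= max_skew


kdc_now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
ok_host = kdc_now + timedelta(minutes=4)     # inside the tolerance
drifted_host = kdc_now - timedelta(minutes=6)  # outside the tolerance
```

In practice every cluster host would run a time service such as NTP so that this condition always holds.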
17. 18© Cloudera, Inc. All rights reserved.
Authorization/Access Control
Access Control Lists (ACLs)
• HDFS file ACLs
• YARN job submission
• HBase ACLs
• Oozie ACLs
Sentry Managed (RBAC)
• Hive
• Impala
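Role-based access control of the kind Sentry provides can be sketched as a chain from users to groups to roles to privileges. The data and names below are hypothetical, and the model is deliberately reduced (no privilege hierarchy beyond a blanket "ALL"); it is not Sentry's implementation.

```python
from typing import Dict, Set, Tuple

Privilege = Tuple[str, str]  # (action, object), e.g. ("SELECT", "sales.orders")

# Hypothetical example data.
USER_GROUPS: Dict[str, Set[str]] = {"alice": {"analysts"}, "bob": {"interns"}}
GROUP_ROLES: Dict[str, Set[str]] = {"analysts": {"read_sales"}}
ROLE_PRIVILEGES: Dict[str, Set[Privilege]] = {
    "read_sales": {("SELECT", "sales.orders")},
}


def is_allowed(user: str, action: str, obj: str) -> bool:
    """A user is allowed if any role of any of their groups grants the
    action (or ALL) on the object."""
    for group in USER_GROUPS.get(user, set()):
        for role in GROUP_ROLES.get(group, set()):
            privileges = ROLE_PRIVILEGES.get(role, set())
            if (action, obj) in privileges or ("ALL", obj) in privileges:
                return True
    return False
```

Privileges are never attached to users directly; changing what a team can do means editing one role rather than many per-user ACL entries.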
19. 20© Cloudera, Inc. All rights reserved.
Backup/Disaster Recovery
Cloudera Backup/Disaster Recovery (BDR)
• A high-performance data replicator
• Copies incremental data from the source cluster on a specified schedule
Supports
• Kerberos
• Data encryption
• HDFS replication to cloud
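The idea of incremental copying can be illustrated with a small sketch that selects only files that are new or changed since the last run. This is an assumption-laden illustration, not BDR's actual change detection: the function name is hypothetical and the comparison uses only (size, modification time) metadata.

```python
from typing import Dict, List, Tuple

FileMeta = Tuple[int, int]  # (size in bytes, modification time)


def files_to_copy(source: Dict[str, FileMeta],
                  target: Dict[str, FileMeta]) -> List[str]:
    """Return source paths that are missing on the target or whose
    metadata differs from the target's copy."""
    return sorted(path for path, meta in source.items()
                  if target.get(path) != meta)


source_listing = {"/data/a": (10, 100), "/data/b": (20, 200), "/data/c": (5, 300)}
target_listing = {"/data/a": (10, 100), "/data/b": (20, 150)}
pending = files_to_copy(source_listing, target_listing)
```

Here `/data/a` is skipped because it is unchanged, while `/data/b` (modified) and `/data/c` (new) are scheduled for copying.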
20. 21© Cloudera, Inc. All rights reserved.
Kerberized BDR Best Practice
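[Diagram: Cloudera BDR replicates between the Production cluster (realm PROD.EXAMPLE.COM) and the DR cluster (realm DR.EXAMPLE.COM); each realm has its own KDC, and the two KDCs are linked by a cross-realm trust]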
21. 22© Cloudera, Inc. All rights reserved.
System Diagram #1
[Diagram components: Firewall, ActiveDirectory/KDC, Kerberos, two Master nodes, three Worker nodes, Cloudera Manager, SSSD/Centrify, DR]
23. 24© Cloudera, Inc. All rights reserved.
Data In-Transit Encryption
RPC encryption
Data transport encryption
• Supports AES CTR, up to 256-bit key length
HTTP TLS/SSL encryption
• No self-signed certificates in production
[Diagram: an Application, two Master nodes, and three Worker nodes, with connections labeled RPC encryption, Transport encryption, and TLS/SSL]
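The three layers above map onto a handful of Hadoop configuration properties. The sketch below checks a configuration dictionary against one possible hardened target state; the property names are standard Hadoop settings, but the choice of target values and the helper name are assumptions for illustration, and in a Cloudera deployment these are normally set through Cloudera Manager rather than by hand.

```python
from typing import Dict

# One possible target state (an assumption, not an official baseline).
TARGET_SETTINGS: Dict[str, str] = {
    "hadoop.rpc.protection": "privacy",                          # RPC encryption
    "dfs.encrypt.data.transfer": "true",                         # data transport encryption
    "dfs.encrypt.data.transfer.cipher.suites": "AES/CTR/NoPadding",
    "dfs.encrypt.data.transfer.cipher.key.bitlength": "256",
    "dfs.http.policy": "HTTPS_ONLY",                             # TLS/SSL for web endpoints
}


def settings_to_fix(conf: Dict[str, str]) -> Dict[str, str]:
    """Return the target settings that are absent or set differently in conf."""
    return {key: value for key, value in TARGET_SETTINGS.items()
            if conf.get(key) != value}


partial_conf = {"hadoop.rpc.protection": "authentication",
                "dfs.encrypt.data.transfer": "true"}
gaps = settings_to_fix(partial_conf)
```

With `partial_conf`, the result lists the RPC protection level, the cipher suite, the key length, and the HTTP policy as still needing attention.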
24. 25© Cloudera, Inc. All rights reserved.
Data At-Rest Encryption
Transparent encryption
Supports any Hadoop applications
Encryption Zone
$ hadoop key create mykey
$ hadoop fs -mkdir /zone
$ hdfs crypto -createZone -keyName mykey -path /zone
[Diagram: directory tree in which / contains /tmp and /zone; /zone holds the files foo and bar and is marked as the encryption zone]
25. 26© Cloudera, Inc. All rights reserved.
Key Management Server Deployment (non-prod)
[Diagram: Client, HDFS NameNode, and a KMS backed by a Java Keystore file]
Separation of duties
• Encryption Zone Key (EZK) is stored in the KMS server
• HDFS superuser cannot decrypt files
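The separation of duties rests on envelope encryption: each file gets its own data encryption key (DEK), HDFS stores only the DEK encrypted under the zone key (the EDEK), and only the KMS can unwrap it. The sketch below illustrates that flow under loud assumptions: `ToyKMS` and `toy_cipher` are hypothetical, and the cipher is an insecure XOR stream built from SHA-256 that merely stands in for AES-CTR.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher (NOT secure); a stand-in for AES-CTR.
    Applying it twice with the same key restores the input."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class ToyKMS:
    """Holds encryption zone keys (EZKs); they never leave this object."""

    def __init__(self) -> None:
        self._ezks = {}

    def create_key(self, name: str) -> None:
        self._ezks[name] = secrets.token_bytes(32)

    def generate_edek(self, key_name: str) -> bytes:
        """Create a fresh DEK and return it only in encrypted form."""
        dek = secrets.token_bytes(32)
        return toy_cipher(self._ezks[key_name], dek)

    def decrypt_edek(self, key_name: str, edek: bytes) -> bytes:
        """Unwrap an EDEK for an authorized client."""
        return toy_cipher(self._ezks[key_name], edek)


kms = ToyKMS()
kms.create_key("mykey")
edek = kms.generate_edek("mykey")        # what the NameNode stores as file metadata
dek = kms.decrypt_edek("mykey", edek)    # what an authorized client obtains
ciphertext = toy_cipher(dek, b"sensitive record 0001")
```

Anyone holding only HDFS metadata and file blocks sees the EDEK and the ciphertext, neither of which is usable without a KMS call.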
26. 27© Cloudera, Inc. All rights reserved.
Key Management Server/Key Trustee Server Deployment
[Diagram: the Client and HDFS NameNode use two (or more) Key Trustee KMS instances; behind a firewall, an active Key Trustee Server synchronizes with a passive Key Trustee Server]
27. 28© Cloudera, Inc. All rights reserved.
KMS+KTS+HSM Deployment
[Diagram: the Client and HDFS NameNode use two (or more) HSM KMS instances; behind a firewall, an active Key Trustee Server synchronizes with a passive Key Trustee Server, and each Key Trustee Server has a Key HSM connected to an HSM]
29. 30© Cloudera, Inc. All rights reserved.
Troubleshooting: Encryption Performance Anomaly
• Configuration
• AES-NI Hardware acceleration
• OpenSSL library
• Entropy
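One of these checks, whether the CPU advertises AES-NI, can be sketched by scanning the `flags` line of Linux's `/proc/cpuinfo` for the `aes` flag. The helper name is hypothetical and the check applies to x86 Linux hosts only.

```python
def has_aes_flag(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists 'aes'."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "aes" in value.split():
            return True
    return False


sample_with_aes = "processor\t: 0\nflags\t\t: fpu vme sse2 aes avx\n"
sample_without_aes = "processor\t: 0\nflags\t\t: fpu vme sse2 vaes_like\n"
```

On a real host the input would be the contents of `/proc/cpuinfo`. Available entropy can be read in a similar way from `/proc/sys/kernel/random/entropy_avail`, and `hadoop checknative` reports whether the native OpenSSL library was loaded.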
30. 31© Cloudera, Inc. All rights reserved.
Fine Grained Access Control with Apache Sentry
31. 32© Cloudera, Inc. All rights reserved.
System Diagram #2
[Diagram components: Firewall, ActiveDirectory/KDC, Kerberos, two Master nodes, three Worker nodes, Cloudera Manager, SSSD/Centrify, two KMS instances, and two Key Trustee Servers behind a second firewall]
33. 34© Cloudera, Inc. All rights reserved.
Data Redaction
Personally Identifiable Information
• PCI-DSS, HIPAA
Best practice
Passwords
• Store in credential files, not in configuration files
Logs, queries
• Redaction is configured in Cloudera Manager
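Log and query redaction generally works by applying search-and-replace rules to each line before it is written. A minimal sketch of that idea follows; the two rules and all names are hypothetical examples, not Cloudera Manager's built-in rule set.

```python
import re

# Hypothetical rules: (compiled pattern, replacement).
REDACTION_RULES = [
    # 16-digit card numbers, optionally grouped by dashes or spaces.
    (re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b"), "[REDACTED-CARD]"),
    # password=<value> pairs; keep the key, hide the value.
    (re.compile(r"(?i)(password\s*=\s*)\S+"), r"\1[REDACTED]"),
]


def redact(line: str) -> str:
    """Apply every redaction rule to a log line or query string."""
    for pattern, replacement in REDACTION_RULES:
        line = pattern.sub(replacement, line)
    return line
```

Rules like these are best tested against representative log samples, since overly broad patterns can hide information operators need.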
34. 35© Cloudera, Inc. All rights reserved.
Full Encryption
Encrypt Data Spills
• MapReduce
• Impala
• Hive
• Flume
OS-level encryption
• Navigator Encrypt
36. 37© Cloudera, Inc. All rights reserved.
Vulnerability Response and Process
[Diagram: vulnerability reports (upstream, internal, external) lead to a fix, which is then published as a CVE and a Cloudera TSB]
38. 39© Cloudera, Inc. All rights reserved.
Cloudera Certified Technology Partners
[Diagram: partner categories: Data Sources (connected machines/data sources, other data sources), Data Ingest, Process, Refine & Prep, Data Discovery, Advanced Analytics]
39. 40© Cloudera, Inc. All rights reserved.
Certification ensures that a product integrates with a secure cluster
Authentication
• Authenticate via Kerberos or LDAP
Authorization
• Handle Apache Sentry with Hive, Impala, Search, HDFS
Encryption
• Support HDFS transport encryption and at-rest encryption; support SSL/TLS connection encryption
41. 42© Cloudera, Inc. All rights reserved.
Cloudera Enterprise
The modern platform for machine learning and analytics optimized for the cloud
[Diagram: platform layers]
CORE SERVICES and EXTENSIBLE SERVICES: DATA ENGINEERING, DATA SCIENCE, OPERATIONAL DATABASE, ANALYTIC DATABASE
DATA CATALOG, INGEST & REPLICATION, SECURITY, GOVERNANCE, WORKLOAD MANAGEMENT
STORAGE SERVICES: S3, ADLS, HDFS, KUDU
42. 43© Cloudera, Inc. All rights reserved.
Open platform services
Built for multi-function analytics | Optimized for cloud
• Unified security: protects sensitive data with consistent controls, even for transient and recurring workloads
• Consistent governance: enables secure self-service access to all relevant data and increases compliance
• Easy workload management: increases user productivity and boosts job predictability
• Flexible ingest and replication: aggregates a single copy of all data, provides disaster recovery, and eases migration
• Shared catalog: defines and preserves structure and business context of data for new applications and partner solutions
44. 45© Cloudera, Inc. All rights reserved.
Cloudera Overview & Financial Services Focus
• 2000+ Strong Partner Ecosystem
• 1600+ Employees Globally
Strong Focus & Momentum in Financial Services
• 200+ Financial Services Customers
• 19 of the 30 G-SIBs Run on Cloudera
• 3 of the Fortune 500 Top 5 Insurers Run on Cloudera
• 5 of the Top 6 Asset Management Firms Run on Cloudera
45. 47© Cloudera, Inc. All rights reserved.
Building a Fantastic Customer Experience
• Improved customer experience
• 80 percent reduction in operating costs through a wide range of customer service and operational improvements
• Decrease in cost to service customers while increasing revenue through better service
CUSTOMER 360
FINANCIAL SERVICES
» PREDICTIVE ANALYTICS
» 360 CUSTOMER VIEW
» OPERATIONAL ANALYTICS
46. 48© Cloudera, Inc. All rights reserved.
Large healthcare provider enables practitioners to recommend at-home actions to prevent hospital visits
• Flexible, automatic data classification for diverse medical ontologies
• Self-service data discovery for real-time, data-driven decisions
47. 49© Cloudera, Inc. All rights reserved.
Thank you
Wei-Chiu Chuang | weichiu@cloudera.com