A presentation from the Data Works conference in 2018 that looks at how Worldpay, a major payments provider, deployed a secure Hadoop cluster to meet business requirements, which in the process became one of the few fully certified PCI-compliant clusters in the world.
20. 20
Background: Kerberos
⬢ Strongly authenticating and establishing a user’s identity is the basis for secure access in Hadoop
⬢ Users need to be able to reliably “identify” themselves and have that identity propagated throughout the Hadoop cluster
⬢ Design & implementation of Kerberos security in native Apache Hadoop was delivered by Hortonworks co-founder Owen O’Malley!
⬢ Why Kerberos?
⬢ Establishes identity for clients, hosts and services
⬢ Prevents impersonation; passwords are never sent over the wire
⬢ Integrates w/ enterprise identity mgmt tools such as LDAP & Active Directory
⬢ More granular auditing of data access/job execution
21. 21
Background: HDP + Kerberos
[Diagram: an HDP cluster of Service Components (A, B, C, D, ...), each holding a keytab, alongside a KDC]
Kerberos is used to secure the Components in the cluster. Kerberos identities are managed via “keytabs” on the Component hosts.
Principals for the cluster are managed in the KDC.
22. 22
Automated Kerberos Setup with Ambari
→ Wizard-driven, automated Kerberos support (Kerberos principal creation for service accounts, keytab generation and distribution to the appropriate hosts, permissions, etc.)
→ Removes cumbersome, time-consuming and error-prone administration of Kerberos
→ Works with existing Kerberos infrastructure, including Active Directory, to automate common tasks, removing the burden from the operator:
• Add/Delete Host
• Add Service
• Add/Delete Component
• Regenerate Keytabs
• Disable Kerberos
23. 23
Kerberos + Active Directory
[Diagram: a Client authenticates against AD / LDAP and the KDC of the Hadoop Cluster, which are linked by a Cross Realm Trust]
AD / LDAP (User Store): use existing directory tools to manage users
– Users: smith@EXAMPLE.COM
KDC (Authentication): use Kerberos tools to manage host + service principals
– Hosts: host1@HADOOP.EXAMPLE.COM
– Services: hdfs/host1@HADOOP.EXAMPLE.COM
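The principal names on this slide follow the usual Kerberos shape `primary[/instance]@REALM`. As a minimal sketch (the `Principal` type and both helper functions are hypothetical names introduced here, not part of any Kerberos or Hadoop library), the split between user realm and cluster realm could be checked like this:

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str             # user or service name, e.g. "smith" or "hdfs"
    instance: Optional[str]  # host part of a service principal, if present
    realm: str               # Kerberos realm, e.g. "HADOOP.EXAMPLE.COM"


def parse_principal(name: str) -> Principal:
    """Split 'primary[/instance]@REALM' into its parts."""
    local, sep, realm = name.rpartition("@")
    if not sep or not local or not realm:
        raise ValueError(f"not a fully qualified principal: {name!r}")
    primary, _, instance = local.partition("/")
    return Principal(primary, instance or None, realm)


def is_cluster_principal(principal: Principal, cluster_realm: str) -> bool:
    """True when the principal belongs to the cluster KDC's realm."""
    return principal.realm == cluster_realm


if __name__ == "__main__":
    # The three example principals are taken from the slide.
    for name in (
        "smith@EXAMPLE.COM",
        "host1@HADOOP.EXAMPLE.COM",
        "hdfs/host1@HADOOP.EXAMPLE.COM",
    ):
        parsed = parse_principal(name)
        print(parsed, is_cluster_principal(parsed, "HADOOP.EXAMPLE.COM"))
```

With a cross-realm trust in place, a user principal from `EXAMPLE.COM` can be accepted by services whose own principals live in `HADOOP.EXAMPLE.COM`; the sketch only illustrates the naming, not the trust itself.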
25. 25
Apache Ranger
Authorization
• Centralized platform to define, administer and manage security policies consistently across Hadoop components
– HDFS, Hive, HBase, YARN, Kafka, Solr, Storm, Knox, NiFi, Atlas
• Extensible Architecture
– Custom policy conditions, user context enrichers
– Easy to add new component types for authorization
Auditing
• Central audit location for all access requests
• Support multiple destination sources (HDFS, Solr, etc.)
• Real-time visual query interface
Ranger KMS
• Store and manage encryption keys
• Support HDFS Transparent Data Encryption
• Integration with HSM
– Safenet LUNA
26. 26
Ranger – ABAC Model
• ABAC Model
• Combination of the subject, action, resource, and environment
• Uses descriptive attributes: AD group, Apache Atlas-based tags or classifications, geo-location, etc.
• Ranger approach is consistent with NIST SP 800-162
• Avoids role proliferation and manageability issues
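To make the subject/action/resource/environment combination concrete, here is a minimal sketch of an attribute-based decision in Python. It is not Ranger's policy engine or API: the `Request` and `Policy` classes, the example policies, and the "deny wins, default deny" rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Set

Attrs = Dict[str, Any]


@dataclass
class Request:
    subject: Attrs      # e.g. {"user": "joe", "groups": {"analyst"}}
    action: str         # e.g. "select"
    resource: Attrs     # e.g. {"table": "ww_customers", "tags": {"PII"}}
    environment: Attrs  # e.g. {"location": "US"}


@dataclass
class Policy:
    name: str
    actions: Set[str]
    condition: Callable[[Request], bool]  # predicate over all four attribute sets
    allow: bool = True                    # False marks a deny policy


def evaluate(policies: List[Policy], request: Request) -> bool:
    """Assumed semantics: any matching deny wins; no match means deny."""
    allowed = False
    for policy in policies:
        if request.action in policy.actions and policy.condition(request):
            if not policy.allow:
                return False
            allowed = True
    return allowed


# Hypothetical policies written against attributes instead of per-user roles.
POLICIES = [
    Policy(
        "analysts-read-pii-from-us",
        {"select"},
        lambda r: "analyst" in r.subject["groups"]
        and "PII" in r.resource["tags"]
        and r.environment["location"] == "US",
    ),
    Policy(
        "nobody-touches-expired-data",
        {"select", "update"},
        lambda r: "EXPIRED" in r.resource["tags"],
        allow=False,
    ),
]
```

Because the conditions refer to group membership, tags and location, one policy covers every user and table that carries those attributes, which is the point the slide makes about avoiding role proliferation.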
27. 27
Dynamic Row Filtering & Column Masking: Apache Ranger with Apache Hive
Source table ww_customers:
Country  National ID  CC No             DOB        MRN         Name           Policy ID
US       232323233    4539067047629850  9/12/1969  8233054331  John Doe       nj23j424
US       333287465    5391304868205600  8/13/1979  3736885376  Jane Doe       cadsd984
Germany  T22000129    4532786256545550  3/5/1963   876452830A  Ernie Schwarz  KK-2345909
Ranger Policy Enforcement
Query rewritten based on dynamic Ranger policies: filter rows by region & apply relevant column masking
User 1: Joe (Location: US, Group: Analyst)
Original Query:
SELECT country, nationalid, ccnumber, mrn, name FROM ww_customers
Result:
Country  National ID  CC No                MRN   Name
US       xxxxx3233    4539 xxxx xxxx xxxx  null  John Doe
US       xxxxx7465    5391 xxxx xxxx xxxx  null  Jane Doe
Users from the US Analyst group see data for US persons, with CC and National ID (SSN) as masked values and MRN nullified
User 2: Ivanna (Location: EU, Group: HR)
Original Query:
SELECT country, nationalid, name, mrn FROM ww_customers
Result:
Country  National ID  Name           MRN
Germany  T22000129    Ernie Schwarz  876452830A
EU HR Policy Admins can see unmasked data but are restricted by row filtering policies to data for EU persons only
Groups shown: Analysts, HR, Marketing
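The effect of the two policies on this slide can be reproduced with a small Python sketch. This is not how Ranger or Hive rewrite queries internally; the `POLICIES` table, the mask helpers and the `select` function are hypothetical, and the sample rows are the ones shown on the slide (DOB and Policy ID left out).

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]

# Rows copied from the slide; DOB and Policy ID are omitted for brevity.
WW_CUSTOMERS: List[Row] = [
    {"country": "US", "nationalid": "232323233",
     "ccnumber": "4539067047629850", "mrn": "8233054331", "name": "John Doe"},
    {"country": "US", "nationalid": "333287465",
     "ccnumber": "5391304868205600", "mrn": "3736885376", "name": "Jane Doe"},
    {"country": "Germany", "nationalid": "T22000129",
     "ccnumber": "4532786256545550", "mrn": "876452830A", "name": "Ernie Schwarz"},
]

EU_COUNTRIES = {"Germany"}  # illustrative subset, enough for the sample rows


def show_last_4(value: str) -> str:
    """Mask everything except the last four characters."""
    return "x" * (len(value) - 4) + value[-4:]


def show_first_4_of_card(value: str) -> str:
    """Keep the first group of four digits and mask the remaining groups."""
    groups = [value[i:i + 4] for i in range(0, len(value), 4)]
    return " ".join([groups[0]] + ["xxxx"] * (len(groups) - 1))


def nullify(value: str) -> None:
    """Replace the value with NULL."""
    return None


# Hypothetical per-group policies: one row filter plus optional column masks.
POLICIES = {
    "analyst": {
        "row_filter": lambda row: row["country"] == "US",
        "masks": {"nationalid": show_last_4,
                  "ccnumber": show_first_4_of_card,
                  "mrn": nullify},
    },
    "hr": {
        "row_filter": lambda row: row["country"] in EU_COUNTRIES,
        "masks": {},
    },
}


def select(group: str, columns: List[str]) -> List[Row]:
    """Apply the group's row filter, then mask and project the columns."""
    policy = POLICIES[group]
    result = []
    for row in WW_CUSTOMERS:
        if not policy["row_filter"](row):
            continue
        result.append({
            col: policy["masks"].get(col, lambda v: v)(row[col])
            for col in columns
        })
    return result


if __name__ == "__main__":
    print(select("analyst", ["country", "nationalid", "ccnumber", "mrn", "name"]))
    print(select("hr", ["country", "nationalid", "name", "mrn"]))
```

Both users issue an ordinary query against `ww_customers`; the filtering and masking are applied on their behalf, which is why the slide describes the query as rewritten rather than rejected.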
29. 29
Data Protection in Hadoop
Data Protection must be applied at three different layers in Apache Hadoop
Storage: encrypt data while it is at rest
– Transparent Data Encryption in HDFS, Ranger KMS + HSM, Partner Products (HPE Voltage, Protegrity, Dataguise)
Transmission: encrypt data as it is in motion
– Native Apache Hadoop 2.0 provides wire encryption
Upon Access: apply restrictions when accessed
– Ranger (Dynamic Column Masking + Row Filtering), Partner Masking + Encryption
30. 30
Data Protection – Layered Approach
• Encryption of Data at Rest
– OS Level Encryption (LUKS)
– Certified Partners for volume encryption (e.g. Vormetric (Thales), Protegrity, HPE Voltage Security)
– HDFS TDE file/folder level encryption with keys managed by Ranger KMS, External HSM integration
• Encryption of Data on the Wire
– All wire protocols can be encrypted by HDP platform
– Wire-level encryption enhancements (SSL).
• Granular Data Protection
– Dynamic Masking + Row Filtering for Hive with Ranger
– Classification Based Security with Ranger + Atlas
– Element level encryption/masking from certified partners (HPE Voltage, Protegrity)
31. 31
Ranger KMS
Transparent Data Encryption in HDFS
[Diagram: an HDFS Client reads and writes encrypted blocks (A, B, C, D) stored across DataNodes (DN), with a NameNode (NN) and Ranger KMS backed by a SafeNet Luna HSM]
Benefits
• Selective encryption of relevant files/folders
• Prevent rogue admin access to sensitive data
• Fine grained access controls
• Transparent to end application w/o changes
• Ranger KMS integrated to external HSM (Safenet Luna) adding to reliability/security of KMS
32. 32
HSM integration with Ranger KMS
→ HSM client needs to be set up on the KMS nodes
→ When installing Ranger KMS, HSM parameters can be specified
→ If KMS is already installed with a DB, the master key can be migrated to the HSM
→ All other TDE functionality remains unchanged
33. 33
HSM integration with Ranger KMS
Only the master key will be in the HSM
Other keys are stored in the Ranger KMS DB
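The key split described on this slide can be sketched as a small key hierarchy. Everything below is hypothetical: `FakeHsm`, `FakeKms` and `toy_wrap` are illustrative names, and the "cipher" is an XOR stand-in that is NOT secure and exists only so the flow runs with the Python standard library. The data-key step follows the general HDFS Transparent Data Encryption design (a per-file data key encrypted under a zone key), which the slides do not spell out.

```python
import hashlib
import secrets
from typing import Dict, Tuple


def toy_wrap(key: bytes, data: bytes) -> bytes:
    """Stand-in for a real key-wrap cipher: XOR with a key-derived stream.

    NOT secure. XOR is its own inverse, so the same call also unwraps.
    """
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


class FakeHsm:
    """Hypothetical HSM: the master key never leaves this object."""

    def __init__(self) -> None:
        self._master_key = secrets.token_bytes(32)

    def wrap(self, key: bytes) -> bytes:
        return toy_wrap(self._master_key, key)

    def unwrap(self, wrapped: bytes) -> bytes:
        return toy_wrap(self._master_key, wrapped)


class FakeKms:
    """Hypothetical KMS whose 'DB' holds only keys wrapped by the HSM."""

    def __init__(self, hsm: FakeHsm) -> None:
        self._hsm = hsm
        self.db: Dict[str, bytes] = {}  # zone name -> wrapped zone key

    def create_zone_key(self, zone: str) -> None:
        self.db[zone] = self._hsm.wrap(secrets.token_bytes(32))

    def generate_data_key(self, zone: str) -> Tuple[bytes, bytes]:
        """Return (plaintext data key, data key encrypted under the zone key)."""
        zone_key = self._hsm.unwrap(self.db[zone])
        data_key = secrets.token_bytes(32)
        return data_key, toy_wrap(zone_key, data_key)

    def decrypt_data_key(self, zone: str, encrypted_data_key: bytes) -> bytes:
        zone_key = self._hsm.unwrap(self.db[zone])
        return toy_wrap(zone_key, encrypted_data_key)


if __name__ == "__main__":
    kms = FakeKms(FakeHsm())
    kms.create_zone_key("pci_zone")  # hypothetical encryption zone name
    data_key, encrypted = kms.generate_data_key("pci_zone")
    print(kms.decrypt_data_key("pci_zone", encrypted) == data_key)
```

The design point is that a copy of the KMS database alone is not enough to recover any zone key, because unwrapping always goes through the HSM that holds the master key.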
35. 35
Apache Atlas Vision: Open Metadata & Governance Services
[Diagram: Atlas metadata at the center, connected to Hive, Sqoop, Falcon, Kafka, Storm, Ranger, Custom and Partners sources, spanning structured, unstructured and streaming data, traditional RDBMS metadata and MPP appliances]
Comprehensive Enterprise Data Catalog
• Lists all of your data, where it is located, its origin (lineage), owner, structure, meaning, classification and quality
• Integrate both on-premise and cloud platforms to provide enterprise wide view
Open Enterprise Data Connectors
• Interoperable connector framework to connect to your data catalog out of the box with many vendor technologies
• No expensive population of proprietary siloed metadata repositories
Dynamic Metadata Discovery
• Metadata is added automatically to the catalog as new data is created or data is updated
• Extensible discovery processes that characterize and classify the data
Enabling Collaboration & Workflows
• Subject matter experts locate the data they need quickly and efficiently, share their knowledge about the data and its usage to help others
• Interested parties and processes are notified automatically
Automated Governance Processes
• Metadata-driven access control
• Auditing, metering, and monitoring
• Quality control and exception management
• Rights (entitlement) management
Predefined standards for glossaries, data schemas, rules and regulations
Vision:
Metadata-driven foundational governance services for enterprise data ecosystem
• Open frameworks and APIs
• Agile and secure collaboration around data and advanced analytics
• Reduce operational costs while extracting economic value of data
36. 36
HDP – Security & Governance
[Diagram: Atlas tracks metadata and lineage for entities in the data lake (Hive tables, HDFS files, HBase tables; streams, pipelines, feeds) and keeps tags and asset entities in its metastore. An Atlas client in Ranger subscribes to a topic and gets metadata updates. Ranger manages access policies and audit logs; its PDP, with a resource cache, applies classification, prohibition, time and location policies.]
Industry First: Dynamic Tag-based Security Policies
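A tag-based policy is attached to a classification rather than to a specific table or path, so it follows the tag wherever it is applied. The sketch below is an assumption-laden illustration, not the Ranger or Atlas API: `TagPolicy`, `RESOURCE_TAGS`, `is_allowed` and the resource names are hypothetical, with the location and expiry fields standing in for the "Location" and "Time" policy types named on the slide.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Set


@dataclass
class TagPolicy:
    tag: str
    allowed_groups: Set[str]
    allowed_locations: Optional[Set[str]] = None  # None means any location
    valid_until: Optional[date] = None            # None means no expiry


# Hypothetical tag assignments, standing in for metadata synced from Atlas.
RESOURCE_TAGS: Dict[str, Set[str]] = {
    "hive:ww_customers.ccnumber": {"PCI"},
    "hdfs:/data/cards": {"PCI"},
    "hbase:audit_log": set(),
}


def is_allowed(resource: str, group: str, location: str, today: date,
               policies: List[TagPolicy]) -> bool:
    """Allow access if any policy on one of the resource's tags matches."""
    tags = RESOURCE_TAGS.get(resource, set())
    for policy in policies:
        if policy.tag not in tags:
            continue
        if group not in policy.allowed_groups:
            continue
        if (policy.allowed_locations is not None
                and location not in policy.allowed_locations):
            continue
        if policy.valid_until is not None and today > policy.valid_until:
            continue
        return True
    return False


# One policy covers every resource tagged PCI, whichever component stores it.
POLICIES = [
    TagPolicy("PCI", {"fraud_team"}, allowed_locations={"US"},
              valid_until=date(2018, 12, 31)),
]
```

In this sketch an untagged resource is simply denied; in a real deployment such resources would normally fall back to ordinary resource-based policies.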
38. 38
Walk Through Items
⬢ Ranger
⬢ ABAC Fine Grained Security
⬢ Resource/Masking/Row Filtering Policies
⬢ Audits – self audits/access/plugin audits, logins
⬢ User/Group/Roles in Ranger
⬢ Atlas
⬢ Search and tag assets
⬢ Tag Attributes
⬢ Tag based policies in Ranger