Securing Hadoop 
Hadoop Security Demystified…and then made more confusing. 
Presenter: 
Adam Muise 
Content: 
Balaji Ganesan 
Adam Muise 
Page 1 © Hortonworks Inc. 2014
What do we mean by Security? 
Say you have a house guest… 
- Authentication 
- Who gets in the door 
- Authorization 
- How far are they allowed in the house and what rooms 
are they allowed in 
- Auditing 
- Follow them around 
- Encryption 
- When all else fails, lock it up 
Insecurity – Not just for Teenagers 
- Security is really about risk mitigation 
- No perfect solution exists unless you 
locate your datacenter in the hull of 
the Titanic and cut all communications 
- The risks are: 
- Inappropriate access to data by internal 
resources 
- External data theft 
- Service outages 
- No knowledge of theft or inappropriate 
access 
- Hadoop’s value to a business is that it centralizes their data, which can make leaks more detrimental than a DDoS or stolen laptops
Attention to Hadoop security on the rise… 
- As Hadoop adoption grows and more sensitive production data goes into clusters, more attention is being paid to security
- Intel/Cloudera working on Project Rhino 
- Hortonworks introduces Apache Knox 
- Cloudera buys Gazzang 
- Hortonworks buys XASecure and turns it 
into Apache Argus 
- HBase gets cell level security 
- … the list goes on
Watch out for those malicious attacks… 
Layers Of Hadoop Security 
Perimeter Level Security 
• Network Security (e.g. Firewalls)
• Apache Knox (e.g. Gateways)
Authentication 
• Kerberos 
• Delegation Tokens 
Authorization 
• Argus Security Policies 
OS Security 
• File Permissions 
• Process Isolation 
Data Protection 
• Transport 
• Storage 
• Access
Typical Hadoop Security 
Vanilla Hadoop 
Hadoop out of the box 
- While a lot of security is built into Hadoop, out of the box not much of it 
is turned on 
- Without strong authentication, anyone with sufficient access to the underlying OS can impersonate users
- Often paired with gateway nodes that provide stronger access 
restrictions 
- HDFS/YARN/Hive 
- Authentication - Derived from OS users local to the box the task/request is submitted from 
- Authorization – Dependent on each project/service 
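To make the impersonation risk concrete, here is a minimal sketch of how identity is resolved when Kerberos is off. It models the simple-authentication behaviour in which a client-side `HADOOP_USER_NAME` environment variable, if set, overrides the OS login name. The function name `effective_user` is hypothetical and the code is an illustration, not Hadoop's implementation.

```python
import getpass
import os


def effective_user(env=None):
    """Toy model of identity resolution under simple (non-Kerberos) auth.

    The client simply asserts a user name: HADOOP_USER_NAME if set,
    otherwise the local OS login. Nothing verifies the claim.
    """
    env = os.environ if env is None else env
    return env.get("HADOOP_USER_NAME") or getpass.getuser()


# Anyone who controls the client environment can claim to be "hdfs".
print(effective_user({"HADOOP_USER_NAME": "hdfs"}))  # prints "hdfs"
```

The point of the sketch is that the claim is never checked, which is why gateway nodes or Kerberos are layered on top.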
Typical Flow – Hive Access
[Diagram: Beeline client -> HiveServer 2 -> HDFS (A, B, C)]
Typical Hadoop Security 
Strong Authentication through Kerberos 
Kerberos Primer
[Diagram: Client (with its Kerberos ticket cache), KDC, NameNode (NN), DataNode (DN)]
1. kinit - Login and get Ticket Granting Ticket (TGT)
2. Client stores TGT in ticket cache
3. Get NameNode Service Ticket (NN-ST)
4. Client stores NN-ST in ticket cache
5. Read/write file given NN-ST and file name; returns block locations, block IDs and Block Access Tokens if access permitted
6. Read/write block given Block Access Token and block ID
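The six steps can be walked through with a toy simulation. This is only a sketch: the classes `ToyKDC`, `ToyNameNode` and `ToyDataNode` are hypothetical, tickets are opaque random values with no real cryptography, and lookups in shared dictionaries stand in for the key-based validation that real services perform.

```python
import secrets


class ToyKDC:
    """Stand-in for the KDC; issues opaque tickets, no real crypto."""

    def __init__(self, passwords):
        self._passwords = passwords      # principal -> password
        self._tgts = set()
        self._service_tickets = set()

    def kinit(self, principal, password):
        # Step 1: log in and obtain a Ticket Granting Ticket.
        if self._passwords.get(principal) != password:
            raise PermissionError("authentication failed")
        tgt = ("TGT", principal, secrets.token_hex(8))
        self._tgts.add(tgt)
        return tgt

    def get_service_ticket(self, tgt, service):
        # Step 3: exchange the TGT for a service ticket.
        if tgt not in self._tgts:
            raise PermissionError("invalid TGT")
        st = ("ST", service, tgt[1], secrets.token_hex(8))
        self._service_tickets.add(st)
        return st

    def is_valid(self, st, service):
        # Shortcut: real services validate tickets with their own key
        # instead of asking the KDC.
        return st in self._service_tickets and st[1] == service


class ToyNameNode:
    def __init__(self, kdc, files):
        self._kdc = kdc
        self._files = files              # path -> list of block IDs
        self.block_tokens = {}           # token -> block ID (shared with DNs)

    def open(self, st, path):
        # Step 5: check the NN-ST, return block IDs and Block Access Tokens.
        if not self._kdc.is_valid(st, "nn"):
            raise PermissionError("invalid NameNode service ticket")
        result = []
        for block_id in self._files[path]:
            token = secrets.token_hex(8)
            self.block_tokens[token] = block_id
            result.append((block_id, token))
        return result


class ToyDataNode:
    def __init__(self, namenode, blocks):
        self._nn = namenode
        self._blocks = blocks            # block ID -> bytes

    def read_block(self, block_id, token):
        # Step 6: serve the block only if the token was issued for it.
        if self._nn.block_tokens.get(token) != block_id:
            raise PermissionError("invalid Block Access Token")
        return self._blocks[block_id]


def read_file(kdc, nn, dn, principal, password, path):
    cache = {}                                                  # ticket cache
    cache["tgt"] = kdc.kinit(principal, password)               # steps 1-2
    cache["nn"] = kdc.get_service_ticket(cache["tgt"], "nn")    # steps 3-4
    blocks = nn.open(cache["nn"], path)                         # step 5
    return b"".join(dn.read_block(b, t) for b, t in blocks)     # step 6
```

Note that the DataNode never sees the user's Kerberos ticket; it only checks the Block Access Token handed out by the NameNode.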
Kerberos Summary 
• Provides Strong Authentication 
• Establishes identity for users, services and hosts 
• Prevents impersonation of accounts by unauthorized users
• Supports token delegation model 
• Works with existing directory services 
• Basis for Authorization 
Hadoop Authentication 
• Users authenticate with the services 
– CLI & API: Kerberos kinit or keytab 
– Web UIs: Kerberos SPNEGO or custom plugin (e.g. SSO)
• Services authenticate with each other 
– Prepopulated Kerberos keytab 
– e.g. DN->NN, NM->RM 
• Services propagate authenticated user identity 
– Authenticated trusted proxy service 
– e.g. Oozie->RM, Knox->WebHCat 
• Job tasks present delegated user’s identity/access 
– Delegation tokens 
– e.g. Job task -> NN, Job task -> JT/RM 
• Strong authentication is the basis for authorization 
[Diagram: authentication paths between Client, NameNode, DataNode, Oozie, JobTracker and Task]
• Client -> NameNode: (User) Kerberos or custom
• DataNode -> NameNode: (Service) Kerberos
• Oozie -> JobTracker: (Service) Kerberos + (User) doas
• Task -> NameNode: (User) delegation token
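The trusted proxy case (e.g. Oozie acting on behalf of a user) can be sketched as a check against proxy-user settings. The sketch assumes the `hadoop.proxyuser.<name>.hosts` and `hadoop.proxyuser.<name>.groups` property pattern from core-site.xml; the function `may_impersonate` and the host and group names are hypothetical, and the real check has more options than shown here.

```python
def _csv(value):
    """Split a comma-separated property value into a set of entries."""
    return {item.strip() for item in value.split(",") if item.strip()}


def may_impersonate(conf, proxy_user, proxy_host, target_groups):
    """Simplified model: may proxy_user, calling from proxy_host, act as
    a user who belongs to target_groups?"""
    hosts = _csv(conf.get(f"hadoop.proxyuser.{proxy_user}.hosts", ""))
    groups = _csv(conf.get(f"hadoop.proxyuser.{proxy_user}.groups", ""))
    host_ok = "*" in hosts or proxy_host in hosts
    group_ok = "*" in groups or bool(groups & set(target_groups))
    # Both the calling host and the impersonated user's group must be allowed.
    return host_ok and group_ok


conf = {
    "hadoop.proxyuser.oozie.hosts": "oozie1.example.com",
    "hadoop.proxyuser.oozie.groups": "analysts,etl",
}
print(may_impersonate(conf, "oozie", "oozie1.example.com", ["analysts"]))  # True
print(may_impersonate(conf, "oozie", "laptop.example.com", ["analysts"]))  # False
```

A service with no proxy-user entries at all is denied, which keeps impersonation opt-in.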
User Management 
• Most implementations use LDAP for user info 
– LDAP guarantees that user information is consistent across the 
cluster 
– An easy way to manage users & groups 
– The standard user to group mapping comes from the OS on the 
NameNode 
• Kerberos provides authentication 
– PAM can automatically log user into Kerberos 
Kerberos + Active Directory 
[Diagram: Client, Hadoop cluster, AD / LDAP (user store) and KDC (authentication), linked by a cross-realm trust]
• Users: smith@EXAMPLE.COM
• Hosts: host1@HADOOP.EXAMPLE.COM
• Services: hdfs/host1@HADOOP.EXAMPLE.COM
• Use existing directory tools to manage users
• Use Kerberos tools to manage host + service principals
Groups 
• Define groups for each required role 
• Hadoop has a pluggable interface
– Mapping from user to group not stored within Hadoop 
• Defaults to the OS information on master node 
– Typically driven from LDAP on Linux 
– Existing Plugins 
– ShellBasedUnixGroupsMapping - /bin/id 
– JniBasedUnixGroupsMapping – system call 
– LdapGroupsMapping – LDAP call 
– CompositeGroupsMapping – combines Unix & LDAP group mapping
• Strong authentication and role-based groups provide protections 
enabling shared clusters 
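The pluggable user-to-group lookup can be pictured as a chain of providers. The sketch below is not the Hadoop plugin API: `composite_mapping` and the dictionary-backed providers are hypothetical stand-ins for a Unix lookup and an LDAP lookup, and it assumes the results are merged (whether a real composite mapping merges or stops at the first hit depends on configuration not covered here).

```python
def composite_mapping(providers):
    """Return a lookup function that merges the groups from every provider,
    keeping first-seen order and dropping duplicates."""
    def lookup(user):
        merged = []
        for provider in providers:
            for group in provider(user):
                if group not in merged:
                    merged.append(group)
        return merged
    return lookup


# Hypothetical data standing in for /bin/id output and an LDAP query.
unix = {"smith": ["staff"]}
ldap = {"smith": ["analysts", "staff"], "jones": ["finance"]}

get_groups = composite_mapping([
    lambda user: unix.get(user, []),
    lambda user: ldap.get(user, []),
])
print(get_groups("smith"))  # ['staff', 'analysts']
```

Because the mapping lives outside Hadoop, swapping the provider list changes group resolution without touching stored data or policies.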
Groups 
[Diagram: Client -> NameNode in the Hadoop cluster; the NameNode group-mapping plugin reads from the AD / LDAP user store]
Kerberos FAQ 
• Where do I install KDC? 
– On a master type node 
• User Provisioning 
– Hook up to Corporate AD/LDAP to leverage existing User Provisioning 
• Growing a cluster 
– Provision new services and nodes in MIT KDC, copy keytabs to new nodes 
• Is Kerberos a SPOF? 
– Kerberos supports HA, and delegation tokens reduce the load on the KDC
Typical Flow – Authenticate through Kerberos 
[Diagram: Beeline client, KDC, HiveServer 2, HDFS (A, B, C)]
• Client gets service ticket for Hive
• Use Hive ST, submit query
• Hive gets NameNode (NN) service ticket
• Hive creates MapReduce job using NN ST
Typical Hadoop Security 
Strong Authentication + Cross-cutting Authorization 
Apache Argus (aka HDP Security) Capabilities 
Hadoop and Argus
Authentication
• Cross Platform Security: Kerberos, Integration with AD
• Gateway for REST APIs: Knox for HTTP, REST APIs
Role Based Authorizations
• Fine grained access control
– HDFS – Folder, File
– Hive – Database, Table, Column, UDFs
– HBase – Table, Column Family, Column
• Wildcard Resource Names: Yes
• Permission Support
– HDFS – Read, Write, Execute
– Hive – Select, Update, Create, Drop, Alter, Index, Lock
– HBase – Read, Write, Create
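A rough sketch of how a policy with wildcard resource names and per-permission grants might be evaluated. This is not the Argus policy engine: the `Policy` class, `is_allowed`, the glob-style matching via `fnmatchcase` and the default-deny rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Policy:
    resource: str            # may contain wildcards, e.g. "/data/finance/*"
    groups: frozenset        # groups the policy applies to
    permissions: frozenset   # e.g. {"read", "write", "execute"}


def is_allowed(policies, user_groups, resource, permission):
    """Grant access if any policy matches resource, group and permission.
    With no matching policy the request is denied (assumed default)."""
    for policy in policies:
        if (fnmatchcase(resource, policy.resource)
                and permission in policy.permissions
                and policy.groups & set(user_groups)):
            return True
    return False


policies = [
    Policy("/data/finance/*", frozenset({"finance"}), frozenset({"read"})),
]
print(is_allowed(policies, ["finance"], "/data/finance/q3.csv", "read"))   # True
print(is_allowed(policies, ["finance"], "/data/finance/q3.csv", "write"))  # False
```

The same shape extends to Hive (database, table, column) or HBase (table, column family, column) by changing what the resource string encodes.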
Authorization and Audit 
Authorization 
Fine grain access control 
• HDFS – Folder, File 
• Hive – Database, Table, Column 
• HBase – Table, Column Family, Column 
Audit 
Extensive user access auditing in 
HDFS, Hive and HBase 
• IP Address 
• Resource type/ resource 
• Timestamp 
• Access granted or denied 
[Callouts: flexibility in defining policies; control access into system]
Central Security Administration 
Apache Argus 
• Delivers a ‘single pane of glass’ for 
the security administrator 
• Centralizes administration of 
security policy 
• Ensures consistent coverage across 
the entire Hadoop stack 
Setup Authorization Policies 
[Screenshot callouts: file level access control, flexible definition; control permissions]
Monitor through Auditing 
Authorization and Auditing with Argus 
[Diagram components: Enterprise Users, Legacy Tools, Integration API, Argus Administration Portal, Argus Policy Server, Argus Audit Server, RDBMS; Hadoop components with an Argus Agent: Hadoop distributed file system (HDFS), Hive Server2, HBase; Knox, Falcon, Storm and YARN shown with agents marked * (future integration); Data, Operating System]
Simplified Workflow - HDFS 
[Diagram: Argus Policy Manager, NameNode with Argus Agent, User Application, Audit Database]
Steps numbered 1-5 in the diagram:
• Admin sets policies for HDFS files/folders
• Users access HDFS data through an application; data scientist runs a MapReduce job; IT users access HDFS through the CLI
• NameNode uses Argus Agent for authorization
• NameNode provides resource access to user/client
• Audit logs pushed to the audit database
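The authorize-then-audit part of this workflow can be sketched as a single wrapper. The function `authorize_and_audit`, the record layout and the in-memory list used as the audit sink are hypothetical; the field names follow the audit attributes listed earlier in the deck (user, resource, IP address, timestamp, access granted or denied).

```python
import time


def authorize_and_audit(is_allowed, audit_sink, user, resource,
                        access_type, client_ip, now=time.time):
    """Ask the policy check for a decision, record it, and return it.

    is_allowed is any callable (user, resource, access_type) -> bool;
    audit_sink is anything with append(), standing in for the audit DB.
    """
    granted = bool(is_allowed(user, resource, access_type))
    audit_sink.append({
        "user": user,
        "resource": resource,
        "access_type": access_type,
        "client_ip": client_ip,
        "timestamp": now(),
        "result": "granted" if granted else "denied",
    })
    return granted


audit_log = []
only_smith_reads = lambda user, resource, access: user == "smith" and access == "read"
authorize_and_audit(only_smith_reads, audit_log, "smith", "/data/a", "read", "10.0.0.5")
authorize_and_audit(only_smith_reads, audit_log, "jones", "/data/a", "read", "10.0.0.6")
print([entry["result"] for entry in audit_log])  # ['granted', 'denied']
```

Denied requests are logged as well, which is what makes the audit trail useful for spotting inappropriate access attempts.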
Simplified Workflow - Hive 
[Diagram: Argus Policy Manager, Hive Server2 with Argus Agent, User Application, Audit Database]
Steps numbered 1-5 in the diagram:
• Admin sets policies for Hive db/tables/columns
• Users access Hive data using JDBC/ODBC; IT users access Hive via the beeline command tool
• Hive authorizes with Argus Agent
• HiveServer2 provides data access to users
• Audit logs pushed to the audit database
Simplified Workflow - HBase 
[Diagram: Argus Policy Manager, HBase Server with Argus Agent, User Application, Audit Database]
Steps numbered 1-5 in the diagram:
• Admin sets policies for HBase table/cf/column
• Users access HBase data using the Java API; data scientist runs a MapReduce job; IT users access HBase via the HBase shell
• HBase authorizes with Argus Agent
• HBase server provides data access to users
• Audit logs pushed to the audit database
Typical Flow – Add Authorization through Argus 
[Diagram: Beeline client, KDC, HiveServer 2, Argus, HDFS (A, B, C)]
• Client gets service ticket for Hive
• Use Hive ST, submit query
• Hive gets NameNode (NN) service ticket
• Hive creates MapReduce job using NN ST
Typical Hadoop Security 
Strong Authentication + Cross-cutting Authorization + Perimeter Security
What does Perimeter Security really mean? 
[Diagram: User, Firewall, Gateway (REST API), Firewall, Hadoop Services (REST API)]
• Firewall required at perimeter (today)
• Knox Gateway controls all Hadoop REST API access through firewall
• Hadoop cluster mostly unaffected
• Firewall only allows connections through specific ports from Knox host
Why Knox? 
Simplified Access 
• Kerberos encapsulation 
• Extends API reach 
• Single access point 
• Multi-cluster support 
• Single SSL certificate 
Centralized Control 
• Central REST API auditing 
• Service-level authorization 
• Alternative to SSH “edge node” 
Enterprise Integration 
• LDAP integration 
• Active Directory integration 
• SSO integration 
• Apache Shiro extensibility 
• Custom extensibility 
Enhanced Security 
• Protect network details 
• Partial SSL for non-SSL services 
• WebApp vulnerability filter
Current Hadoop Client Model 
• FileSystem and MapReduce Java APIs 
• HDFS, Pig, Hive and Oozie clients (that wrap the Java APIs) 
• Typical use of APIs is via “Edge Node” that is “inside” cluster 
• Users SSH to Edge Node and execute API commands from shell 
[Diagram: User -> (SSH) -> Edge Node -> Hadoop]
Hadoop REST APIs 
Service: API
• WebHDFS: Supports HDFS user operations including reading files, writing to files, making directories, changing permissions and renaming.
• WebHCat: Job control for MapReduce, Pig and Hive jobs, and HCatalog DDL commands.
• Hive: Hive REST API operations
• HBase: HBase REST API operations
• Oozie: Job submission and management, and Oozie administration.
• Useful for connecting to Hadoop from outside the cluster
• When more client language flexibility is required
– i.e. Java binding not an option
• Challenges
– Client must have knowledge of cluster topology
– Ports must be opened to clients outside the cluster (in some cases on every host)
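The topology challenge shows up in the URLs themselves. The sketch below builds a direct WebHDFS URL and the equivalent URL through a Knox gateway. Host names, the ports (50070 and 8443), the `gateway` path segment and the cluster name `sandbox` are example values chosen for illustration, and the helper names are hypothetical.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(namenode_host, port, path, op, scheme="http"):
    """Direct access: the client must know a NameNode host and port."""
    query = urlencode({"op": op})
    return f"{scheme}://{namenode_host}:{port}/webhdfs/v1{quote(path)}?{query}"


def knox_webhdfs_url(gateway_host, port, cluster, path, op):
    """Via the gateway: one HTTPS endpoint, cluster chosen by name."""
    query = urlencode({"op": op})
    return (f"https://{gateway_host}:{port}/gateway/{cluster}"
            f"/webhdfs/v1{quote(path)}?{query}")


print(webhdfs_url("nn1.example.com", 50070, "/tmp", "LISTSTATUS"))
# http://nn1.example.com:50070/webhdfs/v1/tmp?op=LISTSTATUS
print(knox_webhdfs_url("knox.example.com", 8443, "sandbox", "/tmp", "LISTSTATUS"))
# https://knox.example.com:8443/gateway/sandbox/webhdfs/v1/tmp?op=LISTSTATUS
```

In the direct case the client also has to reach whichever hosts the cluster redirects it to; with a gateway only the gateway host and port need to be exposed.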
Knox Deployment with Hadoop Cluster 
[Diagram: Web Tier, Application Tier and Hadoop CLIs; Knox behind a load balancer (LB) in the DMZ; switches connecting Master Nodes (NN, SNN) in Rack 1 and Slave Nodes (DN) in Racks 2 to N]
Hadoop REST API Security: Drill-Down 
[Diagram: REST Client -> (HTTP) -> Firewall -> LB -> Knox Gateway (GW) in the DMZ -> Firewall -> (HTTP) -> Hadoop Cluster 1 and Hadoop Cluster 2. Knox talks LDAP to the Enterprise Identity Provider (LDAP/AD). Edge Node / Hadoop CLIs use RPC. Each cluster: Masters (NN, RM, WebHCat, Oozie, HBase, HS2) and Slaves (DN, NM)]
OpenLDAP Configuration 
• In sandbox.xml: 
<param> 
<name>main.ldapRealm</name> 
<value>org.apache.shiro.realm.ldap.JndiLdapRealm</value> 
</param> 
<param> 
<name>main.ldapRealm.userDnTemplate</name> 
<value>uid={0},ou=people,dc=hadoop,dc=apache,dc=org</value> 
</param> 
<param> 
<name>main.ldapRealm.contextFactory.url</name> 
<value>ldap://localhost:33389</value> 
</param> 
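How the `userDnTemplate` value is used can be shown with a small sketch: the `{0}` placeholder is replaced by the login name to form the DN used for the LDAP bind. The enclosing `<provider>` element in the sample string and the helper names `read_params` and `bind_dn` are assumptions for illustration; no LDAP connection is made.

```python
import xml.etree.ElementTree as ET

# Two of the params from the slide, wrapped in an assumed <provider> element.
PARAMS_XML = """
<provider>
  <param><name>main.ldapRealm.userDnTemplate</name>
         <value>uid={0},ou=people,dc=hadoop,dc=apache,dc=org</value></param>
  <param><name>main.ldapRealm.contextFactory.url</name>
         <value>ldap://localhost:33389</value></param>
</provider>
"""


def read_params(xml_text):
    """Collect <param> name/value pairs into a dict."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("param")}


def bind_dn(params, username):
    """Substitute the login name for {0} in the DN template."""
    return params["main.ldapRealm.userDnTemplate"].replace("{0}", username)


params = read_params(PARAMS_XML)
print(bind_dn(params, "guest"))
# uid=guest,ou=people,dc=hadoop,dc=apache,dc=org
```

A template like this only works when all users live under one branch of the directory; other layouts need a search-based lookup instead.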
Service level authorization Configuration 
• In <cluster.xml> 
<provider> 
<role>authorization</role> 
<name>AclsAuthz</name> 
<enabled>true</enabled> 
<param> 
<name>webhdfs.acl.mode</name> 
<value>OR</value> 
</param> 
<param> 
<name>webhdfs.acl</name> 
<value>guest;*;*</value> <!-- Format: user(s);groups;ipaddress -->
</param> 
<param> 
<name>webhcat.acl</name> 
<value>hdfs;admin;127.0.0.2,127.0.0.3</value> 
</param> 
</provider> 
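The `user(s);groups;ipaddress` ACL format can be modelled in a few lines. This is a simplified sketch, not Knox's implementation: `acl_allows` is a hypothetical helper, and it assumes that in AND mode every non-wildcard field must match, while in OR mode wildcard fields are ignored and any explicit match grants access. Check the Knox documentation for the exact wildcard semantics.

```python
def parse_acl(acl):
    """Split 'users;groups;ips' into three sets of comma-separated entries."""
    users, groups, ips = (set(part.split(",")) for part in acl.split(";"))
    return users, groups, ips


def acl_allows(acl, mode, user, user_groups, client_ip):
    users, groups, ips = parse_acl(acl)
    checks = [
        (users, user in users),
        (groups, bool(groups & set(user_groups))),
        (ips, client_ip in ips),
    ]
    if mode == "AND":
        # Every field must match unless it is the '*' wildcard.
        return all(matched or "*" in field for field, matched in checks)
    if mode == "OR":
        # Assumption: wildcard fields are skipped so '*' cannot grant
        # access to everyone by accident.
        return any(matched for field, matched in checks if "*" not in field)
    raise ValueError(f"unknown ACL mode: {mode}")


print(acl_allows("guest;*;*", "OR", "guest", [], "10.0.0.1"))           # True
print(acl_allows("guest;*;*", "OR", "bob", ["users"], "10.0.0.1"))      # False
print(acl_allows("hdfs;admin;127.0.0.2,127.0.0.3", "AND",
                 "hdfs", ["admin"], "127.0.0.2"))                        # True
```

Under these assumptions the `webhdfs.acl` above admits only `guest`, and the `webhcat.acl` (treated here as AND mode) requires user, group and source address to all match.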
Typical Flow – Firewall, Route through Knox Gateway
[Diagram: Beeline client, Knox gateway, KDC, HiveServer 2, Argus, HDFS (A, B, C)]
• Original request w/ user id/password
• Knox gets service ticket for Hive
• Knox runs as proxy user using Hive ST
• Use Hive ST, submit query
• Hive gets NameNode (NN) service ticket
• Hive creates MapReduce job using NN ST
• Client gets query result
Optionally - Add Wire and File Encryption
[Diagram: the Knox flow from the previous slide (Beeline client, Knox gateway, KDC, HiveServer 2, Argus, HDFS), now with SSL and SASL labels on the connections]
Security Features 
Hadoop with Argus
Auditing
• Configurable audit: Yes, auditing can be controlled through policy
• Resource access auditing: User id, request type, repository, access resource, IP address, timestamp, access granted/denied
• Admin auditing: Changes to policies, login sessions and agent monitoring
Data Protection
• Over the wire: SASL for RPC, SSL for MR shuffle, WebHDFS
• Data at rest: LUKS for Volume Encryption, Partners
Manage
• User/Group mapping: Local, Sync with LDAP/AD, Sync with Unix
• Delegated administration: Delegate policy administration to groups or users
Data Protection 
Data Protection 
HDP allows you to apply data protection policy at 
three different layers across the Hadoop stack 
Layer | What? | How?
Storage | Encrypt data while it is at rest | 3rd Party, Future Hadoop improvements
Transmission | Encrypt data as it moves | Already in Hadoop
Upon Access | Apply restrictions when accessed | 3rd Party
Points of Communication 
[Diagram: points of communication between the Client and the nodes of the Hadoop cluster, numbered 1-4 in the original: WebHDFS, DataTransferProtocol, RPC, JDBC/ODBC; between nodes: DataTransfer, RPC, M/R Shuffle]
Data Transmission Protection in HDP 2.1 
• WebHDFS 
– Provides read/write access to HDFS 
– Optionally enable HTTPS 
– Authenticated using SPNEGO (Kerberos for HTTP) filter 
– SSL based wire encryption 
• RPC 
– Communications between NNs, DNs, etc. and Clients 
– SASL based wire encryption 
– DTP encryption with SASL 
• JDBC/ODBC 
– SSL based wire encryption 
– SASL based encryption also available
• Shuffle 
– Mapper to Reducer over HTTP(S) with SSL 
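A small sketch of a configuration check for these channels. The property names (`hadoop.rpc.protection`, `dfs.encrypt.data.transfer`, `dfs.http.policy`, `mapreduce.shuffle.ssl.enabled`, `hive.server2.use.SSL`) are given to the best of my knowledge for Hadoop 2.x and Hive and should be verified against the distribution in use; the helper `unencrypted_channels` and the merged `conf` dictionary are hypothetical.

```python
# (channel, predicate over a dict of merged *-site.xml properties)
CHECKS = [
    ("RPC", lambda c: c.get("hadoop.rpc.protection") == "privacy"),
    ("DataTransferProtocol", lambda c: c.get("dfs.encrypt.data.transfer") == "true"),
    ("WebHDFS", lambda c: c.get("dfs.http.policy") == "HTTPS_ONLY"),
    ("MR shuffle", lambda c: c.get("mapreduce.shuffle.ssl.enabled") == "true"),
    ("HiveServer2 JDBC/ODBC", lambda c: c.get("hive.server2.use.SSL") == "true"),
]


def unencrypted_channels(conf):
    """Return the channels whose wire encryption does not look enabled."""
    return [name for name, is_encrypted in CHECKS if not is_encrypted(conf)]


conf = {
    "hadoop.rpc.protection": "privacy",
    "dfs.encrypt.data.transfer": "true",
}
print(unencrypted_channels(conf))
# ['WebHDFS', 'MR shuffle', 'HiveServer2 JDBC/ODBC']
```

Each channel is switched on separately, so a check like this helps catch a cluster where RPC is protected but the shuffle or JDBC traffic is still in the clear.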
Data Storage Protection 
• Encrypt at the physical file system level (e.g. dm-crypt) 
• Encrypt via custom HDFS “compression” codec 
• Encrypt at Application level (including security service/device) 
[Diagram: an ETL app calls a security service (e.g. Voltage) to ENCRYPT and DECRYPT values (e.g. ABC <-> 1a3d) stored in HDFS]
Current Open Source Initiatives 
• HDFS Encryption 
– Transparent encryption of data at rest in HDFS via encryption zones. Being worked on in the community
– Dependency on Key Management Server and Keyshell 
• Hive Column Level Encryption 
• HBase Column Level Encryption 
– Transparent Column Encryption, needs more testing/validation 
• Key Management Server 
• Key Provider API 
• Command line Key Operations 
And remember…. 

More Related Content

What's hot

Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox Gateway
DataWorks Summit
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
DataWorks Summit
 

What's hot (20)

Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox Gateway
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
 
Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015
 
Hadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateHadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the Gate
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
 
Securing the Hadoop Ecosystem
Securing the Hadoop EcosystemSecuring the Hadoop Ecosystem
Securing the Hadoop Ecosystem
 
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
 
Hadoop ClusterClient Security Using Kerberos
Hadoop ClusterClient Security Using KerberosHadoop ClusterClient Security Using Kerberos
Hadoop ClusterClient Security Using Kerberos
 
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
 
Nl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenchesNl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenches
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big Data
 
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionHadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
 
Hadoop Security Features that make your risk officer happy
Hadoop Security Features that make your risk officer happyHadoop Security Features that make your risk officer happy
Hadoop Security Features that make your risk officer happy
 
Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon
Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheConTechnical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon
Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon
 
Hadoop Security Features That make your risk officer happy
Hadoop Security Features That make your risk officer happyHadoop Security Features That make your risk officer happy
Hadoop Security Features That make your risk officer happy
 
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
 
Hadoop Security: Overview
Hadoop Security: OverviewHadoop Security: Overview
Hadoop Security: Overview
 
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
 
Hadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster AccessHadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster Access
 

Viewers also liked

Hadoop Administration pdf
Hadoop Administration pdfHadoop Administration pdf
Hadoop Administration pdf
Edureka!
 
Performance based-assessment
Performance based-assessmentPerformance based-assessment
Performance based-assessment
luisagodoy444
 

Viewers also liked (17)

Hadoop and Big Data Security
Hadoop and Big Data SecurityHadoop and Big Data Security
Hadoop and Big Data Security
 
Online assessment and data analytics - Peter Tan - Institute of Technical Edu...
Online assessment and data analytics - Peter Tan - Institute of Technical Edu...Online assessment and data analytics - Peter Tan - Institute of Technical Edu...
Online assessment and data analytics - Peter Tan - Institute of Technical Edu...
 
Big Data Security with Hadoop
Big Data Security with HadoopBig Data Security with Hadoop
Big Data Security with Hadoop
 
Big Data Security and Governance
Big Data Security and GovernanceBig Data Security and Governance
Big Data Security and Governance
 
Intel’s Big Data and Hadoop Security Initiatives - StampedeCon 2014
Intel’s Big Data and Hadoop Security Initiatives - StampedeCon 2014Intel’s Big Data and Hadoop Security Initiatives - StampedeCon 2014
Intel’s Big Data and Hadoop Security Initiatives - StampedeCon 2014
 
Big security for big data
Big security for big dataBig security for big data
Big security for big data
 
Big Data Security with HP ArcSight
Big Data Security with HP ArcSightBig Data Security with HP ArcSight
Big Data Security with HP ArcSight
 
What are performance assessments?
What are performance assessments?What are performance assessments?
What are performance assessments?
 
Hadoop Security Now and Future
Hadoop Security Now and FutureHadoop Security Now and Future
Hadoop Security Now and Future
 
Big Data Security Intelligence and Analytics for Advanced Threat Protection
Big Data Security Intelligence and Analytics for Advanced Threat ProtectionBig Data Security Intelligence and Analytics for Advanced Threat Protection
Big Data Security Intelligence and Analytics for Advanced Threat Protection
 
Advanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterAdvanced Security In Hadoop Cluster
Advanced Security In Hadoop Cluster
 
LPWA-Open for Business. It’s time to execute
LPWA-Open for Business. It’s time to executeLPWA-Open for Business. It’s time to execute
LPWA-Open for Business. It’s time to execute
 
Big Data, Security Intelligence, (And Why I Hate This Title)
Big Data, Security Intelligence, (And Why I Hate This Title) Big Data, Security Intelligence, (And Why I Hate This Title)
Big Data, Security Intelligence, (And Why I Hate This Title)
 
IoT - Big Data & Security
IoT - Big Data & SecurityIoT - Big Data & Security
IoT - Big Data & Security
 
Learn Hadoop Administration
Learn Hadoop AdministrationLearn Hadoop Administration
Learn Hadoop Administration
 
Hadoop Administration pdf
Hadoop Administration pdfHadoop Administration pdf
Hadoop Administration pdf
 
Performance based-assessment
Performance based-assessmentPerformance based-assessment
Performance based-assessment
 

Similar to 2014 sept 4_hadoop_security

Similar to 2014 sept 4_hadoop_security (20)

August 2014 HUG : Comprehensive Security for Hadoop
August 2014 HUG : Comprehensive Security for HadoopAugust 2014 HUG : Comprehensive Security for Hadoop
August 2014 HUG : Comprehensive Security for Hadoop
 
TriHUG October: Apache Ranger
TriHUG October: Apache RangerTriHUG October: Apache Ranger
TriHUG October: Apache Ranger
 
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...
 
Hadoop and Data Access Security
Hadoop and Data Access SecurityHadoop and Data Access Security
Hadoop and Data Access Security
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Curb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure ClusterCurb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure Cluster
 
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
 
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big Data
 
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
 
Curb your insecurity with HDP
Curb your insecurity with HDPCurb your insecurity with HDP
Curb your insecurity with HDP
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
 
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise HadoopHDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
 
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
 
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopDiscover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
 
大数据数据治理及数据安全
大数据数据治理及数据安全大数据数据治理及数据安全
大数据数据治理及数据安全
 
Keeping your Enterprise’s Big Data Secure by Owen O’Malley at Big Data Spain ...
Keeping your Enterprise’s Big Data Secure by Owen O’Malley at Big Data Spain ...Keeping your Enterprise’s Big Data Secure by Owen O’Malley at Big Data Spain ...
Keeping your Enterprise’s Big Data Secure by Owen O’Malley at Big Data Spain ...
 
Securing Spark Applications
Securing Spark ApplicationsSecuring Spark Applications
Securing Spark Applications
 
Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...
 
Apache Hive authorization models
Apache Hive authorization modelsApache Hive authorization models
Apache Hive authorization models
 

More from Adam Muise

More from Adam Muise (20)

2015 nov 27_thug_paytm_rt_ingest_brief_final
2015 nov 27_thug_paytm_rt_ingest_brief_final2015 nov 27_thug_paytm_rt_ingest_brief_final
2015 nov 27_thug_paytm_rt_ingest_brief_final
 
Moving to a data-centric architecture: Toronto Data Unconference 2015
Moving to a data-centric architecture: Toronto Data Unconference 2015Moving to a data-centric architecture: Toronto Data Unconference 2015
Moving to a data-centric architecture: Toronto Data Unconference 2015
 
Paytm labs soyouwanttodatascience
Paytm labs soyouwanttodatasciencePaytm labs soyouwanttodatascience
Paytm labs soyouwanttodatascience
 
2015 feb 24_paytm_labs_intro_ashwin_armandoadam
2015 feb 24_paytm_labs_intro_ashwin_armandoadam2015 feb 24_paytm_labs_intro_ashwin_armandoadam
2015 feb 24_paytm_labs_intro_ashwin_armandoadam
 
2014 sept 4_hadoop_security

  • 1. Securing Hadoop Hadoop Security Demystified…and then made more confusing. Presenter: Adam Muise Content: Balaji Ganesan Adam Muise Page 1 © Hortonworks Inc. 2014
  • 2. What do we mean by Security? Say you have a house guest…
      - Authentication: who gets in the door
      - Authorization: how far they are allowed into the house and which rooms they are allowed in
      - Auditing: follow them around
      - Encryption: when all else fails, lock it up
  • 3. Insecurity – Not just for Teenagers
      - Security is really about risk mitigation
      - No perfect solution exists unless you locate your datacenter in the hull of the Titanic and cut all communications
      - The risks are:
          - Inappropriate access to data by internal resources
          - External data theft
          - Service outages
          - No knowledge of theft or inappropriate access
      - Hadoop's value to a business is that it centralizes their data, which can make leaks more detrimental than a DDoS or stolen laptops
  • 4. Attention to Hadoop security on the rise…
      - As Hadoop becomes more widely adopted and more sensitive production data goes into clusters, more attention is being paid to security
          - Intel/Cloudera working on Project Rhino
          - Hortonworks introduces Apache Knox
          - Cloudera buys Gazzang
          - Hortonworks buys XASecure and turns it into Apache Argus
          - HBase gets cell level security
          - … the list goes on
  • 5. Watch out for those malicious attacks…
  • 6. Layers Of Hadoop Security
      - Perimeter Level Security
          - Network Security (i.e. Firewalls)
          - Apache Knox (i.e. Gateways)
      - Authentication
          - Kerberos
          - Delegation Tokens
      - Authorization
          - Argus Security Policies
      - OS Security
          - File Permissions
          - Process Isolation
      - Data Protection
          - Transport
          - Storage
          - Access
  • 7. Typical Hadoop Security: Vanilla Hadoop
  • 8. Hadoop out of the box
      - While a lot of security is built into Hadoop, not much of it is turned on out of the box
      - Without strong authentication, anyone with sufficient access to the underlying OS can impersonate users
      - Often paired with gateway nodes that provide stronger access restrictions
      - HDFS/YARN/Hive
          - Authentication: derived from the OS users local to the box the task/request is submitted from
          - Authorization: dependent on each project/service
  • 9. Typical Flow – Hive Access [diagram labels: Beeline Client, HiveServer 2, HDFS, A, B, C]
  • 10. Typical Hadoop Security: Strong Authentication through Kerberos
  • 11. Kerberos Primer [diagram labels: Client, KDC, NN, DN, Client's Kerberos Ticket Cache]
      1. kinit: log in and get a Ticket Granting Ticket (TGT)
      2. Client stores the TGT in the ticket cache
      3. Get a NameNode Service Ticket (NN-ST)
      4. Client stores the NN-ST in the ticket cache
      5. Read/write file given NN-ST and file name; returns block locations, block IDs and Block Access Tokens if access is permitted
      6. Read/write block given Block Access Token and block ID
  • 12. Kerberos Summary
      - Provides strong authentication
      - Establishes identity for users, services and hosts
      - Prevents unauthorized impersonation of accounts
      - Supports token delegation model
      - Works with existing directory services
      - Basis for authorization
  • 13. Hadoop Authentication
      - Users authenticate with the services
          - CLI & API: Kerberos kinit or keytab
          - Web UIs: Kerberos SPNEGO or custom plugin (e.g. SSO)
      - Services authenticate with each other
          - Prepopulated Kerberos keytab
          - e.g. DN->NN, NM->RM
      - Services propagate authenticated user identity
          - Authenticated trusted proxy service
          - e.g. Oozie->RM, Knox->WebHCat
      - Job tasks present delegated user's identity/access
          - Delegation tokens
          - e.g. Job task -> NN, Job task -> JT/RM
      - Strong authentication is the basis for authorization
      [diagram labels: Client, Name Node, Data Node, Oozie, Job Tracker, Task; (User) Kerberos or Custom; (Service) Kerberos; (Service) Kerberos + (User) doas; (User) Delegation Token]
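As a minimal sketch of how the Kerberos authentication and trusted-proxy model on this slide is typically switched on, the core-site.xml fragment below uses the standard Hadoop properties `hadoop.security.authentication`, `hadoop.security.authorization` and `hadoop.proxyuser.*`. The host name and group name are placeholders assumed for illustration; they do not come from the slides.

```xml
<configuration>
  <!-- Replace the default "simple" (OS user) authentication with Kerberos -->
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <!-- Enable service-level authorization checks -->
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
  <!-- Allow the oozie service to act on behalf of end users (doAs).
       Host and group values are hypothetical. -->
  <property>
    <name>hadoop.proxyuser.oozie.hosts</name>
    <value>oozie-host.example.com</value>
  </property>
  <property>
    <name>hadoop.proxyuser.oozie.groups</name>
    <value>hadoop-users</value>
  </property>
</configuration>
```

Restricting the proxy user to specific hosts and groups, rather than `*`, keeps a compromised service account from impersonating arbitrary users from anywhere.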
  • 14. User Management
      - Most implementations use LDAP for user info
          - LDAP guarantees that user information is consistent across the cluster
          - An easy way to manage users & groups
          - The standard user to group mapping comes from the OS on the NameNode
      - Kerberos provides authentication
          - PAM can automatically log the user into Kerberos
  • 15. Kerberos + Active Directory [diagram labels: Client, Hadoop Cluster, AD / LDAP (User Store), KDC (Authentication), Cross Realm Trust]
      - Users: smith@EXAMPLE.COM
      - Hosts: host1@HADOOP.EXAMPLE.COM
      - Services: hdfs/host1@HADOOP.EXAMPLE.COM
      - Use existing directory tools to manage users
      - Use Kerberos tools to manage host + service principals
  • 16. Groups
      - Define groups for each required role
      - Hadoop has a pluggable interface
          - Mapping from user to group is not stored within Hadoop
      - Defaults to the OS information on the master node
          - Typically driven from LDAP on Linux
      - Existing plugins
          - ShellBasedUnixGroupsMapping: /bin/id
          - JniBasedUnixGroupsMapping: system call
          - LdapGroupsMapping: LDAP call
          - CompositeGroupsMapping: combines Unix & LDAP group mapping
      - Strong authentication and role-based groups provide protections enabling shared clusters
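The pluggable group mapping on this slide is selected in core-site.xml through `hadoop.security.group.mapping`. The fragment below is a sketch that picks the LDAP plugin; the LDAP URL, bind user and search base are hypothetical values assumed for illustration, and a real deployment would also need bind credentials and search filters matching its directory schema.

```xml
<configuration>
  <!-- Resolve a user's groups with the LDAP plugin instead of the OS default -->
  <property>
    <name>hadoop.security.group.mapping</name>
    <value>org.apache.hadoop.security.LdapGroupsMapping</value>
  </property>
  <!-- Directory server to query (hypothetical host) -->
  <property>
    <name>hadoop.security.group.mapping.ldap.url</name>
    <value>ldap://ldap.example.com:389</value>
  </property>
  <!-- Account used for the lookup (hypothetical DN) -->
  <property>
    <name>hadoop.security.group.mapping.ldap.bind.user</name>
    <value>cn=hadoop-bind,ou=services,dc=example,dc=com</value>
  </property>
  <!-- Search base for users and groups (hypothetical) -->
  <property>
    <name>hadoop.security.group.mapping.ldap.base</name>
    <value>dc=example,dc=com</value>
  </property>
</configuration>
```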
  • 17. Groups [diagram labels: Client, Hadoop Cluster, NameNode, Plugin, AD / LDAP User Store, rw]
  • 18. Kerberos FAQ
      - Where do I install the KDC?
          - On a master-type node
      - User provisioning
          - Hook up to corporate AD/LDAP to leverage existing user provisioning
      - Growing a cluster
          - Provision new services and nodes in the MIT KDC, copy keytabs to new nodes
      - Is Kerberos a SPOF?
          - Kerberos supports HA; with delegation tokens the KDC load is reduced
  • 19. Typical Flow – Authenticate through Kerberos [diagram labels: Beeline Client, KDC, HiveServer 2, HDFS, A, B, C]
      - Client gets service ticket for Hive
      - Use Hive ST, submit query
      - Hive gets Namenode (NN) service ticket
      - Hive creates map reduce using NN ST
  • 20. Typical Hadoop Security: Strong Authentication + Cross-cutting Authorization
  • 21. Apache Argus (aka HDP Security) Capabilities: Hadoop and Argus
      - Authentication
          - Cross Platform Security: Kerberos, Integration with AD
          - Gateway for REST APIs: Knox for http, REST APIs
      - Role Based Authorizations
          - Fine grained access control: HDFS – Folder, File; Hive – Database, Table, Column, UDFs; HBase – Table, Column Family, Column
          - Wildcard Resource Names: Yes
          - Permission Support: HDFS – Read, Write, Execute; Hive – Select, Update, Create, Drop, Alter, Index, Lock; HBase – Read, Write, Create
  • 22. Authorization and Audit
      - Authorization: fine grain access control
          - HDFS – Folder, File
          - Hive – Database, Table, Column
          - HBase – Table, Column Family, Column
      - Audit: extensive user access auditing in HDFS, Hive and HBase
          - IP Address
          - Resource type / resource
          - Timestamp
          - Access granted or denied
      - Flexibility in defining policies
      - Control access into system
  • 23. Central Security Administration: Apache Argus
      - Delivers a ‘single pane of glass’ for the security administrator
      - Centralizes administration of security policy
      - Ensures consistent coverage across the entire Hadoop stack
  • 24. Setup Authorization Policies [screenshot callouts: file level access control, flexible definition; control permissions]
  • 25. Monitor through Auditing
  • 26. Authorization and Auditing with Argus [diagram labels: Enterprise Users, Legacy Tools, Integration API, Argus Administration Portal, Argus Policy Server, Argus Audit Server, RDBMS; Hadoop Components: HDFS, HBase, Hive Server2 (Argus Agent), Knox, Falcon, Storm (Argus Agent*); YARN: Data Operating System; Hadoop distributed file system (HDFS); * = Future Integration]
  • 27. Simplified Workflow - HDFS [diagram labels: Argus Policy Manager, Argus Agent, Name Node, User Application, Audit Database]
      1. Admin sets policies for HDFS files/folder
      2. Users access HDFS data through application; data scientist runs a map reduce job; IT users access HDFS through CLI
      3. Namenode uses Argus Agent for authorization
      4. Namenode provides resource access to user/client
      5. Audit logs pushed to DB
  • 28. Simplified Workflow - Hive [diagram labels: Argus Policy Manager, Argus Agent, Hive Server2, User Application, Audit Database]
      1. Admin sets policies for Hive db/tables/columns
      2. IT users access Hive via the beeline command tool; users access Hive data using JDBC/ODBC
      3. Hive authorizes with Argus Agent
      4. HiveServer2 provides data access to users
      5. Audit logs pushed to DB
  • 29. Simplified Workflow - HBase [diagram labels: Argus Policy Manager, Argus Agent, HBase Server, User Application, Audit Database]
      1. Admin sets policies for HBase table/cf/column
      2. IT users access HBase via HBShell; users access HBase data using Java API; data scientist runs a map reduce job
      3. HBase authorizes with Argus Agent
      4. HBase server provides data access to users
      5. Audit logs pushed to DB
  • 30. Typical Flow – Add Authorization through Argus [diagram labels: Beeline Client, KDC, HiveServer 2, Argus, HDFS, A, B, C]
      - Client gets service ticket for Hive
      - Use Hive ST, submit query
      - Hive gets Namenode (NN) service ticket
      - Hive creates map reduce using NN ST
  • 31. Typical Hadoop Security: Strong Authentication + Cross-cutting Authorization + Perimeter Security
  • 32. What does Perimeter Security really mean? [diagram labels: User, REST API, Firewall, Gateway, Hadoop Services]
      - Firewall required at perimeter (today)
      - Knox Gateway controls all Hadoop REST API access through firewall
      - Hadoop cluster mostly unaffected
      - Firewall only allows connections through specific ports from Knox host
  • 33. Why Knox?
      - Simplified Access
          - Kerberos encapsulation
          - Extends API reach
          - Single access point
          - Multi-cluster support
          - Single SSL certificate
      - Centralized Control
          - Central REST API auditing
          - Service-level authorization
          - Alternative to SSH “edge node”
      - Enterprise Integration
          - LDAP integration
          - Active Directory integration
          - SSO integration
          - Apache Shiro extensibility
          - Custom extensibility
      - Enhanced Security
          - Protect network details
          - Partial SSL for non-SSL services
          - WebApp vulnerability filter
  • 34. Current Hadoop Client Model [diagram labels: User, SSH, Edge Node, Hadoop]
      - FileSystem and MapReduce Java APIs
      - HDFS, Pig, Hive and Oozie clients (that wrap the Java APIs)
      - Typical use of APIs is via “Edge Node” that is “inside” cluster
      - Users SSH to Edge Node and execute API commands from shell
  • 35. Hadoop REST APIs
      - Service: API
          - WebHDFS: supports HDFS user operations including reading files, writing to files, making directories, changing permissions and renaming
          - WebHCat: job control for MapReduce, Pig and Hive jobs, and HCatalog DDL commands
          - Hive: Hive REST API operations
          - HBase: HBase REST API operations
          - Oozie: job submission and management, and Oozie administration
      - Useful for connecting to Hadoop from outside the cluster
      - When more client language flexibility is required
          - i.e. Java binding not an option
      - Challenges
          - Client must have knowledge of cluster topology
          - Required to open ports (and in some cases, on every host) outside the cluster
  • 36. Knox Deployment with Hadoop Cluster [diagram labels: Application Tier, DMZ, Web Tier, LB, Knox, Hadoop CLIs, Switch, Master Nodes Rack 1 (NN, SNN), Slave Nodes Rack 2 … Slave Nodes Rack N (DN)]
  • 37. Hadoop REST API Security: Drill-Down [diagram labels: REST Client, Enterprise Identity Provider (LDAP/AD), Firewall, DMZ, LB, Knox Gateway (GW), Edge Node / Hadoop CLIs; connections labelled RPC, HTTP, LDAP; Hadoop Cluster 1 and Hadoop Cluster 2, each showing Masters and Slaves with NN, RM, Web, Oozie, HCat, DN, NM, HBase, HS2]
  • 38. OpenLDAP Configuration
      - In sandbox.xml:
        <param>
          <name>main.ldapRealm</name>
          <value>org.apache.shiro.realm.ldap.JndiLdapRealm</value>
        </param>
        <param>
          <name>main.ldapRealm.userDnTemplate</name>
          <value>uid={0},ou=people,dc=hadoop,dc=apache,dc=org</value>
        </param>
        <param>
          <name>main.ldapRealm.contextFactory.url</name>
          <value>ldap://localhost:33389</value>
        </param>
  • 39. Service level authorization Configuration
      - In <cluster.xml>:
        <provider>
          <role>authorization</role>
          <name>AclsAuthz</name>
          <enabled>true</enabled>
          <param>
            <name>webhdfs.acl.mode</name>
            <value>OR</value>
          </param>
          <param>
            <name>webhdfs.acl</name>
            <!-- Format: user(s);groups;ipaddress -->
            <value>guest;*;*</value>
          </param>
          <param>
            <name>webhcat.acl</name>
            <value>hdfs;admin;127.0.0.2,127.0.0.3</value>
          </param>
        </provider>
  • 40. Typical Flow – Firewall, Route through Knox Gateway [diagram labels: Beeline Client, Knox, KDC, HiveServer 2, Argus, HDFS, A, B, C]
      - Original request w/user id/password
      - Knox gets service ticket for Hive
      - Knox runs as proxy user using Hive ST
      - Use Hive ST, submit query
      - Hive gets Namenode (NN) service ticket
      - Hive creates map reduce using NN ST
      - Client gets query result
  • 41. Optionally - Add Wire and File Encryption [diagram labels: Beeline Client, Knox, KDC, HiveServer 2, Argus, HDFS, A, B, C; links marked SSL and SASL]
      - Original request w/user id/password
      - Knox gets service ticket for Hive
      - Knox runs as proxy user using Hive ST
      - Use Hive ST, submit query
      - Hive gets Namenode (NN) service ticket
      - Hive creates map reduce using NN ST
      - Client gets query result
  • 42. Security Features: Hadoop with Argus
      - Auditing
          - Configurable audit: yes, auditing can be controlled through policy
          - Resource access auditing: user id, request type, repository, access resource, IP address, timestamp, access granted/denied
          - Admin auditing: changes to policies, login sessions and agent monitoring
      - Data Protection
          - Over the wire: SASL for RPC, SSL for MR shuffle, WebHDFS
          - Data at rest: LUKS for volume encryption, partners
      - Manage
          - User/Group mapping: local, sync with LDAP/AD, sync with Unix
          - Delegated administration: delegate policy administration to groups or users
  • 43. Data Protection
  • 44. Data Protection
      - HDP allows you to apply data protection policy at three different layers across the Hadoop stack
      - Layer: What? / How?
          - Storage: encrypt data while it is at rest / 3rd party, future Hadoop improvements
          - Transmission: encrypt data as it moves / already in Hadoop
          - Upon Access: apply restrictions when accessed / 3rd party
  • 45. Points of Communication [diagram labels: Client, Hadoop Cluster, Nodes; numbered channels 1 to 4 covering WebHDFS, DataTransferProtocol, RPC, JDBC/ODBC, M/R Shuffle]
  • 46. Data Transmission Protection in HDP 2.1
      - WebHDFS
          - Provides read/write access to HDFS
          - Optionally enable HTTPS
          - Authenticated using SPNEGO (Kerberos for HTTP) filter
          - SSL based wire encryption
      - RPC
          - Communications between NNs, DNs, etc. and Clients
          - SASL based wire encryption
          - DTP encryption with SASL
      - JDBC/ODBC
          - SSL based wire encryption
          - SASL based encryption also available
      - Shuffle
          - Mapper to Reducer over HTTP(S) with SSL
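The wire-encryption options on this slide map onto a handful of standard configuration properties. The sketch below gathers them in one `<configuration>` element for readability only; in a real cluster each property lives in the file named in its comment, and SSL additionally requires keystore and truststore settings that are not shown here.

```xml
<configuration>
  <!-- core-site.xml: SASL quality of protection for RPC
       (authentication, integrity or privacy; privacy adds encryption) -->
  <property>
    <name>hadoop.rpc.protection</name>
    <value>privacy</value>
  </property>
  <!-- hdfs-site.xml: encrypt the block DataTransferProtocol (DTP) -->
  <property>
    <name>dfs.encrypt.data.transfer</name>
    <value>true</value>
  </property>
  <!-- hdfs-site.xml: serve WebHDFS and the HDFS web UIs over HTTPS only -->
  <property>
    <name>dfs.http.policy</name>
    <value>HTTPS_ONLY</value>
  </property>
  <!-- mapred-site.xml: mapper-to-reducer shuffle over HTTPS -->
  <property>
    <name>mapreduce.shuffle.ssl.enabled</name>
    <value>true</value>
  </property>
  <!-- hive-site.xml: SSL for HiveServer2 JDBC/ODBC connections -->
  <property>
    <name>hive.server2.use.SSL</name>
    <value>true</value>
  </property>
</configuration>
```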
  • 47. Data Storage Protection [diagram labels: ETL App, Security Service (e.g. Voltage), HDFS, ENCRYPT, DECRYPT, ABC, DEF, 1a3d]
      - Encrypt at the physical file system level (e.g. dm-crypt)
      - Encrypt via custom HDFS “compression” codec
      - Encrypt at application level (including security service/device)
  • 48. Current Open Source Initiatives
      - HDFS Encryption
          - Transparent encryption of data at rest in HDFS via encryption zones; being worked on in the community
          - Dependency on Key Management Server and Keyshell
      - Hive Column Level Encryption
      - HBase Column Level Encryption
          - Transparent column encryption, needs more testing/validation
      - Key Management Server
      - Key Provider API
      - Command line key operations
  • 49. And remember….