The Hadoop ecosystem is vast, and there's a lot of conflicting information available about how best to secure any given implementation. Mistakes made early on are also difficult to fix once an instance is in production. In this paper, we demonstrate the currently accepted best practices for securing and Kerberizing Hadoop clusters in a vendor-agnostic way, review some of the not-so-obvious pitfalls one could encounter along the way, and delve into some of the theory behind why things are the way they are.
SAS1844 - Securing Hadoop Clusters while Still Retaining Your Sanity
1. SAS1844 - Securing Hadoop Clusters while Still Retaining Your Sanity
Evan Kinney, Security Software Developer
SAS
2. $whoami
Security Software Developer at SAS
Platform authn/authz, general security
Kerberos; all the time, always
Open source developer/contributor/wrangler
Dinosaur enthusiast
Twitter: @evankinney
GitHub: 3van
Freenode: evanosaurus
3. Why are we here?
Hadoop isn’t secure by default
Fixing this is non-trivial
Mistakes made are hard to diagnose
No consistent story for authentication (yet)
Kerberos is hard
4. Let’s talk about the elephant in the room.
(this is the first time that pun has ever been used.)
When created, Hadoop didn’t need security
Components were almost exclusively single-tenant and isolated
What problems do we have?
HDFS and MapReduce
Blindly trust that a user is who they say they are
Allow for arbitrary Java code execution as the JobTracker service account
Trivial to circumvent permissions restrictions
Rogue services
Services don’t authenticate each other
DataNodes
Know the block ID? Awesome, here are your dataz!
5. Okay, great—now how do we fix it?
Kerberos! (surprise)
Doesn’t require significantly changing the architecture
Uses symmetric encryption
Allows services to authenticate each other
Allows services to authenticate users
Allows users to authenticate services
6. Let’s talk about the three-headed dog in the room.
Kerberos is a standard protocol [RFC 4120] that allows for mutual authentication of entities via the use of a trusted arbiter
Developed at MIT, originally for Project Athena
V4 - 1988[ish]
V5 - 1993; 2005
Used in a lot more places than most people think
Active Directory == LDAP + Kerberos
Once it’s working, it’s usually fairly transparent to users
7. Some Kerberos Terminology
principal: an entity that can authenticate or be authenticated to
realm: an administrative partition that contains one or more principals; always written in uppercase
ticket: ASN.1 structure containing information about a request, plus authenticators to verify that information
ticket granting ticket (TGT): the result of initial authentication with a KDC; used to assert identity in further requests
service ticket: issued to a principal for use in asserting identity to Kerberized services
credentials cache (CC): contains one or more tickets for one principal
keytab: holds any number of pre-salted/hashed long-term keys (passwords); generally used for headless/automated services
key distribution center (KDC): the trusted arbiter; stores keys for one or more realms and answers authentication requests from principals
ticket granting service (TGS): accepts TGT session keys, issues service tickets
authentication service (AS): accepts salted/hashed password, issues TGT (worked example below)
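To make the terminology concrete, here is a minimal, hypothetical session with the MIT Kerberos command-line tools; the realm EXAMPLE.COM, the user alice, the keytab path, and the klist output are all placeholders for illustration:

    # Initial authentication against the AS: obtain a TGT for a user principal.
    $ kinit alice@EXAMPLE.COM
    Password for alice@EXAMPLE.COM:

    # Inspect the credentials cache; the krbtgt/... entry is the TGT.
    $ klist
    Ticket cache: FILE:/tmp/krb5cc_1000
    Default principal: alice@EXAMPLE.COM

    Valid starting       Expires              Service principal
    05/01/16 09:00:00    05/01/16 19:00:00    krbtgt/EXAMPLE.COM@EXAMPLE.COM
            renew until 05/08/16 09:00:00

    # Headless services authenticate from a keytab instead of a password:
    $ kinit -kt /etc/security/keytabs/nn.service.keytab nn/namenode.example.com@EXAMPLE.COM

Note the ticket lifetimes in the klist output: valid for 10 hours, renewable for 7 days, which matches the defaults discussed later.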
9. Hadoop and Kerberos
Hadoop relies on local user accounts to enforce ACLs, etc. (if not configured to look things up in LDAP)
Generally speaking, a Kerberos principal is not necessarily associated with a POSIX user account (though, in practice, they usually are)
Once authenticated via Kerberos, a user is issued a delegation token (specific to Hadoop) for use in further requests
Scheduling is… hard
Default ticket lifetime is 10 hours, renewable for 7 days (see the krb5.conf sketch below)
Most distributions assume you’re running an isolated realm with your own KDC
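Those lifetimes are negotiated between the client's krb5.conf and the KDC's own policy; a minimal [libdefaults] sketch (the realm name is a placeholder) that makes the defaults explicit:

    [libdefaults]
        default_realm   = HADOOP.EXAMPLE.COM
        # 10-hour tickets, renewable for up to a week; the KDC's configured
        # maximums still cap whatever is requested here.
        ticket_lifetime = 10h
        renew_lifetime  = 7d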
10. What about Active Directory?
Great for users; not so great for admins
Two different deployment architectures:
only Active Directory: both user and Hadoop service principals exist as AD objects
cross-realm principals (trusts): Hadoop service principals exist in a pure Kerberos realm, users exist in AD
Both have fun issues
Where’s the user data coming from?
11. Only Active Directory
Tons and tons of objects to create for Hadoop services
Unless using a vendor-supplied management mechanism, setup is a very manual process (see the sketch below)
Usually requires IT involvement any time changes are made
AD’s idea of Kerberos != everyone else’s
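For a sense of the manual labor involved, this is roughly what provisioning a single service principal looks like on a domain controller; the account, host, and realm names are hypothetical, and the AD user object must already exist:

    REM Map a service principal onto an existing AD account and write out its keytab.
    REM This has to be repeated for every Hadoop service principal on every host.
    C:\> ktpass -princ nn/namenode.example.com@AD.EXAMPLE.COM ^
                -mapuser hadoop-nn@AD.EXAMPLE.COM -mapOp set ^
                -crypto AES256-SHA1 -ptype KRB5_NT_PRINCIPAL ^
                +rndPass -out nn.service.keytab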
12. Cross-Realm Principal Architecture
Much more complex; harder to debug
Unless you configure the KDC with replication (or a backend database that replicates itself), it becomes a massive single point of failure
You have to administer the KDC
Getting AD to use the correct encryption types is somewhat challenging
Windows (i.e., purely SSPI-based) clients tend to not work consistently, if at all (a rough sketch of the trust setup follows)
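As a rough sketch of what the trust itself involves (realm names are placeholders): the cross-realm krbtgt principal must exist on both sides with a matching password and encryption types, and clients need to know the path between realms:

    # MIT side of a one-way trust that lets AD users reach Hadoop services;
    # the matching trust object must also be created on the AD side
    # (e.g. with netdom trust), using the same password and enctypes.
    kadmin.local -q 'addprinc -e "aes256-cts-hmac-sha1-96:normal" krbtgt/HADOOP.EXAMPLE.COM@AD.EXAMPLE.COM'

    # Client krb5.conf: a direct path from the AD (client) realm to the Hadoop realm.
    [capaths]
        AD.EXAMPLE.COM = {
            HADOOP.EXAMPLE.COM = .
        }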
13. Speaking of encryption types…
Three are the most used today:
aes256-cts-hmac-sha1-96
aes128-cts-hmac-sha1-96
arcfour-hmac-md5
Don’t use DES
or 3DES, preferably… but especially not DES
Crypto export/import restrictions cause issues with Java
The unlimited-strength JCE policy files must be present in the JRE to allow the use of aes256-cts
AD won’t do AES if the domain functional level (DFL) is lower than Server 2008
Almost all other libraries default to using the cipher suites in the order above, unless configured otherwise (see the krb5.conf sketch below)
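A krb5.conf sketch that pins the enctype order down explicitly rather than relying on library defaults; drop arcfour-hmac-md5 from the lists too, if nothing legacy still needs it:

    [libdefaults]
        default_tkt_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96 arcfour-hmac-md5
        default_tgs_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96 arcfour-hmac-md5
        permitted_enctypes   = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96 arcfour-hmac-md5
        # Keeps single DES (and other weak enctypes) off the table entirely.
        allow_weak_crypto    = false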
14. Considerations for Hadoop
Kerberos doesn’t encrypt your data or traffic
Communication between all DataNodes and NameNode(s) should be isolated and/or protected (via hadoop.rpc.protection; snippet below)
If users have access to the files themselves, ACLs are basically useless
If they have root/admin access to the servers… none of this matters anyway
Hadoop services determine what their hostname (and, thus, service principal name) is via reverse DNS (or via fs.default.name, if set)
Also, Kerberos itself is very, very dependent upon properly administered DNS records and local client configuration
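For reference, hadoop.rpc.protection is a SASL quality-of-protection setting in core-site.xml; "privacy" adds integrity protection and wire encryption on top of plain authentication:

    <!-- core-site.xml -->
    <property>
      <name>hadoop.rpc.protection</name>
      <!-- one of: authentication (default), integrity, privacy -->
      <value>privacy</value>
    </property>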
15. Considerations for SAS
SAS/ACCESS® Interface to Hadoop™
Uses Java, so subject to the aforementioned issues
SAS® Enterprise Guide®, web-based products (e.g. SAS® Visual Analytics), et al.
Need to configure sasauth for PAM authentication
Need to configure PAM to obtain Kerberos credentials on login (via SSSD, pam_krb5, QAS, etc.)
If AD: need to configure nsswitch to obtain user info from AD (via SSSD, nss_ldap, etc.)
Needed for both SAS and Hadoop (a minimal SSSD sketch follows)
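A minimal SSSD sketch covering both needs at once, assuming an AD-joined host; the domain name is a placeholder:

    # /etc/sssd/sssd.conf
    [sssd]
    services = nss, pam
    domains = ad.example.com

    [domain/ad.example.com]
    # id_provider feeds nsswitch user/group lookups from AD;
    # auth_provider makes PAM logins authenticate via Kerberos and obtain a TGT.
    id_provider = ad
    auth_provider = ad

    # /etc/nsswitch.conf
    passwd: files sss
    group:  files sss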
16. How can this go wrong?
Don’t try to enumerate them all; sadness will ensue
Vast majority of issues are eventually attributed to incorrect or missing configuration
Adding debug parameters to the JVM invocation will almost always lead you in the right direction (example invocation below):
sun.security.krb5.debug=true
sun.security.jgss.debug=true
HADOOP_JAAS_DEBUG=true
Wireshark is invaluable
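For example, with the stock Hadoop shell client (the final command is just a stand-in for whatever operation is failing):

    export HADOOP_JAAS_DEBUG=true
    export HADOOP_OPTS="-Dsun.security.krb5.debug=true -Dsun.security.jgss.debug=true"
    # The Kerberos/GSS negotiation is now traced to the terminal:
    hadoop fs -ls /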
17. Common (and/or Particularly Egregious) Pitfalls
Bad principal mapping to local users
If the user principal attempting to authenticate is from a realm other than the default realm, rules must be set up to indicate that principals from the other realm are to be trusted as being equivalent to local accounts of the same name
Usually only matters if using cross-realm principals (trusts)
Consists of a set of regex-like strings used to parse principals into their constituent parts
Set in both krb5.conf and Hadoop configs (examples below)
krb5.conf: auth_to_local (defined per-realm)
Hadoop: hadoop.security.auth_to_local
Java is *supposed* to look in krb5.conf, but it doesn’t work
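A sketch of equivalent mappings in both places, assuming users from AD.EXAMPLE.COM should map to local accounts of the same name; the realm names are placeholders. Given the krb5.conf caveat above, the Hadoop property is the one that actually matters for Java clients:

    # krb5.conf (defined per-realm):
    [realms]
        HADOOP.EXAMPLE.COM = {
            auth_to_local = RULE:[1:$1@$0](.*@AD\.EXAMPLE\.COM)s/@.*//
            auth_to_local = DEFAULT
        }

    <!-- core-site.xml -->
    <property>
      <name>hadoop.security.auth_to_local</name>
      <value>
        RULE:[1:$1@$0](.*@AD\.EXAMPLE\.COM)s/@.*//
        DEFAULT
      </value>
    </property>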
18. Common (and/or Particularly Egregious) Pitfalls
Unlimited-strength JCE policy files missing or bad
Are you sure you put them in the right JRE?
Are you sure you put them in all the JREs?
Did you download the correct version?
Stack trace (with krb5.debug/jgss.debug):
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Encryption type AES256 CTS mode with HMAC SHA1-96 is not supported/enabled)]
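A quick way to verify, run against the exact JRE your Hadoop client uses (jrunscript ships with the JDK); it should print true once the policy files are in place under $JAVA_HOME/jre/lib/security:

    # Prints "true" only if unlimited-strength crypto (and thus aes256-cts) is available.
    $JAVA_HOME/bin/jrunscript -e 'print(javax.crypto.Cipher.getMaxAllowedKeyLength("AES") >= 256);'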
19. Common (and/or Particularly Egregious) Pitfalls
“Clock skew too great”
Kerberos requires that all parties involved in authentication have their clocks synchronized within 5 minutes of each other (by default)
Use chronyd/ntpd against your preferred authoritative time source on the KDC, and have the other clients get their time from it (sketch below)
If AD is involved, the PDC emulator is also an NTP server
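A minimal sketch, assuming chrony and a placeholder upstream pool:

    # /etc/chrony.conf on the KDC: follow your preferred authoritative source.
    server 0.pool.ntp.org iburst

    # /etc/chrony.conf on everything else: follow the KDC.
    server kdc.hadoop.example.com iburst

    # Verify the local offset at any time:
    chronyc tracking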
20. Common (and/or Particularly Egregious) Pitfalls
“Mechanism level: EncryptedData is encrypted using keytype DES3 CBC mode with SHA1-KD but decryption key is of type NULL”
Long story short: you’re using DES; stop it!
Actually due to a bug in Java where the RFC wasn’t interpreted correctly
https://bugs.openjdk.java.net/browse/JDK-8025124
Fixed in Java 8 b113 (and current stable)