SAS1844 - Securing Hadoop Clusters while Still Retaining Your Sanity
Evan Kinney, Security Software Developer
SAS
$whoami
Security Software Developer
at SAS
Platform authn/authz,
general security
Kerberos; all the time
always.
Open source
developer/contributor/wrangler
Dinosaur enthusiast
Twitter: @evankinney
GitHub: 3van
Freenode: evanosaurus
Why are we here?
 Hadoop isn’t secure by default
 Fixing this is non-trivial
 Mistakes made are hard to diagnose
 No consistent story for authentication (yet)
 Kerberos is hard
Let’s talk about the elephant in the room.
 (this is the first time that pun has ever been used.)
 When created, Hadoop didn’t need security
 Components were almost exclusively single-tenant and isolated
 What problems do we have?
 HDFS and MapReduce
 Blindly trust that a user is who they say they are
 Allow for arbitrary Java code execution as the JobTracker service account
 Trivial to circumvent permissions restrictions
 Rogue services
 Services don’t authenticate each other
 DataNodes
 Know the block ID? Awesome, here are your dataz!
Okay, great—now how do we fix it?
 Kerberos! (surprise)
 Doesn’t require significantly changing the architecture
 Uses symmetric encryption
 Allows services to authenticate each other
 Allows services to authenticate users
 Allows users to authenticate services
Let’s talk about the three-headed dog in the room.
 Kerberos is a standard protocol [RFC 4120] that allows for
mutual authentication of entities via the use of a trusted arbiter
 Developed at MIT, originally for Project Athena
 V4 - 1988[ish]
 V5 - 1993 (RFC 1510); 2005 (RFC 4120)
 Used in a lot more places than most people think
 Active Directory == LDAP + Kerberos
 Once it’s working, is usually fairly transparent to users
Some Kerberos Terminology
 principal: an entity that can authenticate or be authenticated to
 realm: an administrative partition that contains one or more principals; always given in
uppercase
 ticket: ASN.1 structure containing information about a request as well as authenticators to
verify the info
 ticket granting ticket (TGT): the result of initial authentication with a KDC; used to assert
identity in further requests
 service ticket: issued to a principal for use in asserting identity to Kerberized services
 credentials cache (CC): contains one or more tickets for one principal
 keytab: holds any number of pre-salted/hashed long-term keys (passwords); generally used for
headless/automated services
 key distribution center (KDC): the trusted arbiter; stores keys for one or more realms and
answers authentication requests from principals
 ticket granting service (TGS): accepts a TGT plus an authenticator built with the TGT session key, issues service tickets
 authentication service (AS): checks that the client holds its long-term key (derived from the salted/hashed password), issues TGT
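To make the principal and realm terms concrete, here is a minimal sketch that splits a principal string into its parts. It assumes the common `primary/instance@REALM` layout; the `Principal` type and `parse_principal` helper are illustrative names, not part of any Kerberos library.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str             # user or service name, e.g. "hdfs"
    instance: Optional[str]  # usually a hostname for service principals
    realm: str               # administrative partition, uppercase by convention


def parse_principal(text: str) -> Principal:
    """Split 'primary[/instance]@REALM' into its constituent parts."""
    name, sep, realm = text.rpartition("@")
    if not sep or not name or not realm:
        raise ValueError(f"not a fully qualified principal: {text!r}")
    primary, _, instance = name.partition("/")
    return Principal(primary, instance or None, realm)


user = parse_principal("alice@EXAMPLE.COM")
service = parse_principal("hdfs/nn1.example.com@EXAMPLE.COM")
```

A user principal normally has no instance, while a Hadoop service principal carries the host it runs on.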
How does it work?
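The exchange can be sketched as a toy model. This is not real cryptography: `Sealed` only pretends to encrypt, and timestamps, lifetimes and pre-authentication are left out. All class and function names here are hypothetical; the point is only to show who can open what.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Sealed:
    """Stand-in for ciphertext: the payload is readable only with the right key."""
    key: str
    payload: dict

    def open(self, key: str) -> dict:
        if key != self.key:
            raise PermissionError("cannot decrypt: wrong key")
        return self.payload


class ToyKDC:
    def __init__(self, long_term_keys: dict):
        self.keys = long_term_keys            # principal -> long-term key
        self.tgs_key = secrets.token_hex(16)  # key known only to the KDC itself

    def as_exchange(self, client: str):
        """AS: issue a TGT plus a session key sealed under the client's key."""
        session_key = secrets.token_hex(16)
        for_client = Sealed(self.keys[client], {"session_key": session_key})
        tgt = Sealed(self.tgs_key, {"client": client, "session_key": session_key})
        return for_client, tgt

    def tgs_exchange(self, tgt: Sealed, authenticator: Sealed, service: str):
        """TGS: check the TGT and authenticator, then issue a service ticket."""
        tgt_body = tgt.open(self.tgs_key)
        auth_body = authenticator.open(tgt_body["session_key"])
        if auth_body["client"] != tgt_body["client"]:
            raise PermissionError("authenticator does not match TGT")
        service_session_key = secrets.token_hex(16)
        for_client = Sealed(tgt_body["session_key"], {"session_key": service_session_key})
        ticket = Sealed(
            self.keys[service],
            {"client": tgt_body["client"], "session_key": service_session_key},
        )
        return for_client, ticket


def service_accepts(service_key: str, ticket: Sealed, authenticator: Sealed) -> str:
    """Service side: open the ticket with its own key, then verify the authenticator."""
    body = ticket.open(service_key)
    auth_body = authenticator.open(body["session_key"])
    if auth_body["client"] != body["client"]:
        raise PermissionError("authenticator does not match ticket")
    return body["client"]


def login_and_call(kdc, client, client_key, service, service_key) -> str:
    """Run the whole flow: AS exchange, TGS exchange, then the service request."""
    for_client, tgt = kdc.as_exchange(client)
    tgt_session = for_client.open(client_key)["session_key"]  # needs the client's key
    for_client2, ticket = kdc.tgs_exchange(tgt, Sealed(tgt_session, {"client": client}), service)
    service_session = for_client2.open(tgt_session)["session_key"]
    return service_accepts(service_key, ticket, Sealed(service_session, {"client": client}))
```

Note that the client never opens the TGT or the service ticket; it only forwards them. A client with the wrong long-term key fails at the first `open` call, before it can build an authenticator.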
Hadoop and Kerberos
 Hadoop relies on user accounts to enforce ACLs, et al. (if not
configured to look things up in LDAP)
 Generally speaking, a Kerberos principal is not necessarily associated
with a POSIX user account (though, in practice, they usually are)
 Once authenticated via Kerberos, a user is issued a delegation token
(specific to Hadoop) for use in further requests
 Scheduling is… hard
 Default ticket lifetime is 10 hours, can be renewed for 7 days
 Most distributions assume you’re running an isolated realm with your
own KDC
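As a rough illustration of why scheduling is hard, the lifetimes above can be turned into arithmetic. This sketch assumes the 10 hour / 7 day defaults quoted on this slide and that each renewal happens right at expiry; `renewals_needed` is a made-up helper name.

```python
import math
from datetime import timedelta

TICKET_LIFETIME = timedelta(hours=10)  # default quoted above
RENEW_LIFETIME = timedelta(days=7)     # default quoted above


def renewals_needed(job_duration: timedelta) -> int:
    """Number of ticket renewals a job of this length needs, renewing at expiry."""
    if job_duration > RENEW_LIFETIME:
        raise ValueError("job outlives the renewable lifetime; a fresh login is required")
    tickets = math.ceil(job_duration / TICKET_LIFETIME)
    return max(0, tickets - 1)
```

A job that fits inside one ticket lifetime needs no renewal; anything past the renewable lifetime cannot be covered by renewal at all.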
What about Active Directory?
 Great for users; not so great for admins
 Two different deployment architectures:
 only Active Directory
 both user and Hadoop service principals exist as AD objects
 Cross-realm principals (trusts)
 Hadoop service principals exist in a pure Kerberos realm,
users exist in AD
 Both have fun issues
 Where’s the user data coming from?
Only Active Directory
 Tons and tons of objects to create for Hadoop services
 Unless using a vendor-supplied management
mechanism, setup is a very manual process
 Usually requires IT involvement any time changes are
made
 AD’s idea of Kerberos != everyone else’s
Cross-Realm Principal Architecture
 Much more complex; harder to debug
 Unless you configure the KDC with replication (or a backend
database that replicates itself), it becomes a massive SPoF
 You have to administer the KDC
 Getting AD to use the correct encryption types is somewhat
challenging
 Windows (i.e. purely SSPI-based) clients tend to not work
consistently (if at all)
Speaking of encryption types…
 Three are the most used today:
 aes256-cts-hmac-sha1-96
 aes128-cts-hmac-sha1-96
 arcfour-hmac-md5
 Don’t use DES
 or 3DES, preferably… but especially not DES
 Crypto export/import restrictions cause issues with Java
 The unlimited-strength JCE policy files must be present in the JRE to allow the use
of aes256-cts
 AD won’t do AES if the domain functional level (DFL) is lower than Server 2008
 Almost all other libraries default to using the cipher suites in the order above, unless
configured otherwise
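The preference order and the JCE restriction can be sketched as a simple selection function. Real negotiation involves the KDC and the keys stored for each principal, so treat this as a simplification; `negotiate`, `java_client_enctypes` and `weak_enctypes` are hypothetical helpers.

```python
# Preference order from the list above.
PREFERRED = [
    "aes256-cts-hmac-sha1-96",
    "aes128-cts-hmac-sha1-96",
    "arcfour-hmac-md5",
]

# Encryption types to avoid: DES always, 3DES preferably.
WEAK = {"des-cbc-crc", "des-cbc-md4", "des-cbc-md5", "des3-cbc-sha1"}


def java_client_enctypes(unlimited_jce: bool) -> list:
    """Without the unlimited-strength policy files, a JRE cannot use aes256-cts."""
    if unlimited_jce:
        return list(PREFERRED)
    return [e for e in PREFERRED if not e.startswith("aes256")]


def negotiate(client: list, server: list):
    """Return the first preferred enctype both sides support, or None."""
    for enctype in PREFERRED:
        if enctype in client and enctype in server:
            return enctype
    return None


def weak_enctypes(configured: list) -> list:
    """List configured enctypes that should be removed."""
    return [e for e in configured if e in WEAK]
```

If the service keytab only holds an aes256 key and the JRE lacks the policy files, `negotiate` returns `None`, which corresponds to an authentication failure in practice.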
Considerations for Hadoop
 Kerberos doesn’t encrypt your data or traffic
 Communication between all DataNodes and NameNode(s)
should be isolated and/or protected (via hadoop.rpc.protection)
 If users have access to the files themselves, ACLs are basically
useless
 If they have root/admin access to the servers…
 …none of this matters anyway
 Hadoop services determine what their hostname (and, thus, service
principal name) is via reverse DNS (or via fs.default.name, if set)
 Also, Kerberos itself is very, very dependent upon properly
administered DNS records and local client configuration
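Because so much hinges on DNS, a quick forward/reverse consistency check is worth running on every node. The sketch below uses the standard `socket` module; the resolver arguments are injectable only so the check can be exercised without a network, and `check_forward_reverse` is an illustrative name.

```python
import socket


def check_forward_reverse(hostname,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, resolve the IP back, and compare the names."""
    ip = forward(hostname)
    reverse_name = reverse(ip)[0]  # gethostbyaddr returns (name, aliases, addresses)
    consistent = reverse_name.lower().rstrip(".") == hostname.lower().rstrip(".")
    return {
        "hostname": hostname,
        "ip": ip,
        "reverse": reverse_name,
        "consistent": consistent,
    }
```

On a cluster node, calling `check_forward_reverse(socket.getfqdn())` and finding `consistent` set to `False` is a strong hint that the service will build a principal name that does not match its keytab.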
Considerations for SAS
 SAS/ACCESS® Interface to Hadoop™
 Uses Java, so subject to aforementioned issues
 SAS® Enterprise Guide®, Web-based products (e.g. SAS® Visual
Analytics), et al.
 Need to configure sasauth for PAM authentication
 Need to configure PAM to obtain Kerberos credentials on login (via
SSSD, pam_krb5, QAS, etc.)
 If AD: need to configure nsswitch to obtain user info from AD (via
SSSD, nss_ldap, etc.)
 Needed for both SAS and Hadoop
How can this go wrong?
 Don’t try to enumerate them all; sadness will ensue
 Vast majority of issues are eventually attributed to incorrect or
missing configuration
 Adding debug parameters to the JVM invocation will almost
always lead you in the right direction
 -Dsun.security.krb5.debug=true
 -Dsun.security.jgss.debug=true
 HADOOP_JAAS_DEBUG=true (an environment variable, not a JVM property)
 Wireshark is invaluable
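One way to switch these on for a single command is to build the environment in a wrapper. This is a sketch that assumes the launcher scripts pass `HADOOP_OPTS` through to the JVM, which can vary by distribution; `kerberos_debug_env` is a made-up helper.

```python
import os

DEBUG_FLAGS = "-Dsun.security.krb5.debug=true -Dsun.security.jgss.debug=true"


def kerberos_debug_env(base_env=None):
    """Return a copy of the environment with Kerberos/JAAS debugging enabled."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("HADOOP_OPTS", "")
    # Append rather than overwrite so existing JVM options survive.
    env["HADOOP_OPTS"] = f"{existing} {DEBUG_FLAGS}".strip()
    env["HADOOP_JAAS_DEBUG"] = "true"
    return env
```

The result can be handed to `subprocess.run([...], env=kerberos_debug_env())` around whichever Hadoop command is failing. Expect a lot of output.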
Common (and/or Particularly Egregious) Pitfalls
 Bad principal mapping to local users
 If the user principal attempting to authenticate is from a realm other than the
default realm, rules must be set up to indicate that principals from the other realm
are to be trusted as being equivalent to local accounts of the same name
 Usually only matters if using cross-realm principals (trusts)
 Consists of a set of regex-like strings used to parse principals into their
constituent parts
 Set in both krb5.conf and Hadoop configs
 krb5.conf: auth_to_local (defined per-realm)
 Hadoop: hadoop.security.auth_to_local
 Java is *supposed* to look in krb5.conf, but it doesn’t work
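To show roughly how such rules behave, here is a simplified evaluator for the `RULE:[n:format](regex)s/pattern/replacement/` form plus `DEFAULT`. It is a sketch, not Hadoop's implementation: it uses Python's regex dialect, handles only this one rule shape, and the realm names are placeholders.

```python
import re

RULE_RE = re.compile(
    r"RULE:\[(\d+):([^\]]*)\]"        # [n:format]
    r"(?:\(([^)]*)\))?"               # optional (regex) filter
    r"(?:s/([^/]*)/([^/]*)/(g?))?$"   # optional s/pattern/replacement/[g]
)


def split_principal(principal):
    """Return (components, realm) for 'a/b@REALM'."""
    name, _, realm = principal.partition("@")
    return name.split("/"), realm


def apply_rule(rule, principal, default_realm):
    """Return the local name this rule yields, or None if it does not apply."""
    components, realm = split_principal(principal)
    if rule == "DEFAULT":
        return components[0] if realm == default_realm else None
    parsed = RULE_RE.match(rule)
    if parsed is None:
        raise ValueError(f"unsupported rule: {rule!r}")
    count, fmt, match_re, sed_pat, sed_repl, sed_global = parsed.groups()
    if len(components) != int(count):
        return None
    # $0 is the realm, $1..$n are the name components.
    candidate = re.sub(
        r"\$(\d)",
        lambda m: realm if m.group(1) == "0" else components[int(m.group(1)) - 1],
        fmt,
    )
    if match_re is not None and re.fullmatch(match_re, candidate) is None:
        return None
    if sed_pat is not None:
        candidate = re.sub(sed_pat, sed_repl, candidate, count=0 if sed_global else 1)
    return candidate


def map_principal(principal, rules, default_realm):
    """Apply rules in order; the first one that matches wins."""
    for rule in rules:
        local = apply_rule(rule, principal, default_realm)
        if local is not None:
            return local
    return None


RULES = [
    r"RULE:[1:$1@$0](.*@AD\.EXAMPLE\.COM)s/@.*//",  # trust users from the AD realm
    "DEFAULT",
]
```

With these rules and `HADOOP.EXAMPLE.COM` as the default realm, `alice@AD.EXAMPLE.COM` maps to `alice`, while a principal from an unlisted realm maps to nothing and is rejected.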
Common (and/or Particularly Egregious) Pitfalls
 Unlimited-strength JCE policy files missing or bad
 Are you sure you put them in the right JRE?
 Are you sure you put them in all the JREs?
 Did you download the correct version?
 Stack traces (with krb5.debug/jgss.debug):
 javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure
unspecified at GSS-API level (Mechanism level: Encryption type AES256 CTS mode with HMAC
SHA1-96 is not supported/enabled)]
Common (and/or Particularly Egregious) Pitfalls
 “Clock skew too great”
 Kerberos requires that all parties involved in
authentication have their clocks synchronized within 5
minutes of each other (by default)
 Use chronyd/ntpd against your preferred authoritative
time source on the KDC, and have other clients get
their time from it
 If AD is involved, the PDC is also an NTP server
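The skew rule itself is trivial to express, which makes it a handy sanity check in monitoring. This sketch assumes the 5 minute default mentioned above; `within_skew` is an illustrative name, and how the KDC's time is obtained is left to the caller.

```python
from datetime import datetime, timedelta, timezone

MAX_SKEW = timedelta(minutes=5)  # default tolerance quoted above


def within_skew(client_time: datetime, kdc_time: datetime,
                tolerance: timedelta = MAX_SKEW) -> bool:
    """True if the two clocks are close enough for Kerberos to accept requests."""
    return abs(client_time - kdc_time) <= tolerance


kdc_now = datetime(2015, 4, 27, 12, 0, 0, tzinfo=timezone.utc)
drifted = kdc_now + timedelta(minutes=7)
```

Here `within_skew(drifted, kdc_now)` is `False`, which is the situation that produces "Clock skew too great".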
Common (and/or Particularly Egregious) Pitfalls
 “Mechanism level: EncryptedData is encrypted using
keytype DES3 CBC mode with SHA1-KD but decryption
key is of type NULL”
 Long story short: you’re using DES; stop it!
 Actually due to a bug in Java where the RFC wasn’t
interpreted correctly
 https://bugs.openjdk.java.net/browse/JDK-8025124
 Fixed in Java 8 b113 (and current stable)
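The pitfalls from the last few slides can be collected into a small lookup that scans a log or stack trace for the telltale messages. The message fragments come from the slides; the `diagnose` helper and the hint wording are illustrative, and the list is far from complete.

```python
# Message fragment -> likely cause, taken from the pitfalls above.
KNOWN_PITFALLS = [
    ("AES256 CTS mode with HMAC SHA1-96 is not supported/enabled",
     "unlimited-strength JCE policy files missing from the JRE in use"),
    ("Clock skew too great",
     "clocks differ by more than the allowed tolerance; check chronyd/ntpd"),
    ("decryption key is of type NULL",
     "DES/3DES key in play and Java bug JDK-8025124; switch enctypes or update Java"),
]


def diagnose(log_text: str) -> list:
    """Return hints for every known pitfall whose message appears in the text."""
    lowered = log_text.lower()
    return [hint for fragment, hint in KNOWN_PITFALLS if fragment.lower() in lowered]
```

An empty result just means the failure is not one of these three; the debug output and a packet capture are the next stop.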
Questions?