Securing Hadoop in an Enterprise Context

  1. Hadoop Summit 2016: Securing Hadoop in an Enterprise Context. Hellmar Becker, DevOps Engineer. Dublin, April 14, 2016
  2. Who am I?
  3. Securing Hadoop in an Enterprise Context (agenda)
     1. The Challenge
     2. Hadoop Usage Patterns
     3. Aspects of Security
     4. Building Blocks for a Security Architecture
     5. Questions
  4. The Challenge
  5. Data Lake and Advanced Analytics within ING
     • External and internal reporting for own or regulatory purposes
     • Integrate all data sources within the bank into one processing platform
       • Batch data streams
       • Live transactions
       • Model building for customer interaction
     • Better understand customer needs in an increasingly digital world
     • Data can help us offer tailored products and services
     • Empower data scientists and analysts to get the best results with advanced analytics tools and predictive models
     • Open source software where possible, with Hadoop as a core component
  6. Risks and possible consequences
     • Risks: data loss, privacy breach, system intrusion
     • Possible consequences: legal consequences, loss of reputation, financial loss
  7. Hadoop "out of the box" runs without security by default
     • Hadoop user model:
       • A user name is just an alphanumeric string
       • So is a group name
       • Neither has to match an entity in the OS
       • Via the REST API, anybody could read or modify data
     • So the security design has to be actively built, and this is what we did.
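The REST API point on this slide can be made concrete with WebHDFS. On a cluster that uses simple authentication (no Kerberos), the caller's identity is whatever string is passed in the `user.name` query parameter. The following is a minimal sketch that only builds the request URL and does not send it; the host name, port, and file path are illustrative assumptions, not values from the talk.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, user):
    """Build a WebHDFS URL for a cluster using simple (non-Kerberos) auth.

    With simple auth the server trusts the `user.name` parameter as-is,
    so any caller can claim any identity.
    """
    query = urlencode({"op": op, "user.name": user})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Hypothetical host and path; 50070 was the usual NameNode HTTP port in Hadoop 2.x.
url = webhdfs_url("namenode.example.com", 50070, "/sales-data/q1.csv", "OPEN", "hdfs")
print(url)
```

Nothing in the request proves that the caller really is `hdfs`, which is why the deck insists on Kerberos and perimeter controls before exposing such endpoints.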
  8. Hadoop Usage Patterns
  9. Hadoop Usage Patterns
     1. File Storage
     2. Deep Data
     3. Analytical Hadoop
     4. (Real Time)
  10. Aspects of Security
  11. Aspects of Security
     • Technical: Rings of Defense
       • Perimeter Level Security
       • Application Level Authentication and Authorization
       • OS Security
       • Data Protection
       • See also: http://www.slideshare.net/vinnies12/hadoop-security-today-tomorrow-apache-knox
     • Conceptual: Five Pillars of Security
       • Administration
       • Authentication
       • Authorization
       • Auditing
       • Data Protection
       • See also: http://hortonworks.com/hdp/security/
  12. Building Blocks for a Security Architecture
  13. Building Blocks: Perimeter Security
     • Firewall around the entire cluster
     • “Stepping stone” servers
       • Citrix/Terminal server for interactive access
       • Ingestion server with defined transfer paths
     • User model
       • Personal users locally defined or with corporate directory
       • Service/Technical users defined locally
     • Software updates and software development
       • Through manually maintained mirror
     • Used in exploratory environments (pattern 3)
  14. Building Blocks: Internal Security
     • Administration
       • General goal: Zero Touch deployment
       • Automatic synchronization with enterprise directory
       • UI access is only used for incidents
     • Authentication
       • Kerberos
       • Future: Share a KDC HA cluster among Hadoop instances
       • Connecting to enterprise directory using trusts and synchronization (next chapter)
       • Keep the Kerberos principals (Hadoop users) completely separate from OS users
  15. Authorization
     • Unified rights management with Ranger
       • Service principals are made known to Ranger directly; PA rights are assigned only through groups
       • Groups and users are synced with Active Directory
       • Ranger 0.4 cannot take away privileges that were granted on a lower level
         • HDFS permissions and ACLs override Ranger
         • Make sure these access paths are locked down
     • HDFS ACLs (No!)
       • No easy-to-use GUI
       • Difficult to maintain an overview
       • Only for HDFS, does not handle other components
     • Example:
         > hdfs dfs -setfacl -m group:execs:r-- /sales-data
         > hdfs dfs -getfacl /sales-data
         # file: /sales-data
         # owner: bruce
         # group: sales
         user::rw-
         group::r--
         group:execs:r--
         mask::r--
         other::---
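One reason the Authorization slide rejects plain HDFS ACLs is that an overview is hard to maintain. As a rough illustration of what auditing them involves, the sketch below parses `hdfs dfs -getfacl` output like the listing on the slide and picks out the named entries, that is, grants beyond the owner/group/other triple. The function names are hypothetical, and the parser assumes the plain text layout shown on the slide.

```python
def parse_getfacl(text):
    """Parse `hdfs dfs -getfacl` text into (header dict, list of ACL entries)."""
    header, entries = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Header lines look like "# owner: bruce"
            key, _, value = line.lstrip("# ").partition(":")
            header[key.strip()] = value.strip()
        else:
            # Drop a trailing "#effective:..." comment if present
            line = line.split("#", 1)[0].strip()
            # rsplit keeps a "default:" prefix attached to the entry type
            kind, name, perms = line.rsplit(":", 2)
            entries.append((kind, name, perms))
    return header, entries


def named_entries(entries):
    """Return entries that name a specific user or group."""
    return [e for e in entries if e[1]]


SAMPLE = """\
# file: /sales-data
# owner: bruce
# group: sales
user::rw-
group::r--
group:execs:r--
mask::r--
other::---
"""

header, entries = parse_getfacl(SAMPLE)
print(header["owner"], named_entries(entries))
```

Running something like this over every path of interest is what "maintaining an overview" would mean in practice, which helps explain the preference for central policies in Ranger.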
  16. What We Have Done: Corporate Integration
     • Personal users in corporate Active Directory, NPAs in cluster KDC
     • One KDC pair per cluster
     • One-way realm trust
     • Custom script to synchronize Ranger
     • Challenges
       • Learning to work in interdisciplinary teams
       • Organizational boundaries
         • UNIX – Windows
         • Infra – Platform DevOps
       • Example: Ambari service connects to UNIX LDAP rather than AD
     • OS security and Hadoop security are not integrated
       • YARN container users
       • Hadoop ACLs, group mapping
       • Multitenancy? (Not solved in this picture)
  17. Working Around Ranger’s Limitations
     • Ranger's uxugsync process queries Active Directory through the LDAP protocol
       • Ranger 0.4: Reads all users, then determines their group affiliation
       • More than 50,000 employees in ING Group
       • Need to limit the load on the LDAP server!
       • Ranger 0.5: Group-driven query, still not optimal because it uses attribute filters
     • The most efficient LDAP query is either by a single DN (Distinguished Name) or by container (query base DN)
       • But we cannot use containers because of enterprise policy
     • Solution: custom Python script that queries LDAP hierarchically
       • One “supergroup” is picked by DN
       • The members of the “supergroup” are all LDAP groups that have Hadoop-related privileges
       • Query all these groups, again by DN
       • Examine the members of each group (personal users)
       • Make the user-group relationships known to Ranger via REST call
     • Remaining issues
       • The Ranger User-Group API is neither documented nor supported
       • Database schema: creates duplicate records, inconsistent deletion behavior
       • OS integration should be better
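The hierarchical query described on this slide can be sketched as follows. This is not the ING script, which the deck does not show. It is a minimal illustration that assumes a caller-supplied `fetch_members(dn)` function returning the member DNs of one group, for example via a base-scope LDAP search on that DN. Here the directory is faked with an in-memory dictionary, and all DNs are invented. The final REST call to Ranger is left out because the slide notes that this API is undocumented.

```python
def collect_user_groups(supergroup_dn, fetch_members):
    """Walk supergroup -> groups -> users and return {user_dn: set(group_dns)}.

    Every lookup is by a single DN, so no attribute filter and no
    container-wide search is needed.
    """
    user_groups = {}
    for group_dn in fetch_members(supergroup_dn):   # one lookup for the supergroup
        for user_dn in fetch_members(group_dn):     # one lookup per Hadoop group
            user_groups.setdefault(user_dn, set()).add(group_dn)
    return user_groups


# Hypothetical directory contents standing in for Active Directory.
SUPERGROUP = "cn=hadoop-supergroup,ou=groups,dc=example,dc=com"
ANALYSTS = "cn=hadoop-analysts,ou=groups,dc=example,dc=com"
ADMINS = "cn=hadoop-admins,ou=groups,dc=example,dc=com"
ALICE = "uid=alice,ou=people,dc=example,dc=com"
BOB = "uid=bob,ou=people,dc=example,dc=com"

DIRECTORY = {
    SUPERGROUP: [ANALYSTS, ADMINS],
    ANALYSTS: [ALICE, BOB],
    ADMINS: [ALICE],
}

lookups = []  # records each DN queried, to show how few lookups are needed


def fetch_members(dn):
    """Fake LDAP lookup: return the member DNs stored for one group DN."""
    lookups.append(dn)
    return DIRECTORY.get(dn, [])


mapping = collect_user_groups(SUPERGROUP, fetch_members)
print(len(lookups), sorted(mapping))
```

The number of lookups is one plus the number of Hadoop-related groups, independent of how many employees the directory holds, which is the property the slide is after.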
  18. A Better Approach: Corporate Directory Integration
     • IPA and sssd provide user/group mapping on Hadoop and OS level
     • Role-based access for personal users, managed through a central tool
     • One user database for Hadoop services, Ambari, Ranger
     • YARN, HDFS user models fall nicely into place
     • Requires ING patches (HDP 2.4, Ranger 0.6)
       • RANGER-827 use getent instead of files
       • RANGER-842 use pam for Ranger auth
       • HADOOP-12751, HIVE-4413 support ‘@’ in user name
       • AMBARI-6432 support IPA KDC
     • Timelines! We need this prioritized by our vendor
  19. Questions
  20. Attributions
     • Hellmar in Nîmes / With Python in Mindanao, by the author
     • Domtoren in het oranje licht by helena_is_here is licensed under CC BY 2.0
     • Data Pipeline, ING OIB Image Bank
     • Storm surge by David Baird is licensed under CC BY-SA 2.0; cropped by me
     • Scared Girl by Victor Bezrukov - Port-42 is licensed under CC BY 2.0
     • System Lock by Yuri Samoilov is licensed under CC BY 2.0; cropped by me
     • Safe by Rob Pongsajapan is licensed under CC BY 2.0; cropped by me
     • Hercules and Cerberus by The Los Angeles County Museum of Art is Public Domain
