1. ©2015 MFMER | slide-1
Securing Enterprise Healthcare Big Data
by the Combination of Knox/F5, Ranger,
TFA and Kerberos Coupled With
Enterprise Active Directory and LDAP
Dequan Chen, Ph.D.
Mayo Clinic Big Data Technology Services Team
chen.dequan@mayo.edu; 507-208-1599
San Jose McEnery Convention Center
June 13, 2017
2. ©2015 MFMER | slide-2
Outline
• Data Security – Critical to the Success of
Healthcare Business at Mayo Clinic
Enterprise Healthcare Big Data at Mayo Clinic
Personally Identifiable Information (PII) Data
Protected Health Information (PHI) Data
• Mayo Clinic Big Data Clusters & Evolution
• Securing Enterprise Healthcare Big Data on
Mayo Clinic Hadoop Clusters
• On-going & Future Direction
• Conclusion
3. ©2015 MFMER | slide-3
Data Security – Critical to the Success of
Healthcare Business at Mayo Clinic
4. ©2015 MFMER | slide-4
Mayo Clinic Healthcare - Integrated with
Research and Education
• World’s largest integrated not-for-profit
healthcare system – > 70 hospitals and clinics
Enterprise core value: The Needs of the Patient Come First
• Provides care for > 1m (1,317,900 in 2014)
patients from all 50 states & > 150 countries
annually
• Generates large amounts of EHR (EMR)
data daily
Structured
Semi-Structured
Unstructured
5. ©2015 MFMER | slide-5
Mayo Clinic, Rochester, Minn., recognized as #1 on the
"Best Hospitals" list in the USA for 2014-2015 and 2016-2017
by U.S. News & World Report
6. ©2015 MFMER | slide-6
Enterprise Healthcare Big Data at Mayo Clinic
• Enterprise Healthcare Big Data on Mayo Clinic
Hadoop Clusters
HL7 messages, their parsed data, or their JSON
derivatives – a mix of semi-structured and unstructured EHR data
Enterprise-level clinical usage (diagnosis, treatment,
prevention, or clinical reporting)
Enterprise-level non-clinical usage (research, business
intelligence, or health information exchange)
7. ©2015 MFMER | slide-7
Enterprise Healthcare Big Data Security Needs
• With Personally Identifiable Information (PII) Data
Any data that can be used to contact, locate or identify a
specific individual, either by itself or combined with other
sources that are easily accessed
May include information that is linked to an individual
through financial, medical, educational or employment
records
Some of the data elements that might be used to identify
a certain person could consist of fingerprints, biometric
data, a name, telephone number, email address or social
security number
Federal laws requiring secure handling of PII data: HIPAA,
Privacy Act, GLBA, FERPA, COPPA, and FCRA
8. ©2015 MFMER | slide-8
Enterprise Healthcare Big Data Security Needs
• With Protected Health Information (PHI) Data
Any health information that is individually identifiable and
created or received by a covered entity: a health care
provider, a health plan, or a health care clearinghouse
May relate to an individual’s past, present, or future
physical or mental health or condition
Either maintained or transmitted in any form,
including speech, paper, or electronics
Excludes education records covered by the Family
Educational Rights and Privacy Act (FERPA) and employment
records maintained by a covered entity
Federal law requiring secure handling of PHI data: HIPAA
9. ©2015 MFMER | slide-9
Mayo Clinic Big Data Clusters & Evolution
10. ©2015 MFMER | slide-10
Mayo Clinic Big Data Clusters
• Teradata Appliance with SUSE Linux
Enterprise Server
• Each Hadoop Cluster Coupled with One
ElasticSearch (ES) Cluster on Selected
Edge Nodes
• Separate HDF (NiFi) Clusters (Not to be presented)
• (Hadoop + ES) Clusters
Normal: Sandbox, Dev, Test(Int)* and Prod
Disaster Recovery (DR): Dev and Prod
11. ©2015 MFMER | slide-11
Mayo Clinic Big Data Clusters
• Data Storage on (Hadoop + ES) Clusters
Permanent
HDFS folders/files
HBase tables
ES indexes
Temporary/Permanent
Hive tables
12. ©2015 MFMER | slide-12
Mayo Clinic (Hadoop + ES) Clusters
Evolution
• Hadoop/ES Cluster HDP + ES Evolution
TDH/HDP 1.3.2 + ES (v1.0.0) (Un-Kerberized)
TDH/HDP 2.1.2 + ES (v1.3.2..v1.5.2) (Un-Kerberized)
HDP 2.1.11 + ES (v1.5.2) (Secured: Local KDC + ES Shield via AD/LDAP)
HDP 2.3.4 + ES (v1.5.2..v1.7.2..v2.1.2..v2.3.2..v2.4.1..v2.4.4) (Secured: AD/LDAP etc)
HDP 2.5.3 + ES (v2.4.4) (Secured: AD/LDAP etc)
13. ©2015 MFMER | slide-13
Securing Enterprise Healthcare Big Data
on Mayo Clinic Hadoop Clusters
14. ©2015 MFMER | slide-14
Security Adopted on Mayo Clinic Hadoop
Clusters
15. ©2015 MFMER | slide-15
Security Adopted on Mayo Clinic Hadoop
Clusters
• Kerberos Security
o Coupled with enterprise Active Directory (AD) using the AD KDC
• Coupled with Lightweight Directory Access
Protocol (LDAP) over SSL
o Critical HDP services + ElasticSearch service
• Two Factor Authentication (TFA) Login and Sudo
Capability Post OS-Hardening
o Only for limited number of authorized users / applications on a
local entry node(s)
o Root login disabled
• Ranger Plugins and Policies
• HDFS/Hive/HBase Data Ops - Knox Gateway/F5
o The majority of users or applications
16. ©2015 MFMER | slide-16
Kerberos with Active Directory
• Kerberized Using Enterprise (Active Directory)
AD KDC
o Provides a host of extensions and conveniences, such as
password expiration and account lockout
o Authentication and authorization
o AD user (princ) name + Password
17. ©2015 MFMER | slide-17
Kerberos with Active Directory
• Kerberized Using Enterprise (Active Directory)
AD KDC (cont'd)
o AD user (princ) name + keytab
o Auth_to_local rules needed for HDFS, Oozie, Storm, Kafka,
Ranger KMS, and Atlas
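The auth_to_local rules mentioned above map a Kerberos principal (an AD user or a service principal) to a short local user name. The sketch below is a toy Python re-implementation of how a rule of the form RULE:[n:format](regex)s/pattern/replacement/ is commonly described to behave. It is for illustration only and is not Hadoop's code; the realm EXAMPLE.COM, the host name, and both rules are hypothetical and not taken from the Mayo Clinic clusters.

```python
import re


def apply_rule(principal, num_components, fmt, match_regex, sed_pattern, sed_replacement):
    """Apply one simplified auth_to_local-style rule; return a short name or None."""
    name, _, realm = principal.partition("@")
    components = name.split("/")
    if len(components) != num_components:
        return None  # the rule only applies to principals with this many components
    # Build the candidate string: $0 is the realm, $1..$n are the name components.
    candidate = fmt.replace("$0", realm)
    for i, comp in enumerate(components, start=1):
        candidate = candidate.replace(f"${i}", comp)
    if not re.fullmatch(match_regex, candidate):
        return None
    # sed-style substitution, first occurrence only
    return re.sub(sed_pattern, sed_replacement, candidate, count=1)


# Hypothetical rules: (components, format, match regex, sed pattern, replacement)
RULES = [
    (2, "$1@$0", r"nn@EXAMPLE\.COM", r".*", "hdfs"),   # NameNode service -> hdfs
    (1, "$1@$0", r".*@EXAMPLE\.COM", r"@.*", ""),      # plain user -> strip realm
]


def to_local(principal):
    """Return the short name from the first matching rule, or None if none match."""
    for rule in RULES:
        result = apply_rule(principal, *rule)
        if result is not None:
            return result
    return None
```

With these assumed rules, to_local("alice@EXAMPLE.COM") yields the short name alice, while a principal from an unlisted realm is left unmapped.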
18. ©2015 MFMER | slide-18
LDAP + SSL == LDAPS
• User Authentication/Authorization Also Uses
the LDAP Protocol for Some Hadoop Component
Services
o Ambari, Ranger / Ranger KMS, Knox, Grafana, Atlas, Hue, ES
o LDAP over SSL (LDAPS) certificate – Mayo Clinic Comodo certs
19. ©2015 MFMER | slide-19
LDAP + SSL == LDAPS
o LDAP over SSL (LDAPS) certificate – Mayo Clinic Comodo certs
(cont’d)
20. ©2015 MFMER | slide-20
TFA & Sudo Capability Post OS-Hardening
• TFA Used for Local Users on Cluster Nodes
o Root login disabled
o Specific local nodes user name + password
o Passcode generated on-the-fly from user’s mobile device – RSA
o Sudo capability only for a limited number of users
o Post TFA login, Kerberos authentication against AD is required
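The on-the-fly passcode above comes from RSA tokens, whose algorithm is proprietary and is not reproduced here. As a stand-in, the sketch below uses the open TOTP scheme (RFC 6238, built on HOTP from RFC 4226) to illustrate the general idea of a time-based second factor. The shared secret is the RFC test key, and all function names are illustrative.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP (RFC 4226): HMAC-SHA1 over the counter, then dynamic truncation."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """TOTP (RFC 6238): HOTP with the counter derived from the current time."""
    return hotp(secret, unix_time // step, digits)


def verify(secret: bytes, submitted: str, unix_time: int) -> bool:
    """Compare the submitted passcode in constant time (no clock-drift window)."""
    return hmac.compare_digest(totp(secret, unix_time), submitted)
```

A real deployment would also tolerate small clock drift and rate-limit attempts; neither is modeled in this sketch.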
21. ©2015 MFMER | slide-21
Ranger Plugins and Policies
• Ranger Policies Control Which Users or Groups
Are Authorized to Operate (CRUD, etc.) on the
Specified Data or Services
o Data in HDFS files/folders, Hive databases/tables, HBase
namespaces/tables, (Solr collections/documents), Atlas metadata
o Services of
YARN,
Storm,
Kafka,
Knox
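To make the policy idea above concrete, here is a toy Python model of a resource-based policy check: a policy names a path, the users and groups it applies to, and the access types it grants. It is a simplified illustration, not Ranger's policy engine or API; the path /data/hl7, the user etl_svc, and the group clinical_analysts are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    path: str
    recursive: bool
    users: set = field(default_factory=set)
    groups: set = field(default_factory=set)
    accesses: set = field(default_factory=set)

    def covers(self, resource: str) -> bool:
        """True if the resource is the policy path or, when recursive, below it."""
        if resource == self.path:
            return True
        return self.recursive and resource.startswith(self.path.rstrip("/") + "/")

    def allows(self, user, user_groups, resource, access) -> bool:
        """True if this policy grants the access type to the user or one of their groups."""
        return (
            self.covers(resource)
            and access in self.accesses
            and (user in self.users or bool(self.groups & set(user_groups)))
        )


def is_allowed(policies, user, user_groups, resource, access) -> bool:
    """Grant access if any policy allows it (deny rules are not modeled)."""
    return any(p.allows(user, user_groups, resource, access) for p in policies)


# Hypothetical policy: analysts and one service account may read HL7 data.
POLICIES = [
    Policy("/data/hl7", True, users={"etl_svc"},
           groups={"clinical_analysts"}, accesses={"read", "execute"}),
]
```

For example, a member of clinical_analysts can read a file under /data/hl7 but cannot write to it, and a user in neither list is refused.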
22. ©2015 MFMER | slide-22
Ranger Plugins and Policies
o Example list of HDFS policies:
23. ©2015 MFMER | slide-23
Ranger Plugins and Policies
o Example of a HDFS policy:
24. ©2015 MFMER | slide-24
Ranger Plugins and Policies
• Ranger Also Performs Data Or Service Auditing
o Example of hdfs user accessing hive service:
25. ©2015 MFMER | slide-25
Knox Gateway - HDFS/Hive/HBase Data Ops
• Required for the Majority of Mayo Clinic Big Data
Clients (Users/Applications)
• Non-Secure Hive Shell Has Been Disabled
o Hive CLI ops are forced to use beeline
• No Keytabs Issued for HDFS/Hive/HBase Data
Ops by a Client Application Outside Mayo
Clinic Hadoop Clusters
o Regular HDFS remote Java client using keytabs – Not allowed
o Hive JDBC remote using keytabs – Not allowed
o HBase remote Java client using keytabs – Not allowed
26. ©2015 MFMER | slide-26
F5/Knox - HDFS/Hive/HBase Data Ops
• WebHDFS, WebHCat/Knox-Hive JDBC and
WebHBase for Data Ops of HDFS, Hive and
HBase via Knox Gateway Respectively
o HA (high availability) is achieved using 2 or more WebHDFS,
WebHCat and WebHBase (Stargate HBase) services on the
master nodes
o Knox-Hive JDBC requires the user’s AD user name & password
• F5 Balancer Over Two or More Knox Gateway
Services for Each Hadoop Cluster Are Used to
Achieve Knox Gateway Services HA (+ More
Protection)
27. ©2015 MFMER | slide-27
F5/Knox - HDFS/Hive/HBase Data Ops
• HA Example – WebHBase:
28. ©2015 MFMER | slide-28
Data Ops via F5 Balancer / Knox Gateway
• Example – WebHDFS Data via Web Browser or
Curl Cmd
o WebUI & results
https://f5balanceryyyy:port1/gateway/YYYYYYYYY/webhdfs/v1/user/zzzzz/test/Solr/solr_curl_query_result1.txt?op=OPEN
90000.0,125000.0,Texas,120000.0,45500.0,250000.0,110000.0,140000.0,7,3,113642.85714285714,….
https://knoxgatewayzzz1:ddd7/gateway/YYYYYYYYY/webhdfs/v1/user/zzzzz/test/Solr/solr_curl_query_result1.txt?op=OPEN
or
https://knoxgatewayzzz2:ddd7/gateway/YYYYYYYYY/webhdfs/v1/user/zzzzz/test/Solr/solr_curl_query_result1.txt?op=OPEN
90000.0,125000.0,Texas,120000.0,45500.0,250000.0,110000.0,140000.0,7,3,113642.85714285714,….
29. ©2015 MFMER | slide-29
Data Ops via F5 Balancer / Knox Gateway
o Curl Cmd & results
curl -i -k -u xxxxxxxx -X GET -L "https://f5balanceryyyy:port1/gateway/YYYYYYYYY/webhdfs/v1/user/zzzzz/test/Solr/solr_curl_query_result1.txt?op=OPEN"
90000.0,125000.0,Texas,120000.0,45500.0,250000.0,110000.0,140000.0,7,3,113642.85714285714,….
curl -i -k -u xxxxxxxx -X GET -L "https://knoxgatewayzzz1:ddd7/gateway/YYYYYYYYY/webhdfs/v1/user/zzzzz/test/Solr/solr_curl_query_result1.txt?op=OPEN"
or
curl -i -k -u xxxxxxxx -X GET -L "https://knoxgatewayzzz2:ddd7/gateway/YYYYYYYYY/webhdfs/v1/user/zzzzz/test/Solr/solr_curl_query_result1.txt?op=OPEN"
90000.0,125000.0,Texas,120000.0,45500.0,250000.0,110000.0,140000.0,7,3,113642.85714285714,….
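The URLs in the examples above follow the pattern https://host:port/gateway/topology/webhdfs/v1/path?op=OPEN. The Python sketch below builds such a URL and an HTTP Basic-auth request object without sending it. The host knox.example.org, port 8443, topology name default, and the credentials are placeholders, not values from the Mayo Clinic clusters.

```python
import base64
import urllib.parse
import urllib.request


def knox_webhdfs_url(host, port, topology, hdfs_path, op="OPEN"):
    """Build a WebHDFS URL routed through a Knox gateway (or a balancer in front of it)."""
    quoted_path = urllib.parse.quote(hdfs_path)   # keeps "/" and escapes unsafe characters
    query = urllib.parse.urlencode({"op": op})
    return f"https://{host}:{port}/gateway/{topology}/webhdfs/v1{quoted_path}?{query}"


def build_request(url, user, password):
    """Prepare a GET request with HTTP Basic credentials; nothing is sent here."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}"}, method="GET"
    )


url = knox_webhdfs_url("knox.example.org", 8443, "default", "/user/alice/test/result.txt")
request = build_request(url, "alice", "secret")
```

The prepared request could then be passed to urllib.request.urlopen. Unlike curl -k, that call verifies the server certificate by default, which is usually preferable outside of testing.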
30. ©2015 MFMER | slide-30
How Do WebHDFS Data Ops via Knox Gateway Work?
o Complex authentication and authorization process
Ranger HDFS Plugin allows access to the
HDFS data
Ranger Knox Plugin allows access to the
HDFS interface through Knox
The access path is likely to be:
A User connects/authenticates to Knox via HTTPS (SSL
protects credentials)
Knox checks the user’s credentials via LDAPS (SSL protects
credentials)
Ranger Knox plugin allows access to WebHDFS (Ranger
Knox service level authorization)
31. ©2015 MFMER | slide-31
How Do WebHDFS Data Ops via Knox Gateway Work?
Knox authenticates to AD (Kerberos protects Knox credentials)
AD grants Knox a ticket granting ticket (TGT)
Knox requests WebHDFS service ticket from AD
AD grants Knox a service ticket (ST) for WebHDFS
Knox passes the user as a proxyuser to WebHDFS
The user tries to access the HDFS file XYZ
Ranger HDFS Plugin checks if policy exists for the user and
the HDFS file XYZ (Ranger HDFS authorization); HDFS
checks native authorization for the user and the HDFS file
XYZ (HDFS authorization); access is then authorized or denied
Only when authorized, data in the HDFS file XYZ is retrieved
and returned to the client application (Web Browser or CLI)
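The access path above can be sketched as a chain of checks in which the first failure ends the request. The Python below is a toy model with in-memory dictionaries standing in for AD/LDAP, Ranger, and HDFS. The Kerberos ticket exchange between Knox and AD is not modeled, and the slides do not state how the Ranger HDFS and native HDFS checks are combined; this sketch assumes either one may grant access. All names and data are hypothetical.

```python
def webhdfs_read(user, password, path, *, ldap_users, knox_webhdfs_users,
                 ranger_hdfs_policies, hdfs_native_permissions):
    """Return (granted, reason) after walking the gateway checks in order."""
    # Step 1: Knox validates the user's credentials (LDAPS in the slides).
    if user not in ldap_users or ldap_users[user] != password:
        return (False, "rejected: LDAPS authentication failed")
    # Step 2: Ranger Knox plugin, service-level authorization for WebHDFS.
    if user not in knox_webhdfs_users:
        return (False, "rejected: Ranger Knox policy denies WebHDFS")
    # Step 3: Knox acts for the user as a proxy user (Kerberos exchange not modeled).
    # Step 4: Ranger HDFS policy or native HDFS permission (assumed: either suffices).
    ranger_ok = path in ranger_hdfs_policies.get(user, set())
    native_ok = path in hdfs_native_permissions.get(user, set())
    if not (ranger_ok or native_ok):
        return (False, "rejected: no HDFS authorization")
    return (True, "file content returned")


# Hypothetical stand-in data for the directory, Ranger, and HDFS.
DEMO = dict(
    ldap_users={"alice": "pw1", "bob": "pw2"},
    knox_webhdfs_users={"alice"},
    ranger_hdfs_policies={"alice": {"/data/XYZ"}},
    hdfs_native_permissions={},
)
```

Calling webhdfs_read("alice", "pw1", "/data/XYZ", **DEMO) passes every step, while bob is stopped at the Ranger Knox check even though his password is valid.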
32. ©2015 MFMER | slide-32
Access Secured ElasticSearch (ES) Data
• Using REST API via WebUI or Curl Cmd
https://elasticsearchuixxxx:ddd6/estest/greeting/_search?pretty=true&q=title:Hello
curl -v --user xxxxxxx -XGET "https://elasticsearchuixxxx:ddd6/estest/greeting/_search?pretty=true&q=title:Hello"
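A search URL like the one above consists of an index, a type, the _search endpoint, and query-string parameters separated by "&". The short Python sketch below assembles it with the standard library; the host es.example.org and port 9200 are placeholders, and the function name is illustrative.

```python
from urllib.parse import urlencode


def es_search_url(host, port, index, doc_type, field, term):
    """Build a URI-search URL of the form /index/type/_search?pretty=true&q=field:term."""
    params = {"pretty": "true", "q": f"{field}:{term}"}
    # safe=":" keeps the field:term separator readable instead of encoding it as %3A
    return f"https://{host}:{port}/{index}/{doc_type}/_search?{urlencode(params, safe=':')}"


url = es_search_url("es.example.org", 9200, "estest", "greeting", "title", "Hello")
```

Building the query string with urlencode also escapes spaces and other reserved characters in the search term, which hand-written URLs tend to miss.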
33. ©2015 MFMER | slide-33
Access Secured ElasticSearch (ES) Data
• Using Java API via Transport-TCP
35. ©2015 MFMER | slide-35
On-going & Future Direction
• Big Data Network Segmentation - Whitelisting
o List of Mayo Clinic healthcare business-allowed
URLs or IP addresses
Hadoop cluster-specific
User/application client computer-specific
o Implemented by the Mayo Network team and
the Mayo Clinic BDTS team
o Permits data/service operations by any user /
client via the allowed list of IP connections
while blocking all others
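The allow-list idea above can be illustrated with a few lines of Python using the standard ipaddress module: a connection is permitted only if the client address falls inside one of the listed networks. The networks shown are made-up examples; in practice such rules would be enforced by network equipment rather than application code.

```python
import ipaddress

# Hypothetical allow-list: one internal subnet and one single client host.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.20.0.0/16"),
    ipaddress.ip_network("192.0.2.15/32"),
]


def is_permitted(client_ip, allowed=ALLOWED_NETWORKS):
    """Return True only if the client address is inside an allowed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in allowed)
```

Anything not explicitly listed is blocked, which matches the permit-listed, block-the-rest behavior described on the slide.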
36. ©2015 MFMER | slide-36
On-going & Future Direction
• Big Data At-Rest Encryption Enhancement
o Drive (disk) encryption
o Client-managed encrypted HDFS, Hive,
HBase or ES data
o Ranger KMS-managed encryption keys and
data decryption for HDFS, Hive and
HBase data in the encryption zones
For HDP v2.3.4 and earlier versions, HDFS data in the
encryption zones can be retrieved only via the CLI, not via
Knox Gateway
38. ©2015 MFMER | slide-38
Conclusion
• Enterprise Healthcare Big Data on Mayo Clinic Hadoop
Clusters Has Been Successfully Protected by
o Enterprise Kerberos
o Active Directory
o LDAP Over SSL
o OS Hardening/TFA
o Ranger
o Knox Gateway/F5 Balancer
• Underway at Mayo Clinic - Enhancement of Enterprise
Healthcare Big Data Security by
o Network Segmentation
o Data-At-Rest Encryption via Ranger KMS
• Achieved Successful Data Ops on Enterprise-Secured
Healthcare Big Data
39. ©2015 MFMER | slide-39
References
• Mayo Clinic: http://www.mayoclinic.org/
• PII: http://privacyoffice.med.miami.edu/faq/privacy-faqs/what-is-personally-identifiable-information-pii
• PHI: https://www.hipaa.com/hipaa-protected-health-information-what-does-phi-include/
• Hadoop Stack: http://hortonworks.com
• ElasticSearch: https://www.elastic.co/
• CS Paper of Top IEEE Journal:
Chen et al. Real-Time or Near Real-Time Persisting Daily Healthcare
Data Into HDFS and ElasticSearch Index Inside a Big Data Platform.
IEEE Transactions on Industrial Informatics, vol. 13, no. 2, pp. 595-606,
April 2017
40. ©2015 MFMER | slide-40
Questions & Discussion
chen.dequan@mayo.edu; 507-208-1599
Personal Email: dequanchen2007@gmail.com
LinkedIn: https://www.linkedin.com/in/dequan-chen-5b0a37bb/