©2015 MFMER | slide-1
Securing Enterprise Healthcare Big Data
by the Combination of Knox/F5, Ranger,
TFA and Kerberos Coupled With
Enterprise Active Directory and LDAP
Dequan Chen, Ph.D.
Mayo Clinic Big Data Technology Services Team
chen.dequan@mayo.edu; 507-208-1599
San Jose McEnery Convention Center
June 13, 2017
Outline
• Data Security – Critical to the Success of Healthcare Business at Mayo Clinic
o Enterprise Healthcare Big Data at Mayo Clinic
o Personally Identifiable Information (PII) Data
o Protected Health Information (PHI) Data
• Mayo Clinic Big Data Clusters & Evolution
• Securing Enterprise Healthcare Big Data on Mayo Clinic Hadoop Clusters
• On-going & Future Direction
• Conclusion
Data Security – Critical to the Success of Healthcare Business at Mayo Clinic
Mayo Clinic Healthcare - Integrated with Research and Education
• World’s largest integrated not-for-profit healthcare system, with more than 70 hospitals and clinics
o Enterprise core value: The Needs of the Patient Come First
• Provides care annually for more than 1 million patients (1,317,900 in 2014) from all 50 states and more than 150 countries
• Generates large amounts of EHR (EMR) data daily
o Structured
o Semi-structured
o Unstructured
Mayo Clinic in Rochester, Minn. was ranked #1 on the U.S. News & World Report "Best Hospitals" list for 2014-2015 and 2016-2017
Enterprise Healthcare Big Data at Mayo Clinic
• Enterprise Healthcare Big Data on Mayo Clinic Hadoop Clusters
o HL7 messages, their parsed data, or their JSON derivatives: a mix of semi-structured and unstructured EHR data
o Enterprise-level clinical usage (diagnosis, treatment, prevention, or clinical reporting)
o Enterprise-level non-clinical usage (research, business intelligence, or health information exchange)
Enterprise Healthcare Big Data Security Needs
• Personally Identifiable Information (PII) Data
o Any data that can be used to contact, locate, or identify a specific individual, either by itself or combined with other easily accessed sources
o May include information linked to an individual through financial, medical, educational, or employment records
o Data elements that might identify a person include fingerprints, biometric data, name, telephone number, email address, and Social Security number
o Federal laws that require PII data to be handled securely: HIPAA, Privacy Act, GLBA, FERPA, COPPA, and FCRA
• Protected Health Information (PHI) Data
o Any individually identifiable health information created or received by a covered entity: a health care provider, a health plan, or a health care clearinghouse
o May relate to an individual’s past, present, or future physical or mental health or condition
o Maintained or transmitted in any form, including spoken, paper, or electronic
o Excludes education records covered by the Family Educational Rights and Privacy Act (FERPA) and employment records maintained by a covered entity
o Federal law that requires PHI data to be handled securely: HIPAA
Mayo Clinic Big Data Clusters & Evolution
Mayo Clinic Big Data Clusters
• Teradata Appliance with SUSE Linux Enterprise Server
• Each Hadoop Cluster Coupled with One ElasticSearch (ES) Cluster on Selected Edge Nodes
• Separate HDF (NiFi) Clusters (not covered in this presentation)
• (Hadoop + ES) Clusters
o Normal: Sandbox, Dev, Test (Int)* and Prod
o Disaster Recovery (DR): Dev and Prod
• Data Storage on (Hadoop + ES) Clusters
o Permanent
- HDFS folders/files
- HBase tables
- ES indexes
o Temporary/Permanent
- Hive tables
Mayo Clinic (Hadoop + ES) Clusters Evolution
• HDP + ES Version Evolution of the Hadoop/ES Clusters
o TDH/HDP 1.3.2 + ES (v1.0.0) (un-Kerberized)
o TDH/HDP 2.1.2 + ES (v1.3.2..v1.5.2) (un-Kerberized)
o HDP 2.1.11 + ES (v1.5.2) (secured: local KDC + ES Shield via AD/LDAP)
o HDP 2.3.4 + ES (v1.5.2..v1.7.2..v2.1.2..v2.3.2..v2.4.1..v2.4.4) (secured: AD/LDAP, etc.)
o HDP 2.5.3 + ES (v2.4.4) (secured: AD/LDAP, etc.)
Securing Enterprise Healthcare Big Data on Mayo Clinic Hadoop Clusters
Security Adopted on Mayo Clinic Hadoop Clusters
• Kerberos Security
o Coupled with enterprise Active Directory (AD), using the AD KDC
• Coupled with Lightweight Directory Access Protocol (LDAP) over SSL
o For critical HDP services + the ElasticSearch service
• Two-Factor Authentication (TFA) Login and Sudo Capability Post OS-Hardening
o Only for a limited number of authorized users/applications on local entry node(s)
o Root login disabled
• Ranger Plugins and Policies
• HDFS/Hive/HBase Data Ops via Knox Gateway/F5
o Used by the majority of users or applications
Kerberos with Active Directory
• Kerberized Using the Enterprise Active Directory (AD) KDC
o Provides a host of extensions and conveniences, such as password expiration and account lockout
o Authentication and authorization
o AD user (principal) name + password
o AD user (principal) name + keytab
o auth_to_local rules needed for HDFS, Oozie, Storm, Kafka, Ranger KMS, and Atlas
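The auth_to_local rules listed above map Kerberos principals from the AD realm to the short local user names that HDFS, Oozie, and the other services use for authorization. Hadoop reads these rules from the hadoop.security.auth_to_local property. Their syntax is RULE:[n:format](regex)s/pattern/replacement/, plus a DEFAULT rule. The Python sketch below is a simplified re-implementation of that matching logic for illustration only; it is not Hadoop's code. The realm EXAMPLE.COM, the principals, and the rules are hypothetical. Replacements are treated literally, with no back-references.

```python
import re
from typing import List, Optional, Tuple

# Simplified parser: assumes the rule regex contains no ")" and the sed
# pattern/replacement contain no "/" (real Hadoop rules are more permissive).
RULE_SYNTAX = re.compile(
    r"RULE:\[(\d+):([^\]]+)\]\(([^)]*)\)(?:s/([^/]*)/([^/]*)/(g)?)?$"
)


def split_principal(principal: str) -> Tuple[List[str], str]:
    """Split 'primary/instance@REALM' into (['primary', 'instance'], 'REALM')."""
    name, _, realm = principal.partition("@")
    return name.split("/"), realm


def to_short_name(principal: str, rules: List[str], default_realm: str) -> Optional[str]:
    """Return the local short name for a principal, or None if no rule applies."""
    components, realm = split_principal(principal)
    for rule in rules:
        rule = rule.strip()
        if rule == "DEFAULT":
            # DEFAULT: principals in the default realm map to their first component.
            if realm == default_realm:
                return components[0]
            continue
        m = RULE_SYNTAX.match(rule)
        if m is None:
            raise ValueError(f"unsupported rule: {rule}")
        n, fmt, regex, pattern, replacement, global_flag = m.groups()
        if int(n) != len(components):
            continue  # rule only applies to principals with exactly n components
        params = [realm] + components  # $0 = realm, $1..$n = components
        candidate = re.sub(r"\$(\d)", lambda g: params[int(g.group(1))], fmt)
        if not re.fullmatch(regex, candidate):
            continue
        if pattern is not None:
            # Replacement is inserted literally (no back-references in this sketch).
            candidate = re.sub(pattern, lambda _: replacement, candidate,
                               count=0 if global_flag else 1)
        return candidate
    return None


# Hypothetical rules for a hypothetical AD realm EXAMPLE.COM.
RULES = [
    "RULE:[2:$1@$0](nn@EXAMPLE.COM)s/.*/hdfs/",
    "RULE:[1:$1@$0](.*@EXAMPLE.COM)s/@.*//",
    "DEFAULT",
]

if __name__ == "__main__":
    print(to_short_name("jdoe@EXAMPLE.COM", RULES, "EXAMPLE.COM"))  # jdoe
    print(to_short_name("nn/master1.example.com@EXAMPLE.COM", RULES, "EXAMPLE.COM"))  # hdfs
```

Rules are evaluated in order and the first match wins. A catch-all such as the single-component rule therefore belongs after the more specific service rules.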
LDAP + SSL == LDAPS
• Some Hadoop Component Services Also Use the LDAP Protocol for User Authentication/Authorization
o Ambari, Ranger / Ranger KMS, Knox, Grafana, Atlas, Hue, ES
o LDAP over SSL (LDAPS) certificate – Mayo Clinic Comodo certs
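LDAPS is plain LDAP carried inside a TLS session. Before sending any bind credentials, the client must verify the directory server's certificate (here, a Comodo-issued one) and its host name. The Python sketch below shows only that TLS layer, using the standard ssl module. The host name and CA bundle path are hypothetical. The LDAP bind itself would be done by an LDAP client library, which is not shown.

```python
import socket
import ssl
from typing import Optional

LDAPS_PORT = 636  # standard TCP port for LDAP over SSL/TLS


def ldaps_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Client-side TLS context that refuses unverified LDAPS servers.

    ca_file: hypothetical path to the CA bundle that issued the directory
    server's certificate; if None, the system trust store is used.
    """
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # create_default_context already sets both of these; kept explicit for clarity.
    ctx.verify_mode = ssl.CERT_REQUIRED
    ctx.check_hostname = True
    return ctx


def open_ldaps(host: str, port: int = LDAPS_PORT,
               ca_file: Optional[str] = None, timeout: float = 10.0) -> ssl.SSLSocket:
    """Open a verified TLS connection to an LDAPS server (no LDAP bind performed)."""
    raw = socket.create_connection((host, port), timeout=timeout)
    try:
        # The handshake fails here if the certificate chain or host name does not verify.
        return ldaps_context(ca_file).wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
```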
TFA & Sudo Capability Post OS-Hardening
• TFA Used for Local Users on Cluster Nodes
o Root login disabled
o Node-specific local user name + password
o Passcode generated on the fly on the user’s mobile device (RSA)
o Sudo capability only for a limited number of users
o After TFA login, Kerberos authentication against AD is still required
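The passcode step above relies on RSA tokens, whose SecurID algorithm is proprietary and is not reproduced here. As an openly specified stand-in, the sketch below implements time-based one-time passwords: TOTP (RFC 6238), which is built on HOTP (RFC 4226). It shows how a device and a server that share a secret can independently derive the same short-lived passcode. It illustrates the general idea only, not the mechanism an RSA deployment uses. The secret is the published RFC test secret.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA1 over an 8-byte big-endian counter, dynamically truncated."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # low nibble of the last byte selects 4 bytes
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HOTP whose counter is the number of `step`-second intervals since the epoch."""
    t = time.time() if at is None else at
    return hotp(secret, int(t // step), digits)


def verify_passcode(secret: bytes, passcode: str, at: Optional[float] = None,
                    step: int = 30, window: int = 1, digits: int = 6) -> bool:
    """Server-side check that tolerates +/- `window` steps of device clock drift."""
    t = time.time() if at is None else at
    counter = int(t // step)
    return any(
        hmac.compare_digest(hotp(secret, c, digits), passcode)
        for c in range(counter - window, counter + window + 1)
        if c >= 0
    )


if __name__ == "__main__":
    secret = b"12345678901234567890"  # RFC 4226/6238 test secret
    print(totp(secret, at=59, digits=8))  # 94287082 (RFC 6238 test vector)
```

RFC 6238 also expects the verifier to reject a second use of an already accepted passcode, so a real server would record the last accepted counter per user.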
Ranger Plugins and Policies
• Ranger Policies Control Which Users or Groups Are Authorized to Operate (CRUD, etc.) on the Specified Data or Services
o Data in HDFS files/folders, Hive databases/tables, HBase namespaces/tables, (Solr collections/documents), and Atlas metadata
o Services: YARN, Storm, Kafka, and Knox
o Example list of HDFS policies:
o Example of an HDFS policy:
• Ranger Also Performs Data or Service Auditing
o Example of the hdfs user accessing the Hive service:
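The authorization and auditing behavior described in the Ranger slides can be illustrated with a deliberately simplified, allow-only policy model. This is a hypothetical sketch, not Ranger's policy schema or evaluation engine; real Ranger also supports deny policies, wildcards, conditions, and a fallback to native HDFS permissions. A request is allowed if some policy meets three conditions:

- it covers the resource path;
- it grants the requested access type;
- it names the user or one of the user's groups.

Every decision is appended to an audit log.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Policy:
    """Hypothetical, simplified allow-only policy (not Ranger's real schema)."""
    name: str
    paths: List[str]          # "/x/*" means /x and everything below it
    accesses: Set[str]        # e.g. {"read", "write", "execute"}
    users: Set[str] = field(default_factory=set)
    groups: Set[str] = field(default_factory=set)


@dataclass
class AuditEvent:
    user: str
    path: str
    access: str
    allowed: bool
    policy: Optional[str]     # name of the policy that granted access, if any


def path_matches(pattern: str, path: str) -> bool:
    if pattern.endswith("/*"):
        base = pattern[:-2]
        return path == base or path.startswith(base + "/")
    return path == pattern


def authorize(policies: List[Policy], user: str, user_groups: Set[str],
              path: str, access: str, audit_log: List[AuditEvent]) -> bool:
    """Allow if any policy covers the path, grants the access, and names the user or a group."""
    for p in policies:
        if (any(path_matches(r, path) for r in p.paths)
                and access in p.accesses
                and (user in p.users or bool(user_groups & p.groups))):
            audit_log.append(AuditEvent(user, path, access, True, p.name))
            return True
    # No matching policy: this sketch simply denies and records the denial.
    audit_log.append(AuditEvent(user, path, access, False, None))
    return False
```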
Knox Gateway - HDFS/Hive/HBase Data Ops
• Required for the Majority of Mayo Clinic Big Data Clients (Users/Applications)
• Non-Secure Hive Shell Has Been Disabled
o Hive CLI operations are forced to use Beeline
• No Keytabs Issued for HDFS/Hive/HBase Data Operations by Client Applications Outside the Mayo Clinic Hadoop Clusters
o Regular remote HDFS Java client using keytabs: not allowed
o Remote Hive JDBC client using keytabs: not allowed
o Remote HBase Java client using keytabs: not allowed
F5/Knox - HDFS/Hive/HBase Data Ops
• WebHDFS, WebHCat/Knox-Hive JDBC, and WebHBase for HDFS, Hive, and HBase Data Ops, Respectively, via Knox Gateway
o HA (high availability) is achieved by running two or more WebHDFS, WebHCat, and WebHBase (Stargate HBase) services on the master nodes
o Knox-Hive JDBC requires the user’s AD user name & password
• An F5 Load Balancer over Two or More Knox Gateway Services for Each Hadoop Cluster Provides Knox Gateway HA (+ More Protection)
• HA Example – WebHBase:
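Beeline reaches Hive through Knox with a JDBC URL that uses HTTP transport and points httpPath at the Knox topology. The helper below assembles such a URL following the general format in the Apache Knox user guide:

jdbc:hive2://host:port/;ssl=true;...;transportMode=http;httpPath=gateway/<topology>/hive

The host, port 8443, topology name, and trust store values are hypothetical placeholders.

```python
from typing import Optional


def knox_hive_jdbc_url(host: str, port: int, topology: str,
                       truststore: Optional[str] = None,
                       truststore_password: Optional[str] = None,
                       gateway_path: str = "gateway") -> str:
    """Build a HiveServer2 JDBC URL that goes through Knox over HTTPS."""
    params = ["ssl=true"]
    if truststore:
        # Trust store holding the certificate presented by the F5/Knox endpoint.
        params.append(f"sslTrustStore={truststore}")
        if truststore_password:
            params.append(f"trustStorePassword={truststore_password}")
    params += ["transportMode=http", f"httpPath={gateway_path}/{topology}/hive"]
    return f"jdbc:hive2://{host}:{port}/;" + ";".join(params)


if __name__ == "__main__":
    # Hypothetical F5 virtual host, port and Knox topology.
    print(knox_hive_jdbc_url("f5balanceryyyy", 8443, "default"))
```

The resulting URL would then be passed to Beeline with the user's AD credentials, as the slide requires. For example: beeline -u "<url>" -n <AD user name> -p <AD password>.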
Data Ops via F5 Balancer / Knox Gateway
• Example – WebHDFS Data via Web Browser or Curl Cmd
o WebUI & results
https://f5balanceryyyy:port1/gateway/YYYYYYYYY/webhdfs/v1/user/zzzzz/test/Solr/solr_curl_query_result1.txt?op=OPEN
90000.0,125000.0,Texas,120000.0,45500.0,250000.0,110000.0,140000.0,7,3,113642.85714285714,….
https://knoxgatewayzzz1:ddd7/gateway/YYYYYYYYY/webhdfs/v1/user/zzzzz/test/Solr/solr_curl_query_result1.txt?op=OPEN
or
https://knoxgatewayzzz2:ddd7/gateway/YYYYYYYYY/webhdfs/v1/user/zzzzz/test/Solr/solr_curl_query_result1.txt?op=OPEN
90000.0,125000.0,Texas,120000.0,45500.0,250000.0,110000.0,140000.0,7,3,113642.85714285714,….
o Curl Cmd & results
curl -i -k -u xxxxxxxx -X GET -L 'https://f5balanceryyyy:port1/gateway/YYYYYYYYY/webhdfs/v1/user/zzzzz/test/Solr/solr_curl_query_result1.txt?op=OPEN'
90000.0,125000.0,Texas,120000.0,45500.0,250000.0,110000.0,140000.0,7,3,113642.85714285714,….
curl -i -k -u xxxxxxxx -X GET -L 'https://knoxgatewayzzz1:ddd7/gateway/YYYYYYYYY/webhdfs/v1/user/zzzzz/test/Solr/solr_curl_query_result1.txt?op=OPEN'
or
curl -i -k -u xxxxxxxx -X GET -L 'https://knoxgatewayzzz2:ddd7/gateway/YYYYYYYYY/webhdfs/v1/user/zzzzz/test/Solr/solr_curl_query_result1.txt?op=OPEN'
90000.0,125000.0,Texas,120000.0,45500.0,250000.0,110000.0,140000.0,7,3,113642.85714285714,….
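The browser and curl examples above all issue the same WebHDFS OPEN call through Knox, either via the F5 virtual address or via an individual Knox gateway. The Python sketch below builds those URLs and fetches the file with HTTP Basic credentials, as curl -u does. It tries each endpoint in order, so an unreachable gateway falls through to the next one. HTTP errors such as 401 or 403 are raised immediately, since another gateway would apply the same policies. The host names, ports, and topology are the slides' placeholders. Rather than skipping certificate checks as curl -k does, pass an ssl.SSLContext that trusts the gateway's CA.

```python
import base64
import ssl
import urllib.error
import urllib.parse
import urllib.request
from typing import List, Optional


def webhdfs_url(base: str, topology: str, hdfs_path: str, op: str = "OPEN", **params: str) -> str:
    """https://<host:port>/gateway/<topology>/webhdfs/v1/<path>?op=<OP>[&...]"""
    query = urllib.parse.urlencode({"op": op, **params})
    path = urllib.parse.quote(hdfs_path.lstrip("/"))  # keeps "/" separators
    return f"{base}/gateway/{topology}/webhdfs/v1/{path}?{query}"


def read_hdfs_file(bases: List[str], topology: str, hdfs_path: str, user: str, password: str,
                   context: Optional[ssl.SSLContext] = None, timeout: float = 30.0) -> bytes:
    """GET a file via Knox, trying each gateway endpoint in order (e.g. F5 VIP first)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    last_error: Optional[Exception] = None
    for base in bases:
        req = urllib.request.Request(webhdfs_url(base, topology, hdfs_path),
                                     headers={"Authorization": f"Basic {token}"})
        try:
            # urlopen follows HTTP redirects, as curl -L does in the commands above.
            with urllib.request.urlopen(req, context=context, timeout=timeout) as resp:
                return resp.read()
        except urllib.error.HTTPError:
            raise  # 401/403/404 come from Knox/Ranger/HDFS: do not retry elsewhere
        except (urllib.error.URLError, OSError) as exc:
            last_error = exc  # endpoint unreachable: try the next one
    raise ConnectionError(f"no gateway endpoint reachable: {last_error}")
```

Usage would pass the F5 address first and the individual gateways after it. For example: bases = ["https://f5balanceryyyy:port1", "https://knoxgatewayzzz1:ddd7", "https://knoxgatewayzzz2:ddd7"], with the placeholders replaced by real endpoints.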
How Do WebHDFS Data Ops via Knox Gateway Work?
o A complex authentication and authorization process
- The Ranger HDFS plugin allows access to the HDFS data
- The Ranger Knox plugin allows access to the HDFS interface through Knox
- The access path is likely to be:
A user connects/authenticates to Knox via HTTPS (SSL protects credentials)
Knox checks the user’s credentials via LDAPS (SSL protects credentials)
The Ranger Knox plugin allows access to WebHDFS (Ranger Knox service-level authorization)
Knox authenticates to AD (Kerberos protects Knox credentials)
AD grants Knox a ticket-granting ticket (TGT)
Knox requests a WebHDFS service ticket from AD
AD grants Knox a service ticket (ST) for WebHDFS
Knox passes the user as a proxy user to WebHDFS
The user tries to access the HDFS file XYZ
- The Ranger HDFS plugin checks whether a policy exists for the user and the HDFS file XYZ (Ranger HDFS authorization); HDFS checks native authorization for the user and the HDFS file XYZ (HDFS authorization) → access is authorized or denied
- Only when access is authorized is the data in the HDFS file XYZ retrieved and returned to the client application (web browser or CLI)
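The access path just described can be condensed into a chain of checks in which any failure stops the request before data is read. The Python sketch below simulates that ordering with hypothetical in-memory stand-ins for AD, the Ranger Knox and HDFS policies, the Hadoop proxy-user setting for Knox (in the style of hadoop.proxyuser.knox.groups), and HDFS itself. The Kerberos ticket exchange is noted but not simulated. The sketch also assumes that Ranger falls back to native HDFS permissions when no Ranger policy grants access.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class Cluster:
    """Hypothetical in-memory stand-ins for AD/LDAP, Ranger, Hadoop config and HDFS."""
    ad_passwords: Dict[str, str]            # checked by Knox via LDAPS
    knox_webhdfs_users: Set[str]            # Ranger Knox policy for the WebHDFS service
    knox_proxy_groups: Set[str]             # like hadoop.proxyuser.knox.groups
    user_groups: Dict[str, Set[str]]
    ranger_hdfs_read: Dict[str, Set[str]]   # path -> users granted read by Ranger HDFS policies
    hdfs_native_read: Dict[str, Set[str]]   # path -> users granted read by HDFS permissions/ACLs
    files: Dict[str, bytes] = field(default_factory=dict)


def webhdfs_open(c: Cluster, user: str, password: str, path: str) -> Tuple[int, bytes]:
    # 1. HTTPS to Knox; Knox verifies the credentials against AD over LDAPS.
    if c.ad_passwords.get(user) != password:
        return 401, b"authentication failed"
    # 2. Ranger Knox plugin: service-level authorization for WebHDFS.
    if user not in c.knox_webhdfs_users:
        return 403, b"denied by Ranger Knox policy"
    # 3. Knox obtains its own Kerberos TGT and WebHDFS service ticket from AD
    #    (not simulated) and calls WebHDFS as a proxy user for `user`; Hadoop
    #    accepts the impersonation only for allowed groups.
    if not (c.user_groups.get(user, set()) & c.knox_proxy_groups):
        return 403, b"knox may not impersonate this user"
    # 4. Ranger HDFS plugin, then native HDFS permissions as a fallback.
    granted = (user in c.ranger_hdfs_read.get(path, set())
               or user in c.hdfs_native_read.get(path, set()))
    if not granted:
        return 403, b"denied by HDFS authorization"
    # 5. Only an authorized request reads and returns the file contents.
    if path not in c.files:
        return 404, b"file not found"
    return 200, c.files[path]
```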
Access Secured ElasticSearch (ES) Data
• Using the REST API via WebUI or Curl Cmd
https://elasticsearchuixxxx:ddd6/estest/greeting/_search?pretty=true&q=title:Hello
curl -v --user xxxxxxx -XGET 'https://elasticsearchuixxxx:ddd6/estest/greeting/_search?pretty=true&q=title:Hello'
• Using the Java API via Transport-TCP
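The REST query above is an Elasticsearch URI search. Its query string carries q (a Lucene query such as title:Hello) and pretty, joined with "&". Because the ES cluster is secured, credentials travel as HTTP Basic authentication, which is what curl --user sends. The Python sketch below builds the same URL and issues the request with the standard library. The host and port are the slide's placeholders; the index estest and type greeting come from the example.

```python
import base64
import json
import ssl
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional


def es_search_url(base: str, index: str, doc_type: str, query: str, pretty: bool = True) -> str:
    """URI search: <base>/<index>/<type>/_search?q=<query>[&pretty=true]"""
    params = {"q": query}
    if pretty:
        params["pretty"] = "true"
    # urlencode joins parameters with "&" and percent-encodes their values.
    return f"{base}/{index}/{doc_type}/_search?" + urllib.parse.urlencode(params)


def es_search(base: str, index: str, doc_type: str, query: str, user: str, password: str,
              context: Optional[ssl.SSLContext] = None) -> Dict[str, Any]:
    """Run the search with HTTP Basic credentials, as curl --user does."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(es_search_url(base, index, doc_type, query),
                                 headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(req, context=context, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```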
On-going & Future Direction
• Big Data Network Segmentation - Whitelisting
o List of Mayo Clinic healthcare business-allowed URLs or IP addresses
- Hadoop cluster-specific
- User/application client computer-specific
o Implemented by the Mayo Network team and the Mayo Clinic BDTS team
o Permit data/service operations by any user/client via the allowed list of IP connections while blocking all others
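The whitelisting rule above amounts to default-deny with an explicit allowlist per cluster. A minimal sketch using Python's ipaddress module follows. The cluster names and networks are hypothetical (RFC 5737 documentation ranges). URL-based entries, which the slide also mentions, are not modeled.

```python
import ipaddress
from typing import Dict, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Hypothetical per-cluster allowlists using RFC 5737 documentation ranges.
ALLOWLIST: Dict[str, List[Network]] = {
    "prod": [ipaddress.ip_network("192.0.2.0/28"),        # e.g. an application subnet
             ipaddress.ip_network("198.51.100.17/32")],   # e.g. a single client computer
    "dev": [ipaddress.ip_network("203.0.113.0/24")],
}


def is_permitted(cluster: str, client_ip: str) -> bool:
    """Default deny: allow only clients whose IP falls inside an allowlisted network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWLIST.get(cluster, []))
```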
• Big Data At-Rest Encryption Enhancement
o Drive (disk) encryption
o Client-managed encrypted HDFS, Hive, HBase, or ES data
o Ranger KMS-managed encryption keys and data decryption for HDFS, Hive, and HBase data in encryption zones
- For HDP v2.3.4 and earlier versions, HDFS data in encryption zones can only be retrieved via the CLI, not via the Knox Gateway
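When Ranger KMS is the cluster's key provider, an encryption zone is set up with two standard Hadoop commands. First, hadoop key create creates the zone key in the KMS. Second, hdfs crypto -createZone binds that key to an empty HDFS directory. Files written there are then transparently encrypted, and decrypted for authorized clients. The Python wrapper below only assembles and, optionally, runs those commands. The key name phi_key and the path /data/phi are hypothetical. Permission requirements (Ranger KMS key policies, HDFS superuser rights for zone creation) depend on the cluster's configuration.

```python
import shlex
import subprocess
from typing import List


def create_key_cmd(key_name: str) -> List[str]:
    """Create an encryption-zone key in the configured KMS (Ranger KMS here)."""
    return ["hadoop", "key", "create", key_name]


def create_zone_cmd(key_name: str, hdfs_dir: str) -> List[str]:
    """Turn an existing, empty HDFS directory into an encryption zone using key_name."""
    return ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", hdfs_dir]


def list_zones_cmd() -> List[str]:
    return ["hdfs", "crypto", "-listZones"]


def run(cmd: List[str], dry_run: bool = True) -> int:
    """Print the command (dry run) or execute it, raising on a non-zero exit status."""
    if dry_run:
        print(shlex.join(cmd))
        return 0
    return subprocess.run(cmd, check=True).returncode


if __name__ == "__main__":
    # Hypothetical key name and HDFS path.
    for cmd in (create_key_cmd("phi_key"), create_zone_cmd("phi_key", "/data/phi"), list_zones_cmd()):
        run(cmd)
```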
Conclusion
• Enterprise Healthcare Big Data on Mayo Clinic Hadoop Clusters Has Been Successfully Protected by
o Enterprise Kerberos
o Active Directory
o LDAP over SSL
o OS Hardening/TFA
o Ranger
o Knox Gateway/F5 Balancer
• Underway at Mayo Clinic: Enhancement of Enterprise Healthcare Big Data Security by
o Network Segmentation
o Data-at-Rest Encryption via Ranger KMS
• Successful Data Ops Achieved on Enterprise-Secured Healthcare Big Data
Questions & Discussion
chen.dequan@mayo.edu; 507-208-1599
Personal Email: dequanchen2007@gmail.com
LinkedIn: https://www.linkedin.com/in/dequan-chen-5b0a37bb/

 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 

Securing Enterprise Healthcare Big Data by the Combination of Knox/F5, Ranger, TFA and Kerberos Coupled with Enterprise Active Directory and LDAP

  • 1. ©2015 MFMER | slide-1 Securing Enterprise Healthcare Big Data by the Combination of Knox/F5, Ranger, TFA and Kerberos Coupled With Enterprise Active Directory and LDAP Dequan Chen, Ph.D. Mayo Clinic Big Data Technology Services Team chen.dequan@mayo.edu; 507-208-1599 San Jose McEnery Convention Center June 13, 2017
  • 2. ©2015 MFMER | slide-2 Outline • Data Security – Critical to the Success of Healthcare Business at Mayo Clinic ▪ Enterprise Healthcare Big Data at Mayo Clinic ▪ Personally Identifiable Information (PII) Data ▪ Protected Health Information (PHI) Data • Mayo Clinic Big Data Clusters & Evolution • Securing Enterprise Healthcare Big Data on Mayo Clinic Hadoop Clusters • On-going & Future Direction • Conclusion
  • 3. ©2015 MFMER | slide-3 Data Security – Critical to the Success of Healthcare Business at Mayo Clinic
  • 4. ©2015 MFMER | slide-4 Mayo Clinic Healthcare - Integrated with Research and Education • World’s largest integrated not-for-profit healthcare system – more than 70 hospitals and clinics ▪ Enterprise core value: The Needs of the Patient Come First • Provides care to more than 1 million patients (1,317,900 in 2014) from all 50 states and more than 150 countries annually • Generates large amounts of EHR (EMR) data daily ▪ Structured ▪ Semi-structured ▪ Unstructured
  • 5. ©2015 MFMER | slide-5 Mayo Clinic in Rochester, Minn., ranked #1 on the U.S. News & World Report "Best Hospitals" list for 2014-2015 and 2016-2017
  • 6. ©2015 MFMER | slide-6 Enterprise Healthcare Big Data at Mayo Clinic • Enterprise Healthcare Big Data on Mayo Clinic Hadoop Clusters ▪ HL7 messages, their parsed data, or their JSON derivatives – a mix of semi-structured and unstructured EHR data ▪ Enterprise-level clinical usage (diagnosis, treatment, prevention, or clinical reporting) ▪ Enterprise-level non-clinical usage (research, business intelligence, or health information exchange)
  • 7. ©2015 MFMER | slide-7 Enterprise Healthcare Big Data Security Needs • With Personally Identifiable Information (PII) Data ▪ Any data that can be used to contact, locate, or identify a specific individual, either by itself or combined with other easily accessed sources ▪ May include information linked to an individual through financial, medical, educational, or employment records ▪ Data elements that might identify a person include fingerprints, biometric data, a name, telephone number, email address, or Social Security number ▪ Federal laws requiring secure handling of PII data: HIPAA, Privacy Act, GLBA, FERPA, COPPA, and FCRA
  • 8. ©2015 MFMER | slide-8 Enterprise Healthcare Big Data Security Needs (cont'd) • With Protected Health Information (PHI) Data ▪ Any individually identifiable health information created or received by a covered entity: a health care provider, a health plan, or a health care clearinghouse ▪ May relate to an individual’s past, present, or future physical or mental health or current condition ▪ Maintained or transmitted in any form, including oral, paper, or electronic ▪ Excludes education records covered by the Family Educational Rights and Privacy Act (FERPA) and employment records maintained by a covered entity as an employer ▪ Federal law requiring secure handling of PHI data: HIPAA
  • 9. ©2015 MFMER | slide-9 Mayo Clinic Big Data Clusters & Evolution
  • 10. ©2015 MFMER | slide-10 Mayo Clinic Big Data Clusters • Teradata Appliance with SUSE Linux Enterprise Server • Each Hadoop Cluster Coupled with One ElasticSearch (ES) Cluster on Selected Edge Nodes • Separate HDF (NiFi) Clusters (not covered in this presentation) • (Hadoop + ES) Clusters ▪ Normal: Sandbox, Dev, Test (Int)* and Prod ▪ Disaster Recovery (DR): Dev and Prod
  • 11. ©2015 MFMER | slide-11 Mayo Clinic Big Data Clusters • Data Storage on (Hadoop + ES) Clusters ▪ Permanent: HDFS folders/files, HBase tables, ES indexes ▪ Temporary/Permanent: Hive tables
  • 12. ©2015 MFMER | slide-12 Mayo Clinic (Hadoop + ES) Clusters Evolution • Hadoop/ES Cluster HDP + ES Evolution ▪ TDH/HDP 1.3.2 + ES (v1.0.0) (Un-Kerberized) ▪ TDH/HDP 2.1.2 + ES (v1.3.2..v1.5.2) (Un-Kerberized) ▪ HDP 2.1.11 + ES (v1.5.2) (Secured: Local KDC + ES Shield via AD/LDAP) ▪ HDP 2.3.4 + ES (v1.5.2..v1.7.2..v2.1.2..v2.3.2..v2.4.1..v2.4.4) (Secured: AD/LDAP, etc.) ▪ HDP 2.5.3 + ES (v2.4.4) (Secured: AD/LDAP, etc.)
  • 13. ©2015 MFMER | slide-13 Securing Enterprise Healthcare Big Data on Mayo Clinic Hadoop Clusters
  • 14. ©2015 MFMER | slide-14 Security Adopted on Mayo Clinic Hadoop Clusters
  • 15. ©2015 MFMER | slide-15 Security Adopted on Mayo Clinic Hadoop Clusters • Kerberos Security o Coupled with enterprise Active Directory (AD) using the AD KDC • Coupled with Lightweight Directory Access Protocol (LDAP) over SSL o Critical HDP services + ElasticSearch service • Two-Factor Authentication (TFA) Login and Sudo Capability Post OS-Hardening o Only for a limited number of authorized users / applications on local entry node(s) o Root login disabled • Ranger Plugins and Policies • HDFS/Hive/HBase Data Ops - Knox Gateway/F5 o Used by the majority of users and applications
  • 16. ©2015 MFMER | slide-16 Kerberos with Active Directory • Kerberized Using the Enterprise Active Directory (AD) KDC o Provides a host of extensions and conveniences, such as password expiration and account lockout o Authentication and authorization o AD user (principal) name + password
  • 17. ©2015 MFMER | slide-17 Kerberos with Active Directory • Kerberized Using the Enterprise Active Directory (AD) KDC (cont'd) o AD user (principal) name + keytab o auth_to_local rules (which map AD principals to local short user names) needed for HDFS, Oozie, Storm, Kafka, Ranger KMS, and Atlas
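To make the role of auth_to_local rules concrete, here is a minimal Python sketch that emulates a simplified subset of Hadoop's `RULE:[n:format](regex)s/pattern/replacement/` syntax, translating an AD principal into a local short name. The realm `AD.EXAMPLE.COM`, the principals and both rules are hypothetical, and the emulation ignores the `/g` flag and the `DEFAULT` rule; a real cluster relies on Hadoop's own `KerberosName` implementation, not on code like this.

```python
import re
from typing import Optional


def apply_rule(principal: str, n: int, fmt: str, match: str,
               pattern: str, replacement: str) -> Optional[str]:
    """Simplified emulation of RULE:[n:fmt](match)s/pattern/replacement/.

    Returns the translated short name, or None if the rule does not apply.
    """
    name, _, realm = principal.partition("@")
    components = name.split("/")
    # The rule only applies to principals with exactly n components.
    if len(components) != n:
        return None

    # $0 expands to the realm, $1..$n to the principal's components.
    def expand(m) -> str:
        idx = int(m.group(1))
        return realm if idx == 0 else components[idx - 1]

    candidate = re.sub(r"\$(\d)", expand, fmt)
    # The whole expanded string must match the rule's filter regex.
    if not re.fullmatch(match, candidate):
        return None
    # Without the /g flag, sed-style substitution replaces only the first match.
    return re.sub(pattern, replacement, candidate, count=1)


# Hypothetical rules: strip the realm from one-component user principals,
# and map the two-component NameNode service principal to the "hdfs" user.
print(apply_rule("jdoe@AD.EXAMPLE.COM", 1, "$1@$0",
                 r".*@AD\.EXAMPLE\.COM", r"@.*", ""))      # jdoe
print(apply_rule("nn/master1.example.com@AD.EXAMPLE.COM", 2, "$1@$0",
                 r"nn@AD\.EXAMPLE\.COM", r".*", "hdfs"))   # hdfs
```

In `hadoop.security.auth_to_local`, the rules are tried in order and the first one that applies determines the short name. That ordering is why service-principal rules are usually listed before broad realm-stripping rules.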
  • 18. ©2015 MFMER | slide-18 LDAP + SSL == LDAPS • User Authentication/Authorization Also Uses the LDAP Protocol for Some Hadoop Component Services o Ambari, Ranger / Ranger KMS, Knox, Grafana, Atlas, Hue, ES o LDAP over SSL (LDAPS) certificates – Mayo Clinic Comodo certs
  • 19. ©2015 MFMER | slide-19 LDAP + SSL == LDAPS o LDAP over SSL (LDAPS) certificates – Mayo Clinic Comodo certs (cont'd)
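LDAPS only protects credentials if the client actually verifies the directory server's certificate. Below is a minimal Python sketch of a client-side TLS context for an LDAPS connection. It assumes the enterprise CA chain is available as a PEM bundle; the host name and bundle path are hypothetical. Only the standard `ssl` module is used, and the sketch does not open a connection.

```python
import ssl
from typing import Optional


def make_ldaps_context(ca_bundle: Optional[str] = None) -> ssl.SSLContext:
    """Build a client-side TLS context suitable for LDAPS (LDAP over TLS).

    If ca_bundle is given, that CA chain (e.g. the CA that issued the
    directory server certificates) is loaded; otherwise the system trust
    store is used.
    """
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_bundle)
    # create_default_context already requires a valid certificate and a
    # matching host name; set both explicitly to document the intent.
    ctx.verify_mode = ssl.CERT_REQUIRED
    ctx.check_hostname = True
    # Refuse legacy protocol versions.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def ldaps_url(host: str, port: int = 636) -> str:
    """LDAPS URL in the ldaps://host:port form used by LDAP client configurations."""
    return f"ldaps://{host}:{port}"


ctx = make_ldaps_context()  # or make_ldaps_context("/etc/pki/enterprise-ca.pem") (hypothetical path)
print(ldaps_url("ad.example.com"))  # ldaps://ad.example.com:636
```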
  • 20. ©2015 MFMER | slide-20 TFA & Sudo Capability Post OS-Hardening • TFA Used for Local Users on Cluster Nodes o Root login disabled o Node-specific local user name + password o Passcode generated on the fly on the user’s mobile device (RSA) o Sudo capability only for a limited number of users o After TFA login, Kerberos authentication against AD is still required
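RSA SecurID passcodes come from a proprietary algorithm and cannot be reproduced here. The sketch below instead uses the open TOTP standard (RFC 6238, HMAC-SHA1 variant) to illustrate the general idea of a passcode computed on the fly from a shared secret and the current time. It is not the RSA algorithm and does not reflect Mayo Clinic's TFA configuration.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password per RFC 6238 (HMAC-SHA1)."""
    if at is None:
        at = time.time()
    # Number of whole time steps since the Unix epoch.
    counter = int(at // step)
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226, section 5.3).
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


# RFC 6238 Appendix B test vector (SHA-1 seed, T = 59 s, 8 digits).
print(totp(b"12345678901234567890", at=59, digits=8))  # 94287082
```

The device and the server compute the same code independently from the shared secret and their clocks. The passcode therefore proves possession of the device without sending the secret, which is the "second factor" on top of the node password.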
  • 21. ©2015 MFMER | slide-21 Ranger Plugins and Policies • Ranger Policies Control Which Users or Groups Are Authorized to Operate (CRUD, etc.) on Specified Data or Services o Data in HDFS files/folders, Hive databases/tables, HBase namespaces/tables, (Solr collections/documents), Atlas metadata o Services of YARN, Storm, Kafka, Knox
  • 22. ©2015 MFMER | slide-22 Ranger Plugins and Policies o Example list of HDFS policies:
  • 23. ©2015 MFMER | slide-23 Ranger Plugins and Policies o Example of an HDFS policy:
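The policy screenshots are not reproduced here, so the following Python sketch shows the general shape of an HDFS policy as expressed in Ranger's public REST API (v2) JSON, together with a deliberately simplified evaluator. The service name, path, users and groups are hypothetical. The evaluator is not Ranger's engine: it ignores deny items, exceptions, wildcards and other features the real engine supports.

```python
from typing import Iterable

# Hypothetical HDFS policy, shaped like a Ranger public v2 API policy object.
policy = {
    "service": "cluster_hadoop",  # hypothetical Ranger service name
    "name": "ehr-hl7-readonly",
    "isEnabled": True,
    "resources": {
        "path": {"values": ["/data/ehr/hl7"], "isRecursive": True, "isExcludes": False}
    },
    "policyItems": [
        {
            "users": ["etl_svc"],
            "groups": ["ehr_analysts"],
            "accesses": [{"type": "read", "isAllowed": True},
                         {"type": "execute", "isAllowed": True}],
        }
    ],
}


def path_matches(resource: dict, path: str) -> bool:
    """Exact match, or prefix match on a path boundary when isRecursive is set."""
    for base in resource["values"]:
        base = base.rstrip("/")
        if path == base:
            return True
        if resource.get("isRecursive") and path.startswith(base + "/"):
            return True
    return False


def is_allowed(policy: dict, user: str, groups: Iterable[str], path: str, access: str) -> bool:
    """Simplified allow-only evaluation of a single policy."""
    if not policy.get("isEnabled", True):
        return False
    if not path_matches(policy["resources"]["path"], path):
        return False
    group_set = set(groups)
    for item in policy["policyItems"]:
        subject_ok = user in item.get("users", []) or bool(group_set & set(item.get("groups", [])))
        granted = {a["type"] for a in item["accesses"] if a.get("isAllowed")}
        if subject_ok and access in granted:
            return True
    return False


print(is_allowed(policy, "jdoe", ["ehr_analysts"], "/data/ehr/hl7/2017/06/msg.json", "read"))   # True
print(is_allowed(policy, "jdoe", ["ehr_analysts"], "/data/ehr/hl7/2017/06/msg.json", "write"))  # False
```

When no Ranger policy grants access, the HDFS plugin can fall back to native HDFS permissions, depending on how the plugin is configured. Slide 31 refers to that second check as "HDFS authorization".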
  • 24. ©2015 MFMER | slide-24 Ranger Plugins and Policies • Ranger Also Audits Data and Service Access o Example of the hdfs user accessing the Hive service:
  • 25. ©2015 MFMER | slide-25 Knox Gateway - HDFS/Hive/HBase Data Ops • Required for the Majority of Mayo Clinic Big Data Clients (Users/Applications) • Non-Secure Hive Shell Has Been Disabled o Hive CLI operations are forced to use Beeline • No Keytabs Issued for HDFS/Hive/HBase Data Operations by Client Applications Outside the Mayo Clinic Hadoop Clusters o Regular HDFS remote Java client using keytabs – not allowed o Remote Hive JDBC using keytabs – not allowed o HBase remote Java client using keytabs – not allowed
  • 26. ©2015 MFMER | slide-26 F5/Knox - HDFS/Hive/HBase Data Ops • WebHDFS, WebHCat/Knox-Hive JDBC and WebHBase for HDFS, Hive and HBase Data Ops, Respectively, via the Knox Gateway o HA (high availability) is achieved using two or more WebHDFS, WebHCat and WebHBase (Stargate HBase) services on the master nodes o Knox-Hive JDBC requires the user’s AD user name & password • An F5 Load Balancer in Front of Two or More Knox Gateway Services per Hadoop Cluster Provides Knox Gateway HA (plus Additional Protection)
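A Hive JDBC connection through Knox normally uses HTTP transport mode with SSL, and an `httpPath` that points at the Hive endpoint of a Knox topology. The Python sketch below only assembles such a URL. The host, port, topology name and trust-store path are hypothetical placeholders. Check the exact parameters against the Knox and Hive documentation for the versions in use.

```python
from typing import Optional


def knox_hive_jdbc_url(gateway_host: str, gateway_port: int, topology: str,
                       truststore: Optional[str] = None,
                       gateway_path: str = "gateway") -> str:
    """Assemble a HiveServer2 JDBC URL that is routed through a Knox gateway.

    Credentials (the user's AD name and password) are supplied separately,
    e.g. with beeline's -n/-p options, not embedded in the URL.
    """
    params = ["ssl=true"]
    if truststore:
        # Trust store holding the certificate chain of the gateway (or F5 VIP).
        params.append(f"sslTrustStore={truststore}")
    params += ["transportMode=http", f"httpPath={gateway_path}/{topology}/hive"]
    return f"jdbc:hive2://{gateway_host}:{gateway_port}/;" + ";".join(params)


url = knox_hive_jdbc_url("knox-vip.example.com", 8443, "prod")
print(url)
# jdbc:hive2://knox-vip.example.com:8443/;ssl=true;transportMode=http;httpPath=gateway/prod/hive
```

With Beeline, such a URL would typically be passed as `beeline -u "<url>" -n <AD user> -p <password>`. Pointing the host at the F5 virtual address rather than at a single Knox node is what gives clients the HA described above.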
  • 27. ©2015 MFMER | slide-27 F5/Knox - HDFS/Hive/HBase Data Ops • HA Example – WebHBase:
  • 28. ©2015 MFMER | slide-28 Data Ops via F5 Balancer / Knox Gateway • Example – WebHDFS Data via Web Browser or Curl Cmd o Web UI & results https://f5balanceryyyy:port1/gateway/YYYYYYYYY/webhdfs/v1/user/zzzzz/test/Solr/solr_curl_query_result1.txt?op=OPEN Result: 90000.0,125000.0,Texas,120000.0,45500.0,250000.0,110000.0,140000.0,7,3,113642.85714285714,… https://knoxgatewayzzz1:ddd7/gateway/YYYYYYYYY/webhdfs/v1/user/zzzzz/test/Solr/solr_curl_query_result1.txt?op=OPEN or https://knoxgatewayzzz2:ddd7/gateway/YYYYYYYYY/webhdfs/v1/user/zzzzz/test/Solr/solr_curl_query_result1.txt?op=OPEN Result: 90000.0,125000.0,Texas,120000.0,45500.0,250000.0,110000.0,140000.0,7,3,113642.85714285714,…
  • 29. ©2015 MFMER | slide-29 Data Ops via F5 Balancer / Knox Gateway o Curl Cmd & results curl -i -k -u xxxxxxxx -X GET -L 'https://f5balanceryyyy:port1/gateway/YYYYYYYYY/webhdfs/v1/user/zzzzz/test/Solr/solr_curl_query_result1.txt?op=OPEN' Result: 90000.0,125000.0,Texas,120000.0,45500.0,250000.0,110000.0,140000.0,7,3,113642.85714285714,… curl -i -k -u xxxxxxxx -X GET -L 'https://knoxgatewayzzz1:ddd7/gateway/YYYYYYYYY/webhdfs/v1/user/zzzzz/test/Solr/solr_curl_query_result1.txt?op=OPEN' or 'https://knoxgatewayzzz2:ddd7/gateway/YYYYYYYYY/webhdfs/v1/user/zzzzz/test/Solr/solr_curl_query_result1.txt?op=OPEN' Result: 90000.0,125000.0,Texas,120000.0,45500.0,250000.0,110000.0,140000.0,7,3,113642.85714285714,…
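The same WebHDFS read can also be issued from a program. This Python sketch builds the kind of request the curl command above sends: HTTP Basic authentication with the user's AD credentials against the Knox (or F5) endpoint, and `op=OPEN` on a WebHDFS path. The host, port, topology, path and credentials are hypothetical. Unlike `curl -k`, it leaves certificate verification on. `urlopen` verifies HTTPS certificates and follows redirects by default, similar to `curl -L`.

```python
import base64
import urllib.parse
import urllib.request


def knox_webhdfs_request(gateway: str, topology: str, hdfs_path: str,
                         user: str, password: str, op: str = "OPEN") -> urllib.request.Request:
    """Build a WebHDFS request routed through a Knox gateway (or an F5 VIP in front of it)."""
    path = urllib.parse.quote("/" + hdfs_path.lstrip("/"))
    query = urllib.parse.urlencode({"op": op})
    url = f"https://{gateway}/gateway/{topology}/webhdfs/v1{path}?{query}"
    # Knox validates these credentials against the directory (LDAPS).
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


req = knox_webhdfs_request("knox-vip.example.com:8443", "prod",
                           "/user/jdoe/test/result1.txt", "jdoe", "s3cret")
print(req.full_url)
# https://knox-vip.example.com:8443/gateway/prod/webhdfs/v1/user/jdoe/test/result1.txt?op=OPEN

# Sending it (HTTPS certificates are verified against the system trust store):
# with urllib.request.urlopen(req) as resp:
#     data = resp.read()
```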
  • 30. ©2015 MFMER | slide-30 How Do WebHDFS Data Ops via the Knox Gateway Work? o Multi-step authentication and authorization process ▪ The Ranger HDFS plugin authorizes access to the HDFS data ▪ The Ranger Knox plugin authorizes access to the HDFS interface through Knox ▪ The access path is likely to be: A user connects/authenticates to Knox via HTTPS (SSL protects credentials). Knox checks the user’s credentials via LDAPS (SSL protects credentials). The Ranger Knox plugin allows access to WebHDFS (Ranger Knox service-level authorization).
  • 31. ©2015 MFMER | slide-31 How Do WebHDFS Data Ops via the Knox Gateway Work? (cont'd) Knox authenticates to AD (Kerberos protects Knox credentials). AD grants Knox a ticket-granting ticket (TGT). Knox requests a WebHDFS service ticket from AD. AD grants Knox a service ticket (ST) for WebHDFS. Knox passes the user to WebHDFS as a proxy user. The user tries to access HDFS file XYZ. ▪ The Ranger HDFS plugin checks whether a policy exists for the user and HDFS file XYZ (Ranger HDFS authorization); HDFS checks native permissions for the user and HDFS file XYZ (HDFS authorization) ▪ Access is granted or denied ▪ Only when access is authorized is the data in HDFS file XYZ retrieved and returned to the client application (web browser or CLI)
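Seen from the user's side, the sequence above is a chain of gates that must all pass before any data is returned. The Kerberos TGT/ST exchange authenticates Knox itself, not the user. The Python sketch below models only the ordering of the per-user checks. Every check is a hypothetical stand-in with no real LDAP, Kerberos or Ranger calls, and the host names are invented. The sketch also shows roughly how a proxied WebHDFS call carries the end user, via the `doAs` query parameter.

```python
import urllib.parse
from typing import Callable, List, Tuple

# Each gate is (description, predicate); all predicates below are hypothetical stubs.
Gate = Tuple[str, Callable[[str, str], bool]]


def authorize(user: str, hdfs_path: str, gates: List[Gate]) -> Tuple[bool, str]:
    """Run the gates in order and stop at the first one that fails."""
    for description, check in gates:
        if not check(user, hdfs_path):
            return False, f"denied at: {description}"
    return True, "authorized"


def backend_webhdfs_url(namenode: str, hdfs_path: str, proxied_user: str) -> str:
    """Rough shape of the call a trusted proxy makes on the user's behalf."""
    query = urllib.parse.urlencode({"op": "OPEN", "doAs": proxied_user})
    return f"http://{namenode}/webhdfs/v1{urllib.parse.quote(hdfs_path)}?{query}"


# Hypothetical stand-ins for the real checks described on the slides.
ldap_users = {"jdoe"}
knox_allowed = {"jdoe"}
ranger_hdfs = {("jdoe", "/data/ehr/xyz")}

gates: List[Gate] = [
    ("Knox LDAPS authentication", lambda u, p: u in ldap_users),
    ("Ranger Knox plugin (WebHDFS service)", lambda u, p: u in knox_allowed),
    ("Ranger HDFS policy or native HDFS permission", lambda u, p: (u, p) in ranger_hdfs),
]

print(authorize("jdoe", "/data/ehr/xyz", gates))  # (True, 'authorized')
print(authorize("jdoe", "/data/ehr/other", gates))
print(backend_webhdfs_url("namenode.example.com:50070", "/data/ehr/xyz", "jdoe"))
```

Ordering the gates this way means a request rejected at the gateway never reaches HDFS at all. The data-level checks run only for users that Knox has already authenticated and authorized.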
  • 32. ©2015 MFMER | slide-32 Access Secured ElasticSearch (ES) Data • Using the REST API via Web UI or Curl Cmd https://elasticsearchuixxxx:ddd6/estest/greeting/_search?pretty=true&q=title:Hello curl -v --user xxxxxxx -XGET 'https://elasticsearchuixxxx:ddd6/estest/greeting/_search?pretty=true&q=title:Hello'
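In an Elasticsearch URI search, each query-string parameter is passed separately and joined with `&`. The Python sketch below builds such a search URL with proper encoding. The host and port are hypothetical, the index and type names follow the slide, and the `/index/type/_search` path reflects the ES 2.x versions listed earlier. Credentials would still be supplied as HTTP Basic auth, as in the curl command.

```python
import urllib.parse


def es_uri_search_url(base: str, index: str, doc_type: str, query: str, pretty: bool = True) -> str:
    """Build an Elasticsearch 2.x-style URI search URL: /index/type/_search?q=..."""
    params = {"q": query}
    if pretty:
        params["pretty"] = "true"
    path = "/".join(urllib.parse.quote(part) for part in (index, doc_type, "_search"))
    # safe=":" keeps "field:value" readable; other reserved characters are escaped.
    return f"{base.rstrip('/')}/{path}?{urllib.parse.urlencode(params, safe=':')}"


print(es_uri_search_url("https://es.example.com:9200", "estest", "greeting", "title:Hello"))
# https://es.example.com:9200/estest/greeting/_search?q=title:Hello&pretty=true
```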
  • 33. ©2015 MFMER | slide-33 Access Secured ElasticSearch (ES) Data • Using Java API via Transport-TCP
  • 34. ©2015 MFMER | slide-34 On-going & Future Direction
  • 35. ©2015 MFMER | slide-35 On-going & Future Direction • Big Data Network Segmentation - Whitelisting o List of URLs or IP addresses allowed for Mayo Clinic healthcare business ▪ Hadoop cluster-specific ▪ User/application client computer-specific o Implemented by the Mayo Network team and the Mayo Clinic BDTS team o Permits data/service operations by any user/client over the allowed IP connections while blocking all others
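Conceptually, the whitelist is a default-deny check of a client's source address against allowed hosts and subnets. A minimal Python sketch using the standard `ipaddress` module follows. The entries are hypothetical documentation address ranges. In practice, the enforcement described on the slide lives in network devices and firewalls, not in application code.

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def build_allowlist(entries: Iterable[str]) -> List[Network]:
    """Parse single addresses and CIDR blocks into network objects."""
    return [ipaddress.ip_network(e, strict=False) for e in entries]


def is_permitted(client_ip: str, allowlist: List[Network]) -> bool:
    """Default deny: permit only addresses that fall inside an allowed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in allowlist)


# Hypothetical entries: one client workstation and one application subnet.
allow = build_allowlist(["192.0.2.15", "198.51.100.0/24"])
print(is_permitted("198.51.100.42", allow))  # True
print(is_permitted("203.0.113.7", allow))    # False
```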
  • 36. ©2015 MFMER | slide-36 On-going & Future Direction • Big Data At-Rest Encryption Enhancement o Drive (disk) encryption o Client-managed encryption of HDFS, Hive, HBase or ES data o Ranger KMS-managed encryption keys and data encryption/decryption for HDFS, Hive and HBase data in encryption zones ▪ In HDP v2.3.4 and earlier, HDFS data in encryption zones can be retrieved only via the CLI, not via the Knox Gateway
  • 37. ©2015 MFMER | slide-37 Conclusion
  • 38. ©2015 MFMER | slide-38 Conclusion • Enterprise Healthcare Big Data on Mayo Clinic Hadoop Clusters Has Been Successfully Protected by o Enterprise Kerberos o Active Directory o LDAP over SSL o OS Hardening/TFA o Ranger o Knox Gateway/F5 Balancer • Underway at Mayo Clinic – Enhancement of Enterprise Healthcare Big Data Security by o Network Segmentation o Data-at-Rest Encryption via Ranger KMS • Achieved Successful Data Ops on Enterprise-Secured Healthcare Big Data
  • 39. ©2015 MFMER | slide-39 References • Mayo Clinic: http://www.mayoclinic.org/ • PII: http://privacyoffice.med.miami.edu/faq/privacy-faqs/what-is-personally-identifiable-information-pii • PHI: https://www.hipaa.com/hipaa-protected-health-information-what-does-phi-include/ • Hadoop Stack: http://hortonworks.com • ElasticSearch: https://www.elastic.co/ • CS Paper of Top IEEE Journal: Chen et al. Real-Time or Near Real-Time Persisting Daily Healthcare Data Into HDFS and ElasticSearch Index Inside a Big Data Platform. IEEE Transactions on Industrial Informatics, vol. 13, no. 2, pp. 595-606, April 2017
  • 40. ©2015 MFMER | slide-40 Questions & Discussion chen.dequan@mayo.edu; 507-208-1599 Personal Email: dequanchen2007@gmail.com LinkedIn: https://www.linkedin.com/in/dequan-chen-5b0a37bb/