Securing Big Data: Lock it Down or Liberate
1. © Copyright 2013, Cardinal Health. All rights reserved. CARDINAL HEALTH, the Cardinal Health LOGO and
ESSENTIAL TO CARE are trademarks or registered trademarks of Cardinal Health.
Securing Big Data
Jeff Graham, Sr. Advisor, Data Analytics, Enterprise Architecture
Mark Tomallo, Director, Information Security & Risk, Enterprise Services Department
June 4th, 2014
2.
Cardinal Health
Leading provider of products and services across the healthcare supply chain with an extensive footprint across multiple channels
• 33,000+ employees with direct operations in 10 countries
• 100,000 locations delivered to daily
• $101B FY13 revenue
• #19 on the Fortune 500 list
• 85% of hospitals in the U.S. use our products and services
3.
What types of data do we use?
• Market
• Public Data (Medicare.gov)
• Clinical
• Product & Supplier
• Employee
• Logistics
4.
The Challenge
• We knew the benefits of going to a Big Data platform, but
we had huge concerns over securing those assets.
• The technology was immature from a security standpoint.
• The goals of an analytics group were sometimes at odds
with the responsibility of Governance & Security.
5.
The Opportunity
We needed to strike a balance between protecting our data and
liberating our analytics community.
This gave rise to two guiding principles that are still evolving in our organization:
• Lock Down the Platform
• Liberate the Data to authorized users
6.
The Journey Begins...
We needed many disciplines to come together:
• Platform Security
• Identity Management
• Network Security
• Data Segmentation
• Data Tokenization
• Governance
7.
Lockdown: Platform Security
• Host-based firewalls on control & data nodes
– Locked down using iptables
– Block connections from unauthorized hosts
• Gold-image boot for data nodes
– No persistent OS / config data - continuous fresh, secure image
– Ease of security patching
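A minimal Python sketch of the host-based firewall idea: it only prints iptables commands that accept traffic from an allow list of cluster hosts and then set a default-deny INPUT policy. The host addresses and ports are hypothetical placeholders; the slides do not list the actual hosts, ports, or rule set, and a real deployment would also need rules for administrative access such as SSH.

```python
# Hypothetical values: the real hosts and ports are not given in the slides.
ALLOWED_HOSTS = ["10.0.1.10", "10.0.1.11"]  # hypothetical control/access node IPs
SERVICE_PORTS = [8020, 9866]                 # hypothetical Hadoop service ports


def build_rules(allowed_hosts, ports):
    """Return iptables commands that accept loopback, established traffic and
    allowed host/port pairs, then drop everything else by default."""
    rules = [
        "iptables -A INPUT -i lo -j ACCEPT",
        "iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT",
    ]
    for host in allowed_hosts:
        for port in ports:
            rules.append(
                f"iptables -A INPUT -s {host} -p tcp --dport {port} -j ACCEPT"
            )
    # Default-deny: connections from any unlisted host are dropped.
    rules.append("iptables -P INPUT DROP")
    return rules


if __name__ == "__main__":
    # Print only; applying the rules is left to configuration management.
    for rule in build_rules(ALLOWED_HOSTS, SERVICE_PORTS):
        print(rule)
```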
8.
Lockdown: Hadoop Architecture
9.
Lockdown: Identity Management
• Segmented access control for access, control, and data nodes
• Secured Active Directory groups for segmenting sensitive data
• Vintela Authentication using Kerberos
• Access nodes can talk to control nodes, and control nodes can talk to data nodes; users are restricted to the access layer
[Diagram: access, control, and data node tiers; components shown include Datameer, AD, MySQL, Sqoop, Hive, and Flume; user roles shown: Users, Power Users, Developers, Data Owners, Admin]
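The tiered rule above can be expressed as a small policy check. This Python sketch assumes hypothetical tier labels and AD group names (`hdp-clinical-readers`, `hdp-employee-readers`) and assumes non-sensitive datasets are open to any authenticated user; it models only the authorization decision, not Kerberos authentication or the network enforcement itself.

```python
# Hypothetical sketch: tier labels and AD group names are illustrative assumptions.

# Only adjacent tiers may connect; users never reach control or data nodes directly.
ALLOWED_FLOWS = {
    ("user", "access"),
    ("access", "control"),
    ("control", "data"),
}

# Sensitive datasets are gated by membership in a secured AD group (hypothetical names).
DATASET_GROUPS = {
    "clinical": "hdp-clinical-readers",
    "employee": "hdp-employee-readers",
}


def is_allowed(src_tier: str, dst_tier: str) -> bool:
    """Return True only for the tier-to-tier connections listed above."""
    return (src_tier, dst_tier) in ALLOWED_FLOWS


def can_read(user_groups: set, dataset: str) -> bool:
    """Assumed rule: datasets without a group mapping are open to authenticated
    users; sensitive ones require membership in the matching AD group."""
    required = DATASET_GROUPS.get(dataset)
    return required is None or required in user_groups
```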
10.
Lockdown: Network Security
• Host-based firewalls on control & data nodes
• Segregated VLAN on dedicated network switches
• Segregated Prod, Integration, Backup environments
• Transaction, security and event logging
• Host-based file integrity monitoring
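Host-based file integrity monitoring comes down to comparing cryptographic digests of watched files against a trusted baseline. A minimal sketch using SHA-256 from Python's standard library; which files to watch and how alerts are raised are assumptions left out here.

```python
import hashlib
from pathlib import Path


def snapshot(paths):
    """Map each file path to the SHA-256 digest of its contents."""
    return {str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths}


def compare(baseline, current):
    """Return (modified, missing, added) path sets between two snapshots."""
    shared = baseline.keys() & current.keys()
    modified = {p for p in shared if baseline[p] != current[p]}
    missing = set(baseline) - set(current)
    added = set(current) - set(baseline)
    return modified, missing, added
```

In practice the baseline would be taken from the trusted image and stored where the monitored host cannot alter it.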
11.
Liberate: Data Segmentation
• Data is ingested under source-specific accounts.
• Data ingestion is loosely coupled with transformations.
• Atomic data patterns avoid partial data products.
• Finer-grained control over data access.
[Diagram: Ingestion → Transform]
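One way to picture source-specific ingestion accounts is a per-source landing directory that only that source's ingestion account can write and that transformation jobs can only read. The path layout, account names, and group names in this Python sketch are hypothetical; it just emits the corresponding `hdfs dfs` commands rather than running them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LandingZone:
    source: str
    path: str
    owner: str  # only this account may write raw data for the source
    group: str  # transform jobs read through membership in this group
    mode: str   # permission bits applied to the directory


def landing_zone(source: str) -> LandingZone:
    """Derive a hypothetical per-source landing zone; naming is illustrative."""
    return LandingZone(
        source=source,
        path=f"/data/raw/{source}",
        owner=f"svc_ingest_{source}",
        group=f"raw_{source}_readers",
        mode="750",  # owner rwx, group r-x, others no access
    )


def can_write(account: str, zone: LandingZone) -> bool:
    return account == zone.owner


def hdfs_commands(zone: LandingZone):
    """Shell commands that would create and lock down the landing directory."""
    return [
        f"hdfs dfs -mkdir -p {zone.path}",
        f"hdfs dfs -chown {zone.owner}:{zone.group} {zone.path}",
        f"hdfs dfs -chmod {zone.mode} {zone.path}",
    ]
```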
12.
Liberate: Data Segmentation
Ingestion
• We had to ensure that our landed data was “all or nothing”
• Each load is atomic in nature.
• If a load fails, we don’t want to see partially streamed results.
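Load flow (RDBMS to HDFS):
1. Sqoop imports data from the RDBMS into part files in an HDFS staging area.
2. The copyMerge API merges the staged part files.
3. hadoop fs -mv renames the merged file into the source's target area.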
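The three steps above can be approximated on a local filesystem to show why the final rename makes a load all-or-nothing. This Python sketch merges staged part files into a hidden temporary file next to the target and then renames it into place with `os.replace`; it is an analogue of the Sqoop, copyMerge, and `hadoop fs -mv` sequence, not the HDFS implementation, and the file names are hypothetical.

```python
import os
import shutil
import tempfile
from pathlib import Path


def publish_atomically(part_files, target_path):
    """Merge part files into a temp file in the target's directory, then
    rename it into place so readers see either nothing or the full load."""
    target = Path(target_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Step 2 analogue: merge parts in name order (part-m-00000, part-m-00001, ...).
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as merged:
            for part in sorted(part_files):
                with open(part, "rb") as src:
                    shutil.copyfileobj(src, merged)
        # Step 3 analogue: os.replace renames atomically within one POSIX filesystem.
        os.replace(tmp_name, target)
    except BaseException:
        # A failed load leaves no partial file behind in the target area.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
```

Because the merged file only becomes visible under its final name after the rename, a reader of the target area sees either the previous state or the complete new load, never a half-written file.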
13.
Liberate: Data Segmentation
This gave us the flexibility to segment ingestion
privileges independently of any transformation.
[Diagram: source data segments: Sales, Market, Employee, Logistics, Clinical, Public Data]
14.
Liberate: Data Segmentation
This gave us the flexibility to segment ingestion
privileges independently of any transformation.
[Diagram: source data segments (Sales, Market, Employee, Logistics, Clinical, Public Data) feeding the data products Customer Insights, Warehouse Optimization, and Outcome Based Medicine]
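With ingestion segmented from transformation, each data product's transform account only needs read grants on the source segments it consumes. In this Python sketch the mapping of products to sources is a hypothetical grouping; the slide shows the products and sources but not this exact wiring.

```python
# Hypothetical product-to-source wiring, for illustration only.
PRODUCT_SOURCES = {
    "customer_insights": {"sales", "market"},
    "warehouse_optimization": {"employee", "logistics"},
    "outcome_based_medicine": {"clinical", "public_data"},
}


def check_access(product: str, source: str) -> bool:
    """True if the product's transform account may read the source's raw zone."""
    return source in PRODUCT_SOURCES.get(product, set())


def consumers(source: str):
    """Products affected if a source's ingestion privileges change."""
    return sorted(p for p, srcs in PRODUCT_SOURCES.items() if source in srcs)
```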
15.
Liberate: Data Tokenization
Private Data without Identity is no longer Private*
Segregation Model:
1. Private Identity Data – identity data that is itself private, e.g. Social Security Number
2. Identity Data – data that identifies the subject of the associated data, e.g. Name, Passport ID
3. Private Attributes – data that is sensitive only when associated with an identity, e.g. blood type
*Except in rare cases where the law deems data private even without identity.
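The segregation model can be applied as a rule over field classifications: a record is private if it carries private identity data, or if it combines identity data with private attributes. This Python sketch uses a hypothetical field catalog and an extra NON_SENSITIVE class that is not part of the slide's model; the legal exceptions in the footnote are not modeled.

```python
from enum import Enum


class DataClass(Enum):
    PRIVATE_IDENTITY = "private identity"    # e.g. Social Security Number
    IDENTITY = "identity"                    # e.g. name, passport ID
    PRIVATE_ATTRIBUTE = "private attribute"  # e.g. blood type
    NON_SENSITIVE = "non-sensitive"          # hypothetical extra class, not on the slide


# Hypothetical catalog; real classifications would come from governance metadata.
CATALOG = {
    "ssn": DataClass.PRIVATE_IDENTITY,
    "name": DataClass.IDENTITY,
    "passport_id": DataClass.IDENTITY,
    "blood_type": DataClass.PRIVATE_ATTRIBUTE,
    "visit_date": DataClass.NON_SENSITIVE,
}


def is_private(fields) -> bool:
    """Apply the segregation model to a set of field names.
    Unknown fields raise KeyError, forcing them to be classified first."""
    classes = {CATALOG[f] for f in fields}
    if DataClass.PRIVATE_IDENTITY in classes:
        return True
    # Private attributes are sensitive only when tied to an identity.
    return DataClass.IDENTITY in classes and DataClass.PRIVATE_ATTRIBUTE in classes
```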
16.
Liberate: Data Tokenization
A tokenization gateway gives us a centralized, reusable
framework for transforming private data into non-sensitive data.
Address                  Tokenized Address
1313 Mockingbird Ln      A76a39daf6e83363372d326
1700 Pennsylvania Ave    9eeb8dc55d37388b18c12b4
1411 N. Park Ave         0f2ef91d336d38b4db3be54
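As an illustration of turning private values into non-sensitive tokens, the sketch below derives deterministic tokens with keyed HMAC-SHA256. This is a swapped-in technique, not necessarily the gateway's method: HMAC tokens are one-way, whereas the vault described on the following slides stores encrypted originals. The key, the whitespace/case normalization, and the 23-character truncation (chosen only to match the length of the example tokens) are assumptions.

```python
import hashlib
import hmac


def tokenize(value: str, key: bytes, length: int = 23) -> str:
    """Deterministic token: the same value and key always give the same token,
    so tokenized columns can still be joined. Truncated HMAC-SHA256 hex digest."""
    normalized = " ".join(value.split()).upper()  # hypothetical normalization rule
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]
```

Determinism keeps joins working across datasets, but it also means the key must be protected as carefully as the data itself.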
17.
Liberate: Data Tokenization
The gateway is a highly protected service outside of the cluster.
18.
Liberate: Data Tokenization
The gateway is composed of three regions:
PRIVATE
• Data that needs to be tokenized.
• At a minimum, must consist of a primary key and token values.
• Multi-tenant store with role-based security
VAULT
• Stores the private data as a SHA2/128-bit AES-encrypted binary string.
• Generates a token by
• Tokens are sharded and referenced by name (and can be shared).
• Access is extremely limited (administrator only).
PUBLIC
• Once tokens are generated in the vault, private data is joined to those
tokens and landed in the Public region.
• Multi-tenant store with role-based security.
• Private may read public, but public may only read public.
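The three regions can be modeled in memory to show the flow from PRIVATE through the VAULT to PUBLIC, plus the one-way read rule between regions. In this hypothetical Python sketch, tokens are random and the vault reuses an existing token for a value it has already seen; unlike the vault described above, it does not encrypt stored values, shard tokens, or enforce administrator-only access.

```python
import secrets


class TokenGateway:
    """Hypothetical in-memory model of the PRIVATE -> VAULT -> PUBLIC flow.
    Stored values are NOT encrypted here; a real vault would encrypt them."""

    def __init__(self):
        self._vault = {}     # token -> original value (administrator-only in practice)
        self._by_value = {}  # original value -> token, so each value gets one token

    def token_for(self, value: str) -> str:
        """Return the existing token for value, or mint a new random one."""
        token = self._by_value.get(value)
        if token is None:
            token = secrets.token_hex(12)
            while token in self._vault:  # regenerate on the (unlikely) collision
                token = secrets.token_hex(12)
            self._vault[token] = value
            self._by_value[value] = token
        return token

    def publish(self, private_rows, sensitive_fields):
        """Join private rows to their tokens, producing rows for the PUBLIC region."""
        public_rows = []
        for row in private_rows:
            out = dict(row)  # copy so the private rows are left untouched
            for field in sensitive_fields:
                out[field] = self.token_for(row[field])
            public_rows.append(out)
        return public_rows


# Private may read public, but public may only read public.
READ_RULES = {"private": {"private", "public"}, "public": {"public"}}


def region_can_read(reader: str, target: str) -> bool:
    return target in READ_RULES[reader]
```

Reusing the existing token for a repeated value is one simple guard against the duplicate tokens mentioned under Lessons Learned, though full lifecycle management needs much richer metadata.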
19.
In Summary
We needed many disciplines to come together:
• Platform Security (Lockdown)
• Hadoop Architecture (Lockdown)
• Identity Management (Lockdown)
• Network Security (Lockdown)
• Data Segmentation (Liberate)
• Data Tokenization (Liberate)
20.
Lessons Learned
• Our original focus was technology, but data privacy, governance, and declassification were our largest hurdles.
• Accountability across the enterprise is important.
• For Big Data, we haven’t achieved pure statistical anonymization, as this isn’t our core competency.
• Security classification of legacy source metadata is a challenge.
• Initial tokenization was a success. However:
– A mature tokenization solution is orders of magnitude more complex than anticipated: the margin of error and the penalty for error are both very high.
– The metadata needed for full token lifecycle management is unknown and complex.
– Implementing without the right metadata would likely result in duplicate tokens.
21.
Q&A
Speaker Notes
Mark
Cardinal Health is a multibillion-dollar healthcare services company. Actually, we like to say we’re the business behind healthcare because we focus on making it more cost-effective so our customers can focus on their patients. We work with pharmacies, hospitals, doctors’ offices, surgery centers and clinical labs, basically anywhere healthcare services are offered.
As a leading provider of products and services in the healthcare supply chain, we have the broadest view of healthcare in the industry:
We have more than 33,000 employees with direct operations around the world
We deliver products and services to 100,000 locations daily
85 percent of hospitals in the U.S. use Cardinal Health products and services
We supply pharmaceuticals to fill 25 percent of branded prescriptions in the U.S.
In fact, a third of all distributed pharmaceutical, laboratory and medical products in the U.S. and Puerto Rico flow through the Cardinal Health supply chain.
We are proud to be #19 on the Fortune 500 list.
Mark
How we use this data specifically for Big Data
Mark Jeff
Some view locking down areas of functionality as a bad thing. We should embrace lockdown much like we embrace the brakes on a car: brakes actually allow us to take more risks and improve agility.
Jeff Jeff Jeff Mark Mark Jeff
Data is ingested under a source-specific account.
The data ingestion process is loosely coupled with the transformation processes.
This afforded us finer-grained control over which users and processes have permission to access raw data.
This required us to develop atomic data patterns to avoid partial data products.
Jeff Jeff
This gave us the flexibility to segment ingestion privileges independently of any transformation. Jeff
Mark Mark Jeff Jeff Mark Mark