AWS Summit 2014 Melbourne - Breakout 2
Intel is contributing to a common security framework for Apache Hadoop, in the form of Project Rhino, which enables Hadoop to run workloads without compromising performance or security. Join this session to learn how your enterprise can take advantage of the security capabilities in the Intel Data Platform running on AWS to analyze data while ensuring technical safeguards that help you remain in compliance.
Presenter: Peter Kerney, Senior Solution Architect, Intel
3. Big Data Analytics in Health and Life Sciences
(Figure: from disparate data streams to integrated computing and data)
Now: disparate streams of data — genomics, clinical, claims & transactions, meds & labs, patient experience, personal data.
Next: integrated computing and data — clinical analysis and genomic analysis together, enabling better decisions and outcomes at reduced cost, and a shift from population-based to person-based treatment.
4. Cost Savings via Big Data Analytics
(Figure: estimated savings across stakeholders — provider, patient, payer, regulator, producer)
• Proven pathways of care: coordinated across providers, volume shifted to the right setting, reduced ER (re)admit rates
• Accelerated approval: $70B
• Accelerated discovery: $100B
• Provider performance transparency & payment innovation: $180B
• Personalized medicine and data-driven adherence: $100B
6. Technical Safeguards
Access Control: A covered entity must implement technical policies and procedures that allow only authorized persons to access electronic protected health information (e-PHI).
Audit Controls: A covered entity must implement hardware, software, and/or procedural mechanisms to record and examine access and other activity in information systems that contain or use e-PHI.
Integrity Controls: A covered entity must implement policies and procedures to ensure that e-PHI is not improperly altered or destroyed. Electronic measures must be put in place to confirm that e-PHI has not been improperly altered or destroyed.
Transmission Security: A covered entity must implement technical security measures that guard against unauthorized access to e-PHI that is being transmitted over an electronic network.
16. Deliver defense in depth
(Figure: layered security architecture)
• Perimeter: firewall and gateway
• Authentication (AuthN) and role-based access control
• Common authorization (AuthZ) via Project Rhino
• Consistent auditing, alerts, and isolation
• Encryption and key management
17. Protect Hadoop APIs
Covered APIs: HCatalog, Stargate (HBase REST), WebHDFS
• Enforces consistent security policies across all Hadoop services
• Serves as a trusted proxy to Hadoop, HBase, and WebHDFS APIs
• Common Criteria EAL4+, HSM, FIPS 140-2 certified
• Deploys as software, virtual appliance, or hardware appliance
• Available on AWS Marketplace
18. Provide role-based access control
(Diagram: AuthZ checks against the HBase _acl_ table)
• File, table, and cell-level access control in HBase
• JIRA HBASE-6222: Add per-KeyValue security
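The cell-level semantics can be pictured with a toy model — this is an illustration only, not the HBase API; the names `grant` and `canRead` are hypothetical:

```java
import java.util.*;

public class CellAclSketch {
    // Toy model: map each (row/column) cell to the set of users allowed to read it.
    static Map<String, Set<String>> acl = new HashMap<>();

    static void grant(String cell, String user) {
        acl.computeIfAbsent(cell, c -> new HashSet<>()).add(user);
    }

    static boolean canRead(String cell, String user) {
        return acl.getOrDefault(cell, Collections.emptySet()).contains(user);
    }

    public static void main(String[] args) {
        // A grant applies to one cell, not the whole table or column family
        grant("patient42/labs:hba1c", "dr_smith");
        System.out.println(canRead("patient42/labs:hba1c", "dr_smith")); // true
        System.out.println(canRead("patient42/labs:hba1c", "billing"));  // false
    }
}
```

In real HBase (0.98+), per-cell grants travel with the `Put` rather than living in a side table, but the access decision has the same shape: the reader's identity is checked against the cell's ACL before the value is returned.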
19. Provide encryption for data at rest
(Figure: MapReduce pipeline — RecordReader → Map → Combiner → Partitioner → local merge & sort → Reduce → RecordWriter — with decrypt/derivative/encrypt steps at each HDFS read and write)
• Extends the compression codec into a crypto codec
• Provides an abstract API for general use
21. Pig & Hive Encryption
• Pig Encryption Capabilities
– Supports text and Avro* file formats
– Protects intermediate job output files
– Pluggable key retrieval and key resolution
– Protects key distribution within the cluster
• Hive Encryption Capabilities
– Supports RC and Avro file formats
– Encrypts intermediate and final output data
– Encryption is transparent to the end user; no changes to existing SQL
22. Crypto Codec Framework
• Extends the compression codec
• Establishes a common abstraction at the API level that can be shared by all crypto codec implementations

// Instantiate the configured codec reflectively, attach key material
// via a CryptoContext, then wrap the raw input stream:
CryptoCodec cryptoCodec = (CryptoCodec) ReflectionUtils.newInstance(codecClass, conf);
CryptoContext cryptoContext = new CryptoContext();
...
cryptoCodec.setCryptoContext(cryptoContext);
CompressionInputStream input = cryptoCodec.createInputStream(inputStream);
...

• Provides a foundation for other components in Hadoop* such as MapReduce or HBase* to support encryption features
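The wrap-the-stream pattern behind the codec can be sketched with the standard `javax.crypto` stream classes — a minimal stand-in for the CryptoCodec API, not Rhino's implementation; the all-zero key and IV are demo values only:

```java
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

public class CodecSketch {
    public static void main(String[] args) throws Exception {
        byte[] key = new byte[16]; // demo key: all zeros (never do this in production)
        byte[] iv  = new byte[16];
        SecretKeySpec k = new SecretKeySpec(key, "AES");

        // Write path: wrap the destination stream, just as a compression
        // codec wraps it -- callers keep writing plain bytes.
        Cipher enc = Cipher.getInstance("AES/CBC/PKCS5Padding");
        enc.init(Cipher.ENCRYPT_MODE, k, new IvParameterSpec(iv));
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (CipherOutputStream out = new CipherOutputStream(sink, enc)) {
            out.write("hello hdfs".getBytes("UTF-8"));
        }

        // Read path: wrap the source stream symmetrically.
        Cipher dec = Cipher.getInstance("AES/CBC/PKCS5Padding");
        dec.init(Cipher.DECRYPT_MODE, k, new IvParameterSpec(iv));
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (CipherInputStream in = new CipherInputStream(
                new ByteArrayInputStream(sink.toByteArray()), dec)) {
            int b;
            while ((b = in.read()) != -1) plain.write(b);
        }
        System.out.println(new String(plain.toByteArray(), "UTF-8")); // hello hdfs
    }
}
```

Because both paths are plain `InputStream`/`OutputStream` wrappers, a framework can swap a crypto codec in wherever a compression codec already fits.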
23. Key Distribution
• Enabling the crypto codec in a MapReduce job
• Enabling different key storage or management systems
• Allowing different stages and files to use different keys
• API to integrate with external key management systems
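A pluggable key-resolution hook of the kind described above might look like the following sketch — `KeyResolver` and `MapKeyResolver` are hypothetical names for illustration, not Rhino's API; a real implementation would call out to an external KMS instead of an in-memory map:

```java
import java.util.HashMap;
import java.util.Map;

public class KeyResolverSketch {
    // Hypothetical pluggable interface: resolve a key by context
    // (a file path, or a job stage such as intermediate vs. final output).
    interface KeyResolver {
        byte[] resolve(String context);
    }

    // In-memory backing store for the sketch; swap in a KMS client in practice.
    static class MapKeyResolver implements KeyResolver {
        private final Map<String, byte[]> store = new HashMap<>();
        void register(String context, byte[] key) { store.put(context, key); }
        public byte[] resolve(String context) {
            byte[] k = store.get(context);
            if (k == null) throw new IllegalArgumentException("no key for " + context);
            return k;
        }
    }

    public static void main(String[] args) {
        MapKeyResolver resolver = new MapKeyResolver();
        // Different stages get different keys, as the slide describes
        resolver.register("job1/intermediate", new byte[] {1, 2, 3});
        resolver.register("job1/final",        new byte[] {4, 5, 6});
        System.out.println(resolver.resolve("job1/final")[0]); // 4
    }
}
```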
24. Crypto Software Optimization
Multi-Buffer Crypto
• Processes multiple independent data buffers in parallel
• Improves cryptographic throughput by up to 2-9x
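The multi-buffer idea — independent buffers with no chaining dependency can be encrypted concurrently — can be sketched at thread level with a parallel stream. This is illustrative only; Intel's multi-buffer technique actually interleaves work at the instruction level within a single core:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

public class MultiBufferSketch {
    public static void main(String[] args) throws Exception {
        SecretKeySpec key = new SecretKeySpec(new byte[16], "AES"); // demo key only
        List<byte[]> buffers = new ArrayList<>();
        for (int i = 0; i < 8; i++) buffers.add(new byte[4096]);

        // Each buffer is independent, so encryption tasks need no
        // coordination and can run fully in parallel.
        List<byte[]> encrypted = buffers.parallelStream().map(buf -> {
            try {
                Cipher c = Cipher.getInstance("AES/ECB/NoPadding"); // one cipher per task
                c.init(Cipher.ENCRYPT_MODE, key);
                return c.doFinal(buf);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }).collect(Collectors.toList());

        System.out.println(encrypted.size()); // 8
    }
}
```

Note the chaining caveat: modes like CBC serialize blocks within one buffer, which is exactly why parallelism across *independent* buffers is where multi-buffer crypto finds its speedup.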
25. Intel® Data Protection Technology
AES-NI (Advanced Encryption Standard New Instructions)
• Processor assistance for performing AES encryption
• Makes enabled encryption software faster and stronger
Secure Key (previously known as Intel Digital Random Number Generator, DRNG)
• Processor-based true random number generator
• More secure, standards-compliant, high performance
Applications:
• Data in motion: secure transactions used pervasively in e-commerce, banking, etc.
• Data at rest: full-disk encryption software protects data while saving to disk
• Data in process: most enterprise and cloud applications offer encryption options to secure information and protect confidentiality
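Both features are reachable from ordinary Java code: HotSpot compiles the standard JCE AES path down to AES-NI instructions on supporting CPUs, and `SecureRandom` draws on the platform RNG (backed by RDRAND/Secure Key where available). A minimal round-trip sketch:

```java
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class AesNiSketch {
    public static void main(String[] args) throws Exception {
        // Key and nonce from the platform RNG
        SecureRandom rng = new SecureRandom();
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128, rng);
        SecretKey key = kg.generateKey();
        byte[] iv = new byte[12];
        rng.nextBytes(iv);

        // On AES-NI capable CPUs the JVM runs this via hardware intrinsics;
        // the code itself is plain, portable JCE.
        Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ct = enc.doFinal("e-PHI record".getBytes("UTF-8"));

        Cipher dec = Cipher.getInstance("AES/GCM/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
        System.out.println(new String(dec.doFinal(ct), "UTF-8")); // e-PHI record
    }
}
```

No flags are required to opt in: the acceleration is transparent, which is why "enabled encryption software" gets faster simply by running on newer Xeon hardware.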
26. Intel® AES-NI Accelerated Encryption
(Chart: relative speed of crypto functions, higher is better; based on Intel tests — up to ~20x faster crypto)
• Encryption/decryption with Intel® AES-NI vs. without: 18.2x/19.8x
• Intel® AES-NI with multi-buffer, encryption/decryption: 5.3x/19.8x
AES-NI - Advanced Encryption Standard New Instructions
27. Cloud Platform for secure Hadoop
Intel® Xeon® Processors
• E7 Family
• E5 Family
• E3 Family
Amazon
• EC2 Reserved Instances
• EC2 Dedicated Instances
30. For more information
• intel.com/bigdata
• intel.com/healthcare/bigdata
• github.com/intel-hadoop/project-rhino/
• aws.amazon.com/compliance/
• aws.amazon.com/ec2/instance-types/