© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Nasdaq and the Nasdaq logo are registered and unregistered trademarks, or service marks, of Nasdaq, Inc. or its subsidiaries in the U.S. and other countries.
Hadoop, Hive, Spark, Parquet, and Zeppelin are registered and unregistered trademarks of the Apache Software Foundation in the U.S. and other countries.
Moataz Anany, Solutions Architect, AWS
Nate Sammons, Principal Architect, Nasdaq
November 29, 2016
SEC308
Securing Enterprise Big Data Workloads on AWS
What to expect from this session
Hybrid enterprise data warehouse: A typical architecture
Apply security controls across this architecture
How it’s done at Nasdaq
A snorkel or a deep dive?
Effective security starts with a plan
“In security engineering, you first need to…
define the threat model, then create a security policy,
and only then choose security technologies that suit”
– Bruce Schneier*
* Secrets and Lies: Digital Security in a Networked World
https://www.amazon.com/Secrets-Lies-Digital-Security-Networked/dp/1119092434/
A hybrid enterprise data warehouse
A typical hybrid enterprise data warehouse
[Diagram: enterprise data sources in the corporate data center connect over AWS Direct Connect to the AWS Cloud, which hosts Amazon S3, Amazon EMR, Amazon Redshift, and Amazon QuickSight. Data engineers, data scientists, and business end users work with the data in three numbered steps.]
1. Extract, upload, and transform
2. Explore, analyze, and manipulate
3. Query and visualize
How do you make it secure?
Start at the foundation
• AWS Identity and Access Management
• Amazon Virtual Private Cloud
Configure IAM
IAM – a quick refresher
• Manage users and groups
• Powerful policy language
• Role-based access to API actions
• AWS-managed policy templates
{
  "Statement": [
    {
      "Effect": "effect",
      "Principal": "principal",
      "Action": "action",
      "Resource": "arn",
      "Condition": {
        "condition": {
          "key": "value"
        }
      }
    }
  ]
}
Structure of IAM policy statement
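As a rough illustration of the statement structure above, the following Python sketch fills in the placeholders and prints the resulting policy document. The bucket name, prefix, and IP range are hypothetical, and the Principal element is left out on the assumption that this is an identity-based policy attached to a user, group, or role.

```python
import json


def make_statement(effect, actions, resources, condition=None):
    """Build one IAM policy statement as a plain dict."""
    statement = {"Effect": effect, "Action": actions, "Resource": resources}
    if condition:
        statement["Condition"] = condition
    return statement


policy = {
    "Version": "2012-10-17",
    "Statement": [
        make_statement(
            effect="Allow",
            actions=["s3:GetObject"],
            # Hypothetical bucket and prefix, for illustration only.
            resources=["arn:aws:s3:::example-warehouse-bucket/reports/*"],
            # Condition operator -> condition key -> value.
            condition={"IpAddress": {"aws:SourceIp": "203.0.113.0/24"}},
        )
    ],
}

print(json.dumps(policy, indent=2))
```

The nesting of the Condition element mirrors the template: the outer key is the condition operator, the inner key is the condition key.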
Configure IAM
AWS account
Amazon EMR API actions: RunJobFlow, DescribeJobFlows, TerminateJobFlows, ListClusters, …
Amazon Redshift API actions: CreateCluster, DescribeClusters, ModifyCluster, DeleteCluster, …
Amazon S3 bucket API actions: CreateBucket, DeleteBucket, …
Amazon S3 object API actions: PutObject, GetObject, …
IAM principals: roles, groups, users, accounts
Configure IAM
Build IAM policies that match common activities
Access to Amazon S3
  - Administration (IAM)
  - Data read/write (IAM)
Access to Amazon EMR
  - Cluster management (IAM)
  - Running batch transient jobs (IAM)
  - In-cluster activity (Hadoop AuthN/AuthZ)
  - Client access (Hadoop AuthN/AuthZ)
Access to Amazon Redshift
  - Cluster management (IAM)
  - Authorizing COPY/UNLOAD (IAM)
  - In-cluster activity (Amazon Redshift AuthN/AuthZ)
Configure IAM
Define AWS Identities and attach policies
• Define IAM users, groups, and roles
• Provide least privilege IAM access to Amazon S3, EMR, and Amazon Redshift
• Simulate and verify IAM policies
AWS identity | S3 prefix “/…/...” | Amazon Redshift cluster “aaa” | Amazon EMR | AWS IAM
User X | <Policy doc IDs> | No access | No access | No access
Group Y | <Policy doc IDs> | <Policy doc IDs> | <Policy doc IDs> | <Policy doc IDs>
Role Z | <Policy doc IDs> | <Policy doc IDs> | No access | No access
…
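One way to keep such an access control matrix reviewable is to hold it as data and derive the policy attachments from it. The sketch below encodes the three example rows in Python; the policy document IDs are invented placeholders, and the helper names are hypothetical.

```python
NO_ACCESS = None

# Rows are IAM identities, columns are resources.
# All policy document IDs below are hypothetical placeholders.
ACCESS_MATRIX = {
    "User X": {
        "s3_prefix": "s3-prefix-read",
        "redshift_cluster": NO_ACCESS,
        "emr": NO_ACCESS,
        "iam": NO_ACCESS,
    },
    "Group Y": {
        "s3_prefix": "s3-prefix-read-write",
        "redshift_cluster": "redshift-manage",
        "emr": "emr-manage",
        "iam": "iam-read-only",
    },
    "Role Z": {
        "s3_prefix": "s3-prefix-read",
        "redshift_cluster": "redshift-copy-unload",
        "emr": NO_ACCESS,
        "iam": NO_ACCESS,
    },
}


def has_access(identity, resource):
    """True if the matrix grants the identity any policy for the resource."""
    return ACCESS_MATRIX[identity][resource] is not NO_ACCESS


def policies_for(identity):
    """Sorted list of policy document IDs to attach to the identity."""
    row = ACCESS_MATRIX[identity]
    return sorted({doc for doc in row.values() if doc is not NO_ACCESS})


print(policies_for("Role Z"))
```

A matrix held this way can be diffed in code review before any policy is attached, which complements simulating the policies in IAM itself.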
Configure IAM
Layer security controls around sensitive API actions
Use IAM policy conditions to...
• Require MFA for destructive API actions
  - s3:DeleteBucket
  - redshift:DeleteCluster
  - elasticmapreduce:TerminateJobFlows
• Add preconditions such as source IP address or time of day
[Diagram: sensitive APIs wrapped in policy conditions and MFA]
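A minimal sketch of the two kinds of conditions, written as Python that emits the policy JSON. It assumes explicit Deny statements are attached alongside whatever Allow policies already exist; the CIDR range is hypothetical, and the IAM action name for terminating EMR clusters is taken to be elasticmapreduce:TerminateJobFlows.

```python
import json

DESTRUCTIVE_ACTIONS = [
    "s3:DeleteBucket",
    "redshift:DeleteCluster",
    "elasticmapreduce:TerminateJobFlows",
]


def deny_without_mfa(actions):
    """Deny the actions unless the request was authenticated with MFA."""
    return {
        "Sid": "DenyDestructiveActionsWithoutMFA",
        "Effect": "Deny",
        "Action": list(actions),
        "Resource": "*",
        # BoolIfExists also matches requests where the key is absent.
        "Condition": {"BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}},
    }


def deny_outside_network(actions, allowed_cidrs):
    """Deny the actions when the caller's IP is outside the allowed ranges."""
    return {
        "Sid": "DenyDestructiveActionsOutsideNetwork",
        "Effect": "Deny",
        "Action": list(actions),
        "Resource": "*",
        "Condition": {"NotIpAddress": {"aws:SourceIp": list(allowed_cidrs)}},
    }


policy = {
    "Version": "2012-10-17",
    "Statement": [
        deny_without_mfa(DESTRUCTIVE_ACTIONS),
        # Hypothetical corporate egress range.
        deny_outside_network(DESTRUCTIVE_ACTIONS, ["203.0.113.0/24"]),
    ],
}

print(json.dumps(policy, indent=2))
```

Using Deny rather than a conditional Allow means the guard still holds if a broader Allow is attached later.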
Configure IAM
Customize service IAM roles for Amazon EMR
• EMR creates two default IAM roles
• Default roles are assumed by EMR
• AWS-managed policies are attached to default roles
• Understand default policies and customize new ones
[Diagram: Amazon EMR and AWS IAM shown alongside Amazon S3, Amazon EC2, Amazon SQS, Amazon SNS, and Amazon CloudWatch]
Start at the foundation
• AWS Identity and Access Management
• Amazon Virtual Private Cloud
Launch clusters in private VPC subnets
[Diagram: VPC layout within an AWS region.
 - Corporate data center: enterprise data sources, data scientists, business end users, customer router / firewall
 - Connectivity: AWS Direct Connect, virtual private gateway
 - Multiple private subnets: Amazon Redshift cluster, EMR cluster
 - Public subnet: VPC NAT gateway, internet gateway
 - Amazon Redshift and EMR data traffic: S3 VPC endpoint to Amazon S3
 - Traffic to AWS endpoints: AWS KMS, Amazon DynamoDB, Amazon SQS
 - Also shown: AWS CloudHSM, Elastic Load Balancing, proxy farm]
Launch clusters in private VPC subnets
[Diagram: the same VPC layout. The Amazon Redshift and Amazon EMR clusters sit in a private VPC subnet, reach Amazon S3 through the S3 VPC endpoint, and communicate with other AWS service endpoints (AWS KMS, Amazon DynamoDB, Amazon SQS) through the VPC NAT gateway and internet gateway in a public VPC subnet. The corporate data center connects through the customer gateway, AWS Direct Connect, and the virtual private gateway. AWS CloudHSM is also shown.]
Key security benefits
• Data flows stay private, traversing your VPC
• Multiple network traffic “choke points”
• Traffic logging with VPC Flow Logs
• Dedicated tenancy is possible
Protect your data with access control
• Amazon S3
• Amazon Redshift
• Amazon EMR
Control access to data
Access control in a multi-team environment?
Key goals:
• Secure and segregated access to…
  - Amazon S3
  - Amazon Redshift clusters
  - Amazon EMR clusters
• Secure data sharing between teams
Control access to data
“Fine-grained” data and resource ownership
• Teams share S3 buckets and clusters
• Access control complex to set up and maintain
• Common in a “shared services” architecture
[Diagram: “Fine-grained” ownership. Team X, Team Y, and Team Z share:
 - an Amazon EMR cluster: local FS (/foo/bar, /abc/xyz, /local), HDFS (hdfs:///data/1st, hdfs:///data2), EMRFS (s3://bucket/prfx, s3://group/data), and applications (Zeppelin, Presto, Hive, …)
 - Amazon S3 buckets
 - an Amazon Redshift cluster: databases and schemas]
Control access to data
Prefer “coarse-grained” data and resource ownership
• Teams own entire S3 buckets and clusters
• Ownership segregated by AWS accounts
• Access control easier to set up and maintain
• Suitable for autonomous teams
[Diagram: “Coarse-grained” ownership. Team X owns its own Amazon S3 buckets and prefixes, Amazon EMR clusters, and Amazon Redshift clusters.]
Control access to data
Configure Amazon S3 permissions
• Implement your access control matrix using IAM policies
• Use S3 bucket policies for easy cross-account data sharing
• Limit role-based access from an Amazon EMR cluster’s EC2 instance profile
• Authorize Amazon Redshift COPY and UNLOAD commands using IAM roles
[Diagram: IAM principals accessing Amazon S3, Amazon Redshift, and Amazon EMR]
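For the cross-account sharing bullet, a bucket policy might look roughly like the one this Python sketch generates. The bucket name, prefix, and 12-digit account ID are hypothetical, and granting to the account root principal is an assumption; the reading account still has to delegate access to its own users and roles through IAM.

```python
import json


def cross_account_read_policy(bucket, reader_account_id, prefix=""):
    """Bucket policy letting another AWS account list and read a prefix."""
    principal = {"AWS": f"arn:aws:iam::{reader_account_id}:root"}
    list_statement = {
        "Sid": "AllowListing",
        "Effect": "Allow",
        "Principal": principal,
        "Action": "s3:ListBucket",
        "Resource": f"arn:aws:s3:::{bucket}",
    }
    if prefix:
        # Restrict listing to keys under the shared prefix.
        list_statement["Condition"] = {"StringLike": {"s3:prefix": f"{prefix}*"}}
    read_statement = {
        "Sid": "AllowObjectReads",
        "Effect": "Allow",
        "Principal": principal,
        "Action": "s3:GetObject",
        "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
    }
    return {"Version": "2012-10-17", "Statement": [list_statement, read_statement]}


# Hypothetical values, for illustration only.
policy = cross_account_read_policy("example-warehouse-bucket", "111122223333", "shared/")
print(json.dumps(policy, indent=2))
```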
Control access to data
Configure AuthN and AuthZ in Amazon EMR
• Enable “Secure Mode” in Hadoop
• Set up and configure Kerberos authentication
• Configure Hadoop ACLs for authorization
• Optionally integrate EMR with Apache Ranger or a similar security framework
[Logo: MIT Kerberos]
Control access to data
Configure AuthN and AuthZ in Amazon Redshift
• Amazon Redshift is based on PostgreSQL
• GRANT or REVOKE fine-grained permissions on databases, schemas, tables, and other objects
• Set secure default privileges for new objects using the ALTER DEFAULT PRIVILEGES command
• Verify privileges using the SET SESSION AUTHORIZATION command
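The three commands above could be combined along the following lines. This Python sketch only generates SQL text, it does not connect to a cluster; the schema, group, user, and table names are hypothetical, and the statements should be checked against the Amazon Redshift SQL reference before use.

```python
import re

IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_]*$")


def _checked(name):
    """Reject anything that is not a simple lowercase SQL identifier."""
    if not IDENTIFIER.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def schema_read_grants(schema, group):
    """SQL granting a group read access to current and future tables."""
    schema, group = _checked(schema), _checked(group)
    return [
        f"GRANT USAGE ON SCHEMA {schema} TO GROUP {group};",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO GROUP {group};",
        f"ALTER DEFAULT PRIVILEGES IN SCHEMA {schema} "
        f"GRANT SELECT ON TABLES TO GROUP {group};",
    ]


def verify_as(user, probe_sql):
    """SQL that runs a probe query as another user, then switches back."""
    return [
        f"SET SESSION AUTHORIZATION '{_checked(user)}';",
        probe_sql,
        "SET SESSION AUTHORIZATION DEFAULT;",
    ]


# Hypothetical names, for illustration only.
for statement in schema_read_grants("market_data", "analysts"):
    print(statement)
for statement in verify_as("alice", "SELECT COUNT(*) FROM market_data.trades;"):
    print(statement)
```

Running the probe as a superuser after SET SESSION AUTHORIZATION shows what the target user would actually see, without needing that user's credentials.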
Protect your data with encryption
• Amazon S3
• Amazon Redshift
• Amazon EMR
Encrypt data at rest
In a nutshell…
1. Decide on an encryption key management strategy
2. Pick encryption mode for Amazon S3 objects
3. Configure encryption in Amazon EMR
4. Launch an encrypted Amazon Redshift cluster
Encrypt data at rest
Decide on an encryption key management strategy
• AWS Key Management Service (AWS KMS)
• AWS service managed keys
• Custom key management system
• AWS CloudHSM
Encrypt data at rest
What is AWS KMS?
• Simplifies creation, import, control, rotation, deletion, and use of encryption keys
• Integrated with AWS client-side and server-side encryption
• Integrated with AWS CloudTrail
Encrypt data at rest
Decide on an encryption key management strategy
Do I have to manage my encryption keys? | Do I need dedicated key management hardware? | Do I have to manage my keys on premises? | Strategy
No | No | No | Use AWS service managed keys
Yes | No | No | Use AWS KMS
Yes | Yes | No | Use AWS CloudHSM
Yes | No | Yes | Use own KMS
Yes | Yes | Yes | Use own HSM
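The decision table above is small enough to express as a lookup. The Python sketch below encodes exactly the five rows shown and returns None for any combination the table does not cover; the function and variable names are hypothetical.

```python
# Keys: (manage own keys?, dedicated hardware?, keys on premises?)
KEY_STRATEGIES = {
    (False, False, False): "Use AWS service managed keys",
    (True, False, False): "Use AWS KMS",
    (True, True, False): "Use AWS CloudHSM",
    (True, False, True): "Use own KMS",
    (True, True, True): "Use own HSM",
}


def key_management_strategy(manage_keys, dedicated_hardware, on_premises):
    """Return the strategy from the table, or None if the row is not listed."""
    return KEY_STRATEGIES.get((manage_keys, dedicated_hardware, on_premises))


print(key_management_strategy(True, False, False))
```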
Encrypt data at rest
Pick encryption mode for Amazon S3 objects
Where and when do I need to encrypt my data for S3?
• Before upload, after download – S3 client-side encryption
• After upload, before download – S3 server-side encryption
Encrypt data at rest
Pick encryption mode for Amazon S3 objects
Encryption point? | Key management: AWS KMS | Key management: custom KMS | Key management: S3 built-in
Client side | CSE-KMS | CSE-C | -
Server side | SSE-KMS | SSE-C | SSE-S3
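Read as a lookup, the grid maps an encryption point and a key management choice to an S3 encryption mode. A small Python sketch of that mapping follows; the key labels ("client", "server", "custom") are hypothetical shorthand for the grid's row and column headings.

```python
# (encryption point, key management) -> S3 encryption mode
S3_ENCRYPTION_MODES = {
    ("client", "AWS KMS"): "CSE-KMS",
    ("client", "custom"): "CSE-C",
    ("server", "AWS KMS"): "SSE-KMS",
    ("server", "custom"): "SSE-C",
    ("server", "S3 built-in"): "SSE-S3",
}


def s3_encryption_mode(encryption_point, key_management):
    """Return the mode for the combination, or None if there is none."""
    return S3_ENCRYPTION_MODES.get((encryption_point, key_management))


print(s3_encryption_mode("client", "AWS KMS"))
print(s3_encryption_mode("client", "S3 built-in"))
```

The second call returns None because S3-managed keys only exist for server-side encryption.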
Encrypt data at rest
Configure encryption in Amazon EMR
EMRFS encryption
• Supports S3 client-side and server-side modes
• ... except SSE-C
• SSE and CSE modes mutually exclusive
• In-transit encryption with TLS
[Diagram: an Amazon EMR cluster with a master node (root volume, data volume) and core nodes (root volume, data volumes). Hive, Hadoop MapReduce, Spark, and other daemons use the HDFS client and the EMRFS client; the EMRFS client connects to Amazon S3. A Hive metastore database is also shown.]
Encrypt data at rest
Configure encryption in Amazon EMR
Local volume encryption
• Instance store split into virtual root and data volumes
• Root volume not encryptable
• Data volumes encryptable with LUKS*
[Diagram: same Amazon EMR cluster layout as on the EMRFS encryption slide]
* Linux Unified Key Setup disk encryption
Encrypt data at rest
Configure encryption in Amazon EMR
Volume encryption key management
• Use AWS KMS as your key provider
• Or use a custom key provider application
[Diagram: same Amazon EMR cluster layout as on the EMRFS encryption slide]
Encrypt data at rest
Configure encryption in Amazon EMR
HDFS encryption
• Local volume encryption enables HDFS block transfers and RPC traffic encryption
• Open-source HDFS transparent encryption
  - Finer-grained control
  - End-to-end encryption
[Diagram: same Amazon EMR cluster layout as on the EMRFS encryption slide]
Encrypt data at rest
Configure encryption in Amazon EMR
Create a managed “security configuration” object...
• Configure EMRFS and local-volume encryption at rest
• Configure encryption in transit
At cluster creation time...
• Reference a managed security configuration
• If needed, configure HDFS transparent encryption
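A security configuration is supplied to EMR as a JSON document. The Python sketch below assembles one covering EMRFS encryption, local-volume encryption with AWS KMS, and TLS in transit. The field names reflect my understanding of the EMR security configuration format and should be verified against the current EMR documentation; the key ARN and certificate location are hypothetical, and custom key providers are left out.

```python
import json


def security_configuration(s3_mode, kms_key_arn, tls_certs_s3_uri):
    """Assemble an EMR security configuration document as a dict."""
    # SSE and CSE are mutually exclusive, so exactly one mode is chosen.
    if s3_mode not in ("SSE-S3", "SSE-KMS", "CSE-KMS"):
        raise ValueError(f"unsupported S3 encryption mode: {s3_mode}")
    s3_config = {"EncryptionMode": s3_mode}
    if s3_mode != "SSE-S3":
        s3_config["AwsKmsKey"] = kms_key_arn
    return {
        "EncryptionConfiguration": {
            "EnableAtRestEncryption": True,
            "EnableInTransitEncryption": True,
            "AtRestEncryptionConfiguration": {
                "S3EncryptionConfiguration": s3_config,
                "LocalDiskEncryptionConfiguration": {
                    "EncryptionKeyProviderType": "AwsKms",
                    "AwsKmsKey": kms_key_arn,
                },
            },
            "InTransitEncryptionConfiguration": {
                "TLSCertificateConfiguration": {
                    "CertificateProviderType": "PEM",
                    "S3Object": tls_certs_s3_uri,
                }
            },
        }
    }


# Hypothetical key ARN and certificate bundle location.
config = security_configuration(
    "SSE-KMS",
    "arn:aws:kms:us-east-1:111122223333:key/example-key-id",
    "s3://example-config-bucket/certs/my-certs.zip",
)
print(json.dumps(config, indent=2))
```

The printed JSON is what would be registered as a named security configuration and then referenced at cluster creation time.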
Encrypt data at rest
Launch an encrypted Amazon Redshift cluster
• Four-tier key hierarchy
• AES algorithm with 256-bit keys
• Use AWS KMS or HSM
• Control rotation of encryption keys
• Blocks backed up to S3 are encrypted
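[Diagram: Amazon Redshift cluster encrypted at rest with the four-tier key hierarchy; labels show JDBC/ODBC client connections, the 10 GigE (HPC) interconnect, and backup]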
Encrypt data in transit
Protect data flows
Point “A” | Point “B” | Data flow protection
Enterprise data sources | Amazon S3 | Encrypted with SSL/TLS; S3 requests signed with AWS SigV4
Amazon S3 | Amazon EMR | Encrypted with SSL/TLS
Amazon S3 | Amazon Redshift | Encrypted with SSL/TLS
Amazon EMR | Clients | Encrypted with SSL/TLS; varies with Hadoop application client
Amazon Redshift | Clients | Supports SSL/TLS; requires configuration
Apache Hadoop on Amazon EMR
• Hadoop RPC encryption
• HDFS Block data transfer encryption
• KMS over HTTPS is not enabled by default with Hadoop KMS
• May vary with EMR release (such as Tez and Spark in release 5.0.0+)
How it’s done at Nasdaq
What to expect from this session
Introduction
Choices made on our path:
• Amazon Redshift
• Amazon EMR
Future directions for big data at Nasdaq
• Nasdaq lists 3,700 global companies worth $9.3 trillion in market cap, representing diverse industries and many of the world’s most well-known and innovative brands
• More than U.S. $1 trillion in notional value is tied to our library of more than 43,000 global indexes
• Nasdaq technology is used to power more than 100 marketplaces in 50 countries
• Our global platform can handle more than 1 million messages/second at sub-40 microsecond average speeds
• We own and operate 33 markets, including 1 clearinghouse and 5 central securities depositories, across asset classes and geographies
Amazon Redshift at Nasdaq
• In use since Amazon Redshift was in beta
• Nasdaq’s main data warehousing workhorse
• Daily ingest from 100s of internal sources, 6-20B rows/day
• Current footprint: 18x ds2.8xlarge instances, 3 trillion rows
• Highly sensitive data:
• All orders, quotes, trades, etc. from all Nasdaq exchanges
• Membership and ownership information
[Chart: Daily ingest in billions of rows]
Amazon Redshift workloads
• Billing and reporting
• Market surveillance
• Economic research
• Trade history queries
Amazon Redshift network security
• Clusters inside VPC subnets
• Locked down security groups
• VPC endpoint for Amazon S3 access
• SSL required for connectivity
• SSL certificate for each Amazon Redshift cluster
• 10 G AWS Direct Connect circuits to Nasdaq
• On-premises firewalls also limit access
[Diagram: business end users reach Amazon Redshift in a VPC subnet over AWS Direct Connect; Amazon Redshift reaches S3 through the S3 VPC endpoint and AWS APIs through the VPC NAT gateway in a public VPC subnet]
Amazon Redshift data security
Amazon Redshift clusters
• Encryption keys in an on-premises HSM
• Amazon Redshift has a minimal IAM policy
Amazon S3 data
• Encrypted using S3-CSE
• Custom key management system
• Keys are stored on premises at Nasdaq
[Diagram: the on-premises HSM, KMS, and data ingest systems connect over AWS Direct Connect to Amazon Redshift in a VPC subnet; S3 is reached through the S3 endpoint]
Amazon Redshift encryption key management
On-premises HSM
• Physical separation for keys
• Requires an EIP for Amazon Redshift
• HSMs are delicate and require special handling
AWS KMS
• Policy-based key rotation
• IAM policies for usage
• AWS CloudTrail usage logs
• High durability storage
• Support for more AWS services (Amazon EBS, Amazon RDS, etc.)
• Need to trust AWS
Amazon Redshift access control and monitoring
• Write access allowed only for the data ingest system
• Users granted access to specific schemas
• Users granted specific WLM constraints
• Monitor STL_CONNECTION_LOG for access
• Logs in S3 pulled on-premises for analysis
• Amazon Redshift activity logging
• CloudTrail API logs
• VPC Flow Logs
[Diagram: AWS CloudTrail, Amazon S3, Amazon CloudWatch, and Amazon Redshift]
Managing Amazon Redshift cluster resources
Initially we never purged any data
• Led to growing clusters once per quarter
Now we maintain a 1-year rolling window in Amazon Redshift
• Older data is accessed infrequently
• Resizing a large Amazon Redshift cluster is not instantaneous
• Grow clusters based on market volumes, acquisitions
• This led us to extend our warehouse to EMR and S3
Amazon EMR at Nasdaq
Gaining traction internally
• Building an open data platform
• Parallel daily loads of data for Amazon EMR and Amazon Redshift
• Data stored as encrypted Parquet files in Amazon S3
Keep data “forever”
• Current footprint is 5.1 million objects, 500 TB
• Approximately 6.5 trillion rows since January 2014
• Backfilling data from the 1990s, around 1.5 PB
Hadoop file formats
Evaluated Parquet and ORC
• Arrived at Apache Parquet
Benefits
• Modern columnar format with good compression
• “Self-describing” format
• Growing support across open source projects
• Works with our two main use cases: Spark and Presto
• Good performance when encrypted
Amazon EMR workloads
Apache Spark and Zeppelin
• Economic research
• Market surveillance
• Machine learning
Presto from Facebook
• Trade history queries
• BI and reporting (experimental)
Amazon EMR data strategy
Decouple storage and compute
• Scale each as needed
• Data stored centrally in Amazon S3
Hive directory structure in S3
• Easy partitioning of time series data by date
• Fine-grained access control using bucket policies
• Cross-account access using bucket policies
• Use “MSCK REPAIR TABLE” to rebuild metastore
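To make the date-partitioned Hive layout concrete, here is a small Python sketch that builds partition prefixes and the repair statement. The bucket name, table name, and the partition column name trade_date are hypothetical; only the `column=value/` directory convention and the MSCK REPAIR TABLE command come from Hive itself.

```python
from datetime import date, timedelta


def partition_prefix(bucket, table, day, column="trade_date"):
    """S3 prefix for one daily partition in Hive's column=value layout."""
    return f"s3://{bucket}/{table}/{column}={day.isoformat()}/"


def partitions_between(bucket, table, first_day, last_day):
    """Prefixes for every day from first_day to last_day, inclusive."""
    span = (last_day - first_day).days
    return [
        partition_prefix(bucket, table, first_day + timedelta(days=offset))
        for offset in range(span + 1)
    ]


def repair_statement(table):
    """HiveQL that rediscovers partitions already present in storage."""
    return f"MSCK REPAIR TABLE {table};"


# Hypothetical bucket and table names.
for prefix in partitions_between(
    "example-warehouse-bucket", "trades", date(2016, 11, 28), date(2016, 11, 29)
):
    print(prefix)
print(repair_statement("trades"))
```

Because each day lives under its own prefix, the same prefixes can serve as the unit for bucket-policy resources and for partition pruning in queries.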
Multi-account Amazon EMR strategy
One AWS account per use-case or internal department
• Balance time and money for their own needs
• No resource contention between clients
• Cross-account bucket policies for S3 access
• VPC peering for Hive metastore access
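[Diagram: in the client AWS account, an Amazon EMR cluster runs in a private VPC subnet with an S3 endpoint; in the central AWS account, the Hive metastore runs in a private VPC subnet alongside Amazon S3; the two VPCs are connected by VPC peering]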
Amazon EMR network security
• Clusters in private VPC subnets
• S3 access via VPC API endpoint
• AWS API access via NAT gateway
• Locked down security groups
• 10 G AWS Direct Connect circuits to Nasdaq
• On-premises firewalls
[Diagram: business end users reach the Amazon EMR cluster in a private VPC subnet over AWS Direct Connect; the cluster reaches S3 through the S3 VPC endpoint and AWS APIs through the VPC NAT gateway in a public VPC subnet]
Amazon EMR cluster security
Clusters are ephemeral
• No long-running clusters
• HDFS used only for scratch space
• Permanent data stored in S3
Local instance security
• New EMR security configuration
• Disk encryption using AWS KMS
• SELinux setup with an EMR bootstrap action (BA) script
[Diagram: same Amazon EMR network layout as on the Amazon EMR network security slide]
Amazon S3 data security with EMR
EMRFS: Amazon S3 as HDFS
• S3-CSE integrated as part of EMRFS
• Custom S3 encryption materials provider jar
• Seeking within objects stored in S3 works well and is critical for performance
Multi-account access control
• S3 bucket policies control access
• Able to limit access to specific schemas and tables
Apache Spark data security
EMRFS on S3 “just works” with Spark
• Simple configurations for S3-CSE
• EMR security configuration for local disk encryption
Apache Zeppelin notebook storage in S3
• Nasdaq contributed S3-CSE support
• Custom KMS and AWS KMS supported as of 0.6.0
https://github.com/apache/zeppelin/pull/886
Presto data security
Presto does not use EMRFS
• PrestoS3FileSystem is part of the Hive connector
• EMR security configuration for local disk encryption
Nasdaq contributed S3-CSE support to Presto
• Support for S3-CSE with custom KMS merged in 0.129
https://github.com/prestodb/presto/pull/3802
• Support for S3-CSE-KMS merged in 0.153
https://github.com/prestodb/presto/pull/5701
Coming next: Data community
• Clients perform analytics on shared data
• New datasets created in their local account
• Amazon SQS messages from a “staging” S3 bucket
trigger data ingest in the central warehouse account
• Maintains centralized write access to the warehouse
• Client accounts generate Parquet output
• Automatically categorize and catalog data
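The staging-bucket trigger could be consumed along these lines. This Python sketch assumes the staging bucket sends standard S3 event notifications straight to the SQS queue, so that each message body is the S3 event JSON; the function name, bucket, and key are hypothetical, and the actual ingest step is out of scope.

```python
import json
from urllib.parse import unquote_plus


def staged_objects(sqs_message_body, suffix=".parquet"):
    """Extract (bucket, key) pairs for newly created objects from one message."""
    event = json.loads(sqs_message_body)
    found = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(s3["object"]["key"])
        if key.endswith(suffix):
            found.append((s3["bucket"]["name"], key))
    return found


# Hypothetical message body, trimmed to the fields used above.
sample_body = json.dumps({
    "Records": [{
        "eventSource": "aws:s3",
        "eventName": "ObjectCreated:Put",
        "s3": {
            "bucket": {"name": "example-staging-bucket"},
            "object": {"key": "trades/trade_date%3D2016-11-29/part-0000.parquet"},
        },
    }]
})

print(staged_objects(sample_body))
```

Keeping the parser separate from the ingest job leaves write access to the warehouse with the central account, as the slide describes.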
Thank you!
Remember to complete your evaluations!

 

Plus de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Dernier

Dernier (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 

AWS re:Invent 2016: Securing Enterprise Big Data Workloads on AWS (SEC308)

  • 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Nasdaq and the Nasdaq logo are registered and unregistered trademarks, or service marks, of Nasdaq, Inc. or its subsidiaries in the U.S. and other countries. Hadoop, Hive, Spark, Parquet, and Zeppelin are registered and unregistered trademarks of the Apache Software Foundation in the U.S. and other countries. Moataz Anany, Solutions Architect, AWS Nate Sammons, Principal Architect, Nasdaq November 29, 2016 SEC308 Securing Enterprise Big Data Workloads on AWS
  • 2. What to expect from this session • Hybrid enterprise data warehouse: A typical architecture • Apply security controls across this architecture • How it’s done at Nasdaq
  • 3. A snorkel or a deep dive?
  • 4. Effective security starts with a plan “In security engineering, you first need to… define the threat model, then create a security policy, and only then choose security technologies that suit” – Bruce Schneier* * Secrets and Lies: Digital Security in a Networked World https://www.amazon.com/Secrets-Lies-Digital-Security-Networked/dp/1119092434/
  • 6. A typical hybrid enterprise data warehouse. Diagram: a corporate data center (enterprise data sources, data engineers, data scientists, business end users) connected through AWS Direct Connect to the AWS Cloud (Amazon S3, Amazon EMR, Amazon Redshift, Amazon QuickSight). Workflow: 1. Extract, upload, and transform; 2. Explore, analyze, and manipulate; 3. Query and visualize.
  • 7. A typical hybrid enterprise data warehouse (diagram repeated from slide 6). How do you make it secure?
  • 8. A typical hybrid enterprise data warehouse (diagram repeated from slide 6). Start at the foundation: AWS Identity and Access Management, Amazon Virtual Private Cloud
  • 9. Configure IAM: IAM – a quick refresher • Manage users and groups • Powerful policy language • Role-based access to API actions • AWS-managed policy templates. Structure of an IAM policy statement: { "Statement": [ { "Effect": "effect", "Principal": "principal", "Action": "action", "Resource": "arn", "Condition": { "condition": { "key": "value" } } } ] }
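The statement structure on slide 9 can be filled in programmatically. The following is a minimal sketch in Python, not the presenters' tooling: the helper `policy_statement`, the bucket name `example-warehouse-bucket`, and the prefix `team-x` are hypothetical names chosen for illustration.

```python
import json
from typing import Optional, Union


def policy_statement(
    effect: str,
    action: Union[str, list],
    resource: Union[str, list],
    principal: Optional[dict] = None,
    condition: Optional[dict] = None,
) -> dict:
    """Build one IAM policy statement, omitting optional fields that are unset."""
    if effect not in ("Allow", "Deny"):
        raise ValueError("Effect must be 'Allow' or 'Deny'")
    statement = {"Effect": effect, "Action": action, "Resource": resource}
    # Principal only appears in resource-based policies (for example, bucket policies).
    if principal is not None:
        statement["Principal"] = principal
    if condition is not None:
        statement["Condition"] = condition
    return statement


# Hypothetical identity-based policy: read-only access to one S3 prefix.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        policy_statement(
            "Allow",
            ["s3:GetObject"],
            "arn:aws:s3:::example-warehouse-bucket/team-x/*",
        )
    ],
}

print(json.dumps(policy, indent=2))
```

The helper leaves out `Principal` by default because identity-based policies attached to users, groups, or roles do not carry one.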
  • 10. Configure IAM. Within an AWS account, IAM principals (accounts, users, groups, roles) are granted access to service API actions. Amazon EMR API actions: • RunJobFlow • DescribeJobFlow • TerminateJobFlow • ListClusters • … Amazon Redshift API actions: • CreateCluster • DescribeClusters • ModifyCluster • DeleteCluster • … Amazon S3 bucket API actions: • CreateBucket • DeleteBucket • … Amazon S3 object API actions: • PutObject • GetObject • …
  • 11. Configure IAM: Build IAM policies that match common activities • Access to Amazon S3: administration (IAM); data read/write (IAM) • Access to Amazon EMR: cluster management (IAM); running batch transient jobs (IAM); in-cluster activity (Hadoop AuthN/AuthZ); client access (Hadoop AuthN/AuthZ) • Access to Amazon Redshift: cluster management (IAM); authorizing COPY/UNLOAD (IAM); in-cluster activity (Amazon Redshift AuthN/AuthZ)
  • 12. Configure IAM: Define AWS identities and attach policies • Define IAM users, groups, and roles • Provide least-privilege IAM access to Amazon S3, EMR, and Amazon Redshift • Simulate and verify IAM policies. Access control matrix:
      AWS identity | S3 prefix “/…/...” | Amazon Redshift cluster “aaa” | Amazon EMR | AWS IAM
      User X | <Policy doc IDs> | No access | No access | No access
      Group Y | <Policy doc IDs> | <Policy doc IDs> | <Policy doc IDs> | <Policy doc IDs>
      Role Z | <Policy doc IDs> | <Policy doc IDs> | No access | No access
      …
  • 13. Configure IAM: Layer security controls around sensitive API actions. Use IAM policy conditions to... • Require MFA for destructive API actions: s3:DeleteBucket, redshift:DeleteCluster, elasticmapreduce:TerminateJobFlow • Add pre-conditions such as source IP address or time of day. Diagram labels: MFA, policy conditions, sensitive APIs
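One way to express the MFA and source-IP conditions from slide 13 is a pair of explicit Deny statements. This is a sketch under assumptions: the helper name and the CIDR range `203.0.113.0/24` are hypothetical, and the EMR action is written with its API name `TerminateJobFlows` rather than the singular form on the slide. The condition keys `aws:MultiFactorAuthPresent` and `aws:SourceIp` are standard IAM global condition keys.

```python
import json
from typing import List

DESTRUCTIVE_ACTIONS = [
    "s3:DeleteBucket",
    "redshift:DeleteCluster",
    "elasticmapreduce:TerminateJobFlows",
]


def guard_destructive_actions(actions: List[str], allowed_cidrs: List[str]) -> dict:
    """Deny the given actions unless MFA is present and the caller is on an allowed network."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyWithoutMFA",
                "Effect": "Deny",
                "Action": list(actions),
                "Resource": "*",
                # BoolIfExists also denies requests where the MFA key is absent.
                "Condition": {
                    "BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}
                },
            },
            {
                "Sid": "DenyOutsideAllowedNetworks",
                "Effect": "Deny",
                "Action": list(actions),
                "Resource": "*",
                "Condition": {"NotIpAddress": {"aws:SourceIp": list(allowed_cidrs)}},
            },
        ],
    }


guard_policy = guard_destructive_actions(DESTRUCTIVE_ACTIONS, ["203.0.113.0/24"])
print(json.dumps(guard_policy, indent=2))
```

Because an explicit Deny overrides any Allow, this policy can be attached alongside broader permissions without weakening the guard.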
  • 14. Configure IAM Customize service IAM roles for Amazon EMR • EMR creates two default IAM roles • Default roles are assumed by EMR • AWS-managed policies are attached to default roles • Understand default policies and customize new ones Amazon S3 Amazon EC2 Amazon SQS AWS IAM Amazon EMR Amazon SNS Amazon CloudWatch
  • 15. A typical hybrid enterprise data warehouse (diagram repeated from slide 6). Start at the foundation: AWS Identity and Access Management, Amazon Virtual Private Cloud
  • 16. Launch clusters in private VPC subnets. Diagram labels: corporate data center, Amazon S3, data scientists, AWS region, business end users, private subnet, AWS CloudHSM, AWS Direct Connect, enterprise data sources, AWS KMS, S3 VPC endpoint, EMR cluster, public subnet, customer router / firewall, virtual private gateway, Amazon DynamoDB, internet gateway, VPC NAT gateway, traffic to AWS endpoints, Amazon SQS, Amazon Redshift cluster, Amazon Redshift and EMR data traffic, Elastic Load Balancing, proxy farm, multiple private subnets
  • 17. Launch clusters in private VPC subnets (variant of the slide 16 diagram). Key security benefits • Data flows are private, traversing your VPC • Multiple network traffic “choke points” • Traffic logging with VPC Flow Logs • Dedicated tenancy is possible
  • 18. A typical hybrid enterprise data warehouse (diagram repeated from slide 6). Protect your data with access control: Amazon S3, Amazon Redshift, Amazon EMR
  • 19. Control access to data: Access control in a multi-team environment? Key goals: • Secure and segregated access to Amazon S3, Amazon Redshift clusters, and Amazon EMR clusters • Secure data sharing between teams
  • 20. Control access to data “Fine-grained” data and resource ownership • Teams share S3 buckets and clusters • Access control complex to set up and maintain • Common in a “shared services” architecture Team X Team Y Team Z Amazon EMR cluster Amazon S3 buckets Local FS HDFS EMRFS Amazon Redshift cluster Databases and schemas /foo/bar /abc/xyz /local hdfs:///data/1st hdfs:///data2 s3://bucket/prfx s3://group/data Zeppelin Presto Hive … “Fine-grained” ownership
  • 21. Control access to data: Prefer “coarse-grained” data and resource ownership • Teams own entire S3 buckets and clusters • Ownership segregated by AWS accounts • Access control easier to set up and maintain • Suitable for autonomous teams. Diagram (“coarse-grained” ownership): Team X with its own Amazon S3 buckets and prefixes, Amazon EMR clusters, and Amazon Redshift clusters
  • 22. Control access to data Configure Amazon S3 permissions • Implement your access control matrix using IAM policies • Use S3 bucket policies for easy cross-account data sharing • Limit role-based access from an Amazon EMR cluster’s EC2 instance profile • Authorize Amazon Redshift COPY and UNLOAD commands using IAM roles Amazon S3 Amazon Redshift Amazon EMR IAM principals
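Cross-account sharing through a bucket policy, as slide 22 suggests, might look like the sketch below. It assumes read-only sharing of a single prefix; the function name, bucket, prefix, and the account ID `111122223333` are hypothetical placeholders.

```python
import json


def cross_account_read_policy(bucket: str, prefix: str, reader_account_id: str) -> dict:
    """Bucket policy letting another AWS account list and read one prefix."""
    principal = {"AWS": f"arn:aws:iam::{reader_account_id}:root"}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowListPrefix",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                # Listing is bucket-level, so the prefix is enforced by a condition.
                "Condition": {"StringLike": {"s3:prefix": f"{prefix}/*"}},
            },
            {
                "Sid": "AllowReadPrefix",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


bucket_policy = cross_account_read_policy(
    "example-warehouse-bucket", "shared/trades", "111122223333"
)
print(json.dumps(bucket_policy, indent=2))
```

The reading account still needs its own IAM policy granting the same actions to the users or roles that will access the bucket.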
  • 23. Control access to data: Configure AuthN and AuthZ in Amazon EMR • Enable “Secure Mode” in Hadoop • Set up and configure Kerberos authentication • Configure Hadoop ACLs for authorization • Optionally integrate EMR with Apache Ranger or a similar security framework MIT Kerberos
  • 24. Control access to data: Configure AuthN and AuthZ in Amazon Redshift • Amazon Redshift is based on PostgreSQL • GRANT or REVOKE fine-grained permissions on databases, schemas, tables, and other objects • Set secure default privileges for new objects using the ALTER DEFAULT PRIVILEGES command • Verify privileges using the SET SESSION AUTHORIZATION command
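The Redshift commands named on slide 24 can be combined into a small, repeatable script. The sketch below only generates SQL text; the schema `reporting`, group `analysts`, user `alice`, and table `daily_trades` are hypothetical, and identifiers are assumed to come from trusted configuration rather than user input.

```python
from typing import List


def schema_read_grants(schema: str, group: str) -> List[str]:
    """SQL giving a group read-only access to a schema, including future tables."""
    return [
        f"REVOKE ALL ON SCHEMA {schema} FROM PUBLIC;",
        f"GRANT USAGE ON SCHEMA {schema} TO GROUP {group};",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO GROUP {group};",
        # Covers tables created later by the user who runs this statement.
        f"ALTER DEFAULT PRIVILEGES IN SCHEMA {schema} "
        f"GRANT SELECT ON TABLES TO GROUP {group};",
    ]


def verify_as(user: str, probe_sql: str) -> List[str]:
    """SQL that runs a probe query as another user, then restores the session."""
    return [
        f"SET SESSION AUTHORIZATION '{user}';",
        probe_sql,
        "SET SESSION AUTHORIZATION DEFAULT;",
    ]


statements = schema_read_grants("reporting", "analysts") + verify_as(
    "alice", "SELECT COUNT(*) FROM reporting.daily_trades;"
)
for sql in statements:
    print(sql)
```

Switching session authorization requires superuser rights, so the verification step is typically run by an administrator.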
  • 25. A typical hybrid enterprise data warehouse (diagram repeated from slide 6). Protect your data with encryption: Amazon S3, Amazon Redshift, Amazon EMR
  • 26. Encrypt data at rest In a nutshell… 1. Decide on an encryption key management strategy 2. Pick encryption mode for Amazon S3 objects 3. Configure encryption in Amazon EMR 4. Launch an encrypted Amazon Redshift cluster
  • 27. Encrypt data at rest: Decide on an encryption key management strategy • AWS Key Management Service (AWS KMS) • AWS service managed keys • Custom key management system • AWS CloudHSM
  • 28. Encrypt data at rest What is AWS KMS? • Simplifies creation, import, control, rotation, deletion, and use of encryption keys • Integrated with AWS client-side and server-side encryption • Integrated with AWS CloudTrail
  • 29. Encrypt data at rest: Decide on an encryption key management strategy
      Do I have to manage my encryption keys? | Do I need dedicated key management hardware? | Do I have to manage my keys on premises? | Strategy
      No | No | No | Use AWS service managed
      Yes | No | No | Use AWS KMS
      Yes | Yes | No | Use AWS CloudHSM
      Yes | No | Yes | Use own KMS
      Yes | Yes | Yes | Use own HSM
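The decision table on slide 29 reduces to three yes/no questions. The sketch below encodes it as a function; the function name is hypothetical, and it assumes that when you do not need to manage keys yourself, the other two answers do not matter (the slide only lists the all-"No" row for that case).

```python
def key_management_strategy(
    manage_own_keys: bool, dedicated_hardware: bool, on_premises: bool
) -> str:
    """Map the three questions from the decision table to a strategy name."""
    if not manage_own_keys:
        # Assumption: the remaining answers are irrelevant in this case.
        return "AWS service managed"
    if on_premises:
        return "own HSM" if dedicated_hardware else "own KMS"
    return "AWS CloudHSM" if dedicated_hardware else "AWS KMS"


for answers in [
    (False, False, False),
    (True, False, False),
    (True, True, False),
    (True, False, True),
    (True, True, True),
]:
    print(answers, "->", key_management_strategy(*answers))
```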
  • 30. Encrypt data at rest Pick encryption mode for Amazon S3 objects Where and when do I need to encrypt my data for S3? • Before upload, after download – S3 client-side encryption • After upload, before download – S3 server-side encryption
  • 31. Encrypt data at rest: Pick encryption mode for Amazon S3 objects
      Encryption point? | Key management: AWS KMS | Key management: Custom KMS | Key management: S3 built-in
      Client side | CSE-KMS | CSE-C |
      Server side | SSE-KMS | SSE-C | SSE-S3
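For the three server-side modes, the choice shows up as request parameters on upload. The sketch below builds the keyword arguments that boto3's S3 `put_object` accepts for each mode without calling AWS; the helper name and the key alias `alias/example-key` are hypothetical. Client-side modes are not covered here because the data is encrypted before the request is made.

```python
from typing import Optional


def sse_put_args(
    mode: str, kms_key_id: Optional[str] = None, customer_key: Optional[bytes] = None
) -> dict:
    """Return extra put_object arguments for a server-side encryption mode."""
    if mode == "SSE-S3":
        return {"ServerSideEncryption": "AES256"}
    if mode == "SSE-KMS":
        args = {"ServerSideEncryption": "aws:kms"}
        if kms_key_id is not None:
            # Without a key ID, S3 falls back to the AWS managed key for S3.
            args["SSEKMSKeyId"] = kms_key_id
        return args
    if mode == "SSE-C":
        if customer_key is None:
            raise ValueError("SSE-C requires a customer-provided key")
        return {"SSECustomerAlgorithm": "AES256", "SSECustomerKey": customer_key}
    raise ValueError(f"Unsupported server-side mode: {mode}")


print(sse_put_args("SSE-S3"))
print(sse_put_args("SSE-KMS", kms_key_id="alias/example-key"))
```

With a boto3 S3 client, these would be passed as `s3.put_object(Bucket=..., Key=..., Body=..., **sse_put_args("SSE-KMS", kms_key_id="alias/example-key"))`.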
  • 32. Encrypt data at rest: Configure encryption in Amazon EMR. EMRFS encryption • Supports S3 client-side and server-side modes • ... except SSE-C • SSE and CSE modes mutually exclusive • In-transit encryption with TLS. Diagram labels: Amazon EMR cluster; master node (root volume, data volume); core node (root volume, data volumes); HDFS client; EMRFS client; Amazon S3; Hive metastore database; Hive, Hadoop MapReduce, Spark, … other daemons
  • 33. Encrypt data at rest: Configure encryption in Amazon EMR. Local volume encryption • Instance store split into virtual root and data volumes • Root volume not encryptable • Data volumes encryptable with LUKS* (* Linux Unified Key Setup disk encryption)
  • 34. Encrypt data at rest: Configure encryption in Amazon EMR. Volume encryption key management • Use AWS KMS as your key provider • Or use a custom key provider application
  • 35. Encrypt data at rest: Configure encryption in Amazon EMR. HDFS encryption • Local volume encryption enables HDFS block transfers and RPC traffic encryption • Open-source HDFS transparent encryption: finer-grained control; end-to-end encryption
  • 36. Encrypt data at rest: Configure encryption in Amazon EMR. Create a managed “security configuration” object... • Configure EMRFS and local-volume encryption at rest • Configure encryption in transit. At cluster creation time... • Reference a managed security configuration • If needed, configure HDFS transparent encryption
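A security configuration is a JSON document that EMR stores under a name and applies at cluster creation. The sketch below assembles one that enables SSE-KMS for EMRFS, KMS-backed local disk encryption, and TLS in transit with PEM certificates. The KMS key ARN and the certificate location are hypothetical placeholders, and the field names reflect the EMR security configuration format as best understood here; check them against the release you run.

```python
import json


def emr_security_configuration(kms_key_arn: str, tls_cert_s3_uri: str) -> dict:
    """Build an EMR security configuration covering at-rest and in-transit encryption."""
    return {
        "EncryptionConfiguration": {
            "EnableAtRestEncryption": True,
            "EnableInTransitEncryption": True,
            "AtRestEncryptionConfiguration": {
                # EMRFS data in S3.
                "S3EncryptionConfiguration": {
                    "EncryptionMode": "SSE-KMS",
                    "AwsKmsKey": kms_key_arn,
                },
                # Local data volumes on cluster instances.
                "LocalDiskEncryptionConfiguration": {
                    "EncryptionKeyProviderType": "AwsKms",
                    "AwsKmsKey": kms_key_arn,
                },
            },
            "InTransitEncryptionConfiguration": {
                "TLSCertificateConfiguration": {
                    "CertificateProviderType": "PEM",
                    "S3Object": tls_cert_s3_uri,
                }
            },
        }
    }


config = emr_security_configuration(
    "arn:aws:kms:us-east-1:111122223333:key/example-key-id",
    "s3://example-config-bucket/certs/emr-certs.zip",
)
print(json.dumps(config, indent=2))
```

The serialized document could then be registered with `aws emr create-security-configuration` and referenced by name when a cluster is launched.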
  • 37. Encrypt data at rest Launch an encrypted Amazon Redshift cluster • Four-tier key hierarchy • AES algorithm with 256-bit keys • Use AWS KMS or HSM • Control rotation of encryption keys • Blocks backed up to S3 are encrypted 10 GigE (HPC) Backup JDBC/ODBC At rest Four-tier key hierarchy
  • 38. Encrypt data in transit: Protect data flows
      Point “A” | Point “B” | Data flow protection
      Enterprise data sources | Amazon S3 | Encrypted with SSL/TLS; S3 requests signed with AWS SigV4
      Amazon S3 | Amazon EMR | Encrypted with SSL/TLS
      Amazon S3 | Amazon Redshift | Encrypted with SSL/TLS
      Amazon EMR | Clients | Encrypted with SSL/TLS; varies with Hadoop application client
      Amazon Redshift | Clients | Supports SSL/TLS; requires configuration
      Apache Hadoop on Amazon EMR • Hadoop RPC encryption • HDFS block data transfer encryption • KMS over HTTPS is not enabled by default with Hadoop KMS • May vary with EMR release (such as Tez and Spark in release 5.0.0+)
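The two Hadoop settings named on slide 38 correspond to standard Hadoop properties. The sketch below expresses them as EMR configuration classifications, assuming you set them by hand rather than through a security configuration; the function name is hypothetical.

```python
import json
from typing import Dict, List


def hadoop_in_transit_configurations() -> List[Dict]:
    """EMR configuration classifications enabling Hadoop RPC and HDFS block encryption."""
    return [
        {
            "Classification": "core-site",
            # "privacy" adds encryption on top of authentication and integrity.
            "Properties": {"hadoop.rpc.protection": "privacy"},
        },
        {
            "Classification": "hdfs-site",
            "Properties": {"dfs.encrypt.data.transfer": "true"},
        },
    ]


configurations = hadoop_in_transit_configurations()
print(json.dumps(configurations, indent=2))
```

A list in this shape is what the EMR `RunJobFlow` API accepts in its `Configurations` parameter.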
  • 40. What to expect from this session • Introduction • Choices made on our path: Amazon Redshift, Amazon EMR • Future directions for big data at Nasdaq
  • 41. Nasdaq lists 3,700 global companies, worth $9.3 trillion in market cap, representing diverse industries and many of the world’s most well-known and innovative brands. More than U.S. $1 trillion in notional value is tied to our library of more than 43,000 global indexes. Nasdaq technology is used to power more than 100 marketplaces in 50 countries. Our global platform can handle more than 1 million messages/second at sub-40 microsecond average speeds. We own and operate 33 markets, 1 clearinghouse, and 5 central securities depositories, across asset classes and geographies.
  • 42. Amazon Redshift at Nasdaq • In use since Amazon Redshift was in beta • Nasdaq’s main data warehousing workhorse • Daily ingest from 100s of internal sources, 6-20B rows/day • Current footprint: 18x ds2.8xlarge instances, 3 trillion rows • Highly sensitive data: • All orders, quotes, trades, etc. from all Nasdaq exchanges • Membership and ownership information
  • 43. Daily ingest in billions of rows
  • 44. Amazon Redshift workloads • Billing and reporting • Market surveillance • Economic research • Trade history queries
  • 45. Amazon Redshift network security • Clusters inside VPC subnets • Locked down security groups • VPC endpoint for Amazon S3 access • SSL required for connectivity • SSL certificate for each Amazon Redshift cluster • 10 G AWS Direct Connect circuits to Nasdaq • On-premises firewalls also limit access Business end users AWS Direct Connect Amazon Redshift Public VPC subnet VPC NAT gateway S3 VPC endpoint AWS APIs VPC subnet S3
  • 46. Amazon Redshift data security Amazon Redshift clusters • Encryption keys in an on-premises HSM • Amazon Redshift has a minimal IAM policy Amazon S3 data • Encrypted using S3-CSE • Custom key management system • Keys are stored on premises at Nasdaq AWS Direct Connect Amazon Redshift VPC subnet S3 endpoint On-premises HSM, KMS, data ingest S3
  • 47. Amazon Redshift encryption key management. On-premises HSM • Physical separation for keys • Requires an EIP for Amazon Redshift • HSMs are delicate and require special handling. AWS KMS • Policy-based key rotation • IAM policies for usage • AWS CloudTrail usage logs • High durability storage • Support for more AWS services (Amazon EBS, Amazon RDS, etc.) • Need to trust AWS
  • 48. Amazon Redshift access control and monitoring • Write access allowed only for the data ingest system • Users granted access to specific schemas • Users granted specific WLM constraints • Monitor STL_CONNECTION_LOG for access • Logs in S3 pulled on-premises for analysis • Amazon Redshift activity logging • CloudTrail API logs • VPC Flow Logs AWS CloudTrailAmazon S3 Amazon CloudWatch Amazon Redshift
  • 49. Managing Amazon Redshift cluster resources Initially we never purged any data • Led to growing clusters once per quarter Now we maintain a 1-year rolling window in Amazon Redshift • Older data is accessed infrequently • Resizing a large Amazon Redshift cluster is not instantaneous • Grow clusters based on market volumes, acquisitions • This led us to extend our warehouse to EMR and S3
  • 50. Amazon EMR at Nasdaq Gaining traction internally • Building an open data platform • Parallel daily loads of data for Amazon EMR and Amazon Redshift • Data stored as encrypted Parquet files in Amazon S3 Keep data “forever” • Current footprint is 5.1 million objects, 500 TB • Approximately 6.5 trillion rows since January 2014 • Backfilling data from the 1990s, around 1.5 PB
  • 51. Hadoop file formats Evaluated Parquet and ORC • Arrived at Apache Parquet Benefits • Modern columnar format with good compression • “Self-describing” format • Growing support across open source projects • Works with our two main use cases: Spark and Presto • Good performance when encrypted
  • 52. Amazon EMR workloads Apache Spark and Zeppelin • Economic research • Market surveillance • Machine learning Presto from Facebook • Trade history queries • BI and reporting (experimental)
  • 53. Amazon EMR data strategy Decouple storage and compute • Scale each as needed • Data stored centrally in Amazon S3 Hive directory structure in S3 • Easy partitioning of time series data by date • Fine-grained access control using bucket policies • Cross-account access using bucket policies • Use “MSCK REPAIR TABLE” to rebuild metastore
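A Hive-style directory structure partitioned by date, as described on slide 53, can be sketched as follows. The bucket `example-warehouse-bucket`, the table `trades`, and the partition column name `date` are hypothetical; the talk does not state the actual layout.

```python
from datetime import date, timedelta
from typing import List


def partition_prefix(bucket: str, table: str, day: date) -> str:
    """Return the S3 prefix for one daily partition in Hive key=value style."""
    return f"s3://{bucket}/{table}/date={day.isoformat()}/"


def partition_prefixes(bucket: str, table: str, start: date, end: date) -> List[str]:
    """Return prefixes for every day from start to end, inclusive."""
    days = (end - start).days
    return [
        partition_prefix(bucket, table, start + timedelta(days=offset))
        for offset in range(days + 1)
    ]


prefixes = partition_prefixes(
    "example-warehouse-bucket", "trades", date(2016, 11, 28), date(2016, 11, 30)
)
for prefix in prefixes:
    print(prefix)
```

Once files land under new prefixes like these, running `MSCK REPAIR TABLE` on the table lets the metastore discover the added partitions, and bucket policies can scope access per table prefix.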
  • 54. Multi-account Amazon EMR strategy: One AWS account per use case or internal department • Balance time and money for their own needs • No resource contention between clients • Cross-account bucket policies for S3 access • VPC peering for Hive metastore access. Diagram labels: private VPC subnet, S3 endpoint, Amazon EMR cluster; private VPC subnet, Hive metastore; VPC peering; client AWS account; central AWS account; Amazon S3
  • 55. Amazon EMR network security • Clusters in private VPC subnets • S3 access via VPC API endpoint • AWS API access via NAT gateway • Locked down security groups • 10 G AWS Direct Connect circuits to Nasdaq • On-premises firewalls AWS Direct Connect Amazon EMR cluster Private VPC subnet Public VPC subnet VPC NAT gateway S3 VPC endpoint AWS APIs Business end users S3
  • 56. Amazon EMR cluster security Clusters are ephemeral • No long-running clusters • HDFS used only for scratch space • Permanent data stored in S3 Local instance security • New EMR security configuration • Disk encryption using AWS KMS • SELinux setup with an EMR bootstrap action (BA) script AWS Direct Connect Amazon EMR Cluster Private VPC subnet Public VPC subnet VPC NAT Gateway S3 VPC Endpoint AWS APIs Business end users S3
  • 57. Amazon S3 data security with EMR EMRFS: Amazon S3 as HDFS • S3-CSE integrated as part of EMRFS • Custom S3 encryption materials provider jar • Seeking within objects stored in S3 works well and is critical for performance Multi-account access control • S3 bucket policies control access • Able to limit access to specific schemas and tables
  • 58. Apache Spark data security EMRFS on S3 “just works” with Spark • Simple configurations for S3-CSE • EMR security configuration for local disk encryption Apache Zeppelin notebook storage in S3 • Nasdaq contributed S3-CSE support • Custom KMS and AWS KMS supported as of 0.6.0 https://github.com/apache/zeppelin/pull/886
  • 59. Presto data security Presto does not use EMRFS • PrestoS3FileSystem is part of the Hive connector • EMR security configuration for local disk encryption Nasdaq contributed S3-CSE support to Presto • Support for S3-CSE with custom KMS merged in 0.129 https://github.com/prestodb/presto/pull/3802 • Support for S3-CSE-KMS merged in 0.153 https://github.com/prestodb/presto/pull/5701
  • 60. Coming next: Data community • Clients perform analytics on shared data • New datasets created in their local account • Amazon SQS messages from a “staging” S3 bucket trigger data ingest in the central warehouse account • Maintains centralized write access to the warehouse • Client accounts generate Parquet output • Automatically categorize and catalog data
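The staging-bucket trigger on slide 60 implies a consumer that reads S3 event notifications from an SQS queue. The sketch below parses one message body into bucket and key pairs, assuming S3 delivers notifications directly to SQS in the standard event format; the function name, bucket, and object key are hypothetical, and the actual ingest step is left out.

```python
import json
from typing import List, Tuple
from urllib.parse import unquote_plus


def new_objects(sqs_message_body: str) -> List[Tuple[str, str]]:
    """Extract (bucket, key) pairs for object-created events in one SQS message."""
    event = json.loads(sqs_message_body)
    found = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3_info = record["s3"]
        # Object keys in S3 event notifications are URL-encoded.
        key = unquote_plus(s3_info["object"]["key"])
        found.append((s3_info["bucket"]["name"], key))
    return found


sample_body = json.dumps(
    {
        "Records": [
            {
                "eventSource": "aws:s3",
                "eventName": "ObjectCreated:Put",
                "s3": {
                    "bucket": {"name": "example-staging-bucket"},
                    "object": {"key": "incoming/date%3D2016-11-29/part-0000.parquet"},
                },
            }
        ]
    }
)

staged = new_objects(sample_body)
print(staged)
```

Keeping the ingest logic in the central account, driven only by these messages, is what preserves centralized write access to the warehouse.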