Securing enterprise big data workloads on AWS

Security for big data workloads in a hybrid IT environment is often an afterthought. This session discusses how enterprises can architect secure big data workloads on AWS. We cover how to apply authentication, authorization, encryption, and additional security principles and mechanisms to workloads that use Amazon Elastic MapReduce (EMR) and Amazon Redshift.

Published in: Business

  1. Securing Enterprise Big Data Workloads on AWS. Pratim Das, Specialist SA, Analytics - EME
  2. Agenda
     • Building a Big Data Application
     • Securing your Big Data Applications
     • Deep dive on Redshift Security
     • Deep dive on EMR Security
     • Network Isolation Using VPN
     • Identity and Access Management
     • Encryption at Rest/Transit
     • Compliance and Assurance
  3. Securing your Big Data Solution
  4. AWS Big Data & Analytics Security Posture
     • Fine-grained permissions and auditing using AWS IAM and AWS CloudTrail
     • Encryption at rest with a choice of key management: service managed, AWS KMS, CloudHSM, or on-premises HSM
     • Encryption in transit: require SSL; all internal communication runs over SSL/TLS
     • Network isolation using Amazon VPC
  5. A typical hybrid enterprise data warehouse
     [Diagram: a corporate data center and the AWS Cloud linked by AWS Direct Connect. Labels: Enterprise data sources, Data engineers, Data scientists, Business end users, Amazon S3, Amazon Redshift, Amazon EMR, Amazon QuickSight, DMS. Numbered flows: (1) Extract, upload, and transform; (2) Explore, analyze, and manipulate; (3) Query and visualize.]
  6. A typical hybrid enterprise data warehouse (same diagram as slide 5)
     How do you make it secure?
  7. A typical hybrid enterprise data warehouse (same diagram as slide 5)
     Start at the foundation:
     • AWS Identity and Access Management
     • Amazon Virtual Private Cloud
  8. Configure IAM
     • IAM: a quick refresher
     • Manage users and groups
     • Powerful policy language
     • Role-based access to API actions
     • AWS-managed policy templates
     Structure of an IAM policy statement:
       {
         "Statement": [{
           "Effect": "effect",
           "Principal": "principal",
           "Action": "action",
           "Resource": "arn",
           "Condition": {
             "condition": {
               "key": "value"
             }
           }
         }]
       }
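The statement structure on this slide can be sketched in Python as plain dictionaries serialized to JSON. This is only an illustration: the helper names `make_statement` and `make_policy`, the bucket name `example-analytics-bucket`, and the CIDR range are hypothetical and do not come from the talk. `Principal` is left out because it applies to resource-based policies rather than policies attached to an identity.

```python
import json
from typing import Any, Dict, List, Optional


def make_statement(
    effect: str,
    actions: List[str],
    resources: List[str],
    principal: Optional[Dict[str, Any]] = None,
    condition: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build one policy statement with the fields shown on the slide."""
    if effect not in ("Allow", "Deny"):
        raise ValueError("effect must be 'Allow' or 'Deny'")
    statement: Dict[str, Any] = {
        "Effect": effect,
        "Action": actions,
        "Resource": resources,
    }
    # Principal only belongs in resource-based policies (for example bucket policies).
    if principal is not None:
        statement["Principal"] = principal
    if condition is not None:
        statement["Condition"] = condition
    return statement


def make_policy(statements: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap statements in a policy document."""
    return {"Version": "2012-10-17", "Statement": statements}


# Hypothetical example: read-only access to one prefix, only from one network range.
read_only = make_policy([
    make_statement(
        "Allow",
        ["s3:GetObject"],
        ["arn:aws:s3:::example-analytics-bucket/reports/*"],
        condition={"IpAddress": {"aws:SourceIp": "203.0.113.0/24"}},
    )
])
policy_json = json.dumps(read_only, indent=2)
```

Keeping policies as data like this makes it easier to review them in version control before attaching them to users, groups, or roles.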
  9. Configure IAM
     [Diagram labels: AWS account; Accounts, Users, Groups, Roles]
     • Amazon EMR API actions: RunJobFlow, DescribeJobFlows, TerminateJobFlows, ListClusters, …
     • Amazon Redshift API actions: CreateCluster, DescribeClusters, ModifyCluster, DeleteCluster, …
     • Amazon S3 bucket API actions: CreateBucket, DeleteBucket, …
     • Amazon S3 object API actions: PutObject, GetObject, …
  10. Configure IAM: build IAM policies that match common activities
     • Access to Amazon S3
         ✓ Administration (IAM)
         ✓ Data read/write (IAM)
     • Access to Amazon EMR
         ✓ Cluster management (IAM)
         ✓ Running batch transient jobs (IAM)
         ✓ In-cluster activity (Hadoop AuthN/AuthZ)
         ✓ Client access (Hadoop AuthN/AuthZ)
     • Access to Amazon Redshift
         ✓ Cluster management (IAM)
         ✓ Authorizing COPY/UNLOAD (IAM)
         ✓ In-cluster activity (Amazon Redshift AuthN/AuthZ)
  11. Configure IAM
     • Define AWS identities and attach policies
     • Define IAM users, groups, and roles
     • Provide least-privilege IAM access to Amazon S3, Amazon EMR, and Amazon Redshift
     • Simulate and verify IAM policies
     Example access matrix:
       AWS identity | S3 prefix "/…/..." | Amazon Redshift cluster "aaa" | Amazon EMR | AWS IAM
       User X | <Policy doc IDs> | No access | No access | No access
       Group Y | <Policy doc IDs> | <Policy doc IDs> | <Policy doc IDs> | <Policy doc IDs>
       Role Z | <Policy doc IDs> | <Policy doc IDs> | No access | No access
       …
  12. Configure IAM
     • Layer security controls around sensitive API actions
     • Use IAM policy conditions to:
         • Require MFA for destructive API actions such as s3:DeleteBucket, redshift:DeleteCluster, and elasticmapreduce:TerminateJobFlows
         • Add preconditions such as source IP address or time of day
     [Diagram labels: MFA, Policy conditions, Sensitive APIs]
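One way to express the MFA and source-IP conditions from this slide is a pair of explicit Deny statements, sketched below as a Python dictionary. The function name `guardrail_policy`, the `Sid` values, and the CIDR range are hypothetical. The condition keys `aws:MultiFactorAuthPresent` and `aws:SourceIp` are standard IAM global condition keys; note that the EMR action name in IAM is the plural `elasticmapreduce:TerminateJobFlows`.

```python
import json
from typing import Any, Dict, List

# Destructive actions named on the slide (EMR action in its IAM spelling).
DESTRUCTIVE_ACTIONS = [
    "s3:DeleteBucket",
    "redshift:DeleteCluster",
    "elasticmapreduce:TerminateJobFlows",
]


def guardrail_policy(allowed_cidrs: List[str]) -> Dict[str, Any]:
    """Deny destructive calls made without MFA or from outside allowed networks."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyDestructiveWithoutMfa",
                "Effect": "Deny",
                "Action": DESTRUCTIVE_ACTIONS,
                "Resource": "*",
                # BoolIfExists also matches requests where the MFA key is absent.
                "Condition": {
                    "BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}
                },
            },
            {
                "Sid": "DenyDestructiveOutsideAllowedNetworks",
                "Effect": "Deny",
                "Action": DESTRUCTIVE_ACTIONS,
                "Resource": "*",
                "Condition": {"NotIpAddress": {"aws:SourceIp": allowed_cidrs}},
            },
        ],
    }


# Hypothetical corporate egress range (a documentation-only address block).
guardrails = guardrail_policy(["203.0.113.0/24"])
guardrails_json = json.dumps(guardrails, indent=2)
```

Because an explicit Deny overrides any Allow, a policy like this can be attached alongside broader permissions. Source-IP conditions can also block calls that AWS services make on a user's behalf, so such a policy is worth testing with the IAM policy simulator first.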
  13. Configure IAM: customize service IAM roles for Amazon EMR
     • EMR creates two default IAM roles
     • Default roles are assumed by EMR
     • AWS-managed policies are attached to default roles
     • Understand default policies and customize new ones
     [Diagram labels: AWS IAM, Amazon EMR, Amazon S3, Amazon EC2, Amazon SQS, Amazon SNS, Amazon CloudWatch]
  14. A typical hybrid enterprise data warehouse (same diagram as slide 5)
     Start at the foundation:
     • AWS Identity and Access Management
     • Amazon Virtual Private Cloud
  15. Launch clusters in private VPC subnets
     [Diagram: a corporate data center (enterprise data sources, data scientists, business end users, customer router / firewall) connected over AWS Direct Connect to a virtual private gateway in an AWS region. The VPC contains multiple private subnets (EMR cluster, Amazon Redshift cluster) and a public subnet (NAT gateway, Elastic Load Balancing, proxy farm), plus an internet gateway and an S3 VPC endpoint. Other labels: Amazon S3, AWS KMS, AWS CloudHSM, Amazon DynamoDB, Amazon SQS, Traffic to AWS endpoints, Amazon Redshift and EMR data traffic.]
  16. Launch clusters in private VPC subnets
     [Diagram labels: Corporate data center, Enterprise data sources, Data scientists, Business end users, AWS Direct Connect, Customer gateway, Virtual private gateway, AWS region, Private VPC subnet, Public VPC subnet, Amazon EMR cluster, Amazon Redshift, S3 VPC endpoint, Internet gateway, VPC NAT gateway, Amazon S3, AWS KMS, AWS CloudHSM, Amazon DynamoDB, Amazon SQS, Communication with AWS service endpoints]
     Key security benefits:
     • Data flows stay private, traversing your VPC
     • Multiple network traffic "choke points"
     • Traffic logging with VPC Flow Logs
     • Dedicated tenancy is possible
  17. A typical hybrid enterprise data warehouse (same diagram as slide 5)
     Protect your data with access control: Amazon S3, Amazon Redshift, Amazon EMR
  18. Control access to data
     How do you handle access control in a multi-team environment? Key goals:
     • Secure and segregated access to:
         • Amazon S3
         • Amazon Redshift clusters
         • Amazon EMR clusters
     • Secure data sharing between teams
  19. Control access to data: "fine-grained" data and resource ownership
     • Teams share S3 buckets and clusters
     • Access control is complex to set up and maintain
     • Common in a "shared services" architecture
     [Diagram: Team X, Team Y, and Team Z share an Amazon EMR cluster (local FS, HDFS, EMRFS; Zeppelin, Presto, Hive, …), Amazon S3 buckets, and an Amazon Redshift cluster (databases and schemas). Example paths: /foo/bar, /abc/xyz, /local, hdfs:///data/1st, hdfs:///data2, s3://bucket/prfx, s3://group/data.]
  20. Control access to data: prefer "coarse-grained" data and resource ownership
     • Teams own entire S3 buckets and clusters
     • Ownership is segregated by AWS accounts
     • Access control is easier to set up and maintain
     • Suitable for autonomous teams
     [Diagram: Team X owns its Amazon S3 buckets and prefixes, Amazon EMR clusters, and Amazon Redshift clusters.]
  21. Control access to data: configure Amazon S3 permissions
     • Implement your access control matrix using IAM policies
     • Use S3 bucket policies for easy cross-account data sharing
     • Limit role-based access from an Amazon EMR cluster's EC2 instance profile
     • Authorize Amazon Redshift COPY and UNLOAD commands using IAM roles
     [Diagram labels: IAM principals, Amazon S3, Amazon Redshift, Amazon EMR]
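As a rough sketch of cross-account sharing with an S3 bucket policy, the Python below builds a policy that lets another AWS account list and read a single prefix. The function name, bucket name, prefix, and account ID are all hypothetical placeholders; the policy grammar (`Principal`, `s3:ListBucket`, `s3:GetObject`, the `s3:prefix` condition key) is standard S3.

```python
import json
import re
from typing import Any, Dict


def cross_account_read_policy(bucket: str, prefix: str, reader_account_id: str) -> Dict[str, Any]:
    """Bucket policy granting another account read access to one prefix."""
    if not re.fullmatch(r"\d{12}", reader_account_id):
        raise ValueError("AWS account IDs are 12 digits")
    prefix = prefix.strip("/")
    # Granting to the account root delegates the decision to that account's own IAM policies.
    principal = {"AWS": f"arn:aws:iam::{reader_account_id}:root"}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListSharedPrefix",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                # Restrict listing to keys under the shared prefix.
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                "Sid": "ReadSharedObjects",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


# Hypothetical values: Team X shares "shared/" with account 111122223333.
shared_policy = cross_account_read_policy("example-team-x-data", "shared", "111122223333")
shared_policy_json = json.dumps(shared_policy, indent=2)
```

The reading account still has to grant its own users or roles matching S3 permissions, which fits the coarse-grained, per-account ownership model described on slide 20.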
  22. Control access to data: configure AuthN and AuthZ in Amazon EMR
     • Enable "Secure Mode" in Hadoop
     • Set up and configure Kerberos authentication (for example, MIT Kerberos)
     • Configure Hadoop ACLs for authorization
     • Optionally integrate EMR with Apache Ranger or a similar security framework
  23. Control access to data: configure AuthN and AuthZ in Amazon Redshift
     • Amazon Redshift is based on PostgreSQL
     • GRANT or REVOKE fine-grained permissions on databases, schemas, tables, and other objects
     • Set secure default privileges for new objects using the ALTER DEFAULT PRIVILEGES command
     • Verify privileges using the SET SESSION AUTHORIZATION command
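The Redshift commands named on this slide could be combined roughly as follows. This sketch only generates SQL text in Python; it does not connect to a cluster. The schema, group, and user names are hypothetical, and the identifier check is a deliberately strict assumption to keep generated SQL safe.

```python
import re
from typing import List

_IDENTIFIER = re.compile(r"[a-z_][a-z0-9_]*")


def _check(name: str) -> str:
    """Accept only simple lowercase identifiers so they can be embedded in SQL."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def read_only_grants(schema: str, group: str) -> List[str]:
    """SQL giving a group read-only access to a schema, now and for future tables."""
    schema, group = _check(schema), _check(group)
    return [
        f"REVOKE ALL ON SCHEMA {schema} FROM PUBLIC;",
        f"GRANT USAGE ON SCHEMA {schema} TO GROUP {group};",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO GROUP {group};",
        # Applies to tables created later by the user who runs this statement.
        f"ALTER DEFAULT PRIVILEGES IN SCHEMA {schema} GRANT SELECT ON TABLES TO GROUP {group};",
    ]


def verify_as(user: str, probe_query: str) -> List[str]:
    """SQL a superuser can run to test a query under another user's privileges."""
    user = _check(user)
    return [
        f"SET SESSION AUTHORIZATION '{user}';",
        probe_query,
        "SET SESSION AUTHORIZATION DEFAULT;",
    ]


# Hypothetical names for illustration.
grants = read_only_grants("sales", "analysts")
checks = verify_as("analyst1", "SELECT COUNT(*) FROM sales.orders;")
```

Running the `verify_as` statements as a superuser shows whether the probe query succeeds or fails with a permission error for that user, which is the verification step the slide refers to.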
  24. A typical hybrid enterprise data warehouse (same diagram as slide 5)
     Protect your data with encryption: Amazon S3, Amazon Redshift, Amazon EMR, Amazon Athena
  25. Encrypt data at rest: in a nutshell
     1. Decide on an encryption key management strategy
     2. Pick an encryption mode for Amazon S3 objects
     3. With Amazon Athena, you can query data encrypted with SSE-S3, SSE-KMS, and CSE-KMS; Athena also gives you the option to encrypt your result sets
     4. Configure encryption in Amazon EMR
     5. Launch an encrypted Amazon Redshift cluster
  26. Encrypt data at rest: decide on an encryption key management strategy
     • AWS service-managed keys
     • AWS Key Management Service (AWS KMS)
     • AWS CloudHSM
     • Custom key management system
  27. Encrypt data at rest: what is AWS KMS?
     • Simplifies creation, import, control, rotation, deletion, and use of encryption keys
     • Integrated with AWS client-side and server-side encryption
     • Integrated with AWS CloudTrail
  28. Encrypt data at rest: decide on an encryption key management strategy
       Do I have to manage my encryption keys? | Do I need dedicated key management hardware? | Do I have to manage my keys on premises? | Strategy
       No | No | No | Use AWS service-managed keys
       Yes | No | No | Use AWS KMS
       Yes | Yes | No | Use AWS CloudHSM
       Yes | No | Yes | Use own KMS
       Yes | Yes | Yes | Use own HSM
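The decision table on this slide maps directly onto a lookup keyed by the three yes/no answers. The Python below is a small sketch of that lookup; the function and parameter names are made up for illustration, and combinations the slide does not list are rejected rather than guessed.

```python
from typing import Dict, Tuple

# (manage keys yourself?, dedicated hardware?, keys on premises?) -> strategy from the slide
_STRATEGIES: Dict[Tuple[bool, bool, bool], str] = {
    (False, False, False): "AWS service-managed keys",
    (True, False, False): "AWS KMS",
    (True, True, False): "AWS CloudHSM",
    (True, False, True): "own KMS",
    (True, True, True): "own HSM",
}


def key_management_strategy(manage_keys: bool, dedicated_hardware: bool, on_premises: bool) -> str:
    """Return the strategy the slide's table gives for these three answers."""
    answers = (manage_keys, dedicated_hardware, on_premises)
    try:
        return _STRATEGIES[answers]
    except KeyError:
        # For example, asking for dedicated hardware without managing keys is not on the slide.
        raise ValueError(f"combination not covered by the table: {answers}") from None


choice = key_management_strategy(manage_keys=True, dedicated_hardware=True, on_premises=False)
```

Here `choice` is "AWS CloudHSM", matching the third row of the table.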
  29. Encrypt data at rest: pick an encryption mode for Amazon S3 objects
     Where and when do I need to encrypt my data for S3?
     • Before upload, after download: S3 client-side encryption
     • After upload, before download: S3 server-side encryption
  30. Encrypt data at rest: pick an encryption mode for Amazon S3 objects
       Encryption point? | Key management: AWS KMS | Key management: S3 built-in | Key management: custom
       Server side | SSE-KMS | SSE-S3 | SSE-C
       Client side | CSE-KMS | (none) | CSE-C
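The two questions on this slide (where is data encrypted, and who manages the keys) can be turned into a small selector. This Python sketch uses the mode names as they appear on the slide; the function name and the string values for the two arguments are hypothetical.

```python
from typing import Dict, Tuple

# (encryption point, key management) -> S3 encryption mode, as laid out on the slide
_S3_MODES: Dict[Tuple[str, str], str] = {
    ("server", "kms"): "SSE-KMS",
    ("server", "s3"): "SSE-S3",
    ("server", "custom"): "SSE-C",
    ("client", "kms"): "CSE-KMS",
    ("client", "custom"): "CSE-C",
}


def pick_s3_encryption_mode(encryption_point: str, key_management: str) -> str:
    """Map the two design answers to an S3 encryption mode."""
    key = (encryption_point.lower(), key_management.lower())
    if key not in _S3_MODES:
        # S3 built-in keys only exist for server-side encryption.
        raise ValueError(f"no S3 encryption mode for {key}")
    return _S3_MODES[key]


mode = pick_s3_encryption_mode("client", "kms")
```

In this sketch `mode` is "CSE-KMS": data is encrypted before upload with keys held in AWS KMS.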
  31. Encrypt data at rest: configure encryption in Amazon EMR (EMRFS encryption)
     • Supports S3 client-side and server-side modes, except SSE-C
     • SSE and CSE modes are mutually exclusive
     • In-transit encryption with TLS
     [Diagram: an Amazon EMR cluster with a master node (root volume, data volume) and core nodes (root volume, data volumes) running Hive, Hadoop MapReduce, Spark, and other daemons. An EMRFS client connects to Amazon S3, an HDFS client to the data volumes, and Hive to a Hive metastore database.]
  32. Encrypt data at rest: configure encryption in Amazon EMR (local volume encryption)
     • Instance store is split into virtual root and data volumes
     • Root volume is not encryptable
     • Data volumes are encryptable with LUKS (Linux Unified Key Setup disk encryption)
     [Diagram: same EMR cluster diagram as slide 31]
  33. Encrypt data at rest: configure encryption in Amazon EMR (volume encryption key management)
     • Use AWS KMS as your key provider
     • Or use a custom key provider application
     [Diagram: same EMR cluster diagram as slide 31]
  34. Encrypt data at rest: configure encryption in Amazon EMR (HDFS encryption)
     • Local volume encryption enables HDFS block transfer and RPC traffic encryption
     • Open-source HDFS transparent encryption
         ✓ Finer-grained control
         ✓ End-to-end encryption
     [Diagram: same EMR cluster diagram as slide 31]
  35. Encrypt data at rest: configure encryption in Amazon EMR
     ✓ Create a managed "security configuration" object
         • Configure EMRFS and local-volume encryption at rest
         • Configure encryption in transit
     ✓ At cluster creation time
         • Reference a managed security configuration
         • If needed, configure HDFS transparent encryption
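A security configuration is a JSON document, so it can be drafted as a Python dictionary before it is registered with EMR. The sketch below reflects the EMR security configuration format as best understood here; the key names should be checked against the current EMR documentation, and the KMS key ARN and certificate bundle location are hypothetical placeholders.

```python
import json
from typing import Any, Dict


def emr_security_configuration(kms_key_arn: str, certs_s3_uri: str) -> Dict[str, Any]:
    """Draft an EMR security configuration: EMRFS and local-disk encryption plus TLS."""
    return {
        "EncryptionConfiguration": {
            "EnableAtRestEncryption": True,
            "EnableInTransitEncryption": True,
            "AtRestEncryptionConfiguration": {
                # EMRFS data in S3: one mode only, since SSE and CSE are mutually exclusive.
                "S3EncryptionConfiguration": {
                    "EncryptionMode": "SSE-KMS",
                    "AwsKmsKey": kms_key_arn,
                },
                # Data volumes on cluster nodes, with AWS KMS as the key provider.
                "LocalDiskEncryptionConfiguration": {
                    "EncryptionKeyProviderType": "AwsKms",
                    "AwsKmsKey": kms_key_arn,
                },
            },
            "InTransitEncryptionConfiguration": {
                "TLSCertificateConfiguration": {
                    "CertificateProviderType": "PEM",
                    "S3Object": certs_s3_uri,
                }
            },
        }
    }


# Hypothetical key ARN and certificate bundle location.
config = emr_security_configuration(
    "arn:aws:kms:eu-west-1:111122223333:key/example-key-id",
    "s3://example-config-bucket/certs/my-certs.zip",
)
config_json = json.dumps(config, indent=2)
```

The resulting JSON string could then be registered under a name (for example with boto3's EMR `create_security_configuration` call) and referenced by that name at cluster creation time. HDFS transparent encryption is configured separately and is not part of this object.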
  36. Encrypt data at rest: launch an encrypted Amazon Redshift cluster
     • Four-tier key hierarchy
     • AES algorithm with 256-bit keys
     • Use AWS KMS or HSM
     • Control rotation of encryption keys
     • Blocks backed up to S3 are encrypted
     [Diagram labels: JDBC/ODBC, 10 GigE (HPC), Backup, At rest, Four-tier key hierarchy]
  37. Encrypt data in transit: protect data flows
       Point "A" | Point "B" | Data flow protection
       Enterprise data sources | Amazon S3 | Encrypted with SSL/TLS; S3 requests signed with AWS SigV4
       Amazon S3 | Amazon EMR | Encrypted with SSL/TLS
       Amazon S3 | Amazon Redshift | Encrypted with SSL/TLS
       Amazon EMR | Clients | Encrypted with SSL/TLS; varies with Hadoop application client
       Amazon Redshift | Clients | Supports SSL/TLS; requires configuration
     Apache Hadoop on Amazon EMR:
     • Hadoop RPC encryption
     • HDFS block data transfer encryption
     • KMS over HTTPS is not enabled by default with Hadoop KMS
     • May vary with EMR release (such as Tez and Spark in release 5.0.0+)
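For the S3 rows of this table, one common way to enforce encryption in transit rather than just support it is a bucket policy that denies any request not sent over TLS. The Python sketch below builds such a policy using the standard `aws:SecureTransport` condition key; the function name and bucket name are hypothetical, and this technique is an addition for illustration rather than something the slides spell out.

```python
import json
from typing import Any, Dict


def require_tls_bucket_policy(bucket: str) -> Dict[str, Any]:
    """Bucket policy that rejects any S3 request made without SSL/TLS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both bucket-level and object-level requests.
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


# Hypothetical bucket name.
tls_policy = require_tls_bucket_policy("example-analytics-bucket")
tls_policy_json = json.dumps(tls_policy, indent=2)
```

Amazon Redshift has a comparable switch on the cluster side: the `require_ssl` parameter in the cluster parameter group, which is the configuration the last table row alludes to.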
  38. Compliance
  39. Assurance Programs (https://aws.amazon.com/compliance/services-in-scope/)
     ISO 9001; ISO 27001; ISO 27017; ISO 27018; SOC 1 / ISAE 3402; SOC 2; SOC 3; PCI DSS Level 1; GxP; HIPAA; ITAR; FERPA; FISMA, RMF, and DIACAP; FedRAMP; Section 508 / VPAT; DoD SRG Levels 2 & 4; FIPS 140-2; CJIS; Cloud Security Alliance; MPAA; NIST; MLPS Level 3; G-Cloud; IT-Grundschutz; MTCS Tier 3; IRAP; Cyber Essentials Plus
  40. Additional reading
     • Implementing Authorization and Auditing using Apache Ranger on Amazon EMR
     • Secure Amazon EMR with Encryption
     • Respond to State Changes on Amazon EMR Clusters with Amazon CloudWatch Events
     • Run Jupyter Notebook and JupyterHub on Amazon EMR
     • Encrypt Your Amazon Redshift Loads with Amazon S3 and AWS KMS
     • Analyzing VPC Flow Logs with Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight
     • Encrypt and Decrypt Amazon Kinesis Records Using AWS KMS
     • https://aws.amazon.com/blogs/big-data/
     • https://aws.amazon.com/answers/big-data/
     Short links shown on the slide: http://amzn.to/2ptgamM, http://amzn.to/2ptlRRA, http://amzn.to/2ooGkrw, http://amzn.to/2kBMCUu, http://amzn.to/2nEh6be, http://amzn.to/2oyKD5p, http://amzn.to/2osN619
  41. Thank you!
