© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Ran Tessler - Manager, Solutions Architecture, AWS
Shahar Bonderman – Head of Architecture, Matomy
June 21, 2017
Deploying a Data Lake in AWS
What to expect from this session
• Data Lake concept
• Important capabilities of a Data Lake
• Matomy’s Data Lake implementation
• Big Data Reference Architecture
Data Lake Concept
What is a Data Lake?
A Data Lake is a new and increasingly
popular way to store and analyze
massive volumes and heterogeneous
types of data in a centralized repository.
Benefits of a Data Lake – Quick Ingest
“How can I collect data quickly
from various sources and store
it efficiently?”
Quickly ingest data
without needing to force it into a
pre-defined schema.
Benefits of a Data Lake – All Data in One Place
“Why is the data distributed in
many locations? Where is the
single source of truth?”
Store and analyze all of your data,
from all of your sources, in one
centralized location.
Benefits of a Data Lake – Storage vs Compute
“How can I scale up with the
volume of data being generated?”
Separating your storage and compute
allows you to scale each component as
required.
Benefits of a Data Lake – Schema on Read
“Is there a way I can apply multiple
analytics and processing frameworks
to the same data?”
A Data Lake enables ad-hoc
analysis by applying schemas
on read, not write.
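
A minimal sketch of schema on read, assuming raw events are stored as JSON lines exactly as they arrived. The field names (`user`, `ts`, `amount`), the sample records, and the `read_with_schema` helper are hypothetical and chosen only for illustration. Each consumer applies its own schema when it reads, and the stored data is never rewritten.

```python
import json

# Raw events are stored exactly as they arrived; no schema was enforced on write.
RAW_LINES = [
    '{"user": "u1", "ts": "2017-06-21T10:00:00Z", "amount": "12.50", "extra": "x"}',
    '{"user": "u2", "ts": "2017-06-21T10:05:00Z"}',
    '{"user": "u1", "ts": "2017-06-21T11:00:00Z", "amount": "7.25"}',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time: keep only the named fields, cast each one,
    and fill fields that are missing from a record with None."""
    for line in lines:
        record = json.loads(line)
        yield {
            name: cast(record[name]) if name in record else None
            for name, cast in schema.items()
        }


# Two consumers read the same raw data with different (hypothetical) schemas.
BILLING_SCHEMA = {"user": str, "amount": float}
ACTIVITY_SCHEMA = {"user": str, "ts": str}

billing_rows = list(read_with_schema(RAW_LINES, BILLING_SCHEMA))
activity_rows = list(read_with_schema(RAW_LINES, ACTIVITY_SCHEMA))

total_billed = sum(row["amount"] for row in billing_rows if row["amount"] is not None)
print(total_billed)  # 19.75
```

The second record has no `amount`, so the billing reader yields `None` for it instead of rejecting the record at ingest time.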
Important Capabilities of a
“Data Lake”
Important components of a Data Lake
• Catalog & Search
• Protect & Secure
• Access & User Interface
• Ingest and Store
Ingest and Store
Ingest streaming and batch data
Support for any type of data at scale
Durable
Low cost
Amazon S3 as your cluster’s persistent data store
Amazon S3
• Separate compute and storage
• Resize and shut down analytics compute environments with no data loss
• Point multiple compute clusters at the same data in Amazon S3
Data Ingestion into Amazon S3
• AWS Direct Connect
• AWS Snowball
• ISV Connectors
• Amazon Kinesis Firehose
• S3 Transfer Acceleration
• AWS Storage Gateway
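
One way an ingest process might name the objects it writes to Amazon S3 is sketched below. The date-partitioned key layout (`dataset/year=/month=/day=/`) is an assumption, not something the slides prescribe; it is a common convention that lets several compute engines read the same prefix. The dataset name and the `build_object_key` helper are hypothetical.

```python
from datetime import datetime, timezone
import uuid


def build_object_key(dataset, event_time, suffix="json.gz", unique_id=None):
    """Return a date-partitioned object key for one ingested batch.

    The year=/month=/day= layout is an assumed convention, not part of the talk.
    """
    unique_id = unique_id or uuid.uuid4().hex
    return (
        f"{dataset}/"
        f"year={event_time:%Y}/month={event_time:%m}/day={event_time:%d}/"
        f"{unique_id}.{suffix}"
    )


batch_time = datetime(2017, 6, 21, 9, 30, tzinfo=timezone.utc)
key = build_object_key("bid-requests", batch_time, unique_id="batch-0001")
print(key)  # bid-requests/year=2017/month=06/day=21/batch-0001.json.gz
```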
Use S3 as Data Substrate for Compute
Amazon S3: highly durable, low-cost, scalable storage
• EMR
• Kinesis
• Redshift
• DynamoDB
• RDS
• Athena
• Storm
• Spark
• Import/Export Snowball
Catalog & Search
• Metadata lake
• Used for summary statistics and data classification management
• Simplified model for data discovery & governance
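Catalog & Search Architecture
• Data Collectors (EC2, ECS) put objects into the S3 Bucket (Put Object)
• Object Created and Object Deleted events invoke AWS Lambda
• AWS Lambda writes to the Metadata Index in Amazon DynamoDB (Put Item)
• The DynamoDB Update Stream invokes AWS Lambda, which extracts the search fields
• AWS Lambda updates the Search Index in Amazon Elasticsearch (Update Index)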
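
The first hop of this flow, from an S3 event to a metadata record, can be sketched as below. This is an illustration under stated assumptions, not the speakers' implementation: the item fields (`bucket`, `key`, `size`, `created`) are hypothetical, and a plain dict stands in for the DynamoDB table so the sketch runs locally. The event shape follows the S3 event notification format, in which deletions arrive with an `ObjectRemoved` event name and object keys are URL-encoded.

```python
from urllib.parse import unquote_plus


def metadata_items(event):
    """Turn an S3 event notification into (action, item) pairs for the index."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        item = {
            "bucket": s3["bucket"]["name"],
            # Object keys arrive URL-encoded in S3 event notifications.
            "key": unquote_plus(s3["object"]["key"]),
        }
        if record["eventName"].startswith("ObjectCreated"):
            item["size"] = s3["object"].get("size", 0)
            item["created"] = record.get("eventTime")
            yield "put", item
        elif record["eventName"].startswith("ObjectRemoved"):
            yield "delete", item


def handler(event, context, index=None):
    """Lambda-style entry point. `index` is a hypothetical stand-in for the
    DynamoDB table; here it is a plain dict so the sketch runs locally."""
    index = {} if index is None else index
    for action, item in metadata_items(event):
        index_key = (item["bucket"], item["key"])
        if action == "put":
            index[index_key] = item
        else:
            index.pop(index_key, None)
    return index


sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "eventTime": "2017-06-21T09:30:00.000Z",
            "s3": {
                "bucket": {"name": "example-data-lake"},
                "object": {"key": "logs/2017/06/21/part+0001.json", "size": 2048},
            },
        }
    ]
}
catalog = handler(sample_event, None)
print(catalog[("example-data-lake", "logs/2017/06/21/part 0001.json")]["size"])  # 2048
```

In a deployed function the dict operations would be replaced by writes to the metadata table; that part is left out because it depends on the table design.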
API & User Interface
• Exposes the data lake to customers
• Programmatically query the catalogue
• Expose a search API
• Ensures that entitlements are respected
API & UI Architecture
• Users
• UI: Static Website
• API: API Gateway, AWS Lambda
• User Management
• Metadata Index: Amazon DynamoDB
• Search Index: Amazon Elasticsearch
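
A minimal sketch of a search endpoint that respects entitlements, written as a Lambda function behind an API Gateway proxy integration. Everything specific here is an assumption for illustration: the in-memory `CATALOG` stands in for the search index, the `q` query parameter and the `x-user-groups` header are hypothetical, and a real deployment would take the caller's groups from the authenticated identity, not from a client-supplied header.

```python
import json

# Hypothetical in-memory stand-in for the search index; each entry lists the
# groups entitled to see it.
CATALOG = [
    {"name": "bid-requests", "tags": ["ads", "raw"], "groups": ["analysts", "engineers"]},
    {"name": "billing", "tags": ["finance"], "groups": ["finance"]},
    {"name": "ad-clicks", "tags": ["ads"], "groups": ["analysts"]},
]


def search(term, user_groups):
    """Return catalog entries matching the term that the caller may see."""
    term = term.lower()
    return [
        {"name": entry["name"], "tags": entry["tags"]}
        for entry in CATALOG
        if (term in entry["name"] or term in entry["tags"])
        and set(entry["groups"]) & set(user_groups)
    ]


def handler(event, context):
    """Lambda proxy-style handler: query string in, JSON response out."""
    params = event.get("queryStringParameters") or {}
    term = params.get("q", "")
    # Assumption: the caller's groups arrive as a comma-separated header.
    groups = (event.get("headers") or {}).get("x-user-groups", "").split(",")
    if not term:
        return {"statusCode": 400, "body": json.dumps({"error": "missing q"})}
    return {"statusCode": 200, "body": json.dumps({"results": search(term, groups)})}


response = handler(
    {"queryStringParameters": {"q": "ads"}, "headers": {"x-user-groups": "analysts"}},
    None,
)
print(response["body"])
```

The entitlement check sits inside `search`, so no code path returns a catalog entry without comparing its groups against the caller's.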
Protect and Secure
• Access Control - Authentication & Authorization
• Data protection - Encryption
• Logging and Monitoring
Implement the right controls
Security
• Identity & Access Management
• Bucket policies
• Access Control Lists (ACLs)
• Query string authentication
• Private VPC endpoints to Amazon S3
Encryption
• SSL endpoints
• Server Side Encryption (SSE-S3, SSE-C, SSE-KMS)
• Client-side Encryption
Compliance
• Bucket access logs
• Lifecycle Management Policies
• Versioning & MFA deletes
• Certifications – HIPAA, PCI, SOC 1/2/3 etc.
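
Two of these controls, SSL endpoints and server-side encryption, can be enforced with a bucket policy. The sketch below builds such a policy as a Python dict and serializes it to JSON. The bucket name and statement IDs are hypothetical, and requiring SSE-KMS specifically is an assumption; the slides list SSE-S3 and SSE-C as alternatives.

```python
import json

BUCKET = "example-data-lake"  # hypothetical bucket name


def build_bucket_policy(bucket):
    """Return a bucket policy that rejects plain-HTTP requests and uploads
    that do not request SSE-KMS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                "Sid": "DenyUnencryptedUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {
                    "StringNotEquals": {"s3:x-amz-server-side-encryption": "aws:kms"}
                },
            },
        ],
    }


policy_json = json.dumps(build_bucket_policy(BUCKET), indent=2)
print(policy_json)
```

The resulting JSON is what would be attached to the bucket; both statements are explicit denies, so they take precedence over any allow granted elsewhere.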
Putting it all together
A Data Lake on AWS
• Collect & Ingest: Snowball, Database Migration Service, Kinesis Firehose, Direct Connect
• Central Storage: S3
• Catalog & Search: DynamoDB, Elasticsearch
• Access & User Interface: API Gateway, Identity & Access Management, Cognito
• Process & Analyze: QuickSight, Amazon AI, EMR, Redshift, Athena, Kinesis Analytics, RDS
• Protect & Secure: Security Token Service, CloudWatch, CloudTrail, Key Management Service
That’s a lot of work!
Where do I even start?
http://aws.amazon.com/answers
Smarter Technology
Stronger Advertising
400,000 Bid Requests per Second
40 Billion Events per Day
<20 ms Response Time
Collect & Store
Process
Analyze & Consume
Analyze & Consume
We Are Hiring
• Senior Data Engineer
• DevOps Engineer
Thank You
Big Data Reference Architecture
COLLECT
• Applications: Mobile apps, Web apps, Data centers, AWS Direct Connect
• Logging: Amazon CloudWatch, AWS CloudTrail
• Transport: AWS Import/Export Snowball
• Messaging
• IoT: Devices, Sensors & IoT platforms, AWS IoT
• Data types: records, documents, files, messages, streams
STORE
• Stream: Apache Kafka, Amazon Kinesis Streams, Amazon Kinesis Firehose, Amazon DynamoDB Streams
• Message: Amazon SQS
• File: Amazon S3
• Cache: Amazon ElastiCache
• NoSQL: Amazon DynamoDB
• SQL: Amazon RDS
• Search: Amazon Elasticsearch Service
PROCESS / ANALYZE
• Categories: Streaming, Message, Interactive, Batch, AI
• Services: Amazon Kinesis Analytics, KCL apps, AWS Lambda, Amazon EC2, Amazon SQS apps, Amazon Redshift, Amazon Athena, Presto, Amazon EMR, Amazon AI
• ETL
CONSUME
• Categories: Apps & Services, Analysis & visualization, Notebooks, IDE, API
• Services: Amazon QuickSight
Thank you!
Ran Tessler
tesslerr@amazon.com

Deploying a Data Lake in AWS - AWS Summit Tel Aviv 2017
