© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
The Beginner’s Guide to Data Lakes in AWS
Guillermo A. Fisher
DVC12
Senior Engineering Manager
Handshake
Agenda
Why a Data Lake?
Key Concepts
Data Lakes on AWS
An Example
Best Practices
Related DevChats
DVC10 - Lessons from the backyard: A connected BBQ grill and smoker
DVC06 - Use Neptune to discover where & when events can impact local businesses
“The never-ending stream of information
is incredibly useful for businesses, but it
can also be a challenge to draw relevant
insights from such a large data pool.”
Michael Brenner
CEO, Marketing Inside Group
The Data Science Hierarchy of Needs
AI
Learn/Optimize
Aggregate/Label
Explore/Transform
Move/Store
Collect
“You need a solid foundation
for your data before being
effective with AI and machine
learning.”
Monica Rogati
Data Science and AI Advisor
The Data Warehouse Solution
Data Warehouse
Data Mart Data Mart Data Mart
Advantages
Provides precise reporting and BI
Standardized, consistent data
Drawbacks
Limited to pre-determined questions
No low-level data visibility
Considerations for a Modern Solution
Centralized Data Storage: Store all data reliably in one location
Multiple User Communities: Business analysts, data professionals
Schema on Read: Schema written at time of analysis
Storage vs. Compute: Scale storage and compute independently
Data Types & Formats: Structured, semi-structured, unstructured, raw data
Security: Control access to the data
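The "Schema on Read" consideration above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not anything shown in the talk: the sample records, the `SALES_SCHEMA` mapping, and the `read_with_schema` helper are all hypothetical names chosen for the example. The raw data is stored untouched, and each reader decides which columns and types it wants at the moment of analysis.

```python
import io
import json

# Raw events stored as-is (JSON Lines); the fields vary between records.
RAW_EVENTS = """\
{"user": "a1", "amount": "19.99", "ts": "2019-07-04T12:00:00"}
{"user": "b2", "amount": 5, "country": "US"}
"""

# Hypothetical schema picked by one analyst at query time: column -> caster.
SALES_SCHEMA = {"user": str, "amount": float}


def read_with_schema(lines, schema):
    """Yield one dict per record, keeping only schema columns and casting them."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # A column missing from a record becomes None instead of failing the read.
        yield {
            column: cast(record[column]) if column in record else None
            for column, cast in schema.items()
        }


rows = list(read_with_schema(io.StringIO(RAW_EVENTS), SALES_SCHEMA))
print(rows)
```

A second reader could pass a different mapping, for example `{"country": str}`, over the same stored bytes, which is the point of deferring the schema to read time.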
Photo by Yifan Liu on Unsplash
A data lake is a centralized repository that allows you
to store all your structured and unstructured data at
any scale. You can store your data as-is, without having
to first structure the data, and run different types of
analytics—from dashboards and visualizations to big
data processing, real-time analytics, and machine
learning to guide better decisions.
Photo by arsalan arianmehr on Unsplash
Onboard relevant data
Metadata should exist in a data catalog
Data governance policies and procedures govern storage and access
Automated processes manage data flow, clean data, and enforce practices
Centralized Storage
Amazon S3
Scalable object storage
Decouples storage and compute
99.999999999% durability
Cost-effective lifecycle policies
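As a rough illustration of the lifecycle policies mentioned for Amazon S3, the sketch below builds a rule set in the dictionary shape that the S3 lifecycle configuration API accepts (for instance as the `LifecycleConfiguration` argument of boto3's `put_bucket_lifecycle_configuration`). The prefix, the day thresholds, and the `build_lifecycle_configuration` helper are assumptions made for the example; the code only constructs and prints the configuration and does not call AWS.

```python
import json


def build_lifecycle_configuration(prefix, ia_after_days=30,
                                  glacier_after_days=90, expire_after_days=365):
    """Return a lifecycle configuration dict that tiers and then expires objects.

    All thresholds are illustrative defaults, not recommendations from the talk.
    """
    if not 0 < ia_after_days < glacier_after_days < expire_after_days:
        raise ValueError("day thresholds must be positive and strictly increasing")
    rule_id = "tier-and-expire-" + prefix.strip("/").replace("/", "-")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Move objects to cheaper storage classes as they age.
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
                # Delete objects once they pass the retention window.
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


config = build_lifecycle_configuration("raw/logs/")
print(json.dumps(config, indent=2))
```

Keeping the rule in code makes it easy to review and to apply the same retention policy to several prefixes.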
Data Ingestion
Amazon Kinesis Data Firehose: Easily and reliably stream data into data lakes
AWS Snowball: Migrate large datasets using secure devices
AWS Storage Gateway: Gain on-premises access to AWS cloud storage
AWS Database Migration Service: Migrate databases to AWS quickly and securely
AWS Direct Connect: Establish a dedicated network connection to AWS
Catalog & Search
Amazon DynamoDB: Fully managed NoSQL database service
Amazon Elasticsearch Service: Fully managed Elasticsearch service
AWS Glue: Store metadata in a data catalog
Move & Transform
Amazon Kinesis Data Firehose: Easily and reliably stream data into data lakes
AWS Glue: Fully managed ETL service
AWS Lambda: Event-driven, serverless computing
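To make "event-driven" concrete, here is a minimal sketch of a Lambda-style handler that reacts to an S3 object-created notification. It assumes the standard S3 event notification layout (`Records[].s3.bucket.name` and `Records[].s3.object.key`); the bucket name, object key, and the decision to simply return the parsed locations are hypothetical and stand in for whatever transform step a real pipeline would run.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Return the bucket and key of every object named in an S3 notification."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(s3["object"]["key"])
        objects.append({"bucket": bucket, "key": key})
    return objects


# Hypothetical notification, trimmed to the fields the handler reads.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-data-lake"},
                "object": {"key": "raw/events/2019-07-04/click+stream.json"},
            }
        }
    ]
}

print(handler(sample_event, None))
```

Calling the handler locally with a sample event, as above, is a cheap way to test the parsing logic before wiring it to a bucket.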
Access & User Interfaces
AWS AppSync: Manage and synchronize mobile app data in real time across devices and users
Amazon Cognito: Add user sign-up, sign-in, and access control to your web and mobile apps quickly and easily
Amazon API Gateway: Fully managed service for creating, publishing, maintaining, and monitoring secure APIs at scale
Analytics & Serving
Amazon Redshift: Fast, simple, cost-effective data warehousing service
Amazon Athena: Serverless, interactive query service
Amazon QuickSight: Fast, cloud-powered business intelligence service
AWS Glue: Store metadata in a data catalog
Amazon DynamoDB: Fully managed NoSQL database service
Amazon EMR: Run & Scale Spark, Hadoop, and other Big Data Frameworks
AWS Direct Connect: Establish a dedicated network connection to AWS
Amazon Elasticsearch Service: Fully managed Elasticsearch service
Amazon Neptune: Fully managed Graph database service
Amazon RDS: Distributed relational database service
Manage & Secure
AWS KMS: Manage cryptographic keys and control their use across services
AWS IAM: Securely manage access to AWS services and resources
AWS CloudTrail: Enable governance, compliance, operational auditing, and risk auditing
Amazon CloudWatch: Monitor your AWS resources and the applications you run on AWS in real time
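As one hedged example of controlling access with IAM, the sketch below assembles a read-only policy document scoped to a single prefix of a data lake bucket. The policy elements (`Version`, `Statement`, `Effect`, `Action`, `Resource`, `Condition`) are standard IAM JSON policy syntax; the bucket name, prefix, and the `read_only_prefix_policy` helper are hypothetical, and the code only builds and prints the document.

```python
import json


def read_only_prefix_policy(bucket, prefix):
    """Build an IAM policy allowing list and read access under one S3 prefix."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                # Restrict listing to keys under the chosen prefix.
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


policy = read_only_prefix_policy("example-data-lake", "curated/sales/")
print(json.dumps(policy, indent=2))
```

Granting each user community only the prefixes it needs keeps raw and curated zones separated without duplicating data.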
A Data Lake in Days
AWS Lake Formation
Source crawlers, ETL and data prep, data catalog, security settings, access control
Identify data sources
Data lake storage
Provide self-service access
An Example
Amazon S3, AWS Lambda, AWS CloudTrail, AWS IAM, AWS Glue, Amazon Athena, Amazon QuickSight
Photo by Moritz Mentges on Unsplash
DEMO
Some Best Practices
Encrypt data at rest and in transit
Partition data
Compress data
Use columnar file formats
Use lifecycle policies
Automate, automate, automate
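Two of the practices above, partitioning and compression, can be sketched with the Python standard library alone. The example assumes Hive-style `key=value` path segments, a layout that query engines such as Athena can map to partitions; the dataset name, file name, sample records, and both helper functions are hypothetical. Columnar formats such as Parquet need a third-party library and are not shown here.

```python
import gzip
import json
from datetime import datetime, timezone


def partitioned_key(dataset, event_time, filename):
    """Build a Hive-style object key partitioned by year, month, and day."""
    return (
        f"{dataset}/year={event_time:%Y}/month={event_time:%m}/"
        f"day={event_time:%d}/{filename}"
    )


def compress_records(records):
    """Serialize records as JSON Lines and gzip the result."""
    payload = "\n".join(json.dumps(r, sort_keys=True) for r in records) + "\n"
    return gzip.compress(payload.encode("utf-8"))


records = [{"user": "a1", "amount": 19.99}, {"user": "b2", "amount": 5.0}]
when = datetime(2019, 7, 4, 12, 0, tzinfo=timezone.utc)

key = partitioned_key("events", when, "part-0000.json.gz")
blob = compress_records(records)
print(key, len(blob), "bytes")
```

With keys laid out this way, a query filtered on a single day only has to read the objects under that day's prefix.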
Thank you!
Guillermo A. Fisher
@guillermoandrae
https://bklyn.dev
Please complete the session
survey in the mobile app.
