How to Build a Data Lake | AWS Summit Tel Aviv 2019
1. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
How To Build a Data Lake
Eden Perry
Solutions Architect
Amazon Web Services
DEV305
Adir Sharabi
Solutions Architect
Amazon Web Services
2.
Agenda
AWSome Airlines Recap
Introduction to Data Lakes
AWS Data Platform Services and Data Lake Patterns
Data Lake in Action:
Building a Data Lake for AWSome Airlines
and Developing Dashboards with Amazon QuickSight
3.
AWSome Airlines Recap
4.
5.
AWSome Airlines Operational Dashboard
6.
AWSome Airlines Operational Dashboard
7.
AWSome Airlines High-Level Architecture
Diagram components: Frontend; Data Microservices; Common Interfaces; Machine Learning Services; Serverless Scheduler; Data Lake and Analytics; Flights; Resources
8.
What About the Data?
Resources
Departures
IoT Devices
Weather Data
Crews & Teams
9.
AWSome Airlines Business Requirements
1. Establish a robust data pipeline that captures and stores all the data AWSome Airlines generates
2. Provide business insights from the collected data, track KPIs, and gain deep visibility in order to optimize business flows
10.
Introduction to Data Lakes
11.
Data Lake Definition
A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale
12.
Data Lake Main Concepts
• All data in one place, a single source of truth
• Supports different formats: structured, semi-structured, unstructured, and raw data
• Supports fast ingestion and consumption
• Schema on read
• Designed for low-cost storage
• Decouples storage and compute
• Supports protection and security rules
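As an illustration of the "schema on read" concept listed on this slide, here is a minimal Python sketch: raw JSON lines are stored untouched, and a schema is applied only when the data is read. The field names (`flight_id`, `delay_minutes`, `gate`), the `FLIGHT_SCHEMA` mapping, and the `read_with_schema` helper are hypothetical and exist only for this example.

```python
import json

# Raw events are kept exactly as they arrived; no schema is enforced at write time.
RAW_LINES = [
    '{"flight_id": "AW100", "delay_minutes": "12", "gate": "B4"}',
    '{"flight_id": "AW200", "delay_minutes": "0"}',
]

# Hypothetical schema chosen by the reader: column name -> type converter.
FLIGHT_SCHEMA = {"flight_id": str, "delay_minutes": int}


def read_with_schema(lines, schema):
    """Parse raw JSON lines and project them onto the schema at read time."""
    rows = []
    for line in lines:
        record = json.loads(line)
        row = {}
        for column, convert in schema.items():
            value = record.get(column)
            # Missing fields become None instead of failing the read.
            row[column] = convert(value) if value is not None else None
        rows.append(row)
    return rows


rows = read_with_schema(RAW_LINES, FLIGHT_SCHEMA)
```

A different consumer could read the same raw lines with another schema (for example one that includes `gate`) without rewriting the stored data.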
13.
14.
Simplified Data Pipeline
Pipeline stages: Data Sources, Ingest, Store (Amazon S3), Catalog, Process & Analyze, Consume
15.
Multiple Data Sources
Data sources: Amazon DynamoDB, web logs / cookies, ERP, connected devices
Pipeline stages: Ingest, Store (Amazon S3), Catalog, Process & Analyze, Consume
16.
Amazon DynamoDB
Fully managed, multi-region, multi-master database
Nonrelational database that delivers reliable performance at any scale
Consistent single-digit millisecond latency
Built-in security, backup and restore, and in-memory caching
Supports DynamoDB Streams
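To show how table changes exposed through DynamoDB Streams might feed a data lake, here is a minimal Python sketch that turns stream records into plain dictionaries. It assumes the stream is configured to include new images (`NewImage`); the `changed_items` and `from_dynamodb_json` helpers, the sample event, and the field names are hypothetical.

```python
def from_dynamodb_json(attribute):
    """Convert one DynamoDB-typed attribute value to a plain Python value."""
    (type_tag, value), = attribute.items()
    if type_tag == "S":
        return value
    if type_tag == "N":
        # Numbers arrive as strings; keep integers exact, fall back to float.
        return int(value) if value.lstrip("-").isdigit() else float(value)
    if type_tag == "BOOL":
        return value
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [from_dynamodb_json(item) for item in value]
    if type_tag == "M":
        return {key: from_dynamodb_json(item) for key, item in value.items()}
    raise ValueError(f"unsupported DynamoDB type tag: {type_tag}")


def changed_items(stream_event):
    """Yield (event name, plain item) for INSERT and MODIFY stream records."""
    for record in stream_event.get("Records", []):
        if record["eventName"] in ("INSERT", "MODIFY"):
            image = record["dynamodb"]["NewImage"]
            yield record["eventName"], {k: from_dynamodb_json(v) for k, v in image.items()}


# Hypothetical sample event with one insert and one delete.
SAMPLE_EVENT = {"Records": [
    {"eventName": "INSERT",
     "dynamodb": {"NewImage": {"flight_id": {"S": "AW100"}, "delay_minutes": {"N": "12"}}}},
    {"eventName": "REMOVE",
     "dynamodb": {"Keys": {"flight_id": {"S": "AW200"}}}},
]}

items = list(changed_items(SAMPLE_EVENT))
```

Deletes are skipped here for brevity; a real pipeline would decide how to represent them downstream.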
17.
Ingestion Options
Data sources: Amazon DynamoDB, web logs / cookies, ERP, connected devices
Ingest: Amazon Kinesis, AWS Snowball, Amazon MSK, AWS Database Migration Service
Remaining stages: Store (Amazon S3), Catalog, Process & Analyze, Consume
18.
Amazon Kinesis
Real-time processing
High throughput; elastic
Easy to use
Integrated with Amazon EMR, Amazon S3, Amazon Redshift, and DynamoDB
19.
Amazon Kinesis: Streaming Data Made Easy
Amazon Kinesis Data Streams
• For technical developers
• Build your own custom applications that process or analyze streaming data
Amazon Kinesis Data Firehose
• For all developers and data scientists
• Easily load massive volumes of streaming data into Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service
Amazon Kinesis Data Analytics
• For all developers and data scientists
• Easily analyze data streams using standard SQL queries
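For the "build your own custom applications" path with Kinesis Data Streams, a producer typically has to shape and batch its events before sending them. The Python sketch below prepares batches in the shape of the `Records` argument of the PutRecords API, which accepts at most 500 records per call. The `to_put_records_batches` helper, the event fields, and the stream name `flights` in the comment are hypothetical.

```python
import json

MAX_RECORDS_PER_REQUEST = 500  # PutRecords accepts at most 500 records per call


def to_put_records_batches(events, partition_key_field):
    """Group events into lists shaped like the Records argument of PutRecords."""
    batches, current = [], []
    for event in events:
        current.append({
            "Data": json.dumps(event).encode("utf-8"),
            # Records with the same partition key are routed to the same shard.
            "PartitionKey": str(event[partition_key_field]),
        })
        if len(current) == MAX_RECORDS_PER_REQUEST:
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return batches


# Hypothetical departure events.
events = [{"flight_id": f"AW{n}", "status": "departed"} for n in range(1200)]
batches = to_put_records_batches(events, "flight_id")

# Each batch could then be sent with boto3 (requires AWS credentials), e.g.:
#   boto3.client("kinesis").put_records(StreamName="flights", Records=batch)
```

The sketch does not send anything; retry handling for partially failed PutRecords calls is left out.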
20.
Amazon Kinesis: Streaming Data Made Easy
21.
Amazon Kinesis + AWS Lambda
Amazon Kinesis Data Firehose
• For all developers and data scientists
• Easily load massive volumes of streaming data into Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service
AWS Lambda
• Run your code without provisioning servers
• Lets you process and transform records on the fly
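The combination on this slide can be sketched as a Lambda function that Kinesis Data Firehose invokes to transform records before delivery. Firehose passes base64-encoded records and expects each one back with its `recordId`, a `result` of `Ok`, `Dropped`, or `ProcessingFailed`, and the (re-encoded) `data`. The transformation itself (upper-casing a hypothetical `airport` field) and the sample event are assumptions made for illustration.

```python
import base64
import json


def lambda_handler(event, context):
    """Transform each Firehose record and return it in the expected response shape."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            if not isinstance(payload, dict):
                raise ValueError("expected a JSON object")
        except ValueError:
            # Leave unparseable records untouched and flag them as failed.
            output.append({"recordId": record["recordId"],
                           "result": "ProcessingFailed",
                           "data": record["data"]})
            continue
        # Hypothetical transformation: normalize the airport code to upper case.
        payload["airport"] = str(payload.get("airport", "")).upper()
        # A trailing newline keeps one JSON object per line in the delivered files.
        data = (json.dumps(payload) + "\n").encode("utf-8")
        output.append({"recordId": record["recordId"],
                       "result": "Ok",
                       "data": base64.b64encode(data).decode("utf-8")})
    return {"records": output}


# Local check with a hypothetical event; no AWS resources are needed.
sample = {"records": [
    {"recordId": "1", "data": base64.b64encode(b'{"airport": "tlv"}').decode("utf-8")},
    {"recordId": "2", "data": base64.b64encode(b"not json").decode("utf-8")},
]}
result = lambda_handler(sample, None)
```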
22.
Storage Layer
Data sources: Amazon DynamoDB, web logs / cookies, ERP, connected devices
Ingest: Amazon Kinesis, AWS Snowball, Amazon MSK, AWS Database Migration Service
Store: Amazon S3
Remaining stages: Catalog, Process & Analyze, Consume
23.
Amazon S3 is the Base
Secure, highly scalable, durable object storage with millisecond latency for data access
Store any type of data from websites, mobile apps, corporate applications, and IoT sensors, at any scale
Store data in the format you want:
Unstructured (logs, dump files) | Semi-structured (JSON, XML) | Structured (CSV, Parquet)
Storage lifecycle integration:
Amazon S3 Standard | Amazon S3 Standard-Infrequent Access | Amazon Glacier
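The storage lifecycle integration mentioned on this slide can be expressed as a lifecycle configuration that moves objects from S3 Standard to Standard-Infrequent Access and then to Glacier. The Python sketch below only builds the configuration dictionary; the `build_lifecycle_configuration` helper, the `raw/` prefix, the day counts, and the bucket name in the comment are hypothetical choices.

```python
def build_lifecycle_configuration(prefix, days_to_ia, days_to_glacier):
    """Build a lifecycle configuration dict for objects under the given prefix."""
    if days_to_glacier <= days_to_ia:
        raise ValueError("the Glacier transition must come after the Infrequent Access one")
    return {
        "Rules": [{
            "ID": f"tier-{prefix.strip('/')}",
            "Filter": {"Prefix": prefix},
            "Status": "Enabled",
            "Transitions": [
                {"Days": days_to_ia, "StorageClass": "STANDARD_IA"},
                {"Days": days_to_glacier, "StorageClass": "GLACIER"},
            ],
        }]
    }


# Hypothetical policy: raw data goes to Infrequent Access after 30 days
# and to Glacier after 90 days.
config = build_lifecycle_configuration("raw/", days_to_ia=30, days_to_glacier=90)

# The dict could then be applied with boto3 (requires AWS credentials), e.g.:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-data-lake", LifecycleConfiguration=config)
```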
24.
Data Discovery and Catalog
Data sources: Amazon DynamoDB, web logs / cookies, ERP, connected devices
Ingest: Amazon Kinesis, AWS Snowball, Amazon MSK, AWS Database Migration Service
Store: Amazon S3
Catalog: AWS Glue
Remaining stages: Process & Analyze, Consume
25.
AWS Glue - Serverless Data Catalog and ETL
Automatically discovers data and stores schema
Makes data searchable and available for ETL
Generates customizable code
Schedules and runs your ETL jobs
Serverless
26.
Process and Analyze
Data sources: Amazon DynamoDB, web logs / cookies, ERP, connected devices
Ingest: Amazon Kinesis, AWS Snowball, Amazon MSK, AWS Database Migration Service
Store: Amazon S3
Catalog: AWS Glue
Process & Analyze: Amazon Athena, Amazon EMR, Amazon Redshift, Amazon Elasticsearch Service
Remaining stage: Consume
27.
Amazon Athena - Interactive Analysis
Interactive query service to analyze data in Amazon S3 using standard SQL
No infrastructure to set up or manage and no data to load
Supports multiple data formats; define schema on demand
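"Define schema on demand" can be illustrated by generating the DDL that tells Athena how to interpret JSON files already sitting in Amazon S3, followed by a standard SQL query. In the Python sketch below, the `create_external_table_ddl` helper, the table and column names, and the `s3://example-data-lake/...` locations are hypothetical; the statement uses the OpenX JSON SerDe, which is one of the formats Athena supports.

```python
def create_external_table_ddl(table, columns, location):
    """Return a CREATE EXTERNAL TABLE statement for JSON data stored under location."""
    column_sql = ",\n  ".join(f"`{name}` {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {column_sql}\n)\n"
        "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'\n"
        f"LOCATION '{location}'"
    )


# Hypothetical table over raw flight events.
ddl = create_external_table_ddl(
    "flights",
    [("flight_id", "string"), ("delay_minutes", "int")],
    "s3://example-data-lake/raw/flights/",
)

# A standard SQL query that could run once the table is defined.
query = "SELECT flight_id, delay_minutes FROM flights WHERE delay_minutes > 15"

# Both statements could be submitted with boto3 (requires AWS credentials), e.g.:
#   boto3.client("athena").start_query_execution(
#       QueryString=ddl,
#       ResultConfiguration={"OutputLocation": "s3://example-data-lake/athena-results/"})
```

No data is moved or loaded by the DDL; it only records how the existing files should be read.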
28.
Querying the Data Lake
Data sources: Amazon DynamoDB, web logs / cookies, ERP, connected devices
Ingest: Amazon Kinesis, AWS Snowball, Amazon MSK, AWS Database Migration Service
Store: Amazon S3
Catalog: AWS Glue
Process & Analyze: Amazon Athena, Amazon EMR, Amazon Redshift, Amazon Elasticsearch Service
Consume: BI tools, Jupyter Notebooks, Amazon API Gateway, Amazon QuickSight
29.
Amazon QuickSight
Supports a variety of data sources and targets
Fully managed and scalable
Super fast and easy to use
Cost-effective
30.
Data Lake in Action:
Building a Data Lake for AWSome Airlines
and Developing Dashboards with Amazon QuickSight
31.
AWSome Airlines Business Requirements
1. Establish a robust data pipeline that captures and stores all the data AWSome Airlines generates
2. Provide business insights from the collected data, track KPIs, and gain deep visibility in order to optimize business flows
32.
Building Blocks for AWSome Airlines Data Lake
33.
34.
35.
36.
What have we learned?
What a data lake is and when you need to build one
AWS data lake building blocks and patterns
How to use Amazon QuickSight to visualize data and transform it into business insights
Reach out to your AWS contact or to AWS Partners and start building your data lake!
37. Thank you!
Eden Perry
@edenperr
Adir Sharabi
@adirs
http://bit.ly/2SGp8Ls