Public Sector Summit
Washington, DC
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
The Zen of DataOps – AWS Lake
Formation and the Data Supply
Chain Pipeline
Stephen Moon
Specialist Solutions Architect
AWS
3 0 1 3 1 8
Agenda
DataOps
Data Supply Chain Pipeline
AWS Lake Formation
What is DataOps?
An automated, process-oriented methodology, used by analytic and data
teams, to improve the quality and reduce the cycle time of data analytics.
The DataOps Engineer orchestrates and automates the data analytics
pipeline, promotes features to production and automates quality.
‒ Wikipedia ‒
DataOps Principles (www.dataopsmanifesto.org)
1. Continually satisfy your customer (Customer Obsession):
Our highest priority is to satisfy the customer through the early and continuous
delivery of valuable analytic insights from a couple of minutes to weeks.
3. Embrace change (Deliver Results):
We welcome evolving customer needs, and in fact, we embrace them to generate
competitive advantage. We believe that the most efficient, effective, and agile
method of communication with customers is face-to-face conversation.
8. Reflect (Learn and Be Curious):
Analytic teams should fine-tune their operational performance by self-reflecting,
at regular intervals, on feedback provided by their customers, themselves, and
operational statistics.
DataOps Principles (www.dataopsmanifesto.org)
12. Disposable environments (Frugality):
We believe it is important to minimize the cost for analytic team members to
experiment by giving them easy to create, isolated, safe, and disposable
technical environments that reflect their production environment.
13. Simplicity (Invent and Simplify):
We believe that continuous attention to technical excellence and good design
enhances agility; likewise simplicity--the art of maximizing the amount of work
not done--is essential.
14. Analytics is manufacturing:
Analytic pipelines are analogous to lean manufacturing lines. We believe a
fundamental concept of DataOps is a focus on process-thinking aimed at
achieving continuous efficiencies in the manufacture of analytic insight.
Data Supply Chain Pipeline Mission Statement
Securely democratize data and deliver it to Communities of
Interest when they need it and how they need it.
Operating Model
Ross, Jeanne W, et al. Enterprise Architecture As Strategy: Creating a Foundation for Business Execution. Harvard Business Review Press, 2006.
https://www.amazon.com/dp/B004OC07EE/ref=dp-kindle-redirect?_encoding=UTF8&btkr=1
[Figure: operating model, Current State and Future State]
Architecture & Design Principles
Principle: Minimal Disruption
Statement: Minimize disruption to data producers in how they deliver their data
Principle: Configuration (80/20 Rule)
Statement: Focus on the 80% of use cases that can be satisfied with configurable components
Principle: Right Tool for the Right Job
Statement: Processes drive tooling; not the other way around
Principle: Conscious Decoupling
Statement: The right tool today may not be the right tool tomorrow
Principle: Data Residency
Statement: Users should access the data where IT lives regardless of where THEY live
Conceptual Architecture
Ingest
There is no single “tool” for receiving, inspecting, staging, and archiving data.
• Focus on cultivating the organizational competencies and the processes for
engaging with Data Suppliers
• Build tiger teams who understand the organizational domains of the Data
Suppliers
• Develop templates for Memorandums of Understanding (MoU) and Interface
Control Documents (ICD) to govern the relationships with Data Suppliers
The result will be a small set of common patterns that can be
standardized, automated, and scaled to service hundreds to thousands of
Data Suppliers.
Process
Extract & Load
• Cleanse
  - Application of Universal Data Rules and Business Data Rules
  - Entities and attributes remain distinct from other instances of the same entities and attributes
Entity Resolution
• Aggregate
  - Instantiating two or more occurrences of the same entity as a single instance
  - Attributes of aggregated entities remain distinct even though the attributes may be similar or the same
  - Disparate IDs of the same entity become an attribute linked to a natural or synthetic UUID/GUID
• Associate – Defining the relationships among entities via the application of Business Relationship Rules
Master Data Management
• Merge – Combining aggregated instances of entity attributes into a single version of the truth
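The Aggregate and Merge steps above can be sketched roughly as follows. This is a minimal illustration, not the speaker's implementation: the record layout, the choice of `email` as the natural key, the `source_priority` rule, and the function names `aggregate` and `merge` are all assumptions made for the example.

```python
import uuid
from collections import defaultdict


def aggregate(records, key_fields):
    """Group records that share a natural key into one entity (no attribute merging yet)."""
    groups = defaultdict(list)
    for record in records:
        natural_key = tuple(str(record[f]).strip().lower() for f in key_fields)
        groups[natural_key].append(record)

    entities = []
    for natural_key, members in groups.items():
        entities.append({
            # Synthetic, deterministic UUID derived from the natural key.
            "entity_id": str(uuid.uuid5(uuid.NAMESPACE_URL, "|".join(natural_key))),
            # Disparate source IDs become an attribute of the entity.
            "source_ids": sorted(m["source_id"] for m in members),
            # Attributes of each occurrence stay distinct at this stage.
            "instances": members,
        })
    return entities


def merge(entity, source_priority):
    """Collapse an entity's instances into one record; higher-priority sources win."""
    ranked = sorted(entity["instances"], key=lambda m: source_priority.index(m["source"]))
    merged = {}
    for instance in reversed(ranked):  # apply the lowest-priority source first
        for field, value in instance.items():
            if field not in ("source", "source_id") and value.strip():
                merged[field] = value.strip()
    merged["entity_id"] = entity["entity_id"]
    merged["source_ids"] = entity["source_ids"]
    return merged


# Hypothetical cleansed records from two source systems.
records = [
    {"source": "crm", "source_id": "C-17", "name": "Ada Lovelace", "email": "ada@example.org", "phone": ""},
    {"source": "erp", "source_id": "9001", "name": "A. Lovelace", "email": "ADA@example.org ", "phone": "555-0100"},
    {"source": "crm", "source_id": "C-42", "name": "Alan Turing", "email": "alan@example.org", "phone": ""},
]
entities = aggregate(records, key_fields=["email"])
golden = [merge(entity, source_priority=["crm", "erp"]) for entity in entities]
```

Keeping `aggregate` separate from `merge` mirrors the slide: entity resolution only links occurrences, while the single version of the truth is produced later under an explicit survivorship rule.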
Enrich
• Assimilate
  - Organize entities & attributes for consumption by Communities of Interest
  - Structured as Facts, Graphs, Time-series, and/or Matrices
  - Driven by questions generated by the Communities of Interest
  - CRISP-DM project scope
• Transform – Standardize
• Engineer – Normalize, Interpolate, Extrapolate
• Synthesize
  - Obfuscate – Mask identifying data
  - Anonymize – Apply privacy models
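As a rough illustration of the Synthesize step, the sketch below masks a direct identifier and then checks one privacy model. The slide does not name a specific model; k-anonymity is used here only as an example, and the salt, field names, and age-band width are assumptions. A truncated salted hash is pseudonymization at best, so treat this as a teaching sketch rather than a vetted privacy control.

```python
import hashlib
from collections import Counter


def mask(value, salt):
    """Obfuscate: replace an identifying value with a salted, truncated hash."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def generalize_age(age, width=10):
    """Coarsen an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def k_anonymity(rows, quasi_identifiers):
    """Return k: the size of the smallest group sharing the same quasi-identifier values."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(counts.values())


# Hypothetical input rows.
rows = [
    {"email": "ada@example.org", "age": 34, "zip": "20001"},
    {"email": "alan@example.org", "age": 37, "zip": "20001"},
    {"email": "grace@example.org", "age": 52, "zip": "20001"},
    {"email": "edsger@example.org", "age": 58, "zip": "20001"},
]
released = [
    {"email": mask(r["email"], salt="demo-salt"), "age": generalize_age(r["age"]), "zip": r["zip"]}
    for r in rows
]
k = k_anonymity(released, ["age", "zip"])
```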
Catalog & Profile
• Business Glossary
• Concept Descriptions
• Data Models
• Classifications (Labeling)
• Summary Statistics (supports Discovery and Exploration)
  - Maximum
  - Minimum
  - Mean & Skew
  - Mode
  - Quartiles
  - Standard Deviation
  - Correlation Coefficient
  - Depth & Breadth
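The summary statistics listed above could be computed for a small in-memory table along these lines. This is a sketch using only the Python standard library; the sample data is made up, the population forms of standard deviation and skew are one choice among several, and reading "Depth & Breadth" as row count and column count is an assumption.

```python
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric columns."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


def profile_column(values):
    """Profile one numeric column (assumes at least two distinct values)."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    # Skew as the third standardized moment.
    skew = sum((v - mean) ** 3 for v in values) / len(values) / stdev ** 3
    return {
        "maximum": max(values),
        "minimum": min(values),
        "mean": mean,
        "skew": skew,
        "mode": statistics.mode(values),
        "quartiles": statistics.quantiles(values, n=4),
        "standard_deviation": stdev,
    }


# Hypothetical two-column table.
table = {"a": [2, 4, 4, 4, 5, 5, 7, 9]}
table["b"] = [2 * v + 1 for v in table["a"]]

profile = {name: profile_column(column) for name, column in table.items()}
shape = {"depth": len(table["a"]), "breadth": len(table)}  # rows, columns
corr_ab = pearson(table["a"], table["b"])
```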
Circles of Concern for Ingest
Control
Applications/Systems which are owned and/or
directly managed by the ingesting organization
Influence
Applications/Systems of which the ingesting
organization is an internal or external stakeholder but
does not own or manage the application/system
Interest
Applications/Systems of which the ingesting
organization has a concern for the data but does not
have control or influence over the application/system
Why is this important?
Determines how data is going to be ingested!
Logical Architecture
Our portfolio
Broad and deep portfolio, purpose-built for builders
Business Intelligence & Machine Learning
• QuickSight
• SageMaker
Analytics
• Redshift: Data warehousing
• EMR: Hadoop + Spark
• Athena: Interactive analytics
• Kinesis Data Analytics: Real time
• Elasticsearch Service: Operational analytics
Databases
• RDS: MySQL, PostgreSQL, MariaDB, Oracle, SQL Server
• Aurora: MySQL, PostgreSQL
• DynamoDB: Key value, Document
• ElastiCache: Redis, Memcached
• Neptune: Graph
• Timestream: Time series
• QLDB: Ledger database
• RDS on VMware
Data Lake
• S3/Glacier
• Glue: ETL & Data Catalog
• Lake Formation: Data lakes
Data Movement
• Database Migration Service | Snowball | Snowmobile | Kinesis Data Firehose | Kinesis Data Streams
What is a Data Lake?
A data lake is a centralized repository that allows
you to store all your structured and unstructured
data at any scale
Why data lakes?
Data Lakes provide:
Relational and non-relational data
Scale-out to exabytes (EBs)
Diverse set of analytics and machine learning tools
Work on data without any data movement
Designed for low cost storage and analytics
[Diagram labels: OLTP, ERP, CRM, LOB; Devices, Web, Sensors, Social; Data Warehouse; Data Lake; Catalog; Business Intelligence; Machine Learning; DW Queries; Big data processing; Interactive; Real-time]
Physical Architecture
[Architecture diagram labels: Databases; AWS DataSync; AWS Database Migration Service (AWS DMS); Amazon Kinesis; Amazon Aurora; Operational Data Store; Data Warehouse; Amazon S3; Data Lake; AWS Glue; Amazon EMR; Amazon Athena; Amazon Redshift; Amazon SageMaker; Amazon QuickSight; Other Tools. Flow labels: Load Raw Data; Extract Warehouse Data; Load Data Warehouse; Build Data Marts]
The Power of Data Lakes
Data Warehouse
• Permanent data store for structured data
• No direct access
[Diagram labels: Data Warehouse; Amazon Aurora; Amazon S3; Data Lake; Amazon Redshift; Amazon Neptune; Amazon EMR; Apache MXNet on AWS]
Data Lake
• Ephemeral/Dynamic data storage for structured data
• Data sets purpose-built based on use cases (right tool)
• Many-to-One ratio of Tools-to-Data
• Only pay for data processing as it's needed
Data Lake Challenges
• Maintaining a data catalog /
enabling self-service access
• Configuring and managing
access controls / Data
governance
• Audit logging
Building data lakes can still take
months
Typical steps of building a data lake
1. Set up storage
2. Move data
3. Cleanse, prepare, & catalog data
4. Configure & enforce security & compliance policies (permissions)
5. Make data available for analytics
How it works
Key Components
• Blueprints / Data Importers – templates for ETL, metadata (schema), and
partition management
• Enhanced Data Catalog – enables users to record more metadata and tag
Data Catalog objects (i.e., databases, tables, columns)
• ML Transformations – ML algorithms that customers can use to create
their own ML Transforms (e.g., record de-duplication)
• Enhanced Security & Governance – security and governance layer at the
Data Catalog level
Register existing data or import new
Amazon S3 forms the storage layer for
Lake Formation
Register existing S3 buckets that
contain your data
Ask Lake Formation to create required
S3 buckets and import data into them
Data is stored in your account. You have
direct access to it. No lock-in.
[Diagram labels: Data Lake Storage; Data Catalog; Access Control; Data import; Crawlers; ML-based data prep]
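Registering an existing S3 location might look roughly like the sketch below. It builds the request for the Lake Formation `RegisterResource` operation and sends it through whatever client is passed in. The bucket name and prefix are hypothetical, and `RecordingClient` is a stand-in so the example runs without AWS credentials; with real access, a `boto3.client("lakeformation")` object would be passed instead.

```python
class RecordingClient:
    """Stand-in for a Lake Formation client; records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def register_resource(self, **kwargs):
        self.calls.append(("register_resource", kwargs))
        return {}


def register_location(client, bucket, prefix=""):
    """Register an S3 bucket (or a prefix within it) as data lake storage."""
    arn = f"arn:aws:s3:::{bucket}"
    if prefix:
        arn += "/" + prefix.strip("/")
    request = {
        "ResourceArn": arn,
        # Let Lake Formation use its service-linked role to access the location.
        "UseServiceLinkedRole": True,
    }
    client.register_resource(**request)
    return request


client = RecordingClient()  # with AWS access: boto3.client("lakeformation")
request = register_location(client, "example-data-lake-bucket", "raw/")
```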
Easily load data to your data lake
[Diagram labels: Logs; DBs; Blueprints (one-shot, incremental); Data Lake Storage; Data Catalog; Access Control; Data import; Crawlers; ML-based data prep]
Blueprints / Data Importers
Blueprints are templates for data ingestion, transformation, metadata
(schema) and partition management. Blueprints help customers to
quickly and easily build and maintain a data lake.
With blueprints
You
1. Point us to the source
2. Tell us the location to load to in
your data lake
3. Specify how often you want to
load the data
Blueprints
1. Discover the source table(s)
schema
2. Automatically convert to the
target data format
3. Automatically partition the data
based on the partitioning schema
4. Keep track of data that was
already processed
5. You can customize any of the
above
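The "keep track of data that was already processed" and partitioning behavior can be pictured with a small bookmark sketch. This is not how blueprints are implemented internally; it only illustrates the one-shot versus incremental idea. The `updated_at` column, the ISO 8601 timestamps, and the `dt=` partition naming are assumptions for the example.

```python
def plan_load(source_rows, bookmark=None, key="updated_at"):
    """Return (rows to load, new bookmark). bookmark=None means a one-shot full load."""
    pending = [row for row in source_rows if bookmark is None or row[key] > bookmark]
    # ISO 8601 timestamps in one format sort correctly as plain strings.
    new_bookmark = max((row[key] for row in pending), default=bookmark)
    return pending, new_bookmark


def partition_path(row, key="updated_at"):
    """Partition by calendar day, taken from the date prefix of the timestamp."""
    return f"dt={row[key][:10]}"


# Hypothetical source table.
source = [
    {"id": 1, "updated_at": "2019-06-10T08:00:00"},
    {"id": 2, "updated_at": "2019-06-10T09:30:00"},
    {"id": 3, "updated_at": "2019-06-11T07:15:00"},
]

# First run: one-shot load of everything.
first_batch, bookmark = plan_load(source)

# A new row arrives; the next run loads only what is newer than the bookmark.
source.append({"id": 4, "updated_at": "2019-06-12T12:00:00"})
second_batch, bookmark = plan_load(source, bookmark)
```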
Blueprints build on AWS Glue
Enhanced Data Catalog
AWS Lake Formation has an enhanced Data Catalog to enable users to
record more metadata and Tags for Databases, Tables and Columns. All
of the data is searchable.
Search and collaborate across multiple users
Text-based, faceted search
across all metadata
Add attributes like data
owners, stewards, and others as
table properties
Add data sensitivity level,
column definitions, and others
as column properties
Text-based search and filtering
Query data in Amazon Athena
ML Transformations
AWS Lake Formation includes specialized ML-based dataset
transformation algorithms customers can use to create their own ML
Transforms. These include record de-duplication and match finding.
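To make record de-duplication concrete, here is a deliberately simple stand-in. It does not use machine learning and is not the Lake Formation transform; it swaps in plain string similarity from Python's `difflib` with a hand-picked threshold, purely to show what "match finding" produces: clusters of records believed to describe the same thing. The sample records and the 0.8 threshold are assumptions.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace before comparing."""
    return " ".join(text.lower().split())


def similarity(a, b):
    """Similarity ratio in [0, 1] between two records rendered as strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def cluster_duplicates(records, threshold=0.8):
    """Greedy clustering: join the first cluster whose first member is similar enough."""
    clusters = []
    for index, record in enumerate(records):
        for cluster in clusters:
            representative = records[cluster[0]]
            if similarity(record, representative) >= threshold:
                cluster.append(index)
                break
        else:
            clusters.append([index])  # no match found: start a new cluster
    return clusters


# Hypothetical customer records.
records = [
    "Jon Smith, 12 Main St",
    "John Smith, 12 Main Street",
    "Mary Jones, 4 Oak Ave",
]
clusters = cluster_duplicates(records)
```

A learned matcher replaces the fixed threshold with a model trained on labeled match and non-match examples, which is what makes it usable on messier data than this.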
De-duplicate
Enhanced Governance Layer
AWS Lake Formation provides a security and governance layer at the Data
Catalog level. Users can grant or revoke permissions to the Data Catalog
objects such as databases, tables and columns for IAM principals (IAM
users and roles). This functionality will be extended to row level access in
subsequent releases.
Security permissions in Lake Formation
Control data access with simple
grant and revoke permissions
Specify permissions on tables and
columns rather than on buckets
and objects
Easily view policies granted to a
particular user
Audit all data access at one place
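A table- or column-level grant could be expressed roughly as below, using the request shape of the Lake Formation `GrantPermissions` operation. The account ID, role name, database, table, and column names are hypothetical, and `RecordingClient` is a stand-in so the sketch runs offline; with real access, a `boto3.client("lakeformation")` object would be passed instead.

```python
class RecordingClient:
    """Stand-in for a Lake Formation client; records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def grant_permissions(self, **kwargs):
        self.calls.append(("grant_permissions", kwargs))
        return {}


def grant_select(client, principal_arn, database, table, columns=None):
    """Grant SELECT on a whole table, or only on the listed columns."""
    if columns:
        resource = {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        }
    else:
        resource = {"Table": {"DatabaseName": database, "Name": table}}
    request = {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": resource,
        "Permissions": ["SELECT"],
    }
    client.grant_permissions(**request)
    return request


client = RecordingClient()  # with AWS access: boto3.client("lakeformation")
analyst = "arn:aws:iam::111122223333:role/ExampleAnalystRole"  # hypothetical principal
request = grant_select(client, analyst, "sales_db", "orders", columns=["order_id", "total"])
```

Note that the grant names a catalog table and its columns, not an S3 bucket or object, which is the point the slide makes.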
Security permissions in Lake Formation
Search and view permissions
granted to a user, role, or group in
one place
Verify permissions granted to a user
Easily revoke policies for a user
Audit and monitor in real time
See detailed alerts in the console
Download audit logs for further
analytics
Data ingest and catalog notifications
are also published to Amazon CloudWatch
Events
Secure once, access in multiple ways
[Diagram labels: Admin; Data Lake Storage; Data Catalog; Access Control; Amazon QuickSight; Amazon SageMaker]
Grant table and column-level permissions
[Figure labels: User 1; User 2]
Lake Formation Security Workflow
User
• IAM Users
• IAM Roles
• Active Directory (Federation)
Example: A data lake in 3 easy steps
1. Use blueprints/data importers to ingest data
2. Grant permissions to securely share data
3. Query the data (Amazon Athena)
Step 1: Use data importers to ingest data
Imported data as table in the data lake
Step 2: Grant permissions to securely share data
Step 3: Run query in Amazon Athena
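Outside the console, step 3 might be scripted along these lines. The sketch submits a query through the Athena `StartQueryExecution` operation and returns the execution ID. The SQL text, database name, and results bucket are hypothetical, and `StubAthena` stands in for `boto3.client("athena")` so the example runs without credentials.

```python
class StubAthena:
    """Stand-in for an Athena client; records the request and returns a fake ID."""

    def __init__(self):
        self.requests = []

    def start_query_execution(self, **kwargs):
        self.requests.append(kwargs)
        return {"QueryExecutionId": "stub-execution-id"}


def run_query(athena, sql, database, output_location):
    """Submit a query to Athena and return its execution ID."""
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        # Athena writes result files to this S3 location.
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


athena = StubAthena()  # with AWS access: boto3.client("athena")
execution_id = run_query(
    athena,
    sql="SELECT order_id, total FROM orders LIMIT 10",
    database="sales_db",
    output_location="s3://example-athena-results/",
)
```

The query runs asynchronously; a real script would poll the execution status with the returned ID before fetching results.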
AWS Lake Formation Pricing
No additional charges – Only pay for the underlying services used.
Lake Formation FAQ
Q: When is Lake Formation going to be GA?
A: GA for the service will be Q2 2019.
Q: Will there be support for data lineage in the enhanced Lake Formation
data catalog?
A: Lineage is on the roadmap for this year. We’ll have a better date after AWS
Lake Formation goes GA.
Q: Will AWS Glue’s existing certifications extend over to AWS Lake Formation?
A: Yes.
Thank you!
Stephen Moon
moonstep@amazon.com
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C T O R
S U M M I T
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C T O R
S U M M I T

Contenu connexe

Tendances

Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
Jeffrey T. Pollock
 

Tendances (20)

AWS Data Analytics on AWS
AWS Data Analytics on AWSAWS Data Analytics on AWS
AWS Data Analytics on AWS
 
Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018
Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018
Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
 
Databricks on AWS.pptx
Databricks on AWS.pptxDatabricks on AWS.pptx
Databricks on AWS.pptx
 
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
 
Zero to Snowflake Presentation
Zero to Snowflake Presentation Zero to Snowflake Presentation
Zero to Snowflake Presentation
 
Data Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsData Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced Analytics
 
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
 
Data platform architecture
Data platform architectureData platform architecture
Data platform architecture
 
Snowflake Data Science and AI/ML at Scale
Snowflake Data Science and AI/ML at ScaleSnowflake Data Science and AI/ML at Scale
Snowflake Data Science and AI/ML at Scale
 
Activate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogActivate Data Governance Using the Data Catalog
Activate Data Governance Using the Data Catalog
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Introducing AWS Transfer for SFTP, a Fully Managed SFTP Service for Amazon S3...
Introducing AWS Transfer for SFTP, a Fully Managed SFTP Service for Amazon S3...Introducing AWS Transfer for SFTP, a Fully Managed SFTP Service for Amazon S3...
Introducing AWS Transfer for SFTP, a Fully Managed SFTP Service for Amazon S3...
 
What’s New with Databricks Machine Learning
What’s New with Databricks Machine LearningWhat’s New with Databricks Machine Learning
What’s New with Databricks Machine Learning
 
Amazon QuickSight
Amazon QuickSightAmazon QuickSight
Amazon QuickSight
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & Athena
 
Breaking down an Industrial IoT reference architecture.pptx
Breaking down an Industrial IoT reference architecture.pptxBreaking down an Industrial IoT reference architecture.pptx
Breaking down an Industrial IoT reference architecture.pptx
 

Similaire à The Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline

Similaire à The Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline (20)

Data Supply Chain Pipeline: Approach to Curating Data at Scale within the DoD
Data Supply Chain Pipeline: Approach to Curating Data at Scale within the DoDData Supply Chain Pipeline: Approach to Curating Data at Scale within the DoD
Data Supply Chain Pipeline: Approach to Curating Data at Scale within the DoD
 
From Strategy to Reality: Better Decisions With Data
From Strategy to Reality: Better Decisions With DataFrom Strategy to Reality: Better Decisions With Data
From Strategy to Reality: Better Decisions With Data
 
Leveraging Data Analytics in the Cloud to Support Data-Driven Decisions
Leveraging Data Analytics in the Cloud to Support Data-Driven DecisionsLeveraging Data Analytics in the Cloud to Support Data-Driven Decisions
Leveraging Data Analytics in the Cloud to Support Data-Driven Decisions
 
AWS Public Sector Summit 2018, Data Supply Chain Pipeline
AWS Public Sector Summit 2018, Data Supply Chain PipelineAWS Public Sector Summit 2018, Data Supply Chain Pipeline
AWS Public Sector Summit 2018, Data Supply Chain Pipeline
 
Preparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/ML Preparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/ML
 
Best Practices for Database Migration to the Cloud: Improve Application Perfo...
Best Practices for Database Migration to the Cloud: Improve Application Perfo...Best Practices for Database Migration to the Cloud: Improve Application Perfo...
Best Practices for Database Migration to the Cloud: Improve Application Perfo...
 
From Insights to Action, How to build and maintain a Data Driven Organization...
From Insights to Action, How to build and maintain a Data Driven Organization...From Insights to Action, How to build and maintain a Data Driven Organization...
From Insights to Action, How to build and maintain a Data Driven Organization...
 
IBM Industry Models and Data Lake
IBM Industry Models and Data Lake IBM Industry Models and Data Lake
IBM Industry Models and Data Lake
 
AWS Data-Driven Insights Learning Series_ANZ Sep 2019 Part 2
AWS Data-Driven Insights Learning Series_ANZ Sep 2019 Part 2AWS Data-Driven Insights Learning Series_ANZ Sep 2019 Part 2
AWS Data-Driven Insights Learning Series_ANZ Sep 2019 Part 2
 
Leveraging Cloud Analytics to Support Data-Driven Decisions
Leveraging Cloud Analytics to Support Data-Driven DecisionsLeveraging Cloud Analytics to Support Data-Driven Decisions
Leveraging Cloud Analytics to Support Data-Driven Decisions
 
雲上打造資料湖 (Data Lake):智能化駕馭商機 (Level 300)
雲上打造資料湖 (Data Lake):智能化駕馭商機 (Level 300)雲上打造資料湖 (Data Lake):智能化駕馭商機 (Level 300)
雲上打造資料湖 (Data Lake):智能化駕馭商機 (Level 300)
 
BI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWSBI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWS
 
A Journey from Too Much Data to Curated Insights - ABD211 - re:Invent 2017
A Journey from Too Much Data to Curated Insights - ABD211 - re:Invent 2017A Journey from Too Much Data to Curated Insights - ABD211 - re:Invent 2017
A Journey from Too Much Data to Curated Insights - ABD211 - re:Invent 2017
 
AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...
AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...
AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...
 
Organizing for faster innovation - People, process, culture, and technology
Organizing for faster innovation - People, process, culture, and technologyOrganizing for faster innovation - People, process, culture, and technology
Organizing for faster innovation - People, process, culture, and technology
 
Building with AWS Databases: Match Your Workload to the Right Database | AWS ...
Building with AWS Databases: Match Your Workload to the Right Database | AWS ...Building with AWS Databases: Match Your Workload to the Right Database | AWS ...
Building with AWS Databases: Match Your Workload to the Right Database | AWS ...
 
Building with AWS Databases: Match Your Workload to the Right Database | AWS ...
Building with AWS Databases: Match Your Workload to the Right Database | AWS ...Building with AWS Databases: Match Your Workload to the Right Database | AWS ...
Building with AWS Databases: Match Your Workload to the Right Database | AWS ...
 
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
 TiVo: How to Scale New Products with a Data Lake on AWS and Qubole TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
 
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
 TiVo: How to Scale New Products with a Data Lake on AWS and Qubole TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
 
ABD303_Developing an Insights Platform—the Sysco Journey from Disparate Syste...
ABD303_Developing an Insights Platform—the Sysco Journey from Disparate Syste...ABD303_Developing an Insights Platform—the Sysco Journey from Disparate Syste...
ABD303_Developing an Insights Platform—the Sysco Journey from Disparate Syste...
 

Plus de Amazon Web Services

Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
Amazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
Amazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
Amazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
Amazon Web Services
 

Plus de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 

The Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline

  • 1. PUBLIC SECTOR SUMMIT, Washington DC
  • 2. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. PUBLIC SECTOR SUMMIT. The Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline. Stephen Moon, Specialist Solutions Architect, AWS. 301318
  • 3. Agenda: DataOps; Data Supply Chain Pipeline; AWS Lake Formation
  • 5. What is DataOps? “An automated, process-oriented methodology, used by analytic and data teams, to improve the quality and reduce the cycle time of data analytics. The DataOps Engineer orchestrates and automates the data analytics pipeline, promotes features to production and automates quality.” (Source: Wikipedia)
  • 6. DataOps Principles (www.dataopsmanifesto.org)
      • 1. Continually satisfy your customer (Customer Obsession): Our highest priority is to satisfy the customer through the early and continuous delivery of valuable analytic insights from a couple of minutes to weeks.
      • 3. Embrace change (Deliver Results): We welcome evolving customer needs, and in fact, we embrace them to generate competitive advantage. We believe that the most efficient, effective, and agile method of communication with customers is face-to-face conversation.
      • 8. Reflect (Learn and Be Curious): Analytic teams should fine-tune their operational performance by self-reflecting, at regular intervals, on feedback provided by their customers, themselves, and operational statistics.
  • 7. DataOps Principles (www.dataopsmanifesto.org)
      • 12. Disposable environments (Frugality): We believe it is important to minimize the cost for analytic team members to experiment by giving them easy to create, isolated, safe, and disposable technical environments that reflect their production environment.
      • 13. Simplicity (Invent and Simplify): We believe that continuous attention to technical excellence and good design enhances agility; likewise simplicity, the art of maximizing the amount of work not done, is essential.
      • 14. Analytics is manufacturing: Analytic pipelines are analogous to lean manufacturing lines. We believe a fundamental concept of DataOps is a focus on process-thinking aimed at achieving continuous efficiencies in the manufacture of analytic insight.
  • 9. Data Supply Chain Pipeline Mission Statement: Securely democratize data and deliver it to Communities of Interest when they need it and how they need it.
  • 10. Operating Model: Current State and Future State. Source: Ross, Jeanne W., et al. Enterprise Architecture As Strategy: Creating a Foundation for Business Execution. Harvard Business Review Press, 2006. https://www.amazon.com/dp/B004OC07EE/ref=dp-kindle-redirect?_encoding=UTF8&btkr=1
  • 11. Architecture & Design Principles
      • Principle: Minimal Disruption. Statement: Minimize disruption to data producers in how they deliver their data
      • Principle: Configuration (80/20 Rule). Statement: Focus on the 80% of use cases that can be satisfied with configurable components
      • Principle: Right Tool for the Right Job. Statement: Processes drive tooling, not the other way around
      • Principle: Conscious Decoupling. Statement: The right tool today may not be the right tool tomorrow
      • Principle: Data Residency. Statement: Users should access the data where IT lives regardless of where THEY live
  • 12. Conceptual Architecture
  • 13. Ingest: There is no single “tool” for receiving, inspecting, staging, and archiving data.
      • Focus on cultivating the organizational competencies and the processes for engaging with Data Suppliers
      • Build tiger teams who understand the organizational domains of the Data Suppliers
      • Develop templates for Memorandums of Understanding (MoU) and Interface Control Documents (ICD) to govern the relationships with Data Suppliers
      The result will be a small set of common patterns that can be standardized, automated, and scaled to service hundreds to thousands of Data Suppliers.
  • 14. Process
      Extract & Load
      • Cleanse
          - Application of Universal Data Rules and Business Data Rules
          - Entities and attributes remain distinct from other instances of the same entities and attributes
      Entity Resolution
      • Aggregate
          - Instantiating two or more occurrences of the same entity as a single instance
          - Attributes of aggregated entities remain distinct even though the attributes may be similar or the same
          - Disparate IDs of the same entity become an attribute linked to a natural or synthetic UUID/GUID
      • Associate – Defining the relationships among entities via the application of Business Relationship Rules
      Master Data Management
      • Merge – Combining aggregated instances of entity attributes into a single version of the truth
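The Aggregate and Merge steps on this slide can be illustrated with a small Python sketch. Everything named here is an assumption for illustration, not something the deck specifies: the matching rule (normalized name plus date of birth), the namespace UUID, the record fields, and the source-precedence merge rule are all hypothetical.

```python
import uuid

# Hypothetical namespace used to derive stable synthetic entity IDs (an assumption).
ENTITY_NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")


def natural_key(record):
    """Hypothetical matching rule: same normalized name and date of birth."""
    return (record["name"].strip().lower(), record["dob"])


def aggregate(records):
    """Aggregate: group occurrences of the same entity under one synthetic UUID.

    Attributes stay distinct per occurrence (nothing is merged yet), and each
    source-system ID becomes an attribute linked to the entity UUID.
    """
    entities = {}
    for record in records:
        entity_id = str(uuid.uuid5(ENTITY_NAMESPACE, "|".join(natural_key(record))))
        entity = entities.setdefault(
            entity_id,
            {"entity_id": entity_id, "source_ids": [], "occurrences": []},
        )
        entity["source_ids"].append((record["source"], record["id"]))
        entity["occurrences"].append(record)
    return entities


def merge(entity, precedence):
    """Merge: collapse occurrences into a single version of the truth.

    Assumed rule: for each attribute, take the first non-empty value from
    the highest-precedence source system.
    """
    merged = {"entity_id": entity["entity_id"]}
    ranked = sorted(entity["occurrences"], key=lambda r: precedence.index(r["source"]))
    for record in ranked:
        for field, value in record.items():
            if field in ("source", "id"):
                continue  # source IDs are tracked on the entity, not merged
            if value and field not in merged:
                merged[field] = value
    return merged
```

Keeping `aggregate` and `merge` separate mirrors the slide's split between Entity Resolution and Master Data Management: occurrences can be linked without committing to a merged record.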
  • 15. Enrich
      • Assimilate
          - Organize entities & attributes for consumption by Communities of Interest
          - Structured as Facts, Graphs, Time-series, and/or Matrices
          - Driven by questions generated by the Communities of Interest
          - CRISP-DM project scope
      • Transform – Standardize
      • Engineer – Normalize, Interpolate, Extrapolate
      • Synthesize
          - Obfuscate – Mask identifying data
          - Anonymize – Apply privacy models
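A minimal Python sketch of the two Synthesize steps. The slide does not name a masking scheme or a privacy model, so the choices here are assumptions: masking keeps only the last few characters, and k-anonymity stands in as one well-known example of a privacy model. Function and field names are hypothetical.

```python
from collections import Counter


def mask(value, visible=4, fill="*"):
    """Obfuscate: hide all but the last `visible` characters of a value."""
    text = str(value)
    if len(text) <= visible:
        return fill * len(text)
    return fill * (len(text) - visible) + text[-visible:]


def generalize_age(age, width=10):
    """Coarsen an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def is_k_anonymous(rows, quasi_identifiers, k):
    """Check k-anonymity: every combination of quasi-identifier values
    must be shared by at least k rows."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())
```

Generalizing quasi-identifiers (for example with `generalize_age`) before calling `is_k_anonymous` is the usual way to move a data set toward the chosen k.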
  • 16. Catalog & Profile
      • Business Glossary
      • Concept Descriptions
      • Data Models
      • Classifications (Labeling)
      • Summary Statistics (supports Discovery and Exploration)
          - Maximum
          - Minimum
          - Mean & Skew
          - Mode
          - Quartiles
          - Standard Deviation
          - Correlation Coefficient
          - Depth & Breadth
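The summary statistics listed on this slide can be computed for a single numeric column with Python's standard library. This is an illustrative sketch, not the deck's tooling: the function names are hypothetical, population (not sample) formulas are assumed for standard deviation and skew, and "depth" is interpreted here as the row count.

```python
import math
import statistics


def profile(values):
    """Profile one numeric column: count, min, max, mean, skew, mode,
    quartiles, and standard deviation (population formulas assumed)."""
    n = len(values)
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    # Population skewness: third central moment divided by stdev cubed.
    skew = sum((v - mean) ** 3 for v in values) / n / stdev ** 3 if stdev else 0.0
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "count": n,  # "depth" of the column, under the assumption above
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "skew": skew,
        "mode": statistics.mode(values),
        "quartiles": (q1, median, q3),
        "stdev": stdev,
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns.

    Assumes neither column is constant (otherwise the denominator is zero).
    """
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Storing the resulting dictionary alongside the catalog entry for a column is one way such a profile could support discovery and exploration.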
  • 17. Circles of Concern for Ingest
      • Control: Applications/Systems which are owned and/or directly managed by the ingesting organization
      • Influence: Applications/Systems of which the ingesting organization is an internal or external stakeholder but does not own or manage the application/system
      • Interest: Applications/Systems of which the ingesting organization has a concern for the data but does not have control or influence over the application/system
      Why is this important? It determines how data is going to be ingested!
  • 18. Logical Architecture
  • 19. Our portfolio: broad and deep, purpose-built for builders
      • Business Intelligence & Machine Learning: QuickSight, SageMaker
      • Analytics: Redshift (data warehousing), EMR (Hadoop + Spark), Kinesis Data Analytics (real time), Elasticsearch Service (operational analytics), Athena (interactive analytics)
      • Databases: RDS (MySQL, PostgreSQL, MariaDB, Oracle, SQL Server), Aurora (MySQL, PostgreSQL), DynamoDB (key value, document), ElastiCache (Redis, Memcached), Neptune (graph), Timestream (time series), QLDB (ledger database), RDS on VMware
      • Data Lake: S3/Glacier, Glue (ETL & Data Catalog), Lake Formation (data lakes)
      • Data Movement: Database Migration Service | Snowball | Snowmobile | Kinesis Data Firehose | Kinesis Data Streams
  • 20. What is a Data Lake? A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale.
  • 21. Why data lakes? Data lakes provide:
      • Relational and non-relational data
      • Scale-out to EBs
      • Diverse set of analytics and machine learning tools
      • Work on data without any data movement
      • Designed for low cost storage and analytics
      [Diagram labels: OLTP, ERP, CRM, LOB; Data Warehouse; Business Intelligence; Data Lake; Devices, Web, Sensors, Social; Catalog; Machine Learning; DW Queries; Big data processing; Interactive; Real-time]
  • 22. Physical Architecture
      [Diagram labels: Databases; AWS DataSync; AWS Database Migration Service; Amazon Kinesis; Amazon Aurora (Operational Data Store, Data Warehouse); Amazon S3 (Data Lake); Amazon EMR; Amazon Athena; Amazon Redshift; Amazon QuickSight; Amazon SageMaker; AWS Glue; AWS DMS; Other Tools; Extract Warehouse Data; Load Raw Data; Load Data Warehouse; Build Data Marts]
  • 23. The Power of Data Lakes
      Data Warehouse
      • Permanent data store for structured data
      • No direct access
      Data Lake
      • Ephemeral/Dynamic data storage for structured data
      • Data sets purpose-built based on use cases (right tool)
      • Many-to-One ratio of Tools-to-Data
      • Only pay for data processing as it's needed
      [Diagram labels: Data Warehouse; Amazon Aurora; Amazon S3; Data Lake; Amazon Redshift; Amazon Neptune; Amazon EMR; Apache MXNet on AWS]
  • 24. Data Lake Challenges
      • Maintaining a data catalog / enabling self-service access
      • Configuring and managing access controls / Data governance
      • Audit logging
      Building data lakes can still take months.
  • 26. Typical steps of building a data lake: Make data available for analytics; Cleanse, Prepare, & Catalog Data; Move Data; Configure & Enforce Security & Compliance Policies; Permissions; Set Up Storage
  • 27. How it works
  • 28. Key Components
      • Blueprints / Data Importers – templates for ETL, metadata (schema) and partition management
      • Enhanced Data Catalog – enables users to record more metadata and tag Data Catalog objects (i.e., databases, tables, columns)
      • ML Transformations – ML algorithms that customers can use to create their own ML Transforms (e.g., record de-duplication)
      • Enhanced Security & Governance – security and governance layer at the Data Catalog level
  • 29. Register existing data or import new
      • Amazon S3 forms the storage layer for Lake Formation
      • Register existing S3 buckets that contain your data
      • Ask Lake Formation to create required S3 buckets and import data into them
      • Data is stored in your account. You have direct access to it. No lock-in.
      [Diagram labels: Data Lake Storage; Data Catalog; Access Control; Data import; Crawlers; ML-based data prep]
  • 30. Easily load data to your data lake
      [Diagram labels: Logs; DBs; Blueprints (one-shot, incremental); Data Lake Storage; Data Catalog; Access Control; Data import; Crawlers; ML-based data prep]
  • 31. Blueprints / Data Importers: Blueprints are templates for data ingestion, transformation, metadata (schema) and partition management. Blueprints help customers to quickly and easily build and maintain a data lake. [Figure label: Templates]
  • 32. With blueprints
      You:
          1. Point us to the source
          2. Tell us the location to load to in your data lake
          3. Specify how often you want to load the data
      Blueprints:
          1. Discover the source table(s) schema
          2. Automatically convert to the target data format
          3. Automatically partition the data based on the partitioning schema
          4. Keep track of data that was already processed
          5. You can customize any of the above
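To make the blueprint behaviour on this slide concrete, here is a plain-Python sketch of schema discovery, partitioning, and tracking already-processed data with a bookmark. It is a conceptual illustration only and does not use or represent the Lake Formation or AWS Glue APIs; the function names, the bookmark column, and the row layout are all hypothetical.

```python
from collections import defaultdict


def discover_schema(rows):
    """Infer a column -> type-name mapping from the first value seen per column."""
    schema = {}
    for row in rows:
        for column, value in row.items():
            schema.setdefault(column, type(value).__name__)
    return schema


def incremental_load(rows, bookmark, bookmark_column, partition_column):
    """Load only rows newer than the bookmark and group them by partition.

    Returns (partitions, new_bookmark). A bookmark of None means a first,
    one-shot load; later calls pass the previous bookmark so that rows
    already processed are skipped.
    """
    fresh = [r for r in rows if bookmark is None or r[bookmark_column] > bookmark]
    partitions = defaultdict(list)
    for row in fresh:
        partitions[row[partition_column]].append(row)
    new_bookmark = max((r[bookmark_column] for r in fresh), default=bookmark)
    return dict(partitions), new_bookmark
```

Calling `incremental_load` repeatedly with the bookmark it returned corresponds to the "incremental" mode; a single call with `None` corresponds to "one-shot".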
  • 33. Blueprints build on AWS Glue
  • 34. Enhanced Data Catalog: AWS Lake Formation has an enhanced Data Catalog to enable users to record more metadata and Tags for Databases, Tables and Columns. All of the data is searchable.
  • 35. Search and collaborate across multiple users
      • Text-based, faceted search across all metadata
      • Add attributes like data owners, stewards, and others as table properties
      • Add data sensitivity level, column definitions, and others as column properties
      • Text-based search and filtering
      • Query data in Amazon Athena
  • 36. ML Transformations: AWS Lake Formation includes specialized ML-based dataset transformation algorithms customers can use to create their own ML Transforms. These include record de-duplication and match finding.
  • 37. De-duplicate
  • 38. Enhanced Governance Layer: AWS Lake Formation provides a security and governance layer at the Data Catalog level. Users can grant or revoke permissions to the Data Catalog objects such as databases, tables and columns for IAM principals (IAM users and roles). This functionality will be extended to row level access in subsequent releases.
  • 39. Security permissions in Lake Formation
      • Control data access with simple grant and revoke permissions
      • Specify permissions on tables and columns rather than on buckets and objects
      • Easily view policies granted to a particular user
      • Audit all data access in one place
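A column-level grant of the kind described above can be sketched as a request payload in Python. The sketch assumes the request shape of the Lake Formation `grant_permissions` operation as exposed by boto3 (`Principal`, `Resource`, `Permissions`); the deck predates general availability, so the released API may differ from what was presented. The helper name, principal ARN, database, table, and column names are hypothetical.

```python
def build_column_grant(principal_arn, database, table, columns, permissions=("SELECT",)):
    """Build keyword arguments for a column-level grant on a catalog table.

    Only constructs the payload; nothing is sent to AWS here.
    """
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": list(permissions),
    }


# Hypothetical example values (placeholders, not real resources).
request = build_column_grant(
    "arn:aws:iam::111122223333:role/analyst",
    "sales_db",
    "orders",
    ["order_id", "amount"],
)

# With credentials configured, the payload could be submitted along these lines:
#   import boto3
#   boto3.client("lakeformation").grant_permissions(**request)
```

Building the payload separately from the call keeps the grant reviewable and testable before anything touches the account.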
  • 40. Security permissions in Lake Formation
      • Search and view permissions granted to a user, role, or group in one place
      • Verify permissions granted to a user
      • Easily revoke policies for a user
  • 41. Audit and monitor in real time
      • See detailed alerts in the console
      • Download audit logs for further analytics
      • Data ingest and catalog notifications also published to Amazon CloudWatch events
  • 42. Secure once, access in multiple ways [Diagram labels: Data Lake Storage; Data Catalog; Access Control; Admin; Amazon QuickSight; Amazon SageMaker]
  • 43. Grant table and column-level permissions [Diagram labels: User 1; User 2]
  • 44. Lake Formation Security Workflow
      User:
      • IAM Users
      • IAM Roles
      • Active Directory (Federation)
  • 45. Example: A data lake in 3 easy steps
      1. Use blueprints/data importers to ingest data
      2. Grant permissions to securely share data
      3. Query the data (Amazon Athena)
  • 46. Step 1: Use data importers to ingest data
  • 47. Imported data as table in the data lake
  • 48. Step 2: Grant permissions to securely share data
  • 49. Step 3: Run query in Amazon Athena
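The slide for step 3 shows the query in the Athena console. As a rough programmatic equivalent, the sketch below builds the arguments for Athena's `start_query_execution` operation as exposed by boto3 (`QueryString`, `QueryExecutionContext`, `ResultConfiguration`). The helper name, database, table, columns, and S3 output location are hypothetical placeholders, not values from the deck.

```python
def build_athena_query(database, table, columns, output_location, limit=10):
    """Build keyword arguments for a simple SELECT over a catalog table.

    Only constructs the payload; nothing is sent to AWS here.
    """
    column_list = ", ".join(f'"{c}"' for c in columns)
    sql = f'SELECT {column_list} FROM "{table}" LIMIT {int(limit)}'
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical example values (placeholders, not real resources).
query = build_athena_query(
    "sales_db",
    "orders",
    ["order_id", "amount"],
    "s3://example-athena-results/",
)

# With credentials configured, the query could be started along these lines:
#   import boto3
#   boto3.client("athena").start_query_execution(**query)
```

If the caller was granted only some columns in step 2, selecting exactly those columns is the case this sketch is meant to illustrate.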
  • 50. AWS Lake Formation Pricing: No additional charges – only pay for the underlying services used.
  • 51. Lake Formation FAQ
      Q: When is Lake Formation going to be GA?
      A: GA for the service will be Q2 2019.
      Q: Will there be support for data lineage in the enhanced Lake Formation data catalog?
      A: Lineage is on the roadmap for this year. We’ll have a better date after AWS Lake Formation goes GA.
      Q: Will AWS Glue’s existing certifications extend over to AWS Lake Formation?
      A: Yes.
  • 52. Thank you! Stephen Moon, moonstep@amazon.com