© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Building a Data Lake for Your Enterprise, ft. Sysco
Session STG309
Greg Nelson: Director, BI, and Analytics Platforms, Sysco
Varun Kumar: Sr. Manager, Platforms, Data & Analytics, Sysco
Laith Al-Saadoon: Sr. Solutions Architect, AWS
Related breakouts and chalk talks
Tuesday, November 27
STG311 – Lessons Learned from a Large-Scale Legacy Migration with Sysco
4:45 PM – 5:45 PM | MGM, Level 1, Grand Ballroom 122
Thursday, November 29
STG340 – Customizing Data Lakes to Work for Your Enterprise with Sysco
4:00 PM – 5:00 PM | Venetian, Level 4, Lando 4305
Agenda
Sysco Corporation overview and data lake goals
Sysco’s enterprise data lake architecture
Data lake ingestion and storage patterns on Amazon S3
Data lake best practices
Sysco at a glance
Sysco is the global leader in selling, marketing, and distributing food products
to restaurants, healthcare and educational facilities, lodging establishments,
and other customers who prepare meals away from home. Its family of
products also includes equipment and supplies for the foodservice and
hospitality industries.
To be our customers' most valued
and trusted business partner.
Integrity, Teamwork, Excellence,
Inclusiveness, Responsibility.
Sysco is a growing, global company with a strong presence in a
roughly $400B, large and fragmented foodservice market
Sysco currently operates in the U.S., Canada, Mexico, Costa Rica, Panama, Bahamas, U.K., France, Sweden,
Spain, Belgium, Luxembourg, and Ireland and services customers in an additional 81 countries via the IFG
exporting business.
Why the data lake?
Capital and capacity constraints limit analytic use cases
EDW
Wall of Business
Constraint
M&A data
Revenue management
Machine learning
Unstructured data
Social media
Clickstream
Introducing SEED
The journey
• Start small
• Pay for what you use
• Fail fast and pivot
Proof of concept –
use case
Sales subject area
Supply chain
All data sources
Impediment – course correct
Optimize and pivot
Outcomes and use cases
Figure: analytic capability (operational management, decision support, data science) plotted against approach (reactive to proactive, past to future).
Note: Size of circles correlates to the scale of the capability’s utilization across Sysco.
Operations data insights
Category management insights
Revenue management insights
Customer red alert
Personalized recommendations
Insights-driven assortment
Formatted reporting
Parameterized reporting
Guided ad hoc
Exploratory analysis & self service
AI / ML
Predictive & prescriptive analytics
Our growth focus
Convergence of analytics within SEED Data Repository
Transactional
Pricing
Attribution
Third-party syndicated data
Customer risk
Personalization
Price migration & engine
Assortment optimization
SEED data repository architecture
Amazon EMR
Amazon Redshift
User interface
API
Amazon Cognito authentication
Custom authorizer
Data lake microservices
AWS Lambda
Amazon S3
AWS Glue Crawler
Amazon Athena
IAM roles
Upload
Amazon Redshift
Amazon S3
External data sources
Gravity to ingest data at scale
Data integrity checks & data quality report
• Advanced analytics
• Scheduled reporting
• Interactive analysis
• Prototyping reports
• Data analysis
• Custom data extracts
Amazon EMR
Amazon SageMaker
Amazon DynamoDB
Amazon Elasticsearch
Enabling Data as a Service (DaaS) using SDR
API consumer
API Gateway
Serverless Lambda
Amazon Redshift Spectrum
S3 bucket
Ingest using Hive
DynamoDB
Amazon ES
Amazon EMR
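A minimal sketch of the serving path on this slide (API consumer, API Gateway, Lambda), assuming an API Gateway Lambda proxy integration. The route parameter `item_id`, the sample record, and the in-memory `_ITEMS` dictionary are hypothetical stand-ins for a serving store such as DynamoDB; none of them come from the Sysco implementation.

```python
import json

# Hypothetical stand-in for a serving store (for example a DynamoDB table).
_ITEMS = {"1001": {"item_id": "1001", "description": "example item"}}

_JSON_HEADERS = {"Content-Type": "application/json"}


def handler(event, context):
    """Handle an API Gateway proxy event and return a proxy-style response."""
    # pathParameters is None when the route carries no path parameters.
    params = event.get("pathParameters") or {}
    item = _ITEMS.get(params.get("item_id"))
    if item is None:
        return {
            "statusCode": 404,
            "headers": _JSON_HEADERS,
            "body": json.dumps({"error": "item not found"}),
        }
    return {"statusCode": 200, "headers": _JSON_HEADERS, "body": json.dumps(item)}


if __name__ == "__main__":
    print(handler({"pathParameters": {"item_id": "1001"}}, None))
```

In a deployed function, the dictionary lookup would be replaced by a call to whichever store backs the API.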
Enabling data science and engineering at scale
Higher conversion rate enabled by recommendation engine
Driver compliance by IoT live streams
Item search and match capability to identify similar, available, and profitable items across various product and business teams
Lessons learned
Centralized data repository
• What numbers mean as opposed to what numbers are
Data as a Service
• Thousands of reports to a publish-subscribe architecture
Rapid experimentation platform
• AI/ML limited to certain processes and adopted with caution
Accountable continuous process
• Manual intervention/reprocessing for operational issues to operational accountability
Road ahead
SPROUT SEEDLING SAPLING TREE
Legacy data architectures exist as isolated data silos
Hadoop cluster
OLTP databases
Data warehouse appliance
Challenges with legacy data architectures
1. Data movement and ETL
2. Data types and formats
3. Real-time processing
4. Data governance across data silos
Reasons to build a data lake
Characteristics of a data lake
Benefits of a data lake—Separation of storage and compute
Scale storage and compute independently
“How can I scale up with the volume of data being generated?”
Building a data lake on AWS
Amazon S3—Center of the data lake
Amazon DynamoDB
Amazon Elasticsearch Service
AWS AppSync
Amazon API Gateway
Amazon Cognito
AWS KMS
AWS IAM
AWS CloudTrail
Amazon CloudWatch
AWS Snowball
AWS Storage Gateway
Amazon Kinesis Data Firehose
AWS Direct Connect
AWS Database Migration Service
Amazon Athena
Amazon EMR
AWS Glue
Amazon Redshift
Amazon DynamoDB
Amazon QuickSight
Amazon Kinesis
Amazon Elasticsearch Service
Amazon Neptune
Amazon RDS
Why Amazon S3 for data lake?
Durable
Available
High performance
Scalable
Integrated
Easy to use
Secure your data lake
Encrypt
SSE-S3 or CSE using AWS KMS
HTTPS endpoints
Authorize and authenticate
IAM policies
Amazon S3 bucket policies
AWS Glue Data Catalog resource-based policies
Amazon S3 VPC endpoints
Audit and comply
AWS CloudTrail and bucket access logs
Lifecycle management policies
Versioning and MFA Delete
Certifications—HIPAA, PCI, SOC 1, 2, 3, etc.
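As one illustration of the "HTTPS endpoints" and "Amazon S3 bucket policies" items above, the following sketch builds a bucket policy that denies any request not sent over TLS, using the `aws:SecureTransport` condition key. The bucket name is a placeholder, and the function only builds the policy document; attaching it to a bucket is left out.

```python
import json


def https_only_policy(bucket_name: str) -> dict:
    """Build a bucket policy that denies any request not sent over HTTPS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


if __name__ == "__main__":
    # "example-data-lake-bucket" is a hypothetical bucket name.
    print(json.dumps(https_only_policy("example-data-lake-bucket"), indent=2))
```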
Use your data lake
Data lake
Query data in place
Load curated data for applications
Collect train and test datasets for ML models
Data lake storage best practices
s3://amazon-reviews-pds/parquet/product_category=Apparel/part-00000-495c48e6-96d6-4650-aa65-3c36a3516ddd.c000.snappy.parquet
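The object key above appears to illustrate Hive-style partitioning (`product_category=Apparel`) on top of a compressed columnar file (`.snappy.parquet`). As a small sketch of how such a layout can be read back, the helper below extracts `key=value` partition segments from an S3 URI. The function name `parse_partitions` is introduced here for illustration only.

```python
from urllib.parse import urlparse


def parse_partitions(s3_uri: str) -> dict:
    """Return the Hive-style key=value partition segments found in an S3 URI."""
    path = urlparse(s3_uri).path
    segments = path.strip("/").split("/")
    partitions = {}
    # The last segment is the object name, so it is not treated as a partition.
    for segment in segments[:-1]:
        if "=" in segment:
            key, value = segment.split("=", 1)
            partitions[key] = value
    return partitions


if __name__ == "__main__":
    uri = (
        "s3://amazon-reviews-pds/parquet/product_category=Apparel/"
        "part-00000-495c48e6-96d6-4650-aa65-3c36a3516ddd.c000.snappy.parquet"
    )
    print(parse_partitions(uri))
```

Query engines that understand this layout can skip whole prefixes when a query filters on the partition column, which is the usual motivation for partitioning data this way.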
Data lake storage best practices
Use lifecycle rules
Use Amazon S3 Select and Amazon Glacier Select
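A lifecycle rule can be sketched as a plain configuration document. The example below builds one that moves objects under a prefix to STANDARD_IA and later to GLACIER. The prefix and day counts are hypothetical, not values from the talk. The dictionary follows the shape accepted by boto3's `put_bucket_lifecycle_configuration` (as its `LifecycleConfiguration` argument), but the sketch does not call AWS.

```python
def tiering_rule(prefix: str, ia_after_days: int = 30,
                 glacier_after_days: int = 365) -> dict:
    """Build a lifecycle configuration that tiers objects under a prefix."""
    # Assumption: S3 expects at least 30 days before a STANDARD_IA transition,
    # and the Glacier transition must come later than the STANDARD_IA one.
    if not 30 <= ia_after_days < glacier_after_days:
        raise ValueError("expected 30 <= ia_after_days < glacier_after_days")
    rule_id = "tier-" + prefix.strip("/").replace("/", "-")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


if __name__ == "__main__":
    # "raw/" is a hypothetical prefix for infrequently read source data.
    print(tiering_rule("raw/"))
```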
Data lake compute best practices
Spot Instance
AWS answers—Data lake on AWS
https://amzn.to/2k4FQX1
Summary
Thank you!
Building a Data Lake for Your Enterprise, ft. Sysco (STG309) - AWS re:Invent 2018

  • 1.
  • 2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Building a Data Lake for Your Enterprise, ft. Sysco Greg Nelson Director, BI, and Analytics Platforms Sysco S T G 3 0 9 Varun Kumar Sr. Manager, Platforms Data & Analytics Sysco Laith Al-Saadoon Sr. Solutions Architect AWS
  • 3. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Related breakouts and chalk talks Tuesday, November 27 STG311 – Lessons Learned from a Large-Scale Legacy Migration with Sysco 4:45 PM – 5:45 PM | MGM, Level 1, Grand Ballroom 122 Thursday, November 29 STG340 – Customizing Data Lakes to Work for Your Enterprise with Sysco 4:00 PM – 5:00 PM | Venetian, Level 4, Lando 4305
  • 4. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Agenda Sysco Corporation overview and data lake goals Sysco’s enterprise data lake architecture Data lake ingestion and storage patterns on Amazon S3 Data lake best practices
  • 5. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 6. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Sysco at a glance Sysco is the global leader in selling, marketing, and distributing food products to restaurants, healthcare and educational facilities, lodging establishments, and other customers who prepare meals away from home. Its family of products also includes equipment and supplies for the foodservice and hospitality industries. To be our customers' most valued and trusted business partner. Integrity, Teamwork, Excellence, Inclusiveness, Responsibility.
  • 7. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Sysco is a growing, global company with a strong presence in a roughly $400B, large and fragmented foodservice market Sysco currently operates in the U.S., Canada, Mexico, Costa Rica, Panama, Bahamas, U.K., France, Sweden, Spain, Belgium, Luxembourg, and Ireland and services customers in an additional 81 countries via the IFG exporting business.
  • 8. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 9. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Why the data lake? Capital and capacity constraints limit analytic use cases EDW Wall of Business Constraint M&A data Revenue management Machine learning Unstructured data Social media Clickstream
  • 10. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 11. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Introducing SEED
  • 12. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. The journey • Start small • Pay for what you use • Fail fast and pivot Proof of concept – use case Sales subject area Supply chain All data sources Impediment – course correct Optimize and pivot
  • 13. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Outcomes and use cases Data science Decision support Operational management Analyticcapability Reactive Proactive ApproachPast Future •Note: Size of circles correlates to the scale of the capability’s utilization across Sysco Operations data insights Category management insights Revenue management insights Customer red alert Personalized recommendations Insights-driven assortment Formatted reporting Parameterized reporting Guided ad hoc Exploratory analysis & self service AI / ML Predictive & prescriptive analytics Our growth focus
  • 14. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Transactional Pricing Attribution third-party syndicated data Convergence of analytics within SEED Data Repository Customer risk Personalization Price migration & engine Assortment optimization
  • 15. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. SEED data repository architecture Amazon Redshift User interface API Amazon Cognito authentication Custom authorizer Data lake microservices AWS Lambda Amazon S3 AWS Glue Crawler Amazon Athena Roles IAM Upload Amazon Redshift Amazon S3 External data sources Gravity to ingest data at scale Amazon EMR
  • 16. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. SEED data repository architecture Amazon Redshift User interface API Amazon Cognito authentication Custom authorizer Data lake microservices AWS Lambda Amazon S3 AWS Glue Crawler Amazon Athena Roles IAM Upload Amazon DynamoDB Amazon Elasticsearch Amazon Redshift Amazon S3 External data sources Gravity to ingest data at scale Amazon EMR
  • 17. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. SEED data repository architecture Amazon EMR Amazon Redshift User interface API Amazon Cognito authentication Custom authorizer Data lake microservices AWS Lambda Amazon S3 AWS Glue Crawler Amazon Athena Roles IAM Upload Amazon Redshift Amazon S3 External data sources Gravity to ingest data at scale Data integrity checks & data quality report  Advanced analytics  Scheduled reporting  Interactive analysis  Prototyping reports  Data analysis  Custom data extracts Amazon EMR Amazon SageMaker Amazon DynamoDB Amazon Elasticsearch
  • 18. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Enabling Data as a Service (DaaS) using SDR API Gateway Serverless Lambda API consumer Amazon Redshift Spectrum S3 bucket Ingest using Hive DynamoDB Amazon ES Amazon EMR
  • 19. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Enabling data science and engineering at scale Higher conversion rate enabled by recommendation engine Driver compliance by IOT live streams Item search and match capability to identify similar, available, and profitable items across various product and business teams
  • 20. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Lessons learned Centralized data repository • What numbers mean as opposed to what numbers are Data as a Service • Thousands of reports to a publish subscribe architecture Rapid experimentation platform • AI/ML limited to certain processes and adopted with caution Accountable continuous process • Manual intervention/reprocessing for operational issues to operational accountability
  • 21. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Road ahead SPROUT SEEDLING SAPLING TREE
  • 22. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 23. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Legacy data architectures exist as isolated data silos Hadoop cluster OLTP databases Data warehouse appliance S
  • 27. Challenges with legacy data architectures: (1) data movement and ETL; (2) data types and formats; (3) real-time processing; (4) data governance across data silos
  • 28. Reasons to build a data lake
  • 31. Characteristics of a data lake
  • 33. Benefits of a data lake: separation of storage and compute. “How can I scale up with the volume of data being generated?” Scale storage and compute independently.
  • 34. Building a data lake on AWS
  • 41. Amazon S3: center of the data lake (diagram labels: Amazon DynamoDB, Amazon Elasticsearch Service; AWS AppSync, Amazon API Gateway, Amazon Cognito; AWS KMS, AWS IAM, AWS CloudTrail, Amazon CloudWatch; AWS Snowball, AWS Storage Gateway, Amazon Kinesis Data Firehose, AWS Direct Connect, AWS Database Migration Service; Amazon Athena, Amazon EMR, AWS Glue, Amazon Redshift, Amazon DynamoDB, Amazon QuickSight, Amazon Kinesis, Amazon Elasticsearch Service, Amazon Neptune, Amazon RDS)
  • 42. Why Amazon S3 for data lake? Durable; available; high performance; scalable; integrated; easy to use
  • 43. Secure your data lake. Encrypt: SSE-S3 or CSE using AWS KMS; HTTPS endpoints. Authorize and authenticate: IAM policies; Amazon S3 bucket policies; AWS Glue Data Catalog resource-based policies; Amazon S3 VPC endpoints. Audit and comply: AWS CloudTrail and bucket access logs; lifecycle management policies; versioning and MFA Delete; certifications (HIPAA, PCI, SOC 1, 2, 3, etc.).
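To make the encryption and bucket-policy items on this slide concrete, here is a minimal sketch of a bucket policy document built in Python. It is an illustration under stated assumptions, not the policy used in the presentation: the bucket name and the statement IDs are hypothetical, and the second statement assumes uploads are expected to request SSE-S3 (`AES256`).

```python
import json


def build_bucket_policy(bucket_name):
    """Return a bucket policy that requires HTTPS and SSE-S3 on uploads."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Deny any request that does not arrive over TLS.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                # Deny uploads that do not request SSE-S3 encryption.
                "Sid": "DenyUnencryptedUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"{bucket_arn}/*",
                "Condition": {
                    "StringNotEquals": {
                        "s3:x-amz-server-side-encryption": "AES256"
                    }
                },
            },
        ],
    }


if __name__ == "__main__":
    # "example-data-lake-bucket" is a placeholder name.
    print(json.dumps(build_bucket_policy("example-data-lake-bucket"), indent=2))
```

The resulting JSON could be attached to a bucket through the console or an SDK; IAM policies and VPC endpoint policies would still be needed for the other controls listed on the slide.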
  • 44. Use your data lake: query data in place; load curated data for applications; collect train and test datasets for ML models
  • 46. Data lake storage best practices: s3://amazon-reviews-pds/parquet/product_category=Apparel/part-00000-495c48e6-96d6-4650-aa65-3c36a3516ddd.c000.snappy.parquet
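The example path on this slide uses a Hive-style `name=value` prefix (`product_category=Apparel`) in front of a Snappy-compressed Parquet file. The sketch below shows one way such keys could be built and parsed; the helper names `build_partitioned_key` and `parse_partitions` are hypothetical and chosen only for illustration.

```python
def build_partitioned_key(prefix, partitions, filename):
    """Join a prefix, Hive-style partition segments, and a file name."""
    parts = [prefix.strip("/")]
    # Each partition becomes one "name=value" path segment, in dict order.
    parts += [f"{name}={value}" for name, value in partitions.items()]
    parts.append(filename)
    return "/".join(parts)


def parse_partitions(key):
    """Extract the "name=value" partition segments from an object key."""
    partitions = {}
    # Skip the last segment, which is the file name.
    for segment in key.split("/")[:-1]:
        if "=" in segment:
            name, value = segment.split("=", 1)
            partitions[name] = value
    return partitions


if __name__ == "__main__":
    key = build_partitioned_key(
        "parquet", {"product_category": "Apparel"}, "part-00000.snappy.parquet"
    )
    print(key)
    print(parse_partitions(key))
```

Query engines that understand this layout can skip whole prefixes when a filter names a partition column, which is one reason the layout is commonly recommended.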
  • 47. Data lake storage best practices: use lifecycle rules; use Amazon S3 Select and Amazon Glacier Select
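As an illustration of the lifecycle rules mentioned on this slide, the following sketch builds a lifecycle configuration as a plain Python dictionary. The prefix, the day thresholds, and the rule ID are assumptions made for the example and do not come from the presentation. The dictionary follows the shape accepted by boto3's `put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...)`, but no AWS call is made here.

```python
def build_lifecycle_configuration(prefix, ia_after_days=30, glacier_after_days=90):
    """Return a lifecycle configuration that tiers objects under `prefix`."""
    if glacier_after_days <= ia_after_days:
        raise ValueError("glacier_after_days must be greater than ia_after_days")

    rule_id = "tier-" + prefix.strip("/")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    # Move to infrequent access first, then to Glacier.
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


if __name__ == "__main__":
    # "raw/" is a placeholder prefix.
    print(build_lifecycle_configuration("raw/"))
```

Scoping each rule to a prefix lets raw and curated zones of a data lake age out on different schedules.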
  • 48. Data lake compute best practices: Spot Instance
  • 49. AWS answers: Data lake on AWS (https://amzn.to/2k4FQX1)
  • 50. Summary
  • 51. Thank you!