Building a Data Lake for Your Enterprise, ft. Sysco (STG309) - AWS re:Invent 2018
2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Building a Data Lake for Your Enterprise, ft. Sysco
STG309
Greg Nelson, Director, BI, and Analytics Platforms, Sysco
Varun Kumar, Sr. Manager, Platforms Data & Analytics, Sysco
Laith Al-Saadoon, Sr. Solutions Architect, AWS
3.
Related breakouts and chalk talks
Tuesday, November 27
STG311 – Lessons Learned from a Large-Scale Legacy Migration with Sysco
4:45 PM – 5:45 PM | MGM, Level 1, Grand Ballroom 122
Thursday, November 29
STG340 – Customizing Data Lakes to Work for Your Enterprise with Sysco
4:00 PM – 5:00 PM | Venetian, Level 4, Lando 4305
4.
Agenda
• Sysco Corporation overview and data lake goals
• Sysco’s enterprise data lake architecture
• Data lake ingestion and storage patterns on Amazon S3
• Data lake best practices
6.
Sysco at a glance
Sysco is the global leader in selling, marketing, and distributing food products
to restaurants, healthcare and educational facilities, lodging establishments,
and other customers who prepare meals away from home. Its family of
products also includes equipment and supplies for the foodservice and
hospitality industries.
To be our customers' most valued
and trusted business partner.
Integrity, Teamwork, Excellence,
Inclusiveness, Responsibility.
7.
Sysco is a growing, global company with a strong presence in a
roughly $400B, large and fragmented foodservice market
Sysco currently operates in the U.S., Canada, Mexico, Costa Rica, Panama, Bahamas, U.K., France, Sweden,
Spain, Belgium, Luxembourg, and Ireland and services customers in an additional 81 countries via the IFG
exporting business.
9.
Why the data lake?
Capital and capacity constraints limit analytic use cases
[Diagram: the EDW behind a "Wall of Business Constraint"]
Use cases shown:
• M&A data
• Revenue management
• Machine learning
• Unstructured data
• Social media
• Clickstream
11.
Introducing SEED
12.
The journey
• Start small
• Pay for what you use
• Fail fast and pivot
Proof of concept – use case
Sales subject area
Supply chain
All data sources
Impediment – course correct
Optimize and pivot
13.
Outcomes and use cases
[Chart: analytic capability (operational management, decision support, data science) plotted against approach (reactive/past to proactive/future)]
Note: Size of circles correlates to the scale of the capability’s utilization across Sysco
Capabilities shown: Formatted reporting; Parameterized reporting; Guided ad hoc; Exploratory analysis & self service; Predictive & prescriptive analytics; AI / ML
Use cases shown: Operations data insights; Category management insights; Revenue management insights; Customer red alert; Personalized recommendations; Insights-driven assortment
Our growth focus
14.
Convergence of analytics within SEED Data Repository
Data shown: Transactional; Pricing; Attribution; Third-party syndicated data
Analytics shown: Customer risk; Personalization; Price migration & engine; Assortment optimization
17.
SEED data repository architecture
[Architecture diagram]
Components shown: User interface; API; Amazon Cognito authentication; Custom authorizer; Data lake microservices (AWS Lambda); IAM roles; Upload; External data sources; Gravity to ingest data at scale; Amazon S3; AWS Glue Crawler; Amazon Athena; Amazon EMR; Amazon Redshift; Amazon SageMaker; Amazon DynamoDB; Amazon Elasticsearch
Workloads shown:
• Data integrity checks & data quality report
• Advanced analytics
• Scheduled reporting
• Interactive analysis
• Prototyping reports
• Data analysis
• Custom data extracts
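The architecture above places a custom authorizer between the API and the data lake microservices. The slides do not show its code, so the following is only a minimal sketch of what an API Gateway Lambda authorizer generally returns. The token table `DEMO_TOKENS`, the user name, and the example ARN are hypothetical; a real deployment would presumably validate an Amazon Cognito token instead of a lookup table.

```python
"""Sketch of an API Gateway Lambda (custom) authorizer for a data lake API."""

# Hypothetical token table; stands in for real Amazon Cognito token validation.
DEMO_TOKENS = {"demo-analyst-token": "analyst-1"}


def build_policy(principal_id, effect, method_arn):
    """Return the response shape API Gateway expects from a Lambda authorizer."""
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": effect,  # "Allow" or "Deny"
                    "Resource": method_arn,
                }
            ],
        },
    }


def handler(event, context=None):
    """Allow the call only when the bearer token maps to a known user."""
    raw = event.get("authorizationToken") or ""
    token = raw[len("Bearer "):] if raw.startswith("Bearer ") else raw
    user = DEMO_TOKENS.get(token.strip())
    effect = "Allow" if user else "Deny"
    return build_policy(user or "anonymous", effect, event["methodArn"])
```

Returning an explicit Deny policy for unknown tokens, rather than raising, keeps the decision visible in one place; which behavior Sysco chose is not stated in the slides.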
18.
Enabling Data as a Service (DaaS) using SDR
[Architecture diagram]
Components shown: API consumer; API Gateway; Serverless Lambda; Amazon Redshift Spectrum; S3 bucket; Ingest using Hive; DynamoDB; Amazon ES; Amazon EMR
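The DaaS pattern above puts a serverless Lambda function behind API Gateway for API consumers. As a rough illustration only, the sketch below shows a handler that returns the response shape used by API Gateway's Lambda proxy integration. The `item_id` parameter, the `SAMPLE_ITEMS` records, and `lookup_item` are hypothetical stand-ins; the real service would read from a store such as DynamoDB or Amazon Redshift Spectrum.

```python
import json

# Hypothetical curated records; an in-memory stand-in for a real data store.
SAMPLE_ITEMS = {
    "1001": {"item_id": "1001", "category": "produce"},
    "1002": {"item_id": "1002", "category": "dairy"},
}


def lookup_item(item_id, store=SAMPLE_ITEMS):
    """Fetch one record; swap this for a real query in practice."""
    return store.get(item_id)


def _response(status, payload):
    """Build a Lambda proxy integration response for API Gateway."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def handler(event, context=None):
    """Serve one item by id from the query string."""
    # API Gateway sends None here when the request has no query string.
    params = event.get("queryStringParameters") or {}
    item_id = params.get("item_id")
    if not item_id:
        return _response(400, {"error": "item_id is required"})
    item = lookup_item(item_id)
    if item is None:
        return _response(404, {"error": "item not found"})
    return _response(200, item)
```

Keeping the lookup behind one small function means the API contract stays the same if the backing store changes.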
19.
Enabling data science and engineering at scale
• Higher conversion rate enabled by recommendation engine
• Driver compliance by IoT live streams
• Item search and match capability to identify similar, available, and profitable items across various product and business teams
20.
Lessons learned
Centralized data repository
• What numbers mean as opposed to what numbers are
Data as a Service
• From thousands of reports to a publish-subscribe architecture
Rapid experimentation platform
• AI/ML limited to certain processes and adopted with caution
Accountable continuous process
• From manual intervention/reprocessing for operational issues to operational accountability
21.
Road ahead
SPROUT SEEDLING SAPLING TREE
23.
Legacy data architectures exist as isolated data silos
• Hadoop cluster
• OLTP databases
• Data warehouse appliance
27.
Challenges with legacy data architectures
1. Data movement and ETL
2. Data types and formats
3. Real-time processing
4. Data governance across data silos
28.
Reasons to build a data lake
31.
Characteristics of a data lake
33.
Benefits of a data lake: separation of storage and compute
“How can I scale up with the volume of data being generated?”
Scale storage and compute independently
34.
Building a data lake on AWS
41.
Amazon S3: center of the data lake
Services shown around Amazon S3:
• AWS Snowball; AWS Storage Gateway; Amazon Kinesis Data Firehose; AWS Direct Connect; AWS Database Migration Service
• Amazon DynamoDB; Amazon Elasticsearch Service
• AWS KMS; AWS IAM; AWS CloudTrail; Amazon CloudWatch
• AWS AppSync; Amazon API Gateway; Amazon Cognito
• Amazon Athena; Amazon EMR; AWS Glue; Amazon Redshift; Amazon DynamoDB; Amazon QuickSight; Amazon Kinesis; Amazon Elasticsearch Service; Amazon Neptune; Amazon RDS
42.
Why Amazon S3 for a data lake?
• Durable
• Available
• High performance
• Scalable
• Integrated
• Easy to use
43.
Secure your data lake
Encrypt
• SSE-S3 or CSE using AWS KMS
• HTTPS endpoints
Authorize and authenticate
• IAM policies
• Amazon S3 bucket policies
• AWS Glue Data Catalog resource-based policies
• Amazon S3 VPC endpoints
Audit and comply
• AWS CloudTrail and bucket access logs
• Lifecycle management policies
• Versioning and MFA Delete
• Certifications: HIPAA, PCI, SOC 1, 2, 3, etc.
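Two of the controls listed above, HTTPS endpoints and Amazon S3 bucket policies, can be combined in a single policy document. The sketch below is one possible illustration, not a policy taken from the talk: it denies any request made without TLS and denies uploads that omit a server-side encryption header. The bucket name `example-data-lake-bucket` is hypothetical, and whether the second statement is desirable depends on how clients upload data.

```python
import json


def build_bucket_policy(bucket):
    """Build an S3 bucket policy that requires TLS and an SSE upload header."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    object_arn = f"{bucket_arn}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Reject every request that does not arrive over HTTPS.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, object_arn],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                # Reject uploads that carry no server-side encryption header.
                "Sid": "DenyUnencryptedUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": object_arn,
                "Condition": {
                    "Null": {"s3:x-amz-server-side-encryption": "true"}
                },
            },
        ],
    }


# Hypothetical bucket name, used only for illustration.
policy_json = json.dumps(build_bucket_policy("example-data-lake-bucket"), indent=2)
```

With boto3, the resulting string could be applied through `put_bucket_policy(Bucket=..., Policy=policy_json)`.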
44.
Use your data lake
• Query data in place
• Load curated data for applications
• Collect train and test datasets for ML models
46.
Data lake storage best practices
s3://amazon-reviews-pds/parquet/product_category=Apparel/part-00000-495c48e6-96d6-4650-aa65-3c36a3516ddd.c000.snappy.parquet
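The example object above encodes three storage choices in its name: a columnar format (Parquet), compression (Snappy), and a Hive-style partition folder (`product_category=Apparel`). As a small sketch of how such a layout can be read and written programmatically, the helpers below parse the partition keys out of an S3 URI and build a partition prefix. The function names are illustrative and not part of any AWS library.

```python
from urllib.parse import urlparse


def parse_s3_object(uri):
    """Split an S3 URI into bucket, Hive-style partitions, and file name."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {uri}")
    *folders, filename = parsed.path.lstrip("/").split("/")
    # Only folders of the form key=value are treated as partitions.
    partitions = dict(seg.split("=", 1) for seg in folders if "=" in seg)
    return {"bucket": parsed.netloc, "partitions": partitions, "filename": filename}


def partition_prefix(base, **partitions):
    """Build a key=value/ prefix under a base location."""
    parts = "/".join(f"{key}={value}" for key, value in partitions.items())
    return f"{base.rstrip('/')}/{parts}/"


EXAMPLE = (
    "s3://amazon-reviews-pds/parquet/product_category=Apparel/"
    "part-00000-495c48e6-96d6-4650-aa65-3c36a3516ddd.c000.snappy.parquet"
)
info = parse_s3_object(EXAMPLE)
```

Query engines that understand this layout can skip whole partition folders when a query filters on the partition column.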
47.
Data lake storage best practices
• Use lifecycle rules
• Use Amazon S3 Select and Amazon Glacier Select
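Both practices above come down to a small request structure. The sketch below builds a lifecycle configuration that tiers objects under a prefix to colder storage classes, and the arguments for an S3 Select query over a Parquet object. The prefix, day thresholds, bucket, key, and SQL text are all assumptions chosen for illustration, and the code only constructs dictionaries rather than calling AWS.

```python
def lifecycle_configuration(prefix, ia_after_days=30, glacier_after_days=90):
    """Build a lifecycle configuration that tiers a prefix to colder storage."""
    return {
        "Rules": [
            {
                "ID": f"tier-{prefix.strip('/')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


def select_request(bucket, key, sql):
    """Build arguments for an S3 Select query against a Parquet object."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": sql,
        "InputSerialization": {"Parquet": {}},
        "OutputSerialization": {"JSON": {}},
    }


# Hypothetical prefix, bucket, key, and query, for illustration only.
config = lifecycle_configuration("raw/")
request = select_request(
    "example-data-lake-bucket",
    "curated/sales/part-00000.snappy.parquet",
    "SELECT s.item_id FROM S3Object s LIMIT 10",
)
```

With boto3, `config` would be passed as `put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=config)` and `request` as `select_object_content(**request)`, assuming S3 Select is available in the account.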
48.
Data lake compute best practices
Spot Instance
49.
AWS answers—Data lake on AWS
https://amzn.to/2k4FQX1
50.
Summary