© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Neeraj Verma – AWS Solutions Architect
Saurav Mahanti – Senior Manager – Information Systems
Dario Rivera – AWS Solutions Architect
November 28, 2016
How to Build a Big Data Analytics
Data Lake
What to expect from this short talk
• Data Lake concept
• Data Lake - Important Capabilities
• AMGEN’s Data Lake initiative
• How to Build a Data Lake in your AWS account
Data Lake Concept
What is a Data Lake?
A data lake is a new and increasingly
popular way to store and analyze
massive volumes and heterogeneous
types of data in a centralized repository.
Benefits of a Data Lake – Quick Ingest
Quickly ingest data
without needing to force it into a
pre-defined schema.
“How can I collect data quickly
from various sources and store
it efficiently?”
Benefits of a Data Lake – All Data in One Place
“Why is the data distributed in
many locations? Where is the
single source of truth?”
Store and analyze all of your data,
from all of your sources, in one
centralized location.
Benefits of a Data Lake – Storage vs Compute
Separating your storage and compute
allows you to scale each component as
required
“How can I scale up with the
volume of data being generated?”
Benefits of a Data Lake – Schema on Read
“Is there a way I can apply multiple
analytics and processing frameworks
to the same data?”
A Data Lake enables ad-hoc
analysis by applying schemas
on read, not write.
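A minimal sketch of schema on read, assuming JSON lines as the raw format. The record layout, field names, and the `read_with_schema` helper are invented for illustration and are not from the talk. The raw records are stored untouched, and each consumer projects them onto its own schema at read time.

```python
import json

# Raw lines as they might land in the lake: no schema enforced at write time.
# The fields here are hypothetical.
RAW_LINES = [
    '{"device": "sensor-1", "temp_c": "21.5", "ts": "2016-11-28T10:00:00"}',
    '{"device": "sensor-2", "temp_c": "19.0", "ts": "2016-11-28T10:01:00", "site": "lab-a"}',
]


def read_with_schema(lines, schema):
    """Parse each raw line and project it onto `schema` at read time.

    `schema` maps a field name to a converter. Fields missing from a
    record come back as None instead of failing the read.
    """
    rows = []
    for line in lines:
        record = json.loads(line)
        rows.append({
            field: convert(record[field]) if field in record else None
            for field, convert in schema.items()
        })
    return rows


# Two consumers apply different schemas to the same stored data.
temperature_schema = {"device": str, "temp_c": float}
location_schema = {"device": str, "site": str}

temperatures = read_with_schema(RAW_LINES, temperature_schema)
locations = read_with_schema(RAW_LINES, location_schema)
```

Neither consumer required the data to be reshaped when it was written; a third framework could read the same lines with yet another schema.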
Important Capabilities of a
“Data Lake”
Important components of a Data Lake
• Ingest and Store
• Catalogue & Search
• Protect & Secure
• Access & User Interface
[Diagram: AWS services across the analytics lifecycle stages COLLECT, INGEST/STORE, PROCESS/ANALYZE, and CONSUME]
• Data types: records, documents, files, messages, streams
• Sources: mobile apps, web apps, devices, sensors and IoT platforms, data centers
• Collection categories: applications, logging, IoT, transport, messaging
• Storage categories: search, SQL, NoSQL, cache, file, queue, stream
• Processing categories: batch, message, interactive, stream, ML, ETL
• Consumption: analysis & visualization, notebooks, IDE, API, apps & services
• Services shown: Amazon EC2, AWS IoT, AWS Direct Connect, AWS Import/Export Snowball, Amazon CloudWatch, AWS CloudTrail, Apache Kafka, Amazon SQS, Amazon Kinesis Streams, Amazon Kinesis Firehose, Amazon DynamoDB, Amazon DynamoDB Streams, Amazon S3, Amazon ElastiCache, Amazon RDS, Amazon Elasticsearch Service, Amazon EMR, Presto, Amazon Redshift, Amazon Machine Learning, Amazon Kinesis Analytics, Amazon KCL apps, Amazon SQS apps, AWS Lambda, Amazon QuickSight
Many tools to Support the Data Analytics LifeCycle
Ingest and Store
Ingest real time and batch data
Support for any type of data at scale
Durable
Low cost
Use S3 as Data Substrate – Apply to Compute as Needed
[Diagram: Amazon S3 connected to EMR, Kinesis, Redshift, DynamoDB, RDS, Data Pipeline, Spark Streaming, Storm, and Import/Export Snowball]
• Highly durable
• Low cost
• Scalable storage
Amazon S3 as your cluster’s persistent data store
Amazon S3
Separate compute and storage
Resize and shut down Analytics
Compute Environments with no data
loss
Point multiple compute clusters at
same data in Amazon S3
Data Ingestion into Amazon S3
• AWS Direct Connect
• AWS Snowball
• ISV Connectors
• Amazon Kinesis Firehose
• S3 Transfer Acceleration
• AWS Storage Gateway
Metadata lake
Used for summary statistics and data
classification management
Simplified model for data discovery &
governance
Catalogue & Search
Catalogue & Search Architecture
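• Data collectors (EC2, ECS) put objects into the S3 bucket
• Object Created and Object Deleted events from the S3 bucket invoke AWS Lambda, which extracts search fields and puts an item into the metadata index (DynamoDB)
• The DynamoDB update stream invokes AWS Lambda, which updates the index in Amazon Elasticsearch Service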
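The "extract search fields, put item" step in the flow above can be sketched as a pure function over an S3 event notification. This is an illustration under assumptions, not the code behind the slide: the item attributes (`object_id`, `extension`, and so on) and the bucket name are hypothetical, and a plain dict stands in for the DynamoDB table so the example runs without AWS access.

```python
from urllib.parse import unquote_plus


def extract_search_fields(s3_event):
    """Turn an S3 event notification into catalogue actions.

    Returns a list of (action, item) tuples: "put" for ObjectCreated
    events, "delete" for anything else (for example ObjectRemoved).
    """
    actions = []
    for record in s3_event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        item = {
            "object_id": f"{bucket}/{key}",  # hypothetical primary key
            "bucket": bucket,
            "key": key,
            "size_bytes": record["s3"]["object"].get("size", 0),
            "extension": key.rsplit(".", 1)[-1].lower() if "." in key else "",
            "event_time": record.get("eventTime"),
        }
        created = record["eventName"].startswith("ObjectCreated")
        actions.append(("put" if created else "delete", item))
    return actions


def apply_to_index(index, actions):
    """Apply actions to `index`, a dict standing in for the DynamoDB table."""
    for action, item in actions:
        if action == "put":
            index[item["object_id"]] = item
        else:
            index.pop(item["object_id"], None)
    return index


sample_event = {
    "Records": [{
        "eventName": "ObjectCreated:Put",
        "eventTime": "2016-11-28T10:00:00.000Z",
        "s3": {
            "bucket": {"name": "example-data-lake"},
            "object": {"key": "raw/claims+2016.csv", "size": 1024},
        },
    }]
}

index = apply_to_index({}, extract_search_fields(sample_event))
```

In a deployed version the dict update would be replaced by a write to the metadata table, and the table's stream would drive the search index update as a separate function.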
Access Control - Authentication &
Authorization
Data protection – Encryption
Logging and Monitoring
Protect and Secure
Security
• Identity and Access Management (IAM) policies
• Bucket policies
• Access Control Lists (ACLs)
• Query string authentication
• Private VPC endpoints to Amazon S3
Encryption
• SSL endpoints
• Server Side Encryption (SSE-S3)
• S3 Server Side Encryption with provided keys (SSE-C, SSE-KMS)
• Client-side Encryption
Compliance
• Bucket access logs
• Lifecycle Management Policies
• Access Control Lists (ACLs)
• Versioning & MFA deletes
• Certifications – HIPAA, PCI, SOC 1/2/3 etc.
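Two of the controls listed above, SSL endpoints and server-side encryption, can be enforced with a bucket policy. The sketch below builds such a policy as a Python dict. The bucket name and the `build_bucket_policy` helper are hypothetical, and the statements follow the commonly documented pattern of denying non-TLS requests and uploads that omit the server-side encryption header; treat it as a starting point rather than a complete security configuration.

```python
import json


def build_bucket_policy(bucket_name):
    """Build a bucket policy that denies insecure transport and unencrypted uploads."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Reject any request that does not arrive over TLS.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                # Reject uploads that do not request server-side encryption.
                "Sid": "DenyUnencryptedUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"{bucket_arn}/*",
                "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
            },
        ],
    }


policy_json = json.dumps(build_bucket_policy("example-data-lake"), indent=2)
```

The resulting JSON document is what would be attached to the bucket; IAM policies, ACLs, and VPC endpoints from the list remain separate layers.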
Implement the right controls
Exposes the data lake to customers
Programmatically query catalogue
Expose search API
Ensures that entitlements are respected
API & User Interface
API & UI Architecture
[Diagram components: Users, Static Website, API Gateway, AWS Lambda, Metadata Index, User Management]
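To illustrate what "ensures that entitlements are respected" could mean for a search API, here is a small sketch of a catalogue search that filters results by the caller's groups. The catalogue entries, group names, and the `search_catalogue` function are invented for illustration; a real deployment would sit behind API Gateway and query the metadata index instead of an in-memory list.

```python
def search_catalogue(index, query, user_groups):
    """Return names of catalogue entries matching `query` that the user may see."""
    query = query.lower()
    allowed = set(user_groups)
    results = []
    for entry in index:
        matches = (query in entry["name"].lower()
                   or query in entry["description"].lower())
        # A user is entitled if they share at least one group with the entry.
        entitled = bool(allowed & set(entry["allowed_groups"]))
        if matches and entitled:
            results.append(entry["name"])
    return sorted(results)


# Hypothetical catalogue entries.
CATALOGUE = [
    {"name": "claims-2016", "description": "Insurance claims extract",
     "allowed_groups": ["rwd-analysts"]},
    {"name": "sales-claims-summary", "description": "Aggregated sales view",
     "allowed_groups": ["commercial"]},
    {"name": "batch-genealogy", "description": "Manufacturing batch lineage",
     "allowed_groups": ["manufacturing"]},
]

analyst_hits = search_catalogue(CATALOGUE, "claims", ["rwd-analysts"])
```

Filtering inside the search function, rather than in the user interface, means every client of the API sees the same entitlement rules.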
Putting It All Together
Common Add-On Capability to Data Lake
• Backend ETL
• Organizational Control Gates for Dataset Access
• Data Correlation Identification via Machine Learning
• Dataset Updates to Dependent Compute Environments
Data Lake is a Journey
There are Multiple implementation Methods
for Building a Data Lake
• Using a combination of specialized/open source tools from
various providers to build a customized solution
• Using various managed services from AWS, such as S3, Amazon
Elasticsearch Service, DynamoDB, Cognito, Lambda, etc., as a core
Data Lake solution
• Using a DataLake-as-a-Service solution
Data Lake Evolution @ Amgen
SAURAV MAHANTI
Senior Manager - Information Systems
One Data Lake
One code-base
Multiple Locations
(Cloud / On-Premise)
Common scalable
infrastructure
Multiple Business
Functions
Shared Features
A CONCEPTUAL VIEW OF THE DATA LAKE
DATA LAKE PLATFORM
Common Tools and Capabilities
Innovative new tools and capabilities are reused across all functions (e.g. search, data processing & storage, visualization tools)
Functions can manage their own data, while contributing to the common data layer
Business applications are built to meet specific information needs, from simple data access and data visualization to complex statistical/predictive models
• Manufacturing Data: Batch Genealogy Visualization, PD Self-Service Analytics, Instrument Bus Data Search
• Real World Data: FDA Sentinel Analytics, Epi Programmer’s Workbench, Patient Population Analytics
• Commercial Data: US Field Sales Reporting, Global Forecasting, US Marketing Analytics
Bio-IT World Best Practices Awards 2016 Winner
HIGH LEVEL COMPONENT ARCHITECTURE of the data lake
• Procure/Ingest: Data Ingestion Adapter, Managed Infrastructure, Self-service Adapter, Application Accelerator
• Curate and Enrichment: Self-Service Data Integration Tools, IS Owned Automated Integration, Data Catalog, Reference Data Linkage
• Storage and Processing: User Data Workspace, Raw, Mastered, Apps, Elastic Compute Capability
• Analytics Toolkit: Analytical Toolkit, Data Science Toolkit, BI Reporting, User Specific Apps, Access Portal
Centralized Analytics Toolkit provides different reporting and analytics solutions for end-users to access the data on the Data Lake
Curate and Enrichment Layer enhances the value and usability of the data through automation, cataloging and linking
Data processing & storage layer is the core of the Enterprise Data Lake that enables its scale and flexibility
Procure/Ingest is a collection of tools and technologies to easily move data to and from the Data Lake
DATA PROCESSING AND STORAGE LAYER
• The combination of Hadoop HDFS and Amazon S3 provides the right cost/performance balance with unlimited scalability while maintaining security and encryption at the data file level
• Powerful execution engines like YARN, MapReduce and Spark bring the “compute to the data”
• Hive and Impala provide SQL over structured data
• HBase is used for NoSQL/transactional jobs
• Solr is used for search capabilities over documents
• MarkLogic and Neo4j provide semantic and graph capabilities
• Amazon Redshift and Amazon EMR provide low-cost elastic computing with the ability to spin up clusters for data processing and metrics calculations
PROCURE AND INGEST - Pre-built and configurable common components to load any type
of data into the Lake
Cloud Data Integration tools like SnapLogic enable data analysts and end users to build data
pipelines into the Data Lake using pre-built connectors to various cloud-hosted services like Box,
and make it easy to move data sets between HDFS, S3, sFTP and file shares.
Structured Data Ingestion - A common component for scheduled production data
loads of incremental or full data. It uses Python and native Hadoop tools like Sqoop
and Hive for efficiency and speed.
Real-Time Data Ingestion - For real-time or streaming data. It
uses the Kafka messaging queue, Spark Streaming, HBase and
Java.
Unstructured Data Pipeline - A Morphlines document pipeline for document ingestion, text
processing and indexing. It uses Morphlines, Lily indexer and HBase.
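The difference between full and incremental loads in the structured ingestion component can be illustrated with a generic watermark pattern. This is not Amgen's component: the row layout, the `updated_at` column, and both function names are assumptions, and in-memory lists stand in for the source table and the lake table.

```python
def full_load(source_rows, target):
    """Replace the target with all source rows and return the new watermark."""
    target[:] = list(source_rows)
    return max((row["updated_at"] for row in source_rows), default="")


def incremental_load(source_rows, target, watermark):
    """Append only rows changed after `watermark` and return the new watermark."""
    new_rows = [row for row in source_rows if row["updated_at"] > watermark]
    target.extend(new_rows)
    return max((row["updated_at"] for row in new_rows), default=watermark)


# ISO 8601 timestamps in the same format compare correctly as strings.
SOURCE = [
    {"id": 1, "updated_at": "2016-11-01T00:00:00"},
    {"id": 2, "updated_at": "2016-11-15T00:00:00"},
    {"id": 3, "updated_at": "2016-11-28T00:00:00"},
]

lake_table = []
# First scheduled run: a full load of what exists so far (rows 1 and 2).
watermark = full_load(SOURCE[:2], lake_table)
# Later run: only the row changed since the watermark (row 3) is moved.
watermark = incremental_load(SOURCE, lake_table, watermark)
```

The stored watermark is what lets a scheduled job pick up where the previous run stopped instead of reloading the whole table.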
CURATE AND ENRICHMENT
Enhance the value of the data by linking and cataloging
Components: Self-Service Data Integration Tools, Analytical Subject Areas, Data Catalog, Reference Data Linkage
Reference Data Linkage – Connect datasets to ontologies and vocabularies to get more relevant and better results
Data Catalog – An enterprise-wide metadata catalog that stores, describes, indexes, and shows how to access any registered data asset
Analytical Subject Areas – Build targeted applications using Data Integration tools like SnapLogic, or deploy packaged applications or data marts that transform the data in the Lake for consumption by end-user tools
Example applications: PD Self-Service Analytics, FDA Sentinel Analytics, US Marketing Analytics
CENTRALIZED ANALYTICS TOOLKIT - provides reporting and analytics solutions for end-
users to access the data in the Lake
Data Science Toolkit enables
analysts and data scientists to use
tools like SAS, R and Jupyter (Python
notebooks) to connect to their
analytics sandbox within the Lake
and submit distributed computing
jobs
Reporting and Business
Intelligence tools like
MicroStrategy and Cognos can be
deployed either directly on the Lake
or on derived Analytical Subject
Areas
Analytics and
Visualization tools are
the most common
methods used to query
and analyze data in the
Lake
Focused applications that target
a specific use-case can be built
using open source products and
deployed to a specific user
community through the portal
Data Lake Portal
provides a user-friendly,
mobile-enabled and
secure portal to all the
end-user applications
Real World Data Platform: Shared capability used by multiple business functions from R&D and Commercial
• RWD Data Lake – Claims, EMR+ (Asia, US, EU, ROW)
• Datasets converted to the common OMOP data model
• Superior processing speed enables simultaneous processing of terabytes of data
• Pre-calculated cohorts for all pipeline & marketed medicines
• Design & Analytic Toolbox: Descriptive, Rapid RWD Query, Study Design, Advanced Analytics (Spotfire, R, SAS, Achilles)
• Business Impact: Marketed product defense, Clinical trial speed & outcomes, Targeted Demand, Therapy areas, Product value, Benefit:Risk, Evidence-based research
That’s a lot of Work! –
Where do I even Start?!
Introducing:
Data Lake Solution
Presented by: Dario Rivera – AWS Solutions Architect
Built by: Sean Senior – AWS Solutions Builder
Data Lake Solution
Package of Code –
Deployed via CloudFormation
into your AWS Account
Architecture Overview
Data Lake Serverless Composition
• Amazon S3 (2 buckets)
  • Primary data lake content storage
  • Static website hosting
• Amazon API Gateway (1 RESTful API)
• Amazon DynamoDB (7 tables)
• Amazon Cognito Your User Pools (1 User Pool)
• AWS Lambda
  • 5 microservice functions
  • 1 Amazon API Gateway custom authorizer function
  • 1 Amazon Cognito User Pool event trigger function [Pre sign-up, Post confirmation]
• Amazon Elasticsearch Service (1 cluster)
• AWS IAM (7 policies, 7 roles)
Demo of
AWS Data Lake Solution
http://tinyurl.com/DataLakeSolution
All of this, and it costs less than $1 per
hour to run the Data Lake Solution
* Excluding Storage and Analytics Environment Costs
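As a back-of-the-envelope check on the hourly figure, the sketch below converts an hourly rate into monthly and yearly ceilings. The $1 rate is the upper bound quoted on the slide; the 30-day month and the `cost_ceiling` helper are simplifying assumptions, and storage and analytics environment costs are excluded as noted above.

```python
HOURS_PER_DAY = 24


def cost_ceiling(hourly_rate_usd, days):
    """Upper bound on spend for running continuously over `days` days."""
    return hourly_rate_usd * HOURS_PER_DAY * days


monthly_ceiling = cost_ceiling(1.00, 30)   # 30-day month
yearly_ceiling = cost_ceiling(1.00, 365)
```

At the quoted bound, running the solution around the clock stays under 720 USD for a 30-day month and under 8,760 USD for a year.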
The Data Lake Solution will be available at the end of Q4.
Available via AWS Answers
https://aws.amazon.com/answers/
Thank you!
Neeraj Verma – rajverma@amazon.com
Saurav Mahanti – smahanti@amgen.com
Dario Rivera – darior@amazon.com

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 


AWS Big Data Analytics Data Lake

  • 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Neeraj Verma – AWS Solutions Architect; Saurav Mahanti – Senior Manager – Information Systems; Dario Rivera – AWS Solutions Architect. November 28, 2016. How to Build a Big Data Analytics Data Lake
  • 2. What to expect from this short talk • Data Lake concept • Data Lake - Important Capabilities • AMGEN’s Data Lake initiative • How to Build a Data Lake in your AWS account
  • 4. What is a Data Lake? A Data Lake is a new and increasingly popular way to store and analyze massive volumes and heterogeneous types of data in a centralized repository.
  • 5. Benefits of a Data Lake – Quick Ingest Quickly ingest data without needing to force it into a pre-defined schema. “How can I collect data quickly from various sources and store it efficiently?”
  • 6. Benefits of a Data Lake – All Data in One Place “Why is the data distributed in many locations? Where is the single source of truth?” Store and analyze all of your data, from all of your sources, in one centralized location.
  • 7. Benefits of a Data Lake – Storage vs Compute Separating your storage and compute allows you to scale each component as required. “How can I scale up with the volume of data being generated?”
  • 8. Benefits of a Data Lake – Schema on Read “Is there a way I can apply multiple analytics and processing frameworks to the same data?” A Data Lake enables ad-hoc analysis by applying schemas on read, not write.
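
To make the schema-on-read idea concrete, here is a minimal sketch in plain Python. It is not from the talk: the sample records, field names, and the `read_with_schema` helper are all hypothetical. The raw records are kept exactly as they arrived, and each consumer projects its own schema onto them only when reading.

```python
import json

# Raw records are stored as-is; no schema is enforced at write time.
# (Hypothetical sample data, used only for illustration.)
RAW_EVENTS = [
    '{"user": "u1", "amount": "12.50", "ts": "2016-11-28T10:00:00"}',
    '{"user": "u2", "amount": "7", "ts": "2016-11-28T10:05:00", "coupon": "X1"}',
]


def read_with_schema(lines, schema):
    """Project each raw record onto `schema` (field name -> caster) at read time."""
    for line in lines:
        record = json.loads(line)
        yield {
            name: (cast(record[name]) if name in record else None)
            for name, cast in schema.items()
        }


# Two consumers apply two different schemas to the same stored data.
billing_view = list(read_with_schema(RAW_EVENTS, {"user": str, "amount": float}))
marketing_view = list(read_with_schema(RAW_EVENTS, {"user": str, "coupon": str}))
```

Because the stored records are never rewritten, adding a third view later only needs a new schema dictionary, not a reload of the data.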
  • 9. Important Capabilities of a “Data Lake”
  • 10. Important components of a Data Lake Ingest and Store Catalogue & Search Protect & Secure Access & User Interface
  • 11. Many tools to support the data analytics lifecycle (diagram). Stages: Collect, Ingest/Store, Process/Analyze, Consume. Collect (applications, logging, IoT, transport, messaging): mobile apps, web apps, devices, sensors and IoT platforms, AWS IoT, data centers, AWS Direct Connect, AWS Import/Export Snowball, Amazon CloudWatch, AWS CloudTrail. Data types: records, documents, files, messages, streams. Ingest/Store (search, SQL, NoSQL, cache, file, queue, stream): Amazon Elasticsearch Service, Amazon RDS, Amazon DynamoDB, Amazon ElastiCache, Amazon S3, Amazon SQS, Apache Kafka, Amazon Kinesis Streams, Amazon Kinesis Firehose, Amazon DynamoDB Streams. Process/Analyze (batch, message, interactive, stream, ML; plus ETL): Amazon EMR, Presto, Amazon Redshift, Amazon Machine Learning, Amazon Kinesis Analytics, Amazon KCL apps, AWS Lambda, Amazon SQS apps, streaming, Amazon EC2. Consume (analysis & visualization, notebooks, IDE, API): Amazon QuickSight, apps & services.
  • 12. Ingest and Store Ingest real time and batch data Support for any type of data at scale Durable Low cost
  • 13. Use S3 as Data Substrate – Apply to Compute as Needed EMR Kinesis Redshift DynamoDB RDS Data Pipeline Spark Streaming Storm Amazon S3 Import/Export Snowball Highly Durable Low Cost Scalable Storage
  • 14. Amazon S3 as your cluster’s persistent data store Amazon S3 Separate compute and storage Resize and shut down Analytics Compute Environments with no data loss Point multiple compute clusters at same data in Amazon S3
  • 15. AWS Direct Connect AWS Snowball ISV Connectors Amazon Kinesis Firehose S3 Transfer Acceleration AWS Storage Gateway Data Ingestion into Amazon S3
  • 16. Metadata lake Used for summary statistics and data Classification management Simplified model for data discovery & governance Catalogue & Search
  • 17. Catalogue & Search Architecture: Data Collectors (EC2, ECS) → Put Object → S3 Bucket → Object Created / Object Deleted → AWS Lambda → Put Item → Metadata Index (DynamoDB) → Update Stream → AWS Lambda → Extract Search Fields → Update Index → Amazon Elasticsearch Service
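
The catalogue flow on this slide can be sketched as a single event handler. This is an illustrative simplification, not the solution's actual code: two in-memory dictionaries stand in for the DynamoDB metadata index and the Elasticsearch index, the search index is updated directly rather than through a DynamoDB stream and a second Lambda, and the helper names and tokenizing rule are assumptions. The event shape follows the S3 event notification format (`Records[].eventName`, `Records[].s3.bucket.name`, `Records[].s3.object.key`).

```python
import re
from urllib.parse import unquote_plus

metadata_index = {}  # stand-in for the DynamoDB metadata table (assumption)
search_index = {}    # stand-in for the Elasticsearch index (assumption)


def extract_search_fields(key):
    """Split an object key into lowercase tokens usable as search terms."""
    return [token for token in re.split(r"[/._\-\s]+", key.lower()) if token]


def handle_s3_event(event):
    """Apply S3 object-created / object-removed notifications to both indexes."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        item_id = f"{bucket}/{key}"
        if record["eventName"].startswith("ObjectCreated"):
            metadata_index[item_id] = {
                "bucket": bucket,
                "key": key,
                "size": record["s3"]["object"].get("size", 0),
            }
            search_index[item_id] = extract_search_fields(key)
        elif record["eventName"].startswith("ObjectRemoved"):
            metadata_index.pop(item_id, None)
            search_index.pop(item_id, None)
```

In a deployment the two dictionary writes would become a DynamoDB put and an Elasticsearch index call; keeping them behind small functions makes the handler easy to test without AWS access.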
  • 18. Access Control - Authentication & Authorization Data protection – Encryption Logging and Monitoring Protect and Secure
  • 19. Implement the right controls. Security: • Identity and Access Management (IAM) policies • Bucket policies • Access Control Lists (ACLs) • Query string authentication • Private VPC endpoints to Amazon S3. Encryption: • SSL endpoints • Server Side Encryption (SSE-S3) • S3 Server Side Encryption with provided keys (SSE-C, SSE-KMS) • Client-side Encryption. Compliance: • Bucket access logs • Lifecycle Management Policies • Access Control Lists (ACLs) • Versioning & MFA deletes • Certifications – HIPAA, PCI, SOC 1/2/3 etc.
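
As one possible illustration of combining bucket policies, SSL endpoints, and SSE-S3, the sketch below builds a bucket policy document in Python. The bucket name and the `build_bucket_policy` helper are hypothetical; the two statements use the standard `aws:SecureTransport` and `s3:x-amz-server-side-encryption` condition keys. Whether these particular rules suit a given data lake is a design decision the slides do not prescribe.

```python
import json


def build_bucket_policy(bucket):
    """Return a policy that rejects non-TLS requests and unencrypted uploads."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Deny any request that does not arrive over SSL/TLS.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                # Deny uploads that do not request SSE-S3 (AES256) encryption.
                "Sid": "DenyUnencryptedUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"{bucket_arn}/*",
                "Condition": {
                    "StringNotEquals": {"s3:x-amz-server-side-encryption": "AES256"}
                },
            },
        ],
    }


# "example-data-lake" is a placeholder bucket name.
policy_json = json.dumps(build_bucket_policy("example-data-lake"), indent=2)
```

The resulting JSON string is what would be attached to the bucket as its policy.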
  • 20. Exposes the data lake to customers Programmatically query catalogue Expose search API Ensures that entitlements are respected API & User Interface
  • 21. API & UI Architecture: Users, Static Website, API Gateway, AWS Lambda, Metadata Index, User Management
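
One way an API layer can ensure that entitlements are respected is an API Gateway custom authorizer. The sketch below follows the token-authorizer contract (the event carries `authorizationToken` and `methodArn`, and the function returns a `principalId` plus an IAM policy document). The `ENTITLEMENTS` table and its tokens are hypothetical stand-ins; the slides do not say how the solution validates callers, and a real deployment would check the token against its user management service.

```python
# Hypothetical token table: token -> (user name, allowed to call the API).
ENTITLEMENTS = {
    "token-alice": ("alice", True),
    "token-bob": ("bob", False),
}


def authorizer_handler(event, context=None):
    """Return an Allow or Deny policy for the API method being invoked."""
    token = event.get("authorizationToken")
    user, allowed = ENTITLEMENTS.get(token, ("anonymous", False))
    return {
        "principalId": user,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": "Allow" if allowed else "Deny",
                    "Resource": event["methodArn"],
                }
            ],
        },
    }
```

Unknown tokens fall through to an explicit Deny, so a caller is never admitted by default.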
  • 22. Putting It All Together
  • 23.
  • 24. Common Add-On Capabilities to a Data Lake Backend: ETL; Organizational Control Gates for Dataset Access; Data Correlation Identification via Machine Learning; Dataset Updates to Dependent Compute Environments
  • 25. Data Lake is a Journey. There are multiple implementation methods for building a Data Lake: • Using a combination of specialized/open source tools from various providers to build a customized solution • Using various managed services from AWS, such as S3, Amazon Elasticsearch Service, DynamoDB, Cognito, Lambda, etc., as a core Data Lake solution • Using a DataLake-as-a-Service solution
  • 26. Data Lake Evolution @ Amgen SAURAV MAHANTI Senior Manager - Information Systems
  • 27. One Data Lake One code-base Multiple Locations (Cloud / On-Premise) Common scalable infrastructure Multiple Business Functions Shared Features
  • 28. A CONCEPTUAL VIEW OF THE DATA LAKE. Data Lake Platform, Common Tools and Capabilities: innovative new tools and capabilities are reused across all functions (e.g. search, data processing & storage, visualization tools). Functions can manage their own data, while contributing to the common data layer. Business applications are built to meet specific information needs, from simple data access and data visualization to complex statistical/predictive models. Manufacturing Data: Batch Genealogy Visualization, PD Self-Service Analytics, Instrument Bus Data Search. Real World Data: FDA Sentinel Analytics, Epi Programmer’s Workbench, Patient Population Analytics. Commercial Data: US Field Sales Reporting, Global Forecasting, US Marketing Analytics. Best Practices Awards, Bio IT World 2016 Winner
  • 29. HIGH LEVEL COMPONENT ARCHITECTURE of the data lake. Procure/Ingest (Data Ingestion Adapter, Managed Infrastructure, Self-service Adapter, Application Accelerator): a collection of tools and technologies to easily move data to and from the Data Lake. Curate and Enrichment (Self-Service Data Integration Tools, IS Owned Automated Integration, Data Catalog, Reference Data Linkage): enhances the value and usability of the data through automation, cataloging and linking. Storage and Processing (User Data Workspace, Raw, Mastered, Apps, Elastic Compute Capability): the data processing & storage layer is the core of the Enterprise Data Lake that enables its scale and flexibility. Centralized Analytics Toolkit (Analytical Toolkit, Data Science Toolkit, BI Reporting, User Specific Apps, Access Portal): provides different reporting and analytics solutions for end-users to access the data on the Data Lake.
  • 30. DATA PROCESSING AND STORAGE LAYER: the core of the Enterprise Data Lake that enables its scale and flexibility. The combination of Hadoop HDFS and Amazon S3 provides the right cost/performance balance with unlimited scalability while maintaining security and encryption at the data file level. Powerful execution engines like YARN, MapReduce and Spark bring the “compute to the data”. Hive and Impala provide SQL over structured data; HBase is used for NoSQL/transactional jobs. Solr is used for search capabilities over documents. MarkLogic and Neo4j provide semantic and graph capabilities. Amazon Redshift and Amazon EMR provide low-cost elastic computing with the ability to spin up clusters for data processing and metrics calculations.
  • 31. PROCURE AND INGEST: pre-built and configurable common components to load any type of data into the Lake. Cloud Data Integration: tools like SnapLogic enable data analysts and end users to build data pipelines into the Data Lake using pre-built connectors to various cloud-hosted services like Box, and make it easy to move data sets between HDFS, S3, sFTP and fileshares. Structured Data Ingestion: a common component for scheduled production data loads of incremental or full data. It uses Python and native Hadoop tools like Sqoop and Hive for efficiency and speed. Real-Time Data Ingestion: for real-time or streaming data. It uses the Kafka messaging queue, Spark Streaming, HBase and Java. Unstructured Data Pipeline: the Morphlines Document Pipeline for document ingestion, text processing and indexing. It uses Morphlines, Lily indexer and HBase.
  • 32. CURATE AND ENRICHMENT: enhance the value of the data by linking and cataloging. Reference Data Linkage: connect datasets to ontologies and vocabularies to get more relevant and better results. Data Catalog: an enterprise-wide metadata catalog that stores, describes, indexes, and shows how to access any registered data asset. Analytical Subject Areas: build targeted applications using Data Integration tools like SnapLogic, or deploy packaged applications or data marts that transform the data in the Lake for consumption by end user tools. Analytical Subject Areas shown: PD Self-Service Analytics, FDA Sentinel Analytics, US Marketing Analytics.
  • 33. CENTRALIZED ANALYTICS TOOLKIT: provides reporting and analytics solutions for end-users to access the data in the Lake. Data Science Toolkit: enables analysts and data scientists to use tools like SAS, R and Jupyter (Python notebooks) to connect to their analytics sandbox within the Lake and submit distributed computing jobs. BI Reporting: reporting and business intelligence tools like MicroStrategy and Cognos can be deployed either directly on the Lake or on derived Analytical Subject Areas. Analytical Toolkit: analytics and visualization tools are the most common methods used to query and analyze data in the Lake. User Specific Apps: focused applications that target a specific use-case can be built using open source products and deployed to a specific user community through the portal. Access Portal: the Data Lake Portal provides a user-friendly, mobile-enabled and secure portal to all the end-user applications.
  • 34. Business Impact: marketed product defense; clinical trial speed & outcomes; product value; benefit:risk; evidence-based research. Real World Data Platform: shared capability used by multiple business functions from R&D and Commercial. RWD Data Lake (Claims, EMR+; Asia, US, EU, ROW): superior processing speed enables simultaneous processing of terabytes of data; datasets converted to the common OMOP data model. Design & Analytic Toolbox: pre-calculated cohorts for all pipeline & marketed medicines. Other diagram labels: Descriptive, Rapid RWD Query, Targeted, Demand, Therapy areas, Study Design, Advanced Analytics (Spotfire, R, SAS, Achilles).
  • 35. That’s a lot of Work! – Where do I even Start?!
  • 36. Data Lake Solution Introducing: Presented by: Dario Rivera – AWS Solutions Architect Built by: Sean Senior – AWS Solutions Builder
  • 37. Data Lake Solution Package of Code – Deployed via CloudFormation into your AWS Account
  • 39. Data Lake Server-less Composition • Amazon S3 (2 buckets) • Primary data lake content storage • Static Website Hosting • Amazon API Gateway (1 RESTful API) • Amazon DynamoDB (7 Tables) • Amazon Cognito Your User Pools (1 User Pool) • AWS Lambda • 5 microservice functions, • 1 Amazon API Gateway custom authorizer function, • 1 Amazon Cognito User Pool event trigger function [Pre sign-up, Post confirmation] • Amazon Elasticsearch Service (1 cluster) • AWS IAM (7 policies, 7 roles)
  • 40. Demo of AWS Data Lake Solution http://tinyurl.com/DataLakeSolution
  • 41.
  • 42. All of this, and it costs less than $1 / hour to run the Data Lake Solution* (*excluding storage and analytics environment costs)
  • 43. Data Lake Solution will be available End of Q4. Available via AWS Answers https://aws.amazon.com/answers/
  • 44. Thank you! Neeraj Verma – rajverma@amazon.com Saurav Mahanti – smahanti@amgen.com Dario Rivera – darior@amazon.com