SlideShare une entreprise Scribd logo
1  sur  37
Télécharger pour lire hors ligne
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Luis Caro, PhD
Solution Architect - AWS
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Types of big data analytics
Batch/
interactive
Stream
processing
Machine
learning
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Delivery of big data services
Virtualized Managed
services
Serverless/
clusterless/
containerized
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Plethora of tools
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Big data challenges
Why?
How?
What tools should I use?
Is there a reference architecture?
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Architectural principles
Build decoupled systems
• Data → Store → Process → Store → Analyze → Answers
Use the right tool for the job
• Data structure, latency, throughput, access patterns
Leverage managed and serverless services
• Scalable/elastic, available, reliable, secure, no/low admin
Use event-journal design patterns
• Immutable datasets (data lake), materialized views
Be cost-conscious
• Big data ≠ big cost
Machine learning (ML) enable your applications
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Collect Store Process/
Analyze
Consume
Time to answer (latency)
Throughput
Cost
Simplify big data processing
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Collect
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Collect
Devices
Sensors
IoT platforms
AWS IoT STREAMS
IoT
EventsData streams
Migration
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
FILES
Datatransport&logging
Import/expo
rt
Files/objects
Log files
Media files
Mobile apps
Web apps
Data centers AWS Direct
Connect
RECORDS
Applications
Transactions
Data structures
Database records
Type of data
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Events
Files
Transactions
Collect
Devices
Sensors
IoT platforms
AWS IoT STREAMS
IoT
Data streams
Migration
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
FILES
Datatransport&logging
Import/expo
rt
Log files
Media files
Mobile apps
Web apps
Data centers AWS Direct
Connect
RECORDS
Applications
Data structures
Database records
Type of data Store
NoSQL
In-memory
SQL
File/object
store
Stream
storage
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Collect
Devices
Sensors
IoT platforms
AWS IoT STREAMS
IoT
Migration
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
FILES
Datatransport&logging
Import/expo
rt
Mobile apps
Web apps
Data centers AWS Direct
Connect
RECORDS
Applications
Store
NoSQL
In-memory
SQL
File/object
store
Amazon Kinesis
Firehose
Amazon Kinesis
Streams
Apache Kafka
Apache Kafka
• High throughput distributed
streaming platform
Amazon Kinesis Data
Streams
• Managed stream storage
Amazon Kinesis Data
Firehose
• Managed data delivery
Stream storage
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Which stream/message storage should I use?
Amazon
Kinesis Data
Streams
Amazon
Kinesis Data
Firehose
Apache Kafka (on
Amazon Elastic
Compute Cloud
[Amazon EC2])
Amazon Simple
Queue Service
(Amazon SQS)
(Standard)
Amazon SQS
(FIFO)
AWS managed Yes Yes No Yes Yes
Guaranteed ordering Yes No Yes No Yes
Delivery (deduping) At least once At least once At least/At
most/exactly once
At least once Exactly once
Data retention period 7 days N/A Configurable 14 days 14 days
Availability Three AZ Three AZ Configurable Three AZ Three AZ
Scale/throughput No limit /
~ shards
No limit /
automatic
No limit /
~ nodes
No limits /
automatic
300 TPS /
queue
Multiple consumers Yes No Yes No No
Row/object size 1 MB Destination
row/object size
Configurable 256 KB 256 KB
Cost Low Low Low (+admin) Low-medium Low-medium
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Collect Store
File
Apache Kafka
Amazon Kinesis
Streams
Amazon Kinesis
Firehose
Hot
Stream
Mobile apps
Web apps
Devices
Sensors
IoT platforms
AWS IoT
Data centers AWS Direct
Connect
Migration
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
RECORDS
FILES
STREAMS
Datatransport&loggingIoTApplications
Import/expo
rt
NoSQL
In-memory
SQL
Amazon S3
Amazon Simple Storage Service
(Amazon S3)
• Managed object storage service built
to store and retrieve any amount of
data
File/object storage
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Use Amazon S3 as the storage for your data lake
• Natively supported by big data frameworks (Spark, Hive, Presto, and others)
• Decouple storage and compute
• No need to run compute clusters for storage (unlike HDFS)
• Can run transient Amazon EMR clusters with Amazon EC2 Spot
Instances
• Multiple & heterogeneous analysis clusters and services can use the
same data
• Designed for 99.999999999% durability
• No need to pay for data replication within a region
• Secure – SSL, client/server-side encryption at rest
• Low cost
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
What about HDFS & data tiering?
• Use HDFS/local for working data sets
• (for example, iterative read on the same data sets)
• Use Amazon S3 Standard for frequently
accessed data
• Use Amazon S3 Standard–IA for less
frequently accessed data
• Use Amazon Glacier for archiving cold
data
Use S3 analytics to optimize tiering
strategy
S3 analytics
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Which data store should I use?
Ask yourself some questions
What is the data structure?
How will the data be accessed?
What is the temperature of the data?
What will the solution cost?
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Which data store should I use?
Access patterns What to use?
Put/get (key, value) In-memory, NoSQL
Simple relationships → 1:N, M:N NoSQL
Multi-table joins, transaction, SQL SQL
Faceting, search Search
Graph traversal GraphDB
Data structure What to use?
Fixed schema SQL, NoSQL
Schema-free (JSON) NoSQL, search
Key/value In-memory, NoSQL
Graph GraphDB
How will the data be accessed?
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
In-memory
SQL
Request rate
High Low
Cost/GB
High Low
Latency
Low High
Data volume
Low High
Amazon
Glacier
Structure
NoSQL
Hot data Warm data Cold data
Low
High
S3
Search
Graph
What is the temperature of the data?
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Process/
analyze
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Process/analyze
Amazon Redshift
& Spectrum
Amazon Athena
BatchInteractive
Amazon ES
Interactive & batch analytics
Amazon Elasticsearch Service
Managed service for ElasticSearch
Amazon Redshift & Amazon Redshift Spectrum
Managed data warehouse
Spectrum enables querying S3
Amazon Athena
Serverless interactive query service
Amazon EMR
Managed Hadoop framework for running Apache Spark,
Flink, Presto, Tez, Hive, Pig, HBase, and others
Presto
Amazon
EMR
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Stream/real-time analytics
Spark Streaming on Amazon EMR
Amazon Kinesis Data Analytics
• Managed service for running SQL on
streaming data
Amazon KCL
• Amazon Kinesis Client Library
AWS Lambda
• Run code serverless (without
provisioning or managing servers)
• Services such as S3 can publish events
to AWS Lambda
• AWS Lambda can pool event from a
Kinesis
Process/analyze
KCL
Apps
AWS Lambda
Amazon Kinesis
Analytics
Stream
Streaming
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Which analytics should I use?
Process/analyze
Batch
Takes minutes to hours
Example: Daily/weekly/monthly reports
Amazon EMR (MapReduce, Hive, Pig, Spark)
Interactive
Takes seconds
Example: Self-service dashboards
Amazon Redshift, Amazon Athena, Amazon EMR (Presto, Spark)
Stream
Takes milliseconds to seconds
Example: Fraud alerts, one-minute metrics
Amazon EMR (Spark Streaming), Amazon Kinesis Data Analytics,
Amazon KCL, AWS Lambda, and others
Predictive
Takes milliseconds (real-time) to minutes (batch)
Example: Fraud detection, forecasting demand, speech
recognition
Amazon SageMaker, Amazon Polly, Amazon Rekognition, Amazon
Transcribe, Amazon Translate, Amazon EMR (Spark ML), Amazon
Deep Learning AMI (MXNet, TensorFlow, Theano, Torch, CNTK, and Caffe)
FastSlow
Amazon Redshift
& Spectrum
Amazon Athena
BatchInteractive
Amazon ES
Presto
Amazon
EMR
Predictive
AmazonML
KCL
Apps
AWS Lambda
Amazon Kinesis
Analytics
Stream
Streaming
Fast
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Which stream processing technology should I use?
Amazon EMR
(Spark
Streaming)
KCL application Amazon Kinesis
analytics
AWS Lambda
Managed service Yes No (EC2 + Auto
Scaling)
Yes Yes
Serverless No No Yes Yes
Scale / throughput No limits /
~ nodes
No limits /
~ nodes
No Limits /
automatic
No limits /
automatic
Availability Single AZ Multi-AZ Multi-AZ Multi-AZ
Programming
languages
Java, Python,
Scala
Java, others
through
MultiLangDaemon
ANSI SQL with
extensions
Node.js, Java, Python, .Net Core
Sliding window
functions
Build-in App needs to
implement
Built-in No
Reliability KCL and Spark
checkpoints
Managed by
Amazon KCL
Managed by
Amazon Kinesis
Data Analytics
Managed by AWS Lambda
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Which Analytics Tool Should I Use?
Amazon Redshift Amazon Redshift
Spectrum
Amazon Athena Amazon EMR
Presto Spark Hive
Use case Optimized for data
warehousing
Query S3 data from
Amazon Redshift
Interactive queries
over S3 data
Interactive
query
General
purpose
Batch
Scale/throughput ~Nodes ~Nodes Automatic ~ Nodes
Managed service Yes Yes Yes, Serverless Yes
Storage Local storage Amazon S3 Amazon S3 Amazon S3, HDFS
Optimization Columnar storage,
data compression,
and zone maps
AVRO, PARQUET
TEXT, SEQ
RCFILE, ORC, etc.
AVRO, PARQUET
TEXT, SEQ
RCFILE, ORC, etc.
Framework dependent
Metadata Amazon Redshift
catalog
AWS Glue catalog AWS Glue catalog AWS Glue catalog or
Hive Meta-store
Auth/access controls IAM, users, groups,
and access controls
IAM, users, groups,
and access controls
IAM IAM, LDAP, & Kerberos
UDF support Yes (Scalar) Yes (Scalar) No Yes
Slow
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Raw data stored in data lake
Preparation
N orm ali zed
P art i t i oned
C om pressed
St o rage o p t i m i zed
ELT/ETL
ELT/ETL: Preparing raw data for consumption
Data lake
on AWS
Raw immutable
DataSets
Curated
DataSets
Data catalog
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Why ELT?
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Which ELT/ETL tool should I use?
AWS Glue ETL AWS Data
Pipeline
Amazon Data
Migration Service
(Amazon DMS)
Amazon EMR Apache NiFi Partner
solution
Use case Serverless ETL Data
workflow
service
Migrate databases
(and to/from data
lake)
Customize
developed
Hadoop/Spark
ETL
Automate the
flow of data
between
systems
Rich partner
ecosystem for
ETL
Scale/throughput ~DPUs ~Nodes
through
EMR cluster
EC2 instance type ~Nodes Self managed Self manager or
through partner
Managed service Clusterless Managed Managed EC2 on
your behalf
Managed EC2
on your behalf
Self managed
on Amazon
EMR or
Marketplace
Self manager or
through partner
Data sources S3, RDBMS,
Amazon
RedShift,
DynamoDB
S3, JDBC,
DynamoDB,
custom
RDBMS, data
warehouses,
Amazon S3*
(*limited)
Various
Hadoop/Spark
managed
Various through
rich processor
framework
Various
Skills needed Wizard for
simple
mapping, code
snippets for
advanced ETL
Wizard and
code
snippets
Wizard and
drag/drop
Hadoop/Spark
coding
NiFi processors
and some
coding
Self manager or
through partner
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Consume
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Collect Store ConsumeProcess/analyzeETL
Amazon
QuickSight
Analysis&visualizationDatasceince
AI Apps
Apps
SW engineers/
business
users
Data scientists/
data engineers
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Putting it all together
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Collect Store ConsumeProcess/analyze
Apache Kafka
Amazon Kinesis
Streams
Amazon Kinesis
Firehose
Hot
SQLNoSQLCacheFileStream
Mobile apps
Web apps
Devices
Sensors
IoT platforms
AWS IoT
Data centers AWS Direct
Connect
Migration
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
RECORDS
FILES
STREAMS
Amazon
QuickSight
Analysis&visualizationDatasceince
DataTransport&LoggingIoTApplications
Amazon S3
Import/expo
rt
AI Apps
Apps
Model
Train/
Eval
Models
Deploy
ETL
FastSlow
Amazon Redshift
& Spectrum
Amazon Athena
BatchInteractive
Amazon ES
Presto
Amazon
EMR
Predictive
AmazonML
KCL
Apps
AWS Lambda
Amazon Kinesis
Analytics
Stream
Streaming
Fast
Amazon
DynamoDB
Amazon RDS
Amazon Aurora
Amazon
Neptune
Amazon ElastiCache
Amazon DAX
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Design patterns
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Interactive
&
batch
analytics
Process
Store
Batch
Interactive
Amazon EMR
Hive
Pig
Spark
Amazon
AI
Batch prediction
Real-time prediction
Amazon S3
Files
Amazon
Kinesis Data
Firehose
Amazon Kinesis
Data Analytics
Amazon Redshift
Amazon ES
Consume
Amazon EMR
Presto
Spark
Amazon Athena
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Interactive
&
batch
Amazon S3
Amazon Redshift
Amazon EMR
Presto
Hive
Pig
Spark
Amazon
ElastiCache
Amazon
DynamoDB
Amazon
RDS
Amazon
ES
AWS Lambda
Spark Streaming
on Amazon EMR
Applications
Amazon
Kinesis
App state
or
materialized
view
KCL
Amazon
AI
Real-time
Amazon
DynamoDB
Amazon
RDS
Change data capture
or export
Transactions
Stream
Files
Data lake
Amazon Kinesis
Data Analytics
Amazon Athena
Amazon Kinesis
Data Firehose
Amazon ES
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
What about metadata?
• AWS Glue catalog
• Hive Metastore compliant
• Crawlers - Detect new data, schema, partitions
• Search - Metadata discovery
• Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum
compatible
• Hive Metastore (Presto, Spark, Hive, Pig)
• Can be hosted on Amazon RDS
Data
catalog Metastore Amazon RDS
AWS Glue
catalog AWS Glue
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Summary
Build decoupled systems
• Data → Store → Process → Store → Analyze → Answers
Use the right tool for the job
• Data structure, latency, throughput, access patterns
Leverage AWS managed and serverless services
• Scalable/elastic, available, reliable, secure, no/low admin
Use log-centric design patterns
• Immutable logs, data lake, materialized views
Be cost-conscious
• Big data ≠ big cost
AI/ML enable your applications
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Contenu connexe

Tendances

Deep Dive on Amazon Elastic Block Storage (Amazon EBS) (STG310-R1) - AWS re:I...
Deep Dive on Amazon Elastic Block Storage (Amazon EBS) (STG310-R1) - AWS re:I...Deep Dive on Amazon Elastic Block Storage (Amazon EBS) (STG310-R1) - AWS re:I...
Deep Dive on Amazon Elastic Block Storage (Amazon EBS) (STG310-R1) - AWS re:I...Amazon Web Services
 
Citrix Moves Data to Amazon Redshift Fast with Matillion ETL
 Citrix Moves Data to Amazon Redshift Fast with Matillion ETL Citrix Moves Data to Amazon Redshift Fast with Matillion ETL
Citrix Moves Data to Amazon Redshift Fast with Matillion ETLAmazon Web Services
 
How TrueCar Gains Actionable Insights with Splunk Cloud PPT
How TrueCar Gains Actionable Insights with Splunk Cloud PPTHow TrueCar Gains Actionable Insights with Splunk Cloud PPT
How TrueCar Gains Actionable Insights with Splunk Cloud PPTAmazon Web Services
 
Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018
Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018
Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018Amazon Web Services
 
SRV302 Deep Dive: Hybrid Cloud Storage with AWS Storage Gateway
 SRV302 Deep Dive: Hybrid Cloud Storage with AWS Storage Gateway SRV302 Deep Dive: Hybrid Cloud Storage with AWS Storage Gateway
SRV302 Deep Dive: Hybrid Cloud Storage with AWS Storage GatewayAmazon Web Services
 
Visualize your data in Data Lake with AWS Athena and AWS Quicksight Hands-on ...
Visualize your data in Data Lake with AWS Athena and AWS Quicksight Hands-on ...Visualize your data in Data Lake with AWS Athena and AWS Quicksight Hands-on ...
Visualize your data in Data Lake with AWS Athena and AWS Quicksight Hands-on ...Amazon Web Services
 
Hybrid Cloud Storage for Recovery & Migration with AWS Storage Gateway (STG30...
Hybrid Cloud Storage for Recovery & Migration with AWS Storage Gateway (STG30...Hybrid Cloud Storage for Recovery & Migration with AWS Storage Gateway (STG30...
Hybrid Cloud Storage for Recovery & Migration with AWS Storage Gateway (STG30...Amazon Web Services
 
AWSome Day - Solutions Architecture Best Practices
AWSome Day - Solutions Architecture Best PracticesAWSome Day - Solutions Architecture Best Practices
AWSome Day - Solutions Architecture Best PracticesAmazon Web Services
 
Migrating to Amazon RDS with Database Migration Service:
Migrating to Amazon RDS with Database Migration Service:Migrating to Amazon RDS with Database Migration Service:
Migrating to Amazon RDS with Database Migration Service:Amazon Web Services
 
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018Amazon Web Services
 
How to Bring Microsoft Apps to AWS - AWS Online Tech Talks
How to Bring Microsoft Apps to AWS - AWS Online Tech TalksHow to Bring Microsoft Apps to AWS - AWS Online Tech Talks
How to Bring Microsoft Apps to AWS - AWS Online Tech TalksAmazon Web Services
 
Deep Dive on Amazon EC2 Accelerated Computing - AWS Online Tech Talks
Deep Dive on Amazon EC2 Accelerated Computing - AWS Online Tech TalksDeep Dive on Amazon EC2 Accelerated Computing - AWS Online Tech Talks
Deep Dive on Amazon EC2 Accelerated Computing - AWS Online Tech TalksAmazon Web Services
 
Building Serverless Analytics Pipelines with AWS Glue - AWS Summit Sydney 2019
Building Serverless Analytics Pipelines with AWS Glue - AWS Summit Sydney 2019Building Serverless Analytics Pipelines with AWS Glue - AWS Summit Sydney 2019
Building Serverless Analytics Pipelines with AWS Glue - AWS Summit Sydney 2019Amazon Web Services
 
Humans and Data Don't Mix- Best Practices to Secure Your Cloud
Humans and Data Don't Mix- Best Practices to Secure Your CloudHumans and Data Don't Mix- Best Practices to Secure Your Cloud
Humans and Data Don't Mix- Best Practices to Secure Your CloudAmazon Web Services
 
Adding Search to Relational Databases
Adding Search to Relational DatabasesAdding Search to Relational Databases
Adding Search to Relational DatabasesAmazon Web Services
 
Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018
Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018
Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018Amazon Web Services
 
AWS DeepLens Workshop_Build Computer Vision Applications
AWS DeepLens Workshop_Build Computer Vision Applications AWS DeepLens Workshop_Build Computer Vision Applications
AWS DeepLens Workshop_Build Computer Vision Applications Amazon Web Services
 

Tendances (20)

Deep Dive on Amazon Elastic Block Storage (Amazon EBS) (STG310-R1) - AWS re:I...
Deep Dive on Amazon Elastic Block Storage (Amazon EBS) (STG310-R1) - AWS re:I...Deep Dive on Amazon Elastic Block Storage (Amazon EBS) (STG310-R1) - AWS re:I...
Deep Dive on Amazon Elastic Block Storage (Amazon EBS) (STG310-R1) - AWS re:I...
 
Citrix Moves Data to Amazon Redshift Fast with Matillion ETL
 Citrix Moves Data to Amazon Redshift Fast with Matillion ETL Citrix Moves Data to Amazon Redshift Fast with Matillion ETL
Citrix Moves Data to Amazon Redshift Fast with Matillion ETL
 
How TrueCar Gains Actionable Insights with Splunk Cloud PPT
How TrueCar Gains Actionable Insights with Splunk Cloud PPTHow TrueCar Gains Actionable Insights with Splunk Cloud PPT
How TrueCar Gains Actionable Insights with Splunk Cloud PPT
 
Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018
Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018
Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018
 
Best of AWS re:Invent 2017
Best of AWS re:Invent 2017Best of AWS re:Invent 2017
Best of AWS re:Invent 2017
 
SRV302 Deep Dive: Hybrid Cloud Storage with AWS Storage Gateway
 SRV302 Deep Dive: Hybrid Cloud Storage with AWS Storage Gateway SRV302 Deep Dive: Hybrid Cloud Storage with AWS Storage Gateway
SRV302 Deep Dive: Hybrid Cloud Storage with AWS Storage Gateway
 
Amazon Aurora_Deep Dive
Amazon Aurora_Deep DiveAmazon Aurora_Deep Dive
Amazon Aurora_Deep Dive
 
Visualize your data in Data Lake with AWS Athena and AWS Quicksight Hands-on ...
Visualize your data in Data Lake with AWS Athena and AWS Quicksight Hands-on ...Visualize your data in Data Lake with AWS Athena and AWS Quicksight Hands-on ...
Visualize your data in Data Lake with AWS Athena and AWS Quicksight Hands-on ...
 
Hybrid Cloud Storage for Recovery & Migration with AWS Storage Gateway (STG30...
Hybrid Cloud Storage for Recovery & Migration with AWS Storage Gateway (STG30...Hybrid Cloud Storage for Recovery & Migration with AWS Storage Gateway (STG30...
Hybrid Cloud Storage for Recovery & Migration with AWS Storage Gateway (STG30...
 
AWSome Day - Solutions Architecture Best Practices
AWSome Day - Solutions Architecture Best PracticesAWSome Day - Solutions Architecture Best Practices
AWSome Day - Solutions Architecture Best Practices
 
Migrating to Amazon RDS with Database Migration Service:
Migrating to Amazon RDS with Database Migration Service:Migrating to Amazon RDS with Database Migration Service:
Migrating to Amazon RDS with Database Migration Service:
 
Cost Optimisation on AWS
Cost Optimisation on AWSCost Optimisation on AWS
Cost Optimisation on AWS
 
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018
 
How to Bring Microsoft Apps to AWS - AWS Online Tech Talks
How to Bring Microsoft Apps to AWS - AWS Online Tech TalksHow to Bring Microsoft Apps to AWS - AWS Online Tech Talks
How to Bring Microsoft Apps to AWS - AWS Online Tech Talks
 
Deep Dive on Amazon EC2 Accelerated Computing - AWS Online Tech Talks
Deep Dive on Amazon EC2 Accelerated Computing - AWS Online Tech TalksDeep Dive on Amazon EC2 Accelerated Computing - AWS Online Tech Talks
Deep Dive on Amazon EC2 Accelerated Computing - AWS Online Tech Talks
 
Building Serverless Analytics Pipelines with AWS Glue - AWS Summit Sydney 2019
Building Serverless Analytics Pipelines with AWS Glue - AWS Summit Sydney 2019Building Serverless Analytics Pipelines with AWS Glue - AWS Summit Sydney 2019
Building Serverless Analytics Pipelines with AWS Glue - AWS Summit Sydney 2019
 
Humans and Data Don't Mix- Best Practices to Secure Your Cloud
Humans and Data Don't Mix- Best Practices to Secure Your CloudHumans and Data Don't Mix- Best Practices to Secure Your Cloud
Humans and Data Don't Mix- Best Practices to Secure Your Cloud
 
Adding Search to Relational Databases
Adding Search to Relational DatabasesAdding Search to Relational Databases
Adding Search to Relational Databases
 
Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018
Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018
Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018
 
AWS DeepLens Workshop_Build Computer Vision Applications
AWS DeepLens Workshop_Build Computer Vision Applications AWS DeepLens Workshop_Build Computer Vision Applications
AWS DeepLens Workshop_Build Computer Vision Applications
 

Similaire à Builders' Day - Building Data Lakes for Analytics On AWS LC

Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Amazon Web Services
 
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018Amazon Web Services
 
Implementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfImplementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfAmazon Web Services
 
Data Warehouses & Data Lakes: Data Analytics Week at the SF Loft
Data Warehouses & Data Lakes: Data Analytics Week at the SF LoftData Warehouses & Data Lakes: Data Analytics Week at the SF Loft
Data Warehouses & Data Lakes: Data Analytics Week at the SF LoftAmazon Web Services
 
AWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAmazon Web Services
 
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...AWS Riyadh User Group
 
ABD201-Big Data Architectural Patterns and Best Practices on AWS
ABD201-Big Data Architectural Patterns and Best Practices on AWSABD201-Big Data Architectural Patterns and Best Practices on AWS
ABD201-Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...Amazon Web Services
 
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech TalksAnalyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech TalksAmazon Web Services
 
ABD312_Deep Dive Migrating Big Data Workloads to AWS
ABD312_Deep Dive Migrating Big Data Workloads to AWSABD312_Deep Dive Migrating Big Data Workloads to AWS
ABD312_Deep Dive Migrating Big Data Workloads to AWSAmazon Web Services
 
21st Century Analytics with Zopa
21st Century Analytics with Zopa21st Century Analytics with Zopa
21st Century Analytics with ZopaAmazon Web Services
 
Data Warehouses & Data Lakes: Data Analytics Week SF
Data Warehouses & Data Lakes: Data Analytics Week SFData Warehouses & Data Lakes: Data Analytics Week SF
Data Warehouses & Data Lakes: Data Analytics Week SFAmazon Web Services
 
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...Amazon Web Services
 

Similaire à Builders' Day - Building Data Lakes for Analytics On AWS LC (20)

Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
 
Data Warehouses and Data Lakes
Data Warehouses and Data LakesData Warehouses and Data Lakes
Data Warehouses and Data Lakes
 
Data Warehouses and Data Lakes
Data Warehouses and Data LakesData Warehouses and Data Lakes
Data Warehouses and Data Lakes
 
Data Warehouses and Data Lakes
Data Warehouses and Data LakesData Warehouses and Data Lakes
Data Warehouses and Data Lakes
 
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
 
Implementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfImplementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdf
 
Data Warehouses & Data Lakes: Data Analytics Week at the SF Loft
Data Warehouses & Data Lakes: Data Analytics Week at the SF LoftData Warehouses & Data Lakes: Data Analytics Week at the SF Loft
Data Warehouses & Data Lakes: Data Analytics Week at the SF Loft
 
AWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scale
 
Data Warehouses and Data Lakes
Data Warehouses and Data LakesData Warehouses and Data Lakes
Data Warehouses and Data Lakes
 
Big Data@Scale
 Big Data@Scale Big Data@Scale
Big Data@Scale
 
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
 
ABD201-Big Data Architectural Patterns and Best Practices on AWS
ABD201-Big Data Architectural Patterns and Best Practices on AWSABD201-Big Data Architectural Patterns and Best Practices on AWS
ABD201-Big Data Architectural Patterns and Best Practices on AWS
 
Implementing a Data Lake
Implementing a Data LakeImplementing a Data Lake
Implementing a Data Lake
 
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
 
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech TalksAnalyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
 
ABD312_Deep Dive Migrating Big Data Workloads to AWS
ABD312_Deep Dive Migrating Big Data Workloads to AWSABD312_Deep Dive Migrating Big Data Workloads to AWS
ABD312_Deep Dive Migrating Big Data Workloads to AWS
 
21st Century Analytics with Zopa
21st Century Analytics with Zopa21st Century Analytics with Zopa
21st Century Analytics with Zopa
 
Data Warehouses & Data Lakes: Data Analytics Week SF
Data Warehouses & Data Lakes: Data Analytics Week SFData Warehouses & Data Lakes: Data Analytics Week SF
Data Warehouses & Data Lakes: Data Analytics Week SF
 
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
 
Data Warehouses and Data Lakes
Data Warehouses and Data LakesData Warehouses and Data Lakes
Data Warehouses and Data Lakes
 

Plus de Amazon Web Services LATAM

AWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAmazon Web Services LATAM
 
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAmazon Web Services LATAM
 
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.Amazon Web Services LATAM
 
AWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAmazon Web Services LATAM
 
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAmazon Web Services LATAM
 
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.Amazon Web Services LATAM
 
Automatice el proceso de entrega con CI/CD en AWS
Automatice el proceso de entrega con CI/CD en AWSAutomatice el proceso de entrega con CI/CD en AWS
Automatice el proceso de entrega con CI/CD en AWSAmazon Web Services LATAM
 
Automatize seu processo de entrega de software com CI/CD na AWS
Automatize seu processo de entrega de software com CI/CD na AWSAutomatize seu processo de entrega de software com CI/CD na AWS
Automatize seu processo de entrega de software com CI/CD na AWSAmazon Web Services LATAM
 
Ransomware: como recuperar os seus dados na nuvem AWS
Ransomware: como recuperar os seus dados na nuvem AWSRansomware: como recuperar os seus dados na nuvem AWS
Ransomware: como recuperar os seus dados na nuvem AWSAmazon Web Services LATAM
 
Ransomware: cómo recuperar sus datos en la nube de AWS
Ransomware: cómo recuperar sus datos en la nube de AWSRansomware: cómo recuperar sus datos en la nube de AWS
Ransomware: cómo recuperar sus datos en la nube de AWSAmazon Web Services LATAM
 
Aprenda a migrar y transferir datos al usar la nube de AWS
Aprenda a migrar y transferir datos al usar la nube de AWSAprenda a migrar y transferir datos al usar la nube de AWS
Aprenda a migrar y transferir datos al usar la nube de AWSAmazon Web Services LATAM
 
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWS
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWSAprenda como migrar e transferir dados ao utilizar a nuvem da AWS
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWSAmazon Web Services LATAM
 
Cómo mover a un almacenamiento de archivos administrados
Cómo mover a un almacenamiento de archivos administradosCómo mover a un almacenamiento de archivos administrados
Cómo mover a un almacenamiento de archivos administradosAmazon Web Services LATAM
 
Os benefícios de migrar seus workloads de Big Data para a AWS
Os benefícios de migrar seus workloads de Big Data para a AWSOs benefícios de migrar seus workloads de Big Data para a AWS
Os benefícios de migrar seus workloads de Big Data para a AWSAmazon Web Services LATAM
 

Plus de Amazon Web Services LATAM (20)

AWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvem
 
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
 
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
 
AWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvem
 
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
 
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
 
Automatice el proceso de entrega con CI/CD en AWS
Automatice el proceso de entrega con CI/CD en AWSAutomatice el proceso de entrega con CI/CD en AWS
Automatice el proceso de entrega con CI/CD en AWS
 
Automatize seu processo de entrega de software com CI/CD na AWS
Automatize seu processo de entrega de software com CI/CD na AWSAutomatize seu processo de entrega de software com CI/CD na AWS
Automatize seu processo de entrega de software com CI/CD na AWS
 
Cómo empezar con Amazon EKS
Cómo empezar con Amazon EKSCómo empezar con Amazon EKS
Cómo empezar con Amazon EKS
 
Como começar com Amazon EKS
Como começar com Amazon EKSComo começar com Amazon EKS
Como começar com Amazon EKS
 
Ransomware: como recuperar os seus dados na nuvem AWS
Ransomware: como recuperar os seus dados na nuvem AWSRansomware: como recuperar os seus dados na nuvem AWS
Ransomware: como recuperar os seus dados na nuvem AWS
 
Ransomware: cómo recuperar sus datos en la nube de AWS
Ransomware: cómo recuperar sus datos en la nube de AWSRansomware: cómo recuperar sus datos en la nube de AWS
Ransomware: cómo recuperar sus datos en la nube de AWS
 
Ransomware: Estratégias de Mitigação
Ransomware: Estratégias de MitigaçãoRansomware: Estratégias de Mitigação
Ransomware: Estratégias de Mitigação
 
Ransomware: Estratégias de Mitigación
Ransomware: Estratégias de MitigaciónRansomware: Estratégias de Mitigación
Ransomware: Estratégias de Mitigación
 
Aprenda a migrar y transferir datos al usar la nube de AWS
Aprenda a migrar y transferir datos al usar la nube de AWSAprenda a migrar y transferir datos al usar la nube de AWS
Aprenda a migrar y transferir datos al usar la nube de AWS
 
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWS
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWSAprenda como migrar e transferir dados ao utilizar a nuvem da AWS
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWS
 
Cómo mover a un almacenamiento de archivos administrados
Cómo mover a un almacenamiento de archivos administradosCómo mover a un almacenamiento de archivos administrados
Cómo mover a un almacenamiento de archivos administrados
 
Simplifique su BI con AWS
Simplifique su BI con AWSSimplifique su BI con AWS
Simplifique su BI con AWS
 
Simplifique o seu BI com a AWS
Simplifique o seu BI com a AWSSimplifique o seu BI com a AWS
Simplifique o seu BI com a AWS
 
Os benefícios de migrar seus workloads de Big Data para a AWS
Os benefícios de migrar seus workloads de Big Data para a AWSOs benefícios de migrar seus workloads de Big Data para a AWS
Os benefícios de migrar seus workloads de Big Data para a AWS
 

Dernier

Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 

Dernier (20)

Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

Builders' Day - Building Data Lakes for Analytics On AWS LC

  • 1. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Luis Caro, PhD Solution Architect - AWS
  • 2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Types of big data analytics Batch/ interactive Stream processing Machine learning
  • 3. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Delivery of big data services Virtualized Managed services Serverless/ clusterless/ containerized
  • 4. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Plethora of tools
  • 5. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Big data challenges Why? How? What tools should I use? Is there a reference architecture?
  • 6. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Architectural principles Build decoupled systems • Data → Store → Process → Store → Analyze → Answers Use the right tool for the job • Data structure, latency, throughput, access patterns Leverage managed and serverless services • Scalable/elastic, available, reliable, secure, no/low admin Use event-journal design patterns • Immutable datasets (data lake), materialized views Be cost-conscious • Big data ≠ big cost Machine learning (ML) enable your applications
  • 7. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Collect Store Process/ Analyze Consume Time to answer (latency) Throughput Cost Simplify big data processing
  • 8. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Collect
  • 9. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Collect Devices Sensors IoT platforms AWS IoT STREAMS IoT EventsData streams Migration Snowball Logging Amazon CloudWatch AWS CloudTrail FILES Datatransport&logging Import/expo rt Files/objects Log files Media files Mobile apps Web apps Data centers AWS Direct Connect RECORDS Applications Transactions Data structures Database records Type of data
  • 10. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Events Files Transactions Collect Devices Sensors IoT platforms AWS IoT STREAMS IoT Data streams Migration Snowball Logging Amazon CloudWatch AWS CloudTrail FILES Datatransport&logging Import/expo rt Log files Media files Mobile apps Web apps Data centers AWS Direct Connect RECORDS Applications Data structures Database records Type of data Store NoSQL In-memory SQL File/object store Stream storage
  • 11. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Collect Devices Sensors IoT platforms AWS IoT STREAMS IoT Migration Snowball Logging Amazon CloudWatch AWS CloudTrail FILES Datatransport&logging Import/expo rt Mobile apps Web apps Data centers AWS Direct Connect RECORDS Applications Store NoSQL In-memory SQL File/object store Amazon Kinesis Firehose Amazon Kinesis Streams Apache Kafka Apache Kafka • High throughput distributed streaming platform Amazon Kinesis Data Streams • Managed stream storage Amazon Kinesis Data Firehose • Managed data delivery Stream storage
  • 12. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Which stream/message storage should I use? Amazon Kinesis Data Streams Amazon Kinesis Data Firehose Apache Kafka (on Amazon Elastic Compute Cloud [Amazon EC2]) Amazon Simple Queue Service (Amazon SQS) (Standard) Amazon SQS (FIFO) AWS managed Yes Yes No Yes Yes Guaranteed ordering Yes No Yes No Yes Delivery (deduping) At least once At least once At least/At most/exactly once At least once Exactly once Data retention period 7 days N/A Configurable 14 days 14 days Availability Three AZ Three AZ Configurable Three AZ Three AZ Scale/throughput No limit / ~ shards No limit / automatic No limit / ~ nodes No limits / automatic 300 TPS / queue Multiple consumers Yes No Yes No No Row/object size 1 MB Destination row/object size Configurable 256 KB 256 KB Cost Low Low Low (+admin) Low-medium Low-medium
  • 13. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Collect Store File Apache Kafka Amazon Kinesis Streams Amazon Kinesis Firehose Hot Stream Mobile apps Web apps Devices Sensors IoT platforms AWS IoT Data centers AWS Direct Connect Migration Snowball Logging Amazon CloudWatch AWS CloudTrail RECORDS FILES STREAMS Datatransport&loggingIoTApplications Import/expo rt NoSQL In-memory SQL Amazon S3 Amazon Simple Storage Service (Amazon S3) • Managed object storage service built to store and retrieve any amount of data File/object storage
  • 14. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Use Amazon S3 as the storage for your data lake • Natively supported by big data frameworks (Spark, Hive, Presto, and others) • Decouple storage and compute • No need to run compute clusters for storage (unlike HDFS) • Can run transient Amazon EMR clusters with Amazon EC2 Spot Instances • Multiple & heterogeneous analysis clusters and services can use the same data • Designed for 99.999999999% durability • No need to pay for data replication within a region • Secure – SSL, client/server-side encryption at rest • Low cost
  • 15. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. What about HDFS & data tiering? • Use HDFS/local for working data sets • (for example, iterative read on the same data sets) • Use Amazon S3 Standard for frequently accessed data • Use Amazon S3 Standard–IA for less frequently accessed data • Use Amazon Glacier for archiving cold data Use S3 analytics to optimize tiering strategy S3 analytics
  • 16. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Which data store should I use? Ask yourself some questions What is the data structure? How will the data be accessed? What is the temperature of the data? What will the solution cost?
  • 17. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Which data store should I use? Access patterns What to use? Put/get (key, value) In-memory, NoSQL Simple relationships → 1:N, M:N NoSQL Multi-table joins, transaction, SQL SQL Faceting, search Search Graph traversal GraphDB Data structure What to use? Fixed schema SQL, NoSQL Schema-free (JSON) NoSQL, search Key/value In-memory, NoSQL Graph GraphDB How will the data be accessed?
  • 18. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. In-memory SQL Request rate High Low Cost/GB High Low Latency Low High Data volume Low High Amazon Glacier Structure NoSQL Hot data Warm data Cold data Low High S3 Search Graph What is the temperature of the data?
  • 19. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Process/ analyze
  • 20. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Process/analyze Amazon Redshift & Spectrum Amazon Athena BatchInteractive Amazon ES Interactive & batch analytics Amazon Elasticsearch Service Managed service for ElasticSearch Amazon Redshift & Amazon Redshift Spectrum Managed data warehouse Spectrum enables querying S3 Amazon Athena Serverless interactive query service Amazon EMR Managed Hadoop framework for running Apache Spark, Flink, Presto, Tez, Hive, Pig, HBase, and others Presto Amazon EMR
  • 21. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Stream/real-time analytics Spark Streaming on Amazon EMR Amazon Kinesis Data Analytics • Managed service for running SQL on streaming data Amazon KCL • Amazon Kinesis Client Library AWS Lambda • Run code serverless (without provisioning or managing servers) • Services such as S3 can publish events to AWS Lambda • AWS Lambda can pool event from a Kinesis Process/analyze KCL Apps AWS Lambda Amazon Kinesis Analytics Stream Streaming
  • 22. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Which analytics should I use? Process/analyze Batch Takes minutes to hours Example: Daily/weekly/monthly reports Amazon EMR (MapReduce, Hive, Pig, Spark) Interactive Takes seconds Example: Self-service dashboards Amazon Redshift, Amazon Athena, Amazon EMR (Presto, Spark) Stream Takes milliseconds to seconds Example: Fraud alerts, one-minute metrics Amazon EMR (Spark Streaming), Amazon Kinesis Data Analytics, Amazon KCL, AWS Lambda, and others Predictive Takes milliseconds (real-time) to minutes (batch) Example: Fraud detection, forecasting demand, speech recognition Amazon SageMaker, Amazon Polly, Amazon Rekognition, Amazon Transcribe, Amazon Translate, Amazon EMR (Spark ML), Amazon Deep Learning AMI (MXNet, TensorFlow, Theano, Torch, CNTK, and Caffe) FastSlow Amazon Redshift & Spectrum Amazon Athena BatchInteractive Amazon ES Presto Amazon EMR Predictive AmazonML KCL Apps AWS Lambda Amazon Kinesis Analytics Stream Streaming Fast
  • 23. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Which stream processing technology should I use? Amazon EMR (Spark Streaming) KCL application Amazon Kinesis analytics AWS Lambda Managed service Yes No (EC2 + Auto Scaling) Yes Yes Serverless No No Yes Yes Scale / throughput No limits / ~ nodes No limits / ~ nodes No Limits / automatic No limits / automatic Availability Single AZ Multi-AZ Multi-AZ Multi-AZ Programming languages Java, Python, Scala Java, others through MultiLangDaemon ANSI SQL with extensions Node.js, Java, Python, .Net Core Sliding window functions Build-in App needs to implement Built-in No Reliability KCL and Spark checkpoints Managed by Amazon KCL Managed by Amazon Kinesis Data Analytics Managed by AWS Lambda
  • 24. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Which Analytics Tool Should I Use? Amazon Redshift Amazon Redshift Spectrum Amazon Athena Amazon EMR Presto Spark Hive Use case Optimized for data warehousing Query S3 data from Amazon Redshift Interactive queries over S3 data Interactive query General purpose Batch Scale/throughput ~Nodes ~Nodes Automatic ~ Nodes Managed service Yes Yes Yes, Serverless Yes Storage Local storage Amazon S3 Amazon S3 Amazon S3, HDFS Optimization Columnar storage, data compression, and zone maps AVRO, PARQUET TEXT, SEQ RCFILE, ORC, etc. AVRO, PARQUET TEXT, SEQ RCFILE, ORC, etc. Framework dependent Metadata Amazon Redshift catalog AWS Glue catalog AWS Glue catalog AWS Glue catalog or Hive Meta-store Auth/access controls IAM, users, groups, and access controls IAM, users, groups, and access controls IAM IAM, LDAP, & Kerberos UDF support Yes (Scalar) Yes (Scalar) No Yes Slow
  • 25. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Raw data stored in data lake Preparation N orm ali zed P art i t i oned C om pressed St o rage o p t i m i zed ELT/ETL ELT/ETL: Preparing raw data for consumption Data lake on AWS Raw immutable DataSets Curated DataSets Data catalog
  • 26. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Why ELT?
  • 27. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Which ELT/ETL tool should I use? AWS Glue ETL AWS Data Pipeline Amazon Data Migration Service (Amazon DMS) Amazon EMR Apache NiFi Partner solution Use case Serverless ETL Data workflow service Migrate databases (and to/from data lake) Customize developed Hadoop/Spark ETL Automate the flow of data between systems Rich partner ecosystem for ETL Scale/throughput ~DPUs ~Nodes through EMR cluster EC2 instance type ~Nodes Self managed Self manager or through partner Managed service Clusterless Managed Managed EC2 on your behalf Managed EC2 on your behalf Self managed on Amazon EMR or Marketplace Self manager or through partner Data sources S3, RDBMS, Amazon RedShift, DynamoDB S3, JDBC, DynamoDB, custom RDBMS, data warehouses, Amazon S3* (*limited) Various Hadoop/Spark managed Various through rich processor framework Various Skills needed Wizard for simple mapping, code snippets for advanced ETL Wizard and code snippets Wizard and drag/drop Hadoop/Spark coding NiFi processors and some coding Self manager or through partner
  • 28. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Consume
  • 29. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Collect Store ConsumeProcess/analyzeETL Amazon QuickSight Analysis&visualizationDatasceince AI Apps Apps SW engineers/ business users Data scientists/ data engineers
  • 30. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Putting it all together
  • 31. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Collect Store ConsumeProcess/analyze Apache Kafka Amazon Kinesis Streams Amazon Kinesis Firehose Hot SQLNoSQLCacheFileStream Mobile apps Web apps Devices Sensors IoT platforms AWS IoT Data centers AWS Direct Connect Migration Snowball Logging Amazon CloudWatch AWS CloudTrail RECORDS FILES STREAMS Amazon QuickSight Analysis&visualizationDatasceince DataTransport&LoggingIoTApplications Amazon S3 Import/expo rt AI Apps Apps Model Train/ Eval Models Deploy ETL FastSlow Amazon Redshift & Spectrum Amazon Athena BatchInteractive Amazon ES Presto Amazon EMR Predictive AmazonML KCL Apps AWS Lambda Amazon Kinesis Analytics Stream Streaming Fast Amazon DynamoDB Amazon RDS Amazon Aurora Amazon Neptune Amazon ElastiCache Amazon DAX
  • 32. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Design patterns
  • 33. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Interactive & batch analytics Process Store Batch Interactive Amazon EMR Hive Pig Spark Amazon AI Batch prediction Real-time prediction Amazon S3 Files Amazon Kinesis Data Firehose Amazon Kinesis Data Analytics Amazon Redshift Amazon ES Consume Amazon EMR Presto Spark Amazon Athena
  • 34. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Interactive & batch Amazon S3 Amazon Redshift Amazon EMR Presto Hive Pig Spark Amazon ElastiCache Amazon DynamoDB Amazon RDS Amazon ES AWS Lambda Spark Streaming on Amazon EMR Applications Amazon Kinesis App state or materialized view KCL Amazon AI Real-time Amazon DynamoDB Amazon RDS Change data capture or export Transactions Stream Files Data lake Amazon Kinesis Data Analytics Amazon Athena Amazon Kinesis Data Firehose Amazon ES
  • 35. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. What about metadata? • AWS Glue catalog • Hive Metastore compliant • Crawlers - Detect new data, schema, partitions • Search - Metadata discovery • Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum compatible • Hive Metastore (Presto, Spark, Hive, Pig) • Can be hosted on Amazon RDS Data catalog Metastore Amazon RDS AWS Glue catalog AWS Glue
  • 36. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Summary Build decoupled systems • Data → Store → Process → Store → Analyze → Answers Use the right tool for the job • Data structure, latency, throughput, access patterns Leverage AWS managed and serverless services • Scalable/elastic, available, reliable, secure, no/low admin Use log-centric design patterns • Immutable logs, data lake, materialized views Be cost-conscious • Big data ≠ big cost AI/ML enable your applications
  • 37. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.