Data Analytics on AWS

Danilo Poccia
Technical Evangelist
@danilop
Carlos Conde
Sr. Manager Technical Evangelism
@caarlco

THE MORE DATA YOU COLLECT
THE MORE VALUE YOU CAN
DERIVE FROM IT

THE COST OF DATA
GENERATION IS FALLING

GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE

Lower cost,  
higher throughput

Lower cost,  
higher throughput
Highly 
constrained

+ ELASTIC AND HIGHLY SCALABLE
+ NO UPFRONT CAPITAL EXPENSE
+ ONLY PAY FOR WHAT YOU USE 
+ AVAILABLE ON-DEMAND
= REMOVE CONSTRAINTS

AWS Import / Export 
AWS Direct Connect

Inbound data transfer is free
Multipart upload to S3
AWS Direct Connect
AWS Import / Export
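Multipart upload splits a large object into parts that can be sent in parallel and retried independently. The sketch below covers only the part-planning step. It assumes the commonly documented Amazon S3 limits (at most 10,000 parts, each at least 5 MiB except the last). `plan_parts` and its parameters are illustrative names, not part of any AWS SDK.

```python
MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB   # assumed S3 minimum for every part except the last
MAX_PARTS = 10_000        # assumed S3 maximum number of parts per upload


def plan_parts(object_size, part_size=8 * MIB):
    """Return (part_number, first_byte, last_byte) tuples covering the object."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the assumed 5 MiB minimum")
    # Grow the part size if the object would otherwise need too many parts.
    while (object_size + part_size - 1) // part_size > MAX_PARTS:
        part_size *= 2
    parts = []
    offset = 0
    number = 1  # part numbers start at 1
    while offset < object_size:
        last = min(offset + part_size, object_size) - 1
        parts.append((number, offset, last))
        offset = last + 1
        number += 1
    return parts


parts = plan_parts(20 * MIB, part_size=8 * MIB)
for number, first, last in parts:
    print(f"part {number}: bytes {first}-{last}")
```

Each tuple maps to one part of the upload, so a failed part can be re-sent without restarting the whole transfer.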

AWS Snowball
• Petabyte-scale data transport solution
• 50 TB per appliance
• 10Gbps connectivity to device
• Tamper resistant, 256-bit encryption
and Trusted Platform Module
• Low Cost
• End-to-end tracking via Amazon SNS,
text message or the AWS Console
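A back-of-the-envelope sketch of why shipping an appliance can beat the network: the time to move one appliance's worth of data (50 TB) at a sustained line rate. The 100 Mbps internet link is a hypothetical comparison point. The calculation assumes decimal units (1 TB = 10^12 bytes) and ignores protocol overhead and shipping time.

```python
def transfer_hours(terabytes, gigabits_per_second):
    """Hours needed to move `terabytes` of data at a sustained line rate."""
    bits = terabytes * 1e12 * 8
    seconds = bits / (gigabits_per_second * 1e9)
    return seconds / 3600


local_copy = transfer_hours(50, 10)    # onto the appliance at 10 Gbps
internet = transfer_hours(50, 0.1)     # hypothetical 100 Mbps uplink

print(f"10 Gbps to the appliance: {local_copy:.1f} hours")
print(f"100 Mbps over the internet: {internet / 24:.1f} days")
```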

Amazon S3, 
Amazon Glacier, 
Amazon DynamoDB, 
Amazon RDS, 
Amazon Redshift, 
AWS Storage Gateway, 
Data on Amazon EC2

AMAZON S3 
SIMPLE STORAGE SERVICE

CASE STUDY:
SPOTIFY ADDS 20,000 TRACKS/DAY TO ITS CATALOGUE

AMAZON  
DYNAMODB
HIGH-PERFORMANCE, FULLY MANAGED
NoSQL DATABASE SERVICE

DURABLE &
AVAILABLE 
CONSISTENT, DISK-ONLY  
WRITES (SSD)

LOW LATENCY 
AVERAGE READS < 5MS, 
WRITES < 10MS

CASE STUDY:
SHAZAM SUPPORTED 500,000 WRITES/SEC
DURING SUPER BOWL

AMAZON 
REDSHIFT
FULLY MANAGED, PETABYTE-SCALE
DATA WAREHOUSE ON AWS

30 MINUTES  
DOWN TO 
12 SECONDS

Amazon EC2 
Amazon Elastic
MapReduce

AMAZON EC2 
ELASTIC COMPUTE CLOUD

Instead of  
$20+ MILLION 
in infrastructure

GPU INSTANCES
G2: 1x NVIDIA Kepler GK104, 8 vCPU (Intel Xeon E5-2670), $0.65/h
CG1: 2x NVIDIA Fermi M2050, 16 vCPU (Intel Xeon X5570), $2.10/h

ON A SINGLE INSTANCE
COMPUTE TIME: 4h 
COST: 4h x $2.10 = $8.40

ON MULTIPLE INSTANCES
COMPUTE TIME: 1h 
COST: 1h x 4 x $2.10 = $8.40
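The arithmetic behind the single-instance and multi-instance comparison, as a small sketch. It assumes the job parallelizes perfectly and that each instance is billed in whole hours at the CG1 price quoted earlier. `job_cost` is an illustrative helper.

```python
import math

PRICE_PER_HOUR = 2.10  # CG1 price from the slide, in USD


def job_cost(total_compute_hours, instances, price_per_hour=PRICE_PER_HOUR):
    """Return (wall_clock_hours, cost) assuming perfect parallel speed-up."""
    # Assumption: every instance is billed for each started hour.
    wall_clock_hours = math.ceil(total_compute_hours / instances)
    cost = wall_clock_hours * instances * price_per_hour
    return wall_clock_hours, cost


single = job_cost(4, instances=1)
parallel = job_cost(4, instances=4)

print(f"1 instance:  {single[0]}h, ${single[1]:.2f}")
print(f"4 instances: {parallel[0]}h, ${parallel[1]:.2f}")
```

Under these assumptions the bill is identical, but the result arrives in a quarter of the time.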

Amazon S3, 
Amazon DynamoDB, 
Amazon RDS, 
Amazon Redshift, 
Data on Amazon EC2

The Number of 
Connected Sensors and Devices 
is Growing Exponentially

AWS IoT
Secure, bi-directional communication 
between Internet-connected things 
(such as sensors, actuators, embedded devices, 
or smart appliances) 
and the AWS cloud over MQTT and HTTP

DEVICE SDK
Set of client libraries to
connect, authenticate and
exchange messages
DEVICE GATEWAY
Communicate with devices via
MQTT and HTTP
AUTHENTICATION
AUTHORIZATION
Secure with mutual
authentication and encryption
RULES ENGINE
Transform messages
based on rules and route
to AWS Services
AWS Services
- - - - -
3P Services
DEVICE SHADOW
Persistent thing state during
intermittent connections
APPLICATIONS
AWS IoT API
DEVICE REGISTRY
Identity and Management of
your things
AWS IoT
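A device shadow lets an application record the state it wants while a device is offline. When the device reconnects, it only needs the differences. The sketch below shows that comparison. It assumes a shadow document with `desired` and `reported` sections holding flat key/value pairs. `shadow_delta` and the lamp document are illustrative, not AWS IoT SDK calls.

```python
def shadow_delta(desired, reported):
    """Return the keys whose desired value differs from the reported value."""
    return {
        key: value
        for key, value in desired.items()
        if reported.get(key) != value
    }


# Hypothetical shadow document for a connected lamp.
shadow = {
    "state": {
        "desired": {"power": "on", "brightness": 80},
        "reported": {"power": "off", "brightness": 80},
    }
}

delta = shadow_delta(shadow["state"]["desired"], shadow["state"]["reported"])
print(delta)  # {'power': 'on'}
```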

C-SDK
(Ideal for 
embedded OS)
JS-SDK
(Ideal for Embedded
Linux Platforms)
Arduino Library
(Arduino Yun)
Mobile SDK
(Android and iOS)
AWS IoT

Amazon S3, 
Amazon DynamoDB, 
Amazon RDS, 
Amazon Redshift, 
Data on Amazon EC2
Amazon EC2 
Amazon Elastic
MapReduce
Amazon S3, 
Amazon Glacier, 
Amazon DynamoDB, 
Amazon RDS, 
Amazon Redshift, 
AWS Storage Gateway, 
Data on Amazon EC2
AWS Import / Export 
AWS Direct Connect

GENERATE ➔ ➔ SHARE
STREAM
PROCESSING

GENERATE ➔ ➔ SHARE
STREAM
PROCESSING
Amazon S3, 
Amazon DynamoDB, 
Amazon RDS, 
Amazon Redshift, 
Data on Amazon EC2
Amazon Kinesis 
Stream Processing
on Amazon EC2

FROM DATA TO 
ACTIONABLE
INFORMATION

RAW DATA
BUSINESS
INTELLIGENCE
RAW
INFORMATION
DATA
PREDICTIONS

Data Analytics
Value > Costs Storage and
Analysis Costs 
are Going Down
Making 
New Use Cases
Possible

Structured
Vs
Unstructured
Data
High Degree 
of Organization
Data Model
Free Text
Multimedia
Social Media

Structured
Semi-structured
Unstructured
Data
XML
JSON

Batch
Vs
Real-time
Data
Fixed
Dataset
Updated in 
Discrete Moments
Continuous 
Stream of Data

Batch
Report
Real-time
Alerts
Prediction
Forecast

?
Unstructured 
Data
Structured 
Data

Unstructured 
Data
Structured 
Data
Resilient Distributed Datasets (RDDs)
Memory
Fast Processing
Large Quantity of Data
Disk
Hadoop
MapReduce
Spark
?

Amazon 
Elastic MapReduce 
(Amazon EMR)
Unstructured 
Data
Structured 
Data

Amazon 
(Amazon EMR)
Structured 
Data
Unstructured 
Data
Structured 
Data

Amazon 
Elastic MapReduce 
(Amazon EMR)
Managed clusters
for Hadoop, Spark, Presto 
or any other applications 
in the Apache / Hadoop stack
What is 
Amazon EMR?

Amazon 
(Amazon EMR)
Overview of 
Amazon EMR 
Architecture
Storage: HDFS, EMRFS, Local File System
Cluster Resource Management: YARN, Agent, …
Data Processing Frameworks: Hadoop, Spark, …
Applications and Programs: Hive, Pig, …

Amazon 
(Amazon EMR)
Overview of 
Amazon EMR 
Architecture
Master 
Instance 
Group
Core 
Instance 
Group
Task 
Instance 
Group
EC2 Spot
Instances

Amazon 
(Amazon EMR)
Hadoop NextGen
MapReduce (YARN)
https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html

Amazon 
(Amazon EMR)
Spark Cluster Mode
(RDDs)
http://spark.apache.org/docs/latest/cluster-overview.html

Separate Compute and Storage
Resize and shut down 
Amazon EMR clusters with no data loss
Point multiple Amazon EMR clusters 
at the same data in Amazon S3
Easily evolve your analytic infrastructure 
as technology evolves
Leverage 
Amazon S3 with  
EMR File System
(EMRFS)
S3 Bucket
Cluster
EMR Cluster
Cluster
EMR Cluster
Amazon 
(Amazon EMR)

Read-after-write consistency 
Very fast list operations 
(thanks to Amazon DynamoDB)
Transparent to applications as s3://…
S3 Bucket
Cluster
EMR Cluster
DynamoDB Table
Amazon 
(Amazon EMR)
EMRFS 
makes it easier 
to use Amazon S3

CREATE EXTERNAL TABLE serde_regex(
  host STRING,
  referer STRING,
  agent STRING)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
-- SerDe properties (such as the input regex) are not shown on the slide
LOCATION 'some/path/input/';
S3 Bucket
Cluster
EMR Cluster
DynamoDB Table
Amazon 
(Amazon EMR)
Going 
from HDFS 
…

S3 Bucket
Cluster
EMR Cluster
DynamoDB Table
Amazon 
(Amazon EMR)
Going 
from HDFS 
to Amazon S3
CREATE EXTERNAL TABLE serde_regex(
  host STRING,
  referer STRING,
  agent STRING)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
-- SerDe properties (such as the input regex) are not shown on the slide
LOCATION 's3://bucket/path/input/';

Amazon 
(Amazon EMR)
Consistent view and
fast listing using  
the optional EMRFS
metadata layer
List and read-after-write consistency
Faster list operations
Number of objects | Without consistent view | With consistent view
1,000,000 | 147.72 | 29.70
100,000 | 12.70 | 3.69
Tested using a single-node cluster with an m3.xlarge instance

Amazon 
(Amazon EMR)
EMRFS client-side
encryption
S3 Bucket
Cluster
EMR Cluster
Cluster
EMR Cluster
AWS KMS or your custom key vendor
Amazon S3 encryption clients
EMRFS enabled for 
Amazon S3 client-side encryption

Iterative workloads
If you’re processing 
the same dataset more than once,
consider using Spark & RDDs for this too
Disk I/O intensive workloads
Persist data on Amazon S3 and use S3DistCp to
copy to/from HDFS for processing
HDFS is still there 
if you need it
Amazon 
(Amazon EMR)

Use S3 as your persistent data store 
Query it using Presto, Hive, Spark, etc.
Use Amazon EC2 Spot Instances to save >80%
Use Amazon EC2 Reserved Instances 
for steady workloads
Use Amazon CloudWatch alarms to notify you 
if a cluster is underutilized, then shut it down: 
e.g. 0 mappers running for >N hours
Cost saving tips for
Amazon EMR
Amazon 
(Amazon EMR)
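The underutilization rule above ("0 mappers running for >N hours") reduces to a check over recent metric samples. The sketch below runs that check on a plain list of hourly values. In practice a CloudWatch alarm would evaluate the metric for you. The sample data and the `is_idle` helper are made up for illustration.

```python
def is_idle(mappers_running, idle_hours):
    """True if the last `idle_hours` hourly samples all report zero mappers."""
    if idle_hours <= 0 or len(mappers_running) < idle_hours:
        return False  # not enough history to decide
    return all(count == 0 for count in mappers_running[-idle_hours:])


# Hypothetical hourly samples of running mappers, oldest first.
hourly_samples = [12, 7, 0, 0, 0]

print(is_idle(hourly_samples, idle_hours=3))  # True
print(is_idle(hourly_samples, idle_hours=4))  # False
```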

Resize your cluster, 
or create clusters when needed 
and only pay for compute when you need it
Intelligent Scale Down
(including YARN / HDFS)
Cost saving tips for
Amazon EMR
Amazon 
(Amazon EMR)

Amazon S3 is your Data Lake
S3 Bucket
Cluster
Hive, Pig
Cluster
Presto
Cluster
Spark
Cluster
Ad Hoc
Cluster
Cascading
Logical Separation of Jobs

A managed service that makes it easy 
to deploy, operate, and scale Elasticsearch 
in the AWS Cloud
High availability, patch management, failure detection 
and node replacement, backups, and monitoring
Integrated with Logstash and Kibana
Scale up and scale down your cluster to deliver optimum
performance as data and usage patterns change, paying
only for the resources you actually consume
Control access to the Elasticsearch APIs using AWS Identity
and Access Management (IAM) policies
What is 
Amazon ES?
Amazon 
Elasticsearch Service 
(Amazon ES)

Amazon ES 
Architecture
Amazon 
Elasticsearch Service 
(Amazon ES)
Elasticsearch
Kibana
Amazon 
CloudWatch
AWS 
CloudTrail
Elastic 
Load Balancing
Amazon 
Route 53
Elasticsearch 
APIs
AWS Credentials 
(AWS IAM)

?
Structured 
Data
Information

Amazon 
Redshift
Structured 
Data
Information

Relational Data Warehouse
a lot faster
a lot simpler
a lot cheaper
 
Massively parallel + Petabyte scale 
Fully managed 
HDD and SSD Platforms 
$1,000/TB/Year; starts at $0.25/hour
What is 
Amazon Redshift?
Amazon 
Redshift

Amazon Redshift
Architecture
Amazon 
Redshift
Compute 
Node
Compute 
Node
Compute 
Node
Leader 
Node
SQL Clients / BI Tools
Amazon S3 / Amazon DynamoDB / SSH
10GbE
Ingestion/Backup
JDBC / ODBC

Dramatically less I/O
Column storage
Data compression
Zone maps
Direct-attached storage
Large data block sizes
Amazon Redshift
Performance
Amazon 
Redshift
analyze compression listing;
Table | Column | Encoding
---------+----------------+----------
listing | listid | delta
listing | sellerid | delta32k
listing | eventid | delta32k
listing | dateid | bytedict
listing | numtickets | bytedict
listing | priceperticket | delta32k
listing | totalprice | mostly32
listing | listtime | raw
10 | 13 | 14 | 26 | … | 100 | 245 | 324 (min 10, max 324)
375 | 393 | 417 | … | 512 | 549 | 623 (min 375, max 623)
637 | 712 | 809 | … | 834 | 921 | 959 (min 637, max 959)
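The `delta` encoding suggested for `listid` in the ANALYZE COMPRESSION output stores differences between consecutive values rather than the values themselves. The sketch below shows that general idea on the first sorted values from the slide. It is not Redshift's actual on-disk format, which the slides do not describe.

```python
def delta_encode(values):
    """Keep the first value, then store each value as a difference from the previous one."""
    if not values:
        return []
    encoded = [values[0]]
    for previous, current in zip(values, values[1:]):
        encoded.append(current - previous)
    return encoded


def delta_decode(encoded):
    """Rebuild the original values by accumulating the differences."""
    decoded = []
    running = 0
    for index, item in enumerate(encoded):
        running = item if index == 0 else running + item
        decoded.append(running)
    return decoded


column = [10, 13, 14, 26]
encoded = delta_encode(column)
print(encoded)                # [10, 3, 1, 12]
print(delta_decode(encoded))  # [10, 13, 14, 26]
```

Small differences fit in fewer bits than the full values, which is where the compression comes from on sorted or slowly changing columns.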

Sort Keys 
and 
Zone Maps
Amazon 
Redshift
SELECT COUNT(*) FROM LOGS WHERE DATE = '09-JUNE-2013'
Unsorted:
MIN: 01-JUNE-2013, MAX: 20-JUNE-2013
MIN: 08-JUNE-2013, MAX: 30-JUNE-2013
MIN: 12-JUNE-2013, MAX: 20-JUNE-2013
MIN: 02-JUNE-2013, MAX: 25-JUNE-2013
Sorted by Date:
MIN: 01-JUNE-2013, MAX: 06-JUNE-2013
MIN: 07-JUNE-2013, MAX: 12-JUNE-2013
MIN: 13-JUNE-2013, MAX: 18-JUNE-2013
MIN: 19-JUNE-2013, MAX: 24-JUNE-2013
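A zone map records the minimum and maximum value in each block, so a query can skip any block whose range cannot contain the value it is looking for. The sketch below applies that test to the eight MIN/MAX ranges from the slide for the 09-JUNE-2013 query. `blocks_to_scan` is an illustrative helper, not a Redshift API.

```python
from datetime import date


def blocks_to_scan(zone_maps, target):
    """Return the indexes of blocks whose [min, max] range may contain target."""
    return [
        index
        for index, (low, high) in enumerate(zone_maps)
        if low <= target <= high
    ]


def june(day):
    return date(2013, 6, day)


unsorted_blocks = [
    (june(1), june(20)),
    (june(8), june(30)),
    (june(12), june(20)),
    (june(2), june(25)),
]
sorted_blocks = [
    (june(1), june(6)),
    (june(7), june(12)),
    (june(13), june(18)),
    (june(19), june(24)),
]

target = june(9)
print(len(blocks_to_scan(unsorted_blocks, target)))  # 3
print(len(blocks_to_scan(sorted_blocks, target)))    # 1
```

With the table sorted by date, only one of the four blocks has to be read. Unsorted, three of the four overlap the target date.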

Parallel
and
Distributed
Amazon 
Redshift
Compute 
Node
Compute 
Node
Compute 
Node
Leader 
Node
Query
Load / Export / Backup / Restore

Parallel
and
Distributed
Amazon 
Redshift
Compute 
Node
Compute 
Node
Compute 
Node
Leader 
Node
Compute 
Node
Query
Load / Export / Backup / Restore
Resize

Load encrypted from S3
SSL to secure data in transit
ECDHE perfect forward secrecy
Amazon VPC for network isolation
Encryption to secure data at rest
All blocks on disks & in Amazon S3 encrypted
Block key, Cluster key, Master key (AES-256)
On-premises HSM & AWS CloudHSM support
Audit logging and AWS CloudTrail integration
SOC 1/2/3, PCI-DSS, FedRAMP, BAA
Amazon Redshift
Security
Amazon 
Redshift

Amazon Redshift
Innovation
Amazon 
Redshift
Service Launch (2/14)
PDX (4/2)
Temp Credentials (4/11)
DUB (4/25)
SOC1/2/3 (5/8)
Unload Encrypted Files
NRT (6/5)
JDBC Fetch Size (6/27)
Unload logs (7/5)
SHA1 Builtin (7/15)
4 byte UTF-8 (7/18)
Sharing snapshots (7/18)
Statement Timeout (7/22)
Timezone, Epoch, Autoformat (7/25)
WLM Timeout/Wildcards (8/1)
CRC32 Builtin, CSV, Restore Progress
(8/9)
Resource Level IAM (8/9)
PCI (8/22)
UTF-8 Substitution (8/29)
JSON, Regex, Cursors (9/10)
Split_part, Audit tables (10/3)
SIN/SYD (10/8)
HSM Support (11/11)
Kinesis EMR/HDFS/SSH copy, Distributed
Tables, Audit Logging/CloudTrail,
Concurrency, Resize Perf., Approximate
Count Distinct, SNS Alerts, Cross Region
Backup (11/13)
Distributed Tables, Single Node Cursor
Support, Maximum Connections to 500
(12/13)
EIP Support for VPC Clusters (12/28)
New query monitoring system tables and
diststyle all (1/13)
Redshift on DW2 (SSD) Nodes (1/23)
Compression for COPY from SSH, Fetch
size support for single node clusters, new
system tables with commit stats,
row_number(), strtol() and query
termination (2/13)
Resize progress indicator & Cluster
Version (3/21)
Regex_Substr, COPY from JSON (3/25)
50 slots, COPY from EMR, ECDHE
ciphers (4/22)
3 new regex features, Unload to single
file, FedRAMP(5/6)
Rename Cluster (6/2)
Copy from multiple regions,
percentile_cont, percentile_disc (6/30)
Free Trial (7/1)
pg_last_unload_count (9/15)
AES-128 S3 encryption (9/29)
UTF-16 support (9/29)
Well over 100 new features added since launch
Release every two weeks
Automatic patching

Amazon Redshift
Features
Amazon 
Redshift
Approximate functions
User-defined functions
Machine Learning
Data Science
Amazon ML

Amazon Redshift
Ecosystem
Amazon 
Redshift
Data Integration
Systems Integrators
Business Intelligence

?
Data Stream
Real-time 
Information

Amazon 
Kinesis
Data Stream
Real-time 
Information

A Platform for Streaming Data on AWS
What is 
Amazon Kinesis?
Amazon 
Kinesis
Amazon 
Kinesis 
Streams
Amazon 
Kinesis 
Firehose
Amazon 
Kinesis 
Analytics

Amazon 
Kinesis 
Streams
Amazon 
Kinesis
Build your own custom applications 
that process or analyze streaming data

Amazon 
Kinesis 
Streams
Amazon 
Kinesis
Use the Kinesis Client Library (KCL) 
to consume data from Kinesis Streams

Amazon 
Kinesis 
Streams
Amazon 
Kinesis
AWS Lambda 
Functions
Use AWS Lambda for a serverless architecture
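With a Kinesis stream as the event source, a Lambda function receives batches of records whose payloads arrive base64-encoded. The handler sketch below assumes producers write JSON payloads. The sample event is hand-built for a local run and contains only the fields the handler reads.

```python
import base64
import json


def handler(event, context):
    """Decode each Kinesis record in the batch and return the parsed payloads."""
    payloads = []
    for record in event["Records"]:
        raw = base64.b64decode(record["kinesis"]["data"])
        payloads.append(json.loads(raw))  # assumption: payloads are JSON
    return payloads


# Hand-built event for a local run (hypothetical sensor reading).
sample = {"sensor": "t1", "value": 21.5}
event = {
    "Records": [
        {
            "kinesis": {
                "partitionKey": "t1",
                "data": base64.b64encode(
                    json.dumps(sample).encode("utf-8")
                ).decode("ascii"),
            }
        }
    ]
}

result = handler(event, None)
print(result)
```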

Amazon 
Kinesis 
Streams
Amazon 
Kinesis
Low latency I/O
Configurable retention period from 1 to 7 days
The maximum size of a data blob is 1 MB
Each shard can support:
• Reads: up to 5 transactions / second and up to 2 MB / second
• Writes: up to 1,000 records / second and up to 1 MB / second
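The per-shard limits above translate directly into a sizing estimate. The sketch below takes the largest shard count implied by the write record rate, the write bandwidth, and the aggregate read bandwidth. It treats 1 MB as 1,024 KB, assumes every consumer reads the full stream, and ignores the 5 read transactions / second limit. `shards_needed` is an illustrative helper.

```python
import math

WRITE_RECORDS_PER_SHARD = 1000  # records / second, from the slide
WRITE_MB_PER_SHARD = 1.0        # MB / second, from the slide
READ_MB_PER_SHARD = 2.0         # MB / second, from the slide


def shards_needed(records_per_second, avg_record_kb, consumers=1):
    """Estimate the shard count from the per-shard throughput limits."""
    write_mb = records_per_second * avg_record_kb / 1024
    by_records = math.ceil(records_per_second / WRITE_RECORDS_PER_SHARD)
    by_write = math.ceil(write_mb / WRITE_MB_PER_SHARD)
    # Assumption: each consumer application reads the whole stream.
    by_read = math.ceil(write_mb * consumers / READ_MB_PER_SHARD)
    return max(1, by_records, by_write, by_read)


print(shards_needed(3000, 2))               # 6 (write bandwidth is the limit)
print(shards_needed(3000, 2, consumers=3))  # 9 (read bandwidth is the limit)
```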

Amazon 
Kinesis 
Firehose
Amazon 
Kinesis
Easily load massive volumes 
of streaming data into AWS

Amazon 
Kinesis 
Analytics
Amazon 
Kinesis
Easily analyze streaming data with standard SQL
(Coming Soon)

Amazon 
Machine Learning 
(Amazon ML)
Data Model

Machine learning is the technology that automatically
finds patterns in your data and uses them to make
predictions for new data points as they become
available
Your Data + Machine Learning 
= Smart Applications
What is 
Machine Learning?
Amazon 
Machine Learning 
(Amazon ML)

UNDERSTAND YOUR
CUSTOMER
Who is my customer really?
What does he really like?
What is happening with my products?
Where do people consume my product?

THREE TYPES OF DATA-DRIVEN ANALYSIS
Retrospective
analysis and
reporting
Here-and-now
real-time processing
and dashboards
Predictions
to enable smart
applications

MACHINE LEARNING
Technology that automatically finds patterns in
your data and uses them to make predictions
for new data points

MACHINE LEARNING 
IS EVERYWHERE

AMAZON
MACHINE LEARNING
Scalable & managed machine learning service

BEST PRACTICES
&
LESSONS LEARNED

BEST PRACTICES
USE ALL  
AVAILABLE DATA 
Your company has more data on
your users than you think…

Quiz
What percentage of data  
do firms use for analytics?
A: 12% B: 34% C: 52% D: 68%

BEST PRACTICES
ENRICH DATA BASED 
ON SOCIAL NETWORKS 
Users’ friends are valuable 
sources of information

75% of users select movies based
on recommendations

HOMOPHILY
People are friends with people like them.

BEST PRACTICES
ENRICH DATA BASED 
ON USER ENVIRONMENT 
Users’ context heavily  
influences their behavior.

Geo-location data
Device information
Time of day and week
Metadata from third parties
…

BEST PRACTICES
ENRICH DATA BASED 
ON USER BEHAVIOR 
The way users interact with the 
UI gives valuable information

BEST PRACTICES
THE MORE, THE BETTER 
(SOMETIMES) 
More data usually gives you  
better prediction accuracy.

BEST PRACTICES
MAKE NO PREMATURE
ASSUMPTIONS 
We all have preconceived ideas about who
our users are… Most often, they are wrong.

BEST PRACTICES
GET CLOSER  
TO YOUR USERS 
Your customers live in the real world. 
Use IoT to bring your services closer.

Easy. Buy a copper tube ice maker kit

…And a T-connector to tap the cold water supply line…

…And a hacksaw to cut the copper pipe…

…Finally, a special drill bit to make a hole in
the kitchen floor for the copper tubing.

“It is not the strongest species that
survive, nor the most intelligent, but the
ones most responsive to change.”

Data 
Orchestration & Visualization

Data Orchestration can be a Task by Itself
S3 Bucket
Cluster
EMR Cluster
DynamoDB Table
Redshift DB
RDS Instance
S3 Bucket
On
Premises

Helps you reliably process and move data between
different AWS compute and storage services, as well
as on-premises data sources, at specified intervals
What is AWS 
Data Pipeline?
AWS 
Data Pipeline

Access your data where it’s stored, transform and
process it at scale, and efficiently transfer the results
to other AWS services
What is AWS 
Data Pipeline?
AWS 
Data Pipeline

Create complex data processing workloads that are
fault tolerant, repeatable, and highly available
What is AWS 
Data Pipeline?
AWS 
Data Pipeline

Helps you migrate databases to AWS easily and
securely: the source database remains fully
operational during the migration, minimizing
downtime to applications that rely on the database
What is 
AWS Database 
Migration Service?
AWS Database 
Migration Service
Customer
Premises
Application Users
AWS
Internet
VPN
AWS
Database Migration
Service

Migrate off Oracle and SQL Server
Move your tables, views, stored procedures and DML
to MySQL, MariaDB, and Amazon Aurora
AWS Schema
Conversion Tool
AWS Database 
Migration Service

Know exactly where manual edits are needed
AWS Schema
Conversion Tool
AWS Database 
Migration Service

AWS Marketplace
Structured 
Data
Visual

https://aws.amazon.com/marketplace

Amazon 
QuickSight
Structured 
Data
Visual

A very fast, cloud-powered business intelligence (BI)
service that makes it easy to build visualizations,
perform ad-hoc analysis, and quickly get business
insights from your data
What is Amazon
QuickSight?
Amazon 
QuickSight

First analysis
in about
60 seconds
Amazon 
QuickSight
Business user
Sign-in

Amazon 
QuickSight
Architecture
Amazon 
QuickSight Business User
QuickSight API
Data Prep, Metadata, Suggestions, Connectors, SPICE
Business User
QuickSight UI
Mobile Devices Web Browsers
Partner BI products
Amazon S3,
Amazon Kinesis,
Amazon DynamoDB,
Amazon EMR,
Amazon Redshift,
Amazon RDS,
Files,
Third-party

Dynamically Optimized Graphics

Get Answers 
Fast
Amazon 
QuickSight
Amazon QuickSight uses SPICE – a Super-fast,
Parallel, In-memory optimized Calculation Engine
built from the ground up to generate answers on
large datasets

Use AWS Partner 
BI Solutions
with Amazon
QuickSight
Amazon 
QuickSight
Amazon QuickSight provides partners 
a simple SQL-like interface to query the data stored
in SPICE, so that customers can continue using their
existing BI tools while benefiting from the faster
performance delivered by SPICE

Tell a Story 
with Your Data
Share insights 
and collaborate 
with others
Amazon 
QuickSight
Securely share your analysis with others in your
organization by building interactive stories for
collaboration using the storyboard and annotations.
Recipients can further explore the data and respond
back with their insights and knowledge, making the
whole organization efficient and effective.

Let’s Put Everything Together

Unstructured 
Data S3 Bucket
(unstructured)

Unstructured 
Data S3 Bucket
(unstructured)
Cluster
EMR Cluster
S3 Bucket
(structured)
Reporting
Apps

Unstructured 
Data S3 Bucket
(unstructured)
Cluster
EMR Cluster
S3 Bucket
(structured)
Elasticsearch 
Cluster
Reporting
Apps

Unstructured 
Data
Structured 
Data
S3 Bucket
(unstructured)
Cluster
EMR Cluster
S3 Bucket
(structured)
Elasticsearch 
Cluster
Reporting
Apps

Unstructured 
Data
Structured 
Data
S3 Bucket
(unstructured)
Cluster
EMR Cluster
DynamoDB Table
S3 Bucket
(structured)
Redshift DB
Elasticsearch 
Cluster
Reporting
Apps

Unstructured 
Data
Structured 
Data
Data 
Stream
S3 Bucket
(unstructured)
Cluster
EMR Cluster
DynamoDB Table
S3 Bucket
(structured)
Redshift DB
Elasticsearch 
Cluster
Amazon 
Kinesis 
Streams
KCL
AWS Lambda 
Functions
Reporting
Apps

Unstructured 
Data
Structured 
Data
Data 
Stream
S3 Bucket
(unstructured)
Cluster
EMR Cluster
DynamoDB Table
S3 Bucket
(structured)
Redshift DB
Elasticsearch 
Cluster
Amazon 
Kinesis 
Streams
Amazon 
Kinesis 
Firehose
KCL
AWS Lambda 
Functions
Reporting
Apps

Unstructured 
Data
Structured 
Data
Data 
Stream
S3 Bucket
(unstructured)
Cluster
EMR Cluster
DynamoDB Table
S3 Bucket
(structured)
Redshift DB
Elasticsearch 
Cluster
Amazon 
Kinesis 
Streams
Amazon 
Kinesis 
Firehose Amazon 
Kinesis 
Analytics
KCL
AWS Lambda 
Functions
Reporting
Apps

Unstructured 
Data
Structured 
Data
Data 
Stream
S3 Bucket
(unstructured)
Cluster
EMR Cluster
DynamoDB Table
S3 Bucket
(structured)
Redshift DB
Elasticsearch 
Cluster
Amazon 
Kinesis 
Streams
Amazon 
Kinesis 
Firehose Amazon 
Kinesis 
Analytics
KCL
AWS Lambda 
Functions
Reporting
Apps
Amazon ML

Unstructured 
Data
Structured 
Data
Data 
Stream
S3 Bucket
(unstructured)
Cluster
EMR Cluster
DynamoDB Table
S3 Bucket
(structured)
Redshift DB
Elasticsearch 
Cluster
Amazon 
Kinesis 
Streams
Amazon 
Kinesis 
Firehose Amazon 
Kinesis 
Analytics
KCL
AWS Lambda 
Functions
Reporting
Apps
Amazon ML
Amazon 
QuickSight
AWS Data 
Pipeline

Collect Store Analyze
AWS Direct 
Connect
AWS 
Import/Export 
Disk
AWS 
Import/Export 
Snowball
Amazon 
Kinesis 
Streams
Amazon VPC 
VPN Connection
AWS Database 
Migration Service
AWS 
Data Pipeline
Amazon 
Kinesis 
Firehose
Amazon 
Kinesis 
Analytics
AWS Storage 
Gateway
Amazon S3
Amazon 
Glacier
Amazon RDS
Amazon 
Redshift
Amazon 
Elasticsearch 
Service
Amazon 
DynamoDB
Amazon EMR
Amazon EC2
Amazon EC2
Container Service
Amazon ML
Amazon 
QuickSight

Start Simple
Amazon S3 + Amazon EMR
or
Amazon S3 + Amazon Redshift

Data Analytics on AWS
