9. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE
AWS Import / Export
AWS Direct Connect
10. Inbound data transfer is free
Multipart upload to S3
AWS Direct Connect
AWS Import / Export
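Multipart upload sends one large object to S3 as independently uploaded parts. The sketch below only plans the parts. It assumes the S3 limits as AWS documents them: a 5 MiB minimum for every part except the last, a 5 GiB maximum part size, at most 10,000 parts, and a 5 TiB maximum object size. The 8 MiB preferred part size is an arbitrary choice. To perform the upload you would hand each (offset, length) range to an SDK, for example boto3's `create_multipart_upload`, `upload_part` and `complete_multipart_upload`, or let boto3's `upload_file` split the file automatically.

```python
import math

# Amazon S3 multipart upload limits as documented by AWS (check the current docs):
MIN_PART = 5 * 1024**2      # 5 MiB minimum for every part except the last
MAX_PART = 5 * 1024**3      # 5 GiB maximum part size
MAX_PARTS = 10_000          # at most 10,000 parts per upload
MAX_OBJECT = 5 * 1024**4    # 5 TiB maximum object size

def plan_parts(object_size: int, preferred_part: int = 8 * 1024**2) -> list[tuple[int, int]]:
    """Split an object into (offset, length) parts for a multipart upload.

    Uses `preferred_part` (8 MiB by default, an arbitrary choice) unless the
    object is so large that more than 10,000 parts would be needed, in which
    case the part size grows just enough to fit.
    """
    if not 0 < object_size <= MAX_OBJECT:
        raise ValueError("object size is outside the range S3 accepts")
    part = max(preferred_part, MIN_PART, math.ceil(object_size / MAX_PARTS))
    if part > MAX_PART:
        raise ValueError("preferred part size exceeds the 5 GiB limit")
    return [(off, min(part, object_size - off)) for off in range(0, object_size, part)]

# A 20 MiB file becomes two 8 MiB parts and a final 4 MiB part.
print(plan_parts(20 * 1024**2))
```

Because each part is uploaded separately, a failed part can be retried on its own without restarting the whole transfer.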
11. AWS Snowball
• Petabyte-scale data transport solution
• 50 TB per appliance
• 10 Gbps connectivity to the device
• Tamper-resistant, with 256-bit encryption and a Trusted Platform Module
• Low cost
• End-to-end tracking via Amazon SNS, text message or the AWS Console
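The back-of-the-envelope arithmetic behind shipping appliances instead of using the network can be sketched as follows. It assumes decimal units (1 TB = 10^12 bytes, 1 Gbps = 10^9 bit/s), a link running at its full nominal rate with no protocol overhead, and that the full 50 TB nominal capacity is usable (usable space may be lower). The 1 Gbps Internet link in the comparison is a hypothetical example, not a figure from the slides.

```python
import math

def transfer_hours(terabytes: float, gbps: float) -> float:
    """Hours needed to move `terabytes` (decimal TB) over a `gbps` link.

    Assumes the link runs at its full nominal rate with no protocol overhead,
    so real transfers will take longer.
    """
    bits = terabytes * 1e12 * 8
    seconds = bits / (gbps * 1e9)
    return seconds / 3600

def appliances_needed(total_tb: float, per_appliance_tb: float = 50) -> int:
    """Appliances needed for `total_tb`, assuming the full nominal capacity is usable."""
    return math.ceil(total_tb / per_appliance_tb)

# Filling one 50 TB appliance at 10 Gbps: about 11.1 hours.
print(f"{transfer_hours(50, 10):.1f} h per appliance")
# One petabyte (1,000 TB) needs 20 appliances.
print(appliances_needed(1000), "appliances per PB")
# For comparison, a hypothetical 1 Gbps Internet link moving 1 PB:
print(f"{transfer_hours(1000, 1) / 24:.0f} days over 1 Gbps")
```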
13. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE
Amazon S3,
Amazon Glacier,
Amazon DynamoDB,
Amazon RDS,
Amazon Redshift,
AWS Storage Gateway,
Data on Amazon EC2
53. AWS IoT
Secure, bi-directional communication
between Internet-connected things
(such as sensors, actuators, embedded devices,
or smart appliances)
and the AWS cloud over MQTT and HTTP
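Devices publish MQTT messages to topic names such as `sensors/kitchen/temperature`, and subscribers (and, as far as we know, the FROM clause of AWS IoT rules) select messages with topic filters that may contain wildcards. The sketch below implements the wildcard rules from the MQTT 3.1.1 specification: `+` matches exactly one level, `#` (only valid as the last level) matches the parent level and any number of child levels, and filters starting with a wildcard do not match topics beginning with `$`. The topic names are hypothetical, and this is not AWS IoT's own matcher.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic name matches a topic filter (MQTT 3.1.1 rules)."""
    # Filters that start with a wildcard must not match topics beginning with '$'.
    if topic.startswith("$") and topic_filter[:1] in ("+", "#"):
        return False
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    for i, f in enumerate(f_levels):
        if f == "#":
            # '#' is only valid as the final level of a filter.
            return i == len(f_levels) - 1
        if i >= len(t_levels):
            return False
        # '+' accepts any single level; anything else must match exactly.
        if f != "+" and f != t_levels[i]:
            return False
    # Without a trailing '#', the topic must not have extra levels.
    return len(f_levels) == len(t_levels)

print(topic_matches("sensors/+/temperature", "sensors/kitchen/temperature"))  # True
print(topic_matches("sensors/#", "sensors"))                                  # True
print(topic_matches("sensors/+", "sensors/kitchen/temperature"))              # False
```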
54. AWS IoT
• DEVICE SDK: set of client libraries to connect, authenticate and exchange messages
• DEVICE GATEWAY: communicate with devices via MQTT and HTTP
• AUTHENTICATION / AUTHORIZATION: secure with mutual authentication and encryption
• RULES ENGINE: transform messages based on rules and route them to AWS services or third-party (3P) services
• DEVICE SHADOW: persistent thing state during intermittent connections
• DEVICE REGISTRY: identity and management of your things
• APPLICATIONS: access AWS IoT through the AWS IoT API
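A device shadow keeps a "desired" state (what applications want) and a "reported" state (what the device last said), so an offline device can catch up when it reconnects. The sketch below is a simplified model of the shadow's delta, meaning the desired fields that differ from the reported ones. It assumes nested objects are compared key by key and all other values (numbers, strings, lists) are compared whole. It does not model details of the real service such as deleting a field by setting it to null, version numbers, or metadata. The lamp attributes are hypothetical.

```python
from typing import Any

def shadow_delta(desired: dict[str, Any], reported: dict[str, Any]) -> dict[str, Any]:
    """Return the desired-state fields the device has not yet reported.

    Simplified model: nested dicts are compared recursively; any other value
    is treated as a single unit and included whole when it differs.
    """
    delta: dict[str, Any] = {}
    for key, want in desired.items():
        have = reported.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            sub = shadow_delta(want, have)
            if sub:
                delta[key] = sub
        elif want != have:
            delta[key] = want
    return delta

# An app asked a (hypothetical) lamp to turn on and change colour while it was offline.
desired = {"power": "on", "color": {"r": 255, "g": 0}}
reported = {"power": "off", "color": {"r": 255, "g": 10}}
print(shadow_delta(desired, reported))  # {'power': 'on', 'color': {'g': 0}}
```

On reconnecting, the device would apply the delta and then report its new state, after which the delta becomes empty.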
74. What is Amazon EMR?
Amazon Elastic MapReduce (Amazon EMR): managed clusters for Hadoop, Spark, Presto or any other application in the Apache / Hadoop stack
75. Overview of Amazon EMR Architecture
• Storage: HDFS, EMRFS, local file system
• Cluster resource management: YARN, Agent …
• Data processing frameworks: Hadoop, Spark …
• Applications and programs: Hive, Pig …
79. Separate Compute and Storage
• Resize and shut down Amazon EMR clusters with no data loss
• Point multiple Amazon EMR clusters at the same data in Amazon S3
• Easily evolve your analytic infrastructure as technology evolves
Leverage Amazon S3 with the EMR File System (EMRFS)
[Figure: several EMR clusters reading the same S3 bucket]
80. EMRFS makes it easier to use Amazon S3
• Read-after-write consistency
• Very fast list operations (thanks to Amazon DynamoDB)
• Transparent to applications as s3://…
[Figure: an EMR cluster accessing an S3 bucket, with EMRFS metadata kept in a DynamoDB table]
83. Consistent view and fast listing using the optional EMRFS metadata layer
• List and read-after-write consistency
• Faster list operations

Number of objects   Without consistent view   With consistent view
1,000,000           147.72                    29.70
100,000             12.70                     3.69

Tested using a single-node cluster with an m3.xlarge instance
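The speedup implied by the table is the ratio of the two columns. The slide does not state the unit of the measurements, but the ratio does not depend on it as long as both columns use the same one. The sketch below simply recomputes those ratios.

```python
# Listing times from the table above, keyed by number of objects:
# (without consistent view, with consistent view), in the slide's own units.
TIMINGS = {
    1_000_000: (147.72, 29.70),
    100_000: (12.70, 3.69),
}

def speedups(timings: dict[int, tuple[float, float]]) -> dict[int, float]:
    """Ratio of listing time without the metadata layer to listing time with it."""
    return {n: without / with_cv for n, (without, with_cv) in timings.items()}

for n, ratio in speedups(TIMINGS).items():
    print(f"{n:,} objects: {ratio:.1f}x faster with consistent view")
```

In this test, the benefit grows with the number of objects: roughly 3.4x at 100,000 objects and roughly 5.0x at 1,000,000.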
85. HDFS is still there if you need it
Iterative workloads
• If you're processing the same dataset more than once, consider using Spark & RDDs for this too
Disk I/O intensive workloads
• Persist data on Amazon S3 and use S3DistCp to copy to/from HDFS for processing
86. Cost saving tips for Amazon EMR
• Use S3 as your persistent data store; query it using Presto, Hive, Spark, etc.
• Use Amazon EC2 Spot Instances to save over 80% compared with On-Demand prices
• Use Amazon EC2 Reserved Instances for steady workloads
• Use Amazon CloudWatch alarms to notify you if a cluster is underutilized, then shut it down: e.g. 0 mappers running for >N hours
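The "0 mappers running for >N hours" rule can be sketched as a check over periodic metric readings. In practice you would configure a CloudWatch alarm on a cluster metric (EMR publishes metrics such as `IsIdle`); the `idle_for` helper below is hypothetical and only shows the logic. It assumes readings arrive as (time in hours, running mappers) pairs sorted by time, and it conservatively measures idleness from the first zero reading in the trailing run of zeros.

```python
from typing import Sequence

def idle_for(samples: Sequence[tuple[float, int]], n_hours: float) -> bool:
    """Return True if the running-mapper count has been 0 for at least n_hours.

    `samples` are (time_in_hours, running_mappers) pairs sorted by time.
    """
    if not samples:
        return False
    latest = samples[-1][0]
    idle_since = None
    # Walk backwards through the readings to find where the trailing zeros start.
    for t, mappers in reversed(samples):
        if mappers != 0:
            break
        idle_since = t
    return idle_since is not None and latest - idle_since >= n_hours

# Busy at hour 0, then no mappers for the next three hourly readings.
print(idle_for([(0.0, 4), (1.0, 0), (2.0, 0), (3.0, 0)], 2))  # True
```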
87. Cost saving tips for Amazon EMR
• Resize your cluster, or create clusters when needed, and only pay for compute when you need it
• Intelligent scale-down (including YARN / HDFS)
88. Amazon S3 is your Data Lake
Logical separation of jobs: separate clusters for Hive/Pig, Presto, Spark, ad hoc queries and Cascading all work on the same S3 bucket
91. What is Amazon ES?
Amazon Elasticsearch Service (Amazon ES): a managed service that makes it easy to deploy, operate, and scale Elasticsearch in the AWS Cloud
• High availability, patch management, failure detection and node replacement, backups, and monitoring
• Integrated with Logstash and Kibana
• Scale your cluster up and down to deliver optimum performance as data and usage patterns change, paying only for the resources you actually consume
• Control access to the Elasticsearch APIs using AWS Identity and Access Management (IAM) policies
97. What is Amazon Redshift?
A relational data warehouse: a lot faster, a lot simpler, a lot cheaper
• Massively parallel + petabyte scale
• Fully managed
• HDD and SSD platforms
• $1,000/TB/year; starts at $0.25/hour
103. Amazon Redshift Security
• Load encrypted data from S3
• SSL to secure data in transit, with ECDHE for perfect forward secrecy
• Amazon VPC for network isolation
• Encryption to secure data at rest: all blocks on disk & in Amazon S3 are encrypted
• Block key, cluster key, master key (AES-256)
• On-premises HSM & AWS CloudHSM support
• Audit logging and AWS CloudTrail integration
• SOC 1/2/3, PCI-DSS, FedRAMP, BAA
104. Amazon Redshift Innovation
• Service Launch (2/14)
• PDX (4/2)
• Temp Credentials (4/11)
• DUB (4/25)
• SOC1/2/3 (5/8)
• Unload Encrypted Files
• NRT (6/5)
• JDBC Fetch Size (6/27)
• Unload logs (7/5)
• SHA1 Builtin (7/15)
• 4-byte UTF-8 (7/18)
• Sharing snapshots (7/18)
• Statement Timeout (7/22)
• Timezone, Epoch, Autoformat (7/25)
• WLM Timeout/Wildcards (8/1)
• CRC32 Builtin, CSV, Restore Progress (8/9)
• Resource Level IAM (8/9)
• PCI (8/22)
• UTF-8 Substitution (8/29)
• JSON, Regex, Cursors (9/10)
• Split_part, Audit tables (10/3)
• SIN/SYD (10/8)
• HSM Support (11/11)
• Kinesis EMR/HDFS/SSH copy, Distributed Tables, Audit Logging/CloudTrail, Concurrency, Resize Perf., Approximate Count Distinct, SNS Alerts, Cross Region Backup (11/13)
• Distributed Tables, Single Node Cursor Support, Maximum Connections to 500 (12/13)
• EIP Support for VPC Clusters (12/28)
• New query monitoring system tables and diststyle all (1/13)
• Redshift on DW2 (SSD) Nodes (1/23)
• Compression for COPY from SSH, Fetch size support for single node clusters, new system tables with commit stats, row_number(), strtol() and query termination (2/13)
• Resize progress indicator & Cluster Version (3/21)
• Regex_Substr, COPY from JSON (3/25)
• 50 slots, COPY from EMR, ECDHE ciphers (4/22)
• 3 new regex features, Unload to single file, FedRAMP (5/6)
• Rename Cluster (6/2)
• Copy from multiple regions, percentile_cont, percentile_disc (6/30)
• Free Trial (7/1)
• pg_last_unload_count (9/15)
• AES-128 S3 encryption (9/29)
• UTF-16 support (9/29)
Well over 100 new features added since launch
Release every two weeks
Automatic patching
112. What is Amazon Kinesis?
A platform for streaming data on AWS
• Amazon Kinesis Streams
• Amazon Kinesis Firehose
• Amazon Kinesis Analytics
116. Amazon Kinesis Streams
• Low-latency I/O
• Configurable retention period from 1 to 7 days
• Maximum data blob size: 1 MB
• Each shard can support:
  - reads: up to 5 transactions/second and up to 2 MB/second
  - writes: up to 1,000 records/second and up to 1 MB/second
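The per-shard limits above translate directly into a rough shard-count estimate: take the largest of the counts required by write records, write bandwidth, and read bandwidth. The sketch assumes every consumer application reads the full stream (so read bandwidth scales with the number of consumers), treats 1 MB as 1,024 KB, and ignores the 5 read transactions/second limit and any burst headroom. `shards_needed` is a hypothetical helper, not part of any AWS SDK.

```python
import math

# Per-shard limits quoted on the slide above.
WRITE_RECORDS_PER_SEC = 1_000
WRITE_MB_PER_SEC = 1.0
READ_MB_PER_SEC = 2.0
MAX_RECORD_MB = 1.0

def shards_needed(records_per_sec: float, avg_record_kb: float, consumers: int = 1) -> int:
    """Estimate how many shards a stream needs for a given steady load."""
    if avg_record_kb / 1024 > MAX_RECORD_MB:
        # If even the average record exceeds 1 MB, payloads must be split first.
        raise ValueError("records larger than 1 MB must be split before sending")
    write_mb = records_per_sec * avg_record_kb / 1024
    read_mb = write_mb * consumers  # each consumer reads every record
    return max(
        1,
        math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC),
        math.ceil(write_mb / WRITE_MB_PER_SEC),
        math.ceil(read_mb / READ_MB_PER_SEC),
    )

# 3,000 records/s of 1 KB each, read by 3 applications: reads are the bottleneck.
print(shards_needed(3000, 1, consumers=3))  # 5
```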
125. What is Machine Learning?
Machine learning is the technology that automatically finds patterns in your data and uses them to make predictions for new data points as they become available.
Your Data + Machine Learning = Smart Applications
Amazon Machine Learning (Amazon ML)
132. UNDERSTAND YOUR CUSTOMER
Who is my customer really?
What do they really like?
What is happening with my products?
Where do people consume my product?
133. THREE TYPES OF DATA-DRIVEN ANALYSIS
• Retrospective analysis and reporting
• Here-and-now real-time processing and dashboards
• Predictions to enable smart applications
134. MACHINE LEARNING
Technology that automatically finds patterns in
your data and uses them to make predictions
for new data points
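"Find patterns, then predict for new data points" can be illustrated with the smallest possible model: a straight line fitted by ordinary least squares. This is only an illustration of the idea, not how Amazon ML trains its models. The data points are made up so that the pattern is y = 2x + 1.

```python
from statistics import mean

def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Learn slope and intercept of y = slope * x + intercept by least squares.

    Requires at least two distinct x values (otherwise the variance is zero).
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    slope = cov / var
    return slope, my - slope * mx

def predict(model: tuple[float, float], x: float) -> float:
    """Apply the learned pattern to a new data point."""
    slope, intercept = model
    return slope * x + intercept

# "Historical" observations (made-up numbers) follow the pattern y = 2x + 1.
model = fit_line([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
print(model)                # (2.0, 1.0)
print(predict(model, 4.0))  # 9.0 for a new, unseen data point
```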
175. Data Orchestration can be a Task by Itself
[Figure: data moving between on-premises sources, S3 buckets, an EMR cluster, a DynamoDB table, a Redshift database and an RDS instance]
176. What is AWS Data Pipeline?
AWS Data Pipeline helps you reliably process and move data between different AWS compute and storage services, as well as on-premises data sources, at specified intervals.
177. Access your data where it’s stored, transform and
process it at scale, and efficiently transfer the results
to other AWS services
178. Create complex data processing workloads that are
fault tolerant, repeatable, and highly available
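The two ideas on these slides, running steps in dependency order and retrying failures so the workload is fault tolerant and repeatable, can be sketched in a few lines. This is a conceptual illustration using Python's `graphlib` (3.9+), not the AWS Data Pipeline API, which is configured through pipeline definitions rather than code like this. The step names and the flaky step are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable

def run_pipeline(steps: dict[str, Callable[[], None]],
                 depends_on: dict[str, set[str]],
                 max_attempts: int = 3) -> list[str]:
    """Run steps in dependency order, retrying each failed step up to max_attempts.

    Returns the order in which steps completed; raises RuntimeError if a step
    still fails on its final attempt.
    """
    completed = []
    for name in TopologicalSorter(depends_on).static_order():
        for attempt in range(1, max_attempts + 1):
            try:
                steps[name]()
                break
            except Exception as exc:
                if attempt == max_attempts:
                    raise RuntimeError(f"step {name!r} failed after {attempt} attempts") from exc
        completed.append(name)
    return completed

# Hypothetical pipeline: copy from S3, transform on EMR, load into Redshift.
calls = {"emr_transform": 0}

def flaky_transform() -> None:
    calls["emr_transform"] += 1
    if calls["emr_transform"] == 1:
        raise OSError("transient failure")  # fails once, succeeds on retry

steps = {
    "copy_from_s3": lambda: None,
    "emr_transform": flaky_transform,
    "load_redshift": lambda: None,
}
deps = {"emr_transform": {"copy_from_s3"}, "load_redshift": {"emr_transform"}}
order = run_pipeline(steps, deps)
print(order)  # ['copy_from_s3', 'emr_transform', 'load_redshift']
```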
180. What is AWS Database Migration Service?
AWS Database Migration Service helps you migrate databases to AWS easily and securely: the source database remains fully operational during the migration, minimizing downtime for applications that rely on it.
[Figure: application users and the source database on customer premises, connected over the Internet or a VPN to AWS Database Migration Service in AWS]
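The source can stay online because the data is copied in bulk and then the changes made while the copy was running are replayed on the target; DMS offers this as full load plus ongoing replication (change data capture, CDC). The sketch below models only that idea with in-memory dictionaries. It is not the DMS API, and the change-event format `(operation, primary_key, row)` is an assumption made for illustration.

```python
from typing import Any, Optional

# Hypothetical change event: (operation, primary_key, row); row is None for deletes.
Change = tuple[str, Any, Optional[dict]]

def migrate(source_snapshot: dict[Any, dict], changes: list[Change]) -> dict[Any, dict]:
    """Build a target table from a full-load snapshot plus captured changes."""
    # Full load: bulk-copy every row (copied, so the source is left untouched).
    target = {pk: dict(row) for pk, row in source_snapshot.items()}
    # CDC replay: apply changes made on the source during the copy, in order.
    for op, pk, row in changes:
        if op == "delete":
            target.pop(pk, None)
        elif op in ("insert", "update"):
            target[pk] = dict(row)
        else:
            raise ValueError(f"unknown operation {op!r}")
    return target

snapshot = {1: {"name": "Ana"}, 2: {"name": "Bo"}}
changes = [("update", 2, {"name": "Bob"}), ("insert", 3, {"name": "Cy"}), ("delete", 1, None)]
result = migrate(snapshot, changes)
print(result)  # {2: {'name': 'Bob'}, 3: {'name': 'Cy'}}
```

Once the replayed changes have caught up, applications can be switched to the target with only a brief interruption.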
181. Migrate off Oracle and SQL Server
Move your tables, views, stored procedures and DML to MySQL, MariaDB, and Amazon Aurora
AWS Schema Conversion Tool and AWS Database Migration Service
182. Know exactly where manual edits are needed
AWS Schema Conversion Tool and AWS Database Migration Service
188. What is Amazon QuickSight?
Amazon QuickSight is a very fast, cloud-powered business intelligence (BI) service that makes it easy to build visualizations, perform ad-hoc analysis, and quickly get business insights from your data.
190. Amazon QuickSight Architecture
[Figure: business users work through the QuickSight UI on web browsers and mobile devices; partner BI products use the QuickSight API. Components shown: data prep, metadata, suggestions, connectors and SPICE. Data sources: Amazon S3, Amazon Kinesis, Amazon DynamoDB, Amazon EMR, Amazon Redshift, Amazon RDS, files and third-party sources.]
196. Use AWS Partner BI Solutions with Amazon QuickSight
Amazon QuickSight provides partners a simple SQL-like interface to query the data stored in SPICE, so that customers can continue using their existing BI tools while benefiting from the faster performance delivered by SPICE.
197. Tell a Story with Your Data
Share insights and collaborate with others
Securely share your analysis with others in your organization by building interactive stories for collaboration, using the storyboard and annotations. Recipients can further explore the data and respond with their own insights and knowledge, making the whole organization more efficient and effective.