Journey Through the Cloud
ianmas@amazon.com
@IanMmmm
Ian Massingham — Technical Evangelist
Data Analysis
Journey Through the Cloud
Learn from the journeys taken by other AWS customers
Discover best practices that you can use to bootstrap your projects
Common use cases and adoption models for the AWS Cloud
Data Analysis
Collect and store Big Data in the AWS Cloud

Meet the challenge of the increasing volume, variety, and velocity of data

Reduce costs, scale to meet demand & increase the speed of innovation

Make use of solutions for every stage of the big data lifecycle
Agenda
Why Build Big Data Applications on AWS?
Collecting Big Data in the AWS Cloud

Real-time Streaming and Analysis

Big Data Cloud Storage Solutions
AWS Database Services 

Analytics with Hadoop with Amazon EMR

Case Studies & Useful Resources
WHY BUILD BIG DATA
APPLICATIONS ON AWS?
It’s Never Been Easier And Less Expensive To 

Collect, Store, Analyze & Share Data
We are constantly producing more data
From all types of industries
From a diverse range of sources
Sources of Truth, Analysis Platforms, High Performance Databases
AWS Services For Big Data Workloads
Amazon S3
Amazon EFS
Amazon Redshift
Amazon DynamoDB
Amazon Aurora
Amazon EMR
Real time
Amazon Kinesis
Broad Analytics Usage In The AWS Cloud
Discovery Development Delivery
Risk Marketing Reporting Trade
Sales
WHEN OUR ANALYSTS
FIRST STARTED TO DO
QUERIES ON AMAZON
REDSHIFT, THEY THOUGHT
IT WAS BROKEN BECAUSE IT
WAS WORKING SO FAST.
John O’Donovan, CTO, Financial Times
• Needed a way to increase speed, performance and flexibility
of data analysis at a low cost
• Using AWS enabled FT to run queries 98% faster than
previously—helping FT make business decisions quickly
• Easier to track and analyze trends
• Reduced infrastructure costs by 80% over traditional data
center model
Financial Times Uses AWS to Reduce Infrastructure
Costs by 80%
Find out more here: aws.amazon.com/solutions/case-studies/financial-times/
[Diagram: big data lifecycle stages GENERATE, COLLECT, STREAM, STORE (RDBMS, DATA WAREHOUSE, NOSQL), ANALYTICS, and ARCHIVE]
COLLECTING BIG DATA
IN THE AWS CLOUD
Amazon S3 Multipart upload
AWS Import/Export
AWS Direct Connect
AWS Storage Gateway
Amazon S3
Secure, durable, highly-scalable object storage

Accessible via a simple web services interface
Store & retrieve any amount of data
Use alone or together with other AWS services
Amazon S3 Masterclass webinar: https://youtu.be/VC0k-noNwOU
Amazon S3 Multipart Upload
Large file (Size < 5TB) ➤ Split file into parts ➤ Send parts to S3 ➤ S3 rejoins the parts ➤ Large object (Size < 5TB)
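The split, send, and rejoin steps above can be sketched locally. This is a minimal illustration in plain Python, not the S3 API: the function names and the 4,096-byte part size are hypothetical, chosen only to keep the example small.

```python
from typing import Iterator, List, Tuple


def split_into_parts(data: bytes, part_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (part_number, chunk) pairs; part numbers start at 1."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    for offset in range(0, len(data), part_size):
        yield offset // part_size + 1, data[offset:offset + part_size]


def rejoin_parts(parts: List[Tuple[int, bytes]]) -> bytes:
    """Reassemble the chunks in part-number order, whatever order they arrived in."""
    ordered = sorted(parts, key=lambda part: part[0])
    return b"".join(chunk for _, chunk in ordered)


# 10,240 bytes standing in for a large file (hypothetical data).
payload = bytes(range(256)) * 40
parts = list(split_into_parts(payload, part_size=4096))
parts.reverse()  # simulate parts finishing out of order
restored = rejoin_parts(parts)
```

Because each part carries its own number, parts can be sent in parallel and retried individually, and the object is only assembled once every part has arrived.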
AWS Import/Export
Move large amounts of data into and out of the AWS
cloud using portable storage devices
Transfer your data directly onto and off of storage
devices using Amazon’s high-speed internal network
For significant data sets, AWS Import/Export is often
faster than Internet transfer and more cost effective
than upgrading your connectivity
Supports upload & download from S3 & upload to
Amazon EBS snapshots & Amazon Glacier Vaults
aws.amazon.com/importexport/
When to Use AWS Import/Export
aws.amazon.com/importexport/
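A rough way to decide between shipping devices and transferring over the Internet is to estimate the transfer time for your link. The sketch below is back-of-the-envelope arithmetic only; the 80% utilization figure and the example link speeds are assumptions, not AWS guidance.

```python
def transfer_days(data_tb: float, link_mbps: float, utilization: float = 0.8) -> float:
    """Estimate the days needed to move data_tb terabytes over a link.

    Uses decimal units (1 TB = 10**12 bytes, 1 Mbps = 10**6 bits per second).
    utilization is the assumed fraction of the link available for the transfer.
    """
    if link_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("link_mbps must be positive and utilization in (0, 1]")
    total_bits = data_tb * 1e12 * 8
    usable_bits_per_second = link_mbps * 1e6 * utilization
    return total_bits / usable_bits_per_second / 86_400


# Compare a few hypothetical link speeds for a 10 TB data set.
for mbps in (10, 100, 1000):
    print(f"{mbps:>5} Mbps: {transfer_days(10, mbps):8.1f} days for 10 TB")
```

If the estimate runs to weeks on the connectivity you have, a portable storage device is likely to be the faster route.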
AWS Direct Connect
Makes it easy to establish a dedicated network
connection from your premises to AWS
Establish private connectivity between AWS & your
datacenter, office, or colocation environment
Reduce your network costs, increase bandwidth
throughput, and provide a more consistent network
experience
The dedicated connection can be partitioned into
multiple virtual interfaces using 802.1q VLANs
aws.amazon.com/directconnect
AWS Direct Connect Locations & Partners
aws.amazon.com/directconnect/partners/
1Gbps and 10Gbps ports are
available from AWS
50Mbps, 100Mbps, 200Mbps,
300Mbps, 400Mbps, and
500Mbps can be ordered from
any APN partner supporting
AWS Direct Connect
AWS Storage Gateway
An on-premises software appliance connecting with
cloud-based storage
Supports industry-standard storage protocols that
work with your existing applications and workflows
Provides low-latency performance by maintaining
frequently accessed data on-premises while securely
storing all of your data encrypted in Amazon S3 or
Amazon Glacier
aws.amazon.com/storagegateway/
AWS Storage Gateway
Designed for use with other AWS services
Enables you to easily mirror data from your on-premises
environment for access within the AWS
Cloud
Easy to integrate into existing ETL workflows
aws.amazon.com/storagegateway/
REAL-TIME STREAMING
AND ANALYSIS
Amazon Kinesis
Amazon Kinesis
A fully managed, cloud-based service for real-time
data processing over large, distributed data streams
Continuously capture and store terabytes of data per
hour from hundreds of thousands of sources
Emit data to other AWS services such as Amazon
S3, Amazon Redshift, and Amazon Elastic MapReduce
(Amazon EMR)
aws.amazon.com/kinesis
As a startup, using AWS
has allowed us to scale nicely
and use resources without
spending a lot of capital.
Brian Langel, CTO, Dash
• Needed to scale IT resources to create an app that would offer
real-time information to drivers
• Developed and deployed the Dash application on the AWS
Cloud
• Streams more than 1 TB of real-time data per day using
Amazon Kinesis and processes billions of entries using
Amazon DynamoDB
• Scaled up to support large spikes in app usage (several
thousand updates per second)
• Reduced operating costs by $200,000 per year
Using AWS, Dash Streams More Than 1 TB of Real-
Time Data Per Day
Find out more here: aws.amazon.com/solutions/case-studies/dash/
Amazon Kinesis Architecture
Millions of sources producing 100s of TB per hour
Front end: authentication and authorization
Durable, consistent replicas across three AWS Availability Zones in an Amazon Web Services Region
Ordered stream of events supporting multiple readers
Aggregate and archive to S3
Real-time dashboards and alarms
Machine learning algorithms
Aggregate analysis in Hadoop or a data warehouse
Inexpensive: $0.0165 per million PUT Payload Units (in EU Ireland)
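To get a feel for the PUT price quoted on the architecture slide, the arithmetic can be sketched as below. The price is the slide's EU (Ireland) figure; the 25 KB size of a PUT Payload Unit and the example record rate are assumptions to check against the current Amazon Kinesis pricing page.

```python
import math


def put_payload_units(record_kb: float, unit_kb: float = 25.0) -> int:
    """Number of PUT Payload Units for one record (assumed 25 KB per unit)."""
    return max(1, math.ceil(record_kb / unit_kb))


def monthly_put_cost(records_per_second: float, record_kb: float,
                     price_per_million: float = 0.0165, days: int = 30) -> float:
    """Estimated PUT charge for a steady record rate over the given number of days."""
    records = records_per_second * 86_400 * days
    units = records * put_payload_units(record_kb)
    return units / 1_000_000 * price_per_million


# Hypothetical workload: 1,000 records per second of 1 KB each.
print(f"${monthly_put_cost(1000, 1):.2f} per 30 days in PUT charges")
```

This covers PUT charges only; any per-shard or other charges are outside the scope of the sketch.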
BIG DATA CLOUD
STORAGE SOLUTIONS
Amazon S3
Amazon Glacier

Amazon EBS
Amazon S3
Allows you to decouple
compute from storage
for analytics workloads
Amazon S3 Masterclass webinar: https://youtu.be/VC0k-noNwOU
Amazon Glacier
Durable
Designed for 99.999999999%
durability of archives
Cost Effective
Write-once, read-never. Cost effective for long
term storage. Pay for accessing data
aws.amazon.com/glacier
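The durability figure is easier to interpret as an expected loss rate. The sketch below assumes the percentage is an average annual figure per stored archive and that losses are independent; the archive count is a hypothetical example.

```python
from decimal import Decimal


def expected_annual_losses(num_objects: int,
                           durability_percent: str = "99.999999999") -> Decimal:
    """Expected number of objects lost per year at the given design durability."""
    loss_probability = (Decimal(100) - Decimal(durability_percent)) / Decimal(100)
    return loss_probability * num_objects


# Hypothetical estate of ten million archives.
losses = expected_annual_losses(10_000_000)
years_per_loss = 1 / losses
print(f"Expected losses per year: {losses}")
print(f"Roughly one archive lost every {years_per_loss} years")
```

Decimal is used instead of float so that the eleven nines are not blurred by binary rounding.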
Amazon Elastic Block Store (EBS)
Persistent block level storage volumes
For use with Amazon EC2 instances
Automatically replicated within Availability Zones
Offer consistent and low-latency performance
EC2 Instance ➤ EBS Volume ➤ EBS Snapshot (stored on S3)
aws.amazon.com/ebs
Amazon EBS (Elastic Block Store)
Very fast
Block devices to attach to EC2 instances
1GB to 16TB volumes
Up to 20,000 IOPS per volume with EBS PIOPS

Amazon S3 (Simple Storage Service)
Fast
API-accessible object storage
Highly scalable object store
Objects from 1 byte to 5TB
99.999999999% durability

Amazon Glacier
3-5 hour access latency
Intended for write-once, read-never use cases
Long-term archive storage
Extremely low cost per GB
99.999999999% durability
AWS DATABASE SERVICES
Amazon RDS
Amazon Redshift
Amazon DynamoDB
Amazon Relational Database Service (RDS)
Easy to set up, operate, and scale a relational database
Provides cost-efficient and resizable capacity
Manages time-consuming database management tasks
aws.amazon.com/rds/
Amazon Redshift
A fast, fully managed, petabyte-scale data warehouse
Cost-effectively & efficiently analyze all your data
Use existing Business Intelligence tools
Fast query performance using columnar storage technology
aws.amazon.com/redshift/
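Why columnar storage helps analytic queries can be shown with a toy example. This is a plain-Python illustration of the general idea, not Amazon Redshift's storage format; the table and its values are made up.

```python
from typing import Any, Dict, List

# A tiny row-oriented table (hypothetical sales data).
rows: List[Dict[str, Any]] = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 45.5},
    {"order_id": 4, "region": "APAC", "amount": 200.0},
]


def to_columns(table: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot a list of row dicts into one list of values per column."""
    columns: Dict[str, List[Any]] = {}
    for row in table:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# SUM(amount): a row store has to read whole rows, a column store one column.
row_total = sum(row["amount"] for row in rows)
col_total = sum(columns["amount"])
values_read_row_store = len(rows) * len(rows[0])
values_read_column_store = len(columns["amount"])
```

Both layouts give the same answer, but the column layout reads 4 values instead of 12 here, and the gap widens as tables gain columns.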
Getting Started with Amazon Redshift
aws.amazon.com/redshift/getting-started/
2 Month Free Trial
6 Step Getting Started Tutorial
Best Practices Guides
— loading data, table design & performance tuning
Cluster Management Guide
BI & ETL Tools for
Amazon Redshift
aws.amazon.com/redshift/partners/
Amazon DynamoDB
A fast and flexible NoSQL database service
Consistent, single-digit millisecond latency at any scale
A fully managed cloud database
Supports both document and key-value store models
Flexible data model and reliable performance
aws.amazon.com/dynamodb/
ANALYTICS WITH
HADOOP & AMAZON EMR
Amazon EMR
AMAZON ELASTIC
MAPREDUCE

A MANAGED HADOOP FRAMEWORK
HADOOP

DISTRIBUTED FILESYSTEM
(HDFS)
+
DISTRIBUTED PROCESSING ENGINE
(MAPREDUCE)
Amazon Elastic MapReduce (EMR)
A managed Hadoop framework
Quickly & cost-effectively process vast amounts of data
Dynamically scale across fleets of Amazon EC2 instances
Run other popular distributed frameworks such as Spark
aws.amazon.com/emr/
Amazon Elastic MapReduce (EMR)
Splits data in pieces using the HDFS filesystem
Manages distributed access to data and task execution
Gathers the results and deposits these in S3 for access
Very large clickstream logging data (e.g. TBs), containing lots of actions by John Smith
➤ Split the log into many small pieces
➤ Process in an EMR cluster
➤ Aggregate the results from all the nodes
➤ What John Smith did
Insight in a fraction of the time
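The split, process, and aggregate flow above can be sketched in a few lines of Python. This runs in a single process and only mirrors the shape of a MapReduce job; the log lines, user names, and function names are hypothetical, and a real EMR job would distribute the map step across cluster nodes.

```python
from collections import Counter
from typing import Iterable, List

# A tiny stand-in for a very large clickstream log (hypothetical data).
LOG_LINES = [
    "john.smith view /home",
    "jane.doe view /home",
    "john.smith click /buy",
    "john.smith view /cart",
    "jane.doe click /help",
    "john.smith click /checkout",
    "sam.lee view /home",
]


def split_log(lines: List[str], pieces: int) -> List[List[str]]:
    """Split the log into roughly equal pieces, one per worker."""
    return [lines[i::pieces] for i in range(pieces)]


def map_piece(piece: Iterable[str]) -> Counter:
    """Map step: count actions per user within one piece of the log."""
    return Counter(line.split()[0] for line in piece)


def reduce_counts(partials: Iterable[Counter]) -> Counter:
    """Reduce step: merge the per-piece counts into one result."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


pieces = split_log(LOG_LINES, pieces=3)
actions_per_user = reduce_counts(map_piece(piece) for piece in pieces)
```

Each piece is processed independently, which is what lets the work spread across many nodes before the results are merged.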
[Diagram: Amazon EMR with analytics languages/engines, data management, and data sources, shown alongside Amazon Redshift, AWS Data Pipeline, Amazon Kinesis, Amazon S3, Amazon DynamoDB, and Amazon RDS]
DEMO:
ANALYZING AMAZON S3 ACCESS
LOGS WITH EMR AND HUE
PREDICTIVE ANALYTICS WITH
AMAZON MACHINE LEARNING
Email targeting Recommendations Social news
Digital health Language processing Auto-scaling
More & More Customers Are
Using Prediction Technologies
Large opportunity to
apply ML
Low barrier to
entry
Easily create machine learning models
Visualize and optimize models
Put models into production in seconds
Battle-hardened technology
New
Introducing Amazon Machine Learning
aws.amazon.com/ml/
Train and optimize models on GBs of data
Batch process predictions
Real-time prediction API in one-click
No servers to provision or manage
Easy to Use, High Performance
1 Build model
2 Validate & optimize
3 Make predictions
Batch predictions: asynchronous predictions with trained model
Real-time predictions: synchronous, low latency, high throughput; mount API end-point with a single click
RESOURCES YOU CAN USE
TO LEARN MORE
aws.amazon.com/big-data/
aws.amazon.com/importexport
aws.amazon.com/directconnect
aws.amazon.com/kinesis
aws.amazon.com/rds
aws.amazon.com/redshift
aws.amazon.com/elasticmapreduce
Big Data Analytics Options on AWS
Erik Swensson
December 2014
Amazon Web Services – Big Data Analytics Options on AWS December 2014
Page 2 of 29
Contents
Abstract
Introduction
The AWS Advantage in Big Data Analytics
Amazon Redshift
Amazon Kinesis
Amazon Elastic MapReduce
Amazon DynamoDB
Application on Amazon EC2
Solving Big Data Problems
Example 1: Enterprise Data Warehouse
Example 2: Capturing and Analyzing Sensor Data
Conclusion
Further Reading
Abstract
Amazon Web Services (AWS) is a flexible, cost-effective, easy-to-use cloud computing
platform. The AWS Cloud delivers a comprehensive portfolio of secure and scalable
cloud computing services in a self-service, pay-as-you-go model, with zero capital
expense needed to handle your big data analytics workloads, such as real-time
streaming analytics, data warehousing, NoSQL and relational databases, object storage,
analytics tools, and data workflow services. This whitepaper provides an overview of the
different big data options available in the AWS Cloud for architects, data scientists, and
developers. For each of the big data analytics options, this paper describes the
following:
Ideal usage patterns
Performance
Durability and availability
Cost model
Scalability
Elasticity
Interfaces
Anti-patterns
This paper describes two scenarios showcasing the analytics options in use and
provides additional resources to get started with big data analytics on AWS.
Introduction
As we become a more digital society, the amount of data being created and collected is
accelerating significantly. The analysis of this ever-growing data set becomes a
challenge using traditional analytical tools. Innovation is required to bridge the gap
between the amount of data that is being generated and the amount of data that can be
analyzed effectively. Big data tools and technologies offer ways to efficiently analyze
data to better understand customer preferences, to gain a competitive advantage in the
marketplace, and to use as a lever to grow your business. The AWS ecosystem of
analytical solutions is specifically designed to handle this growing amount of data and
provide insight into ways your business can collect and analyze it.
The AWS Advantage in Big Data Analytics
Analyzing large data sets requires significant compute capacity that can vary in size
based on the amount of input data and the analysis required. This characteristic of big
data workloads is ideally suited to the pay-as-you-go cloud computing model, where
applications can easily scale up and down based on demand. As requirements change,
you can easily resize your environment (horizontally or vertically) on AWS to meet your
needs without having to wait for additional hardware, or being required to over-invest to
provision enough capacity. For mission-critical applications on a more traditional
infrastructure, system designers have no choice but to over-provision, because a surge
in additional data due to an increase in business need must be something the system
can handle. By contrast, on AWS you can provision more capacity and compute in a
matter of minutes, meaning that your big data applications grow and shrink as demand
dictates, and your system runs as close to optimal efficiency as possible. In addition, you
get flexible computing on a world-class infrastructure with access to the many different
geographic regions that AWS offers [1], along with the ability to utilize other scalable
services that Amazon offers such as Amazon Simple Storage Service (S3) [2] and AWS
Data Pipeline [3]. These capabilities of the AWS platform make it an extremely good fit for
solving big data problems. You can read about many customers that have implemented
successful big data analytics workloads on AWS on the AWS case studies web page [4].
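The over-provisioning argument can be made concrete with a small cost comparison. The hourly demand profile and the price per node-hour below are hypothetical; the point is only the shape of the arithmetic.

```python
from typing import Sequence


def fixed_capacity_cost(hourly_demand: Sequence[int], price_per_node_hour: float) -> float:
    """Cost when capacity is sized for the peak and kept running every hour."""
    return max(hourly_demand) * len(hourly_demand) * price_per_node_hour


def elastic_cost(hourly_demand: Sequence[int], price_per_node_hour: float) -> float:
    """Cost when capacity follows demand hour by hour (pay-as-you-go)."""
    return sum(hourly_demand) * price_per_node_hour


# Nodes needed in each hour: a steady baseline with a one-hour surge (hypothetical).
demand = [2, 2, 2, 10, 2, 2]
price = 0.5  # assumed price per node-hour

fixed = fixed_capacity_cost(demand, price)
elastic = elastic_cost(demand, price)
print(f"Provisioned for peak: {fixed:.2f}, following demand: {elastic:.2f}")
```

The spikier the workload, the larger the gap between paying for the peak all the time and paying only for what each hour needs.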
Amazon Redshift
Amazon Redshift is a fast, fully-managed, petabyte-scale data warehouse service that
makes it simple and cost-effective to efficiently analyze all your data using your existing
business intelligence tools [5]. It is optimized for datasets ranging from a few hundred
gigabytes to a petabyte or more, and is designed to cost less than a tenth of the cost of
most traditional data warehousing solutions. Amazon Redshift delivers fast query and
I/O performance for virtually any size dataset by using columnar storage technology
while parallelizing and distributing queries across multiple nodes. As a managed service,
Amazon Redshift automates most of the common administrative tasks associated with
provisioning, configuring, monitoring, backing up, and securing a data warehouse,
making it very easy and inexpensive to manage and maintain. This automation allows
you to build a petabyte-scale data warehouse in minutes, a task that has traditionally
taken weeks, or months, to complete in an on-premises implementation.
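The idea of parallelizing and distributing a query across multiple nodes can be sketched as a scatter-gather aggregate. This is a single-process illustration of the general technique, not Amazon Redshift's implementation; the node count, hash choice, and sales data are assumptions.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, float]


def distribute(rows: Iterable[Row], num_nodes: int) -> List[List[Row]]:
    """Assign each row to a node by hashing its key."""
    nodes: List[List[Row]] = [[] for _ in range(num_nodes)]
    for key, amount in rows:
        nodes[zlib.crc32(key.encode()) % num_nodes].append((key, amount))
    return nodes


def node_partial_sums(node_rows: Iterable[Row]) -> Dict[str, float]:
    """Work done on one node: sum the amounts for the rows it holds."""
    sums: Dict[str, float] = defaultdict(float)
    for key, amount in node_rows:
        sums[key] += amount
    return dict(sums)


def leader_combine(partials: Iterable[Dict[str, float]]) -> Dict[str, float]:
    """Work done by the coordinator: merge the per-node partial results."""
    totals: Dict[str, float] = defaultdict(float)
    for partial in partials:
        for key, value in partial.items():
            totals[key] += value
    return dict(totals)


# Hypothetical sales rows: (region, amount).
sales = [("EU", 10.0), ("US", 5.0), ("EU", 2.5), ("APAC", 7.0), ("US", 1.5)]
nodes = distribute(sales, num_nodes=3)
totals = leader_combine(node_partial_sums(node) for node in nodes)
```

Each node only scans its own slice of the data, so adding nodes shortens the scan while the final merge stays small.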
Ideal Usage Pattern
Amazon Redshift is ideal for online analytical processing (OLAP) using your existing
business intelligence tools. Organizations are using Amazon Redshift to do the following:
Analyze global sales data for multiple products
Store historical stock trade data
Analyze ad impressions and clicks
Aggregate gaming data
Analyze social trends
[1] http://aws.amazon.com/about-aws/globalinfrastructure/
[2] http://aws.amazon.com/s3/
[3] http://aws.amazon.com/datapipeline/
[4] http://aws.amazon.com/solutions/case-studies/big-data/
[5] http://aws.amazon.com/redshift/
AWS White Paper - Big Data Analytics Options on AWS
aws.amazon.com/solutions/case-studies/analytics/
aws.amazon.com/solutions/case-studies/big-data/
blogs.aws.amazon.com/bigdata/
aws.amazon.com/architecture/
AWS Training & Certification
Certification
Validate your proven skills and
expertise with the AWS platform
aws.amazon.com/certification
Self-Paced Labs
Try products, gain new skills, and
get hands-on practice working
with AWS technologies
aws.amazon.com/training/self-paced-labs
Training
Build technical expertise to
design and operate scalable,
efficient applications on AWS
aws.amazon.com/training
Follow us for more events & webinars
@AWScloud for Global AWS News & Announcements
@AWS_UKI for local AWS events & news
@IanMmmm
Ian Massingham — Technical Evangelist

Contenu connexe

Tendances

Tendances (20)

AWS re:Invent 2016: Visualizing Big Data Insights with Amazon QuickSight (BDM...
AWS re:Invent 2016: Visualizing Big Data Insights with Amazon QuickSight (BDM...AWS re:Invent 2016: Visualizing Big Data Insights with Amazon QuickSight (BDM...
AWS re:Invent 2016: Visualizing Big Data Insights with Amazon QuickSight (BDM...
 
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMRBDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
 
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of ThingsDay 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
 
February 2016 Webinar Series - Architectural Patterns for Big Data on AWS
February 2016 Webinar Series - Architectural Patterns for Big Data on AWSFebruary 2016 Webinar Series - Architectural Patterns for Big Data on AWS
February 2016 Webinar Series - Architectural Patterns for Big Data on AWS
 
BDT201 AWS Data Pipeline - AWS re: Invent 2012
BDT201 AWS Data Pipeline - AWS re: Invent 2012BDT201 AWS Data Pipeline - AWS re: Invent 2012
BDT201 AWS Data Pipeline - AWS re: Invent 2012
 
Keynote AWS Experience Day Cali
Keynote AWS Experience Day CaliKeynote AWS Experience Day Cali
Keynote AWS Experience Day Cali
 
Securing Your Big Data on AWS
Securing Your Big Data on AWSSecuring Your Big Data on AWS
Securing Your Big Data on AWS
 
Getting Started with Big Data and HPC in the Cloud - August 2015
Getting Started with Big Data and HPC in the Cloud - August 2015Getting Started with Big Data and HPC in the Cloud - August 2015
Getting Started with Big Data and HPC in the Cloud - August 2015
 
Big Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesBig Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best Practices
 
(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...
(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...
(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
 
Think Big Data, Think Cloud - AWS Presentation - AWS Cloud Storage for the En...
Think Big Data, Think Cloud - AWS Presentation - AWS Cloud Storage for the En...Think Big Data, Think Cloud - AWS Presentation - AWS Cloud Storage for the En...
Think Big Data, Think Cloud - AWS Presentation - AWS Cloud Storage for the En...
 
The Power of Big Data - AWS Summit Bahrain 2017
The Power of Big Data - AWS Summit Bahrain 2017The Power of Big Data - AWS Summit Bahrain 2017
The Power of Big Data - AWS Summit Bahrain 2017
 
Data Analytics on AWS
Data Analytics on AWSData Analytics on AWS
Data Analytics on AWS
 
ENT313 Deploying a Disaster Recovery Site on AWS: Minimal Cost with Maximum E...
ENT313 Deploying a Disaster Recovery Site on AWS: Minimal Cost with Maximum E...ENT313 Deploying a Disaster Recovery Site on AWS: Minimal Cost with Maximum E...
ENT313 Deploying a Disaster Recovery Site on AWS: Minimal Cost with Maximum E...
 
ENT314 Automate Best Practices and Operational Health for Your AWS Resources
ENT314 Automate Best Practices and Operational Health for Your AWS ResourcesENT314 Automate Best Practices and Operational Health for Your AWS Resources
ENT314 Automate Best Practices and Operational Health for Your AWS Resources
 
AWS Cloud Kata 2014 | Jakarta - 2-3 Big Data
 AWS Cloud Kata 2014 | Jakarta - 2-3 Big Data AWS Cloud Kata 2014 | Jakarta - 2-3 Big Data
AWS Cloud Kata 2014 | Jakarta - 2-3 Big Data
 
AWS Analytics
AWS AnalyticsAWS Analytics
AWS Analytics
 
Tapping the cloud for real time data analytics
 Tapping the cloud for real time data analytics Tapping the cloud for real time data analytics
Tapping the cloud for real time data analytics
 

En vedette

EC2 Masterclass from the AWS User Group Scotland Meetup
EC2 Masterclass from the AWS User Group Scotland MeetupEC2 Masterclass from the AWS User Group Scotland Meetup
EC2 Masterclass from the AWS User Group Scotland Meetup
Ian Massingham
 
Social & Mobile Apps journey through the cloud
Social & Mobile Apps   journey through the cloudSocial & Mobile Apps   journey through the cloud
Social & Mobile Apps journey through the cloud
Ian Massingham
 
Indian Case Studies - How AWS Customers Have Successfully Built and Migrated ...
Indian Case Studies - How AWS Customers Have Successfully Built and Migrated ...Indian Case Studies - How AWS Customers Have Successfully Built and Migrated ...
Indian Case Studies - How AWS Customers Have Successfully Built and Migrated ...
Amazon Web Services
 

En vedette (17)

Partner Event Slides - 24 April 2014
Partner Event Slides - 24 April 2014Partner Event Slides - 24 April 2014
Partner Event Slides - 24 April 2014
 
AWS CloudFormation Masterclass
AWS CloudFormation Masterclass AWS CloudFormation Masterclass
AWS CloudFormation Masterclass
 
Scalable Web Applications Session at Codebase
Scalable Web Applications Session at CodebaseScalable Web Applications Session at Codebase
Scalable Web Applications Session at Codebase
 
EC2 Masterclass from the AWS User Group Scotland Meetup
EC2 Masterclass from the AWS User Group Scotland MeetupEC2 Masterclass from the AWS User Group Scotland Meetup
EC2 Masterclass from the AWS User Group Scotland Meetup
 
Social & Mobile Apps journey through the cloud
Social & Mobile Apps   journey through the cloudSocial & Mobile Apps   journey through the cloud
Social & Mobile Apps journey through the cloud
 
Digipack creation
Digipack creationDigipack creation
Digipack creation
 
Scalable Web Apps - Journey Through the Cloud
Scalable Web Apps - Journey Through the CloudScalable Web Apps - Journey Through the Cloud
Scalable Web Apps - Journey Through the Cloud
 
Opportunities that the Cloud Brings for Carriers @ Carriers World 2014
Opportunities that the Cloud Brings for Carriers @ Carriers World 2014Opportunities that the Cloud Brings for Carriers @ Carriers World 2014
Opportunities that the Cloud Brings for Carriers @ Carriers World 2014
 
AWS DevOps Event - Innovating with DevOps on AWS
AWS DevOps Event - Innovating with DevOps on AWSAWS DevOps Event - Innovating with DevOps on AWS
AWS DevOps Event - Innovating with DevOps on AWS
 
Indian Case Studies - How AWS Customers Have Successfully Built and Migrated ...
Indian Case Studies - How AWS Customers Have Successfully Built and Migrated ...Indian Case Studies - How AWS Customers Have Successfully Built and Migrated ...
Indian Case Studies - How AWS Customers Have Successfully Built and Migrated ...
 
AWS re:Invent Recap from AWS User Group UK meetup #8
AWS re:Invent Recap from AWS User Group UK meetup #8AWS re:Invent Recap from AWS User Group UK meetup #8
AWS re:Invent Recap from AWS User Group UK meetup #8
 
Security Best Practices
Security Best PracticesSecurity Best Practices
Security Best Practices
 
Amazon S3 Masterclass
Amazon S3 MasterclassAmazon S3 Masterclass
Amazon S3 Masterclass
 
Advanced Security Masterclass - Tel Aviv Loft
Advanced Security Masterclass - Tel Aviv LoftAdvanced Security Masterclass - Tel Aviv Loft
Advanced Security Masterclass - Tel Aviv Loft
 
Getting started with AWS Lambda and the Serverless Cloud
Getting started with AWS Lambda and the Serverless CloudGetting started with AWS Lambda and the Serverless Cloud
Getting started with AWS Lambda and the Serverless Cloud
 
Cost Optimisation with AWS
Cost Optimisation with AWSCost Optimisation with AWS
Cost Optimisation with AWS
 
AWS 101: Introduction to AWS
AWS 101: Introduction to AWSAWS 101: Introduction to AWS
AWS 101: Introduction to AWS
 

Similaire à Data Analysis - Journey Through the Cloud

Lunch and Learn - Store and Move your Data To & From the AWS Cloud, Markku Le...
Lunch and Learn - Store and Move your Data To & From the AWS Cloud, Markku Le...Lunch and Learn - Store and Move your Data To & From the AWS Cloud, Markku Le...
Lunch and Learn - Store and Move your Data To & From the AWS Cloud, Markku Le...
Amazon Web Services
 

Similaire à Data Analysis - Journey Through the Cloud (20)

Journey Through the AWS Cloud - Big Data Analysis
Journey Through the AWS Cloud - Big Data AnalysisJourney Through the AWS Cloud - Big Data Analysis
Journey Through the AWS Cloud - Big Data Analysis
 
Architecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWSArchitecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWS
 
An Overview of AWS Services for Data Storage and Migration - SRV205 - Atlanta...
An Overview of AWS Services for Data Storage and Migration - SRV205 - Atlanta...An Overview of AWS Services for Data Storage and Migration - SRV205 - Atlanta...
An Overview of AWS Services for Data Storage and Migration - SRV205 - Atlanta...
 
AWS Architecting In The Cloud
AWS Architecting In The CloudAWS Architecting In The Cloud
AWS Architecting In The Cloud
 
AWS 資料湖服務
AWS 資料湖服務AWS 資料湖服務
AWS 資料湖服務
 
Cloud storage
Cloud storageCloud storage
Cloud storage
 
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
 
Day 2 Intro AWS.pptx
Day 2 Intro AWS.pptxDay 2 Intro AWS.pptx
Day 2 Intro AWS.pptx
 
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...
 
Intro-to-AWS.pptx
Intro-to-AWS.pptxIntro-to-AWS.pptx
Intro-to-AWS.pptx
 
Lunch and Learn - Store and Move your Data To & From the AWS Cloud, Markku Le...
Lunch and Learn - Store and Move your Data To & From the AWS Cloud, Markku Le...Lunch and Learn - Store and Move your Data To & From the AWS Cloud, Markku Le...
Lunch and Learn - Store and Move your Data To & From the AWS Cloud, Markku Le...
 
Build Data Lakes and Analytics on AWS
Build Data Lakes and Analytics on AWS Build Data Lakes and Analytics on AWS
Build Data Lakes and Analytics on AWS
 
Overview of AWS by Andy Jassy - SVP, AWS
Overview of AWS by Andy Jassy - SVP, AWSOverview of AWS by Andy Jassy - SVP, AWS
Overview of AWS by Andy Jassy - SVP, AWS
 
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
 
Deploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSDeploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWS
 
Visualize your data in Data Lake with AWS Athena and AWS Quicksight Hands-on ...
Visualize your data in Data Lake with AWS Athena and AWS Quicksight Hands-on ...Visualize your data in Data Lake with AWS Athena and AWS Quicksight Hands-on ...
Visualize your data in Data Lake with AWS Athena and AWS Quicksight Hands-on ...
 
AWS tutorial-Part59:AWS Cloud Database Products-2nd Intro Session
AWS tutorial-Part59:AWS Cloud Database Products-2nd Intro SessionAWS tutorial-Part59:AWS Cloud Database Products-2nd Intro Session
AWS tutorial-Part59:AWS Cloud Database Products-2nd Intro Session
 
Astroinformatics 2014: Scientific Computing on the Cloud with Amazon Web Serv...
Astroinformatics 2014: Scientific Computing on the Cloud with Amazon Web Serv...Astroinformatics 2014: Scientific Computing on the Cloud with Amazon Web Serv...
Astroinformatics 2014: Scientific Computing on the Cloud with Amazon Web Serv...
 
AWS webinar what is cloud computing 13 09 11
AWS webinar what is cloud computing 13 09 11AWS webinar what is cloud computing 13 09 11
AWS webinar what is cloud computing 13 09 11
 
AWS Overview - Cloud for the Enterprise - AWS Enterprise Tour - SF - 2010, D...
AWS Overview  - Cloud for the Enterprise - AWS Enterprise Tour - SF - 2010, D...AWS Overview  - Cloud for the Enterprise - AWS Enterprise Tour - SF - 2010, D...
AWS Overview - Cloud for the Enterprise - AWS Enterprise Tour - SF - 2010, D...
 



Data Analysis - Journey Through the Cloud

  • 1. Journey Through the Cloud ianmas@amazon.com @IanMmmm Ian Massingham — Technical Evangelist Data Analysis
  • 2. Journey Through the Cloud: Learn from the journeys taken by other AWS customers; Discover best practices that you can use to bootstrap your projects; Common use cases and adoption models for the AWS Cloud
  • 3. Data Analysis Collect and store Big Data in the AWS Cloud
 Meet the challenge of the increasing volume, variety, and velocity of data
 Reduce costs, scale to meet demand & increase the speed of innovation
 Make use of solutions for every stage of the big data lifecycle
  • 4. Agenda Why Build Big Data Applications on AWS? Collecting Big Data in the AWS Cloud
 Real-time Streaming and Analysis
 Big Data Cloud Storage Solutions AWS Database Services 
 Analytics with Hadoop with Amazon EMR
 Case Studies & Useful Resources
  • 5. WHY BUILD BIG DATA APPLICATIONS ON AWS?
  • 6. It’s Never Been Easier And Less Expensive To 
 Collect, Store, Analyze & Share Data
  • 7. We are constantly producing more data
  • 8. From all types of industries
  • 9. From a diverse range of sources
  • 10. AWS Services For Big Data Workloads. Categories: Sources of Truth, Analysis Platforms, High Performance Databases, Real time. Services: Amazon S3, Amazon EFS, Amazon Redshift, Amazon DynamoDB, Amazon Aurora, Amazon EMR, Amazon Kinesis
  • 11. Broad Analytics Usage In The AWS Cloud Discovery Development Delivery Risk Marketing Reporting Trade Sales
  • 12. Financial Times Uses AWS to Reduce Infrastructure Costs by 80%. "WHEN OUR ANALYSTS FIRST STARTED TO DO QUERIES ON AMAZON REDSHIFT, THEY THOUGHT IT WAS BROKEN BECAUSE IT WAS WORKING SO FAST." (John O’Donovan, CTO, Financial Times) • Needed a way to increase speed, performance and flexibility of data analysis at a low cost • Using AWS enabled FT to run queries 98% faster than previously, helping FT make business decisions quickly • Easier to track and analyze trends • Reduced infrastructure costs by 80% over traditional data center model. Find out more here: aws.amazon.com/solutions/case-studies/financial-times/
  • 13. [Pipeline diagram] Stages: GENERATE, COLLECT, STREAM, STORE, RDBMS, DATA WAREHOUSE, NOSQL, ANALYTICS, ARCHIVE
  • 14. COLLECTING BIG DATA IN THE AWS CLOUD
  • 15. [Pipeline diagram] Stages: GENERATE, COLLECT, STREAM, STORE, RDBMS, DATA WAREHOUSE, NOSQL, ANALYTICS, ARCHIVE. Services shown: Amazon S3 Multipart upload, AWS Import/Export, AWS Direct Connect, AWS Storage Gateway
  • 16. Amazon S3 Secure, durable, highly-scalable object storage
 Accessible via a simple web services interface Store & retrieve any amount of data Use alone or together with other AWS services Amazon S3 Masterclass webinar: https://youtu.be/VC0k-noNwOU
  • 17. Amazon S3 Multipart Upload Large file (Size < 5TB) Large object (Size < 5TB) Split file into parts Send parts to S3 S3 rejoins the parts
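The split/send/rejoin flow on the multipart upload slide can be sketched locally as follows. This is a minimal illustration, not the S3 client API: `plan_parts` and `reassemble` are hypothetical helper names, and the part-size limits (5 MiB minimum for all but the last part, 5 GiB maximum, at most 10,000 parts) come from the published S3 limits rather than from the slide.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# Assumed limits, taken from the S3 documentation rather than the slide.
MIN_PART_SIZE = 5 * MIB   # every part except the last must be at least this large
MAX_PART_SIZE = 5 * GIB
MAX_PARTS = 10_000


def plan_parts(object_size, part_size=8 * MIB):
    """Return (part_number, offset, length) tuples that cover the object."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    if not MIN_PART_SIZE <= part_size <= MAX_PART_SIZE:
        raise ValueError("part_size must be between 5 MiB and 5 GiB")
    part_count = -(-object_size // part_size)  # ceiling division
    if part_count > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part_size")
    parts = []
    for index in range(part_count):
        offset = index * part_size
        length = min(part_size, object_size - offset)
        parts.append((index + 1, offset, length))  # part numbers start at 1
    return parts


def reassemble(chunks):
    """Join (part_number, data) chunks in part-number order, regardless of arrival order."""
    return b"".join(data for _, data in sorted(chunks, key=lambda c: c[0]))


if __name__ == "__main__":
    for number, offset, length in plan_parts(20 * MIB):
        print(number, offset, length)
```

A 20 MiB object with the default 8 MiB part size yields three parts of 8, 8, and 4 MiB. In a real upload each part would be sent independently (and possibly in parallel), which is why the parts carry numbers rather than relying on arrival order.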
  • 18. AWS Import/Export Move large amounts of data into and out of the AWS cloud using portable storage devices Transfer your data directly onto and off of storage devices using Amazon’s high-speed internal network For significant data sets, AWS Import/Export is often faster than Internet transfer and more cost effective than upgrading your connectivity Supports upload & download from S3 & upload to Amazon EBS snapshots & Amazon Glacier Vaults aws.amazon.com/importexport/
  • 19. When to Use AWS Import/Export aws.amazon.com/importexport/
  • 20. AWS Direct Connect Makes it easy to establish a dedicated network connection from your premises to AWS Establish private connectivity between AWS & your datacenter, office, or colocation environment Reduce your network costs, increase bandwidth throughput, and provide a more consistent network experience The dedicated connection can be partitioned into multiple virtual interfaces using 802.1q VLANs aws.amazon.com/directconnect
  • 21. AWS Direct Connect Locations & Partners aws.amazon.com/directconnect/partners/ 1 Gbps and 10 Gbps ports are available from AWS. 50 Mbps, 100 Mbps, 200 Mbps, 300 Mbps, 400 Mbps, and 500 Mbps connections can be ordered from any APN partner supporting AWS Direct Connect
  • 22. AWS Storage Gateway An on-premises software appliance connecting with cloud-based storage Supports industry-standard storage protocols that work with your existing applications and workflows Provides low-latency performance by maintaining frequently accessed data on-premises while securely storing all of your data encrypted in Amazon S3 or Amazon Glacier aws.amazon.com/storagegateway/
  • 23. AWS Storage Gateway Designed for use with other AWS services Enables you to easily mirror data from your on-premises environment for access within the AWS Cloud Easy to integrate into existing ETL workflows aws.amazon.com/storagegateway/
  • 25. [Pipeline diagram] Stages: GENERATE, COLLECT, STREAM, STORE, RDBMS, DATA WAREHOUSE, NOSQL, ANALYTICS, ARCHIVE. Service shown: Amazon Kinesis
  • 26. Amazon Kinesis A fully managed, cloud-based service for real-time data processing over large, distributed data streams Continuously capture and store terabytes of data per hour from hundreds of thousands of sources Emit data to other AWS services such as Amazon S3, Amazon Redshift, and Amazon Elastic MapReduce (Amazon EMR) aws.amazon.com/kinesis
  • 28. Using AWS, Dash Streams More Than 1 TB of Real-Time Data Per Day. "As a startup, using AWS has allowed us to scale nicely and use resources without spending a lot of capital." (Brian Langel, CTO, Dash) • Needed to scale IT resources to create an app that would offer real-time information to drivers • Developed and deployed the Dash application on the AWS Cloud • Streams more than 1 TB of real-time data per day using Amazon Kinesis and processes billions of entries using Amazon DynamoDB • Scaled up to support large traffic spikes in app usage (several thousand updates per second) • Reduced operating costs by $200,000 per year. Find out more here: aws.amazon.com/solutions/case-studies/dash/
  • 29. Amazon Kinesis Architecture: Millions of sources producing 100s of TB per hour; Front End (Authentication, Authorization); Durable, consistent replicas across three AWS Availability Zones in an Amazon Web Services Region; Ordered stream of events supporting multiple readers; Aggregate and archive to S3; Real-time dashboards and alarms; Machine learning algorithms; Aggregate analysis in Hadoop or a data warehouse; Inexpensive: $0.0165 per million PUT Payload Units (in EU Ireland)
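To put the PUT Payload Unit price from the architecture slide in context, here is a rough cost sketch. The price is the one quoted on the slide; the 25 KB size of a payload unit is an assumption based on the Kinesis pricing page, not on the slide, and the function names are hypothetical. Shard-hour charges are deliberately left out, so this is only the per-record part of the bill.

```python
import math

PRICE_PER_MILLION_UNITS = 0.0165   # USD, as quoted on the slide (EU Ireland)
PAYLOAD_UNIT_BYTES = 25 * 1024     # assumption: one PUT Payload Unit is a 25 KB chunk


def put_payload_units(record_bytes):
    """Number of payload units billed for one record (at least one)."""
    return max(1, math.ceil(record_bytes / PAYLOAD_UNIT_BYTES))


def monthly_put_cost(records_per_second, record_bytes, days=30):
    """Approximate PUT cost in USD for a steady stream of equal-sized records."""
    records = records_per_second * 86_400 * days
    units = records * put_payload_units(record_bytes)
    return units / 1_000_000 * PRICE_PER_MILLION_UNITS


if __name__ == "__main__":
    # 1,000 records per second of 1 KB each, for 30 days
    print(round(monthly_put_cost(1000, 1024), 2))
```

Under these assumptions, 1,000 one-kilobyte records per second for 30 days is about 2.6 billion payload units, or roughly $42.77 in PUT charges.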
  • 32. [Pipeline diagram] Stages: GENERATE, COLLECT, STREAM, STORE, RDBMS, DATA WAREHOUSE, NOSQL, ANALYTICS, ARCHIVE. Services shown: Amazon S3, Amazon Glacier, Amazon EBS
  • 33. Amazon S3 Secure, durable, highly-scalable object storage
 Accessible via a simple web services interface Store & retrieve any amount of data Use alone or together with other AWS services Amazon S3 Masterclass webinar: https://youtu.be/VC0k-noNwOU
  • 34. Amazon S3 Allows you to decouple compute from storage for analytics workloads Amazon S3 Masterclass webinar: https://youtu.be/VC0k-noNwOU
  • 35. Amazon Glacier. Durable: designed for 99.999999999% durability of archives. Cost effective: write-once, read-never; cost effective for long term storage; pay for accessing data. aws.amazon.com/glacier
  • 36. Amazon Elastic Block Store (EBS) Persistent block level storage volumes For use with Amazon EC2 instances Automatically replicated within Availability Zones Offer consistent and low-latency performance EBS Snapshot (stored on S3) EBS Volume EC2 Instance aws.amazon.com/ebs
  • 37. Amazon EBS (Elastic Block Store): very fast block devices to attach to EC2 instances; 1GB to 16TB volumes; up to 20,000 IOPS per volume with EBS PIOPS. Amazon S3 (Simple Storage Service): fast, API-accessible object storage; highly scalable object store; objects from 1 byte to 5TB; 99.999999999% durability. Amazon Glacier: long term archive storage; 3-5 hour access latency; intended for write once, read never use-cases; extremely low cost per GB; 99.999999999% durability
  • 39. [Pipeline diagram] Stages: GENERATE, COLLECT, STREAM, STORE, RDBMS, DATA WAREHOUSE, NOSQL, ANALYTICS, ARCHIVE. Services shown: Amazon RDS, Amazon Redshift, Amazon DynamoDB
  • 40. Amazon Relational Database Service (RDS) Easy to set up, operate, and scale a relational database Provides cost-efficient and resizable capacity Manages time-consuming database management tasks aws.amazon.com/rds/
  • 41. Amazon Redshift A fast, fully managed, petabyte-scale data warehouse Cost-effectively & efficiently analyze all your data Use existing Business Intelligence tools Fast query performance using columnar storage technology aws.amazon.com/redshift/
  • 42. Getting Started with Amazon Redshift aws.amazon.com/redshift/getting-started/ 2 Month Free Trial 6 Step Getting Started Tutorial Best Practices Guides — loading data, table design & performance tuning Cluster Management Guide
  • 43. BI & ETL Tools for Amazon Redshift aws.amazon.com/redshift/partners/
  • 44. Amazon DynamoDB A fast and flexible NoSQL database service Consistent, single-digit millisecond latency at any scale A fully managed cloud database Supports both document and key-value store models Flexible data model and reliable performance aws.amazon.com/dynamodb/
  • 46. [Pipeline diagram] Stages: GENERATE, COLLECT, STREAM, STORE, RDBMS, DATA WAREHOUSE, NOSQL, ANALYTICS, ARCHIVE. Service shown: Amazon EMR
  • 49. Amazon Elastic MapReduce (EMR) A managed Hadoop framework Quickly & cost-effectively process vast amounts of data Dynamically scale across fleets of Amazon EC2 instances Run other popular distributed frameworks such as Spark aws.amazon.com/emr/
  • 50. Amazon Elastic MapReduce (EMR) Splits data in pieces using the HDFS filesystem Manages distributed access to data and task execution Gathers the results and deposits these in S3 for access
  • 52. Very large clickstream logging data (e.g. TBs); Lots of actions by John Smith
  • 53. Very large clickstream logging data (e.g. TBs); Lots of actions by John Smith; Split the log into many small pieces
  • 54. Very large clickstream logging data (e.g. TBs); Lots of actions by John Smith; Split the log into many small pieces; Process in an EMR cluster
  • 55. Very large clickstream logging data (e.g. TBs); Lots of actions by John Smith; Split the log into many small pieces; Process in an EMR cluster; Aggregate the results from all the nodes
  • 56. Very large clickstream logging data (e.g. TBs); Lots of actions by John Smith; Split the log into many small pieces; Process in an EMR cluster; Aggregate the results from all the nodes; What John Smith did
  • 57. Very large clickstream logging data (e.g. TBs); Insight in a fraction of the time; What John Smith did
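The split, process, and aggregate steps from the clickstream slides can be sketched in a single Python process. This is only an illustration of the pattern: on EMR the pieces would be processed by Hadoop tasks on separate nodes, and the `user,action` log format, the sample records, and the function names here are all hypothetical.

```python
from collections import Counter


def split_log(lines, pieces):
    """Split the log into roughly equal pieces, one per worker."""
    return [lines[i::pieces] for i in range(pieces)]


def map_piece(piece, user):
    """Count each action taken by `user` within one piece of the log."""
    counts = Counter()
    for line in piece:
        who, action = line.split(",", 1)
        if who == user:
            counts[action] += 1
    return counts


def reduce_counts(partials):
    """Aggregate the partial counts produced for every piece."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical clickstream records in "user,action" form.
log = [
    "john.smith,view_item",
    "jane.doe,view_item",
    "john.smith,add_to_cart",
    "john.smith,view_item",
    "jane.doe,checkout",
    "john.smith,checkout",
]

pieces = split_log(log, 3)                                 # split the log
partials = [map_piece(p, "john.smith") for p in pieces]    # process each piece
summary = reduce_counts(partials)                          # aggregate the results

if __name__ == "__main__":
    print(dict(summary))
```

Because each piece is counted independently, the per-piece work can run in parallel; only the small partial counts need to be brought together at the end.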
  • 58. Analytics languages/engines; Data management; Amazon Redshift; AWS Data Pipeline; Amazon Kinesis; Amazon S3; Amazon DynamoDB; Amazon RDS; Amazon EMR; Data Sources
  • 59. DEMO: ANALYZING AMAZON S3 ACCESS LOGS WITH EMR AND HUE
  • 61. More & More Customers Are Using Prediction Technologies: Email targeting; Recommendations; Social news; Digital health; Language processing; Auto-scaling
  • 62. Large opportunity to apply ML Low barrier to entry
  • 63. Introducing Amazon Machine Learning (New): Easily create machine learning models; Visualize and optimize models; Put models into production in seconds; Battle-hardened technology. aws.amazon.com/ml/
  • 64. Train and optimize models on GBs of data Batch process predictions Real-time prediction API in one-click No servers to provision or manage Easy to Use, High Performance
  • 65. 1 Build model; 2 Validate & optimize; 3 Make predictions. Batch predictions: asynchronous predictions with trained model. Real-time predictions: synchronous, low latency, high throughput; mount API end-point with a single click
  • 66. RESOURCES YOU CAN USE TO LEARN MORE
  • 69. AWS White Paper: Big Data Analytics Options on AWS, Erik Swensson, December 2014
 Contents: Abstract; Introduction; The AWS Advantage in Big Data Analytics; Amazon Redshift; Amazon Kinesis; Amazon Elastic MapReduce; Amazon DynamoDB; Application on Amazon EC2; Solving Big Data Problems; Example 1: Enterprise Data Warehouse; Example 2: Capturing and Analyzing Sensor Data; Conclusion; Further Reading
 Abstract
 Amazon Web Services (AWS) is a flexible, cost-effective, easy-to-use cloud computing platform. The AWS Cloud delivers a comprehensive portfolio of secure and scalable cloud computing services in a self-service, pay-as-you-go model, with zero capital expense needed to handle your big data analytics workloads, such as real-time streaming analytics, data warehousing, NoSQL and relational databases, object storage, analytics tools, and data workflow services. This whitepaper provides an overview of the different big data options available in the AWS Cloud for architects, data scientists, and developers. For each of the big data analytics options, this paper describes the following: ideal usage patterns; performance; durability and availability; cost model; scalability; elasticity; interfaces; anti-patterns. This paper describes two scenarios showcasing the analytics options in use and provides additional resources to get started with big data analytics on AWS.
 Introduction
 As we become a more digital society the amount of data being created and collected is accelerating significantly. The analysis of this ever-growing data set becomes a challenge using traditional analytical tools. Innovation is required to bridge the gap between the amount of data that is being generated and the amount of data that can be analyzed effectively.
 Big data tools and technologies offer ways to efficiently analyze data to better understand customer preferences, to gain a competitive advantage in the marketplace, and to use as a lever to grow your business. The AWS ecosystem of analytical solutions is specifically designed to handle this growing amount of data and provide insight into ways your business can collect and analyze it.
 The AWS Advantage in Big Data Analytics
 Analyzing large data sets requires significant compute capacity that can vary in size based on the amount of input data and the analysis required. This characteristic of big data workloads is ideally suited to the pay-as-you-go cloud computing model, where applications can easily scale up and down based on demand. As requirements change you can easily resize your environment (horizontally or vertically) on AWS to meet your needs without having to wait for additional hardware, or being required to over-invest to provision enough capacity. For mission-critical applications on a more traditional infrastructure, system designers have no choice but to over-provision, because a surge in additional data due to an increase in business need must be something the system can handle. By contrast, on AWS you can provision more capacity and compute in a matter of minutes, meaning that your big data applications grow and shrink as demand dictates, and your system runs as close to optimal efficiency as possible. In addition, you get flexible computing on a world-class infrastructure with access to the many different geographic regions that AWS offers [1], along with the ability to utilize other scalable services that Amazon offers such as Amazon Simple Storage Service (S3) [2] and AWS Data Pipeline [3]. These capabilities of the AWS platform make it an extremely good fit for solving big data problems.
 You can read about many customers that have implemented successful big data analytics workloads on AWS on the AWS case studies web page [4].
 Amazon Redshift
 Amazon Redshift is a fast, fully-managed, petabyte-scale data warehouse service that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools [5]. It is optimized for datasets ranging from a few hundred gigabytes to a petabyte or more, and is designed to cost less than a tenth of the cost of most traditional data warehousing solutions. Amazon Redshift delivers fast query and I/O performance for virtually any size dataset by using columnar storage technology while parallelizing and distributing queries across multiple nodes. As a managed service, automation is provided for most of the common administrative tasks associated with provisioning, configuring, monitoring, backing up, and securing a data warehouse, making it very easy and inexpensive to manage and maintain. This automation allows you to build a petabyte-scale data warehouse in minutes, a task that has traditionally taken weeks, or months, to complete in an on-premises implementation.
 Ideal Usage Pattern
 Amazon Redshift is ideal for online analytical processing (OLAP) using your existing business intelligence tools. Organizations are using Amazon Redshift to do the following: analyze global sales data for multiple products; store historical stock trade data; analyze ad impressions and clicks; aggregate gaming data; analyze social trends.
 [1] http://aws.amazon.com/about-aws/globalinfrastructure/
 [2] http://aws.amazon.com/s3/
 [3] http://aws.amazon.com/datapipeline/
 [4] http://aws.amazon.com/solutions/case-studies/big-data/
 [5] http://aws.amazon.com/redshift/
  • 74. AWS Training & Certification. Certification (aws.amazon.com/certification): Validate your proven skills and expertise with the AWS platform. Self-Paced Labs (aws.amazon.com/training/self-paced-labs): Try products, gain new skills, and get hands-on practice working with AWS technologies. Training (aws.amazon.com/training): Build technical expertise to design and operate scalable, efficient applications on AWS
  • 75. Follow us for more events & webinars: @AWScloud for Global AWS News & Announcements; @AWS_UKI for local AWS events & news; @IanMmmm (Ian Massingham, Technical Evangelist)