SlideShare une entreprise Scribd logo
1  sur  25
Télécharger pour lire hors ligne
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
@julsimon
Webinar “Salon du Big Data”
02/03/2016
Simplify Big Data with AWS
Julien Simon, Principal Technical Evangelist
Simplify Big Data Processing
ingest /
collect
store
process /
analyze
consume /
visualize
Time to Answer (Latency)
Throughput
Cost
Collect /
Ingest
Types of Data
•  Transactional
•  Database reads & writes (OLTP)
•  Cache
•  Search
•  Logs
•  Streams
•  File
•  Log files (/var/log)
•  Log collectors & frameworks
•  Stream
•  Log records
•  Sensors & IoT data
Database
File
Storage
Stream
Storage
A
iOS
 Android
Web Apps
Logstash
Logging
IoT
Applications
Transactional Data
File Data
Stream Data
Mobile
Apps
Search Data
Search
Collect Store
Logging
IoT
Store
Stream
Storage
A
iOS
 Android
Web Apps
Logstash
Amazon
RDS
Amazon
DynamoDB
Amazon
ES
Amazon

S3
Apache
Kafka
Amazon

Glacier
Amazon

Kinesis
Amazon

DynamoDB
Amazon
ElastiCache
SearchSQLNoSQLCache
StreamStorage
FileStorage
Transactional Data
File Data
Stream Data
Mobile
Apps
Search Data
Database
File
Storage
Search
Collect Store
Logging
IoT
Applications
ü 
File
Storage
A
iOS
 Android
Web Apps
Logstash
Amazon
RDS
Amazon
DynamoDB
Amazon
ES
Amazon

S3
Apache
Kafka
Amazon

Glacier
Amazon

Kinesis
Amazon

DynamoDB
Amazon
ElastiCache
SearchSQLNoSQLCache
StreamStorage
FileStorage
Transactional Data
File Data
Stream Data
Mobile
Apps
Search Data
Database
Search
Collect Store
Logging
IoT
Applications
ü 
Database +
Search
Tier
A
Amazon

S3
iOS
 Android
Web Apps
Logstash
Amazon
RDS / Aurora
Amazon
DynamoDB
Amazon
ES
Amazon

S3
Apache
Kafka
Amazon

Glacier
Amazon

Kinesis
Amazon

DynamoDB
Amazon
ElastiCache
SearchSQLNoSQLCache
StreamStorage
FileStorage
Transactional Data
File Data
Stream Data
Mobile
Apps
Search Data
Collect Store
ü Logging
IoT
Applications
Database + Search Tier Anti-pattern
RDBMS
Database + Search Tier
Applications
What Data Store Should I Use?
Amazon
ElastiCache
Amazon
DynamoDB
Amazon
Aurora
Amazon
Elasticsearch
Amazon
EMR (HDFS)
Amazon S3 Amazon Glacier
Average
latency
ms ms ms, sec ms,sec sec,min,hrs ms,sec,min
(~ size)
hrs
Data volume GB GB–TBs
(no limit)
GB–TB
(64 TB
Max)
GB–TB GB–PB
(~nodes)
MB–PB
(no limit)
GB–PB
(no limit)
Item size B-KB KB
(400 KB
max)
KB
(64 KB)
KB
(1 MB max)
MB-GB KB-GB
(5 TB max)
GB
(40 TB max)
Request rate High -
Very High
Very High
(no limit)
High High Low – Very
High
Low –
Very High
(no limit)
Very Low
Storage cost
GB/month
$$ ¢¢ ¢¢ ¢¢ ¢ ¢ ¢/10
Durability Low -
Moderate
Very High Very High High High Very High Very High
Hot Data Warm Data Cold Data
Hot Data Warm Data Cold Data
Process /
Analyze
AnalyzeA
iOS
 Android
Web Apps
Logstash
Amazon
RDS
Amazon
DynamoDB
Amazon
ES
Amazon

S3
Apache
Kafka
Amazon

Glacier
Amazon

Kinesis
Amazon

DynamoDB
Amazon
Redshift
Impala
Pig
Amazon ML
Streaming
Amazon

Kinesis
AWS
Lambda
AmazonElasticMapReduce
Amazon
ElastiCache
SearchSQLNoSQLCache
StreamProcessing
Batch
Interactive
Logging
StreamStorage
IoT
Applications
FileStorage
Hot
Cold
Warm
Hot
Hot
ML
Transactional Data
File Data
Stream Data
Mobile
Apps
Search Data
Collect Store Analyze
ü  ü 
Analysis Tools and Frameworks
Machine Learning
•  Mahout, Spark ML, Amazon ML
Interactive Analytics
•  Amazon Redshift, Presto, Impala, Spark
Batch Processing
•  MapReduce, Hive, Pig, Spark
Stream Processing
•  Micro-batch: Spark Streaming, KCL, Hive, Pig
•  Real-time: Storm, AWS Lambda, KCL
Amazon
Redshift
Impala
Pig
Amazon Machine
Learning
Streaming
Amazon

Kinesis
AWS
Lambda
AmazonElasticMapReduce
StreamProcessing
Batch
Interactive
ML
Analyze
What Data Processing Technology Should I Use?
Amazon
Redshift
Impala Presto Spark Hive
Query
Latency
Low Low Low Low Medium (Tez) –
High (MapReduce)
Durability High High High High High
Data Volume 1.6 PB
Max
~Nodes ~Nodes ~Nodes ~Nodes
Managed Yes Yes (EMR) Yes (EMR) Yes (EMR) Yes (EMR)
Storage Native HDFS / S3 HDFS / S3 HDFS / S3 HDFS / S3
SQL
Compatibility
High Medium High Low (SparkSQL) Medium (HQL)
Query Latency
High
(Low is better)
Medium
Consume /
Visualize
Collect Store Analyze Consume
A
iOS
 Android
Web Apps
Logstash
Amazon
RDS
Amazon
DynamoDB
Amazon
ES
Amazon

S3
Apache
Kafka
Amazon

Glacier
Amazon

Kinesis
Amazon

DynamoDB
Amazon
Redshift
Impala
Pig
Amazon ML
Streaming
Amazon

Kinesis
AWS
Lambda
AmazonElasticMapReduce
Amazon
ElastiCache
SearchSQLNoSQLCache
StreamProcessing
Batch
Interactive
Logging
StreamStorage
IoT
Applications
FileStorage
Analysis&Visualization
Hot
Cold
Warm
Hot
Slow
Hot
ML
Fast
Fast
Transactional Data
File Data
Stream Data
Notebooks
Predictions
Apps & APIs
Mobile
Apps
IDE
Search Data
ETL
Amazon
QuickSight
Consume
•  Predictions
•  Analysis and Visualization
•  Notebooks
•  IDE
•  Applications & API
Consume
Analysis&Visualization
Amazon
QuickSight
Notebooks
Predictions
Apps & APIs
IDE
Store Analyze ConsumeETL
Business
users
Data Scientist,
Developers
Putting It All Together
Collect Store Analyze Consume
A
iOS
 Android
Web Apps
Logstash
Amazon
RDS
Amazon
DynamoDB
Amazon
ES
Amazon

S3
Apache
Kafka
Amazon

Glacier
Amazon

Kinesis
Amazon

DynamoDB
Amazon
Redshift
Impala
Pig
Amazon ML
Streaming
Amazon

Kinesis
AWS
Lambda
AmazonElasticMapReduce
Amazon
ElastiCache
SearchSQLNoSQLCache
StreamProcessing
Batch
Interactive
Logging
StreamStorage
IoT
Applications
FileStorage
Analysis&Visualization
Hot
Cold
Warm
Hot
Slow
Hot
ML
Fast
Fast
Amazon
QuickSight
Transactional Data
File Data
Stream Data
Notebooks
Predictions
Apps & APIs
Mobile
Apps
IDE
Search Data
ETL
Interactive
& Batch
Analytics
Producer Amazon S3
Amazon EMR
Hive
Pig
Spark
Amazon
ML
process
store
Consume
Amazon
Redshift
Amazon EMR
Presto
Impala
Spark
Batch
Interactive
Batch Prediction
Real-time Prediction
Real-time Analytics
Producer
Apache
Kafka
KCL
AWS Lambda
Spark
Streaming
Apache
Storm
Amazon
SNS
Amazon
ML
Notifications
Amazon
ElastiCache
(Redis)
Amazon
DynamoDB
Amazon
RDS
Amazon
ES
Alert
App state
Real-time Prediction
KPI
process
store
DynamoDB
Streams
Amazon
Kinesis
Batch Layer
Amazon
Kinesis
data
process
store
Lambda
Architecture
Amazon
Kinesis S3
Connector
Amazon S3
A
p
p
l
i
c
a
t
i
o
n
s
Amazon
Redshift
Amazon EMR
Presto
Hive
Pig
Spark
answer
Speed Layer
answer
Serving
Layer
Amazon
ElastiCache
Amazon
DynamoDB
Amazon
RDS
Amazon
ES
answer
Amazon
ML
KCL
AWS Lambda
Spark Streaming
Storm
Summary
•  Use the right tool for the job
•  Latency, throughput, access patterns
•  Leverage AWS managed services
•  No/low admin
•  Be cost conscious
•  Big data ≠ big cost
Thank you. Let’s keep in touch!
@aws_actus @julsimon
facebook.com/groups/AWSFrance/
AWS User Groups in Paris,
Lyon, Nantes, Lille & Rennes
(meetup.com)
March 7-8
AWS Summit
May 31st
April 20-22
March 23-24 April 6-7 (Lyon)
April 25
March 16
Customer references & further reading
•  Amazon Kinesis: https://aws.amazon.com/solutions/case-studies/supercell/
•  Amazon DynamoDB: https://aws.amazon.com/fr/solutions/case-studies/adroll/
•  Amazon S3 / Glacier: https://aws.amazon.com/fr/solutions/case-studies/soundcloud/
•  Amazon EMR: https://aws.amazon.com/fr/solutions/case-studies/yelp/
•  Amazon Aurora: https://aws.amazon.com/fr/rds/aurora/testimonials/
•  Amazon Redshift: https://aws.amazon.com/fr/solutions/case-studies/financial-times/
•  AWS Lambda: https://aws.amazon.com/fr/solutions/case-studies/nordstrom/
•  Many more case studies at https://aws.amazon.com/fr/solutions/case-studies/big-data/
•  Whitepaper: “Big Data Analytics Options on AWS” :
http://d0.awsstatic.com/whitepapers/Big_Data_Analytics_Options_on_AWS.pdf
•  AWS Big Data blog: https://blogs.aws.amazon.com/bigdata

Contenu connexe

Tendances

Automate Migration to AWS with Datapipe
Automate Migration to AWS with DatapipeAutomate Migration to AWS with Datapipe
Automate Migration to AWS with Datapipe
Amazon Web Services
 

Tendances (20)

Big Data Architectural Patterns
Big Data Architectural PatternsBig Data Architectural Patterns
Big Data Architectural Patterns
 
Deep Dive on Amazon RDS (May 2016)
Deep Dive on Amazon RDS (May 2016)Deep Dive on Amazon RDS (May 2016)
Deep Dive on Amazon RDS (May 2016)
 
AWS APAC Webinar Week - 2015 An Amazing Year in AWS
AWS APAC Webinar Week - 2015 An Amazing Year in AWSAWS APAC Webinar Week - 2015 An Amazing Year in AWS
AWS APAC Webinar Week - 2015 An Amazing Year in AWS
 
AWS re:Invent 2016: Achieving Agility by Following Well-Architected Framework...
AWS re:Invent 2016: Achieving Agility by Following Well-Architected Framework...AWS re:Invent 2016: Achieving Agility by Following Well-Architected Framework...
AWS re:Invent 2016: Achieving Agility by Following Well-Architected Framework...
 
Getting started with Amazon Kinesis
Getting started with Amazon KinesisGetting started with Amazon Kinesis
Getting started with Amazon Kinesis
 
Building and scaling your containerized microservices on Amazon ECS
Building and scaling your containerized microservices on Amazon ECSBuilding and scaling your containerized microservices on Amazon ECS
Building and scaling your containerized microservices on Amazon ECS
 
SMC302 Building Serverless Web Applications
SMC302 Building Serverless Web ApplicationsSMC302 Building Serverless Web Applications
SMC302 Building Serverless Web Applications
 
ENT313 Deploying a Disaster Recovery Site on AWS: Minimal Cost with Maximum E...
ENT313 Deploying a Disaster Recovery Site on AWS: Minimal Cost with Maximum E...ENT313 Deploying a Disaster Recovery Site on AWS: Minimal Cost with Maximum E...
ENT313 Deploying a Disaster Recovery Site on AWS: Minimal Cost with Maximum E...
 
SRV301 Getting the most Bang for your buck with #EC2 #Winning
SRV301 Getting the most Bang for your buck with #EC2 #WinningSRV301 Getting the most Bang for your buck with #EC2 #Winning
SRV301 Getting the most Bang for your buck with #EC2 #Winning
 
Scaling on AWS for the First 10 Million Users (ARC206) | AWS re:Invent 2013
Scaling on AWS for the First 10 Million Users (ARC206) | AWS re:Invent 2013Scaling on AWS for the First 10 Million Users (ARC206) | AWS re:Invent 2013
Scaling on AWS for the First 10 Million Users (ARC206) | AWS re:Invent 2013
 
Automate Migration to AWS with Datapipe
Automate Migration to AWS with DatapipeAutomate Migration to AWS with Datapipe
Automate Migration to AWS with Datapipe
 
Introduction to Amazon EC2 Spot
Introduction to Amazon EC2 Spot Introduction to Amazon EC2 Spot
Introduction to Amazon EC2 Spot
 
BDT201 AWS Data Pipeline - AWS re: Invent 2012
BDT201 AWS Data Pipeline - AWS re: Invent 2012BDT201 AWS Data Pipeline - AWS re: Invent 2012
BDT201 AWS Data Pipeline - AWS re: Invent 2012
 
AWS Batch: Simplifying batch computing in the cloud
AWS Batch: Simplifying batch computing in the cloudAWS Batch: Simplifying batch computing in the cloud
AWS Batch: Simplifying batch computing in the cloud
 
A Data Culture with Embedded Analytics in Action
A Data Culture with Embedded Analytics in ActionA Data Culture with Embedded Analytics in Action
A Data Culture with Embedded Analytics in Action
 
Big Data and Architectural Patterns on AWS - Pop-up Loft Tel Aviv
Big Data and Architectural Patterns on AWS - Pop-up Loft Tel AvivBig Data and Architectural Patterns on AWS - Pop-up Loft Tel Aviv
Big Data and Architectural Patterns on AWS - Pop-up Loft Tel Aviv
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
 
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use CasesBDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
 
Full Stack Analytics on AWS - AWS Summit Cape Town 2017
Full Stack Analytics on AWS - AWS Summit Cape Town 2017 Full Stack Analytics on AWS - AWS Summit Cape Town 2017
Full Stack Analytics on AWS - AWS Summit Cape Town 2017
 
AWS March 2016 Webinar Series - Building Big Data Solutions with Amazon EMR a...
AWS March 2016 Webinar Series - Building Big Data Solutions with Amazon EMR a...AWS March 2016 Webinar Series - Building Big Data Solutions with Amazon EMR a...
AWS March 2016 Webinar Series - Building Big Data Solutions with Amazon EMR a...
 

En vedette

En vedette (20)

AWS Big Data combo
AWS Big Data comboAWS Big Data combo
AWS Big Data combo
 
Influencer marketing: Buying and Selling Audience Impressions
Influencer marketing: Buying and Selling Audience ImpressionsInfluencer marketing: Buying and Selling Audience Impressions
Influencer marketing: Buying and Selling Audience Impressions
 
Scale, baby, scale! (June 2016)
Scale, baby, scale! (June 2016)Scale, baby, scale! (June 2016)
Scale, baby, scale! (June 2016)
 
Machine Learning for everyone
Machine Learning for everyoneMachine Learning for everyone
Machine Learning for everyone
 
Deep Dive on Amazon S3 (May 2016)
Deep Dive on Amazon S3 (May 2016)Deep Dive on Amazon S3 (May 2016)
Deep Dive on Amazon S3 (May 2016)
 
A 60-minute tour of AWS Compute (November 2016)
A 60-minute tour of AWS Compute (November 2016)A 60-minute tour of AWS Compute (November 2016)
A 60-minute tour of AWS Compute (November 2016)
 
Hands-on with AWS IoT (November 2016)
Hands-on with AWS IoT (November 2016)Hands-on with AWS IoT (November 2016)
Hands-on with AWS IoT (November 2016)
 
AWS Machine Learning Workshp
AWS Machine Learning WorkshpAWS Machine Learning Workshp
AWS Machine Learning Workshp
 
Hands-on with AWS IoT
Hands-on with AWS IoTHands-on with AWS IoT
Hands-on with AWS IoT
 
Intro to AWS Machine Learning
Intro to AWS Machine LearningIntro to AWS Machine Learning
Intro to AWS Machine Learning
 
Deep Dive AWS CloudTrail
Deep Dive AWS CloudTrailDeep Dive AWS CloudTrail
Deep Dive AWS CloudTrail
 
Amazon Machine Learning: Empowering Developers to Build Smart Applications
Amazon Machine Learning: Empowering Developers to Build Smart ApplicationsAmazon Machine Learning: Empowering Developers to Build Smart Applications
Amazon Machine Learning: Empowering Developers to Build Smart Applications
 
Big Data Use Cases and Solutions in the AWS Cloud
Big Data Use Cases and Solutions in the AWS CloudBig Data Use Cases and Solutions in the AWS Cloud
Big Data Use Cases and Solutions in the AWS Cloud
 
AWS re:Invent 2016: Deep Dive on Amazon Relational Database Service (DAT305)
AWS re:Invent 2016: Deep Dive on Amazon Relational Database Service (DAT305)AWS re:Invent 2016: Deep Dive on Amazon Relational Database Service (DAT305)
AWS re:Invent 2016: Deep Dive on Amazon Relational Database Service (DAT305)
 
AWS re:Invent 2016: Machine Learning State of the Union Mini Con (MAC206)
AWS re:Invent 2016: Machine Learning State of the Union Mini Con (MAC206)AWS re:Invent 2016: Machine Learning State of the Union Mini Con (MAC206)
AWS re:Invent 2016: Machine Learning State of the Union Mini Con (MAC206)
 
IoT: it's all about Data!
IoT: it's all about Data!IoT: it's all about Data!
IoT: it's all about Data!
 
AWS re:Invent 2016: Predicting Customer Churn with Amazon Machine Learning (M...
AWS re:Invent 2016: Predicting Customer Churn with Amazon Machine Learning (M...AWS re:Invent 2016: Predicting Customer Churn with Amazon Machine Learning (M...
AWS re:Invent 2016: Predicting Customer Churn with Amazon Machine Learning (M...
 
Getting Started with Amazon Machine Learning
Getting Started with Amazon Machine LearningGetting Started with Amazon Machine Learning
Getting Started with Amazon Machine Learning
 
Deep Dive: Amazon Relational Database Service (March 2017)
Deep Dive: Amazon Relational Database Service (March 2017)Deep Dive: Amazon Relational Database Service (March 2017)
Deep Dive: Amazon Relational Database Service (March 2017)
 
Amazon Machine Learning
Amazon Machine LearningAmazon Machine Learning
Amazon Machine Learning
 

Similaire à Simplify Big Data with AWS

Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Amazon Web Services
 

Similaire à Simplify Big Data with AWS (20)

Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
 
Big Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesBig Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best Practices
 
Big Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesBig Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best Practices
 
Big Data and Analytics
Big Data and AnalyticsBig Data and Analytics
Big Data and Analytics
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
Builders' Day - Building Data Lakes for Analytics On AWS LC
Builders' Day - Building Data Lakes for Analytics On AWS LCBuilders' Day - Building Data Lakes for Analytics On AWS LC
Builders' Day - Building Data Lakes for Analytics On AWS LC
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
 
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
 
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
 
(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...
(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...
(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...
 
2016 Utah Cloud Summit: Big Data Architectural Patterns and Best Practices on...
2016 Utah Cloud Summit: Big Data Architectural Patterns and Best Practices on...2016 Utah Cloud Summit: Big Data Architectural Patterns and Best Practices on...
2016 Utah Cloud Summit: Big Data Architectural Patterns and Best Practices on...
 
Implementing a Data Lake
Implementing a Data LakeImplementing a Data Lake
Implementing a Data Lake
 
Building a Data Processing Pipeline on AWS - AWS Summit SG 2017
Building a Data Processing Pipeline on AWS - AWS Summit SG 2017Building a Data Processing Pipeline on AWS - AWS Summit SG 2017
Building a Data Processing Pipeline on AWS - AWS Summit SG 2017
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
 
Big Data Architectural Patterns
Big Data Architectural PatternsBig Data Architectural Patterns
Big Data Architectural Patterns
 
Deep Dive in Big Data
Deep Dive in Big DataDeep Dive in Big Data
Deep Dive in Big Data
 
Structured, Unstructured and Streaming Big Data on the AWS
Structured, Unstructured and Streaming Big Data on the AWSStructured, Unstructured and Streaming Big Data on the AWS
Structured, Unstructured and Streaming Big Data on the AWS
 

Plus de Julien SIMON

Plus de Julien SIMON (20)

An introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging FaceAn introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging Face
 
Reinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face TransformersReinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face Transformers
 
Building NLP applications with Transformers
Building NLP applications with TransformersBuilding NLP applications with Transformers
Building NLP applications with Transformers
 
Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)
 
Starting your AI/ML project right (May 2020)
Starting your AI/ML project right (May 2020)Starting your AI/ML project right (May 2020)
Starting your AI/ML project right (May 2020)
 
Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)
 
An Introduction to Generative Adversarial Networks (April 2020)
An Introduction to Generative Adversarial Networks (April 2020)An Introduction to Generative Adversarial Networks (April 2020)
An Introduction to Generative Adversarial Networks (April 2020)
 
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
 
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
 
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
 
A pragmatic introduction to natural language processing models (October 2019)
A pragmatic introduction to natural language processing models (October 2019)A pragmatic introduction to natural language processing models (October 2019)
A pragmatic introduction to natural language processing models (October 2019)
 
Building smart applications with AWS AI services (October 2019)
Building smart applications with AWS AI services (October 2019)Building smart applications with AWS AI services (October 2019)
Building smart applications with AWS AI services (October 2019)
 
Build, train and deploy ML models with SageMaker (October 2019)
Build, train and deploy ML models with SageMaker (October 2019)Build, train and deploy ML models with SageMaker (October 2019)
Build, train and deploy ML models with SageMaker (October 2019)
 
The Future of AI (September 2019)
The Future of AI (September 2019)The Future of AI (September 2019)
The Future of AI (September 2019)
 
Building Machine Learning Inference Pipelines at Scale (July 2019)
Building Machine Learning Inference Pipelines at Scale (July 2019)Building Machine Learning Inference Pipelines at Scale (July 2019)
Building Machine Learning Inference Pipelines at Scale (July 2019)
 
Train and Deploy Machine Learning Workloads with AWS Container Services (July...
Train and Deploy Machine Learning Workloads with AWS Container Services (July...Train and Deploy Machine Learning Workloads with AWS Container Services (July...
Train and Deploy Machine Learning Workloads with AWS Container Services (July...
 
Optimize your Machine Learning Workloads on AWS (July 2019)
Optimize your Machine Learning Workloads on AWS (July 2019)Optimize your Machine Learning Workloads on AWS (July 2019)
Optimize your Machine Learning Workloads on AWS (July 2019)
 
Deep Learning on Amazon Sagemaker (July 2019)
Deep Learning on Amazon Sagemaker (July 2019)Deep Learning on Amazon Sagemaker (July 2019)
Deep Learning on Amazon Sagemaker (July 2019)
 
Automate your Amazon SageMaker Workflows (July 2019)
Automate your Amazon SageMaker Workflows (July 2019)Automate your Amazon SageMaker Workflows (July 2019)
Automate your Amazon SageMaker Workflows (July 2019)
 
Build, train and deploy ML models with Amazon SageMaker (May 2019)
Build, train and deploy ML models with Amazon SageMaker (May 2019)Build, train and deploy ML models with Amazon SageMaker (May 2019)
Build, train and deploy ML models with Amazon SageMaker (May 2019)
 

Dernier

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Dernier (20)

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

Simplify Big Data with AWS

  • 1. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. @julsimon Webinar “Salon du Big Data” 02/03/2016 Simplify Big Data with AWS Julien Simon, Principal Technical Evangelist
  • 2. Simplify Big Data Processing ingest / collect store process / analyze consume / visualize Time to Answer (Latency) Throughput Cost
  • 4. Types of Data •  Transactional •  Database reads & writes (OLTP) •  Cache •  Search •  Logs •  Streams •  File •  Log files (/var/log) •  Log collectors & frameworks •  Stream •  Log records •  Sensors & IoT data Database File Storage Stream Storage A iOS Android Web Apps Logstash Logging IoT Applications Transactional Data File Data Stream Data Mobile Apps Search Data Search Collect Store Logging IoT
  • 8. Database + Search Tier A Amazon
 S3 iOS Android Web Apps Logstash Amazon RDS / Aurora Amazon DynamoDB Amazon ES Amazon
 S3 Apache Kafka Amazon
 Glacier Amazon
 Kinesis Amazon
 DynamoDB Amazon ElastiCache SearchSQLNoSQLCache StreamStorage FileStorage Transactional Data File Data Stream Data Mobile Apps Search Data Collect Store ü Logging IoT Applications
  • 9. Database + Search Tier Anti-pattern RDBMS Database + Search Tier Applications
  • 10. What Data Store Should I Use? Amazon ElastiCache Amazon DynamoDB Amazon Aurora Amazon Elasticsearch Amazon EMR (HDFS) Amazon S3 Amazon Glacier Average latency ms ms ms, sec ms,sec sec,min,hrs ms,sec,min (~ size) hrs Data volume GB GB–TBs (no limit) GB–TB (64 TB Max) GB–TB GB–PB (~nodes) MB–PB (no limit) GB–PB (no limit) Item size B-KB KB (400 KB max) KB (64 KB) KB (1 MB max) MB-GB KB-GB (5 TB max) GB (40 TB max) Request rate High - Very High Very High (no limit) High High Low – Very High Low – Very High (no limit) Very Low Storage cost GB/month $$ ¢¢ ¢¢ ¢¢ ¢ ¢ ¢/10 Durability Low - Moderate Very High Very High High High Very High Very High Hot Data Warm Data Cold Data Hot Data Warm Data Cold Data
  • 12. AnalyzeA iOS Android Web Apps Logstash Amazon RDS Amazon DynamoDB Amazon ES Amazon
 S3 Apache Kafka Amazon
 Glacier Amazon
 Kinesis Amazon
 DynamoDB Amazon Redshift Impala Pig Amazon ML Streaming Amazon
 Kinesis AWS Lambda AmazonElasticMapReduce Amazon ElastiCache SearchSQLNoSQLCache StreamProcessing Batch Interactive Logging StreamStorage IoT Applications FileStorage Hot Cold Warm Hot Hot ML Transactional Data File Data Stream Data Mobile Apps Search Data Collect Store Analyze ü  ü 
  • 13. Analysis Tools and Frameworks Machine Learning •  Mahout, Spark ML, Amazon ML Interactive Analytics •  Amazon Redshift, Presto, Impala, Spark Batch Processing •  MapReduce, Hive, Pig, Spark Stream Processing •  Micro-batch: Spark Streaming, KCL, Hive, Pig •  Real-time: Storm, AWS Lambda, KCL Amazon Redshift Impala Pig Amazon Machine Learning Streaming Amazon
 Kinesis AWS Lambda AmazonElasticMapReduce StreamProcessing Batch Interactive ML Analyze
  • 14. What Data Processing Technology Should I Use? Amazon Redshift Impala Presto Spark Hive Query Latency Low Low Low Low Medium (Tez) – High (MapReduce) Durability High High High High High Data Volume 1.6 PB Max ~Nodes ~Nodes ~Nodes ~Nodes Managed Yes Yes (EMR) Yes (EMR) Yes (EMR) Yes (EMR) Storage Native HDFS / S3 HDFS / S3 HDFS / S3 HDFS / S3 SQL Compatibility High Medium High Low (SparkSQL) Medium (HQL) Query Latency High (Low is better) Medium
  • 16. Collect Store Analyze Consume A iOS Android Web Apps Logstash Amazon RDS Amazon DynamoDB Amazon ES Amazon
 S3 Apache Kafka Amazon
 Glacier Amazon
 Kinesis Amazon
 DynamoDB Amazon Redshift Impala Pig Amazon ML Streaming Amazon
 Kinesis AWS Lambda AmazonElasticMapReduce Amazon ElastiCache SearchSQLNoSQLCache StreamProcessing Batch Interactive Logging StreamStorage IoT Applications FileStorage Analysis&Visualization Hot Cold Warm Hot Slow Hot ML Fast Fast Transactional Data File Data Stream Data Notebooks Predictions Apps & APIs Mobile Apps IDE Search Data ETL Amazon QuickSight
  • 17. Consume •  Predictions •  Analysis and Visualization •  Notebooks •  IDE •  Applications & API Consume Analysis&Visualization Amazon QuickSight Notebooks Predictions Apps & APIs IDE Store Analyze ConsumeETL Business users Data Scientist, Developers
  • 18. Putting It All Together
  • 19. Collect Store Analyze Consume A iOS Android Web Apps Logstash Amazon RDS Amazon DynamoDB Amazon ES Amazon
 S3 Apache Kafka Amazon
 Glacier Amazon
 Kinesis Amazon
 DynamoDB Amazon Redshift Impala Pig Amazon ML Streaming Amazon
 Kinesis AWS Lambda AmazonElasticMapReduce Amazon ElastiCache SearchSQLNoSQLCache StreamProcessing Batch Interactive Logging StreamStorage IoT Applications FileStorage Analysis&Visualization Hot Cold Warm Hot Slow Hot ML Fast Fast Amazon QuickSight Transactional Data File Data Stream Data Notebooks Predictions Apps & APIs Mobile Apps IDE Search Data ETL
  • 20. Interactive & Batch Analytics Producer Amazon S3 Amazon EMR Hive Pig Spark Amazon ML process store Consume Amazon Redshift Amazon EMR Presto Impala Spark Batch Interactive Batch Prediction Real-time Prediction
  • 22. Batch Layer Amazon Kinesis data process store Lambda Architecture Amazon Kinesis S3 Connector Amazon S3 A p p l i c a t i o n s Amazon Redshift Amazon EMR Presto Hive Pig Spark answer Speed Layer answer Serving Layer Amazon ElastiCache Amazon DynamoDB Amazon RDS Amazon ES answer Amazon ML KCL AWS Lambda Spark Streaming Storm
  • 23. Summary •  Use the right tool for the job •  Latency, throughput, access patterns •  Leverage AWS managed services •  No/low admin •  Be cost conscious •  Big data ≠ big cost
  • 24. Thank you. Let’s keep in touch! @aws_actus @julsimon facebook.com/groups/AWSFrance/ AWS User Groups in Paris, Lyon, Nantes, Lille & Rennes (meetup.com) March 7-8 AWS Summit May 31st April 20-22 March 23-24 April 6-7 (Lyon) April 25 March 16
  • 25. Customer references & further reading •  Amazon Kinesis: https://aws.amazon.com/solutions/case-studies/supercell/ •  Amazon DynamoDB: https://aws.amazon.com/fr/solutions/case-studies/adroll/ •  Amazon S3 / Glacier: https://aws.amazon.com/fr/solutions/case-studies/soundcloud/ •  Amazon EMR: https://aws.amazon.com/fr/solutions/case-studies/yelp/ •  Amazon Aurora: https://aws.amazon.com/fr/rds/aurora/testimonials/ •  Amazon Redshift: https://aws.amazon.com/fr/solutions/case-studies/financial-times/ •  AWS Lambda: https://aws.amazon.com/fr/solutions/case-studies/nordstrom/ •  Many more case studies at https://aws.amazon.com/fr/solutions/case-studies/big-data/ •  Whitepaper: “Big Data Analytics Options on AWS” : http://d0.awsstatic.com/whitepapers/Big_Data_Analytics_Options_on_AWS.pdf •  AWS Big Data blog: https://blogs.aws.amazon.com/bigdata