SlideShare une entreprise Scribd logo
1  sur  44
Télécharger pour lire hors ligne
Martin Schade, Solutions Architect, Amazon Web Services
January 2016
Big Data Architectural
Patterns and Best Practices
on AWS
What to Expect from the Session
Big data challenges
How to simplify big data processing
What technologies should you use?
• Why?
• How?
Reference architecture
Design patterns
Ever Increasing Big Data
Volume
Velocity
Variety
Big Data Evolution
Batch
Report
Real-time
Alerts
Prediction
Forecast
Plethora of Tools
Amazon
Glacier
S3 DynamoDB
RDS
EMR
Amazon
Redshift
Data PipelineAmazon Kinesis
Cassandra
CloudSearch
Kinesis-
enabled
app
Lambda ML
SQS
ElastiCache
DynamoDB
Streams
Is there a reference architecture ?
What tools should I use ?
How ?
Why ?
Architectural Principles
• Decoupled “data bus”
• Data → Store → Process → Answers
• Use the right tool for the job
• Data structure, latency, throughput, access patterns, cost
• Use Lambda architecture ideas
• Immutable (append-only) log, batch/speed/serving layer
• Leverage AWS managed services
• No/low admin
• Big data ≠ big cost
Simplify Big Data Processing
ingest /
collect
store process /
analyze
consume /
visualize
Time to Answer (Latency)
Throughput
Cost
Collect /
Ingest
Types of Data
• Transactional
• Database reads & writes (OLTP)
• Cache
• Search
• Logs
• Streams
• File
• Log files (/var/log)
• Log collectors & frameworks
• Stream
• Log records
• Sensors & IoT data
Database
File
Storage
Stream
Storage
A
iOS Android
Web Apps
Logstash
LoggingIoTApplications
Transactional Data
File Data
Stream Data
Mobile
Apps
Search Data
Search
Collect Store
LoggingIoT
Store
Stream
Storage
A
iOS Android
Web Apps
Logstash
Amazon
RDS
Amazon
DynamoDB
Amazon
ES
Amazon
S3
Apache
Kafka
Amazon
Glacier
Amazon
Kinesis
Amazon
DynamoDB
Amazon
ElastiCache
SearchSQLNoSQLCacheStreamStorageFileStorage
Transactional Data
File Data
Stream Data
Mobile
Apps
Search Data
Database
File
Storage
Search
Collect Store
LoggingIoTApplications

Which stream storage should I use?
Amazon
Kinesis
DynamoDB
Streams
Amazon SQS
Amazon SNS
Kafka
Managed Yes Yes Yes No
Ordering Yes Yes No Yes
Delivery at-least-once exactly-once at-least-once at-least-once
Lifetime 7 days 24 hours 14 days Configurable
Replication 3 AZ 3 AZ 3 AZ Configurable
Throughput No Limit No Limit No Limit ~ Nodes
Parallel Clients Yes Yes No (SQS) Yes
MapReduce Yes Yes No Yes
Record size 1MB 400KB 256KB Configurable
Cost Low Higher(table cost) Low-Medium Low (+admin)
File
Storage
A
iOS Android
Web Apps
Logstash
Amazon
RDS
Amazon
DynamoDB
Amazon
ES
Amazon
S3
Apache
Kafka
Amazon
Glacier
Amazon
Kinesis
Amazon
DynamoDB
Amazon
ElastiCache
SearchSQLNoSQLCacheStreamStorageFileStorage
Transactional Data
File Data
Stream Data
Mobile
Apps
Search Data
Database
Search
Collect Store
LoggingIoTApplications

Why Is Amazon S3 Good for Big Data?
• Natively supported by big data frameworks (Spark, Hive, Presto, etc.)
• No need to run compute clusters for storage (unlike HDFS)
• Can run transient Hadoop clusters & Amazon EC2 Spot instances
• Multiple distinct (Spark, Hive, Presto) clusters can use the same data
• Unlimited number of objects
• Very high bandwidth – no aggregate throughput limit
• Highly available – can tolerate AZ failure
• Designed for 99.999999999% durability
• Tired-storage (Standard, IA, Amazon Glacier) via life-cycle policy
• Secure – SSL, client/server-side encryption at rest
• Low cost
What about HDFS & Amazon Glacier?
• Use HDFS for very frequently
accessed (hot) data
• Use Amazon S3 Standard for
frequently accessed data
• Use Amazon S3 Standard –
IA for infrequently accessed
data
• Use Amazon Glacier for
archiving cold data
Database +
Search
Tier
A
iOS Android
Web Apps
Logstash
Amazon
RDS
Amazon
DynamoDB
Amazon
ES
Amazon
S3
Apache
Kafka
Amazon
Glacier
Amazon
Kinesis
Amazon
DynamoDB
Amazon
ElastiCache
SearchSQLNoSQLCacheStreamStorageFileStorage
Transactional Data
File Data
Stream Data
Mobile
Apps
Search Data
Collect Store

Database + Search Tier Anti-pattern
Database + Search Tier
Best Practice — Use the Right Tool for the Job
Data Tier
Search
Amazon
Elasticsearch
Service
Amazon
CloudSearch
Cache
Redis
Memcached
SQL
Amazon Aurora
MySQL
PostgreSQL
Oracle
SQL Server
NoSQL
Cassandra
Amazon
DynamoDB
HBase
MongoDB
Database + Search Tier
What Data Store Should I Use?
• Data structure → Fixed schema, JSON, key-value
• Access patterns → Store data in the format you will
access it
• Data / access characteristics → Hot, warm, cold
• Cost → Right cost
Data Structure and Access Patterns
Access Patterns What to use?
Put/Get (Key, Value) Cache, NoSQL
Simple relationships → 1:N, M:N NoSQL
Cross table joins, transaction, SQL SQL
Faceting, Search Search
Data Structure What to use?
Fixed schema SQL, NoSQL
Schema-free (JSON) NoSQL, Search
(Key, Value) Cache, NoSQL
Process /
Analyze
AnalyzeA
iOS Android
Web Apps
Logstash
Amazon
RDS
Amazon
DynamoDB
Amazon
ES
Amazon
S3
Apache
Kafka
Amazon
Glacier
Amazon
Kinesis
Amazon
DynamoDB
Amazon
Redshift
Impala
Pig
Amazon ML
Streaming
Amazon
Kinesis
AWS
Lambda
AmazonElasticMapReduce
Amazon
ElastiCache
SearchSQLNoSQLCache
StreamProcessingBatchInteractive
Logging
StreamStorage
IoTApplications
FileStorage
Hot
Cold
Warm
Hot
Hot
ML
Transactional Data
File Data
Stream Data
Mobile
Apps
Search Data
Collect Store Analyze
 
Process / Analyze
Analysis of data is a process of inspecting, cleaning,
transforming, and modeling data with the goal of discovering
useful information, suggesting conclusions, and supporting
decision-making.
Examples
• Interactive dashboards → Interactive analytics
• Daily/weekly/monthly reports → Batch analytics
• Billing/fraud alerts, 1 minute metrics → Real-time analytics
• Sentiment analysis, prediction models → Machine learning
Analysis Tools and Frameworks
Machine Learning
• Mahout, Spark ML, Amazon ML
Interactive Analytics
• Amazon Redshift, Presto, Impala, Spark
Batch Processing
• MapReduce, Hive, Pig, Spark
Stream Processing
• Micro-batch: Spark Streaming, KCL, Hive, Pig
• Real-time: Storm, AWS Lambda, KCL
Amazon
Redshift
Impala
Pig
Amazon Machine
Learning
Streaming
Amazon
Kinesis
AWS
Lambda
AmazonElasticMapReduce
StreamProcessingBatchInteractiveML
Analyze
What Stream Processing Technology Should I Use?
Spark Streaming Apache Storm Amazon Kinesis
Client Library
AWS Lambda Amazon EMR (Hive,
Pig)
Scale /
Throughput
~ Nodes ~ Nodes ~ Nodes Automatic ~ Nodes
Batch or Real-
time
Real-time Real-time Real-time Real-time Batch
Manageability Yes (Amazon EMR) Do it yourself Amazon EC2 +
Auto Scaling
AWS managed Yes (Amazon EMR)
Fault Tolerance Single AZ Configurable Multi-AZ Multi-AZ Single AZ
Programming
languages
Java, Python, Scala Any language
via Thrift
Java, via
MultiLangDaemon (
.Net, Python, Ruby,
Node.js)
Node.js, Java Hive, Pig, Streaming
languages
High
What Data Processing Technology Should I Use?
Amazon
Redshift
Impala Presto Spark Hive
Query
Latency
Low Low Low Low Medium (Tez) –
High (MapReduce)
Durability High High High High High
Data Volume 2 PB
Max
~Nodes ~Nodes ~Nodes ~Nodes
Managed Yes Yes (EMR) Yes (EMR) Yes (EMR) Yes (EMR)
Storage Native HDFS / S3A* HDFS / S3 HDFS / S3 HDFS / S3
SQL
Compatibility
High Medium High Low (SparkSQL) Medium (HQL)
HighMedium
What about ETL?
Store Analyze
https://aws.amazon.com/big-data/partner-solutions/
ETL
Consume /
Visualize
Collect Store Analyze Consume
A
iOS Android
Web Apps
Logstash
Amazon
RDS
Amazon
DynamoDB
Amazon
ES
Amazon
S3
Apache
Kafka
Amazon
Glacier
Amazon
Kinesis
Amazon
DynamoDB
Amazon
Redshift
Impala
Pig
Amazon ML
Streaming
Amazon
Kinesis
AWS
Lambda
AmazonElasticMapReduce
Amazon
ElastiCache
SearchSQLNoSQLCache
StreamProcessingBatchInteractive
Logging
StreamStorage
IoTApplications
FileStorage
Analysis&Visualization
Hot
Cold
Warm
Hot
Slow
Hot
ML
Fast
Fast
Transactional Data
File Data
Stream Data
Notebooks
Predictions
Apps & APIs
Mobile
Apps
IDE
Search Data
ETL
Amazon
QuickSight
Consume
• Predictions
• Analysis and Visualization
• Notebooks
• IDE
• Applications & API
Consume
Analysis&Visualization
Amazon
QuickSight
Notebooks
Predictions
Apps & APIs
IDE
Store Analyze ConsumeETL
Business
users
Data Scientist,
Developers
Putting It All Together
Collect Store Analyze Consume
A
iOS Android
Web Apps
Logstash
Amazon
RDS
Amazon
DynamoDB
Amazon
ES
Amazon
S3
Apache
Kafka
Amazon
Glacier
Amazon
Kinesis
Amazon
DynamoDB
Amazon
Redshift
Impala
Pig
Amazon ML
Streaming
Amazon
Kinesis
AWS
Lambda
AmazonElasticMapReduce
Amazon
ElastiCache
SearchSQLNoSQLCache
StreamProcessingBatchInteractive
Logging
StreamStorage
IoTApplications
FileStorage
Analysis&Visualization
Hot
Cold
Warm
Hot
Slow
Hot
ML
Fast
Fast
Amazon
QuickSight
Transactional Data
File Data
Stream Data
Notebooks
Predictions
Apps & APIs
Mobile
Apps
IDE
Search Data
ETL
Reference Architecture
Thank you!
Have Fun!
Design Patterns
Multi-Stage Decoupled “Data Bus”
• Multiple stages
• Storage decoupled from processing
Store Process Store Process
process
store
Multiple Processing Applications (or
Connectors) Can Read from or Write to Multiple
Data Stores
Amazon
Kinesis
AWS
Lambda
Amazon
DynamoDB
Amazon
Kinesis S3
Connector
Amazon S3
process
store
Processing Frameworks (KCL, Storm, Hive,
Spark, etc.) Could Read from Multiple Data
Stores
Amazon
Kinesis
AWS
Lambda
Amazon
S3
Amazon
DynamoDB
Hive SparkStorm
Amazon
Kinesis S3
Connector
process
store
Real-time Analytics
Producer
Apache
Kafka
KCL
AWS Lambda
Spark
Streaming
Apache
Storm
Amazon
SNS
Amazon
ML
Notifications
Amazon
ElastiCache
(Redis)
Amazon
DynamoDB
Amazon
RDS
Amazon
ES
Alert
App state
Real-time Prediction
KPI
process
store
DynamoDB
Streams
Amazon
Kinesis
Interactive &
Batch
Analytics
Producer Amazon S3
Amazon EMR
Hive
Pig
Spark
Amazon
ML
process
store
Consume
Amazon
Redshift
Amazon EMR
Presto
Impala
Spark
Batch
Interactive
Batch Prediction
Real-time Prediction
Batch Layer
Amazon
Kinesis
data
process
store
Lambda Architecture
Amazon
Kinesis S3
Connector
Amazon S3
A
p
p
l
i
c
a
t
i
o
n
s
Amazon
Redshift
Amazon EMR
Presto
Hive
Pig
Spark
answer
Speed Layer
answer
Serving
Layer
Amazon
ElastiCache
Amazon
DynamoDB
Amazon
RDS
Amazon
ES
answer
Amazon
ML
KCL
AWS Lambda
Spark Streaming
Storm
Summary
• Build decoupled “data bus”
• Data → Store ↔ Process → Answers
• Use the right tool for the job
• Latency, throughput, access patterns
• Use Lambda architecture ideas
• Immutable (append-only) log, batch/speed/serving layer
• Leverage AWS managed services
• No/low admin
• Be cost conscious
• Big data ≠ big cost
Remember to complete
your evaluations!
Thank you!

Contenu connexe

Tendances

Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
AWS re:Invent 2016: Achieving Agility by Following Well-Architected Framework...
AWS re:Invent 2016: Achieving Agility by Following Well-Architected Framework...AWS re:Invent 2016: Achieving Agility by Following Well-Architected Framework...
AWS re:Invent 2016: Achieving Agility by Following Well-Architected Framework...Amazon Web Services
 
(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data
(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data
(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big DataAmazon Web Services
 
2016 Utah Cloud Summit: AWS S3
2016 Utah Cloud Summit: AWS S32016 Utah Cloud Summit: AWS S3
2016 Utah Cloud Summit: AWS S31Strategy
 
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...Amazon Web Services
 
ENT306 Migrating Large Scale Data Sets to the Cloud
ENT306 Migrating Large Scale Data Sets to the CloudENT306 Migrating Large Scale Data Sets to the Cloud
ENT306 Migrating Large Scale Data Sets to the CloudAmazon Web Services
 
BDA402 Deep Dive: Log analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log analytics with Amazon Elasticsearch ServiceBDA402 Deep Dive: Log analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log analytics with Amazon Elasticsearch ServiceAmazon Web Services
 
Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
Deep Dive on Object Storage: Amazon S3 and Amazon GlacierDeep Dive on Object Storage: Amazon S3 and Amazon Glacier
Deep Dive on Object Storage: Amazon S3 and Amazon GlacierAdrian Hornsby
 
(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...
(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...
(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...Amazon Web Services
 
ENT316 Keeping Pace With The Cloud: Managing and Optimizing as You Scale
ENT316 Keeping Pace With The Cloud: Managing and Optimizing as You ScaleENT316 Keeping Pace With The Cloud: Managing and Optimizing as You Scale
ENT316 Keeping Pace With The Cloud: Managing and Optimizing as You ScaleAmazon Web Services
 
Data Storage for the Long Haul: Compliance and Archive
Data Storage for the Long Haul: Compliance and ArchiveData Storage for the Long Haul: Compliance and Archive
Data Storage for the Long Haul: Compliance and ArchiveAmazon Web Services
 
AWS re:Invent 2016: Running Lean Architectures: How to Optimize for Cost Effi...
AWS re:Invent 2016: Running Lean Architectures: How to Optimize for Cost Effi...AWS re:Invent 2016: Running Lean Architectures: How to Optimize for Cost Effi...
AWS re:Invent 2016: Running Lean Architectures: How to Optimize for Cost Effi...Amazon Web Services
 
Getting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWSGetting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWSAmazon Web Services
 
SRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDBSRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDBAmazon Web Services
 
AWS Data Transfer Services: Data Ingest Strategies Into the AWS Cloud
AWS Data Transfer Services: Data Ingest Strategies Into the AWS CloudAWS Data Transfer Services: Data Ingest Strategies Into the AWS Cloud
AWS Data Transfer Services: Data Ingest Strategies Into the AWS CloudAmazon Web Services
 
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...Amazon Web Services
 
AWS Cloud Kata 2014 | Jakarta - 2-3 Big Data
 AWS Cloud Kata 2014 | Jakarta - 2-3 Big Data AWS Cloud Kata 2014 | Jakarta - 2-3 Big Data
AWS Cloud Kata 2014 | Jakarta - 2-3 Big DataAmazon Web Services
 
AWS re:Invent 2016: Workshop: Addressing Your Business Needs with AWS (ARC210)
AWS re:Invent 2016: Workshop: Addressing Your Business Needs with AWS (ARC210)AWS re:Invent 2016: Workshop: Addressing Your Business Needs with AWS (ARC210)
AWS re:Invent 2016: Workshop: Addressing Your Business Needs with AWS (ARC210)Amazon Web Services
 
Cost Optimising Your Architecture Practical Design Steps for Developer Saving...
Cost Optimising Your Architecture Practical Design Steps for Developer Saving...Cost Optimising Your Architecture Practical Design Steps for Developer Saving...
Cost Optimising Your Architecture Practical Design Steps for Developer Saving...Amazon Web Services
 

Tendances (20)

Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
AWS re:Invent 2016: Achieving Agility by Following Well-Architected Framework...
AWS re:Invent 2016: Achieving Agility by Following Well-Architected Framework...AWS re:Invent 2016: Achieving Agility by Following Well-Architected Framework...
AWS re:Invent 2016: Achieving Agility by Following Well-Architected Framework...
 
(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data
(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data
(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data
 
2016 Utah Cloud Summit: AWS S3
2016 Utah Cloud Summit: AWS S32016 Utah Cloud Summit: AWS S3
2016 Utah Cloud Summit: AWS S3
 
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
 
ENT306 Migrating Large Scale Data Sets to the Cloud
ENT306 Migrating Large Scale Data Sets to the CloudENT306 Migrating Large Scale Data Sets to the Cloud
ENT306 Migrating Large Scale Data Sets to the Cloud
 
BDA402 Deep Dive: Log analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log analytics with Amazon Elasticsearch ServiceBDA402 Deep Dive: Log analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log analytics with Amazon Elasticsearch Service
 
Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
Deep Dive on Object Storage: Amazon S3 and Amazon GlacierDeep Dive on Object Storage: Amazon S3 and Amazon Glacier
Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
 
Big Data Architectural Patterns
Big Data Architectural PatternsBig Data Architectural Patterns
Big Data Architectural Patterns
 
(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...
(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...
(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...
 
ENT316 Keeping Pace With The Cloud: Managing and Optimizing as You Scale
ENT316 Keeping Pace With The Cloud: Managing and Optimizing as You ScaleENT316 Keeping Pace With The Cloud: Managing and Optimizing as You Scale
ENT316 Keeping Pace With The Cloud: Managing and Optimizing as You Scale
 
Data Storage for the Long Haul: Compliance and Archive
Data Storage for the Long Haul: Compliance and ArchiveData Storage for the Long Haul: Compliance and Archive
Data Storage for the Long Haul: Compliance and Archive
 
AWS re:Invent 2016: Running Lean Architectures: How to Optimize for Cost Effi...
AWS re:Invent 2016: Running Lean Architectures: How to Optimize for Cost Effi...AWS re:Invent 2016: Running Lean Architectures: How to Optimize for Cost Effi...
AWS re:Invent 2016: Running Lean Architectures: How to Optimize for Cost Effi...
 
Getting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWSGetting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWS
 
SRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDBSRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDB
 
AWS Data Transfer Services: Data Ingest Strategies Into the AWS Cloud
AWS Data Transfer Services: Data Ingest Strategies Into the AWS CloudAWS Data Transfer Services: Data Ingest Strategies Into the AWS Cloud
AWS Data Transfer Services: Data Ingest Strategies Into the AWS Cloud
 
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
 
AWS Cloud Kata 2014 | Jakarta - 2-3 Big Data
 AWS Cloud Kata 2014 | Jakarta - 2-3 Big Data AWS Cloud Kata 2014 | Jakarta - 2-3 Big Data
AWS Cloud Kata 2014 | Jakarta - 2-3 Big Data
 
AWS re:Invent 2016: Workshop: Addressing Your Business Needs with AWS (ARC210)
AWS re:Invent 2016: Workshop: Addressing Your Business Needs with AWS (ARC210)AWS re:Invent 2016: Workshop: Addressing Your Business Needs with AWS (ARC210)
AWS re:Invent 2016: Workshop: Addressing Your Business Needs with AWS (ARC210)
 
Cost Optimising Your Architecture Practical Design Steps for Developer Saving...
Cost Optimising Your Architecture Practical Design Steps for Developer Saving...Cost Optimising Your Architecture Practical Design Steps for Developer Saving...
Cost Optimising Your Architecture Practical Design Steps for Developer Saving...
 

En vedette

C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...
C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...
C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...DataStax Academy
 
Parquet and impala overview external
Parquet and impala overview externalParquet and impala overview external
Parquet and impala overview externalmattlieber
 
Pinal Power, Turning Arizona’s Green Waste into Renewable, Reliable Baseload ...
Pinal Power, Turning Arizona’s Green Waste into Renewable, Reliable Baseload ...Pinal Power, Turning Arizona’s Green Waste into Renewable, Reliable Baseload ...
Pinal Power, Turning Arizona’s Green Waste into Renewable, Reliable Baseload ...City of Maricopa
 
Steinschaler Genussgarten
Steinschaler GenussgartenSteinschaler Genussgarten
Steinschaler GenussgartenJohann Weiss
 
Pro watch max pro class ppt5
Pro watch max pro class ppt5Pro watch max pro class ppt5
Pro watch max pro class ppt5quientravels
 
Prelaunch StartupDorf Keynote @garagebilk 09/25/13 - Düsseldorf, NRW, Germany
Prelaunch StartupDorf Keynote @garagebilk 09/25/13 - Düsseldorf, NRW, GermanyPrelaunch StartupDorf Keynote @garagebilk 09/25/13 - Düsseldorf, NRW, Germany
Prelaunch StartupDorf Keynote @garagebilk 09/25/13 - Düsseldorf, NRW, GermanyStartupDorf e.V.
 
Teaching object-oriented programming in primary education. The case of the Al...
Teaching object-oriented programming in primary education. The case of the Al...Teaching object-oriented programming in primary education. The case of the Al...
Teaching object-oriented programming in primary education. The case of the Al...Vasilis Sotiroudas
 
Para el sector panadero es muy dificil introducirse en el mercado exterior (1...
Para el sector panadero es muy dificil introducirse en el mercado exterior (1...Para el sector panadero es muy dificil introducirse en el mercado exterior (1...
Para el sector panadero es muy dificil introducirse en el mercado exterior (1...EAE Business School
 
World Report on Disability
World Report on DisabilityWorld Report on Disability
World Report on Disabilitybuxybiz
 
Sentidos en torno a la “obligatoriedad” de la educación secundaria
Sentidos en torno a la “obligatoriedad” de la educación secundariaSentidos en torno a la “obligatoriedad” de la educación secundaria
Sentidos en torno a la “obligatoriedad” de la educación secundariaPolitica29
 
Processing Landsat 8 Multi-Spectral Images with GRASS Tools & the potential o...
Processing Landsat 8 Multi-Spectral Images with GRASS Tools & the potential o...Processing Landsat 8 Multi-Spectral Images with GRASS Tools & the potential o...
Processing Landsat 8 Multi-Spectral Images with GRASS Tools & the potential o...Shaun Lewis
 
Reportaje "Talento y conocimieto emprendedor en 18 minutos"
Reportaje "Talento y conocimieto emprendedor en 18 minutos"Reportaje "Talento y conocimieto emprendedor en 18 minutos"
Reportaje "Talento y conocimieto emprendedor en 18 minutos"José Carlos Martín Marco
 
03.presentacion corporativa octubre2013
03.presentacion corporativa octubre201303.presentacion corporativa octubre2013
03.presentacion corporativa octubre20134andgo
 
Celulas del sistema nervioso central y periferico
Celulas del sistema nervioso central y perifericoCelulas del sistema nervioso central y periferico
Celulas del sistema nervioso central y perifericoShirly Gazabon Cantillo
 
TaskForce #SimpleSales Presentación
TaskForce #SimpleSales PresentaciónTaskForce #SimpleSales Presentación
TaskForce #SimpleSales PresentaciónLuis Felipe Meneses
 
Catalogo Iguana Med
Catalogo Iguana MedCatalogo Iguana Med
Catalogo Iguana Medj_albertovb
 
El Modelado del_Relieve_Sulma_Neri_Bertha_Teovaldo_Herlinda
El Modelado del_Relieve_Sulma_Neri_Bertha_Teovaldo_HerlindaEl Modelado del_Relieve_Sulma_Neri_Bertha_Teovaldo_Herlinda
El Modelado del_Relieve_Sulma_Neri_Bertha_Teovaldo_HerlindaFRANCISCO VER APRADO
 

En vedette (20)

C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...
C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...
C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...
 
Parquet and impala overview external
Parquet and impala overview externalParquet and impala overview external
Parquet and impala overview external
 
Pinal Power, Turning Arizona’s Green Waste into Renewable, Reliable Baseload ...
Pinal Power, Turning Arizona’s Green Waste into Renewable, Reliable Baseload ...Pinal Power, Turning Arizona’s Green Waste into Renewable, Reliable Baseload ...
Pinal Power, Turning Arizona’s Green Waste into Renewable, Reliable Baseload ...
 
Steinschaler Genussgarten
Steinschaler GenussgartenSteinschaler Genussgarten
Steinschaler Genussgarten
 
El principito
El principito El principito
El principito
 
Pro watch max pro class ppt5
Pro watch max pro class ppt5Pro watch max pro class ppt5
Pro watch max pro class ppt5
 
Prelaunch StartupDorf Keynote @garagebilk 09/25/13 - Düsseldorf, NRW, Germany
Prelaunch StartupDorf Keynote @garagebilk 09/25/13 - Düsseldorf, NRW, GermanyPrelaunch StartupDorf Keynote @garagebilk 09/25/13 - Düsseldorf, NRW, Germany
Prelaunch StartupDorf Keynote @garagebilk 09/25/13 - Düsseldorf, NRW, Germany
 
Teaching object-oriented programming in primary education. The case of the Al...
Teaching object-oriented programming in primary education. The case of the Al...Teaching object-oriented programming in primary education. The case of the Al...
Teaching object-oriented programming in primary education. The case of the Al...
 
Para el sector panadero es muy dificil introducirse en el mercado exterior (1...
Para el sector panadero es muy dificil introducirse en el mercado exterior (1...Para el sector panadero es muy dificil introducirse en el mercado exterior (1...
Para el sector panadero es muy dificil introducirse en el mercado exterior (1...
 
Going Global?
Going Global? Going Global?
Going Global?
 
World Report on Disability
World Report on DisabilityWorld Report on Disability
World Report on Disability
 
Diseño Gráfico.
Diseño Gráfico. Diseño Gráfico.
Diseño Gráfico.
 
Sentidos en torno a la “obligatoriedad” de la educación secundaria
Sentidos en torno a la “obligatoriedad” de la educación secundariaSentidos en torno a la “obligatoriedad” de la educación secundaria
Sentidos en torno a la “obligatoriedad” de la educación secundaria
 
Processing Landsat 8 Multi-Spectral Images with GRASS Tools & the potential o...
Processing Landsat 8 Multi-Spectral Images with GRASS Tools & the potential o...Processing Landsat 8 Multi-Spectral Images with GRASS Tools & the potential o...
Processing Landsat 8 Multi-Spectral Images with GRASS Tools & the potential o...
 
Reportaje "Talento y conocimieto emprendedor en 18 minutos"
Reportaje "Talento y conocimieto emprendedor en 18 minutos"Reportaje "Talento y conocimieto emprendedor en 18 minutos"
Reportaje "Talento y conocimieto emprendedor en 18 minutos"
 
03.presentacion corporativa octubre2013
03.presentacion corporativa octubre201303.presentacion corporativa octubre2013
03.presentacion corporativa octubre2013
 
Celulas del sistema nervioso central y periferico
Celulas del sistema nervioso central y perifericoCelulas del sistema nervioso central y periferico
Celulas del sistema nervioso central y periferico
 
TaskForce #SimpleSales Presentación
TaskForce #SimpleSales PresentaciónTaskForce #SimpleSales Presentación
TaskForce #SimpleSales Presentación
 
Catalogo Iguana Med
Catalogo Iguana MedCatalogo Iguana Med
Catalogo Iguana Med
 
El Modelado del_Relieve_Sulma_Neri_Bertha_Teovaldo_Herlinda
El Modelado del_Relieve_Sulma_Neri_Bertha_Teovaldo_HerlindaEl Modelado del_Relieve_Sulma_Neri_Bertha_Teovaldo_Herlinda
El Modelado del_Relieve_Sulma_Neri_Bertha_Teovaldo_Herlinda
 

Similaire à 2016 Utah Cloud Summit: Big Data Architectural Patterns and Best Practices on AWS

AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...Amazon Web Services
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
Builders' Day - Building Data Lakes for Analytics On AWS LC
Builders' Day - Building Data Lakes for Analytics On AWS LCBuilders' Day - Building Data Lakes for Analytics On AWS LC
Builders' Day - Building Data Lakes for Analytics On AWS LCAmazon Web Services LATAM
 
Database and Analytics on the AWS Cloud - AWS Innovate Toronto
Database and Analytics on the AWS Cloud - AWS Innovate TorontoDatabase and Analytics on the AWS Cloud - AWS Innovate Toronto
Database and Analytics on the AWS Cloud - AWS Innovate TorontoAmazon Web Services
 
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017Amazon Web Services
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924Amazon Web Services
 
Big Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesBig Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesAmazon Web Services
 
Big Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesBig Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesAmazon Web Services
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
AWS November Webinar Series - Architectural Patterns & Best Practices for Big...
AWS November Webinar Series - Architectural Patterns & Best Practices for Big...AWS November Webinar Series - Architectural Patterns & Best Practices for Big...
AWS November Webinar Series - Architectural Patterns & Best Practices for Big...Amazon Web Services
 
Real-time Analytics with Open-Source
Real-time Analytics with Open-SourceReal-time Analytics with Open-Source
Real-time Analytics with Open-SourceAmazon Web Services
 
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOT
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOTAWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOT
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOTAmazon Web Services
 
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...Amazon Web Services
 
Fast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSFast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSAmazon Web Services
 

Similaire à 2016 Utah Cloud Summit: Big Data Architectural Patterns and Best Practices on AWS (20)

AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
 
Deep Dive in Big Data
Deep Dive in Big DataDeep Dive in Big Data
Deep Dive in Big Data
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
Builders' Day - Building Data Lakes for Analytics On AWS LC
Builders' Day - Building Data Lakes for Analytics On AWS LCBuilders' Day - Building Data Lakes for Analytics On AWS LC
Builders' Day - Building Data Lakes for Analytics On AWS LC
 
Database and Analytics on the AWS Cloud - AWS Innovate Toronto
Database and Analytics on the AWS Cloud - AWS Innovate TorontoDatabase and Analytics on the AWS Cloud - AWS Innovate Toronto
Database and Analytics on the AWS Cloud - AWS Innovate Toronto
 
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
 
Big Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesBig Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best Practices
 
Big Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesBig Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best Practices
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
AWS November Webinar Series - Architectural Patterns & Best Practices for Big...
AWS November Webinar Series - Architectural Patterns & Best Practices for Big...AWS November Webinar Series - Architectural Patterns & Best Practices for Big...
AWS November Webinar Series - Architectural Patterns & Best Practices for Big...
 
Real-time Analytics with Open-Source
Real-time Analytics with Open-SourceReal-time Analytics with Open-Source
Real-time Analytics with Open-Source
 
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOT
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOTAWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOT
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOT
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
Fast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSFast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWS
 

Dernier

Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...BookNet Canada
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Karmanjay Verma
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...Karmanjay Verma
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFMichael Gough
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 

Dernier (20)

Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 

2016 Utah Cloud Summit: Big Data Architectural Patterns and Best Practices on AWS

  • 1. Martin Schade, Solutions Architect, Amazon Web Services January 2016 Big Data Architectural Patterns and Best Practices on AWS
  • 2. What to Expect from the Session Big data challenges How to simplify big data processing What technologies should you use? • Why? • How? Reference architecture Design patterns
  • 3. Ever Increasing Big Data Volume Velocity Variety
  • 5. Plethora of Tools Amazon Glacier S3 DynamoDB RDS EMR Amazon Redshift Data PipelineAmazon Kinesis Cassandra CloudSearch Kinesis- enabled app Lambda ML SQS ElastiCache DynamoDB Streams
  • 6. Is there a reference architecture ? What tools should I use ? How ? Why ?
  • 7. Architectural Principles • Decoupled “data bus” • Data → Store → Process → Answers • Use the right tool for the job • Data structure, latency, throughput, access patterns, cost • Use Lambda architecture ideas • Immutable (append-only) log, batch/speed/serving layer • Leverage AWS managed services • No/low admin • Big data ≠ big cost
  • 8. Simplify Big Data Processing ingest / collect store process / analyze consume / visualize Time to Answer (Latency) Throughput Cost
  • 10. Types of Data • Transactional • Database reads & writes (OLTP) • Cache • Search • Logs • Streams • File • Log files (/var/log) • Log collectors & frameworks • Stream • Log records • Sensors & IoT data Database File Storage Stream Storage A iOS Android Web Apps Logstash LoggingIoTApplications Transactional Data File Data Stream Data Mobile Apps Search Data Search Collect Store LoggingIoT
  • 11. Store
  • 13. Which stream storage should I use? Amazon Kinesis DynamoDB Streams Amazon SQS Amazon SNS Kafka Managed Yes Yes Yes No Ordering Yes Yes No Yes Delivery at-least-once exactly-once at-least-once at-least-once Lifetime 7 days 24 hours 14 days Configurable Replication 3 AZ 3 AZ 3 AZ Configurable Throughput No Limit No Limit No Limit ~ Nodes Parallel Clients Yes Yes No (SQS) Yes MapReduce Yes Yes No Yes Record size 1MB 400KB 256KB Configurable Cost Low Higher(table cost) Low-Medium Low (+admin)
  • 15. Why Is Amazon S3 Good for Big Data? • Natively supported by big data frameworks (Spark, Hive, Presto, etc.) • No need to run compute clusters for storage (unlike HDFS) • Can run transient Hadoop clusters & Amazon EC2 Spot instances • Multiple distinct (Spark, Hive, Presto) clusters can use the same data • Unlimited number of objects • Very high bandwidth – no aggregate throughput limit • Highly available – can tolerate AZ failure • Designed for 99.999999999% durability • Tired-storage (Standard, IA, Amazon Glacier) via life-cycle policy • Secure – SSL, client/server-side encryption at rest • Low cost
  • 16. What about HDFS & Amazon Glacier? • Use HDFS for very frequently accessed (hot) data • Use Amazon S3 Standard for frequently accessed data • Use Amazon S3 Standard – IA for infrequently accessed data • Use Amazon Glacier for archiving cold data
  • 17. Database + Search Tier A iOS Android Web Apps Logstash Amazon RDS Amazon DynamoDB Amazon ES Amazon S3 Apache Kafka Amazon Glacier Amazon Kinesis Amazon DynamoDB Amazon ElastiCache SearchSQLNoSQLCacheStreamStorageFileStorage Transactional Data File Data Stream Data Mobile Apps Search Data Collect Store 
  • 18. Database + Search Tier Anti-pattern Database + Search Tier
  • 19. Best Practice — Use the Right Tool for the Job Data Tier Search Amazon Elasticsearch Service Amazon CloudSearch Cache Redis Memcached SQL Amazon Aurora MySQL PostgreSQL Oracle SQL Server NoSQL Cassandra Amazon DynamoDB HBase MongoDB Database + Search Tier
  • 20. What Data Store Should I Use? • Data structure → Fixed schema, JSON, key-value • Access patterns → Store data in the format you will access it • Data / access characteristics → Hot, warm, cold • Cost → Right cost
  • 21. Data Structure and Access Patterns Access Patterns What to use? Put/Get (Key, Value) Cache, NoSQL Simple relationships → 1:N, M:N NoSQL Cross table joins, transaction, SQL SQL Faceting, Search Search Data Structure What to use? Fixed schema SQL, NoSQL Schema-free (JSON) NoSQL, Search (Key, Value) Cache, NoSQL
  • 23. AnalyzeA iOS Android Web Apps Logstash Amazon RDS Amazon DynamoDB Amazon ES Amazon S3 Apache Kafka Amazon Glacier Amazon Kinesis Amazon DynamoDB Amazon Redshift Impala Pig Amazon ML Streaming Amazon Kinesis AWS Lambda AmazonElasticMapReduce Amazon ElastiCache SearchSQLNoSQLCache StreamProcessingBatchInteractive Logging StreamStorage IoTApplications FileStorage Hot Cold Warm Hot Hot ML Transactional Data File Data Stream Data Mobile Apps Search Data Collect Store Analyze  
  • 24. Process / Analyze Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision-making. Examples • Interactive dashboards → Interactive analytics • Daily/weekly/monthly reports → Batch analytics • Billing/fraud alerts, 1 minute metrics → Real-time analytics • Sentiment analysis, prediction models → Machine learning
  • 25. Analysis Tools and Frameworks Machine Learning • Mahout, Spark ML, Amazon ML Interactive Analytics • Amazon Redshift, Presto, Impala, Spark Batch Processing • MapReduce, Hive, Pig, Spark Stream Processing • Micro-batch: Spark Streaming, KCL, Hive, Pig • Real-time: Storm, AWS Lambda, KCL Amazon Redshift Impala Pig Amazon Machine Learning Streaming Amazon Kinesis AWS Lambda AmazonElasticMapReduce StreamProcessingBatchInteractiveML Analyze
  • 26. What Stream Processing Technology Should I Use? Spark Streaming Apache Storm Amazon Kinesis Client Library AWS Lambda Amazon EMR (Hive, Pig) Scale / Throughput ~ Nodes ~ Nodes ~ Nodes Automatic ~ Nodes Batch or Real- time Real-time Real-time Real-time Real-time Batch Manageability Yes (Amazon EMR) Do it yourself Amazon EC2 + Auto Scaling AWS managed Yes (Amazon EMR) Fault Tolerance Single AZ Configurable Multi-AZ Multi-AZ Single AZ Programming languages Java, Python, Scala Any language via Thrift Java, via MultiLangDaemon ( .Net, Python, Ruby, Node.js) Node.js, Java Hive, Pig, Streaming languages High
  • 27. What Data Processing Technology Should I Use? Amazon Redshift Impala Presto Spark Hive Query Latency Low Low Low Low Medium (Tez) – High (MapReduce) Durability High High High High High Data Volume 2 PB Max ~Nodes ~Nodes ~Nodes ~Nodes Managed Yes Yes (EMR) Yes (EMR) Yes (EMR) Yes (EMR) Storage Native HDFS / S3A* HDFS / S3 HDFS / S3 HDFS / S3 SQL Compatibility High Medium High Low (SparkSQL) Medium (HQL) HighMedium
  • 28. What about ETL? Store Analyze https://aws.amazon.com/big-data/partner-solutions/ ETL
  • 30. Collect Store Analyze Consume A iOS Android Web Apps Logstash Amazon RDS Amazon DynamoDB Amazon ES Amazon S3 Apache Kafka Amazon Glacier Amazon Kinesis Amazon DynamoDB Amazon Redshift Impala Pig Amazon ML Streaming Amazon Kinesis AWS Lambda AmazonElasticMapReduce Amazon ElastiCache SearchSQLNoSQLCache StreamProcessingBatchInteractive Logging StreamStorage IoTApplications FileStorage Analysis&Visualization Hot Cold Warm Hot Slow Hot ML Fast Fast Transactional Data File Data Stream Data Notebooks Predictions Apps & APIs Mobile Apps IDE Search Data ETL Amazon QuickSight
  • 31. Consume • Predictions • Analysis and Visualization • Notebooks • IDE • Applications & API Consume Analysis&Visualization Amazon QuickSight Notebooks Predictions Apps & APIs IDE Store Analyze ConsumeETL Business users Data Scientist, Developers
  • 32. Putting It All Together
  • 33. Collect Store Analyze Consume A iOS Android Web Apps Logstash Amazon RDS Amazon DynamoDB Amazon ES Amazon S3 Apache Kafka Amazon Glacier Amazon Kinesis Amazon DynamoDB Amazon Redshift Impala Pig Amazon ML Streaming Amazon Kinesis AWS Lambda AmazonElasticMapReduce Amazon ElastiCache SearchSQLNoSQLCache StreamProcessingBatchInteractive Logging StreamStorage IoTApplications FileStorage Analysis&Visualization Hot Cold Warm Hot Slow Hot ML Fast Fast Amazon QuickSight Transactional Data File Data Stream Data Notebooks Predictions Apps & APIs Mobile Apps IDE Search Data ETL Reference Architecture
  • 36. Multi-Stage Decoupled “Data Bus” • Multiple stages • Storage decoupled from processing Store Process Store Process process store
  • 37. Multiple Processing Applications (or Connectors) Can Read from or Write to Multiple Data Stores Amazon Kinesis AWS Lambda Amazon DynamoDB Amazon Kinesis S3 Connector Amazon S3 process store
  • 38. Processing Frameworks (KCL, Storm, Hive, Spark, etc.) Could Read from Multiple Data Stores Amazon Kinesis AWS Lambda Amazon S3 Amazon DynamoDB Hive SparkStorm Amazon Kinesis S3 Connector process store
  • 40. Interactive & Batch Analytics Producer Amazon S3 Amazon EMR Hive Pig Spark Amazon ML process store Consume Amazon Redshift Amazon EMR Presto Impala Spark Batch Interactive Batch Prediction Real-time Prediction
  • 41. Batch Layer Amazon Kinesis data process store Lambda Architecture Amazon Kinesis S3 Connector Amazon S3 A p p l i c a t i o n s Amazon Redshift Amazon EMR Presto Hive Pig Spark answer Speed Layer answer Serving Layer Amazon ElastiCache Amazon DynamoDB Amazon RDS Amazon ES answer Amazon ML KCL AWS Lambda Spark Streaming Storm
  • 42. Summary • Build decoupled “data bus” • Data → Store ↔ Process → Answers • Use the right tool for the job • Latency, throughput, access patterns • Use Lambda architecture ideas • Immutable (append-only) log, batch/speed/serving layer • Leverage AWS managed services • No/low admin • Be cost conscious • Big data ≠ big cost

Notes de l'éditeur

  1. Wednesday, Oct 7 2:45 PM - 3:45 PM Marcello 4501B Siva Raghupathy leads the Americas Big Data Solutions Architecture team at AWS. He guides developers and architects to build successful big data solutions on AWS. Previously, as a principal technical program manager for AWS database service, he gathered emerging NoSQL requirements and wrote the first version of DynamoDB product specification. Later, as a development manager for Amazon Relational Database Services (RDS) he drove several enhancements. Prior to AWS, he spent several years at Microsoft. Abstract: The world is producing an ever increasing volume, velocity, and variety of big data. Consumers and businesses are demanding up-to-the-second (or even millisecond) analytics on their fast-moving data, in addition to classic batch processing. AWS delivers many services for solving big data problems. But what service should you use, why, when, and how? In this session, we simplify big data processing as a data bus comprising various stages: ingest, store, process, and visualize. Next, we discuss how to choose the right technology in each stage based on criteria such as data structure, query latency, cost, request rate, item size, data volume, durability, and so on. Finally, we provide reference architecture, design patterns, and best practices for assembling these technologies to solve your big data problems at the right cost.
  2. Abstract: The world is producing an ever increasing volume, velocity, and variety of big data. Consumers and businesses are demanding up-to-the-second (or even millisecond) analytics on their fast-moving data, in addition to classic batch processing. AWS delivers many services for solving big data problems. But what service should you use, why, when, and how? In this session, we simplify big data processing as a data bus comprising various stages: ingest, store, process, and visualize. Next, we discuss how to choose the right technology in each stage based on criteria such as data structure, query latency, cost, request rate, item size, data volume, durability, and so on. Finally, we provide reference architecture, design patterns, and best practices for assembling these technologies to solve your big data problems at the right cost.
  3. Volume – 100 to 150TB a day Velocity – 1million reads and writes per second is becoming a norm Variety -> IOT/log data/ streaming data Transactional data File data Fixed Schema CSV Parquet Avero Schema-free JSON Key-value Small files, large files,
  4. Hourly server logs: were your systems misbehaving 1hr ago Weekly / Monthly Bill: what you spent this billing cycle Daily customer-preferences report from your web site’s click stream: what deal or ad to try next time Daily fraud reports: was there fraud yesterday Real-time alerts: what went wrong now Real-time spending caps: prevent overspending now Real-time analysis: what to offer the current customer now Real-time detection: block fraudulent use now I need to harness big data, fast I want more happy customers I want to save/make more money
  5. Hive Spark Storm Kafka HBase Flume Impala Cascading EMR DynamoDB S3 Redshift Kinesis RDS Glacier
  6. Before we go into solving the Big architecture, I want to introduce some “tried and test” architecture principles. Here at AWS we believe you should be using the right tool for the job – “instead of using a big swiss army knife for using a screw dreive, it will be best to use a screw drive - this is especially important for big data architectures. We’ll talk about this more. Decoupled architecture http://whatis.techtarget.com/definition/decoupled-architecture - In general, a decoupled architecture is a framework for complex work that allows components to remain completely autonomous and unaware of each other…this has been tried and battle test. Managed services – this is relatively now - Should I install Cassandra or MongoDB or CouchDB on AWS. You obviously can. Sometimes there are good reasons for doing this. Many customers still do this. Netflix is a great example. They run a multi-region Cassandra and are a poster child for how to do this. But for most customers, delegating this task to AWS makes more sense….you are better of spending your time on building features for your customers rather than building highly scalable distributed systems. Lambda Architecture - Nathan Marz designed this generic architecture addressing common requirements Twitter
  7. throughput = f (volume, request rate) latency Cost   Event to action/answer latency?
  8. http://calculator.s3.amazonaws.com/index.html#r=IAD&key=calc-BE3BA3E4-1AC5-4E7A-B542-015056D8EDAF Kinesis -> $52.14 per month SQS -> $133.42 per month for puts or $400/month (put, get, delete) DynamoDB -> $3809.88 per month (10TB of storage cost itself is $2500/month) Cost (100rpsx 35KB) $52/month $133/month * 2 = $266/month ? Amazon DynamoDB Service (US-East) $ Provisioned Throughput Capacity: $120 Indexed Data Storage: $2560.90 DynamoDB Streams: $1.3 Amazon SQS Service (US-East) Pricing Example Let’s assume that our data producers put 100 records per second in aggregate, and each record is 35KB. In this case, the total data input rate is 3.4MB/sec (100 records/sec*35KB/record). For simplicity, we assume that the throughput and data size of each record are stable and constant throughout the day. Please note that we can dynamically adjust the throughput of our Amazon Kinesis stream at any time. We first calculate the number of shards needed for our stream to achieve the required throughput. As one shard provides a capacity of 1MB/sec data input and supports 1000 records/sec, four shards provide a capacity of 4MB/sec data input and support 4000 records/sec. So a stream with four shards satisfies our required throughput of 3.4MB/sec at 100 records/sec. We then calculate our monthly Amazon Kinesis costs using Amazon Kinesis pricing in the US-East Region: Shard Hour: One shard costs $0.015 per hour, or $0.36 per day ($0.015*24). Our stream has four shards so that it costs $1.44 per day ($0.36*4). For a month with 31 days, our monthly Shard Hour cost is $44.64 ($1.44*31). PUT Payload Unit (25KB): As our record is 35KB, each record contains two PUT Payload Units. Our data producers put 100 records or 200 PUT Payload Units per second in aggregate. That is 267,840,000 records or 535,680,000 PUT Payload Units per month. As one million PUT Payload Units cost $0.014, our monthly PUT Payload Units cost is $7.499 ($0.014*535.68). Adding the Shard Hour and PUT Payload Unit costs together, our total Amazon Kinesis costs are $1.68 per day, or $52.14 per month. For $1.68 per day, we have a fully-managed streaming data infrastructure that enables us to continuously ingest 4MB of data per second, or 337GB of data per day in a reliable and elastic manner.
  9. No need to run compute clusters for storage (unlike HDFS) Can run transient Hadoop clusters & Amazon EC2 spot Instances Multiple distinct (Spark, Hive, Presto) clusters can use the same data Highly durable, highly available, highly scalable Secure Designed for 99.999999999% durability https://aws.amazon.com/s3/storage-classes/ Lifecycle Policies - migrated to S3-Standard - IA, archive to Amazon Glacier, or deleted after a specific period of time! No need to run a Hadoop cluster for storage (unlike HDFS) Unlimited number of objects Object size up to 5TB Very high bandwidth Versioning Lifecycle Policies to Achieve to Glacier Designed for 99.999999999% durability Server size encryption Highly available – SLA 99.99 (Standard) Highly scalable Low cost Event notification Cross-region replication Natively supported by almost all big data frameworks & tools
  10. 2 x 2 Matrix Structured Level of query (from none to complex) Draw down the slide
  11. Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision-making. Examples: Data Warehousing: Monthly bill, Interactive dashboards Machine Learning: Sentiment Analysis, Prediction Stream processing: Alerts, 1 minute metrics, etc. http://www.jmlr.org/papers/volume9/li08a/li08a.pdf Forecasting Web Page Views: http://nlp.stanford.edu/sentiment/ http://nlp.stanford.edu/sentiment/Methods and Observations https://en.wikipedia.org/wiki/Data_analysis
  12. Add connector Direct Acyclic Graphs? Exactly once processing & DAG? – how do you do this?? https://storm.apache.org/documentation/Rationale.html http://www.slideshare.net/ptgoetz/apache-storm-vs-spark-streaming
  13. Cost: Redshift – Moderate Impala - Presto – Low S3A* is an open source connector. It is not in EMR 1.2.1 – using bootstrap you can install 2.2 ( we have a bootstrap action) Query Speed Redshift – Extremely fast SQL queries Spark, Impala – Extremely Fast to Fast Hive QL Hive, Tez – Moderately Fast to Slow Hive QL Data Volume? UDFs? Manageability? http://yahoodevelopers.tumblr.com/post/85930551108/yahoo-betting-on-apache-hive-tez-and-yarn https://amplab.cs.berkeley.edu/benchmark/ http://nerds.airbnb.com/redshift-performance-cost/
  14. Similar to multi-tier web-app-data architectures Concept of a “data bus” or “data pipeline”
  15. Best practices BDT318 - Netflix Keystone: How Netflix Handles Data Streams Up to 8 Million Events Per Second Use Kinesis or Kafka for stream storage Use appropriate stream processing tool KCL – Simple, only for Kinesis Apache Storm – general purpose Spark Streaming – Windowing, MLlib, Statefull AWS Lambda – fully managed, no servers, stateless Use Amazon SNS for Alerts Use Amazon ML for predictions Use a managed database/cache for state management Use a visualization tool for KPI visualization
  16. Lambda Architecture best practices Use Kinesis or Kafka for stream storage Maintain an immutable (append only) raw master data in Amazon S3 - use Amazon Kinesis S3 connector or Firehose Use appropriate batch processing technologies for creating batch views Use appropriate stream processing technology for real-time views Use a serving layer with an appropriate database to serve downstream applications
  17. Before we go into solving the Big architecture, I want to introduce some “tried and test” architecture principles. Here at AWS we believe you should be using the right tool for the job – “instead of using a big swiss army knife for using a screw dreive, it will be best to use a screw drive - this is especially important for big data architectures. We’ll talk about this more. Decoupled architecture http://whatis.techtarget.com/definition/decoupled-architecture - In general, a decoupled architecture is a framework for complex work that allows components to remain completely autonomous and unaware of each other…this has been tried and battle test. Managed services – this is relatively now - Should I install Cassandra or MongoDB or CouchDB on AWS. You obviously can. Sometimes there are good reasons for doing this. Many customers still do this. Netflix is a great example. They run a multi-region Cassandra and are a poster child for how to do this. But for most customers, delegating this task to AWS makes more sense….you are better of spending your time on building features for your customers rather than building highly scalable distributed systems. Lambda Architecture -