SlideShare une entreprise Scribd logo
1  sur  12
Télécharger pour lire hors ligne
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Combining Batch and Stream
Processing to Get the Best of Both
Worlds
R a j e e v S r i n i v a s a n – A W S S o l u t i o n A r c h i t e c t
U j j w a l R a t a n – A W S S o l u t i o n A r c h i t e c t
A B D 3 3 0
N o v e m b e r 2 7 , 2 0 1 7
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What Is This Chalk Talk About ?
Problem Statement
Today, businesses are looking for ways to process large scale batch and
high velocity streaming data simultaneously in the cloud using a proven
architecture to meet their latency, accuracy, and throughput requirements.
Lambda Architecture is a data-processing architecture designed to handle
massive quantities of data by taking advantage of both batch-processing
and stream-processing methods.” —Wikipedia
Lambda Architecture not to be confused with AWS Lambda
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Lambda Architect Block Diagram
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Stream
Processing
Lambda Architecture – Flow Diagram
New
Data Merged
Query
SPEED LAYER
BATCH LAYER
SERVING LAYER
Batch View
Real-Time View
Master
Dataset
Pre-Compute
View
Batch Recompute
Incremental
View
Real-Time Increment
Partial
aggregates…
Real-Time Data
Partial
aggregates
Partial
aggregates
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Lambda Architecture on AWS – Batch
Layer Landing S3
bucket
AWS Glue
ETL
Amazon
QuickSight
Data Source
Amazon Athena
Batch View
S3 bucket
AWS Glue
Data Catalog
Amazon Glue
Crawler
Amazon EMR
Batch View
S3 bucket
Serverless
Batch Processing
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Lambda Architecture on AWS – Speed
Layer
AWS Lambda
Amazon Kinesis
Firehose
(Incremental stream)
Amazon Kinesis
Analytics
Incremental View
S3 bucket
Amazon
Kinesis StreamData Source
Amazon EMR
Serverless
Stream Processing
Visualization
Incremental View
S3 bucket
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Lambda Architecture on AWS
D
a
t
a
S
o
u
r
c
e
Landing
S3 bucket
AWS Lambda
Kinesis Firehose
(Incremental stream)
Kinesis Analytics
Amazon Athena
Batch View
S3 bucket
Incremental View
S3 bucket
Batch Layer
Speed Layer
Serving Layer
Kinesis Stream
V
i
s
u
a
l
i
z
a
t
i
o
n
Merged View
S3 bucket
Amazon EMR
AWS Glue
Crawler
AWS Glue
Catalog
AWS Glue
ETL
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Lambda Architecture on AWS
D
a
t
a
S
o
u
r
c
e
Landing
S3 bucket
Batch View
S3 bucket
Batch Layer
Speed Layer
Serving Layer
Kinesis Stream
V
i
s
u
a
l
i
z
a
t
i
o
n
Merged View
S3 bucket
Amazon EMR
AWS Glue
Crawler
AWS Glue
Catalog
AWS Glue
ETL
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
import . . .
DataFrame dataFromS3 = sqlContext.read().json(”s3://").toDF();
dataFromS3.registerTempTable(”batchData");
. . .
val ssc = new StreamingContext(sc, …)
val kinesisStreams = (0 until numStreams).map { i => KinesisUtils.createStream(ssc, appName, streamName, endpointUrl,
regionName, InitialPositionInStream.LATEST, kinesisCheckpointInterval, StorageLevel.MEMORY_AND_DISK_2) }
val unionStreams = ssc.union(kinesisStreams)
unionStreams.foreachRDD((rdd:RDD[String])=>{
. . .
rdd.toDF().registerTempTable(”streamData")
val mergedResult = sqlContext.sql("SELECT ... FROM streamData s JOIN batchData b ON a.data = b.data ...")
mergedResult.save(”s3://... ", "parquet", SaveMode.Overwrite)
}})
ssc.start()
C omb ining Stre am and B atch Proce ssing Qu e ry – Sp ark
Querying Batch
data from Amazon S3 using Spark SQL
Querying
data from Kinesis Stream using
Spark Streaming
Merge query
using Spark SQL
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Kinesis Stream and Kinesis Firehose
• Server-side encryption with a AWS KMS-managed
key (SSE-KMS)
• Kinesis Stream also supports client side encryption
AWS Lambda
• KMS/customer key to encrypt in both rest and
transit
Amazon Athena
• Server side encryption with an Amazon S3-
managed key (SSE-S3)
• Server-side encryption with a AWS Key
Management Service (AWS KMS)-managed key
(SSE-KMS)
• Create table statement with TBLPROPERTIES
'has_encrypted_data'='true'
Security Options
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
WHITEBOARDING SESSION
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Thank you!
P l e a s e c o m p l e t e y o u r s u r v e y !

Contenu connexe

Tendances

Real-time Analytics using Data from IoT Devices - AWS Online Tech Talks
Real-time Analytics using Data from IoT Devices - AWS Online Tech TalksReal-time Analytics using Data from IoT Devices - AWS Online Tech Talks
Real-time Analytics using Data from IoT Devices - AWS Online Tech TalksAmazon Web Services
 
A Journey from Too Much Data to Curated Insights - ABD211 - re:Invent 2017
A Journey from Too Much Data to Curated Insights - ABD211 - re:Invent 2017A Journey from Too Much Data to Curated Insights - ABD211 - re:Invent 2017
A Journey from Too Much Data to Curated Insights - ABD211 - re:Invent 2017Amazon Web Services
 
HLC301-Simplifying Healthcare Data Management on AWS.pdf
HLC301-Simplifying Healthcare Data Management on AWS.pdfHLC301-Simplifying Healthcare Data Management on AWS.pdf
HLC301-Simplifying Healthcare Data Management on AWS.pdfAmazon Web Services
 
FINRA's Managed Data Lake: Next-Gen Analytics in the Cloud - ENT328 - re:Inve...
FINRA's Managed Data Lake: Next-Gen Analytics in the Cloud - ENT328 - re:Inve...FINRA's Managed Data Lake: Next-Gen Analytics in the Cloud - ENT328 - re:Inve...
FINRA's Managed Data Lake: Next-Gen Analytics in the Cloud - ENT328 - re:Inve...Amazon Web Services
 
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight - A...
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight - A...Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight - A...
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight - A...Amazon Web Services
 
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...Amazon Web Services
 
AWS Summit Singapore - Architecting a Serverless Data Lake on AWS
AWS Summit Singapore - Architecting a Serverless Data Lake on AWSAWS Summit Singapore - Architecting a Serverless Data Lake on AWS
AWS Summit Singapore - Architecting a Serverless Data Lake on AWSAmazon Web Services
 
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017Amazon Web Services
 
The Power of Big Data - AWS Summit Bahrain 2017
The Power of Big Data - AWS Summit Bahrain 2017The Power of Big Data - AWS Summit Bahrain 2017
The Power of Big Data - AWS Summit Bahrain 2017Amazon Web Services
 
ABD316_American Heart Association Finding Cures to Heart Disease Through the ...
ABD316_American Heart Association Finding Cures to Heart Disease Through the ...ABD316_American Heart Association Finding Cures to Heart Disease Through the ...
ABD316_American Heart Association Finding Cures to Heart Disease Through the ...Amazon Web Services
 
NEW LAUNCH! Introducing AWS Batch: Easy and efficient batch computing
 	  NEW LAUNCH! Introducing AWS Batch: Easy and efficient batch computing 	  NEW LAUNCH! Introducing AWS Batch: Easy and efficient batch computing
NEW LAUNCH! Introducing AWS Batch: Easy and efficient batch computingAmazon Web Services
 
ABD324_Migrating Your Oracle Data Warehouse to Amazon Redshift Using AWS DMS ...
ABD324_Migrating Your Oracle Data Warehouse to Amazon Redshift Using AWS DMS ...ABD324_Migrating Your Oracle Data Warehouse to Amazon Redshift Using AWS DMS ...
ABD324_Migrating Your Oracle Data Warehouse to Amazon Redshift Using AWS DMS ...Amazon Web Services
 
FSV302_An Architecture for Trade Capture and Regulatory Reporting
FSV302_An Architecture for Trade Capture and Regulatory ReportingFSV302_An Architecture for Trade Capture and Regulatory Reporting
FSV302_An Architecture for Trade Capture and Regulatory ReportingAmazon Web Services
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Amazon Web Services
 
Visualization with Amazon QuickSight
Visualization with Amazon QuickSightVisualization with Amazon QuickSight
Visualization with Amazon QuickSightAmazon Web Services
 
ENT316 Keeping Pace With The Cloud: Managing and Optimizing as You Scale
ENT316 Keeping Pace With The Cloud: Managing and Optimizing as You ScaleENT316 Keeping Pace With The Cloud: Managing and Optimizing as You Scale
ENT316 Keeping Pace With The Cloud: Managing and Optimizing as You ScaleAmazon Web Services
 
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...Amazon Web Services
 

Tendances (20)

Real-time Analytics using Data from IoT Devices - AWS Online Tech Talks
Real-time Analytics using Data from IoT Devices - AWS Online Tech TalksReal-time Analytics using Data from IoT Devices - AWS Online Tech Talks
Real-time Analytics using Data from IoT Devices - AWS Online Tech Talks
 
A Journey from Too Much Data to Curated Insights - ABD211 - re:Invent 2017
A Journey from Too Much Data to Curated Insights - ABD211 - re:Invent 2017A Journey from Too Much Data to Curated Insights - ABD211 - re:Invent 2017
A Journey from Too Much Data to Curated Insights - ABD211 - re:Invent 2017
 
HLC301-Simplifying Healthcare Data Management on AWS.pdf
HLC301-Simplifying Healthcare Data Management on AWS.pdfHLC301-Simplifying Healthcare Data Management on AWS.pdf
HLC301-Simplifying Healthcare Data Management on AWS.pdf
 
FINRA's Managed Data Lake: Next-Gen Analytics in the Cloud - ENT328 - re:Inve...
FINRA's Managed Data Lake: Next-Gen Analytics in the Cloud - ENT328 - re:Inve...FINRA's Managed Data Lake: Next-Gen Analytics in the Cloud - ENT328 - re:Inve...
FINRA's Managed Data Lake: Next-Gen Analytics in the Cloud - ENT328 - re:Inve...
 
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight - A...
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight - A...Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight - A...
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight - A...
 
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...
 
AWS Summit Singapore - Architecting a Serverless Data Lake on AWS
AWS Summit Singapore - Architecting a Serverless Data Lake on AWSAWS Summit Singapore - Architecting a Serverless Data Lake on AWS
AWS Summit Singapore - Architecting a Serverless Data Lake on AWS
 
Building Data Lakes with AWS
Building Data Lakes with AWSBuilding Data Lakes with AWS
Building Data Lakes with AWS
 
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
 
The Power of Big Data - AWS Summit Bahrain 2017
The Power of Big Data - AWS Summit Bahrain 2017The Power of Big Data - AWS Summit Bahrain 2017
The Power of Big Data - AWS Summit Bahrain 2017
 
ABD316_American Heart Association Finding Cures to Heart Disease Through the ...
ABD316_American Heart Association Finding Cures to Heart Disease Through the ...ABD316_American Heart Association Finding Cures to Heart Disease Through the ...
ABD316_American Heart Association Finding Cures to Heart Disease Through the ...
 
NEW LAUNCH! Introducing AWS Batch: Easy and efficient batch computing
 	  NEW LAUNCH! Introducing AWS Batch: Easy and efficient batch computing 	  NEW LAUNCH! Introducing AWS Batch: Easy and efficient batch computing
NEW LAUNCH! Introducing AWS Batch: Easy and efficient batch computing
 
ABD324_Migrating Your Oracle Data Warehouse to Amazon Redshift Using AWS DMS ...
ABD324_Migrating Your Oracle Data Warehouse to Amazon Redshift Using AWS DMS ...ABD324_Migrating Your Oracle Data Warehouse to Amazon Redshift Using AWS DMS ...
ABD324_Migrating Your Oracle Data Warehouse to Amazon Redshift Using AWS DMS ...
 
Log Analytics with AWS
Log Analytics with AWSLog Analytics with AWS
Log Analytics with AWS
 
FSV302_An Architecture for Trade Capture and Regulatory Reporting
FSV302_An Architecture for Trade Capture and Regulatory ReportingFSV302_An Architecture for Trade Capture and Regulatory Reporting
FSV302_An Architecture for Trade Capture and Regulatory Reporting
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
 
Visualization with Amazon QuickSight
Visualization with Amazon QuickSightVisualization with Amazon QuickSight
Visualization with Amazon QuickSight
 
ENT316 Keeping Pace With The Cloud: Managing and Optimizing as You Scale
ENT316 Keeping Pace With The Cloud: Managing and Optimizing as You ScaleENT316 Keeping Pace With The Cloud: Managing and Optimizing as You Scale
ENT316 Keeping Pace With The Cloud: Managing and Optimizing as You Scale
 
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
 
BDA304 Data-Driven Post Mortems
BDA304 Data-Driven Post MortemsBDA304 Data-Driven Post Mortems
BDA304 Data-Driven Post Mortems
 

Similaire à Combining Batch and Stream Processing on AWS

Big Data, Analytics and Machine Learning on AWS Lambda - SRV402 - re:Invent 2017
Big Data, Analytics and Machine Learning on AWS Lambda - SRV402 - re:Invent 2017Big Data, Analytics and Machine Learning on AWS Lambda - SRV402 - re:Invent 2017
Big Data, Analytics and Machine Learning on AWS Lambda - SRV402 - re:Invent 2017Amazon Web Services
 
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...Sungmin Kim
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWSGary Stafford
 
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesBuild Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesAmazon Web Services
 
Build Data Lakes and Analytics on AWS: Patterns & Best Practices
Build Data Lakes and Analytics on AWS: Patterns & Best PracticesBuild Data Lakes and Analytics on AWS: Patterns & Best Practices
Build Data Lakes and Analytics on AWS: Patterns & Best PracticesAmazon Web Services
 
Success has Many Query Engines- Tel Aviv Summit 2018
Success has Many Query Engines- Tel Aviv Summit 2018Success has Many Query Engines- Tel Aviv Summit 2018
Success has Many Query Engines- Tel Aviv Summit 2018Amazon Web Services
 
Your First Data Lake on AWS_Simon Elisha
Your First Data Lake on AWS_Simon ElishaYour First Data Lake on AWS_Simon Elisha
Your First Data Lake on AWS_Simon ElishaHelen Rogers
 
Building your first Data lake platform
Building your first Data lake platform Building your first Data lake platform
Building your first Data lake platform Amazon Web Services
 
Bridge OLTP and Stream Processing with Amazon Kinesis, AWS Lambda, & MongoDB ...
Bridge OLTP and Stream Processing with Amazon Kinesis, AWS Lambda, & MongoDB ...Bridge OLTP and Stream Processing with Amazon Kinesis, AWS Lambda, & MongoDB ...
Bridge OLTP and Stream Processing with Amazon Kinesis, AWS Lambda, & MongoDB ...Amazon Web Services
 
Building Data Lakes with Apache Airflow
Building Data Lakes with Apache AirflowBuilding Data Lakes with Apache Airflow
Building Data Lakes with Apache AirflowGary Stafford
 
SRV301-Optimizing Serverless Application Data Tiers with Amazon DynamoDB
SRV301-Optimizing Serverless Application Data Tiers with Amazon DynamoDBSRV301-Optimizing Serverless Application Data Tiers with Amazon DynamoDB
SRV301-Optimizing Serverless Application Data Tiers with Amazon DynamoDBAmazon Web Services
 
B3 - Business intelligence apps on aws
B3 - Business intelligence apps on awsB3 - Business intelligence apps on aws
B3 - Business intelligence apps on awsAmazon Web Services
 
AWS re:Invent 2016: Building Big Data Applications with the AWS Big Data Plat...
AWS re:Invent 2016: Building Big Data Applications with the AWS Big Data Plat...AWS re:Invent 2016: Building Big Data Applications with the AWS Big Data Plat...
AWS re:Invent 2016: Building Big Data Applications with the AWS Big Data Plat...Amazon Web Services
 
AWS re:Invent 2016: Big Data Mini Con State of the Union (BDM205)
AWS re:Invent 2016: Big Data Mini Con State of the Union (BDM205)AWS re:Invent 2016: Big Data Mini Con State of the Union (BDM205)
AWS re:Invent 2016: Big Data Mini Con State of the Union (BDM205)Amazon Web Services
 
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech TalksAnalyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech TalksAmazon Web Services
 
Evolving Your Big Data Use Cases from Batch to Real-Time - AWS May 2016 Webi...
Evolving Your Big Data Use Cases from Batch to Real-Time - AWS May 2016  Webi...Evolving Your Big Data Use Cases from Batch to Real-Time - AWS May 2016  Webi...
Evolving Your Big Data Use Cases from Batch to Real-Time - AWS May 2016 Webi...Amazon Web Services
 
Managed Relational Databases - Amazon RDS
Managed Relational Databases - Amazon RDSManaged Relational Databases - Amazon RDS
Managed Relational Databases - Amazon RDSAmazon Web Services
 
Get Value From Your Data
Get Value From Your DataGet Value From Your Data
Get Value From Your DataDanilo Poccia
 

Similaire à Combining Batch and Stream Processing on AWS (20)

Big Data, Analytics and Machine Learning on AWS Lambda - SRV402 - re:Invent 2017
Big Data, Analytics and Machine Learning on AWS Lambda - SRV402 - re:Invent 2017Big Data, Analytics and Machine Learning on AWS Lambda - SRV402 - re:Invent 2017
Big Data, Analytics and Machine Learning on AWS Lambda - SRV402 - re:Invent 2017
 
Big Data on AWS
Big Data on AWSBig Data on AWS
Big Data on AWS
 
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
 
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesBuild Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
 
Build Data Lakes and Analytics on AWS: Patterns & Best Practices
Build Data Lakes and Analytics on AWS: Patterns & Best PracticesBuild Data Lakes and Analytics on AWS: Patterns & Best Practices
Build Data Lakes and Analytics on AWS: Patterns & Best Practices
 
Success has Many Query Engines- Tel Aviv Summit 2018
Success has Many Query Engines- Tel Aviv Summit 2018Success has Many Query Engines- Tel Aviv Summit 2018
Success has Many Query Engines- Tel Aviv Summit 2018
 
Your First Data Lake on AWS_Simon Elisha
Your First Data Lake on AWS_Simon ElishaYour First Data Lake on AWS_Simon Elisha
Your First Data Lake on AWS_Simon Elisha
 
Building your first Data lake platform
Building your first Data lake platform Building your first Data lake platform
Building your first Data lake platform
 
Big Data@Scale
 Big Data@Scale Big Data@Scale
Big Data@Scale
 
Bridge OLTP and Stream Processing with Amazon Kinesis, AWS Lambda, & MongoDB ...
Bridge OLTP and Stream Processing with Amazon Kinesis, AWS Lambda, & MongoDB ...Bridge OLTP and Stream Processing with Amazon Kinesis, AWS Lambda, & MongoDB ...
Bridge OLTP and Stream Processing with Amazon Kinesis, AWS Lambda, & MongoDB ...
 
Building Data Lakes with Apache Airflow
Building Data Lakes with Apache AirflowBuilding Data Lakes with Apache Airflow
Building Data Lakes with Apache Airflow
 
SRV301-Optimizing Serverless Application Data Tiers with Amazon DynamoDB
SRV301-Optimizing Serverless Application Data Tiers with Amazon DynamoDBSRV301-Optimizing Serverless Application Data Tiers with Amazon DynamoDB
SRV301-Optimizing Serverless Application Data Tiers with Amazon DynamoDB
 
B3 - Business intelligence apps on aws
B3 - Business intelligence apps on awsB3 - Business intelligence apps on aws
B3 - Business intelligence apps on aws
 
AWS re:Invent 2016: Building Big Data Applications with the AWS Big Data Plat...
AWS re:Invent 2016: Building Big Data Applications with the AWS Big Data Plat...AWS re:Invent 2016: Building Big Data Applications with the AWS Big Data Plat...
AWS re:Invent 2016: Building Big Data Applications with the AWS Big Data Plat...
 
AWS re:Invent 2016: Big Data Mini Con State of the Union (BDM205)
AWS re:Invent 2016: Big Data Mini Con State of the Union (BDM205)AWS re:Invent 2016: Big Data Mini Con State of the Union (BDM205)
AWS re:Invent 2016: Big Data Mini Con State of the Union (BDM205)
 
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech TalksAnalyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
 
Evolving Your Big Data Use Cases from Batch to Real-Time - AWS May 2016 Webi...
Evolving Your Big Data Use Cases from Batch to Real-Time - AWS May 2016  Webi...Evolving Your Big Data Use Cases from Batch to Real-Time - AWS May 2016  Webi...
Evolving Your Big Data Use Cases from Batch to Real-Time - AWS May 2016 Webi...
 
Managed Relational Databases - Amazon RDS
Managed Relational Databases - Amazon RDSManaged Relational Databases - Amazon RDS
Managed Relational Databases - Amazon RDS
 
Get Value From Your Data
Get Value From Your DataGet Value From Your Data
Get Value From Your Data
 

Plus de Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Plus de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Combining Batch and Stream Processing on AWS

  • 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Combining Batch and Stream Processing to Get the Best of Both Worlds R a j e e v S r i n i v a s a n – A W S S o l u t i o n A r c h i t e c t U j j w a l R a t a n – A W S S o l u t i o n A r c h i t e c t A B D 3 3 0 N o v e m b e r 2 7 , 2 0 1 7
  • 2. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What Is This Chalk Talk About ? Problem Statement Today, businesses are looking for ways to process large scale batch and high velocity streaming data simultaneously in the cloud using a proven architecture to meet their latency, accuracy, and throughput requirements. Lambda Architecture is a data-processing architecture designed to handle massive quantities of data by taking advantage of both batch-processing and stream-processing methods.” —Wikipedia Lambda Architecture not to be confused with AWS Lambda
  • 3. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Lambda Architect Block Diagram
  • 4. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Stream Processing Lambda Architecture – Flow Diagram New Data Merged Query SPEED LAYER BATCH LAYER SERVING LAYER Batch View Real-Time View Master Dataset Pre-Compute View Batch Recompute Incremental View Real-Time Increment Partial aggregates… Real-Time Data Partial aggregates Partial aggregates
  • 5. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Lambda Architecture on AWS – Batch Layer Landing S3 bucket AWS Glue ETL Amazon QuickSight Data Source Amazon Athena Batch View S3 bucket AWS Glue Data Catalog Amazon Glue Crawler Amazon EMR Batch View S3 bucket Serverless Batch Processing
  • 6. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Lambda Architecture on AWS – Speed Layer AWS Lambda Amazon Kinesis Firehose (Incremental stream) Amazon Kinesis Analytics Incremental View S3 bucket Amazon Kinesis StreamData Source Amazon EMR Serverless Stream Processing Visualization Incremental View S3 bucket
  • 7. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Lambda Architecture on AWS D a t a S o u r c e Landing S3 bucket AWS Lambda Kinesis Firehose (Incremental stream) Kinesis Analytics Amazon Athena Batch View S3 bucket Incremental View S3 bucket Batch Layer Speed Layer Serving Layer Kinesis Stream V i s u a l i z a t i o n Merged View S3 bucket Amazon EMR AWS Glue Crawler AWS Glue Catalog AWS Glue ETL
  • 8. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Lambda Architecture on AWS D a t a S o u r c e Landing S3 bucket Batch View S3 bucket Batch Layer Speed Layer Serving Layer Kinesis Stream V i s u a l i z a t i o n Merged View S3 bucket Amazon EMR AWS Glue Crawler AWS Glue Catalog AWS Glue ETL
  • 9. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. import . . . DataFrame dataFromS3 = sqlContext.read().json(”s3://").toDF(); dataFromS3.registerTempTable(”batchData"); . . . val ssc = new StreamingContext(sc, …) val kinesisStreams = (0 until numStreams).map { i => KinesisUtils.createStream(ssc, appName, streamName, endpointUrl, regionName, InitialPositionInStream.LATEST, kinesisCheckpointInterval, StorageLevel.MEMORY_AND_DISK_2) } val unionStreams = ssc.union(kinesisStreams) unionStreams.foreachRDD((rdd:RDD[String])=>{ . . . rdd.toDF().registerTempTable(”streamData") val mergedResult = sqlContext.sql("SELECT ... FROM streamData s JOIN batchData b ON a.data = b.data ...") mergedResult.save(”s3://... ", "parquet", SaveMode.Overwrite) }}) ssc.start() C omb ining Stre am and B atch Proce ssing Qu e ry – Sp ark Querying Batch data from Amazon S3 using Spark SQL Querying data from Kinesis Stream using Spark Streaming Merge query using Spark SQL
  • 10. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Kinesis Stream and Kinesis Firehose • Server-side encryption with a AWS KMS-managed key (SSE-KMS) • Kinesis Stream also supports client side encryption AWS Lambda • KMS/customer key to encrypt in both rest and transit Amazon Athena • Server side encryption with an Amazon S3- managed key (SSE-S3) • Server-side encryption with a AWS Key Management Service (AWS KMS)-managed key (SSE-KMS) • Create table statement with TBLPROPERTIES 'has_encrypted_data'='true' Security Options
  • 11. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. WHITEBOARDING SESSION
  • 12. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Thank you! P l e a s e c o m p l e t e y o u r s u r v e y !