Lambda Architecture using Apache Spark – with Java code examples
Lambda Architecture
Lambda architecture, devised by Nathan Marz, is a layered architecture that addresses the problem of computing arbitrary functions on arbitrary data in real time. In a real-time system, the requirement can be expressed as:
result = function (all data)
As the volume of data grows, evaluating this function over all of the data at query time takes longer and longer, no matter how many resources are used.
Lambda Architecture solves this problem with a three-layer design and pre-computed views. The three layers are:
● Batch Layer
● Speed Layer
● Serving Layer
Batch Layer
The batch layer stores the immutable master dataset, computes arbitrary functions over all of the data, and creates batch views. Its function can be summarized as:
batch view = function (all data)
The batch layer runs this job continuously, recomputing the batch views on each run.
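As a minimal single-machine sketch of this contract, assuming a hypothetical append-only `MasterDataset` and a batch view that simply counts records per key (for example, page URLs), the Java below rebuilds the whole view from a snapshot of all data on every run. It does not use Spark; it only illustrates that a batch view is recomputed from scratch rather than patched in place.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the immutable master dataset:
// records are only appended, never updated or deleted.
final class MasterDataset {
    private final List<String> records = new ArrayList<>();

    synchronized void append(String record) {
        records.add(record);
    }

    // A read-only copy of all records, so one batch run works on a fixed set of data.
    synchronized List<String> snapshot() {
        return Collections.unmodifiableList(new ArrayList<>(records));
    }
}

final class BatchLayerSketch {
    // batch view = function (all data): the whole view is rebuilt from every record on each run.
    static Map<String, Long> computeBatchView(List<String> allData) {
        Map<String, Long> view = new HashMap<>();
        for (String key : allData) {
            view.merge(key, 1L, Long::sum);
        }
        return Collections.unmodifiableMap(view);
    }

    public static void main(String[] args) {
        MasterDataset master = new MasterDataset();
        master.append("/home");
        master.append("/about");
        master.append("/home");
        Map<String, Long> batchView = computeBatchView(master.snapshot());
        System.out.println(batchView.get("/home")); // prints 2
    }
}
```

Full recomputation is deliberately simple: because the master data is never mutated, a buggy view can always be fixed by rerunning the function over all data.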
Serving Layer
The purpose of the serving layer is to store the batch views produced by the batch layer and to provide random access to them. Whenever the batch layer computes new views, it updates them in the serving layer. The serving layer can be implemented with any database that supports random reads.
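The sketch below, assuming an in-memory map in place of a real random-access database, shows one way a serving layer could publish a freshly computed batch view and answer point queries. The class name `ServingLayerSketch` is hypothetical. Swapping the whole view through an `AtomicReference` means a reader sees either the old view or the new one, never a mix of both.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical in-memory stand-in for the serving layer's random-access database.
final class ServingLayerSketch {
    // Holds the most recently published batch view.
    private final AtomicReference<Map<String, Long>> current =
            new AtomicReference<>(Collections.<String, Long>emptyMap());

    // Called by the batch layer after it finishes recomputing a view;
    // the previous view is replaced as a whole, in one atomic step.
    void publish(Map<String, Long> newBatchView) {
        current.set(Map.copyOf(newBatchView));
    }

    // Random-access read of a single key; keys absent from the view count as zero.
    long query(String key) {
        return current.get().getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        ServingLayerSketch serving = new ServingLayerSketch();
        serving.publish(Map.of("/home", 2L));
        System.out.println(serving.query("/home"));  // prints 2
        System.out.println(serving.query("/other")); // prints 0
    }
}
```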
Speed Layer
While the batch layer is recomputing the batch views, data that arrives during the recomputation is not included in them. The purpose of the speed layer is to compute incremental views over this recent data that the batch views do not yet cover. These views are called real-time views.
The speed layer can be summarized as:
real time view = function (real time view, new data)
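A minimal sketch of this incremental update, assuming the same kind of per-key count used for a batch view: the function receives the previous real-time view plus only the newly arrived records, and returns the next view without rereading older data. `SpeedLayerSketch` is a hypothetical name, and the code is plain Java rather than Spark Streaming.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SpeedLayerSketch {
    // real time view = function (real time view, new data):
    // the previous view is extended with only the new records,
    // instead of being recomputed from all data.
    static Map<String, Long> update(Map<String, Long> realTimeView, List<String> newData) {
        Map<String, Long> next = new HashMap<>(realTimeView); // previous view is left untouched
        for (String key : newData) {
            next.merge(key, 1L, Long::sum);
        }
        return Collections.unmodifiableMap(next);
    }

    public static void main(String[] args) {
        Map<String, Long> view = Collections.emptyMap();
        view = update(view, List.of("/home", "/about"));
        view = update(view, List.of("/home"));
        System.out.println(view.get("/home")); // prints 2
    }
}
```

The cost of each update depends on the size of the new data, not on the size of the whole dataset, which is what lets the speed layer keep up with recent data.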
The final query is therefore answered by combining the results of the serving layer and the speed layer:
batch view = function (all data)
real time view = function (real time view, new data)
result = merge (query (batch view), query (real time view))
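For views that hold counts, `merge` can simply be addition, because the real-time view covers only data the batch view does not yet include. The hypothetical `LambdaQuerySketch` below illustrates this for a single key and for whole views; it is a sketch under that assumption, and other kinds of views would need a different merge function (a view of maximums, for example, would take the larger of the two values).

```java
import java.util.HashMap;
import java.util.Map;

final class LambdaQuerySketch {
    // result = merge (query (batch view), query (real time view)) for one key.
    // Addition is a valid merge here only because the two views cover disjoint data,
    // so no record is counted twice.
    static long query(Map<String, Long> batchView, Map<String, Long> realTimeView, String key) {
        return batchView.getOrDefault(key, 0L) + realTimeView.getOrDefault(key, 0L);
    }

    // The same merge applied to every key present in either view.
    static Map<String, Long> mergeViews(Map<String, Long> batchView, Map<String, Long> realTimeView) {
        Map<String, Long> result = new HashMap<>(batchView);
        realTimeView.forEach((key, count) -> result.merge(key, count, Long::sum));
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> batchView = Map.of("/home", 2L, "/about", 1L);
        Map<String, Long> realTimeView = Map.of("/home", 1L, "/contact", 1L);
        System.out.println(query(batchView, realTimeView, "/home"));            // prints 3
        System.out.println(mergeViews(batchView, realTimeView).get("/contact")); // prints 1
    }
}
```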
An Example using Apache Spark
Suppose we want to build a system that finds popular hashtags in a Twitter stream. We can implement this system as a lambda architecture using Apache Spark.
Batch layer implementation: the batch layer reads a file of tweets, calculates a hashtag frequency map, and saves it to a Cassandra table.
Batch.java
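As a single-machine stand-in for this batch job, the plain-Java sketch below computes the same kind of hashtag frequency map. It assumes a hypothetical input format of one tweet per line, treats a hashtag as `#` followed by word characters, and counts tags case-insensitively. In Spark, the equivalent pipeline would roughly be `flatMap` (extract tags), `mapToPair` (tag to 1) and `reduceByKey` (sum) over the RDD of tweets; the Cassandra write is omitted here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HashtagBatchSketch {
    // Assumed hashtag syntax: '#' followed by letters, digits or underscores.
    private static final Pattern HASHTAG = Pattern.compile("#(\\w+)");

    // Builds the hashtag frequency map (the batch view) from every tweet in the master data.
    static Map<String, Long> hashtagFrequencies(List<String> tweets) {
        Map<String, Long> counts = new HashMap<>();
        for (String tweet : tweets) {
            Matcher m = HASHTAG.matcher(tweet);
            while (m.find()) {
                // Normalise case so #Spark and #spark count as the same tag.
                counts.merge(m.group(1).toLowerCase(Locale.ROOT), 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical input: a file with one tweet per line, passed as the first argument.
        List<String> tweets = args.length > 0
                ? Files.readAllLines(Path.of(args[0]))
                : List.of("Learning #Spark and #BigData", "#spark streaming is neat");
        // In the design described above, this map would next be saved to a Cassandra table.
        System.out.println(hashtagFrequencies(tweets));
    }
}
```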
Speed layer implementation: the speed layer can also be written in Apache Spark, using Spark Streaming. It consumes a stream of recent tweets, calculates a real-time view from this stream and, for simplicity, also saves this real-time view to Cassandra.
Speed.java
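The following plain-Java sketch approximates what the speed layer does with each Spark Streaming micro-batch: it folds the hashtag counts of the newest tweets into a running real-time view, similar in spirit to Spark Streaming's `updateStateByKey`. The class name, the list-of-batches stand-in for the tweet stream, and the `reset` method (clearing the view once a new batch view covers those tweets) are assumptions for illustration; the Cassandra write is omitted.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HashtagSpeedSketch {
    private static final Pattern HASHTAG = Pattern.compile("#(\\w+)");

    // Running real-time view: hashtag -> count over tweets not yet covered by a batch view.
    private final Map<String, Long> realTimeView = new HashMap<>();

    // Processes one micro-batch of recent tweets, folding its counts into the existing view.
    void onMicroBatch(List<String> tweets) {
        for (String tweet : tweets) {
            Matcher m = HASHTAG.matcher(tweet);
            while (m.find()) {
                realTimeView.merge(m.group(1).toLowerCase(Locale.ROOT), 1L, Long::sum);
            }
        }
    }

    long count(String hashtag) {
        return realTimeView.getOrDefault(hashtag.toLowerCase(Locale.ROOT), 0L);
    }

    // Assumed housekeeping: once a new batch view includes these tweets, start over.
    void reset() {
        realTimeView.clear();
    }

    public static void main(String[] args) {
        HashtagSpeedSketch speed = new HashtagSpeedSketch();
        // Each inner list stands in for one micro-batch of a hypothetical tweet stream.
        List<List<String>> stream = List.of(
                List.of("#spark is fast", "trying #cassandra"),
                List.of("more #Spark"));
        for (List<String> batch : stream) {
            speed.onMicroBatch(batch);
        }
        System.out.println(speed.count("spark")); // prints 2
    }
}
```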
Serving layer implementation: the serving layer can be implemented as a RESTful web service that queries the Cassandra tables and merges the batch and real-time views to return the final result in real time.
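As a minimal sketch of such a service, assuming in-memory maps in place of the Cassandra tables, the code below merges the batch and real-time hashtag counts and serves the top three over HTTP using the JDK's built-in `com.sun.net.httpserver` package instead of a full REST framework. The port 8080, the `/top-hashtags` path and the plain-text response format are hypothetical choices.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class HashtagServingSketch {
    // Merges both views (counts add up, since they cover disjoint data)
    // and returns the n most frequent hashtags, ties broken alphabetically.
    static List<Map.Entry<String, Long>> topHashtags(
            Map<String, Long> batchView, Map<String, Long> realTimeView, int n) {
        Map<String, Long> merged = new HashMap<>(batchView);
        realTimeView.forEach((tag, count) -> merged.merge(tag, count, Long::sum));
        return merged.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical views; the design described above reads them from Cassandra tables.
        Map<String, Long> batchView = Map.of("spark", 120L, "bigdata", 80L);
        Map<String, Long> realTimeView = Map.of("spark", 5L, "cassandra", 90L);

        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/top-hashtags", exchange -> {
            String body = topHashtags(batchView, realTimeView, 3).stream()
                    .map(e -> e.getKey() + "=" + e.getValue())
                    .collect(Collectors.joining(",", "[", "]"));
            byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, bytes.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(bytes);
            }
        });
        server.start();
        System.out.println("Serving on http://localhost:8080/top-hashtags");
    }
}
```

Because the merge happens at request time, the response always reflects both the latest batch view and whatever the speed layer has counted since.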
References and image credits
● http://www.databasetube.com/database/big-data-lambda-architecture/
● Nathan Marz and James Warren, Big Data: Principles and best practices of scalable realtime data systems
