SlideShare une entreprise Scribd logo
1  sur  51
Lambda Architecture:
from zero to One
Serhiy Masyutin
Me
• Staff Engineer @ Lohika
• Passionate Developer
• Father
• Mountain Biker
Agenda
• Project Overview
• Architecture Evolution
• What is Lambda Architecture?
• Cluster Evolution
• What We Achieved?
Project Overview
Project Goals
• Portfolio-driven R&D project
• Focus on Technology
• Focus on Knowledge
• Focus on a new remote Team
Service designed to offload highly
concurrent scenario of live voting
Service designed to offload highly
concurrent scenario of live voting
• User puts a vote
• User requests results on campaign
• Manager requests reports on campaigns
• Admin controls the system
Architecture Goals
• SaaS Solution
• High Throughput
• Scalability
• Low Latency
Essential Data Model
• campaign { startDate, endDate }
• vote { user, campaign, timestamp }
Architecture Evolution
Votes
Start Simple
Reports
Start Simple
Java 8
Spring Boot 1.2.5
MariaDB 5.5
Angularjs 1.4
Benchmark it!
• Simple throughout scenario:
user.vote()
user.request(results)
• Stop tests when error rate raises above 5%
• Benchmark tool runs locally, targeting could server
Gatling
• An open-source load testing framework based on
Scala, Akka and Netty
• High performance
• Out-of-box HTTP support
• Ready-to-present HTML reports
• Scenario recorder and developer-friendly DSL
http://gatling.io
Gatling
scenario(“Throughout simulation").repeat(repeatCount) {
feed(voteFeeder())
.exec(http("Vote")
.post(voteLink)
.headers(sentHeaders).header("Authorization", token)
.body(StringBody("${vote}"))
.check(status.is(200)).asJSON)
.exec(http("Report")
.get(reportByOptionLink+"/${votingSchemaId}")
.headers(sentHeaders).header("Authorization", token)
.check(status.is(200)).asJSON)
}
Gatling
Benchmark!
100
325
550
775
1000
2000 4000 6000 8000 10000 12000
Requestspersecond
Number of concurrent users
Initial Initial no-joins
Kafka
• Publisher-subscriber
• Distributed by design
• Scalable
• Fast
• Durable
http://kafka.apache.org
Incoming Queue
Votes Reports
Benchmark!
100
325
550
775
1000
2000 4000 6000 8000 10000 12000
Requestspersecond
Number of concurrent users
Initial
Initial no-joins
Incoming Queue
Redis
• In-memory data structure store (set, map, etc)
• Easy leader board implementation
• HyperLogLog is its native data structure
http://redis.io
In-memory Storage
Votes
Reports
Benchmark!
100
325
550
775
1000
2000 4000 6000 8000 10000 12000
Requestspersecond
Number of concurrent users
Initial
Initial no-joins
Incoming Queue
In-memory Storage
• A fast and general engine for large-scale data
processing
http://spark.apache.org
Scalable Processing
Votes
Reports
TODO: Benchmarks
• Processing latency
• Latency vs Data Volume
TODO: Scalable Storage
Reports
Votes
Architecture Goals Met
• High Throughput
• Scalable Storage
• Scalable Processing
• Extensible Processing
• Low Latency Reads & Updates
Lambda Architecture
A Single Picture
http://lambda-architecture.net/img/la-overview_small.png
A Single Picture
QUERY = f_query(batch_view, realtime_view)
batch_view = f_batch(all_data)
realtime_view = f_speed(new_data, realtime_view)
Batch Layer
• Immutable append-only data store
• Batch computations produce batch views
Serving Layer
• Random reads/queries on batch views
• Batch updates from batch layer
• No need for random writes
Batch + Serving Layer
• Robustness and fault tolerance
• Scalability
• Generalization
• Extensibility
• Minimal maintenance
• Debuggability
Speed Layer
• Low latency reads and updates
• Incremental computation (different from batch one)
• Scalability
• Fault tolerance
• Minimal amount of stored data
Goals
• Robustness and fault tolerance
• Scalability
• Generalization
• Extensibility
• Minimal maintenance
• Debuggability
• Low latency reads and updates
Lambda Architectrue
http://lambda-architecture.net/img/la-overview_small.png
Cluster Evolution
Start Simple
single box
Optimization:
Tomcat Connector
• Start with a single machine
• Number of threads matter, benchmark it
• Fine-tuning can be OS specific
Benchmark!
100
325
550
775
1000
2000 4000 6000 8000 10000 12000
Requestspersecond
Number of concurrent users
Initial
Initial no-joins
Incoming Queue
In-memory Storage
???
Haproxy
• The Reliable, High Performance TCP/HTTP Load
Balancer
• A single-process program
http://haproxy.org
A cluster of 10 servers
Optimization:
Load Balancing
0
0.25
0.5
0.75
1
1.25
0
10000
20000
30000
40000
dev 1 2 3 6
Gainbyaddinganotherserver
Requestspersecond
Number of servers
requests per second
scaling factor
When to Stop?
CPU %
Memory
GB
haproxy 95 2.5
tomcat 397 6
kafka 1 1.3
redis 55 3.5
What We Achieved?
Experience
• Lambda Architecture: we have One
• Cluster Scaling & Optimization
• Excellent team
Technology
Java 8
Spring Boot 1.2.5
Spring Data 1.2.5
Tomcat 8
MariaDB 5.5
Haproxy 1.5.14
Kafka 0.8
Redis 2.8
Spark 1.4
HDFS 2.6
Gatling 2.2
Angularjs 1.4
Things That Matter
• Small steps make huge difference
• Choose right metrics
• Benchmark
• Optimize!
Q/A
Thank You!

Contenu connexe

Tendances

Big Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and ZeppelinBig Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and Zeppelinprajods
 
Flink in Zalando's world of Microservices
Flink in Zalando's world of Microservices   Flink in Zalando's world of Microservices
Flink in Zalando's world of Microservices ZalandoHayley
 
Tangram: Distributed Scheduling Framework for Apache Spark at Facebook
Tangram: Distributed Scheduling Framework for Apache Spark at FacebookTangram: Distributed Scheduling Framework for Apache Spark at Facebook
Tangram: Distributed Scheduling Framework for Apache Spark at FacebookDatabricks
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingDatabricks
 
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...Spark Summit
 
Stream All Things—Patterns of Modern Data Integration with Gwen Shapira
Stream All Things—Patterns of Modern Data Integration with Gwen ShapiraStream All Things—Patterns of Modern Data Integration with Gwen Shapira
Stream All Things—Patterns of Modern Data Integration with Gwen ShapiraDatabricks
 
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...Databricks
 
Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksAnyscale
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLhuguk
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkRahul Kumar
 
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...Databricks
 
Taboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache SparkTaboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache Sparktsliwowicz
 
Modern ETL Pipelines with Change Data Capture
Modern ETL Pipelines with Change Data CaptureModern ETL Pipelines with Change Data Capture
Modern ETL Pipelines with Change Data CaptureDatabricks
 
Efficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out DatabasesEfficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out DatabasesJen Aman
 
Deep Dive into the New Features of Apache Spark 3.1
Deep Dive into the New Features of Apache Spark 3.1Deep Dive into the New Features of Apache Spark 3.1
Deep Dive into the New Features of Apache Spark 3.1Databricks
 
Spark Summit EU talk by Stephan Kessler
Spark Summit EU talk by Stephan KesslerSpark Summit EU talk by Stephan Kessler
Spark Summit EU talk by Stephan KesslerSpark Summit
 
Family data sheet HP Virtual Connect(May 2013)
Family data sheet HP Virtual Connect(May 2013)Family data sheet HP Virtual Connect(May 2013)
Family data sheet HP Virtual Connect(May 2013)E. Balauca
 
Lambda Architecture with Spark
Lambda Architecture with SparkLambda Architecture with Spark
Lambda Architecture with SparkKnoldus Inc.
 
Online Security Analytics on Large Scale Video Surveillance System by Yu Cao ...
Online Security Analytics on Large Scale Video Surveillance System by Yu Cao ...Online Security Analytics on Large Scale Video Surveillance System by Yu Cao ...
Online Security Analytics on Large Scale Video Surveillance System by Yu Cao ...Spark Summit
 
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn confluent
 

Tendances (20)

Big Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and ZeppelinBig Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and Zeppelin
 
Flink in Zalando's world of Microservices
Flink in Zalando's world of Microservices   Flink in Zalando's world of Microservices
Flink in Zalando's world of Microservices
 
Tangram: Distributed Scheduling Framework for Apache Spark at Facebook
Tangram: Distributed Scheduling Framework for Apache Spark at FacebookTangram: Distributed Scheduling Framework for Apache Spark at Facebook
Tangram: Distributed Scheduling Framework for Apache Spark at Facebook
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured Streaming
 
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
 
Stream All Things—Patterns of Modern Data Integration with Gwen Shapira
Stream All Things—Patterns of Modern Data Integration with Gwen ShapiraStream All Things—Patterns of Modern Data Integration with Gwen Shapira
Stream All Things—Patterns of Modern Data Integration with Gwen Shapira
 
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...
 
Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on Databricks
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale ML
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache spark
 
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
 
Taboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache SparkTaboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache Spark
 
Modern ETL Pipelines with Change Data Capture
Modern ETL Pipelines with Change Data CaptureModern ETL Pipelines with Change Data Capture
Modern ETL Pipelines with Change Data Capture
 
Efficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out DatabasesEfficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out Databases
 
Deep Dive into the New Features of Apache Spark 3.1
Deep Dive into the New Features of Apache Spark 3.1Deep Dive into the New Features of Apache Spark 3.1
Deep Dive into the New Features of Apache Spark 3.1
 
Spark Summit EU talk by Stephan Kessler
Spark Summit EU talk by Stephan KesslerSpark Summit EU talk by Stephan Kessler
Spark Summit EU talk by Stephan Kessler
 
Family data sheet HP Virtual Connect(May 2013)
Family data sheet HP Virtual Connect(May 2013)Family data sheet HP Virtual Connect(May 2013)
Family data sheet HP Virtual Connect(May 2013)
 
Lambda Architecture with Spark
Lambda Architecture with SparkLambda Architecture with Spark
Lambda Architecture with Spark
 
Online Security Analytics on Large Scale Video Surveillance System by Yu Cao ...
Online Security Analytics on Large Scale Video Surveillance System by Yu Cao ...Online Security Analytics on Large Scale Video Surveillance System by Yu Cao ...
Online Security Analytics on Large Scale Video Surveillance System by Yu Cao ...
 
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn
 

En vedette

Spark - Migration Story
Spark - Migration Story Spark - Migration Story
Spark - Migration Story Roman Chukh
 
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...Nathan Bijnens
 
DuyHai DOAN - Real time analytics with Cassandra and Spark - NoSQL matters Pa...
DuyHai DOAN - Real time analytics with Cassandra and Spark - NoSQL matters Pa...DuyHai DOAN - Real time analytics with Cassandra and Spark - NoSQL matters Pa...
DuyHai DOAN - Real time analytics with Cassandra and Spark - NoSQL matters Pa...NoSQLmatters
 
AWS Simple Workflow: Distributed Out of the Box! - Morning@Lohika
AWS Simple Workflow: Distributed Out of the Box! - Morning@LohikaAWS Simple Workflow: Distributed Out of the Box! - Morning@Lohika
AWS Simple Workflow: Distributed Out of the Box! - Morning@LohikaSerhiy Batyuk
 
Big data analysis in java world
Big data analysis in java worldBig data analysis in java world
Big data analysis in java worldSerg Masyutin
 
Tweaking performance on high-load projects
Tweaking performance on high-load projectsTweaking performance on high-load projects
Tweaking performance on high-load projectsDmitriy Dumanskiy
 
Lambda Architecture in Practice
Lambda Architecture in PracticeLambda Architecture in Practice
Lambda Architecture in PracticeNavneet kumar
 
Introduction to real time big data with Apache Spark
Introduction to real time big data with Apache SparkIntroduction to real time big data with Apache Spark
Introduction to real time big data with Apache SparkTaras Matyashovsky
 
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Tugdual Grall
 
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...Brian O'Neill
 
Zero to one.PETER THIEL
Zero to one.PETER THIELZero to one.PETER THIEL
Zero to one.PETER THIELSreeja Sarella
 
High performance web sites with multilevel caching
High performance web sites with multilevel cachingHigh performance web sites with multilevel caching
High performance web sites with multilevel cachingDotnet Open Group
 
Cẩm nang kinh doanh tết 2017
Cẩm nang kinh doanh tết 2017Cẩm nang kinh doanh tết 2017
Cẩm nang kinh doanh tết 2017Haravan Official
 
ITLCHN 18 - Automation & DevOps - Automic
ITLCHN 18 -  Automation & DevOps - AutomicITLCHN 18 -  Automation & DevOps - Automic
ITLCHN 18 - Automation & DevOps - AutomicIT Expert Club
 
NLP: a peek into a day of a computational linguist
NLP: a peek into a day of a computational linguistNLP: a peek into a day of a computational linguist
NLP: a peek into a day of a computational linguistMariana Romanyshyn
 
itlchn 20 - Kien truc he thong chung khoan - Phan 1
itlchn 20 - Kien truc he thong chung khoan - Phan 1itlchn 20 - Kien truc he thong chung khoan - Phan 1
itlchn 20 - Kien truc he thong chung khoan - Phan 1IT Expert Club
 
ITLC - Hanoi - NodeJS - ArrowJS - 27-11 - 2015
ITLC - Hanoi - NodeJS - ArrowJS - 27-11 - 2015ITLC - Hanoi - NodeJS - ArrowJS - 27-11 - 2015
ITLC - Hanoi - NodeJS - ArrowJS - 27-11 - 2015IT Expert Club
 

En vedette (20)

Spark - Migration Story
Spark - Migration Story Spark - Migration Story
Spark - Migration Story
 
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...
 
DuyHai DOAN - Real time analytics with Cassandra and Spark - NoSQL matters Pa...
DuyHai DOAN - Real time analytics with Cassandra and Spark - NoSQL matters Pa...DuyHai DOAN - Real time analytics with Cassandra and Spark - NoSQL matters Pa...
DuyHai DOAN - Real time analytics with Cassandra and Spark - NoSQL matters Pa...
 
AWS Simple Workflow: Distributed Out of the Box! - Morning@Lohika
AWS Simple Workflow: Distributed Out of the Box! - Morning@LohikaAWS Simple Workflow: Distributed Out of the Box! - Morning@Lohika
AWS Simple Workflow: Distributed Out of the Box! - Morning@Lohika
 
Big data analysis in java world
Big data analysis in java worldBig data analysis in java world
Big data analysis in java world
 
Tweaking performance on high-load projects
Tweaking performance on high-load projectsTweaking performance on high-load projects
Tweaking performance on high-load projects
 
React. Flux. Redux
React. Flux. ReduxReact. Flux. Redux
React. Flux. Redux
 
Lambda Architecture in Practice
Lambda Architecture in PracticeLambda Architecture in Practice
Lambda Architecture in Practice
 
Marionette talk 2016
Marionette talk 2016Marionette talk 2016
Marionette talk 2016
 
Introduction to real time big data with Apache Spark
Introduction to real time big data with Apache SparkIntroduction to real time big data with Apache Spark
Introduction to real time big data with Apache Spark
 
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
 
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
 
Zero to one.PETER THIEL
Zero to one.PETER THIELZero to one.PETER THIEL
Zero to one.PETER THIEL
 
High performance web sites with multilevel caching
High performance web sites with multilevel cachingHigh performance web sites with multilevel caching
High performance web sites with multilevel caching
 
Career talk
Career talkCareer talk
Career talk
 
Cẩm nang kinh doanh tết 2017
Cẩm nang kinh doanh tết 2017Cẩm nang kinh doanh tết 2017
Cẩm nang kinh doanh tết 2017
 
ITLCHN 18 - Automation & DevOps - Automic
ITLCHN 18 -  Automation & DevOps - AutomicITLCHN 18 -  Automation & DevOps - Automic
ITLCHN 18 - Automation & DevOps - Automic
 
NLP: a peek into a day of a computational linguist
NLP: a peek into a day of a computational linguistNLP: a peek into a day of a computational linguist
NLP: a peek into a day of a computational linguist
 
itlchn 20 - Kien truc he thong chung khoan - Phan 1
itlchn 20 - Kien truc he thong chung khoan - Phan 1itlchn 20 - Kien truc he thong chung khoan - Phan 1
itlchn 20 - Kien truc he thong chung khoan - Phan 1
 
ITLC - Hanoi - NodeJS - ArrowJS - 27-11 - 2015
ITLC - Hanoi - NodeJS - ArrowJS - 27-11 - 2015ITLC - Hanoi - NodeJS - ArrowJS - 27-11 - 2015
ITLC - Hanoi - NodeJS - ArrowJS - 27-11 - 2015
 

Similaire à Lambda architecture: from zero to One

AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...Amazon Web Services
 
Building Scalable Applications with Microsoft Azure
Building Scalable Applications with Microsoft AzureBuilding Scalable Applications with Microsoft Azure
Building Scalable Applications with Microsoft AzureFisnik Doko
 
Neotys PAC - Ian Molyneaux
Neotys PAC - Ian MolyneauxNeotys PAC - Ian Molyneaux
Neotys PAC - Ian MolyneauxNeotys_Partner
 
SharePoint 2013 Performance Analysis - Robi Vončina
SharePoint 2013 Performance Analysis - Robi VončinaSharePoint 2013 Performance Analysis - Robi Vončina
SharePoint 2013 Performance Analysis - Robi VončinaSPC Adriatics
 
Distributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola ScaleDistributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola ScaleApache Kafka TLV
 
Service quality monitoring system architecture
Service quality monitoring system architectureService quality monitoring system architecture
Service quality monitoring system architectureMatsuo Sawahashi
 
Serverless without Code (Lambda)
Serverless without Code (Lambda)Serverless without Code (Lambda)
Serverless without Code (Lambda)CloudHesive
 
Building a highly scalable and available cloud application
Building a highly scalable and available cloud applicationBuilding a highly scalable and available cloud application
Building a highly scalable and available cloud applicationNoam Sheffer
 
Migration from Oracle to PostgreSQL: NEED vs REALITY
Migration from Oracle to PostgreSQL: NEED vs REALITYMigration from Oracle to PostgreSQL: NEED vs REALITY
Migration from Oracle to PostgreSQL: NEED vs REALITYAshnikbiz
 
I Love APIs 2015: Building Predictive Apps with Lamda and MicroServices
I Love APIs 2015: Building Predictive Apps with Lamda and MicroServices I Love APIs 2015: Building Predictive Apps with Lamda and MicroServices
I Love APIs 2015: Building Predictive Apps with Lamda and MicroServices Apigee | Google Cloud
 
PGConf.ASIA 2019 Bali - Keynote Speech 2 - Ivan Pachenko
PGConf.ASIA 2019 Bali - Keynote Speech 2 - Ivan PachenkoPGConf.ASIA 2019 Bali - Keynote Speech 2 - Ivan Pachenko
PGConf.ASIA 2019 Bali - Keynote Speech 2 - Ivan PachenkoEqunix Business Solutions
 
Architectures, Frameworks and Infrastructure
Architectures, Frameworks and InfrastructureArchitectures, Frameworks and Infrastructure
Architectures, Frameworks and Infrastructureharendra_pathak
 
Choosing the Right Business Intelligence Tools for Your Data and Architectura...
Choosing the Right Business Intelligence Tools for Your Data and Architectura...Choosing the Right Business Intelligence Tools for Your Data and Architectura...
Choosing the Right Business Intelligence Tools for Your Data and Architectura...Victor Holman
 
Building FoundationDB
Building FoundationDBBuilding FoundationDB
Building FoundationDBFoundationDB
 
Grails in the Cloud (2013)
Grails in the Cloud (2013)Grails in the Cloud (2013)
Grails in the Cloud (2013)Meni Lubetkin
 
Application Performance Management
Application Performance ManagementApplication Performance Management
Application Performance ManagementNoriaki Tatsumi
 
Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...
Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...
Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...Denny Lee
 
Correlate Log Data with Business Metrics Like a Jedi
Correlate Log Data with Business Metrics Like a JediCorrelate Log Data with Business Metrics Like a Jedi
Correlate Log Data with Business Metrics Like a JediTrevor Parsons
 
Enterprise WordPress - Performance, Scalability and Redundancy
Enterprise WordPress - Performance, Scalability and RedundancyEnterprise WordPress - Performance, Scalability and Redundancy
Enterprise WordPress - Performance, Scalability and RedundancyJohn Giaconia
 
SQL Explore 2012: P&T Part 1
SQL Explore 2012: P&T Part 1SQL Explore 2012: P&T Part 1
SQL Explore 2012: P&T Part 1sqlserver.co.il
 

Similaire à Lambda architecture: from zero to One (20)

AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
 
Building Scalable Applications with Microsoft Azure
Building Scalable Applications with Microsoft AzureBuilding Scalable Applications with Microsoft Azure
Building Scalable Applications with Microsoft Azure
 
Neotys PAC - Ian Molyneaux
Neotys PAC - Ian MolyneauxNeotys PAC - Ian Molyneaux
Neotys PAC - Ian Molyneaux
 
SharePoint 2013 Performance Analysis - Robi Vončina
SharePoint 2013 Performance Analysis - Robi VončinaSharePoint 2013 Performance Analysis - Robi Vončina
SharePoint 2013 Performance Analysis - Robi Vončina
 
Distributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola ScaleDistributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola Scale
 
Service quality monitoring system architecture
Service quality monitoring system architectureService quality monitoring system architecture
Service quality monitoring system architecture
 
Serverless without Code (Lambda)
Serverless without Code (Lambda)Serverless without Code (Lambda)
Serverless without Code (Lambda)
 
Building a highly scalable and available cloud application
Building a highly scalable and available cloud applicationBuilding a highly scalable and available cloud application
Building a highly scalable and available cloud application
 
Migration from Oracle to PostgreSQL: NEED vs REALITY
Migration from Oracle to PostgreSQL: NEED vs REALITYMigration from Oracle to PostgreSQL: NEED vs REALITY
Migration from Oracle to PostgreSQL: NEED vs REALITY
 
I Love APIs 2015: Building Predictive Apps with Lamda and MicroServices
I Love APIs 2015: Building Predictive Apps with Lamda and MicroServices I Love APIs 2015: Building Predictive Apps with Lamda and MicroServices
I Love APIs 2015: Building Predictive Apps with Lamda and MicroServices
 
PGConf.ASIA 2019 Bali - Keynote Speech 2 - Ivan Pachenko
PGConf.ASIA 2019 Bali - Keynote Speech 2 - Ivan PachenkoPGConf.ASIA 2019 Bali - Keynote Speech 2 - Ivan Pachenko
PGConf.ASIA 2019 Bali - Keynote Speech 2 - Ivan Pachenko
 
Architectures, Frameworks and Infrastructure
Architectures, Frameworks and InfrastructureArchitectures, Frameworks and Infrastructure
Architectures, Frameworks and Infrastructure
 
Choosing the Right Business Intelligence Tools for Your Data and Architectura...
Choosing the Right Business Intelligence Tools for Your Data and Architectura...Choosing the Right Business Intelligence Tools for Your Data and Architectura...
Choosing the Right Business Intelligence Tools for Your Data and Architectura...
 
Building FoundationDB
Building FoundationDBBuilding FoundationDB
Building FoundationDB
 
Grails in the Cloud (2013)
Grails in the Cloud (2013)Grails in the Cloud (2013)
Grails in the Cloud (2013)
 
Application Performance Management
Application Performance ManagementApplication Performance Management
Application Performance Management
 
Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...
Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...
Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...
 
Correlate Log Data with Business Metrics Like a Jedi
Correlate Log Data with Business Metrics Like a JediCorrelate Log Data with Business Metrics Like a Jedi
Correlate Log Data with Business Metrics Like a Jedi
 
Enterprise WordPress - Performance, Scalability and Redundancy
Enterprise WordPress - Performance, Scalability and RedundancyEnterprise WordPress - Performance, Scalability and Redundancy
Enterprise WordPress - Performance, Scalability and Redundancy
 
SQL Explore 2012: P&T Part 1
SQL Explore 2012: P&T Part 1SQL Explore 2012: P&T Part 1
SQL Explore 2012: P&T Part 1
 

Dernier

University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdfKamal Acharya
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptDineshKumar4165
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationBhangaleSonal
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapRishantSharmaFr
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaOmar Fathy
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfJiananWang21
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTbhaskargani46
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfKamal Acharya
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityMorshed Ahmed Rahath
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueBhangaleSonal
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptMsecMca
 
Unit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdfUnit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdfRagavanV2
 
22-prompt engineering noted slide shown.pdf
22-prompt engineering noted slide shown.pdf22-prompt engineering noted slide shown.pdf
22-prompt engineering noted slide shown.pdf203318pmpc
 
2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projects2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projectssmsksolar
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 

Dernier (20)

Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna Municipality
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
 
Unit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdfUnit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdf
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 
22-prompt engineering noted slide shown.pdf
22-prompt engineering noted slide shown.pdf22-prompt engineering noted slide shown.pdf
22-prompt engineering noted slide shown.pdf
 
2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projects2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projects
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
 

Lambda architecture: from zero to One

Notes de l'éditeur

  1. Робота над проектом триває, хоча останнім часом активність не велика.
  2. Сервіс для підтримки високонавантаженого сценарію одночасного голосування
  3. Software as a Service
  4. Maria DB 5.5.44 (no second level cache in our persistence, all tests started with empty DB, connection pooling)
  5. Spring Boot makes it easy to create stand-alone, production-grade Spring based Applications that you can "just run". Spring Data's mission is to provide a familiar and consistent, Spring-based programming model for data access while still retaining the special traits of the underlying data store. MariaDB An enhanced, drop-in replacement for MySQL. MariaDB 5.5 is a stable (GA) release of MariaDB. It is MariaDB 5.3 + MySQL 5.5
  6. Дає видимість руху, правильно, неправильно, і чи взагалі дало якусь зміну 1000 campaigns 15-100k votes
  7. Коли позбулися 3-4 джойнів то система змогла витримувати більше навантаження, хоча продуктивність не збільшилася. Флуктуації на графіку пов”язані з нестабільність віртаулок які нами викоистовувалися. Бачимо, що система має явне обмеження по перформансу. З цим треба щось робити. Звичайним в цій ситуації є буферизація запитів, тобто будемо ставити їх в чергу. Черга — я з малими бавлюся в паттерн черга — я складаю іграшки на коврик, а вони їх з коврика сортують по ящиках )) Ми знаємо про круту технологію, зараз популярна…
  8. Є чудова презентація від Колі Аліменкова з цьогорічного ЖЕЕКонфа A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients. Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capability of any single machine and to allow clusters of co-ordinated consumers Messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages without performance impact. Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees.
  9. Kafka 0.8.2.1 (Scala 2.9.1, only one partition) Система стала асинхронна, варто було б міряти також лейтенсі. Тримаємо це в секреті ) Хто зауважить — тому подарунок
  10. 1.5 - 2.1x Це вже добряче швидко, але зараз ми роздаємо дані з диска. Обрахунки виконуються в запитах до бази даних які досить часто не є найшвидні і не скейляться. Якщо роздавати дані з памяті це дасть нам приріст продуктивності…
  11. Вибрали бо ориганільно розраховували на використання лідербордів. З реального використання можна взяти HyperLogLog як ефективний приблизний підрахунок голосів по компанії In the Redis implementation it only uses 12kbytes per key to count with a standard error of 0.81%, and there is no limit to the number of items you can count, unless you approach 2^64 items (which seems quite unlikely). // key == set
  12. ~1.6x performance boost Але є обмеження на алгоритми по створенню репортів: було б добре мати якісь загальний інструмент для їх реалізації
  13. Про спарк є дуже добра презентація від Тараса, якщо ви її не чули — рекомендую подивитися відео на сайті морнинга. Spark has an advanced DAG execution engine that supports cyclic data flow and in-memory computing. Spark offers over 80 high-level operators that make it easy to build parallel apps. And you can use it interactively from the Scala, Python and R shells. Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application. You can run Spark using its standalone cluster mode, on EC2, on Hadoop YARN, or on Apache Mesos. Access data in HDFS, Cassandra, HBase, Hive, Tachyon, and any Hadoop data source.
  14. Основна мета цього кроку — екстенсібіліті системи по відношенню до алгоритмів обробки даних. Також на live layer можна було б використати Spark Streaming, як узагальненя інкрементальних обрахунків. Можливо це буде реалізовано на одному з наступних етапів. Додавання ще одного розподіленого компонента швидше за все не дасть приросту продуктивності, на цьому етапі заміри лейтенсі яке ми міряли Обмеженням системи на даному етапі стає можливість збереження великого об”єму даних: хоч до сих пір база даних справлялася з цією роботою, але вона не найкращий кандидат. Є старенька всім відома технологія… але про це за декілька слайдів.
  15. Зараз іде робота над заміною маріяДБ на HDFS.
  16. Отже ми прийшли до картинки із остаточним варіантом архітектури. Дані вливаються великим потоком Можемо зберігати великі масиви даних Можем опрацьовувати великі масиви даних Є можливість швидкої відповіді на ряд поставлених питань, більш складні питання можна обрахувати точно з певною затримкою Кажуть в тої архітектури є вже назва.
  17. ВИБОРИ В УКРАЇНІ: бюлетні == all_data ЦВК == batch_view екзитпол == realtime_view
  18. Hadoop The batch layer precomputes results using a distributed processing system that can handle very large quantities of data. The batch layer aims at perfect accuracy by being able to process all available data when generating views. This means it can fix any errors by recomputing based on the complete data set, then updating existing views. Output is typically stored in a read-only database, with updates completely replacing existing precomputed views. Apache Hadoop is the de facto standard batch-processing system used in most high-throughput architectures.
  19. Simple Robust Predictable Easy to configure and operate Cassandra/HBase/ElaphantDB, також може бути Hive/Impala Output from the batch and speed layers are stored in the serving layer, which responds to ad-hoc queries by returning precomputed views or building views from the processed data. Examples of technologies used in the serving layer include Druid, which provides a single cluster to handle output from both layers. Dedicated stores used in the serving layer include Apache Cassandra or Apache HBase for speed-layer output, and Elephant DB or Cloudera Impala for batch-layer output.
  20. Але чогось не вистарчає…
  21. Complexity isolation — complexity is pushed to layer whose results are temporary . Eventual accuracy — eventually all the results will be taken from serving layer. Speed layer might use approximate algorithms like HyperLogLog and BloomFilters for computations. Storm/Spark Streaming The speed layer processes data streams in real time and without the requirements of fix-ups or completeness. This layer sacrifices throughput as it aims to minimize latency by providing real-time views into the most recent data. Essentially, the speed layer is responsible for filling the "gap" caused by the batch layer's lag in providing views based on the most recent data. This layer's views may not be as accurate or complete as the ones eventually produced by the batch layer, but they are available almost immediately after data is received, and can be replaced when the batch layer's views for the same data become available. Stream-processing technologies typically used in this layer include Apache Storm, SQLstream and Apache Spark. Output is typically stored on fast NoSQL databases.
  22. High-load meets big data
  23. Java Nio2 Connector, 500 threads http://techblog.netflix.com/2015/07/tuning-tomcat-for-high-throughput-fail.html
  24. Бачило що маємо певний ліміт по навантаженню, спільний для всіх варіантів. Мабуть це томкет ) Хто дасть такий варіант — подарунок. Система тепер досить добре витримує наватнаження. Цікавим є закономірний спад в околі 12к…
  25. Ubuntu 14.04 (virtual machines, 2 cores, 8 GB RAM) згадати про Haproxy vs nginx Згадати про обмеження в 12к для всіх тестів, це був томкет. Треба оптимізувати.
  26. Стовпчики Скейлінг фактор!!!
  27. Як це типово для високонагружених рішень, я очікував що вульким місцем стане збереження даних або їх обробка, наприклад перша моя задача була — давай зробимо так щоб Кафка стала вузьким місцем, але вепрлися в лоад балансер. Та й 30к RPS це нормальна нагрузка.
  28. Зоопарк