"Intro to Stateful Services or How to get 1 million RPS from a single node", Anton Moldovan

https://nbomber.com
https://github.com/stereodb
AGENDA
- intro
- why stateless is slow and less reliable
- tools for building stateful services
PART I
intro to the sportsbook domain and how we came to stateful
Dynamo Kyiv vs Chelsea 2 : 1
Red card
Score changed
Odds changed
PUSH / PULL
- fairly large payloads: 30 KB compressed data (1.5 MB uncompressed)
- update rate: 2K RPS (per tenant)
- user query rate: 3-4K RPS (per tenant)
- live data is very dynamic: there is not much sense in caching it
- data should be queryable: simple KV is not enough
- we need secondary indexes
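The last two requirements are what a plain key-value cache does not give you. A minimal sketch of an in-process store with one secondary index, assuming a hypothetical `LiveEvent` record indexed by sport (none of these names come from the talk; a production store would also need thread safety and more indexes):

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;

// Hypothetical live-event record; the fields are illustrative only.
public sealed record LiveEvent(int Id, string Sport, string Score);

// In-process store: primary index by Id plus one secondary index by Sport.
// Not thread-safe; a sketch of the data structure only.
public sealed class LiveEventStore
{
    private readonly Dictionary<int, LiveEvent> _byId = new();
    private readonly Dictionary<string, HashSet<int>> _idsBySport = new();

    public void Upsert(LiveEvent ev)
    {
        // If the event already exists, drop its old secondary-index entry first.
        if (_byId.TryGetValue(ev.Id, out var old) &&
            _idsBySport.TryGetValue(old.Sport, out var oldIds))
        {
            oldIds.Remove(old.Id);
        }

        _byId[ev.Id] = ev;

        if (!_idsBySport.TryGetValue(ev.Sport, out var ids))
        {
            ids = new HashSet<int>();
            _idsBySport[ev.Sport] = ids;
        }
        ids.Add(ev.Id);
    }

    public LiveEvent? GetById(int id) =>
        _byId.TryGetValue(id, out var ev) ? ev : null;

    // Query through the secondary index: no scan over all events.
    public IReadOnlyList<LiveEvent> GetBySport(string sport) =>
        _idsBySport.TryGetValue(sport, out var ids)
            ? ids.Select(id => _byId[id]).ToList()
            : new List<LiveEvent>();
}
```

Because the store lives in the same process as the API, `GetBySport("football")` is a dictionary lookup and needs neither a network round trip nor deserialization.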
Redis benchmark: 20 KB payload, concurrent read and write
Redis, single node: 4 vCPU, 8 GB RAM
redis_write: 4K RPS, p75 = 543, p95 = 688, p99 = 842
redis_read: 7K RPS, p75 = 970, p95 = 1278, p99 = 1597
API, API, API -> DB
API -> Cache -> DB
but Cache is not queryable
API + DB = Stateful Service (state, state, state)
CDC (Debezium)
DB
How to handle the case when your data is larger than RAM?
10 GB 30 GB
Solution 1: use an in-memory DB that supports data larger than RAM
10 GB
20 GB
Solution 2: partition by tenant (UA, PL, FR)
Solution 3: use range-based sharding
- shard A: users (1-500)
- shard B: users (501-1000)
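Range-based sharding can be sketched as a lookup from a user id to the shard that owns its range. This is an illustration only: the `Shard` record and `ShardRouter` class are hypothetical names, and only the two ranges are taken from the slide.

```csharp
using System;

// Hypothetical shard descriptor: an inclusive range of user ids.
public sealed record Shard(string Name, int FromUserId, int ToUserId);

public static class ShardRouter
{
    // The two ranges shown on the slide.
    private static readonly Shard[] Shards =
    {
        new("shard A", 1, 500),
        new("shard B", 501, 1000),
    };

    // Returns the shard whose range contains the user id.
    public static Shard Resolve(int userId)
    {
        foreach (var shard in Shards)
        {
            if (userId >= shard.FromUserId && userId <= shard.ToUserId)
            {
                return shard;
            }
        }

        throw new ArgumentOutOfRangeException(
            nameof(userId), userId, "No shard owns this user id.");
    }
}
```

With only a handful of shards a linear scan is enough; with many ranges a sorted array and binary search would be the usual choice.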
PART II
why stateless is slow
API + DB (Stateful Service)
API -> Cache -> DB: network latency on each hop
Latency Numbers

Operation                              2010      2020
Compress 1 KB with Zippy               2 μs      2 μs
Read 1 MB sequentially from RAM        30 μs     3 μs
Read 1 MB sequentially from SSD        494 μs    49 μs
Read 1 MB sequentially from disk       3 ms      825 μs
Round trip within same datacenter      500 μs    500 μs
Send packet CA -> Netherlands -> CA    150 ms    150 ms

Source: https://colin-scott.github.io/personal_website/research/interactive_latency.html
API / Cache / DB (the costs repeat for the Cache call and for the DB call):
- CPU: serialize/deserialize
- CPU: async request handling
- CPU: managing sockets
- overreads

API + DB (Stateful Service):
- CPU: serialize (we don’t need to deserialize)
- CPU: managing sockets (only client sockets)
- CPU: handling the query (very cheap compared to serialization)
Object hit rate / Transactional hit rate
In order to fulfill our transactional flow, the API needs to fetch records A, B, and C.
Records A and B will not impact our latency: the flow waits for the slowest record, C.
Overall Latency = Latency of record C
Most existing cache eviction algorithms focus on maximizing object hit rate, or the fraction of single object requests served from cache. However, this approach fails to capture the inter-object dependencies within transactions.
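The gap between the two metrics can be sketched in a few lines. This is an illustration under stated assumptions: records are fetched in parallel, a transaction counts as a hit only when every record it needs is a hit, and the class name, method names, and latency values are hypothetical.

```csharp
using System.Linq;

public static class HitRates
{
    // With parallel fetches, a flow is only as fast as its slowest record.
    public static double OverallLatencyMs(params double[] recordLatenciesMs) =>
        recordLatenciesMs.Max();

    // Fraction of single-record requests served from cache.
    // Each inner array holds the hit/miss flags of one transaction.
    public static double ObjectHitRate(bool[][] transactions)
    {
        var all = transactions.SelectMany(t => t).ToArray();
        return (double)all.Count(hit => hit) / all.Length;
    }

    // Fraction of transactions in which every record was served from cache.
    public static double TransactionalHitRate(bool[][] transactions) =>
        (double)transactions.Count(t => t.All(hit => hit)) / transactions.Length;
}
```

For two transactions with flags `{ true, true, false }` and `{ true, true, true }`, the object hit rate is 5/6 while the transactional hit rate is only 1/2. With assumed latencies of 0.1 ms for A and B and 5 ms for C, `OverallLatencyMs(0.1, 0.1, 5.0)` returns 5.0.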
async / await
Imagine that we run Redis on localhost. Even with such a setup we usually use async request handling.
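using System.Runtime.CompilerServices;
using System.Threading.Tasks;

// On the slides each variant defines its own AddAsync; the helpers are
// renamed here so that all four variants compile in one class.
public class AsyncOverheadBenchmarks
{
    // The slides do not show the value of Iterations; 1000 is a placeholder.
    private const int Iterations = 1000;

    // 1. Baseline: a plain synchronous call.
    public void SimpleMethod()
    {
        var k = 0;
        for (int i = 0; i < Iterations; i++)
        {
            k = Add(i, i);
        }
    }

    // NoInlining keeps the JIT from inlining the call away.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private int Add(int a, int b) => a + b;

    // 2. await on an already completed task.
    public async Task SimpleMethodAsync()
    {
        var k = 0;
        for (int i = 0; i < Iterations; i++)
        {
            k = await AddCompletedAsync(i, i);
        }
    }

    private Task<int> AddCompletedAsync(int a, int b)
    {
        return Task.FromResult(a + b);
    }

    // 3. Task.Yield forces the continuation to be scheduled asynchronously.
    public async Task SimpleMethodAsyncYield()
    {
        var k = 0;
        for (int i = 0; i < Iterations; i++)
        {
            k = await AddYieldAsync(i, i);
        }
    }

    private async Task<int> AddYieldAsync(int a, int b)
    {
        await Task.Yield();
        return a + b;
    }

    // 4. Task.Yield plus Task.Run: the addition is also queued to the thread pool.
    // (The slide reuses the name SimpleMethodAsyncYield for this variant.)
    public async Task SimpleMethodAsyncYieldRun()
    {
        var k = 0;
        for (int i = 0; i < Iterations; i++)
        {
            k = await AddYieldRunAsync(i, i);
        }
    }

    private async Task<int> AddYieldRunAsync(int a, int b)
    {
        await Task.Yield();
        return await Task.Run(() => a + b);
    }
}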
PART III
why stateless is less reliable
API / Cache / DB vs API + DB (Stateful Service)
With more components in the request path, we have a higher probability of failure
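The claim above can be made concrete with a little arithmetic. The sketch below assumes that a request needs every component on its path and that components fail independently; the 99.9% figure is a made-up example, not a number from the talk.

```csharp
using System.Linq;

public static class Availability
{
    // A request succeeds only if every component on its path is up,
    // so the availabilities multiply (assuming independent failures).
    public static double InSeries(params double[] componentAvailabilities) =>
        componentAvailabilities.Aggregate(1.0, (acc, a) => acc * a);
}
```

With three components at 99.9% each, `Availability.InSeries(0.999, 0.999, 0.999)` is about 0.997, that is 99.7%, while a single stateful node at the same 99.9% stays at 99.9%.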
API / Cache / DB: each remote call needs its own
- circuit breaker
- retry
- fallback
- timeout
- bulkhead isolation
API + DB (Stateful Service)
API / Cache / DB vs API + DB (Stateful Service)
What about cache invalidation and data consistency?
What about predictable scale-out? Will your RPS increase if you add an additional API or Cache node?
- Metastable failures occur in open systems with an uncontrolled source of
load where a trigger causes the system to enter a bad state that persists
even when the trigger is removed.
- Paradoxically, the root cause of these failures is often features that
improve the efficiency or reliability of the system.
- The characteristic of a metastable failure is that the sustaining effect keeps
the system in the metastable failure state even after the trigger is
removed.
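A toy simulation can show how a sustaining effect keeps a system in the bad state. It is only a sketch under strong assumptions: all numbers are hypothetical, goodput is assumed to drop once the node is overloaded, and every failed request is retried on the next tick with no retry limit.

```csharp
using System;
using System.Collections.Generic;

public static class MetastableDemo
{
    // All numbers are hypothetical, chosen only to make the effect visible.
    const int BaseLoad = 80;           // new requests per tick
    const int HealthyCapacity = 100;   // requests served per tick when not overloaded
    const int OverloadedCapacity = 60; // assumed goodput once the node is overloaded

    // Returns the number of failed requests for each tick.
    public static List<int> Run(int ticks, int triggerTick, int triggerExtraLoad)
    {
        var failedPerTick = new List<int>();
        var retries = 0;
        for (var tick = 0; tick < ticks; tick++)
        {
            var extra = tick == triggerTick ? triggerExtraLoad : 0;
            var load = BaseLoad + retries + extra;
            var capacity = load <= HealthyCapacity ? HealthyCapacity : OverloadedCapacity;
            var failed = load - Math.Min(load, capacity);

            // Sustaining effect: every failed request comes back on the next tick.
            retries = failed;
            failedPerTick.Add(failed);
        }
        return failedPerTick;
    }
}
```

`MetastableDemo.Run(10, 3, 50)` applies a one-tick spike at tick 3. Failures are zero before the spike and keep growing after it, even though the spike itself lasted a single tick; without the spike the same system never fails.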
At least 4 out of 15 major outages in the last decade at Amazon Web Services were caused by metastable failures.
PART IV
tools for building stateful services
- distributed log with sync replication
- In-process memory DB
- SQL OLAP
At peak, to handle a big load for 1 tenant, we have:
5-10 nodes, 0.5-2 CPU, 6 GB RAM
THANKS
always benchmark
https://twitter.com/antyadev
1 sur 60

Recommandé

Load Balancing MySQL with HAProxy - Slides par
Load Balancing MySQL with HAProxy - SlidesLoad Balancing MySQL with HAProxy - Slides
Load Balancing MySQL with HAProxy - SlidesSeveralnines
11.3K vues25 diapositives
Synapse 2018 Guarding against failure in a hundred step pipeline par
Synapse 2018 Guarding against failure in a hundred step pipelineSynapse 2018 Guarding against failure in a hundred step pipeline
Synapse 2018 Guarding against failure in a hundred step pipelineCalvin French-Owen
120 vues112 diapositives
Anton Moldovan "Building an efficient replication system for thousands of ter... par
Anton Moldovan "Building an efficient replication system for thousands of ter...Anton Moldovan "Building an efficient replication system for thousands of ter...
Anton Moldovan "Building an efficient replication system for thousands of ter...Fwdays
150 vues114 diapositives
A Day in the Life of a Cloud Network Engineer at Netflix - NET303 - re:Invent... par
A Day in the Life of a Cloud Network Engineer at Netflix - NET303 - re:Invent...A Day in the Life of a Cloud Network Engineer at Netflix - NET303 - re:Invent...
A Day in the Life of a Cloud Network Engineer at Netflix - NET303 - re:Invent...Amazon Web Services
9.4K vues89 diapositives
Fighting Against Chaotically Separated Values with Embulk par
Fighting Against Chaotically Separated Values with EmbulkFighting Against Chaotically Separated Values with Embulk
Fighting Against Chaotically Separated Values with EmbulkSadayuki Furuhashi
2.1K vues45 diapositives
Microservice bus tutorial par
Microservice bus tutorialMicroservice bus tutorial
Microservice bus tutorialHuabing Zhao
708 vues19 diapositives

Contenu connexe

Similaire à "Intro to Stateful Services or How to get 1 million RPS from a single node", Anton Moldovan

L’odyssée d’une requête HTTP chez Scaleway par
L’odyssée d’une requête HTTP chez ScalewayL’odyssée d’une requête HTTP chez Scaleway
L’odyssée d’une requête HTTP chez ScalewayScaleway
293 vues41 diapositives
WebRTC Infrastructure scalability notes - Geek'n Kranky - June 2014 @ Google SF par
WebRTC Infrastructure scalability notes - Geek'n Kranky - June 2014 @ Google SFWebRTC Infrastructure scalability notes - Geek'n Kranky - June 2014 @ Google SF
WebRTC Infrastructure scalability notes - Geek'n Kranky - June 2014 @ Google SFAlexandre Gouaillard
760 vues19 diapositives
DPC2007 PHP And Oracle (Kuassi Mensah) par
DPC2007 PHP And Oracle (Kuassi Mensah)DPC2007 PHP And Oracle (Kuassi Mensah)
DPC2007 PHP And Oracle (Kuassi Mensah)dpc
849 vues34 diapositives
Cassandra at teads par
Cassandra at teadsCassandra at teads
Cassandra at teadsRomain Hardouin
6.2K vues86 diapositives
StrongLoop Overview par
StrongLoop OverviewStrongLoop Overview
StrongLoop OverviewShubhra Kar
2.3K vues44 diapositives
Getting Started with Amazon Redshift par
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
1K vues59 diapositives

Similaire à "Intro to Stateful Services or How to get 1 million RPS from a single node", Anton Moldovan (20)

L’odyssée d’une requête HTTP chez Scaleway par Scaleway
L’odyssée d’une requête HTTP chez ScalewayL’odyssée d’une requête HTTP chez Scaleway
L’odyssée d’une requête HTTP chez Scaleway
Scaleway293 vues
WebRTC Infrastructure scalability notes - Geek'n Kranky - June 2014 @ Google SF par Alexandre Gouaillard
WebRTC Infrastructure scalability notes - Geek'n Kranky - June 2014 @ Google SFWebRTC Infrastructure scalability notes - Geek'n Kranky - June 2014 @ Google SF
WebRTC Infrastructure scalability notes - Geek'n Kranky - June 2014 @ Google SF
DPC2007 PHP And Oracle (Kuassi Mensah) par dpc
DPC2007 PHP And Oracle (Kuassi Mensah)DPC2007 PHP And Oracle (Kuassi Mensah)
DPC2007 PHP And Oracle (Kuassi Mensah)
dpc849 vues
StrongLoop Overview par Shubhra Kar
StrongLoop OverviewStrongLoop Overview
StrongLoop Overview
Shubhra Kar2.3K vues
Choisir entre une API RPC, SOAP, REST, GraphQL? 
Et si le problème était ai... par François-Guillaume Ribreau
Choisir entre une API  RPC, SOAP, REST, GraphQL?  
Et si le problème était ai...Choisir entre une API  RPC, SOAP, REST, GraphQL?  
Et si le problème était ai...
Choisir entre une API RPC, SOAP, REST, GraphQL? 
Et si le problème était ai...
EEDC 2010. Scaling Web Applications par Expertos en TI
EEDC 2010. Scaling Web ApplicationsEEDC 2010. Scaling Web Applications
EEDC 2010. Scaling Web Applications
Expertos en TI593 vues
Shared Personalization Service - How To Scale to 15K RPS, Patrice Pelland par Fuenteovejuna
Shared Personalization Service - How To Scale to 15K RPS, Patrice PellandShared Personalization Service - How To Scale to 15K RPS, Patrice Pelland
Shared Personalization Service - How To Scale to 15K RPS, Patrice Pelland
Fuenteovejuna 551 vues
Kafka elastic search meetup 09242018 par Ying Xu
Kafka elastic search meetup 09242018Kafka elastic search meetup 09242018
Kafka elastic search meetup 09242018
Ying Xu175 vues
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark par Michael Stack
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkHBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
Michael Stack742 vues
초보자를 위한 분산 캐시 이야기 par OnGameServer
초보자를 위한 분산 캐시 이야기초보자를 위한 분산 캐시 이야기
초보자를 위한 분산 캐시 이야기
OnGameServer12K vues
Sql sever engine batch mode and cpu architectures par Chris Adkin
Sql sever engine batch mode and cpu architecturesSql sever engine batch mode and cpu architectures
Sql sever engine batch mode and cpu architectures
Chris Adkin1K vues
How To Set Up SQL Load Balancing with HAProxy - Slides par Severalnines
How To Set Up SQL Load Balancing with HAProxy - SlidesHow To Set Up SQL Load Balancing with HAProxy - Slides
How To Set Up SQL Load Balancing with HAProxy - Slides
Severalnines21.3K vues
Oracle Client Failover - Under The Hood par Ludovico Caldara
Oracle Client Failover - Under The HoodOracle Client Failover - Under The Hood
Oracle Client Failover - Under The Hood
Ludovico Caldara1.9K vues
"Production-ready Serverless Java Applications in 3 weeks" at AWS Community D... par Vadym Kazulkin
"Production-ready Serverless Java Applications in 3 weeks" at AWS Community D..."Production-ready Serverless Java Applications in 3 weeks" at AWS Community D...
"Production-ready Serverless Java Applications in 3 weeks" at AWS Community D...
Vadym Kazulkin84 vues
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호 par Amazon Web Services Korea
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
Tarantool как платформа для микросервисов / Антон Резников, Владимир Перепели... par Ontico
Tarantool как платформа для микросервисов / Антон Резников, Владимир Перепели...Tarantool как платформа для микросервисов / Антон Резников, Владимир Перепели...
Tarantool как платформа для микросервисов / Антон Резников, Владимир Перепели...
Ontico3.2K vues
Extending Piwik At R7.com par Leo Lorieri
Extending Piwik At R7.comExtending Piwik At R7.com
Extending Piwik At R7.com
Leo Lorieri2.8K vues

Plus de Fwdays

"Drizzle: What Is It All About?", Alex Blokh, Dan Kochetov par
"Drizzle: What Is It All About?", Alex Blokh, Dan Kochetov"Drizzle: What Is It All About?", Alex Blokh, Dan Kochetov
"Drizzle: What Is It All About?", Alex Blokh, Dan KochetovFwdays
17 vues33 diapositives
"Package management in monorepos", Zoltan Kochan par
"Package management in monorepos", Zoltan Kochan"Package management in monorepos", Zoltan Kochan
"Package management in monorepos", Zoltan KochanFwdays
28 vues18 diapositives
"Node.js vs workers — A comparison of two JavaScript runtimes", James M Snell par
"Node.js vs workers — A comparison of two JavaScript runtimes", James M Snell"Node.js vs workers — A comparison of two JavaScript runtimes", James M Snell
"Node.js vs workers — A comparison of two JavaScript runtimes", James M SnellFwdays
14 vues30 diapositives
"AI and how to integrate ChatGPT as a customer support agent", Sergey Dyachok par
"AI and how to integrate ChatGPT as a customer support agent",  Sergey Dyachok"AI and how to integrate ChatGPT as a customer support agent",  Sergey Dyachok
"AI and how to integrate ChatGPT as a customer support agent", Sergey DyachokFwdays
34 vues17 diapositives
"Node.js Development in 2024: trends and tools", Nikita Galkin par
"Node.js Development in 2024: trends and tools", Nikita Galkin "Node.js Development in 2024: trends and tools", Nikita Galkin
"Node.js Development in 2024: trends and tools", Nikita Galkin Fwdays
23 vues38 diapositives
"Running students' code in isolation. The hard way", Yurii Holiuk par
"Running students' code in isolation. The hard way", Yurii Holiuk "Running students' code in isolation. The hard way", Yurii Holiuk
"Running students' code in isolation. The hard way", Yurii Holiuk Fwdays
32 vues34 diapositives

Plus de Fwdays(20)

"Drizzle: What Is It All About?", Alex Blokh, Dan Kochetov par Fwdays
"Drizzle: What Is It All About?", Alex Blokh, Dan Kochetov"Drizzle: What Is It All About?", Alex Blokh, Dan Kochetov
"Drizzle: What Is It All About?", Alex Blokh, Dan Kochetov
Fwdays17 vues
"Package management in monorepos", Zoltan Kochan par Fwdays
"Package management in monorepos", Zoltan Kochan"Package management in monorepos", Zoltan Kochan
"Package management in monorepos", Zoltan Kochan
Fwdays28 vues
"Node.js vs workers — A comparison of two JavaScript runtimes", James M Snell par Fwdays
"Node.js vs workers — A comparison of two JavaScript runtimes", James M Snell"Node.js vs workers — A comparison of two JavaScript runtimes", James M Snell
"Node.js vs workers — A comparison of two JavaScript runtimes", James M Snell
Fwdays14 vues
"AI and how to integrate ChatGPT as a customer support agent", Sergey Dyachok par Fwdays
"AI and how to integrate ChatGPT as a customer support agent",  Sergey Dyachok"AI and how to integrate ChatGPT as a customer support agent",  Sergey Dyachok
"AI and how to integrate ChatGPT as a customer support agent", Sergey Dyachok
Fwdays34 vues
"Node.js Development in 2024: trends and tools", Nikita Galkin par Fwdays
"Node.js Development in 2024: trends and tools", Nikita Galkin "Node.js Development in 2024: trends and tools", Nikita Galkin
"Node.js Development in 2024: trends and tools", Nikita Galkin
Fwdays23 vues
"Running students' code in isolation. The hard way", Yurii Holiuk par Fwdays
"Running students' code in isolation. The hard way", Yurii Holiuk "Running students' code in isolation. The hard way", Yurii Holiuk
"Running students' code in isolation. The hard way", Yurii Holiuk
Fwdays32 vues
"Surviving highload with Node.js", Andrii Shumada par Fwdays
"Surviving highload with Node.js", Andrii Shumada "Surviving highload with Node.js", Andrii Shumada
"Surviving highload with Node.js", Andrii Shumada
Fwdays49 vues
"The role of CTO in a classical early-stage startup", Eugene Gusarov par Fwdays
"The role of CTO in a classical early-stage startup", Eugene Gusarov"The role of CTO in a classical early-stage startup", Eugene Gusarov
"The role of CTO in a classical early-stage startup", Eugene Gusarov
Fwdays33 vues
"Cross-functional teams: what to do when a new hire doesn’t solve the busines... par Fwdays
"Cross-functional teams: what to do when a new hire doesn’t solve the busines..."Cross-functional teams: what to do when a new hire doesn’t solve the busines...
"Cross-functional teams: what to do when a new hire doesn’t solve the busines...
Fwdays44 vues
"Ukrainian Mobile Banking Scaling in Practice. From 0 to 100 and beyond", Vad... par Fwdays
"Ukrainian Mobile Banking Scaling in Practice. From 0 to 100 and beyond", Vad..."Ukrainian Mobile Banking Scaling in Practice. From 0 to 100 and beyond", Vad...
"Ukrainian Mobile Banking Scaling in Practice. From 0 to 100 and beyond", Vad...
Fwdays47 vues
"Thriving Culture in a Product Company — Practical Story", Volodymyr Tsukur par Fwdays
"Thriving Culture in a Product Company — Practical Story", Volodymyr Tsukur"Thriving Culture in a Product Company — Practical Story", Volodymyr Tsukur
"Thriving Culture in a Product Company — Practical Story", Volodymyr Tsukur
Fwdays49 vues
"Fast Start to Building on AWS", Igor Ivaniuk par Fwdays
"Fast Start to Building on AWS", Igor Ivaniuk"Fast Start to Building on AWS", Igor Ivaniuk
"Fast Start to Building on AWS", Igor Ivaniuk
Fwdays52 vues
"Quality Assurance: Achieving Excellence in startup without a Dedicated QA", ... par Fwdays
"Quality Assurance: Achieving Excellence in startup without a Dedicated QA", ..."Quality Assurance: Achieving Excellence in startup without a Dedicated QA", ...
"Quality Assurance: Achieving Excellence in startup without a Dedicated QA", ...
Fwdays48 vues
"AI Startup Growth from Idea to 1M ARR", Oleksandr Uspenskyi par Fwdays
"AI Startup Growth from Idea to 1M ARR", Oleksandr Uspenskyi"AI Startup Growth from Idea to 1M ARR", Oleksandr Uspenskyi
"AI Startup Growth from Idea to 1M ARR", Oleksandr Uspenskyi
Fwdays32 vues
"How we switched to Kanban and how it integrates with product planning", Vady... par Fwdays
"How we switched to Kanban and how it integrates with product planning", Vady..."How we switched to Kanban and how it integrates with product planning", Vady...
"How we switched to Kanban and how it integrates with product planning", Vady...
Fwdays75 vues
"Bringing Flutter to Tide: a case study of a leading fintech platform in the ... par Fwdays
"Bringing Flutter to Tide: a case study of a leading fintech platform in the ..."Bringing Flutter to Tide: a case study of a leading fintech platform in the ...
"Bringing Flutter to Tide: a case study of a leading fintech platform in the ...
Fwdays25 vues
"Shape Up: How to Develop Quickly and Avoid Burnout", Dmytro Popov par Fwdays
"Shape Up: How to Develop Quickly and Avoid Burnout", Dmytro Popov"Shape Up: How to Develop Quickly and Avoid Burnout", Dmytro Popov
"Shape Up: How to Develop Quickly and Avoid Burnout", Dmytro Popov
Fwdays68 vues
"Role of a CTO in software outsourcing company", Yuriy Nakonechnyy par Fwdays
"Role of a CTO in software outsourcing company", Yuriy Nakonechnyy"Role of a CTO in software outsourcing company", Yuriy Nakonechnyy
"Role of a CTO in software outsourcing company", Yuriy Nakonechnyy
Fwdays49 vues
From “T” to “E”, Dmytro Gryn par Fwdays
From “T” to “E”, Dmytro GrynFrom “T” to “E”, Dmytro Gryn
From “T” to “E”, Dmytro Gryn
Fwdays37 vues
"Why I left React in my TypeScript projects and where ", Illya Klymov par Fwdays
"Why I left React in my TypeScript projects and where ",  Illya Klymov"Why I left React in my TypeScript projects and where ",  Illya Klymov
"Why I left React in my TypeScript projects and where ", Illya Klymov
Fwdays254 vues

Dernier

CloudStack and GitOps at Enterprise Scale - Alex Dometrius, Rene Glover - AT&T par
CloudStack and GitOps at Enterprise Scale - Alex Dometrius, Rene Glover - AT&TCloudStack and GitOps at Enterprise Scale - Alex Dometrius, Rene Glover - AT&T
CloudStack and GitOps at Enterprise Scale - Alex Dometrius, Rene Glover - AT&TShapeBlue
81 vues34 diapositives
Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava... par
Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava...Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava...
Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava...ShapeBlue
74 vues17 diapositives
Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O... par
Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O...Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O...
Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O...ShapeBlue
59 vues13 diapositives
Igniting Next Level Productivity with AI-Infused Data Integration Workflows par
Igniting Next Level Productivity with AI-Infused Data Integration Workflows Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows Safe Software
373 vues86 diapositives
Business Analyst Series 2023 - Week 4 Session 7 par
Business Analyst Series 2023 -  Week 4 Session 7Business Analyst Series 2023 -  Week 4 Session 7
Business Analyst Series 2023 - Week 4 Session 7DianaGray10
110 vues31 diapositives
Future of AR - Facebook Presentation par
Future of AR - Facebook PresentationFuture of AR - Facebook Presentation
Future of AR - Facebook PresentationRob McCarty
54 vues27 diapositives

Dernier(20)

CloudStack and GitOps at Enterprise Scale - Alex Dometrius, Rene Glover - AT&T par ShapeBlue
CloudStack and GitOps at Enterprise Scale - Alex Dometrius, Rene Glover - AT&TCloudStack and GitOps at Enterprise Scale - Alex Dometrius, Rene Glover - AT&T
CloudStack and GitOps at Enterprise Scale - Alex Dometrius, Rene Glover - AT&T
ShapeBlue81 vues
Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava... par ShapeBlue
Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava...Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava...
Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava...
ShapeBlue74 vues
Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O... par ShapeBlue
Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O...Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O...
Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O...
ShapeBlue59 vues
Igniting Next Level Productivity with AI-Infused Data Integration Workflows par Safe Software
Igniting Next Level Productivity with AI-Infused Data Integration Workflows Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Safe Software373 vues
Business Analyst Series 2023 - Week 4 Session 7 par DianaGray10
Business Analyst Series 2023 -  Week 4 Session 7Business Analyst Series 2023 -  Week 4 Session 7
Business Analyst Series 2023 - Week 4 Session 7
DianaGray10110 vues
Future of AR - Facebook Presentation par Rob McCarty
Future of AR - Facebook PresentationFuture of AR - Facebook Presentation
Future of AR - Facebook Presentation
Rob McCarty54 vues
Digital Personal Data Protection (DPDP) Practical Approach For CISOs par Priyanka Aash
Digital Personal Data Protection (DPDP) Practical Approach For CISOsDigital Personal Data Protection (DPDP) Practical Approach For CISOs
Digital Personal Data Protection (DPDP) Practical Approach For CISOs
Priyanka Aash103 vues
Why and How CloudStack at weSystems - Stephan Bienek - weSystems par ShapeBlue
Why and How CloudStack at weSystems - Stephan Bienek - weSystemsWhy and How CloudStack at weSystems - Stephan Bienek - weSystems
Why and How CloudStack at weSystems - Stephan Bienek - weSystems
ShapeBlue172 vues
How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ... par ShapeBlue
How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ...How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ...
How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ...
ShapeBlue97 vues
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N... par James Anderson
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...
James Anderson142 vues
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas... par Bernd Ruecker
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
Bernd Ruecker50 vues
NTGapps NTG LowCode Platform par Mustafa Kuğu
NTGapps NTG LowCode Platform NTGapps NTG LowCode Platform
NTGapps NTG LowCode Platform
Mustafa Kuğu287 vues
Mitigating Common CloudStack Instance Deployment Failures - Jithin Raju - Sha... par ShapeBlue
Mitigating Common CloudStack Instance Deployment Failures - Jithin Raju - Sha...Mitigating Common CloudStack Instance Deployment Failures - Jithin Raju - Sha...
Mitigating Common CloudStack Instance Deployment Failures - Jithin Raju - Sha...
ShapeBlue113 vues
Setting Up Your First CloudStack Environment with Beginners Challenges - MD R... par ShapeBlue
Setting Up Your First CloudStack Environment with Beginners Challenges - MD R...Setting Up Your First CloudStack Environment with Beginners Challenges - MD R...
Setting Up Your First CloudStack Environment with Beginners Challenges - MD R...
ShapeBlue105 vues
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ... par ShapeBlue
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...
ShapeBlue48 vues
Webinar : Desperately Seeking Transformation - Part 2: Insights from leading... par The Digital Insurer
Webinar : Desperately Seeking Transformation - Part 2:  Insights from leading...Webinar : Desperately Seeking Transformation - Part 2:  Insights from leading...
Webinar : Desperately Seeking Transformation - Part 2: Insights from leading...
Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda... par ShapeBlue
Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda...Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda...
Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda...
ShapeBlue93 vues

"Intro to Stateful Services or How to get 1 million RPS from a single node", Anton Moldovan

  • 4. AGENDA - intro - why stateless is slow and less reliable - tools for building stateful services
  • 5. PART I intro to sportsbook domain and how we come to stateful
  • 6. Dynamo Kyiv vs Chelsea 2 : 1 Red card Score changed Odds changed PUSH PULL
  • 7. Dynamo Kyiv vs Chelsea 2 : 1 Red card Score changed Odds changed PUSH PULL - quite big payloads: 30 KB compressed data (1.5 MB uncompressed) - update rate: 2K RPS (per tenant) - user query rate: 3-4K RPS (per tenant) - live data is very dynamic: no much sense to cache it - data should be queryable: simple KV is not enough - we need secondary indexes
  • 8. 20 KB payload for concurrent read and write Redis, single node: 4vcpu - 8gb redis_write: 4K RPS, p75 = 543, p95 = 688, p99 = 842 redis read: 7K RPS, p75 = 970, p95 = 1278, p99 = 1597
  • 10. API Cache DB but Cache is not queryable
  • 11. API + DB Stateful Service
  • 15. How to handle a case when your data is larger than RAM? 10 GB 30 GB
  • 16. Solution 1: use memory DB that supports data larger than RAM 10 GB 20 GB
  • 17. UA PL FR Solution 2: use partition by tenant
  • 18. Solution 3: use range-based sharding users (1-500) users (501-1000) shard A shard B
  • 20. API + DB Stateful Service API Cache DB network latency network latency
  • 21. Latency Numbers Latency 2010 2020 Compress 1KB with Zippy 2μs 2μs Read 1 MB sequentially from RAM 30μs 3μs Read 1 MB sequentially from SSD 494μs 49μs Read 1 MB sequentially from disk 3ms 825μs Round trip within same datacenter 500μs 500μs Send packet CA -> Netherlands -> CA 150ms 150ms https://colin-scott.github.io/personal_website/research/interactive_latency.html
  • 22-26. (one slide built up step by step; final state shown)
      API → Cache → DB:
      - CPU for serialize/deserialize
      - CPU for ASYNC request handling
      - CPU for managing sockets
      - Overreads
      (the three CPU costs are listed twice on the slide)
      API + DB (Stateful Service):
      - CPU for serialize (we don't need to deserialize)
      - CPU for managing sockets (only client sockets)
      - CPU for handling the query (very cheap compared to serialization)
  • 32. Object hit rate / Transactional hit rate
  • 33. Diagram: API and records A, B, C. In order to fulfill our transactional flow we need to fetch records A, B and C. Records A and B will not impact our latency. Overall latency = latency of record C.
  • 36. Most existing cache eviction algorithms focus on maximizing object hit rate, or the fraction of single object requests served from cache. However, this approach fails to capture the inter-object dependencies within transactions.
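The difference between the two metrics can be made concrete with a small calculation. The sketch below makes two assumptions that the slides do not spell out: transactional hit rate is read as the fraction of transactions whose records are all in cache, and the records of one transaction are fetched in parallel, so the slowest record sets the latency. `HitRates` and the latency values in the usage note are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HitRates
{
    // Fraction of individual record lookups served from cache.
    public static double ObjectHitRate(
        IEnumerable<string[]> transactions, ISet<string> cached)
    {
        var keys = transactions.SelectMany(t => t).ToList();
        if (keys.Count == 0) return 0;
        var hits = keys.Where(k => cached.Contains(k)).ToList().Count;
        return (double)hits / keys.Count;
    }

    // Fraction of transactions whose records are ALL in cache (assumed definition).
    public static double TransactionalHitRate(
        IEnumerable<string[]> transactions, ISet<string> cached)
    {
        var list = transactions.ToList();
        if (list.Count == 0) return 0;
        var hits = 0;
        foreach (var tx in list)
        {
            if (tx.All(k => cached.Contains(k))) hits++;
        }
        return (double)hits / list.Count;
    }

    // With parallel fetches, a transaction is as slow as its slowest record.
    public static double TransactionLatency(
        string[] keys, ISet<string> cached, double hitMs, double missMs) =>
        keys.Max(k => cached.Contains(k) ? hitMs : missMs);
}
```

With A and B cached and C missing, the object hit rate of the transaction {A, B, C} is 2/3, yet its transactional hit rate is 0 and its latency equals the miss latency of C.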
  • 38. async / await. Imagine that we run Redis on localhost. Even with such a setup we usually use async request handling.
  • 39.
      // Synchronous baseline: a plain method call in a loop.
      public void SimpleMethod()
      {
          var k = 0;
          for (int i = 0; i < Iterations; i++)
          {
              k = Add(i, i);
          }
      }

      // NoInlining stops the JIT from inlining Add, so the call itself is measured.
      [MethodImpl(MethodImplOptions.NoInlining)]
      private int Add(int a, int b) => a + b;
  • 40.
      public async Task SimpleMethodAsync()
      {
          var k = 0;
          for (int i = 0; i < Iterations; i++)
          {
              k = await AddAsync(i, i);
          }
      }

      // Task.FromResult returns an already completed task, so the await continues synchronously.
      private Task<int> AddAsync(int a, int b)
      {
          return Task.FromResult(a + b);
      }
  • 41.
      public async Task SimpleMethodAsyncYield()
      {
          var k = 0;
          for (int i = 0; i < Iterations; i++)
          {
              k = await AddAsync(i, i);
          }
      }

      // Task.Yield forces the rest of the method to be scheduled as a continuation on every call.
      private async Task<int> AddAsync(int a, int b)
      {
          await Task.Yield();
          return a + b;
      }
  • 42.
      public async Task SimpleMethodAsyncYield()
      {
          var k = 0;
          for (int i = 0; i < Iterations; i++)
          {
              k = await AddAsync(i, i);
          }
      }

      // On top of the yield, Task.Run queues the addition to the thread pool.
      private async Task<int> AddAsync(int a, int b)
      {
          await Task.Yield();
          return await Task.Run(() => a + b);
      }
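To compare the variants above on your own machine, a rough timing harness could look like the sketch below. It is an assumption-laden illustration, not the benchmark used in the talk: `Iterations` is set to an arbitrary value, the `AsyncOverhead` class name is made up, and `Stopwatch` timing without warm-up is crude (a dedicated tool such as BenchmarkDotNet gives more trustworthy numbers).

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

public static class AsyncOverhead
{
    // Assumed value; the slides do not state the iteration count.
    public const int Iterations = 100_000;

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static int Add(int a, int b) => a + b;

    // Completed-task variant, as on slide 40.
    private static Task<int> AddAsync(int a, int b) => Task.FromResult(a + b);

    public static int RunSync()
    {
        var k = 0;
        for (int i = 0; i < Iterations; i++) k = Add(i, i);
        return k;
    }

    public static async Task<int> RunAsync()
    {
        var k = 0;
        for (int i = 0; i < Iterations; i++) k = await AddAsync(i, i);
        return k;
    }

    // Returns elapsed milliseconds for the sync and async loops.
    // The numbers vary per machine and runtime, so none are quoted here.
    public static async Task<(long SyncMs, long AsyncMs)> Measure()
    {
        var sw = Stopwatch.StartNew();
        RunSync();
        var syncMs = sw.ElapsedMilliseconds;

        sw.Restart();
        await RunAsync();
        return (syncMs, sw.ElapsedMilliseconds);
    }
}
```

Calling `await AsyncOverhead.Measure()` returns both timings; swapping `AddAsync` for the `Task.Yield` or `Task.Run` variants shows how each extra scheduling step adds to the total.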
  • 45. PART III why stateless is less reliable
  • 46. API → Cache → DB vs. API + DB (Stateful Service). With the stateless chain we have a higher probability of failure.
  • 47. API → Cache → DB: circuit breaker, retry, fallback, timeout, bulkhead isolation (the slide lists this set twice for the stateless chain). API + DB (Stateful Service).
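Each of the patterns named on this slide is extra code that sits in front of a remote dependency. As one example, here is a minimal circuit breaker with a fallback. It is a sketch under simple assumptions (consecutive-failure threshold, fixed cool-down, injected clock, no thread safety), and the `CircuitBreaker` class is hypothetical; production .NET services typically use a resilience library such as Polly instead.

```csharp
using System;

// Opens after N consecutive failures, then rejects calls until the cool-down has elapsed.
public sealed class CircuitBreaker
{
    private readonly int _threshold;
    private readonly TimeSpan _coolDown;
    private readonly Func<DateTime> _clock;
    private int _failures;
    private DateTime _openedAt;

    public CircuitBreaker(int threshold, TimeSpan coolDown, Func<DateTime> clock)
    {
        _threshold = threshold;
        _coolDown = coolDown;
        _clock = clock;
    }

    public bool IsOpen =>
        _failures >= _threshold && _clock() - _openedAt < _coolDown;

    public T Execute<T>(Func<T> action, Func<T> fallback)
    {
        // Fail fast: do not touch the dependency while the circuit is open.
        if (IsOpen) return fallback();

        try
        {
            var result = action();
            _failures = 0; // a success closes the circuit
            return result;
        }
        catch (Exception)
        {
            _failures++;
            if (_failures >= _threshold) _openedAt = _clock();
            return fallback();
        }
    }
}
```

A call to the cache would then be wrapped as `breaker.Execute(() => ReadFromCache(key), () => ReadFromDb(key))`, where both delegates are placeholders for your own code. A stateful service that answers from local memory has no such hop to protect on the read path.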
  • 48. API Cache DB What about cache invalidation and data consistency? API + DB Stateful Service
  • 49. API Cache DB What about the predictable scale-out? Will your RPS increase if you add an additional API or Cache node? API + DB Stateful Service
  • 51.
      - Metastable failures occur in open systems with an uncontrolled source of load where a trigger causes the system to enter a bad state that persists even when the trigger is removed.
      - Paradoxically, the root cause of these failures is often features that improve the efficiency or reliability of the system.
      - The characteristic of a metastable failure is that the sustaining effect keeps the system in the metastable failure state even after the trigger is removed.
  • 52. At least 4 out of 15 major outages in the last decade at Amazon Web Services were caused by metastable failures.
  • 54. PART IV tools for building stateful services
  • 55. distributed log with sync replication
  • 59. Dynamo Kyiv vs Chelsea, 2 : 1. Red card, score changed, odds changed. PUSH / PULL.
      - quite big payloads: 30 KB compressed data (1.5 MB uncompressed)
      - update rate: 2K RPS (per tenant)
      - user query rate: 3-4K RPS (per tenant)
      - live data is very dynamic: there is not much sense in caching it
      - data should be queryable: simple KV is not enough
      - we need secondary indexes
      At peak, to handle a big load for 1 tenant we have: 5-10 nodes, 0.5-2 CPU, 6 GB RAM