© 2021 TIS Inc.
Reactive Summit 2021
Reactive Systems that focus on High Availability with Lerna
2021.11.2
Yugo Maede
@yugolf

About me
Yugo Maede (Twitter: @yugolf)
TIS Inc. Technology & Innovation SBU
Technology & Engineering Center <- technology-specific organization
• current mission
Product owner of Lerna. Develops Lerna and supports projects that adopt it.
Lerna makes it possible to build highly available, high-throughput systems quickly and inexpensively.
• translated book
Akka in Action (Japanese edition)
• web media contributions
ThinkIT: Learn about reactive systems, the paradigm of the many-core era
• speaking at events
Scala Matsuri, JJUG CCC, etc.

Motivation
Non-functional requirements for mission-critical systems: high availability & high throughput
highly available server
• costly
• high throughput is even more costly
distributed system
• complex and difficult
• time is spent on non-business logic
want a mechanism that can accomplish high availability
and high throughput quickly and at low cost!

Solution
building Reactive Systems with Akka
• Message Driven, Actor Model
• Stateful Application
• Distributed System, Cluster
• Event Sourcing, Distributed DB
• CQRS
difficult, complexity
adapt repeatedly, continue to operate the system,
refine the architecture, and nurture engineers
OSS "Lerna"

Building high availability systems
software stacks that focus on high availability
and build reactive systems
– libraries (support Akka Typed)
– developer guides
– reference code
– learning content
and everything you need to build a highly available system.
We bundle them into one ready-to-use package.

Overview of Lerna
execute Terraform scripts on VMs to create environments for highly available systems
https://fintan.jp/?p=5948&lang=en

Availability levels enabled by Lerna
• availability
– logical availability calculated from MTBF and MTTR
(Note: not the availability of the service itself.)
• the numeric value of "Design for Failure"
– How many seconds does it take for your application to recover from a failure?
– minimize the time to repair, because failures always occur
• focus on minimizing the MTTR (Mean Time To Repair)
https://en.wikipedia.org/wiki/Availability
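
The slide does not spell out how availability follows from MTBF and MTTR. The usual steady-state definition, given here as an assumption about what "logical availability" refers to, is:

```latex
A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}},
\qquad
\lim_{\mathrm{MTTR} \to 0} A = 1 \quad \text{(for fixed } \mathrm{MTBF} > 0\text{)}
```

Because failures cannot be eliminated (MTBF stays finite), shrinking MTTR is the lever that pushes A toward 1, which matches the slide's focus on minimizing the time to repair.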

Measurement condition
MTTR is measured by injecting simulated failures under the following conditions
– a payment service built on AWS
– CQRS + Event Sourcing architecture
• command side APIs persist events to Cassandra in real time
• events propagate asynchronously to the query side (MariaDB)
– the measurement target is the command side API
– Gatling sends requests at 150 TPS
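
To make the CQRS + Event Sourcing arrangement above concrete, here is a minimal in-memory sketch. It is not Lerna's code: the class and function names (PaymentAccepted, CommandSide, QuerySide, propagate) are hypothetical, a Python list stands in for Cassandra, and a dict stands in for MariaDB. Asynchronous propagation is modelled as an explicit step so the delay between the two sides is visible.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class PaymentAccepted:
    """Hypothetical event recorded by the command side."""
    payment_id: str
    amount: int


class CommandSide:
    """Validates commands and persists events (the list stands in for Cassandra)."""

    def __init__(self):
        self.event_log = []    # append-only event store
        self.outbox = deque()  # events not yet propagated to the query side

    def pay(self, payment_id, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        event = PaymentAccepted(payment_id, amount)
        self.event_log.append(event)  # persisted before the command is acknowledged
        self.outbox.append(event)
        return event


class QuerySide:
    """Read model built from events (the dict stands in for MariaDB)."""

    def __init__(self):
        self.amount_by_payment = {}

    def apply(self, event):
        self.amount_by_payment[event.payment_id] = event.amount


def propagate(command_side, query_side):
    """Drain pending events into the read model; in a real system this runs asynchronously."""
    while command_side.outbox:
        query_side.apply(command_side.outbox.popleft())


if __name__ == "__main__":
    commands, queries = CommandSide(), QuerySide()
    commands.pay("p-1", 1200)
    print(queries.amount_by_payment)  # read model has not caught up yet
    propagate(commands, queries)
    print(queries.amount_by_payment)  # read model now reflects the event
```

In this arrangement the command side can acknowledge a request as soon as the event is stored, which is why the measurement targets the command side API.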

Lerna's definition of MTTR
• for each point of failure, measure how long the service is unavailable to even one user, and call it "MTTR in single failure"
• assuming one failure per server per year, "MTTR in single failure x number of servers x number of failures per year x failure impact range (percentage of service unavailable)" is the MTTR of each layer
• downtime per year is the total MTTR for all failure points
Failed layer | MTTR in single failure | number of servers | number of failures per year | failure impact range
Load Balancer (Keepalived) | 2.78 sec | 1 | 1 | 1
Load Balancer (HAProxy) | 3.32 sec | 3 | 1 | 1/3
Application (Akka Cluster) | 5.92 sec | 9 | 1 | 1/9
Command Side DB (Cassandra) | 0.00 sec | 6 | 1 | 1/6
Query Side DB (MariaDB) | 1.14 sec | 6 | 1 | 1/6
DC Failure (network partition) | 8.02 sec | 1 | 1 | 1
downtime per year = total MTTR for all faults
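
The definition above is plain arithmetic, so it can be checked with a few lines of Python. This is a sketch: the rows are copied from the slide, while the function names (layer_mttr, downtime_per_year) are made up for illustration. Fractions are used so that impact ranges such as 1/3 and 1/9 cancel exactly against the server counts.

```python
from fractions import Fraction

# Rows copied from the slide: (layer, MTTR in single failure [sec],
# number of servers, failures per year, failure impact range).
LAYERS = [
    ("Load Balancer (Keepalived)", "2.78", 1, 1, Fraction(1)),
    ("Load Balancer (HAProxy)", "3.32", 3, 1, Fraction(1, 3)),
    ("Application (Akka Cluster)", "5.92", 9, 1, Fraction(1, 9)),
    ("Command Side DB (Cassandra)", "0.00", 6, 1, Fraction(1, 6)),
    ("Query Side DB (MariaDB)", "1.14", 6, 1, Fraction(1, 6)),
    ("DC Failure (network partition)", "8.02", 1, 1, Fraction(1)),
]


def layer_mttr(single_failure_sec, servers, failures_per_year, impact_range):
    """Yearly MTTR of one layer, following the slide's definition."""
    return Fraction(single_failure_sec) * servers * failures_per_year * impact_range


def downtime_per_year(layers):
    """Total MTTR over all failure points, in seconds."""
    return sum(layer_mttr(*row[1:]) for row in layers)


if __name__ == "__main__":
    for name, *params in LAYERS:
        print(f"{name}: {float(layer_mttr(*params)):.2f} sec")
    print(f"downtime per year: {float(downtime_per_year(LAYERS)):.2f} sec")
```

With one failure per server per year and an impact range of 1/(number of servers), the server count and the impact range cancel, so each layer's yearly MTTR equals its single-failure MTTR.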

Measurement result
• to minimize impact, isolating points of failure instantly matters more than recovering them
• all layers are scalable and can be healed to their original state
• the application layer is implemented with Lerna's own library, akka-entity-replication, which uses Raft
failed layer | MTTR in single failure | number of servers | number of failures per year | failure impact range | MTTR
Load Balancer (Keepalived) | 2.78 sec | 1 | 1 | 1 | 2.78 sec
Load Balancer (HAProxy) | 3.32 sec | 3 | 1 | 1/3 | 3.32 sec
Application (Akka Cluster) | 5.92 sec | 9 | 1 | 1/9 | 5.92 sec
Command Side DB (Cassandra) | 0.00 sec | 6 | 1 | 1/6 | 0.00 sec
Query Side DB (MariaDB) | 1.14 sec | 6 | 1 | 1/6 | 1.14 sec
DC Failure (network partition) | 8.02 sec | 1 | 1 | 1 | 8.02 sec
downtime per year = total MTTR for all faults
all layers recovered within 10 seconds

akka-entity-replication
https://github.com/lerna-stack/akka-entity-replication#akka-entity-replication
Requests recover (turn green) immediately even when a failure occurs (a node is killed)
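
The deck describes akka-entity-replication as Raft-based. In Raft, an entry is committed once a majority of replicas has stored it, so a replica group can keep serving requests as long as a majority is still alive. The sketch below illustrates only that majority rule. The function names are hypothetical and are not the library's API, and the replica count of 3 in the usage example is an assumption, not a figure from the slides.

```python
def majority(replicas):
    """Smallest number of replicas that forms a Raft majority."""
    return replicas // 2 + 1


def can_serve(replicas, failed):
    """True if the surviving replicas still form a majority."""
    return replicas - failed >= majority(replicas)


if __name__ == "__main__":
    replicas = 3  # assumed replica count, for illustration only
    for failed in range(replicas + 1):
        print(f"{failed} failed of {replicas}: can serve = {can_serve(replicas, failed)}")
```

Under this rule, killing a single node out of three leaves a majority in place, which is consistent with requests recovering quickly after a node is killed.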

Availability: 99.9999%
failed layer | MTTR in single failure | number of servers | number of failures per year | failure impact range | MTTR
Load Balancer (Keepalived) | 2.78 sec | 1 | 1 | 1 | 2.78 sec
Load Balancer (HAProxy) | 3.32 sec | 3 | 1 | 1/3 | 3.32 sec
Application (Akka Cluster) | 5.92 sec | 9 | 1 | 1/9 | 5.92 sec
Command Side DB (Cassandra) | 0.00 sec | 6 | 1 | 1/6 | 0.00 sec
Query Side DB (MariaDB) | 1.14 sec | 6 | 1 | 1/6 | 1.14 sec
DC Failure (network partition) | 8.02 sec | 1 | 1 | 1 | 8.02 sec
downtime per year | 21.18 sec
https://www.eventhelix.com/fault-handling/reliability-availability-basics/
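
The step from 21.18 seconds of downtime per year to the availability figure can be reproduced as follows. This is a sketch: the helper names are hypothetical, and a 365-day year is assumed.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a non-leap year


def availability(downtime_sec_per_year):
    """Fraction of the year during which the service is up."""
    return 1 - downtime_sec_per_year / SECONDS_PER_YEAR


def allowed_downtime(target_availability):
    """Yearly downtime budget in seconds for a target availability."""
    return (1 - target_availability) * SECONDS_PER_YEAR


if __name__ == "__main__":
    measured = 21.18  # downtime per year from the slide, in seconds
    print(f"availability: {availability(measured):.6%}")
    print(f"six-nines budget: {allowed_downtime(0.999999):.3f} sec/year")
```

An availability of 99.9999% leaves a budget of roughly 31.5 seconds per year, so the measured 21.18 seconds fits within it.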

Lerna is Elastic
Lerna's architecture is elastic, so 1,000 TPS can be achieved by adding nodes
(this is not an upper bound, because the architecture is elastic)

Lerna is Responsive
Lerna's architecture is responsive: under high load (1,000 TPS) it responds
within 100 ms (tested with payment transactions persisting events to Cassandra)

More information
• availability and performance of Lerna
https://fintan.jp/?p=7256
• getting started with Lerna
https://fintan.jp/?p=5946
• our technical site
https://fintan.jp/?lang=en

Summary
• Lerna High Availability Software Stack
– achieves non-functional requirements for mission-critical systems
– not only libraries but also the other items needed for system development are available as OSS
– lowers the barriers to complex and difficult distributed systems
• the numeric value of "Design for Failure"
– logical availability calculated from measured MTTR: 99.9999%
– all layers, including the application layer, recover from failure in less than 10 seconds

THANK YOU
If you have any questions, please mention me or send a DM on Twitter.
Twitter ID: @yugolf
