Distributed systems should no longer be developed and operated without resilience. The responsible developer or architect first has to decide which resilience patterns are needed. The next question is how to implement these patterns in the individual services. There are two basic alternatives. One is to implement them with the classic resilience frameworks, such as Resilience4j, Failsafe or MicroProfile Fault Tolerance. The other, which has become possible more recently, is to establish resilience with a service mesh tool such as Istio. After a short introduction to Istio, this session compares the two alternatives. Their respective advantages and disadvantages are listed and weighed against each other in a concluding assessment. The session also shows what Istio offers for testing resilience.
Service Mesh vs. Frameworks: Where to put the resilience?
1. Service Mesh vs. Frameworks:
Where to put the resilience?
Michael Hofmann
https://hofmann-itconsulting.de
2. Agenda
(1) Distributed Systems and Resilience
(2) Framework
(3) Service Mesh
(4) Framework and Service Mesh Characteristics
(5) Thoughts about Resilience
(6) Essential Requirements
(7) Conclusion
3. Distributed Systems
Typical Communication Errors
slow response
timeout
aborted network connection
...
➔ degree of distribution raises failure rate!
➔ compensation strategy: resilience!
Fallacies of Distributed Computing
The network is reliable.
Latency is zero.
Bandwidth is infinite.
The network is secure.
Topology doesn't change.
There is one administrator.
Transport cost is zero.
The network is homogeneous.
7. Service Mesh
"The term service mesh is used to describe the network of microservices that make up such applications and the interactions between them." (istio.io)
Don’t manage a Service Mesh without tooling!
Requirements:
(1) manage calls on layer 7 (application layer, L7)
(2) resilience, routing, security and telemetry
(3) decentralized & transparent for services (implementation independent)
9. Resilience Patterns in Istio
✔ Timeout
✔ Retry
✔ CircuitBreaker
✔ Bulkhead
✗ Fallback?
✗ is a Fallback possible?
✗ less technical, more business driven
https://dzone.com/articles/fallbacks-are-overrated-architecting-for-resilienc
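The four supported patterns could be expressed as Istio rules roughly as follows. This is a minimal sketch, not taken from the talk: the host name `ratings` is borrowed from the fault injection slide, and every numeric value is an illustrative assumption that would have to be tuned per service.

```yaml
# Sketch only: host name and all numbers are illustrative assumptions.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - route:
    - destination:
        host: ratings
    timeout: 3s                      # Timeout: overall deadline per request
    retries:                         # Retry: applied by the calling sidecar
      attempts: 2
      perTryTimeout: 1s
      retryOn: 5xx,connect-failure
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: ratings
spec:
  host: ratings
  trafficPolicy:
    connectionPool:                  # Bulkhead: cap connections and queued requests
      tcp:
        maxConnections: 10
      http:
        http1MaxPendingRequests: 5
    outlierDetection:                # CircuitBreaker: eject hosts that keep failing
      consecutive5xxErrors: 5
      interval: 10s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
```

Note that none of these fields can express a fallback, which matches the ✗ on the slide: a fallback needs business logic inside the service.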
11. Resilience in Istio
Apply to sidecar
Resilience rules
— transparent for service
— act globally on all sidecars
Fault Injection
# delays every request to the ratings service by 7 seconds
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
  ...
spec:
  hosts:
  - ratings
  http:
  - fault:
      delay:
        fixedDelay: 7s
        percent: 100
MicroProfile with Istio setting
MP_Fault_Tolerance_NonFallback_Enabled = false
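Besides delays, Istio's fault injection can also abort requests, which is useful for testing how callers react to hard failures. The following is a hedged sketch rather than part of the talk: the status code and the 50 percent share are assumptions chosen for illustration.

```yaml
# Sketch only: status code and percentage are illustrative assumptions.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - fault:
      abort:
        httpStatus: 503        # returned by the sidecar instead of calling ratings
        percentage:
          value: 50            # affects roughly half of the requests
    route:
    - destination:
        host: ratings
```

Because the rule lives in the mesh, it can be applied and removed again without redeploying the service under test.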
12. Frameworks Characteristics
— Java: a lot of different frameworks
— Team decides framework?!?
— Learning curve for every framework
— Different frameworks behave differently
— Same framework in different versions behaves differently
— Same framework in different versions in use in parallel
13. Frameworks Characteristics
➔ Change of framework:
➔ Replace all positions in code
➔ New behavior
➔ New deployment
➔ New tests
➔ Risk of chain reaction:
framework ➔ load balancing ➔ service registry
➔ Multiple service registries for every different framework?
14. Service Mesh Characteristics
— Define new rule
— Same behavior (… no framework change)
— unchanged deployed service
— new tests only for new rules
— Client-side load balancing in sidecar
— Service Registry based on endpoints in K8S
$ kubectl apply -f ...
15. Thoughts about Resilience
Resilience pattern still correct if communication behavior changes?
— Modified behavior of partner
— Modified communication partner
— Modified infrastructure
— Load changes during day
— Side effects from other systems
— Anticipate problems of tomorrow?
16. Thoughts about Resilience
— Main problem: choose the right resilience pattern
— Correct parameters for pattern?
— Measure resilience
— Mostly: trial & error for suitable pattern/params
(main reason for end of life of Hystrix)
— Often: retry storm
— Often: missing musketeer principle
(black sheep)
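Why retries so often end in a retry storm can be seen from a small worst-case calculation. It is a simplified sketch that assumes a linear call chain of $n$ hops in which every caller makes up to $a$ attempts (the first try included) and the deepest service fails every time:

```latex
\[
  \text{requests at the deepest service} \;\le\; a^{n},
  \qquad \text{e.g. } a = 3,\; n = 3 \;\Rightarrow\; 3^{3} = 27 .
\]
```

A single user request can therefore turn into 27 calls on the service that is already in trouble, which is why retry parameters have to be chosen with the whole call chain in mind.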
19. Conclusion
— Comparable resilience patterns
— Missing fallback in service mesh (but overrated)
— Higher flexibility in service mesh
— Fault injection easy in service mesh
Solve problems where they arise!
Service Mesh for L4-L7
Developer for L8 (original profession)