Data Natives Berlin v 20.0 | "Ten Little Servers: A Story of no Downtime" - Aleksandr Volochnev, Developer Advocate at DataStax
2. © DataStax, All Rights Reserved.
“Anything that can go wrong will go wrong”
Murphy is here, watching you.
4. Step I: [Data] Replication
● Single copy is doomed
● It’s a question of time
● Replicate it!
● Inconsistency (Say goodbye to ACID)
● Consistency level control
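The trade-off between replication and consistency comes down to simple arithmetic. A minimal sketch, assuming a replication factor RF and per-request acknowledgement counts (the function names are illustrative, not part of any driver API): a read is guaranteed to see the latest acknowledged write only when the read set and the write set must overlap.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def read_sees_latest_write(replication_factor: int,
                           write_acks: int,
                           read_acks: int) -> bool:
    """True when any read set must share a replica with any write set."""
    return write_acks + read_acks > replication_factor


def tolerated_failures(replication_factor: int, required_acks: int) -> int:
    """How many replicas may be down while the request can still succeed."""
    return replication_factor - required_acks


if __name__ == "__main__":
    rf = 3
    q = quorum(rf)
    # QUORUM writes + QUORUM reads overlap; ONE + ONE do not.
    print("quorum:", q)
    print("quorum/quorum overlaps:", read_sees_latest_write(rf, q, q))
    print("one/one overlaps:", read_sees_latest_write(rf, 1, 1))
    print("nodes that may fail at quorum:", tolerated_failures(rf, q))
```

With three replicas, quorum reads and writes keep working with one node down, while reading and writing at a single replica is faster but gives up the overlap guarantee.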
5. Step II: Replica Distribution
● What stays together is doomed
● It’s a question of time
● Distribute it
● Network delay
● Work with local_dc
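Working with local_dc can be pictured as follows. This is a rough sketch, not driver code: `Node`, `query_plan` and `local_quorum` are hypothetical names, and the datacenter labels are made up. The idea is only that a client prefers replicas in its own datacenter and counts acknowledgements against the local replica set, so cross-datacenter latency stays off the request path.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Node:
    address: str
    datacenter: str


def query_plan(nodes: List[Node], local_dc: str) -> List[Node]:
    """Order nodes so that local-datacenter nodes are tried first."""
    local = [n for n in nodes if n.datacenter == local_dc]
    remote = [n for n in nodes if n.datacenter != local_dc]
    return local + remote


def local_quorum(replicas_per_dc: Dict[str, int], local_dc: str) -> int:
    """Majority of the replicas that live in the local datacenter only."""
    return replicas_per_dc[local_dc] // 2 + 1


if __name__ == "__main__":
    cluster = [
        Node("10.0.1.1", "frankfurt"),
        Node("10.0.0.1", "berlin"),
        Node("10.0.1.2", "frankfurt"),
        Node("10.0.0.2", "berlin"),
    ]
    for node in query_plan(cluster, local_dc="berlin"):
        print(node.datacenter, node.address)
    print("acks needed locally:",
          local_quorum({"berlin": 3, "frankfurt": 3}, "berlin"))
```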
6. Step III: Infrastructure Diversification
● Single platform is doomed
● Guess what? It’s a question of time.
● Diversify it
● Configuration discrepancies
● Platform-agnostic solution
7. Step IV: Durable Design
● Every unique node is a bottleneck…
● And Single Point of Failure
● No SPoF, everything is disposable
● Decentralization over Federalisation
● “Cattle over Pets”
● Collaboration is harder
● Paxos Consensus Protocol
8. Step V: Horizontal Scaling
● Up-Scaling is Ooops-Scaling
● Expensive and not efficient
● Commodity Hardware
● Scale Out!
● Fleet Management
● Configuration Management
● Infrastructure Automation (IaC)
10. Step VI: Self-Aware Cluster Topology
● Situation changes quickly
● No manual management possible
● Schema-aware cluster
● Gossiping
● Early failure detection
● Coordination
● Query optimisation
● Schema-aware client
● Client-side routing
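Client-side routing relies on the client knowing which nodes own which data. Below is a simplified sketch of a token ring, under loud assumptions: one token per node, MD5 as a stand-in hash (Cassandra's default partitioner is Murmur3, and real clusters use many virtual nodes per host), and a hypothetical `Ring` class that is not a driver API.

```python
import bisect
import hashlib
from typing import List


def token(value: str) -> int:
    """Stable 64-bit token. MD5 is a stand-in for the real partitioner."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Toy token ring: one token per node, replicas are the next nodes clockwise."""

    def __init__(self, nodes: List[str], replication_factor: int) -> None:
        self.replication_factor = replication_factor
        self.entries = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, partition_key: str) -> List[str]:
        """Nodes a schema-aware client would contact directly for this key."""
        size = len(self.entries)
        # First node whose token is >= the key's token, wrapping around the ring.
        start = bisect.bisect_left(self.tokens, token(partition_key)) % size
        count = min(self.replication_factor, size)
        return [self.entries[(start + i) % size][1] for i in range(count)]


if __name__ == "__main__":
    ring = Ring(["node-%d" % i for i in range(10)], replication_factor=3)
    print(ring.replicas("user:42"))
```

Because every client computes the same ring from the same topology, a request can go straight to a replica instead of taking an extra hop through an arbitrary node.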
11. Step VII: Failure Detection & Recovery
● Errors happen all the time
● Proper error handling is often missing
● Recovery is usually post factum
● Every part is ready
● Node processing request is a coordinator
● Parallel Async Dispatching
● Fail on write? Proactive Hinted handoff.
● Fail on read? Wait for next response & decrease weight of a suspicious node.
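The write path with hinted handoff can be sketched in a few lines. This is a toy model, not Cassandra's implementation: `Replica` and `Coordinator` are hypothetical classes, the dispatch is sequential rather than parallel and asynchronous, and hints are kept in memory without any expiry.

```python
from typing import Dict, List, Tuple


class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self.data: Dict[str, str] = {}

    def write(self, key: str, value: str) -> None:
        if not self.up:
            raise ConnectionError(self.name + " is down")
        self.data[key] = value


class Coordinator:
    """The node that received the request coordinates it."""

    def __init__(self, replicas: List[Replica]) -> None:
        self.replicas = replicas
        self.hints: List[Tuple[Replica, str, str]] = []

    def write(self, key: str, value: str, required_acks: int) -> int:
        acks = 0
        for replica in self.replicas:
            try:
                replica.write(key, value)
                acks += 1
            except ConnectionError:
                # Remember the missed write so it can be delivered later.
                self.hints.append((replica, key, value))
        if acks < required_acks:
            raise RuntimeError("only %d of %d acks" % (acks, required_acks))
        return acks

    def replay_hints(self) -> None:
        """Deliver stored hints to replicas that are reachable again."""
        remaining = []
        for replica, key, value in self.hints:
            try:
                replica.write(key, value)
            except ConnectionError:
                remaining.append((replica, key, value))
        self.hints = remaining
```

The client gets its answer as soon as enough replicas acknowledge; the node that was down catches up later without anyone re-sending the request.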
12. Step VIII: Operational Simplicity
“Lack of laziness is the developer’s worst curse”
● Manual operations are error-prone, not transparent and time-wasting.
● All repeatable operations should be automated and traceable
● Partitioning automation
● Emergency rebalance automation
● Bootstrap automation
● Decommission automation
13. Step IX: Background Self-Healing
● Failures sneak in anyway
● Because of Murphy, blame him!
● Repair-on-Read
● On-demand repair
● NodeSync (DSE)
● Scheduled repairs (v4)
● Automated Background Process (unless you have 5000 perfect ops people)
(no, you don’t)
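Repair-on-read is the easiest of these to picture. A minimal sketch, assuming each replica is a plain dictionary mapping a key to a (timestamp, value) pair and that the newest timestamp wins; `read_with_repair` is a hypothetical helper, not a Cassandra API.

```python
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, str]  # (write timestamp, value)


def read_with_repair(replicas: List[Dict[str, Cell]], key: str) -> Optional[str]:
    """Return the newest value for key and push it to stale replicas."""
    responses = [(replica.get(key), replica) for replica in replicas]
    found = [cell for cell, _ in responses if cell is not None]
    if not found:
        return None
    # Last write wins: the cell with the highest timestamp is the truth.
    newest = max(found, key=lambda cell: cell[0])
    for cell, replica in responses:
        if cell != newest:
            # Stale or missing copy: heal it as a side effect of the read.
            replica[key] = newest
    return newest[1]


if __name__ == "__main__":
    stale = {"k": (1, "old")}
    fresh = {"k": (2, "new")}
    empty: Dict[str, Cell] = {}
    print(read_with_repair([stale, fresh, empty], "k"))
    print(stale, empty)
```

This only heals data that is actually read, which is why the background and scheduled repairs on the slide are still needed for everything else.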
14. Step X: Continuous Improvement
● Debugging of a distributed system is DEADLY HARD
● No, seriously. I mean that.
● Think ahead, make logs great again ©
● Transient unique transaction ID
● Continuous monitoring
● Post-Mortem & Root Cause Analysis
● Goal is MTTR=0
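A unique transaction ID on every log line is cheap to add up front and painful to retrofit. One possible sketch with the Python standard library, where the logger name, the log format and `handle_request` are illustrative assumptions: the ID lives in a context variable and a logging filter stamps it onto each record.

```python
import contextvars
import io
import logging
import uuid

# Holds the ID of the request currently being processed ("-" outside a request).
transaction_id = contextvars.ContextVar("transaction_id", default="-")


class TransactionIdFilter(logging.Filter):
    """Attach the current transaction ID to every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.transaction_id = transaction_id.get()
        return True


def build_logger(stream) -> logging.Logger:
    logger = logging.getLogger("ten_little_servers.sketch")
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(transaction_id)s %(levelname)s %(message)s"))
    handler.addFilter(TransactionIdFilter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def handle_request(logger: logging.Logger, payload: str) -> str:
    """Process one request; every line it logs carries the same ID."""
    tx = uuid.uuid4().hex
    reset_token = transaction_id.set(tx)
    try:
        logger.info("received %s", payload)
        logger.info("stored %s", payload)
    finally:
        transaction_id.reset(reset_token)
    return tx


if __name__ == "__main__":
    out = io.StringIO()
    handle_request(build_logger(out), "order-1")
    print(out.getvalue(), end="")
```

In a distributed setup the same ID would also have to travel with the request between services, for example in a header, so that logs from different nodes can be joined on it.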
15. Real Life?
Let me show you the numbers
19. Know your Principles: All Together
• Replicate Data
• Distribute Replicas
• Diversify Infrastructure
• Have no Single Point of Failure
• Scale Out
• Develop to be Self-Sufficient
• Design to Recover Quickly
• Simplify Management
• Automate Recovery
• Monitoring & Post-Mortem
20. Know the Principle: In Two Words
Expect Failure
Praise Failure
Design to Fail
22. Thank you! Questions?
Aleks Volochnev
Developer Advocate at DataStax
@HadesArchitect
After many years in software development as a developer,
technical lead, DevOps engineer and architect, Aleks focused
on distributed applications and cloud architecture. Working
as a developer advocate at DataStax, he shares his knowledge
and expertise in the field of microservices, disaster-tolerant
systems and hybrid platforms.
Ask me about Cassandra Day in your city!