This talk was given at Highload++ 2014 in Moscow, Russia. Its topic was partition-tolerance testing of Galera in a noisy, high-load environment using NetEm and Docker.
5. Seed quotes
“‘The network is reliable’ is a fallacy of distributed computing.” - Peter Deutsch
“A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.” - Leslie Lamport
“Never attribute to malice that which is adequately explained by stupidity.” - Hanlon’s Razor
“Never attribute to Byzantine failure that which can be explained by ill node(s).” - Me
5 / 92
9. The fallacies
▶ The network is reliable
▶ Latency is zero
▶ Bandwidth is infinite
▶ The network is secure
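NetEm can turn the first three fallacies into concrete test conditions: packet loss breaks "the network is reliable", delay with jitter breaks "latency is zero", and a rate limit breaks "bandwidth is infinite". The Python sketch below only builds `tc qdisc ... netem` argument lists; the helper names and the example interface `eth0` are assumptions for illustration, not the tooling used in the talk.

```python
import shlex


def netem_command(dev, delay_ms=None, jitter_ms=None, loss_pct=None, rate=None):
    """Build a `tc qdisc replace ... netem` argument list (hypothetical helper).

    delay/jitter -> breaks "Latency is zero"
    loss         -> breaks "The network is reliable"
    rate         -> breaks "Bandwidth is infinite"
    """
    args = ["tc", "qdisc", "replace", "dev", dev, "root", "netem"]
    if delay_ms is not None:
        args += ["delay", f"{delay_ms}ms"]
        # Jitter is only meaningful together with a base delay.
        if jitter_ms is not None:
            args.append(f"{jitter_ms}ms")
    if loss_pct is not None:
        args += ["loss", f"{loss_pct}%"]
    if rate is not None:
        # netem's rate option needs a reasonably recent kernel and iproute2.
        args += ["rate", rate]
    return args


def netem_clear(dev):
    """Delete the root qdisc, restoring the interface's default queueing."""
    return ["tc", "qdisc", "del", "dev", dev, "root"]


if __name__ == "__main__":
    cmd = netem_command("eth0", delay_ms=100, jitter_ms=20, loss_pct=1, rate="1mbit")
    print(shlex.join(cmd))
```

To apply a command, pass it to `subprocess.run(cmd, check=True)` inside the node's container; changing qdiscs there typically requires the `NET_ADMIN` capability. A netem qdisc attached as root shapes the interface's egress traffic.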
10. The fallacies
▶ Topology doesn’t change
▶ There is one administrator
▶ Transport cost is zero
▶ The network is homogeneous
59. Docker
▶ Why not virtualize?
♦ Occam
♦ Namespaces
▶ Simplicity
♦ Network
♦ One application per node
60. Docker
▶ Portability
- See same qualitative behavior that I do.
▶ Reproducibility
- Makes it deterministic
▶ Configurable and CI
- Byproducts
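One way to get the reproducibility mentioned above is to derive every injected fault from a single seed. The sketch below uses a hypothetical `Fault` type, a hypothetical `fault_schedule` helper and made-up fault kinds; the same seed always yields the same schedule, so a failing run can be replayed. It makes the injected faults deterministic, but it cannot make the cluster's own timing deterministic.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Fault:
    at_s: int   # seconds since the start of the test run
    node: str   # container name (hypothetical naming)
    kind: str   # hypothetical fault kinds: "delay", "loss" or "partition"


def fault_schedule(seed, nodes, duration_s, n_faults):
    """Generate a replayable fault-injection schedule, sorted by time.

    A private random.Random(seed) is used instead of the global RNG, so other
    code consuming random numbers cannot change the schedule.
    """
    rng = random.Random(seed)
    kinds = ["delay", "loss", "partition"]
    faults = [
        Fault(at_s=rng.randrange(duration_s),
              node=rng.choice(nodes),
              kind=rng.choice(kinds))
        for _ in range(n_faults)
    ]
    return sorted(faults, key=lambda f: f.at_s)
```

Logging the seed with each CI run is then enough to regenerate the exact schedule later.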
61. Docker
▶ QEMU and Docker
▶ Scalability
♦ Performance
♦ Feature
▶ Abstraction of channels
62. Container Networking
▶ Linking didn’t help
▶ Dnsmasq to the rescue!
♦ Hosts file and volumes
♦ SIGHUP and refresh
▶ More elegant methods
- Swarm
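The dnsmasq approach can be sketched as: keep a hosts file on a shared volume, rewrite it when containers come and go, then send dnsmasq a SIGHUP, on which it clears its cache and re-reads its hosts files (including any passed with `--addn-hosts`). The helper names are hypothetical, and the sketch assumes the volume is a shared directory rather than a single bind-mounted file, because the atomic rename replaces the file's inode.

```python
import os
import signal
import tempfile


def render_hosts(mapping):
    """Render name -> IP pairs in /etc/hosts format, sorted for stable output."""
    return "".join(f"{ip}\t{name}\n" for name, ip in sorted(mapping.items()))


def write_hosts(path, mapping):
    """Atomically replace the hosts file that dnsmasq reads.

    Writing to a temporary file in the same directory and renaming it over
    the target means dnsmasq never sees a half-written file. Share the
    directory as the volume: a single bind-mounted file would keep pointing
    at the old inode after the rename.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(render_hosts(mapping))
    # mkstemp creates the file with mode 0600; dnsmasq usually drops
    # privileges, so make the file world-readable before it goes live.
    os.chmod(tmp, 0o644)
    os.replace(tmp, path)


def reload_dnsmasq(pid):
    """On SIGHUP, dnsmasq clears its cache and re-reads its hosts files."""
    os.kill(pid, signal.SIGHUP)
```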
76. Eviction
▶ Aim
▶ Quorum required
- Keep a majority of the group live to avoid eviction.
- Why? So that nodes do not shoot each other.
- Non-PC (non-Primary Component) nodes also.
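The quorum rule can be illustrated with a simplified version of Galera's weighted quorum: a component stays primary only if its members' total weight is more than half of the weight of the last Primary Component, after subtracting nodes that left gracefully. The function names and the random split are assumptions for illustration; the weights stand for the `pc.weight` provider option and default to 1.

```python
import random


def has_quorum(last_primary, left_gracefully, current, weights=None):
    """Simplified Galera weighted-quorum check.

    last_primary:    members of the last Primary Component
    left_gracefully: members of it known to have left cleanly
    current:         members of the component being evaluated
    weights:         optional node -> pc.weight mapping (default weight 1)
    """
    weights = weights or {}

    def total(nodes):
        return sum(weights.get(n, 1) for n in nodes)

    # Strictly more than half: an exact 50/50 split has no primary side.
    return 2 * total(current) > total(last_primary) - total(left_gracefully)


def random_split(nodes, rng):
    """Simulate a partition: split at least two nodes into two disjoint,
    non-empty sides."""
    shuffled = list(nodes)
    rng.shuffle(shuffled)
    cut = rng.randint(1, len(shuffled) - 1)
    return set(shuffled[:cut]), set(shuffled[cut:])


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4", "node5"]
    side_a, side_b = random_split(nodes, random.Random(2014))
    for side in (side_a, side_b):
        state = "primary" if has_quorum(nodes, [], side) else "non-primary"
        print(sorted(side), state)
```

With an odd number of equally weighted nodes exactly one side of any split keeps quorum; with an even split (two against two) neither side does, so both become non-primary instead of both carrying on independently.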
78. Eviction
▶ EVS version and upgrade
▶ TODO!
- Ingress only
- Follow here.
▶ Credits to Teemu Ollakka, Yan Zhang and Alex Yurchenko from Codership.
79. Coredumps with Docker
▶ Breakdown of abstraction
▶ Lack of isolation
▶ What was done
- Volumes
- core_pattern & sysctl
- suid and ulimit
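The coredump steps can be sketched in Python as below. `kernel.core_pattern` is a host-wide setting rather than a per-container one, which is the breakdown of isolation noted above, so writing it needs root on the host and affects every container. The directory is assumed to be a volume mounted at the same path inside the containers, and the function names are hypothetical. Processes that change credentials, such as a daemon dropping to an unprivileged user, may additionally need the `fs.suid_dumpable` sysctl before they dump core.

```python
import resource

CORE_PATTERN_PATH = "/proc/sys/kernel/core_pattern"


def core_pattern(directory):
    """Pattern that writes cores into `directory` (assumed to be a volume).

    %e = executable name, %p = PID, %t = time of dump (seconds since epoch).
    """
    return f"{directory.rstrip('/')}/core.%e.%p.%t"


def set_core_pattern(directory):
    """Requires root on the host: core_pattern is not namespaced, so this
    changes the setting for every container on the machine."""
    with open(CORE_PATTERN_PATH, "w") as f:
        f.write(core_pattern(directory) + "\n")


def allow_core_dumps():
    """Raise the soft RLIMIT_CORE to the hard limit (like `ulimit -c`).

    Unprivileged processes may raise the soft limit up to the hard limit.
    Returns the new soft limit.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return hard
```

Because the limit is inherited, `allow_core_dumps()` has to run in the process that later starts the database (or be replaced by the equivalent `ulimit` in its entrypoint).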
80. WAN Segments
▶ How they work
▶ Random allocation
▶ Joiner starvation
▶ Simulates data center
▶ Donor selection
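A toy model of random segment allocation and segment-aware donor selection follows. Each node gets a `gmcast.segment` value (Galera accepts 0 to 255), and the donor choice simply prefers a synced node in the joiner's own segment. This is a simplification under stated assumptions, not Galera's actual selection logic, and the helper names are hypothetical.

```python
import random


def assign_segments(nodes, n_segments, seed):
    """Randomly place nodes into segments, one simulated data center each."""
    rng = random.Random(seed)
    return {node: rng.randrange(n_segments) for node in nodes}


def provider_options(segment):
    """wsrep_provider_options fragment for one node."""
    if not 0 <= segment <= 255:
        raise ValueError("gmcast.segment must be in 0..255")
    return f"gmcast.segment={segment}"


def pick_donor(joiner, synced, segments):
    """Simplified donor choice: prefer a synced node in the joiner's segment,
    fall back to any synced node, and return None if there is none at all
    (one way a joiner can starve)."""
    local = [n for n in synced if segments[n] == segments[joiner]]
    candidates = local or list(synced)
    return candidates[0] if candidates else None
```

With random allocation, a joiner can land in a segment with no synced peer, so every state transfer for it has to cross the simulated WAN link.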
83. Code: todo
▶ Poll and exit during reconciliation
- Instrumentation vs. number of nodes
▶ Use actual channels
▶ Run it bare - CoreOS etc.
▶ Overlay with etcd/fleet/libswarm
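The first TODO item could look roughly like the sketch below: poll every node's wsrep status until all of them report a synced Primary component of the expected size, and give up after a timeout so the run can exit. `fetch_status` is a hypothetical callable (for example, one wrapping `SHOW STATUS LIKE 'wsrep_%'`), and the clock and sleep functions are injectable so the loop can be tested without waiting.

```python
import time


def poll_until_reconciled(fetch_status, nodes, timeout_s=60.0, interval_s=1.0,
                          clock=time.monotonic, sleep=time.sleep):
    """Return True once every node is a synced member of a full Primary
    component, or False when timeout_s elapses so the harness can exit.

    fetch_status(node) is a hypothetical callable returning that node's
    wsrep status variables as a dict of strings.
    """
    deadline = clock() + timeout_s
    expected = len(nodes)
    while True:
        statuses = [fetch_status(node) for node in nodes]
        if all(
            s.get("wsrep_cluster_status") == "Primary"
            and s.get("wsrep_local_state_comment") == "Synced"
            and int(s.get("wsrep_cluster_size", 0)) == expected
            for s in statuses
        ):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Each round queries every node, so the cost of this instrumentation grows with the number of nodes.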