Corpus collapsum 
Partition tolerance of Galera in a noisy high load 
environment 
Highload++ 2014 
Raghavendra Prabhu 
 raghavendra.d.prabhu@gmail.com 
Percona  raghavendra.prabhu@percona.com 
 randomsurfer  wnohang.net  rdprabhu  ronin13
The Title?
Our Cluster
Split brain
Seed quotes.. 
“’Network is reliable’ - a fallacy of the distributed system.” - Peter Deutsch
“A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.” - Leslie Lamport
“Never attribute to malice that which is adequately explained by stupidity.” - Hanlon’s Razor
“Never attribute to Byzantine failure which can be explained by an ill node(s)” - Me
The fallacies 
▶ The network is reliable 
▶ Latency is zero 
▶ Bandwidth is infinite 
▶ The network is secure 
▶ Topology doesn’t change 
▶ There is one administrator 
▶ Transport cost is zero 
▶ The network is homogeneous 
20,000-foot view
Actors 
▶ Database - WSREP/PXC 
▶ Plugin - Galera 
▶ Traffic control 
♦ Traffic Control - tc 
♦ NetEm 
▶ Containers - Docker 
▶ Load 
♦ Generators - Sysbench, RQG 
▶ Network 
♦ Dnsmasq 
♦ nsenter 
▶ Jenkins 
♦ Build flow and CI 
▶ Storage 
♦ Why 
But why 
▶ The ‘P’ in CAP 
▶ WAN scalability 
▶ Real Reason - fun! 
▶ Tolerance to latency variance 
▶ Failures in warehouses. 
▶ Not quorum, but consensus. 
▶ Real world networks and synchronous replication 
- Delay 
- Partition 
Galera
▶ Data-centric approach 
▶ EVS 
▶ Causality and Synchronous 
▶ Latency 
Where did it start
▶ Bug! https://bugs.launchpad.net/galera/+bug/1274192 
▶ Loss of PC 
▶ Crash 
▶ HA goal 
One node can bring the whole cluster down
The Flow
Basic Flow 
[Flow diagram: Jenkins, Build image(s?), Start Dnsmasq, Bootstrap, nsenter/netem, Pre-sanity, SST/Others, Load/Sysbench]
[Flow diagram, continued: RR sysbench, Detach/Keep, Post sanity, Core trace, Sanity check, Reconciliation, Examine!, Cleanup, Collect logs]
Cluster Resilience 
Parameters 
▶ Sysbench 
▶ Segments 
▶ Reconciliation period 
▶ Number of nodes with loss 
▶ NetEm 
▶ Detach qdisc or keep it? 
▶ Fsync 
▶ Shutdown 
Containers!
Docker 
▶ Why not virtualize 
♦ Occam 
♦ Namespaces 
▶ Simplicity 
♦ Network 
♦ One application per node 
▶ Portability 
- Others see the same qualitative behavior that I do.
▶ Reproducibility 
- Makes it deterministic
▶ Configurable and CI 
- Byproducts 
▶ QEMU and Docker 
▶ Scalability 
♦ Performance 
♦ Feature 
▶ Abstraction of channels 
Container Networking 
▶ Linking didn’t help 
▶ Dnsmasq to rescue! 
♦ Hosts file and volumes 
♦ SIGHUP and refresh 
▶ More elegant methods 
- Swarm 
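The hosts-file approach above can be sketched roughly as follows. This assumes dnsmasq was started with an extra hosts file (its `--addn-hosts` option) kept on a shared volume; dnsmasq re-reads such files when it receives SIGHUP. The node names, addresses and function names are hypothetical and are not taken from the pxc-docker scripts.

```python
import os
import signal


def render_hosts(nodes):
    """Render {hostname: ip} as hosts-file lines, sorted by hostname."""
    return "".join(f"{ip}\t{name}\n" for name, ip in sorted(nodes.items()))


def refresh_dnsmasq(hosts_path, nodes, dnsmasq_pid=None):
    """Rewrite the hosts file and, if a PID is given, ask dnsmasq to reload it."""
    with open(hosts_path, "w") as fh:
        fh.write(render_hosts(nodes))
    if dnsmasq_pid is not None:
        # dnsmasq clears its cache and re-reads its hosts files on SIGHUP.
        os.kill(dnsmasq_pid, signal.SIGHUP)


if __name__ == "__main__":
    # Hypothetical container names and bridge addresses.
    print(render_hosts({"Dock1": "172.17.0.2", "Dock2": "172.17.0.3"}), end="")
```

Each time a container starts, the script would add its name and address to the mapping and call `refresh_dnsmasq` again, so that nodes can resolve each other by name without Docker links.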
Noise 
▶ Initial setup 
- Bridge 
- Egress only 
- IFB 
▶ Present state 
▶ NetEm 
- tc qdisc buckets 
- packet loss, delay, corruption, duplication, reordering 
- nsenter 
▶ Future 
- Docker exec 
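As an illustration of the NetEm and nsenter combination above, here is a minimal sketch that only builds the command lines, without running them. It assumes the container's init PID is already known (for example from `docker inspect`) and that the interface inside the container is `eth0`; the function names and the delay and loss values are hypothetical.

```python
def netem_add_cmd(pid, dev="eth0", delay_ms=None, jitter_ms=None, loss_pct=None):
    """Build an nsenter + tc command that attaches a netem qdisc inside the
    network namespace of process `pid` (e.g. a container's init process)."""
    cmd = ["nsenter", "--target", str(pid), "--net",
           "tc", "qdisc", "add", "dev", dev, "root", "netem"]
    if delay_ms is not None:
        cmd += ["delay", f"{delay_ms}ms"]
        if jitter_ms is not None:
            cmd += [f"{jitter_ms}ms"]
    if loss_pct is not None:
        cmd += ["loss", f"{loss_pct}%"]
    return cmd


def netem_del_cmd(pid, dev="eth0"):
    """Build the command that detaches the root qdisc again."""
    return ["nsenter", "--target", str(pid), "--net",
            "tc", "qdisc", "del", "dev", dev, "root"]


if __name__ == "__main__":
    # 4242 is a made-up PID; pass the lists to subprocess.run() to apply them.
    print(" ".join(netem_add_cmd(4242, delay_ms=150, jitter_ms=20, loss_pct=5)))
    print(" ".join(netem_del_cmd(4242)))
```

A root qdisc shapes only the traffic leaving the interface, which matches the egress-only note above.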
Testing methods
Method I 
▶ Qdisc is detached after load 
▶ Objective 
- Time for the full cluster to recover
▶ Done with a larger subset 
Method II 
▶ Qdisc is kept till the end 
▶ Objective 
- Formation of primary component 
▶ Comparatively smaller subset 
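Both methods come down to polling the nodes until a condition holds. A minimal sketch, assuming a caller-supplied `probe(node)` that returns the node's `wsrep_cluster_status` and `wsrep_cluster_size` status values (how they are fetched is left out); the function name, parameters and defaults are illustrative only.

```python
import time


def wait_for_primary(probe, nodes, expected_size=None, timeout_s=300.0,
                     interval_s=1.0, clock=time.monotonic, sleep=time.sleep):
    """Poll until every node in `nodes` reports a Primary component.

    `probe(node)` must return (cluster_status, cluster_size).
    Returns the elapsed seconds, or None if the timeout expires first.
    """
    if expected_size is None:
        expected_size = len(nodes)
    start = clock()
    while True:
        states = [probe(n) for n in nodes]
        if all(status == "Primary" and size == expected_size
               for status, size in states):
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(interval_s)
```

For Method I this would be started right after the qdisc is detached, with all nodes listed, so the return value approximates the recovery time. For Method II it could be given only the nodes without noise and an `expected_size` equal to their count, to check that a primary component forms while the qdisc is still attached.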
Observations 
▶ Post sanity types 
- Why 
- Whom 
▶ Which method is more pertinent 
▶ State transfer issues 
- Beginning 
- During re-emergence 
▶ Direct load to affected nodes 
▶ Logs 
- journalctl 
- Streaming? 
Other noises 
▶ Aim 
▶ Fsync 
- libeatmydata 
- Variance 
▶ Correlation with network 
▶ How with Docker 
- LD_PRELOAD 
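One way the LD_PRELOAD point could look in practice is sketched below: the library is injected through an environment variable when the container is started, so that only selected nodes skip fsync. The image name, container names and library path are assumptions (the path of libeatmydata depends on the distribution inside the image), and the command is only built, not executed.

```python
def docker_run_cmd(image, name, preload=None):
    """Build a `docker run` argv; `preload` is a library path for LD_PRELOAD."""
    cmd = ["docker", "run", "-d", "--name", name]
    if preload:
        # libeatmydata turns fsync() and related calls into no-ops.
        cmd += ["-e", f"LD_PRELOAD={preload}"]
    return cmd + [image]


if __name__ == "__main__":
    # Hypothetical image name and library path.
    lib = "/usr/lib/x86_64-linux-gnu/libeatmydata.so"
    print(" ".join(docker_run_cmd("pxc-node", "Dock1")))
    print(" ".join(docker_run_cmd("pxc-node", "Dock2", preload=lib)))
```

Starting some nodes with the preload and others without is one way to get the fsync variance mentioned above.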
System Load
Load generation 
▶ Sysbench 
- Generation 
- Reconnect on partition 
▶ Sockets chosen 
- Load on affected nodes 
▶ Distribution of Load 
- RR with socat 
- Native sysbench support 
- HAProxy? 
▶ Nature of data/load 
- DDL 
▶ RQG in future 
- Fuzz testing 
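The round-robin distribution of load mentioned above was done with socat; purely as an illustration of the idea, a Python sketch of the same assignment follows. The socket paths and the function name are hypothetical, and the `exclude` argument is an assumed way to choose whether affected nodes receive load.

```python
from itertools import cycle


def assign_connections(sockets, n_connections, exclude=()):
    """Spread n_connections over the node sockets in round-robin order,
    optionally skipping some nodes (e.g. those with network noise)."""
    targets = [s for s in sockets if s not in exclude]
    if not targets:
        raise ValueError("no sockets left to receive load")
    picker = cycle(targets)
    return [next(picker) for _ in range(n_connections)]


if __name__ == "__main__":
    socks = ["/tmp/Dock1.sock", "/tmp/Dock2.sock", "/tmp/Dock3.sock"]
    for target in assign_connections(socks, 4, exclude={"/tmp/Dock2.sock"}):
        print(target)
```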
The Fix
Strike Out!
Eviction 
▶ STONITH 
▶ Permanent eviction 
▶ ’N’ strikes & out! 
- Timers - evs parameters 
- wsrep_evs_delayed and wsrep_evs_evict_list 
▶ Aim 
▶ Quorum required 
- Keep a majority of the group live, or else avoid eviction.
- Why? - So that nodes do not shoot each other
- Non-PC nodes also. 
▶ EVS version and upgrade 
▶ TODO! 
- Ingress only 
- Follow here. 
▶ Credits to Teemu Ollakka, Yan Zhang and Alex Yurchenko from Codership.
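The "N strikes and out" idea, combined with the quorum requirement, can be pictured with a toy model. This is not Galera's EVS implementation: the class, its method names and the counting rule are hypothetical and only show why a majority must agree before a node is evicted.

```python
from collections import defaultdict


class StrikeTracker:
    """Toy 'N strikes and out' tracker; not Galera's actual EVS logic."""

    def __init__(self, members, max_strikes):
        self.members = set(members)
        self.max_strikes = max_strikes
        # strikes[reporter][suspect] -> times reporter saw suspect delayed
        self.strikes = defaultdict(lambda: defaultdict(int))
        self.evicted = set()

    def report_delayed(self, reporter, suspect):
        """Record one strike; return True if `suspect` is evicted as a result."""
        if reporter not in self.members or suspect not in self.members:
            return False
        if reporter == suspect or suspect in self.evicted:
            return False
        self.strikes[reporter][suspect] += 1
        # Count the members that have seen the suspect strike out.
        accusers = sum(1 for r in self.members
                       if self.strikes[r][suspect] >= self.max_strikes)
        # A strict majority must agree, so nodes cannot shoot each other.
        if accusers * 2 > len(self.members):
            self.evicted.add(suspect)
            return True
        return False


if __name__ == "__main__":
    tracker = StrikeTracker(["n1", "n2", "n3"], max_strikes=2)
    for reporter in ["n1", "n1", "n2", "n2"]:
        print(reporter, "reports n3:", tracker.report_delayed(reporter, "n3"))
```

A single node that keeps accusing a peer never reaches the majority on its own, which is the point of requiring quorum.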
Coredumps with Docker 
▶ Breakdown of abstraction 
▶ Lack of isolation 
▶ What was done 
- Volumes 
- core_pattern & sysctl 
- suid and ulimit 
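A rough sketch of how the three pieces above might fit together with a current Docker CLI follows. All concrete values are assumptions: the `/cores` directory, the core file name pattern and the `fs.suid_dumpable` value are illustrative, and the helper names are hypothetical. The `kernel.core_pattern` setting is applied on the host and is shared by every container, which is one face of the lack of isolation noted above.

```python
def host_sysctls(core_dir="/cores"):
    """sysctl settings to apply on the host (assumed values)."""
    return {
        # Written by the kernel for any crashing process, in any container.
        "kernel.core_pattern": f"{core_dir}/core.%e.%p",
        # Allow dumps from processes that changed user ID, as mysqld does.
        "fs.suid_dumpable": "2",
    }


def docker_core_args(core_dir="/cores"):
    """Extra `docker run` arguments: a volume for the dumps and no size limit."""
    return ["-v", f"{core_dir}:{core_dir}", "--ulimit", "core=-1"]


if __name__ == "__main__":
    for key, value in host_sysctls().items():
        print(f"sysctl -w {key}={value}")
    print("docker run " + " ".join(docker_core_args()) + " ...")
```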
WAN Segments 
▶ How they work 
▶ Random allocation 
▶ Joiner starvation 
▶ Simulates data center 
▶ Donor selection 
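Random allocation of nodes to segments could be sketched as below. Galera assigns a node to a segment through the `gmcast.segment` provider option; the function names, node names and the seeding scheme are hypothetical.

```python
import random


def allocate_segments(nodes, n_segments, seed=None):
    """Assign each node a random segment number in [0, n_segments)."""
    rng = random.Random(seed)  # a fixed seed makes a test run repeatable
    return {node: rng.randrange(n_segments) for node in nodes}


def provider_option(segment):
    """Render the provider option that places a node in `segment`."""
    return f"gmcast.segment={segment}"


if __name__ == "__main__":
    for node, seg in allocate_segments(["Dock1", "Dock2", "Dock3"], 2, seed=7).items():
        print(node, provider_option(seg))
```

The resulting string would go into each node's `wsrep_provider_options`; nodes that share a segment number behave as if they were in the same data center.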
The code 
▶ Github: https://github.com/percona/pxc-docker 
▶ Jenkins: http://jenkins.percona.com/job/PXC-5.6-netem/ 
▶ Contributions/testing welcome! 
▶ Dependencies 
- Sysbench 
Code: todo 
▶ Docker automated builds 
▶ Orchestration 
- Kubernetes, Mesos et al.
▶ Docker 
♦ Injection 
♦ Signal proxying 
▶ Poll and exit during reconciliation 
- Instrument v/s number of nodes 
▶ Use actual channels 
▶ Run it bare - CoreOS etc. 
▶ Overlay with etcd/fleet/libswarm 
Future work
▶ Fault injection 
♦ Memory 
- Poisoned memory/madvise 
♦ Disk 
- libeatmydata 
- Opposite 
- ENOSPC 
Fault injection 
▶ CPU 
- NUMA? 
- Hotplug 
▶ More network 
- corruption, duplication, reordering, rate-limit 
- Better distribution 
- Other shaping 
More Chaos
Future work 
▶ Disturb cluster more! 
- Membership changes 
* Manual eviction 
* Pull the cord! 
- Corrupt nodes 
▶ Consistency voting 
Further Reading 
▶ Jepsen 
▶ Byzantine fault tolerance 
- Reaching agreement in presence of faults 
▶ The Network is Reliable 
▶ NetEm 
▶ Latency: The New Web Performance Bottleneck 
▶ Galera Cluster Documentation 
▶ Auto eviction code 
▶ Don’t Settle for Eventual Consistency 
▶ Extended Virtual Synchrony 
About 
▶ /me: Raghavendra Prabhu, Product Lead, Percona XtraDB Cluster, Percona. 
▶ Slides will be on SlideShare.
▶ About.me: raghavendra.prabhu 
▶ Keybase.io: rdprabhu 
▶ Presentation under CC BY-SA 4.0 
Image Credits 
▶ http://galeracluster.com/documentation-webpages/ 
▶ http://www.thelastdragontribute.com/40th-anniversary-death-of-bruce-lee/ 
▶ https://upload.wikimedia.org/wikipedia/commons/6/60/Corpus_callosum.png 
▶ http://www.thebarrow.org/Neurological_Services/Epilepsy/204354 
▶ https://flic.kr/p/9J6GNu 
▶ https://secure.flickr.com/photos/brewbooks/7780990192 
▶ https://www.flickr.com/photos/kwerfeldein/2649294869 
▶ https://secure.flickr.com/photos/mindmob/51951632 
▶ https://secure.flickr.com/photos/arenamontanus/2227769907 
▶ https://www.flickr.com/photos/markop/477199204 
▶ http://galeracluster.com/wp-content/uploads/2013/10/galera_replication1.png 
▶ https://www.flickr.com/photos/gcwest/281385801 
▶ https://www.flickr.com/photos/opethdamna/360934079 
▶ http://digital-amphetamine.deviantart.com/art/Sky-82555664 
▶ http://highload.co/i/logo.png 
▶ https://flic.kr/p/xTT8n 
▶ https://www.flickr.com/photos/29233640@N07/13466208953 
▶ https://www.flickr.com/photos/bob_in_thailand/9782777742/ 
Spaseeba & Da sveedaneeya!

 
Galera with Docker: How Synchronous Replication and Linux Containers mesh tog...
Galera with Docker: How Synchronous Replication and Linux Containers mesh tog...Galera with Docker: How Synchronous Replication and Linux Containers mesh tog...
Galera with Docker: How Synchronous Replication and Linux Containers mesh tog...
 
Jutsu or Dô: Open documentation: continuous process than a body
Jutsu or Dô: Open documentation: continuous process than a body Jutsu or Dô: Open documentation: continuous process than a body
Jutsu or Dô: Open documentation: continuous process than a body
 
Corpus collapsum: Partition tolerance of Galera put to test
Corpus collapsum: Partition tolerance of Galera put to testCorpus collapsum: Partition tolerance of Galera put to test
Corpus collapsum: Partition tolerance of Galera put to test
 

Dernier

Energy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxEnergy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxsiddharthjain2303
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girlsssuser7cb4ff
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvLewisJB
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
Industrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptIndustrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptNarmatha D
 
Vishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsVishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsSachinPawar510423
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncssuser2ae721
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
Main Memory Management in Operating System
Main Memory Management in Operating SystemMain Memory Management in Operating System
Main Memory Management in Operating SystemRashmi Bhat
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating SystemRashmi Bhat
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgUnit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgsaravananr517913
 
Steel Structures - Building technology.pptx
Steel Structures - Building technology.pptxSteel Structures - Building technology.pptx
Steel Structures - Building technology.pptxNikhil Raut
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxRomil Mishra
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 

Dernier (20)

Energy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxEnergy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptx
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
Industrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptIndustrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.ppt
 
Vishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsVishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documents
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
Main Memory Management in Operating System
Main Memory Management in Operating SystemMain Memory Management in Operating System
Main Memory Management in Operating System
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating System
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgUnit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
 
Steel Structures - Building technology.pptx
Steel Structures - Building technology.pptxSteel Structures - Building technology.pptx
Steel Structures - Building technology.pptx
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptx
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 

Corpus collapsum: Partition tolerance of Galera in a noisy high load environment

  • 1. Corpus collapsum: Partition tolerance of Galera in a noisy high load environment. Highload++ 2014. Raghavendra Prabhu (raghavendra.d.prabhu@gmail.com), Percona (raghavendra.prabhu@percona.com). randomsurfer, wnohang.net, rdprabhu, ronin13
  • 5. Seed quotes: “‘Network is reliable’ - a fallacy of the distributed system.” - Peter Deutsch. “A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.” - Leslie Lamport. “Never attribute to malice that which is adequately explained by stupidity.” - Hanlon’s Razor. “Never attribute to Byzantine failure which can be explained by an ill node(s)” - Me
  • 9. The fallacies ▶ The network is reliable ▶ Latency is zero ▶ Bandwidth is infinite ▶ The network is secure
  • 10. The fallacies ▶ Topology doesn’t change ▶ There is one administrator ▶ Transport cost is zero ▶ The network is homogeneous
  • 12. Actors ▶ Database - WSREP/PXC ▶ Plugin - Galera ▶ Traffic control ♦ Traffic Control - tc ♦ NetEm
  • 15. Actors ▶ Containers - Docker ▶ Load ♦ Generators - Sysbench, RQG ▶ Network ♦ Dnsmasq ♦ nsenter
  • 17. Actors ▶ Jenkins ♦ Build flow and CI ▶ Storage ♦ Why
  • 18. But why ▶ The ‘P’ in CAP ▶ WAN scalability ▶ Real Reason - fun! ▶ Tolerance to latency variance
  • 22. But why ▶ Failures in warehouses. ▶ Not quorum, but consensus. ▶ Real world networks and synchronous replication - Delay - Partition
  • 24. Galera ▶ Data-centric approach ▶ EVS ▶ Causality and Synchronous ▶ Latency
  • 29. Where did it start ▶ Bug! https://bugs.launchpad.net/galera/+bug/1274192 ▶ Loss of PC ▶ Crash ▶ HA goal
  • 30. One can bring the whole down
  • 32. Basic Flow: Jenkins Build image(s?) Start Dnsmasq Bootstrap nsenter/netem Pre-sanity SST/Others Load/Sysbench
  • 40. Basic Flow: RR sysbench Detach/Keep Post sanity Core trace Sanity check Reconciliation Examine! Cleanup Collect logs
  • 49. Parameters ▶ Sysbench ▶ Segments ▶ Reconciliation period ▶ Number of nodes with loss
  • 53. Parameters ▶ NetEm ▶ Detach qdisc or keep it? ▶ Fsync ▶ Shutdown
  • 59. Docker ▶ Why not virtualize ♦ Occam ♦ Namespaces ▶ Simplicity ♦ Network ♦ One application per node
  • 60. Docker ▶ Portability - You see the same qualitative behavior that I do. ▶ Reproducibility - Makes it deterministic ▶ Configurable and CI - Byproducts
  • 61. Docker ▶ QEMU and Docker ▶ Scalability ♦ Performance ♦ Feature ▶ Abstraction of channels
  • 62. Container Networking ▶ Linking didn’t help ▶ Dnsmasq to the rescue! ♦ Hosts file and volumes ♦ SIGHUP and refresh ▶ More elegant methods - Swarm
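
The "hosts file and volumes, SIGHUP and refresh" approach from slide 62 could look roughly like the sketch below. This is not the pxc-docker code: the helper names, node names and addresses are hypothetical. It relies only on the documented dnsmasq behavior of re-reading its hosts files (for example one passed with `--addn-hosts`) when it receives SIGHUP.

```python
import os
import signal
from pathlib import Path


def render_hosts(nodes):
    """Render a hosts-file body from a {hostname: ip} mapping (hypothetical helper)."""
    # One "ip hostname" entry per line, sorted by hostname for stable output.
    return "".join(f"{ip} {name}\n" for name, ip in sorted(nodes.items()))


def publish_hosts(nodes, hosts_path, dnsmasq_pid=None):
    """Write the hosts file and, if a pid is given, ask dnsmasq to reload it."""
    Path(hosts_path).write_text(render_hosts(nodes))
    if dnsmasq_pid is not None:
        # dnsmasq clears its cache and re-reads hosts files on SIGHUP.
        os.kill(dnsmasq_pid, signal.SIGHUP)
```

Each time a container joins, the caller would add its name and address to the mapping and call `publish_hosts` again; the hosts file itself would live on a volume visible to the dnsmasq process.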
  • 63. Noise ▶ Initial setup - Bridge - Egress only - IFB ▶ Present state ▶ NetEm - tc qdisc buckets - packet loss, delay, corruption, duplication, reordering - nsenter ▶ Future - Docker exec
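
As an illustration of the "NetEm via nsenter" step on slide 63, the sketch below only builds the command lines; it does not run them. The function names and the default device `eth0` are assumptions, as are the example delay and loss values. The container's init PID would typically come from `docker inspect --format '{{.State.Pid}}' <container>`.

```python
import shlex


def netem_command(container_pid, dev="eth0", delay_ms=None, jitter_ms=None, loss_pct=None):
    """Build an nsenter + tc command that attaches a netem qdisc
    inside the network namespace of the given container PID."""
    cmd = ["nsenter", "--target", str(container_pid), "--net",
           "tc", "qdisc", "add", "dev", dev, "root", "netem"]
    if delay_ms is not None:
        cmd += ["delay", f"{delay_ms}ms"]
        if jitter_ms is not None:
            # Optional second value after the delay is the jitter.
            cmd.append(f"{jitter_ms}ms")
    if loss_pct is not None:
        cmd += ["loss", f"{loss_pct}%"]
    return cmd


def detach_command(container_pid, dev="eth0"):
    """Build the command that removes the root qdisc again (the 'detach' case)."""
    return ["nsenter", "--target", str(container_pid), "--net",
            "tc", "qdisc", "del", "dev", dev, "root"]


if __name__ == "__main__":
    # Hypothetical PID and values, printed for inspection only.
    print(shlex.join(netem_command(4242, delay_ms=100, jitter_ms=20, loss_pct=5)))
    print(shlex.join(detach_command(4242)))
```

The resulting lists can be passed to `subprocess.run` as root on the Docker host. Keeping command construction separate from execution makes the noise parameters easy to log alongside each test run.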
  • 65. Method I ▶ Qdisc is detached after load ▶ Objective - Time to recover of full cluster ▶ Done with a larger subset
  • 66. Method II ▶ Qdisc is kept till the end ▶ Objective - Formation of primary component ▶ Comparatively smaller subset
  • 67. Observations ▶ Post sanity types - Why - Whom ▶ Which method is more pertinent ▶ State transfer issues - Beginning - During re-emergence
  • 68. Observations ▶ Direct load to affected nodes ▶ Logs - journalctl - Streaming?
  • 69. Other noises ▶ Aim ▶ Fsync - libeatmydata - Variance ▶ Correlation with network ▶ How with Docker - LD_PRELOAD
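
One way to read "How with Docker - LD_PRELOAD" on slide 69 is to pass the preload through the container environment. The sketch below assumes the image already ships libeatmydata; the library path, image name and container name are hypothetical and vary by distribution. It only builds the `docker run` argument list.

```python
def docker_run_with_eatmydata(image, name, lib_path="/usr/lib/libeatmydata.so", enabled=True):
    """Build a `docker run` command that optionally preloads libeatmydata,
    which turns fsync-style calls into no-ops for the process in the container."""
    cmd = ["docker", "run", "-d", "--name", name]
    if enabled:
        # lib_path is an assumption: it must match where the image installs the library.
        cmd += ["-e", f"LD_PRELOAD={lib_path}"]
    cmd.append(image)
    return cmd
```

Toggling `enabled` per node gives a mix of nodes with and without fsync, which is one way to introduce the disk-side variance the slide mentions.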
  • 71. Load generation ▶ Sysbench - Generation - Reconnect on partition ▶ Sockets chosen - Load on affected nodes ▶ Distribution of Load - RR with socat - Native sysbench support - HAProxy?
  • 72. Load generation ▶ Nature of data/load - DDL ▶ RQG in future - Fuzz testing
  • 75. Eviction ▶ STONITH ▶ Permanent eviction ▶ ‘N’ strikes & out! - Timers - evs parameters - wsrep_evs_delayed and wsrep_evs_evict_list
  • 76. Eviction ▶ Aim ▶ Quorum required - Keep majority of the group live or to avoid eviction. - Why? - Not shoot each other - Non-PC nodes also.
  • 78. Eviction ▶ EVS version and upgrade ▶ TODO! - Ingress only - Follow here. ▶ Credits to Teemu Ollakka, Yan Zhang and Alex Yurchenko from Codership.
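
The "evs parameters" and "EVS version" points on the eviction slides presumably refer to Galera provider options such as `evs.version`, `evs.auto_evict`, `evs.delayed_margin` and `evs.delayed_keep_period`. The sketch below shows how such a `wsrep_provider_options` value could be assembled. The values are purely illustrative and are not the settings used in the talk, and the helper name is hypothetical.

```python
def provider_options(opts):
    """Join Galera provider options into the 'key=value; key=value' form
    expected by wsrep_provider_options. Insertion order is preserved."""
    return "; ".join(f"{key}={value}" for key, value in opts.items())


# Illustrative values only (assumption): auto eviction needs EVS protocol
# version 1, and evs.auto_evict is the number of "strikes" on the delayed
# list before a node is evicted.
AUTO_EVICT_EXAMPLE = {
    "evs.version": 1,
    "evs.auto_evict": 5,
    "evs.delayed_margin": "PT5S",
    "evs.delayed_keep_period": "PT60S",
}

if __name__ == "__main__":
    print(f'wsrep_provider_options="{provider_options(AUTO_EVICT_EXAMPLE)}"')
```

With such settings in place, the status variables named on slide 75 (`wsrep_evs_delayed` and `wsrep_evs_evict_list`) are where delayed and evicted nodes would show up.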
  • 79. Coredumps with Docker ▶ Breakdown of abstraction ▶ Lack of isolation ▶ What was done - Volumes - core_pattern & sysctl - suid and ulimit
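
A rough sketch of what "Volumes, core_pattern & sysctl, suid and ulimit" on slide 79 might translate to. The directory `/cores`, the core file pattern and the use of `fs.suid_dumpable=2` are assumptions, not taken from the talk. `kernel.core_pattern` is set on the host, which is the "lack of isolation" the slide points at, so the directory it names is bind-mounted into the container.

```python
def coredump_setup(core_dir="/cores"):
    """Return (host_commands, docker_run_flags) for collecting core dumps
    from containers into a shared directory. Nothing is executed here."""
    host_commands = [
        # Core file location is set kernel-wide on the host.
        ["sysctl", "-w", f"kernel.core_pattern={core_dir}/core.%e.%p"],
        # Assumption: allow dumps from processes that changed their uid.
        ["sysctl", "-w", "fs.suid_dumpable=2"],
    ]
    docker_run_flags = [
        "--ulimit", "core=-1",           # unlimited core size in the container
        "-v", f"{core_dir}:{core_dir}",  # same path inside and outside
    ]
    return host_commands, docker_run_flags
```

The flags would be spliced into the `docker run` invocation for each node, and the host commands run once as root before the cluster is started.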
  • 80. WAN Segments ▶ How they work ▶ Random allocation ▶ Joiner starvation ▶ Simulates data center ▶ Donor selection
  • 81. The code ▶ Github: https://github.com/percona/pxc-docker ▶ Jenkins: http://jenkins.percona.com/job/PXC-5.6-netem/ ▶ Contributions/testing welcome! ▶ Dependencies - Sysbench
  • 82. Code: todo ▶ Docker automated builds ▶ Orchestration - Kubernetes, Mesos et al. ▶ Docker ♦ Injection ♦ Signal proxying
  • 83. Code: todo ▶ Poll and exit during reconciliation - Instrument vs. number of nodes ▶ Use actual channels ▶ Run it bare - CoreOS etc. ▶ Overlay with etcd/fleet/libswarm
  • 85. Future work ▶ Fault injection ♦ Memory - Poisoned memory/madvise ♦ Disk - libeatmydata - Opposite - ENOSPC
  • 86. Fault injection ▶ CPU - NUMA? - Hotplug ▶ More network - corruption, duplication, reordering, rate-limit - Better distribution - Other shaping
  • 88. Future work ▶ Disturb cluster more! - Membership changes * Manual eviction * Pull the cord! - Corrupt nodes ▶ Consistency voting
  • 89. Further Reading ▶ Jepsen ▶ Byzantine fault tolerance - Reaching agreement in presence of faults ▶ The Network is Reliable ▶ NetEm ▶ Latency: The New Web Performance Bottleneck ▶ Galera Cluster Documentation ▶ Auto eviction code ▶ Don’t Settle for Eventual Consistency ▶ Extended Virtual Synchrony
  • 90. About ▶ /me: Raghavendra Prabhu, Product Lead, Percona XtraDB Cluster, Percona. ▶ Slides will be at slideshare. ▶ About.me: raghavendra.prabhu ▶ Keybase.io: rdprabhu ▶ Presentation under CC BY-SA 4.0
  • 91. Image Credits ▶ http://galeracluster.com/documentation-webpages/ ▶ http://www.thelastdragontribute.com/40th-anniversary-death-of-bruce-lee/ ▶ https://upload.wikimedia.org/wikipedia/commons/6/60/Corpus_callosum.png ▶ http://www.thebarrow.org/Neurological_Services/Epilepsy/204354 ▶ https://flic.kr/p/9J6GNu ▶ https://secure.flickr.com/photos/brewbooks/7780990192 ▶ https://www.flickr.com/photos/kwerfeldein/2649294869 ▶ https://secure.flickr.com/photos/mindmob/51951632 ▶ https://secure.flickr.com/photos/arenamontanus/2227769907 ▶ https://www.flickr.com/photos/markop/477199204 ▶ http://galeracluster.com/wp-content/uploads/2013/10/galera_replication1.png ▶ https://www.flickr.com/photos/gcwest/281385801 ▶ https://www.flickr.com/photos/opethdamna/360934079 ▶ http://digital-amphetamine.deviantart.com/art/Sky-82555664 ▶ http://highload.co/i/logo.png ▶ https://flic.kr/p/xTT8n ▶ https://www.flickr.com/photos/29233640@N07/13466208953 ▶ https://www.flickr.com/photos/bob_in_thailand/9782777742/
  • 92. Spaseeba & Da sveedaneeya! (Russian: Thank you & goodbye!)