SlideShare une entreprise Scribd logo
1  sur  88
Télécharger pour lire hors ligne
Corpus collapsum 
Partition tolerance of Galera put to test 
RICON 2014 
Raghavendra Prabhu 
 raghavendra.d.prabhu@gmail.com 
Percona  raghavendra.prabhu@percona.com 
 randomsurfer  wnohang.net  rdprabhu  ronin13
The Title?
Our Cluster
Split brain
Introduction 
Seed quotes.. 
“ ’Network is reliable’ - a fallacy of the distributed 
system. ” 
“ A distributed system is one in which the failure of a 
computer you didn’t even know existed can render your own 
computer unusable. ” - Leslie Lamport 
“ Never attribute to malice that which is adequately 
explained by stupidity. ” - Hanlon’s Razor 
“ Never attribute to Byzantine failure which can be 
explained by an ill node(s) ” - Me 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 5 / 58
Introduction 
Seed quotes.. 
“ ’Network is reliable’ - a fallacy of the distributed 
system. ” 
“ A distributed system is one in which the failure of a 
computer you didn’t even know existed can render your own 
computer unusable. ” - Leslie Lamport 
“ Never attribute to malice that which is adequately 
explained by stupidity. ” - Hanlon’s Razor 
“ Never attribute to Byzantine failure which can be 
explained by an ill node(s) ” - Me 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 5 / 58
Introduction 
Seed quotes.. 
“ ’Network is reliable’ - a fallacy of the distributed 
system. ” 
“ A distributed system is one in which the failure of a 
computer you didn’t even know existed can render your own 
computer unusable. ” - Leslie Lamport 
“ Never attribute to malice that which is adequately 
explained by stupidity. ” - Hanlon’s Razor 
“ Never attribute to Byzantine failure which can be 
explained by an ill node(s) ” - Me 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 5 / 58
Introduction 
Seed quotes.. 
“ ’Network is reliable’ - a fallacy of the distributed 
system. ” 
“ A distributed system is one in which the failure of a 
computer you didn’t even know existed can render your own 
computer unusable. ” - Leslie Lamport 
“ Never attribute to malice that which is adequately 
explained by stupidity. ” - Hanlon’s Razor 
“ Never attribute to Byzantine failure which can be 
explained by an ill node(s) ” - Me 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 5 / 58
20000 feet view
Introduction 
Actors 
▶ Database - WSREP/PXC 
▶ Plugin - Galera 
▶ Traffic control 
♦ Traffic Control - tc 
♦ NetEm 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 7 / 58
Introduction 
Actors 
▶ Database - WSREP/PXC 
▶ Plugin - Galera 
▶ Traffic control 
♦ Traffic Control - tc 
♦ NetEm 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 7 / 58
Introduction 
Actors 
▶ Database - WSREP/PXC 
▶ Plugin - Galera 
▶ Traffic control 
♦ Traffic Control - tc 
♦ NetEm 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 7 / 58
Introduction 
Actors 
▶ Containers - Docker 
▶ Load 
♦ Generators - Sysbench, RQG 
▶ Network 
♦ Dnsmasq 
♦ nsenter 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 8 / 58
Introduction 
Actors 
▶ Containers - Docker 
▶ Load 
♦ Generators - Sysbench, RQG 
▶ Network 
♦ Dnsmasq 
♦ nsenter 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 8 / 58
Introduction 
Actors 
▶ Jenkins 
♦ Build flow and CI 
▶ Storage 
♦ Why 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 9 / 58
Details 
But why 
▶ The ‘P’ in CAP 
▶ WAN scalability 
▶ Real Reason - fun! 
▶ Tolerance to latency variance 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 10 / 58
Details 
But why 
▶ The ‘P’ in CAP 
▶ WAN scalability 
▶ Real Reason - fun! 
▶ Tolerance to latency variance 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 10 / 58
Details 
But why 
▶ The ‘P’ in CAP 
▶ WAN scalability 
▶ Real Reason - fun! 
▶ Tolerance to latency variance 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 10 / 58
Details 
But why 
▶ The ‘P’ in CAP 
▶ WAN scalability 
▶ Real Reason - fun! 
▶ Tolerance to latency variance 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 10 / 58
Details 
But why 
▶ Failures in warehouses. 
▶ Not quorum, but consensus. 
▶ Real world networks and synchronous replication 
- Delay 
- Partition 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 11 / 58
Galera
Details 
Galera 
▶ Data-centric approach 
▶ EVS 
▶ Causality and Synchronous 
▶ Latency 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 13 / 58
Where did it start
Details 
Where did it start 
▶ Bug! https://bugs.launchpad.net/galera/+bug/1274192 
▶ Loss of PC 
▶ Crash 
▶ HA goal 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 18 / 58
One can bring the whole 
down
The Flow
Details 
Basic Flow 
Jenkins Build images Start Dnsmasq Bootstrap 
nsenter/netem Pre-sanity SST/Others Load/Sysbench 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 21 / 58
Details 
Basic Flow 
Jenkins Build images Start Dnsmasq Bootstrap 
nsenter/netem Pre-sanity SST/Others Load/Sysbench 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 21 / 58
Details 
Basic Flow 
Jenkins Build images Start Dnsmasq Bootstrap 
nsenter/netem Pre-sanity SST/Others Load/Sysbench 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 21 / 58
Details 
Basic Flow 
Jenkins Build images Start Dnsmasq Bootstrap 
nsenter/netem Pre-sanity SST/Others Load/Sysbench 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 21 / 58
Details 
Basic Flow 
Jenkins Build images Start Dnsmasq Bootstrap 
nsenter/netem Pre-sanity SST/Others Load/Sysbench 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 21 / 58
Details 
Basic Flow 
Jenkins Build images Start Dnsmasq Bootstrap 
nsenter/netem Pre-sanity SST/Others Load/Sysbench 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 21 / 58
Details 
Basic Flow 
Jenkins Build images Start Dnsmasq Bootstrap 
nsenter/netem Pre-sanity SST/Others Load/Sysbench 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 21 / 58
Details 
Basic Flow 
Jenkins Build images Start Dnsmasq Bootstrap 
nsenter/netem Pre-sanity SST/Others Load/Sysbench 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 21 / 58
Details 
Basic Flow 
RR sysbench 
Detach/Keep 
Post sanity Core trace 
Sanity check Reconciliation 
Cleanup Collect logs 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 22 / 58
Details 
Basic Flow 
RR sysbench 
Detach/Keep 
Post sanity Core trace 
Sanity check Reconciliation 
Cleanup Collect logs 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 22 / 58
Details 
Basic Flow 
RR sysbench 
Detach/Keep 
Post sanity Core trace 
Sanity check Reconciliation 
Cleanup Collect logs 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 22 / 58
Details 
Basic Flow 
RR sysbench 
Detach/Keep 
Post sanity Core trace 
Sanity check Reconciliation 
Cleanup Collect logs 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 22 / 58
Details 
Basic Flow 
RR sysbench 
Detach/Keep 
Post sanity Core trace 
Sanity check Reconciliation 
Cleanup Collect logs 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 22 / 58
Details 
Basic Flow 
RR sysbench 
Detach/Keep 
Post sanity Core trace 
Sanity check Reconciliation 
Cleanup Collect logs 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 22 / 58
Details 
Basic Flow 
RR sysbench 
Detach/Keep 
Post sanity Core trace 
Sanity check Reconciliation 
Cleanup Collect logs 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 22 / 58
Details 
Basic Flow 
RR sysbench 
Detach/Keep 
Post sanity Core trace 
Sanity check Reconciliation 
Cleanup Collect logs 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 22 / 58
Details 
Cluster Resilience 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 23 / 58
Details 
Parameters 
▶ Sysbench 
▶ Segment 
▶ Reconciliation period 
▶ Loss nodes 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 24 / 58
Details 
Parameters 
▶ Sysbench 
▶ Segment 
▶ Reconciliation period 
▶ Loss nodes 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 24 / 58
Details 
Parameters 
▶ Sysbench 
▶ Segment 
▶ Reconciliation period 
▶ Loss nodes 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 24 / 58
Details 
Parameters 
▶ Sysbench 
▶ Segment 
▶ Reconciliation period 
▶ Loss nodes 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 24 / 58
Details 
Parameters 
▶ NetEm 
▶ Detach loss 
▶ Fsync 
▶ Shutdown 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 25 / 58
Details 
Parameters 
▶ NetEm 
▶ Detach loss 
▶ Fsync 
▶ Shutdown 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 25 / 58
Details 
Parameters 
▶ NetEm 
▶ Detach loss 
▶ Fsync 
▶ Shutdown 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 25 / 58
Details 
Parameters 
▶ NetEm 
▶ Detach loss 
▶ Fsync 
▶ Shutdown 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 25 / 58
Containers!
Details 
Docker 
▶ Why not virtualize 
♦ Occam 
♦ Namespaces 
▶ Simplicity 
♦ Network 
♦ One application per node 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 27 / 58
Details 
Docker 
▶ Portability 
- See same qualitative behavior that I do. 
▶ Reproducibility 
- Makes it determinstic 
▶ Configurable and CI 
- Byproducts 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 28 / 58
Details 
Docker 
▶ QEMU and Docker 
▶ Scalability 
♦ Performance 
♦ Feature 
▶ Abstraction of channels 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 29 / 58
Details 
Container Networking 
▶ Linking didn’t help 
▶ Dnsmasq to rescue! 
♦ Hosts file and volumes 
♦ SIGHUP and refresh 
▶ More elegant methods 
- Swarm 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 30 / 58
Details 
Noise 
▶ Initial setup 
- Bridge 
- Egress only 
- IFB 
▶ Present state 
▶ NetEm 
- tc qdisc buckets 
- packet loss, delay, corruption, duplication, reordering 
- nsenter 
▶ Future 
- Docker exec 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 31 / 58
Testing methods
Details 
Method I 
▶ Qdisc is detached after load 
▶ Objective 
- Time to recover of full cluster 
▶ Done with a larger subset 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 33 / 58
Details 
Method II 
▶ Qdisc is kept till the end 
▶ Objective 
- Formation of primary component 
▶ Comparatively smaller set 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 34 / 58
Details 
Observations 
▶ Post sanity types 
- Why 
▶ Which method is more pertinent 
▶ State transfer issues 
- Beginning 
- During re-emergence 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 35 / 58
Details 
Observations 
▶ Direct load to affected nodes 
▶ Logs 
- journalctl 
- Streaming? 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 36 / 58
Details 
Other noises 
▶ Aim 
▶ Fsync 
- libeatmydata 
- Variance 
▶ Correlation with network 
▶ How with Docker 
- LD_PRELOAD 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 37 / 58
System Load
Details 
Load generation 
▶ Sysbench 
- Generation 
- Reconnect on partition 
▶ Sockets chosen 
- Load on affected nodes 
▶ Distribution of Load 
- RR with socat 
- Native sysbench support 
- HAProxy? 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 39 / 58
Details 
Load generation 
▶ Nature of data/load 
- DDL 
▶ RQG in future 
- Fuzz testing 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 40 / 58
The Fix
Strike Out!
Details 
Eviction 
▶ STONITH 
▶ Permanent eviction 
▶ ’N’ strikes & out! 
- Timers - evs parameters 
- wsrep_evs_delayed and wsrep_evs_evict_list 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 43 / 58
Details 
Eviction 
▶ Aim 
▶ Quorum required 
- Why? - Not shoot each other 
- Non-PC nodes also. 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 44 / 58
Details 
Eviction 
▶ Aim 
▶ Quorum required 
- Why? - Not shoot each other 
- Non-PC nodes also. 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 44 / 58
Details 
Eviction 
▶ EVS version and upgrade 
▶ TODO! 
- Ingress only 
- Follow here. 
▶ Credits to Teemu Ollakka, Yan Zhang and Alex Yurchenko from 
codership. 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 45 / 58
Details 
Coredumps with Docker 
▶ Breakdown of abstraction 
▶ Lack of isolation 
▶ What was done 
- Volumes 
- core_pattern & sysctl 
- suid and ulimit 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 46 / 58
Details 
WAN Segments 
▶ How they work 
▶ Random allocation 
▶ Joiner starvation 
▶ Simulates data center 
▶ Donor selection 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 47 / 58
Epilogue 
The code 
▶ Github: https://github.com/percona/pxc-docker 
▶ Jenkins: 
http://jenkins.percona.com/job/PXC-5.6-netem/ 
▶ Contributions/testing welcome! 
▶ Dependencies 
- Sysbench 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 48 / 58
Epilogue 
Code: todo 
▶ Docker automated builds 
▶ Orchestration 
▶ Docker 
♦ Injection 
♦ Signal proxying 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 49 / 58
Epilogue 
Code: todo 
▶ Use Hoare’s channels - Go! 
▶ Run it bare - CoreOS 
▶ Overlay with etcd/fleet/libswarm 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 50 / 58
Future work
Epilogue 
Future work 
▶ Fault injection 
♦ Memory 
- Poisoned memory 
♦ Disk 
- libeatmydata 
- Opposite 
- ENOSPC 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 52 / 58
Epilogue 
Fault injection 
▶ CPU 
- NUMA? 
- Hotplug 
▶ More network 
- corruption, duplication, reordering, rate-limit 
- Better distribution 
- Other shaping 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 53 / 58
More Chaos
Epilogue 
Future work 
▶ Disturb cluster more! 
- Membership changes 
* Manual eviction 
* Pull the cord! 
- Corrupt nodes 
▶ Consistency voting 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 55 / 58
Epilogue 
Further Reading 
▶ Byzantine fault tolerance 
- Reaching agreement in presence of faults 
▶ The Network is Reliable 
▶ NetEm 
▶ Latency: The New Web Performance Bottleneck 
▶ Galera Cluster Documentation 
▶ Auto eviction code 
▶ Don’t Settle for Eventual Consistency 
▶ Extended Virtual Synchrony 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 56 / 58
Epilogue 
About 
▶ /me: Raghavendra Prabhu, Product Lead, Percona XtraDB 
Cluster, Percona. 
▶ Slides will be at slideshare.net/slidunder and owncloud 
▶ About.me: raghavendra.prabhu 
▶ Keybase.io: rdprabhu 
▶ Presentation under CC BY-SA 4.0 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 57 / 58
Epilogue 
Image Credits 
▶ http://galeracluster.com/documentation-webpages/ 
▶ http://www.thelastdragontribute.com/40th-anniversary-death-of-bruce-lee/ 
▶ https://upload.wikimedia.org/wikipedia/commons/6/60/Corpus_callosum.png 
▶ http://www.thebarrow.org/Neurological_Services/Epilepsy/204354 
▶ https://flic.kr/p/9J6GNu 
▶ https://secure.flickr.com/photos/brewbooks/7780990192 
▶ https://www.flickr.com/photos/kwerfeldein/2649294869 
▶ https://secure.flickr.com/photos/mindmob/51951632 
▶ https://secure.flickr.com/photos/arenamontanus/2227769907 
▶ https://www.flickr.com/photos/markop/477199204 
▶ https://www.flickr.com/photos/gcwest/281385801 
▶ https://www.flickr.com/photos/29233640@N07/13466208953 
▶ https://www.flickr.com/photos/bob_in_thailand/9782777742/ 
Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 58 / 58

Contenu connexe

Similaire à Corpus collapsum: Partition tolerance of Galera put to test

ACIDic Clusters: Review of current relation databases with synchronous replic...
ACIDic Clusters: Review of current relation databases with synchronous replic...ACIDic Clusters: Review of current relation databases with synchronous replic...
ACIDic Clusters: Review of current relation databases with synchronous replic...Raghavendra Prabhu
 
Building scalable applications while scaling your infrastructure by rhommel l...
Building scalable applications while scaling your infrastructure by rhommel l...Building scalable applications while scaling your infrastructure by rhommel l...
Building scalable applications while scaling your infrastructure by rhommel l...NETWAYS
 
Frontend Performance: Beginner to Expert to Crazy Person
Frontend Performance: Beginner to Expert to Crazy PersonFrontend Performance: Beginner to Expert to Crazy Person
Frontend Performance: Beginner to Expert to Crazy PersonPhilip Tellis
 
Unit testing like a pirate #wceu 2013
Unit testing like a pirate #wceu 2013Unit testing like a pirate #wceu 2013
Unit testing like a pirate #wceu 2013Ptah Dunbar
 
Drupal and Varnish Reverse Proxy
Drupal and Varnish Reverse ProxyDrupal and Varnish Reverse Proxy
Drupal and Varnish Reverse ProxyVFXCode
 
Building scalable applications while scaling your infrastructure by rhommel l...
Building scalable applications while scaling your infrastructure by rhommel l...Building scalable applications while scaling your infrastructure by rhommel l...
Building scalable applications while scaling your infrastructure by rhommel l...Puppet
 
Automated Testing in WordPress, Really?!
Automated Testing in WordPress, Really?!Automated Testing in WordPress, Really?!
Automated Testing in WordPress, Really?!Ptah Dunbar
 
The MetaCPAN VM for Dummies Part One (Installation)
The MetaCPAN VM for Dummies Part One (Installation)The MetaCPAN VM for Dummies Part One (Installation)
The MetaCPAN VM for Dummies Part One (Installation)Olaf Alders
 
Percona XtraDB Cluster before every release: Glimpse into CI testing
Percona XtraDB Cluster before every release: Glimpse into CI testingPercona XtraDB Cluster before every release: Glimpse into CI testing
Percona XtraDB Cluster before every release: Glimpse into CI testingRaghavendra Prabhu
 
Puppet Camp Denver 2015: Running a Benevolent Puppet Regime
Puppet Camp Denver 2015: Running a Benevolent Puppet RegimePuppet Camp Denver 2015: Running a Benevolent Puppet Regime
Puppet Camp Denver 2015: Running a Benevolent Puppet RegimePuppet
 
Continuously Testing Infrastructure - Beyond Module Testing - PuppetConf 2014
Continuously Testing Infrastructure - Beyond Module Testing - PuppetConf 2014Continuously Testing Infrastructure - Beyond Module Testing - PuppetConf 2014
Continuously Testing Infrastructure - Beyond Module Testing - PuppetConf 2014Puppet
 

Similaire à Corpus collapsum: Partition tolerance of Galera put to test (13)

ACIDic Clusters: Review of current relation databases with synchronous replic...
ACIDic Clusters: Review of current relation databases with synchronous replic...ACIDic Clusters: Review of current relation databases with synchronous replic...
ACIDic Clusters: Review of current relation databases with synchronous replic...
 
Building scalable applications while scaling your infrastructure by rhommel l...
Building scalable applications while scaling your infrastructure by rhommel l...Building scalable applications while scaling your infrastructure by rhommel l...
Building scalable applications while scaling your infrastructure by rhommel l...
 
Frontend Performance: Beginner to Expert to Crazy Person
Frontend Performance: Beginner to Expert to Crazy PersonFrontend Performance: Beginner to Expert to Crazy Person
Frontend Performance: Beginner to Expert to Crazy Person
 
Unit testing like a pirate #wceu 2013
Unit testing like a pirate #wceu 2013Unit testing like a pirate #wceu 2013
Unit testing like a pirate #wceu 2013
 
Drupal and Varnish Reverse Proxy
Drupal and Varnish Reverse ProxyDrupal and Varnish Reverse Proxy
Drupal and Varnish Reverse Proxy
 
Building scalable applications while scaling your infrastructure by rhommel l...
Building scalable applications while scaling your infrastructure by rhommel l...Building scalable applications while scaling your infrastructure by rhommel l...
Building scalable applications while scaling your infrastructure by rhommel l...
 
Automated Testing in WordPress, Really?!
Automated Testing in WordPress, Really?!Automated Testing in WordPress, Really?!
Automated Testing in WordPress, Really?!
 
The MetaCPAN VM for Dummies Part One (Installation)
The MetaCPAN VM for Dummies Part One (Installation)The MetaCPAN VM for Dummies Part One (Installation)
The MetaCPAN VM for Dummies Part One (Installation)
 
Percona XtraDB Cluster before every release: Glimpse into CI testing
Percona XtraDB Cluster before every release: Glimpse into CI testingPercona XtraDB Cluster before every release: Glimpse into CI testing
Percona XtraDB Cluster before every release: Glimpse into CI testing
 
Puppet Camp Denver 2015: Running a Benevolent Puppet Regime
Puppet Camp Denver 2015: Running a Benevolent Puppet RegimePuppet Camp Denver 2015: Running a Benevolent Puppet Regime
Puppet Camp Denver 2015: Running a Benevolent Puppet Regime
 
Sap snc configuration
Sap snc configurationSap snc configuration
Sap snc configuration
 
Survey of Percona Toolkit
Survey of Percona ToolkitSurvey of Percona Toolkit
Survey of Percona Toolkit
 
Continuously Testing Infrastructure - Beyond Module Testing - PuppetConf 2014
Continuously Testing Infrastructure - Beyond Module Testing - PuppetConf 2014Continuously Testing Infrastructure - Beyond Module Testing - PuppetConf 2014
Continuously Testing Infrastructure - Beyond Module Testing - PuppetConf 2014
 

Plus de Raghavendra Prabhu

Orchestrating Cassandra with Kubernetes Operator and PaaSTA
Orchestrating Cassandra with Kubernetes Operator and PaaSTAOrchestrating Cassandra with Kubernetes Operator and PaaSTA
Orchestrating Cassandra with Kubernetes Operator and PaaSTARaghavendra Prabhu
 
Orchestrating Cassandra with Kubernetes
Orchestrating Cassandra with KubernetesOrchestrating Cassandra with Kubernetes
Orchestrating Cassandra with KubernetesRaghavendra Prabhu
 
Cassandra Operator with Yelp PaaSTA
Cassandra Operator with Yelp PaaSTACassandra Operator with Yelp PaaSTA
Cassandra Operator with Yelp PaaSTARaghavendra Prabhu
 
Safe and Fast Automation on AWS for Fun and Profit
Safe and Fast Automation on AWS for Fun and ProfitSafe and Fast Automation on AWS for Fun and Profit
Safe and Fast Automation on AWS for Fun and ProfitRaghavendra Prabhu
 
Orchestrating Cassandra with Kubernetes: Challenges and Opportunities
Orchestrating Cassandra with Kubernetes: Challenges and OpportunitiesOrchestrating Cassandra with Kubernetes: Challenges and Opportunities
Orchestrating Cassandra with Kubernetes: Challenges and OpportunitiesRaghavendra Prabhu
 
Pass Elk: CAP Theorem since 90s and Beyond
Pass Elk: CAP Theorem since 90s and BeyondPass Elk: CAP Theorem since 90s and Beyond
Pass Elk: CAP Theorem since 90s and BeyondRaghavendra Prabhu
 
Cassandra in Docker at Yelp: Opportunities and Challenges
Cassandra in Docker at Yelp: Opportunities and ChallengesCassandra in Docker at Yelp: Opportunities and Challenges
Cassandra in Docker at Yelp: Opportunities and ChallengesRaghavendra Prabhu
 
Taskerman: A Distributed Cluster Task Manager
Taskerman: A Distributed Cluster Task ManagerTaskerman: A Distributed Cluster Task Manager
Taskerman: A Distributed Cluster Task ManagerRaghavendra Prabhu
 
Taskerman - a distributed cluster task manager
Taskerman - a distributed cluster task managerTaskerman - a distributed cluster task manager
Taskerman - a distributed cluster task managerRaghavendra Prabhu
 
Linux NUMA & Databases: Perils and Opportunities
Linux NUMA & Databases: Perils and OpportunitiesLinux NUMA & Databases: Perils and Opportunities
Linux NUMA & Databases: Perils and OpportunitiesRaghavendra Prabhu
 
Clusternaut: Orchestrating  Percona XtraDB Cluster with Kubernetes
Clusternaut:  Orchestrating  Percona XtraDB Cluster with KubernetesClusternaut:  Orchestrating  Percona XtraDB Cluster with Kubernetes
Clusternaut: Orchestrating  Percona XtraDB Cluster with KubernetesRaghavendra Prabhu
 
Clusternaut: Orchestrating Percona XtraDB Cluster with Kubernetes.
Clusternaut: Orchestrating Percona XtraDB Cluster with Kubernetes.Clusternaut: Orchestrating Percona XtraDB Cluster with Kubernetes.
Clusternaut: Orchestrating Percona XtraDB Cluster with Kubernetes.Raghavendra Prabhu
 
Working from home - fun, facts and scares!
Working from home -  fun, facts and scares!Working from home -  fun, facts and scares!
Working from home - fun, facts and scares!Raghavendra Prabhu
 
Securing databases with systemd for containers and services
Securing databases with systemd for containers and services Securing databases with systemd for containers and services
Securing databases with systemd for containers and services Raghavendra Prabhu
 
Jutsu or Dô: Open documentation: continuous process than a body
Jutsu or Dô: Open documentation: continuous process than a body Jutsu or Dô: Open documentation: continuous process than a body
Jutsu or Dô: Open documentation: continuous process than a body Raghavendra Prabhu
 
Running virtualized Galera instances for fun and profit
Running virtualized Galera instances for fun and profitRunning virtualized Galera instances for fun and profit
Running virtualized Galera instances for fun and profitRaghavendra Prabhu
 
Feed me more: MySQL Memory analysed
Feed me more: MySQL Memory analysedFeed me more: MySQL Memory analysed
Feed me more: MySQL Memory analysedRaghavendra Prabhu
 

Plus de Raghavendra Prabhu (20)

Orchestrating Cassandra with Kubernetes Operator and PaaSTA
Orchestrating Cassandra with Kubernetes Operator and PaaSTAOrchestrating Cassandra with Kubernetes Operator and PaaSTA
Orchestrating Cassandra with Kubernetes Operator and PaaSTA
 
Orchestrating Cassandra with Kubernetes
Orchestrating Cassandra with KubernetesOrchestrating Cassandra with Kubernetes
Orchestrating Cassandra with Kubernetes
 
Cassandra Operator with Yelp PaaSTA
Cassandra Operator with Yelp PaaSTACassandra Operator with Yelp PaaSTA
Cassandra Operator with Yelp PaaSTA
 
Safe and Fast Automation on AWS for Fun and Profit
Safe and Fast Automation on AWS for Fun and ProfitSafe and Fast Automation on AWS for Fun and Profit
Safe and Fast Automation on AWS for Fun and Profit
 
Orchestrating Cassandra with Kubernetes: Challenges and Opportunities
Orchestrating Cassandra with Kubernetes: Challenges and OpportunitiesOrchestrating Cassandra with Kubernetes: Challenges and Opportunities
Orchestrating Cassandra with Kubernetes: Challenges and Opportunities
 
Pass Elk: CAP Theorem since 90s and Beyond
Pass Elk: CAP Theorem since 90s and BeyondPass Elk: CAP Theorem since 90s and Beyond
Pass Elk: CAP Theorem since 90s and Beyond
 
Cassandra in Docker at Yelp: Opportunities and Challenges
Cassandra in Docker at Yelp: Opportunities and ChallengesCassandra in Docker at Yelp: Opportunities and Challenges
Cassandra in Docker at Yelp: Opportunities and Challenges
 
Taskerman: A Distributed Cluster Task Manager
Taskerman: A Distributed Cluster Task ManagerTaskerman: A Distributed Cluster Task Manager
Taskerman: A Distributed Cluster Task Manager
 
Taskerman - a distributed cluster task manager
Taskerman - a distributed cluster task managerTaskerman - a distributed cluster task manager
Taskerman - a distributed cluster task manager
 
NUMA and Java Databases
NUMA and Java DatabasesNUMA and Java Databases
NUMA and Java Databases
 
Linux NUMA & Databases: Perils and Opportunities
Linux NUMA & Databases: Perils and OpportunitiesLinux NUMA & Databases: Perils and Opportunities
Linux NUMA & Databases: Perils and Opportunities
 
Clusternaut: Orchestrating  Percona XtraDB Cluster with Kubernetes
Clusternaut:  Orchestrating  Percona XtraDB Cluster with KubernetesClusternaut:  Orchestrating  Percona XtraDB Cluster with Kubernetes
Clusternaut: Orchestrating  Percona XtraDB Cluster with Kubernetes
 
Clusternaut: Orchestrating Percona XtraDB Cluster with Kubernetes.
Clusternaut: Orchestrating Percona XtraDB Cluster with Kubernetes.Clusternaut: Orchestrating Percona XtraDB Cluster with Kubernetes.
Clusternaut: Orchestrating Percona XtraDB Cluster with Kubernetes.
 
Working from home - fun, facts and scares!
Working from home -  fun, facts and scares!Working from home -  fun, facts and scares!
Working from home - fun, facts and scares!
 
Securing databases with systemd for containers and services
Securing databases with systemd for containers and services Securing databases with systemd for containers and services
Securing databases with systemd for containers and services
 
Jutsu or Dô: Open documentation: continuous process than a body
Jutsu or Dô: Open documentation: continuous process than a body Jutsu or Dô: Open documentation: continuous process than a body
Jutsu or Dô: Open documentation: continuous process than a body
 
Running virtualized Galera instances for fun and profit
Running virtualized Galera instances for fun and profitRunning virtualized Galera instances for fun and profit
Running virtualized Galera instances for fun and profit
 
Feed me more: MySQL Memory analysed
Feed me more: MySQL Memory analysedFeed me more: MySQL Memory analysed
Feed me more: MySQL Memory analysed
 
Xtrabackup and FTWRL
Xtrabackup and FTWRLXtrabackup and FTWRL
Xtrabackup and FTWRL
 
MySQL-and-virtualization
MySQL-and-virtualizationMySQL-and-virtualization
MySQL-and-virtualization
 

Dernier

UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
Vivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design SpainVivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design Spaintimesproduction05
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756dollysharma2066
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdfSuman Jyoti
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfRagavanV2
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . pptDineshKumar4165
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 

Dernier (20)

UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
Vivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design SpainVivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design Spain
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 

Corpus collapsum: Partition tolerance of Galera put to test

  • 1. Corpus collapsum Partition tolerance of Galera put to test RICON 2014 Raghavendra Prabhu  raghavendra.d.prabhu@gmail.com Percona  raghavendra.prabhu@percona.com  randomsurfer  wnohang.net  rdprabhu  ronin13
  • 5. Introduction Seed quotes.. “ ’Network is reliable’ - a fallacy of the distributed system. ” “ A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable. ” - Leslie Lamport “ Never attribute to malice that which is adequately explained by stupidity. ” - Hanlon’s Razor “ Never attribute to Byzantine failure which can be explained by an ill node(s) ” - Me Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 5 / 58
  • 6. Introduction Seed quotes.. “ ’Network is reliable’ - a fallacy of the distributed system. ” “ A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable. ” - Leslie Lamport “ Never attribute to malice that which is adequately explained by stupidity. ” - Hanlon’s Razor “ Never attribute to Byzantine failure which can be explained by an ill node(s) ” - Me Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 5 / 58
  • 7. Introduction Seed quotes.. “ ’Network is reliable’ - a fallacy of the distributed system. ” “ A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable. ” - Leslie Lamport “ Never attribute to malice that which is adequately explained by stupidity. ” - Hanlon’s Razor “ Never attribute to Byzantine failure which can be explained by an ill node(s) ” - Me Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 5 / 58
  • 8. Introduction Seed quotes.. “ ’Network is reliable’ - a fallacy of the distributed system. ” “ A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable. ” - Leslie Lamport “ Never attribute to malice that which is adequately explained by stupidity. ” - Hanlon’s Razor “ Never attribute to Byzantine failure which can be explained by an ill node(s) ” - Me Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 5 / 58
  • 10. Introduction Actors ▶ Database - WSREP/PXC ▶ Plugin - Galera ▶ Traffic control ♦ Traffic Control - tc ♦ NetEm Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 7 / 58
  • 11. Introduction Actors ▶ Database - WSREP/PXC ▶ Plugin - Galera ▶ Traffic control ♦ Traffic Control - tc ♦ NetEm Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 7 / 58
  • 12. Introduction Actors ▶ Database - WSREP/PXC ▶ Plugin - Galera ▶ Traffic control ♦ Traffic Control - tc ♦ NetEm Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 7 / 58
  • 13. Introduction Actors ▶ Containers - Docker ▶ Load ♦ Generators - Sysbench, RQG ▶ Network ♦ Dnsmasq ♦ nsenter Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 8 / 58
  • 14. Introduction Actors ▶ Containers - Docker ▶ Load ♦ Generators - Sysbench, RQG ▶ Network ♦ Dnsmasq ♦ nsenter Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 8 / 58
  • 15. Introduction Actors ▶ Jenkins ♦ Build flow and CI ▶ Storage ♦ Why Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 9 / 58
  • 16. Details But why ▶ The ‘P’ in CAP ▶ WAN scalability ▶ Real Reason - fun! ▶ Tolerance to latency variance Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 10 / 58
  • 17. Details But why ▶ The ‘P’ in CAP ▶ WAN scalability ▶ Real Reason - fun! ▶ Tolerance to latency variance Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 10 / 58
  • 18. Details But why ▶ The ‘P’ in CAP ▶ WAN scalability ▶ Real Reason - fun! ▶ Tolerance to latency variance Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 10 / 58
  • 19. Details But why ▶ The ‘P’ in CAP ▶ WAN scalability ▶ Real Reason - fun! ▶ Tolerance to latency variance Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 10 / 58
  • 20. Details But why ▶ Failures in warehouses. ▶ Not quorum, but consensus. ▶ Real world networks and synchronous replication - Delay - Partition Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 11 / 58
  • 22. Details Galera ▶ Data-centric approach ▶ EVS ▶ Causality and Synchronous ▶ Latency Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 13 / 58
  • 23.
  • 24.
  • 25.
  • 26. Where did it start
  • 27. Details Where did it start ▶ Bug! https://bugs.launchpad.net/galera/+bug/1274192 ▶ Loss of PC ▶ Crash ▶ HA goal Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 18 / 58
  • 28. One can bring the whole down
  • 30. Details Basic Flow Jenkins Build images Start Dnsmasq Bootstrap nsenter/netem Pre-sanity SST/Others Load/Sysbench Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 21 / 58
  • 31. Details Basic Flow Jenkins Build images Start Dnsmasq Bootstrap nsenter/netem Pre-sanity SST/Others Load/Sysbench Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 21 / 58
  • 32. Details Basic Flow Jenkins Build images Start Dnsmasq Bootstrap nsenter/netem Pre-sanity SST/Others Load/Sysbench Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 21 / 58
  • 33. Details Basic Flow Jenkins Build images Start Dnsmasq Bootstrap nsenter/netem Pre-sanity SST/Others Load/Sysbench Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 21 / 58
  • 34. Details Basic Flow Jenkins Build images Start Dnsmasq Bootstrap nsenter/netem Pre-sanity SST/Others Load/Sysbench Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 21 / 58
  • 35. Details Basic Flow Jenkins Build images Start Dnsmasq Bootstrap nsenter/netem Pre-sanity SST/Others Load/Sysbench Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 21 / 58
  • 36. Details Basic Flow Jenkins Build images Start Dnsmasq Bootstrap nsenter/netem Pre-sanity SST/Others Load/Sysbench Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 21 / 58
  • 37. Details Basic Flow Jenkins Build images Start Dnsmasq Bootstrap nsenter/netem Pre-sanity SST/Others Load/Sysbench Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 21 / 58
  • 38. Details Basic Flow RR sysbench Detach/Keep Post sanity Core trace Sanity check Reconciliation Cleanup Collect logs Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 22 / 58
  • 39. Details Basic Flow RR sysbench Detach/Keep Post sanity Core trace Sanity check Reconciliation Cleanup Collect logs Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 22 / 58
  • 40. Details Basic Flow RR sysbench Detach/Keep Post sanity Core trace Sanity check Reconciliation Cleanup Collect logs Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 22 / 58
  • 41. Details Basic Flow RR sysbench Detach/Keep Post sanity Core trace Sanity check Reconciliation Cleanup Collect logs Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 22 / 58
  • 42. Details Basic Flow RR sysbench Detach/Keep Post sanity Core trace Sanity check Reconciliation Cleanup Collect logs Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 22 / 58
  • 43. Details Basic Flow RR sysbench Detach/Keep Post sanity Core trace Sanity check Reconciliation Cleanup Collect logs Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 22 / 58
  • 44. Details Basic Flow RR sysbench Detach/Keep Post sanity Core trace Sanity check Reconciliation Cleanup Collect logs Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 22 / 58
  • 45. Details Basic Flow RR sysbench Detach/Keep Post sanity Core trace Sanity check Reconciliation Cleanup Collect logs Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 22 / 58
  • 46. Details Cluster Resilience Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 23 / 58
  • 47. Details Parameters ▶ Sysbench ▶ Segment ▶ Reconciliation period ▶ Loss nodes Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 24 / 58
  • 48. Details Parameters ▶ Sysbench ▶ Segment ▶ Reconciliation period ▶ Loss nodes Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 24 / 58
  • 49. Details Parameters ▶ Sysbench ▶ Segment ▶ Reconciliation period ▶ Loss nodes Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 24 / 58
  • 50. Details Parameters ▶ Sysbench ▶ Segment ▶ Reconciliation period ▶ Loss nodes Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 24 / 58
  • 51. Details Parameters ▶ NetEm ▶ Detach loss ▶ Fsync ▶ Shutdown Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 25 / 58
  • 52. Details Parameters ▶ NetEm ▶ Detach loss ▶ Fsync ▶ Shutdown Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 25 / 58
  • 53. Details Parameters ▶ NetEm ▶ Detach loss ▶ Fsync ▶ Shutdown Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 25 / 58
  • 54. Details Parameters ▶ NetEm ▶ Detach loss ▶ Fsync ▶ Shutdown Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 25 / 58
  • 56. Details Docker ▶ Why not virtualize ♦ Occam ♦ Namespaces ▶ Simplicity ♦ Network ♦ One application per node Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 27 / 58
  • 57. Details Docker ▶ Portability - See same qualitative behavior that I do. ▶ Reproducibility - Makes it determinstic ▶ Configurable and CI - Byproducts Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 28 / 58
  • 58. Details Docker ▶ QEMU and Docker ▶ Scalability ♦ Performance ♦ Feature ▶ Abstraction of channels Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 29 / 58
  • 59. Details Container Networking ▶ Linking didn’t help ▶ Dnsmasq to rescue! ♦ Hosts file and volumes ♦ SIGHUP and refresh ▶ More elegant methods - Swarm Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 30 / 58
  • 60. Details Noise ▶ Initial setup - Bridge - Egress only - IFB ▶ Present state ▶ NetEm - tc qdisc buckets - packet loss, delay, corruption, duplication, reordering - nsenter ▶ Future - Docker exec Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 31 / 58
  • 62. Details Method I ▶ Qdisc is detached after load ▶ Objective - Time to recover of full cluster ▶ Done with a larger subset Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 33 / 58
  • 63. Details Method II ▶ Qdisc is kept till the end ▶ Objective - Formation of primary component ▶ Comparatively smaller set Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 34 / 58
  • 64. Details Observations ▶ Post sanity types - Why ▶ Which method is more pertinent ▶ State transfer issues - Beginning - During re-emergence Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 35 / 58
  • 65. Details Observations ▶ Direct load to affected nodes ▶ Logs - journalctl - Streaming? Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 36 / 58
  • 66. Details Other noises ▶ Aim ▶ Fsync - libeatmydata - Variance ▶ Correlation with network ▶ How with Docker - LD_PRELOAD Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 37 / 58
  • 68. Details Load generation ▶ Sysbench - Generation - Reconnect on partition ▶ Sockets chosen - Load on affected nodes ▶ Distribution of Load - RR with socat - Native sysbench support - HAProxy? Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 39 / 58
  • 69. Details Load generation ▶ Nature of data/load - DDL ▶ RQG in future - Fuzz testing Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 40 / 58
  • 72. Details Eviction ▶ STONITH ▶ Permanent eviction ▶ ’N’ strikes & out! - Timers - evs parameters - wsrep_evs_delayed and wsrep_evs_evict_list Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 43 / 58
  • 73. Details Eviction ▶ Aim ▶ Quorum required - Why? - Not shoot each other - Non-PC nodes also. Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 44 / 58
  • 74. Details Eviction ▶ Aim ▶ Quorum required - Why? - Not shoot each other - Non-PC nodes also. Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 44 / 58
  • 75. Details Eviction ▶ EVS version and upgrade ▶ TODO! - Ingress only - Follow here. ▶ Credits to Teemu Ollakka, Yan Zhang and Alex Yurchenko from codership. Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 45 / 58
  • 76. Details Coredumps with Docker ▶ Breakdown of abstraction ▶ Lack of isolation ▶ What was done - Volumes - core_pattern & sysctl - suid and ulimit Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 46 / 58
  • 77. Details WAN Segments ▶ How they work ▶ Random allocation ▶ Joiner starvation ▶ Simulates data center ▶ Donor selection Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 47 / 58
  • 78. Epilogue The code ▶ Github: https://github.com/percona/pxc-docker ▶ Jenkins: http://jenkins.percona.com/job/PXC-5.6-netem/ ▶ Contributions/testing welcome! ▶ Dependencies - Sysbench Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 48 / 58
  • 79. Epilogue Code: todo ▶ Docker automated builds ▶ Orchestration ▶ Docker ♦ Injection ♦ Signal proxying Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 49 / 58
  • 80. Epilogue Code: todo ▶ Use Hoare’s channels - Go! ▶ Run it bare - CoreOS ▶ Overlay with etcd/fleet/libswarm Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 50 / 58
  • 82. Epilogue Future work ▶ Fault injection ♦ Memory - Poisoned memory ♦ Disk - libeatmydata - Opposite - ENOSPC Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 52 / 58
  • 83. Epilogue Fault injection ▶ CPU - NUMA? - Hotplug ▶ More network - corruption, duplication, reordering, rate-limit - Better distribution - Other shaping Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 53 / 58
  • 85. Epilogue Future work ▶ Disturb cluster more! - Membership changes * Manual eviction * Pull the cord! - Corrupt nodes ▶ Consistency voting Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 55 / 58
  • 86. Epilogue Further Reading ▶ Byzantine fault tolerance - Reaching agreement in presence of faults ▶ The Network is Reliable ▶ NetEm ▶ Latency: The New Web Performance Bottleneck ▶ Galera Cluster Documentation ▶ Auto eviction code ▶ Don’t Settle for Eventual Consistency ▶ Extended Virtual Synchrony Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 56 / 58
  • 87. Epilogue About ▶ /me: Raghavendra Prabhu, Product Lead, Percona XtraDB Cluster, Percona. ▶ Slides will be at slideshare.net/slidunder and owncloud ▶ About.me: raghavendra.prabhu ▶ Keybase.io: rdprabhu ▶ Presentation under CC BY-SA 4.0 Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 57 / 58
  • 88. Epilogue Image Credits ▶ http://galeracluster.com/documentation-webpages/ ▶ http://www.thelastdragontribute.com/40th-anniversary-death-of-bruce-lee/ ▶ https://upload.wikimedia.org/wikipedia/commons/6/60/Corpus_callosum.png ▶ http://www.thebarrow.org/Neurological_Services/Epilepsy/204354 ▶ https://flic.kr/p/9J6GNu ▶ https://secure.flickr.com/photos/brewbooks/7780990192 ▶ https://www.flickr.com/photos/kwerfeldein/2649294869 ▶ https://secure.flickr.com/photos/mindmob/51951632 ▶ https://secure.flickr.com/photos/arenamontanus/2227769907 ▶ https://www.flickr.com/photos/markop/477199204 ▶ https://www.flickr.com/photos/gcwest/281385801 ▶ https://www.flickr.com/photos/29233640@N07/13466208953 ▶ https://www.flickr.com/photos/bob_in_thailand/9782777742/ Raghavendra Prabhu (Percona) Corpus collapsum 28 October, 2014 58 / 58