This document discusses using HAProxy as a load balancer, including its benefits and uses. It describes how HAProxy can make non-highly-available services highly available while also providing load balancing. It also covers ways to eliminate the single points of failure that HAProxy itself introduces, such as using Corosync and Pacemaker to build redundant HAProxy clusters, and it addresses upstream failures caused by slow databases or application servers.
4. Simplified Web Architecture
- Clients: iPhones, Androids, Browsers
- Web Server: Nginx, Apache, lighttpd
- Dynamic: PHP, Ruby, Perl, Python, Node.js
- “Data”: Memcache, PostgreSQL, MySQL, Mongo, CouchDB, Redis, Oracle
5. ChOP Architecture
- Clients: iPhone, Android, Desktop
- Web Server: Nginx
- Dynamic: PHP5-FPM
- “Data”: Memcache, MySQL, Redis
- Chat
6. YouVersion Architecture
- Clients: iPhone, Android, Desktop
- Web Server: Nginx
- Dynamic: PHP5-FPM, Ruby (coming soon)
- “Data”: Memcache, PostgreSQL, Mongo, Oracle
7. HAProxy
- High Availability Proxy
- TCP load-balancing proxy with awesome built-in health checking
- Fast
- Scalable
- Makes non-HA services HA
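To make the health-checking idea concrete, here is a toy Python model of a balancer that round-robins across servers and takes one out of rotation only after a run of failed checks. It borrows the shape of HAProxy's `rise` and `fall` server options (documented defaults: rise 2, fall 3), but `Server`, `HealthCheckedPool`, and its methods are hypothetical names, and this is a sketch of the concept, not how HAProxy is implemented.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    up: bool = True
    streak: int = 0  # consecutive check results that disagree with `up`


class HealthCheckedPool:
    """Hypothetical round-robin balancer with rise/fall-style health checks."""

    def __init__(self, names, rise=2, fall=3):
        self.servers = [Server(n) for n in names]
        self.rise = rise  # successes needed to bring a down server back
        self.fall = fall  # failures needed to take an up server out
        self._next = 0

    def record_check(self, name, ok):
        server = next(s for s in self.servers if s.name == name)
        if ok == server.up:
            server.streak = 0  # result matches current state: nothing to change
            return
        server.streak += 1
        if server.streak >= (self.rise if ok else self.fall):
            server.up = ok  # flip state only after enough consecutive results
            server.streak = 0

    def pick(self):
        """Return the next healthy server's name, or None if all are down."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server.up:
                return server.name
        return None


if __name__ == "__main__":
    pool = HealthCheckedPool(["app1", "app2"])
    for _ in range(3):  # three failed checks reach the default fall threshold
        pool.record_check("app1", ok=False)
    print([pool.pick() for _ in range(3)])  # ['app2', 'app2', 'app2']
```

Requiring several consecutive results before flipping state keeps a single dropped check from bouncing a server in and out of rotation.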
8. How I Love Thee, Let Me Count The Ways…
- Rock solid
- Dead simple to run and configure
- Comprehensive health checking
- Lots of statistics
10. HAProxy Uses
- Not really a service unto itself
- Fits well into the gaps between layers
- Issue: becomes a single point of failure itself
[Diagram: Clients → HAProxy → Web Server → HAProxy* → Dynamic Engine → HAProxy* → “Data”]
* – potential future use
11. Eliminating SPOFs
- Two types of HAProxy SPOFs:
  - Service Outage (hardware failure or HAProxy service failure)
  - HAProxy Limit Outage / Upstream Outage (we hit some arbitrary limit we defined somewhere, or ran out of slots somewhere)
12. Service Outage
- The HAProxy service crashes or dies for some reason (has never happened, knock on wood)
- Hardware / network failure
13. Service Outage: Solution
- Corosync & Pacemaker
- Hard to configure at first, but you rarely need to touch it afterward
- Pretty much magic
- Two Corosync HAProxy clusters: DFW and SAN
- Setup is blogged about here: http://itand.me/41901523
14. HAProxy Limit Outage / Upstream Outage
- Usually caused by an outage further upstream, at the Dynamic or “Data” layer
- Completely Hypothetical Situation: Mongo slows down, PHP processes back up, the connection count goes through the roof and hits the limit, and the result is a total outage
15. What it looks like on the graph (Yesterday)
OR: WHY WE MUST MOVE MONGO STAT!
17. Upstream Outage
- Usually the result of running out of PHP processes
- Normally each PHP process can handle hundreds of req/s
- Something slows them down (Mongo, Postgres, et al.), so each process can only handle far fewer req/s (or, worse, takes seconds per request)
- Inevitably, these slow requests tie up every PHP process, nothing else can run, and HAProxy fails all health checks and shows you Binary Jesus
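The arithmetic behind this failure mode is Little's law: the number of busy workers equals the request rate times the time each request takes. A minimal Python sketch with hypothetical numbers (a pool of 50 PHP processes, 5 ms per request normally versus 2 s when the database is slow, and 1,000 req/s of incoming traffic) shows how quickly capacity collapses. The function names are made up for illustration.

```python
def max_throughput(workers: int, seconds_per_request: float) -> float:
    """Most requests/second a fixed pool can complete (Little's law rearranged)."""
    return workers / seconds_per_request


def workers_needed(arrival_rate: float, seconds_per_request: float) -> float:
    """Busy workers required to keep up with `arrival_rate` requests/second."""
    return arrival_rate * seconds_per_request


def backlog_after(seconds: float, arrival_rate: float, capacity: float) -> float:
    """Requests left waiting after `seconds` if arrivals outpace capacity."""
    return max(0.0, (arrival_rate - capacity) * seconds)


if __name__ == "__main__":
    WORKERS = 50  # hypothetical PHP-FPM pool size
    fast = max_throughput(WORKERS, 0.005)  # 5 ms/request: about 200 req/s per process
    slow = max_throughput(WORKERS, 2.0)    # 2 s/request: the database has slowed down
    print(f"healthy capacity: {fast:.0f} req/s, degraded capacity: {slow:.0f} req/s")
    # At a hypothetical 1,000 req/s, a 2 s request time needs 2,000 busy workers,
    # and the queue in front of the 50-process pool grows by 975 requests per second.
    print(workers_needed(1000, 2.0), backlog_after(1.0, 1000, slow))
```

If the health check itself is served by a PHP page, it needs a free process too, so once every process is tied up the checks start failing as well.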
18. “Solutions”
- Start hashing URLs to avoid upstream failures
- Want to send all requests for a given URL to the same app server, so if it’s slow, only that app server goes down
- Some benefit to caching as well
- Challenge: want to hash only part of a URL
- Challenge: need to separate app servers into “availability groups”
- Challenge: deployments, monitoring, alerting, all that crap…
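One way to hash only part of a URL is to keep just the first few path segments before hashing. HAProxy's own `balance uri` algorithm has options along these lines (its documentation describes `len` and `depth` parameters), but the Python below is only a hypothetical illustration of the idea: `url_key`, `pick_server`, the default depth of 2, and the example URLs and server names are all made up, and each server list stands for one availability group.

```python
import hashlib
from urllib.parse import urlsplit


def url_key(url: str, depth: int = 2) -> str:
    """Keep only the first `depth` path segments and drop the query string,
    so /bible/john/3?ver=kjv and /bible/john/16 share the key /bible/john."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return "/" + "/".join(segments[:depth])


def pick_server(url: str, servers: list[str], depth: int = 2) -> str:
    """Map a URL to one server in an availability group with a stable hash.
    Python's built-in hash() is salted per process, so use hashlib instead."""
    digest = hashlib.sha1(url_key(url, depth).encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


if __name__ == "__main__":
    group_a = ["app1", "app2", "app3"]  # hypothetical availability group
    # The first two URLs share a key, so they always land on the same server.
    for url in ("/bible/john/3?ver=kjv", "/bible/john/16", "/reading-plans/42"):
        print(url, "->", url_key(url), "->", pick_server(url, group_a))
```

Plain modulo hashing remaps most URLs whenever a group gains or loses a server, which also throws away any cache locality; HAProxy's `hash-type consistent` setting exists to limit that churn.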
19. HAProxy Limit Outage
- We set limits on all HAProxy frontends, backends, and servers so they don’t get overwhelmed
- Sometimes these limits are too low
- Solution: raise them
- Challenge: raise them too high without regard for the backend, and you could cause more harm than good (Stampeding Herd)
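The trade-off in raising limits can be sketched with a toy asyncio model of a per-server connection cap, similar in spirit to HAProxy's `maxconn`: requests beyond the cap wait in a queue in front of the server instead of reaching it. `LimitedBackend`, `burst`, the 10 ms of simulated work, and the burst of 500 requests are all hypothetical; the point is only that the cap decides how much of a burst lands on the backend at once.

```python
import asyncio


class LimitedBackend:
    """Hypothetical server guarded by a connection cap: at most `limit`
    requests run at once, and the rest wait for a free slot."""

    def __init__(self, limit: int):
        self._slots = asyncio.Semaphore(limit)
        self.in_flight = 0
        self.peak = 0  # highest concurrency the backend actually saw

    async def handle(self, work_seconds: float) -> None:
        async with self._slots:  # queue here when every slot is taken
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            try:
                await asyncio.sleep(work_seconds)  # stand-in for real backend work
            finally:
                self.in_flight -= 1


async def burst(limit: int, requests: int) -> int:
    """Fire `requests` at once and return the peak concurrency at the backend."""
    backend = LimitedBackend(limit)
    await asyncio.gather(*(backend.handle(0.01) for _ in range(requests)))
    return backend.peak


if __name__ == "__main__":
    for limit in (10, 100, 1000):
        print(f"limit={limit}: peak concurrency {asyncio.run(burst(limit, 500))}")
```

With 500 simultaneous requests, a cap of 10 or 100 holds the backend at that concurrency, while a cap of 1,000 lets the whole burst through at once. If the backend can only comfortably handle, say, 100 concurrent requests, the higher limit is exactly the stampeding herd the slide warns about.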