High availability is a core requirement in any modern enterprise deployment. It ensures that systems continue to function correctly when failures occur; ideally, they keep providing services even in the presence of several failures.
Another important aspect is that system deployment architectures should ensure there are no single points of failure. Redundancy is the most widely used technique for designing such fault-tolerant systems; it includes data redundancy, information redundancy, time redundancy and so on. However, redundancy always comes at a cost, so there is a cost vs. availability trade-off. Certain availability techniques may also introduce performance overheads.
In this webinar, Afkham Azeez, Director of Architecture, will take a look at how these techniques are employed in the WSO2 platform to achieve high availability and fault tolerance, while taking the cost and performance factors into consideration. We will also look at some key aspects which will enable enterprise architects to make deployment topology decisions based on availability requirements at an optimum cost.
Adjusting carbon topology to match high availability scenario requirements
1. Adjusting Carbon Topology to Match High Availability Scenario Requirements
Afkham Azeez
Director of Architecture
WSO2 Inc
2. About Me
• PMC member Apache Axis, Committer Synapse & Web Services
• Member, Apache Software Foundation
• Co-author, Axis2 Web Services
• Director of Architecture, WSO2 Inc
• Blog: http://blog.afkham.org
• Twitter: afkham_azeez
3. Agenda
• A brief look at the WSO2 platform
• Carbon clustering for availability
• Cost of availability & related topologies
4. WSO2 Offerings
• WSO2 Carbon
• Full platform of servers for deployment on-premise, in private or public cloud
• Products share a consistent architecture and core platform services (e.g. logging, management, security, identity, caching) through OSGi and the “Carbon Core”
• Includes ESB, AppServer, Data Services, Governance, Identity, Business Process, and more
• WSO2 Stratos
• Platform-as-a-Service (PaaS) Foundation
• Supports running servers as elastic, metered, billed, multi-tenant with self-service
• Including all Carbon Servers, PHP, Jetty, and a growing list through a standard Cartridge model
• WSO2 StratosLive
• http://stratoslive.wso2.com
• WSO2’s Public PaaS
• An instance of Stratos running in the cloud with all Carbon Servers available
5. Consistent Architecture
• Carbon: A consistent set of class-leading enterprise servers
• The same products run either on-premise or in the cloud, single-tenant or multi-tenant
• Utilize the same Carbon core runtime for a seamless experience
• Stratos: A cloud platform for enterprise, hybrid and public deployment
• Extends the deployment to support full self-service, elastic scaling, metering and billing
• Supports Carbon and native server runtimes
• Including Java and non-Java servers such as Jetty and PHP
• Re-uses the same core Carbon architecture to offer core PaaS services including:
• Identity, Logging, File, Relational Storage, Column Storage, Code Deployment, etc
• Both projects share a common set of OSGi modules and a core runtime architecture
8. Availability
The degree to which a system, subsystem, or equipment is in a specified operable and committable state at the start of a mission, when the mission is called for at an unknown, i.e., a random, time.
Simply put, availability is the proportion of time a system is in a functioning condition.
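The "proportion of time" definition above can be turned into a small calculation. The following is a minimal Java sketch, not part of the original deck; the class and method names are hypothetical, and the 99.95% figure is used only because it appears in the EC2 SLA quoted in the speaker notes.

```java
import java.time.Duration;

// Hypothetical helper illustrating availability as a proportion of time.
final class AvailabilitySketch {

    /** Availability = uptime / (uptime + downtime), as a fraction between 0 and 1. */
    static double availability(Duration uptime, Duration downtime) {
        double up = uptime.toMillis();
        double down = downtime.toMillis();
        return up / (up + down);
    }

    /** Downtime budget per 365-day year for a given availability target. */
    static Duration allowedDowntimePerYear(double availabilityTarget) {
        long yearSeconds = Duration.ofDays(365).getSeconds();
        return Duration.ofSeconds(Math.round(yearSeconds * (1.0 - availabilityTarget)));
    }

    public static void main(String[] args) {
        // 999 hours up and 1 hour down
        System.out.println(availability(Duration.ofHours(999), Duration.ofHours(1)));
        // Yearly downtime budget for a 99.95% target
        System.out.println(allowedDowntimePerYear(0.9995).getSeconds() + " seconds/year");
    }
}
```

A 99.95% target leaves a budget of roughly 4.4 hours of downtime per year, which gives a sense of how quickly each additional "nine" tightens the requirement.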
11. High Availability (HA)
A system that is designed for continuous operation in the event of a failure of one or more components. The system may display some degradation of service, but will continue to perform correctly.
The proportion of time during which the service is accessible with reasonable response times should be close to 100%.
All single points of failure should be eliminated.
12. HA, CO & CA
• Continuous Operation (CO)
• Ability to avoid planned outages.
• Hardware and software maintenance is carried out while applications remain available to users.
• Continuous Availability (CA)
• Combines the characteristics of HA and CO to keep the applications running without any noticeable downtime
• Hot update / graceful round-robin restart
13. High Availability Techniques
• Redundancy
• Time – retransmit
• Data – e.g. parity bits
• Processing – e.g. redundant nodes
• Diversity
• e.g. Hybrid deployments, do the same thing using different implementations
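Two of the redundancy techniques listed above are easy to show in code. The following Java sketch is illustrative only and is not taken from the WSO2 platform; the class and method names are hypothetical. It shows data redundancy as a single even-parity bit and time redundancy as a bounded retransmit loop.

```java
import java.util.function.Supplier;

// Hypothetical illustration of data redundancy (parity) and time redundancy (retransmit).
final class RedundancySketch {

    /** Data redundancy: even-parity bit computed over the 8 bits of a byte. */
    static int parityBit(byte data) {
        return Integer.bitCount(data & 0xFF) % 2;
    }

    /** Returns true if the byte matches its parity bit; a single flipped bit is detected. */
    static boolean parityOk(byte data, int parity) {
        return parityBit(data) == parity;
    }

    /** Time redundancy: repeat the send until it succeeds or maxAttempts is exhausted. */
    static <T> T retransmit(Supplier<T> send, int maxAttempts) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return send.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw new IllegalStateException("all " + maxAttempts + " attempts failed", last);
    }

    public static void main(String[] args) {
        byte payload = (byte) 0b1011;
        int parity = parityBit(payload);
        System.out.println("parity ok: " + parityOk(payload, parity));
        System.out.println(retransmit(() -> "ack", 3));
    }
}
```

Parity detects an error but cannot correct it, while retransmission trades latency for reliability; both illustrate the cost that redundancy carries.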
14. How to decide required availability?
• Average throughput (TPS)
• Max throughput (TPS)
• Monetary value of a transaction
• Average loss & max loss per second of downtime
• Decide on how much to invest in availability based on cost vs. benefit tradeoff
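The inputs listed above combine into a simple cost vs. benefit check. The following Java sketch is one possible way to frame it, under the assumption that loss per second is throughput multiplied by the monetary value of a transaction; the class, method and parameter names are hypothetical.

```java
// Hypothetical cost vs. benefit helper for sizing an availability investment.
final class DowntimeCostSketch {

    /** Loss per second of downtime, assuming every lost transaction loses its full value. */
    static double lossPerSecond(double transactionsPerSecond, double valuePerTransaction) {
        return transactionsPerSecond * valuePerTransaction;
    }

    /** True if the loss avoided per year exceeds the yearly cost of the extra availability. */
    static boolean worthInvesting(double lossPerSecond,
                                  double downtimeSecondsAvoidedPerYear,
                                  double annualCost) {
        return lossPerSecond * downtimeSecondsAvoidedPerYear > annualCost;
    }

    public static void main(String[] args) {
        double avgLoss = lossPerSecond(200, 0.5);  // average throughput
        double maxLoss = lossPerSecond(1000, 0.5); // max throughput
        System.out.println("average loss/s: " + avgLoss + ", max loss/s: " + maxLoss);
        System.out.println("invest? " + worthInvesting(avgLoss, 3600, 200_000));
    }
}
```

Running the check with both the average and the maximum loss figures gives a range rather than a single answer, which is usually a more honest basis for the investment decision.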
15. Patching Production Deployments
[Diagram: a Patch Distribution Coordinator (1) checks the patch list, (2) pulls the new patch, and (3) pushes the patch; the push step is shown on four separate arrows.]
22. Dynamic membership
• No predefined members
• Nodes can join & leave
[Diagram: a dynamic group with members M1–M4; a new node N joins.]
23. Hybrid membership
• Some predefined (well-known) members, and some dynamic members
• Nodes can join & leave
• Membership revolves around the static members
[Diagram: a hybrid group with dynamic members M5–M7 and static members M1–M4; a new node N joins, with the join labeled (IP, Port).]
25. Well-known Address (WKA) based membership management
[Diagram: a hybrid group with dynamic members M5–M7 and static well-known members WK1–WK4; a new node N joins, with the arrows labeled "Join (IP, Port)" and "Notify".]
26. Multicast vs. WKA
Multicast | WKA
All nodes should be in the same subnet | Nodes can be in different networks
All nodes should be in the same multicast domain | No multicasting requirement
Multicasting should not be blocked | -
No fixed IP addresses or hosts required | At least one well-known IP address or host required
Failure of any member does not affect membership discovery | New members can join with some WKA nodes down, but not if all WKA nodes are down
Does not work on IaaSs such as Amazon EC2 | IaaS-friendly
- | Requires keepalived, elastic IPs or some other mechanism for remapping IP addresses of WK members in cases of failure
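The WKA join rule in the comparison above (a new member can join while at least one well-known member is up, but not when all are down) can be modeled in a few lines. The following Java sketch is a toy in-memory model, not the Carbon or Axis2 clustering implementation; the class name, the "host:port" address strings and all method names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Toy model of well-known address (WKA) based membership; addresses are "host:port" strings.
final class WkaJoinSketch {
    private final List<String> wellKnown;                      // static, predefined members
    private final Set<String> alive = new LinkedHashSet<>();   // members currently up

    WkaJoinSketch(List<String> wellKnownAddresses) {
        this.wellKnown = new ArrayList<>(wellKnownAddresses);
        this.alive.addAll(wellKnownAddresses);
    }

    /** Simulates the failure or departure of a member. */
    void markDown(String address) {
        alive.remove(address);
    }

    /**
     * A newcomer tries each well-known address in turn. If one is reachable, the
     * newcomer is added to the group and receives the current member list.
     * If every well-known member is down, the join fails.
     */
    Optional<Set<String>> join(String newcomer) {
        for (String wka : wellKnown) {
            if (alive.contains(wka)) {
                alive.add(newcomer); // stands in for the WKA member notifying the group
                return Optional.of(new LinkedHashSet<>(alive));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        WkaJoinSketch group = new WkaJoinSketch(Arrays.asList("10.0.0.1:4000", "10.0.0.2:4000"));
        group.markDown("10.0.0.1:4000");
        System.out.println(group.join("10.0.1.5:4001").isPresent()); // one WKA member still up
        group.markDown("10.0.0.2:4000");
        System.out.println(group.join("10.0.1.6:4001").isPresent()); // all WKA members down
    }
}
```

This is why the table calls for keepalived, elastic IPs or a similar mechanism: the well-known addresses must remain reachable even when the machines behind them fail.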
27. Multicast vs. WKA – how to decide?
• Multicast
• Cluster is going to be set up in a network where multicasting is allowed
• WKA
• Cloud-based deployment
• Members are distributed across datacenters & regions
• Multicasting is blocked
29. State Replication
JSR-107/JCache
A standard Java Caching API for use by developers and a standard SPI ("Service Provider Interface") for use by implementers.
import javax.cache.*;
…
// cacheName and sampleValue are defined elsewhere in the application
CacheManager cacheMgr = Caching.getCacheManager();
Cache<String, Integer> cache = cacheMgr.getCache(cacheName);
cache.put("key", sampleValue);
Integer i = cache.get("key");
30. State Replication
CarbonContext based API
Cache cache = CarbonContext.getCurrentContext().getCache();
cache.put("key", sampleValue);
Integer i = (Integer) cache.get("key"); // the untyped Cache returns Object, so a cast is needed
Axis2 Contexts
Using Axis2 clustering StateManager – axis2.xml
<stateManager class="org.apache.axis2.clustering.state.DefaultStateManager" enable="true">
31. Elastic Load Balancer 2.0
• New sysadmin-friendly configuration language
• High performance PassThrough transport
• Tenant-aware load balancing
• Ability to dedicate clusters for tenants (private jet mode)
• Improved auto-scaler
• Separate IaaS-aware Cloud controller takes care of spawning new instances on different IaaSs
33. Private Jet mode
• Analogy
• Economy class
• no SLA management, only elasticity
• Business class
• elasticity plus SLA guarantees
• Private Jet
• Guaranteed isolated VMs or machines for a specific tenant
• Still elastically scaled
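The private jet analogy above amounts to a routing decision: requests for a tenant with a dedicated cluster go to that cluster, and everything else goes to the shared one. The following Java sketch illustrates that idea only; it is not the Elastic Load Balancer's implementation or configuration language, and the class name, method names, tenant domains and node names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical tenant-aware router: dedicated clusters for some tenants, a shared cluster for the rest.
final class TenantAwareRoutingSketch {
    private final List<String> sharedCluster;
    private final Map<String, List<String>> dedicatedClusters = new HashMap<>();
    private int counter; // simple round-robin position, not thread-safe

    TenantAwareRoutingSketch(List<String> sharedCluster) {
        this.sharedCluster = new ArrayList<>(sharedCluster);
    }

    /** Puts a tenant in "private jet" mode by reserving nodes for it. */
    void dedicate(String tenantDomain, List<String> nodes) {
        dedicatedClusters.put(tenantDomain, new ArrayList<>(nodes));
    }

    /** Picks the next node, round-robin, from the tenant's dedicated cluster or the shared one. */
    String route(String tenantDomain) {
        List<String> nodes = dedicatedClusters.getOrDefault(tenantDomain, sharedCluster);
        return nodes.get(Math.floorMod(counter++, nodes.size()));
    }

    public static void main(String[] args) {
        TenantAwareRoutingSketch lb =
                new TenantAwareRoutingSketch(Arrays.asList("shared-1", "shared-2"));
        lb.dedicate("acme.com", Arrays.asList("acme-1", "acme-2"));
        System.out.println(lb.route("acme.com"));
        System.out.println(lb.route("other.org"));
    }
}
```

Elastic scaling would then add or remove entries in the relevant node list, which is consistent with the point that a private jet cluster is still elastically scaled.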
40. Management & Worker Node Separation
• Proper separation of concerns - management nodes specialize in management of the setup while worker nodes specialize in serving requests to deployment artifacts
• Only management nodes are authorized to add new artifacts into the system or make configuration changes
• Worker nodes can only deploy artifacts & read configuration
• Lower memory footprint in the worker nodes because the management console related OSGi bundles are not loaded
• Improved security - management nodes can be behind the internal firewall & be exposed to clients running within the organization only, while worker nodes can be exposed to external clients.
• Isolation of failures
47. Multiple IaaS (hybrid) Deployment
[Diagram: a hybrid deployment spanning a private cloud (data center), Amazon EC2 and Rackspace Cloud, each with Zone 1 and Zone 2; availability and cost are shown on a scale from LOWEST to HIGHEST.]
48. Cost of Availability
• Single node
• Primary-secondary, single LB
• Primary-secondary, with multiple LBs
• Multi-node active cluster, single zone
• Multi-zone
• Multi-region
• Multi-IaaS
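One way to see why each step up this ladder buys availability is the standard formula for redundant replicas. The following Java sketch is not from the original deck, and it assumes that replicas fail independently, which real nodes, zones and providers only approximate; the class and method names are hypothetical.

```java
// Hypothetical calculator for the availability of N redundant replicas.
final class RedundantAvailabilitySketch {

    /**
     * The service is down only if every replica is down at the same time,
     * so combined availability = 1 - (1 - a)^n under the independence assumption.
     */
    static double combined(double singleAvailability, int replicas) {
        return 1.0 - Math.pow(1.0 - singleAvailability, replicas);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 3; n++) {
            System.out.println(n + " replica(s): " + combined(0.99, n));
        }
    }
}
```

Each added replica multiplies the remaining unavailability by the same factor, so the gain per replica shrinks while the cost per replica does not, which is the trade-off this slide describes.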
49. HA for the Load Balancer
• Load balancer cluster
• Keepalived
• Elastic IP address
• Round Robin DNS
50. Monitoring Servers
• Monit
• Automatically provides alerts & restarts processes when monitored items (e.g. latency) cross certain thresholds.
• New Relic
• Nagios
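The alert-and-restart behaviour described for Monit can be illustrated with a small threshold check. The following Java sketch is not Monit's configuration syntax or API, nor that of New Relic or Nagios; it is a hypothetical model in which a breach raises an alert and a run of consecutive breaches triggers a restart.

```java
// Hypothetical watchdog: alert on a threshold breach, restart after repeated breaches.
final class ThresholdMonitorSketch {
    enum Action { NONE, ALERT, RESTART }

    private final long thresholdMillis;
    private final int restartAfter; // consecutive breaches before a restart
    private int breaches;

    ThresholdMonitorSketch(long thresholdMillis, int restartAfter) {
        this.thresholdMillis = thresholdMillis;
        this.restartAfter = restartAfter;
    }

    /** Feeds one latency sample to the monitor and returns the action to take. */
    Action observe(long latencyMillis) {
        if (latencyMillis <= thresholdMillis) {
            breaches = 0; // healthy sample resets the streak
            return Action.NONE;
        }
        breaches++;
        if (breaches >= restartAfter) {
            breaches = 0; // start counting afresh after the restart
            return Action.RESTART;
        }
        return Action.ALERT;
    }

    public static void main(String[] args) {
        ThresholdMonitorSketch monitor = new ThresholdMonitorSketch(500, 3);
        long[] samples = {120, 900, 950, 1200, 100};
        for (long sample : samples) {
            System.out.println(sample + " ms -> " + monitor.observe(sample));
        }
    }
}
```

Requiring several consecutive breaches before restarting avoids reacting to a single slow request, at the price of a slightly longer detection time.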
51. References
Information on tenant-aware load balancing
http://sanjeewamalalgoda.blogspot.com/2012/03/tenant-aware-load-balancer-is-upcoming.html
http://sanjeewamalalgoda.blogspot.com/2012/05/tenant-aware-load-balancer.html
Scaling Stratos
http://srinathsview.blogspot.com/2012/06/scaling-wso2-stratos.html
http://blog.afkham.org/2011/09/how-to-setup-wso2-elastic-load-balancer.html
http://blog.afkham.org/2011/09/wso2-load-balancer-how-it-works.html
Fox Mobile ran for two years with zero downtime and multiple updates, including a hardware refresh.
Membership modes – multicast & WKA
A look at the cluster configuration
"Availability Zones are distinct locations that are engineered to be insulated from failures in other Availability Zones and provide inexpensive, low latency network connectivity to other Availability Zones in the same Region. By launching instances in separate Availability Zones, you can protect your applications from failure of a single location. Regions consist of one or more Availability Zones, are geographically dispersed, and will be in separate geographic areas or countries. The Amazon EC2 Service Level Agreement commitment is 99.95% availability for each Amazon EC2 Region. Amazon EC2 is currently available in three Regions: the US East (Northern Virginia) Region and the US West (Northern California) Region in the United States, and the EU (Ireland) Region in Europe."