© 2015 IBM Corporation
Planning for Catastrophe with
IBM WebSphere Application Server &
IBM Business Process Manager
Tom Alcott STSM
Chris Richardson STSM
This Session
• This session will focus on the architectural and operational issues that need to
be considered when planning and implementing a Disaster Recovery plan with
WebSphere Application Server and IBM BPM. Topics will include use of
multiple data centers, geographic separation constraints, supporting software
components, disaster recovery and other common deployment issues. Though
focused primarily on WebSphere Application Server and IBM BPM this session
also applies to IBM Middleware that is deployed on WebSphere Application
Server as well as Pure Application System.
• While not a prerequisite, attendees should be familiar with the material
covered in “Preparing to Fail, Practical WebSphere Application Server High
Availability”
Introduction
• Why Are We Here?
• To Avoid This
Agenda
• Concepts
• Disaster Recovery
• Multiple Cells and Data Centers
• WebSphere Application Server Recovery
• IBM BPM Recovery
• Final Thoughts
Definitions
• Redundancy
• The provision of additional or duplicate systems, equipment, etc., that
function in case an operating part or system fails, as in a spacecraft.
• Isolated
• Separated from other persons or things; alone; solitary
• Independent
• Not dependent; not depending or contingent upon something else for
existence, operation, etc.
• All of the Above are Fundamental for Effective High Availability and
Disaster Recovery
Definitions
High Availability (HA)
• Ensuring that the system can continue to process work within one
location after routine single component failures
• Usually we assume a single failure
• Usually the goal is very brief disruptions for only some users for
unplanned events
Continuous Operations
• Ensuring that the system is never unavailable during planned
activities
• E.g., if the application is upgraded to a new version, we do it in a
way that avoids all downtime
Continuous Availability (CA)
• High Availability coupled with Continuous Operations
• No tolerance for planned downtime
• As little unplanned downtime as possible
• Very expensive
• Note that while achieving CA almost always requires an
aggressive DR plan, they are not the same thing
Definitions
Background: High Availability in one picture
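[Figure: single-data-center HA topology. An IP sprayer feeds clustered IHS web servers, which route to a WAS-ND cell (DMgr; Node1 and Node2, each with a node agent) hosting application servers in Clusters “A”, “B” and “C”. Supporting components: user registry, database on SAN storage, and a shared filesystem holding the WAS transaction logs.]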
• Clustered IP Sprayer and Firewalls (not
depicted)
• Clustered HTTP Servers
• WAS-ND Cell with Clustered Application
Servers
• User Registry (LDAP) Hardware Clustered with
Shared Disk
• Database Hardware Clustered With Shared
Disk
• JMS Provider (Not Depicted)
• WAS Messaging Engine with Shared Disks/DB
• External JMS with Hardware Cluster and
Shared Disk
• Transaction Logs on Shared File System
• Clusters of “2” Provide High Availability, But
• Don’t Forget the “Rule of 3”
• With Clusters of 2, an Outage (Planned or Unplanned)
• Reduces Capacity by 50%
• Leaves a Cluster That Is No Longer Fault Tolerant
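The arithmetic behind the “Rule of 3” can be sketched as follows. This is a minimal illustration, assuming equally sized cluster members; the function name `after_outage` is hypothetical and not part of any WebSphere API.

```python
from typing import Tuple


def after_outage(members: int, failed: int = 1) -> Tuple[float, bool]:
    """Return (remaining capacity fraction, still fault tolerant?).

    Assumes every member contributes the same capacity. A cluster is
    treated as fault tolerant only while at least two members remain.
    """
    if members < 1 or not 0 <= failed <= members:
        raise ValueError("failed must be between 0 and members")
    remaining = members - failed
    return remaining / members, remaining >= 2


if __name__ == "__main__":
    for size in (2, 3):
        capacity, tolerant = after_outage(size)
        print(f"{size} members, 1 down: {capacity:.0%} capacity, "
              f"fault tolerant: {tolerant}")
```

With two members, one outage leaves half the capacity and no redundancy; with three, roughly two thirds of the capacity remains and a further failure can still be absorbed.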
Disaster Recovery (DR)
• Ensuring that the system can be reconstituted and/or activated at another location and
can process work after an unexpected catastrophic failure at one location
• Often multiple single failures (which are normally handled by high availability techniques)
are considered catastrophic
• There may or may not be significant downtime as part of a disaster recovery
• This environment may be substantially smaller than the entire production environment, as
only a subset of production applications demand DR
• Normally based on justifiable business need.
• Recovery Time Objective (RTO)
• Service Recovery with little to no interruption
• Recovery Point Objective (RPO)
• Data Recovery and acceptable data loss
Definitions
• Service Levels (SLAs) cover many things; our focus is the availability aspects
• You need a clear set of requirements that define precisely the availability
requirements of the system, taking into account
• Components of the system
– A system has many pieces and business aspects; how do their requirements differ?
• Responsiveness and throughput requirements
– 100% of requests aren't going to work perfectly 100% of the time
• Degraded services requirements
– Does everything have to meet the responsiveness requirements ALL the time?
• Dependent system requirements
– What are the implications if a system on which you depend is down?
• Data Loss
– Application Data
– Application State (Is this Critical in a Disaster?)
• Maintenance
– Change occurs; how does that affect availability?
• Disaster Recovery
– The unimaginable happens; then what?
Definitions
HA Service Level Example
Measure | SLA External Commitment | SLA Internal Target
Service Timeframe | 7 x 24 | 7 x 24
Application Processing Availability | 99.5% per month | 99.7% per month
Recovery Time Objective | 4 Hours | 1 Hour
Maintenance Window | Tue-Thurs 3:00 - 6:00 am | Tue-Thurs 3:00 - 6:00 am
• 99.5% = 3.60 Hours Downtime/Month
• 99.7% = 2.16 Hours Downtime/Month
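The downtime figures quoted for 99.5% and 99.7% availability can be reproduced with a short calculation. This sketch assumes a 30-day month (720 hours), which is what those figures imply; the function name is illustrative only.

```python
def allowed_downtime_hours(availability_pct: float,
                           period_hours: float = 30 * 24) -> float:
    """Hours of downtime permitted in a period at a given availability.

    The default period is a 30-day month (an assumption; adjust for
    calendar months or yearly targets).
    """
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be a percentage between 0 and 100")
    return (100.0 - availability_pct) / 100.0 * period_hours


if __name__ == "__main__":
    for pct in (99.5, 99.7):
        print(f"{pct}% -> {allowed_downtime_hours(pct):.2f} hours/month")
```

Note how small the gap is: moving from the external commitment to the internal target removes less than an hour and a half of allowable downtime per month.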
DR Service Level Example
Measure | SLA External Commitment | SLA Internal Target
Recovery Time Objective | 16 Hours | 4 Hours
Recovery Point Objective | ~ 0 (No Data Loss) | ~ 0 (No Data Loss)
Note: This is Recovery of an Entire Data Center with 100’s of Servers,
Application, Database, Messaging, etc
Stage 0 DR – a sound HA strategy
• HA is Cheaper and Less Complex Than DR
• A Robust HA Solution prevents small failures from
becoming disasters
• Don’t let a (relatively) minor failure become a
catastrophe
• Eliminate all single points of failure in your primary
datacenter
• Spread Workloads Across Multiple Servers (and
Hypervisors!)
• Add 2nd Production WAS-ND Cell
• Consider DB Replication, DB2 HADR, Oracle RAC in
conjunction with hardware clustering
• LDAP/Registry Replication
• Otherwise, an HA event could force you to enact
your DR procedure
• For Example, When the Database's Only HA Replica Is in a Different Data
Center
[Figure: the single-data-center HA topology: IP sprayer, IHS web servers, WAS-ND cell with DMgr, Node1, Node2 and Clusters “A”, “B” and “C”, user registry, database on SAN storage, and a shared filesystem for the WAS transaction logs.]
Multiple Data Center Options (1/4)
• Classic DR
• Active/Passive
• Two Data Centers, one Serving Requests the other Idling
• Independent Cells
• Easier Than Active/Active
• User and Application State Synchronization are Less Critical
• Asynchronous Replication Is Likely Sufficient
• Lower Cost for Network and Hardware Capacity
• From a Capacity perspective One Data Center is Being Underutilized.
• Typically Does Not Incur S/W License Charges When Idle
• If You Don’t Pay for S/W Licenses Is Cost and Underutilization Still a Concern?
• WebSphere License Provides for
o Hot – Processing Requests, License Required
o Warm – Started But Not Processing Requests, License Not Required
o Cold – Installed, But Not Started, License Not Required
• DB2 and MQ Require < 100% of Hot Licenses for Replication
Multiple Data Center Options (2/4)
• “Active/Active” with Single Set of Active Databases
• Two Data Centers
• Independent Application Cells
• Serving Requests for Same Applications
• Database(s) Only Active in One Data Center
– Additional Latency for Application Data Requests from Remote Data Center
– Request Processing Interruption When Data Replica is Promoted to Primary
Multiple Data Center Options (3/4)
• Classic “Active/Active”
• Two Data Centers
• Independent Cells and Synchronized Resource Managers (DBs)
• Serving Requests for Same Applications
• Requires Shared Application Data
o Application Data Consistency is prerequisite to any other planning
o Simultaneous Reads/Writes = Geographic Synchronous Disk Replication
• Additional Hardware and Disk Capacity Required
• e.g. IBM High Availability Geographic Cluster (HAGEO), Sun Cluster Geographic Edition
• Expectation of Continuous Availability and Transparent Failover
o Requires Sharing Application State
• Expectation Seldom Realized
• Outage of One Data Center, Stops Disk Writes in Both, No Longer “Transparent”
• Synchronous Disk Replication Limits Geographic Separation
• Hardest and Costliest to Achieve
Note: Disk Replication is only employed for Application Data and Application State. WAS-ND cell configuration, software
updates, and application maintenance should be maintained independently in order to ensure isolation (and availability)
Multiple Data Center Options (4/4)
• Hybrid “Active/Active” (Partitioned by Applications)
• Two Data Centers
• Independent Cells with replicated Resource Managers
• Both DC’s Serving Requests, Both DC’s Configured for All Applications
• Running Different Applications (With Different Application Data)
– New Application Tests
– One DC Performing Updates, One DC Performing Inquiry Only (e.g. data
warehouse)
• No Shared Application State, No Shared Application Data
– Asynchronous Replication Sufficient
• Global Network Switch Used to Partition/Distribute Traffic
• In the Event of a Disaster
• Users failover from one DC to the other
• Likely Some Interruption
– As Data Replica is Promoted to Primary
– During Failover Workload Startup
• Provides Most of the Benefits of “Classic Active/Active” without the Cost and
Complexity
The CAP Theorem
• In a distributed environment, especially one spanning data centers
across LANs and WANs, there are three core requirements for a
service:
• Consistency
– Either the service works or fails
– Traditional ACID of databases provides consistency and isolation
• Availability
– Extremely important in web business model
– In a large distributed system, one may have to compromise with
consistency for the sake of availability
• Partition Tolerance
– Network partition will happen when not all machines are connected
– “No set of failures less than total network failure is allowed to
cause the system to respond incorrectly” – Gilbert and Lynch
– Quorum is used to guard against split brain syndrome
• Brewer’s CAP conjecture states that
• One can achieve at most two, not all three, of the above mentioned
requirements
http://en.wikipedia.org/wiki/CAP_theorem
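To make the quorum point concrete, here is a minimal sketch of a strict-majority quorum check. It is a generic illustration, not a description of how any WebSphere component decides quorum; the function names are hypothetical.

```python
from typing import List


def has_quorum(reachable_members: int, total_members: int) -> bool:
    """True if this side of a partition can see a strict majority."""
    return reachable_members > total_members / 2


def sides_allowed_to_act(partition_sizes: List[int]) -> List[bool]:
    """For each side of a network partition, report whether it holds quorum.

    At most one side can ever hold a strict majority, which is what
    prevents two halves from acting independently (split brain).
    """
    total = sum(partition_sizes)
    return [has_quorum(size, total) for size in partition_sizes]


if __name__ == "__main__":
    # Two data centers with two members each: an even split leaves
    # neither side with quorum, so neither may proceed on its own.
    print(sides_allowed_to_act([2, 2]))
    # An odd total guarantees that one side can win.
    print(sides_allowed_to_act([3, 2]))
```

The even-split case shows why two symmetric data centers cannot both stay available and consistent during a partition: either both stop, or a tie-breaker outside the two sites is needed.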
Multiple Active Data Centers and the CAP Theorem
• Active/Active requires you to sacrifice either consistency, availability or
partition tolerance.
• All three aren’t possible
• If you choose full availability, then you are going to lose guaranteed
consistency.
• So you need to design with this in mind, and build in mechanisms
(typically involving queuing technologies) that enable your system to
“tend towards” consistency.
• Your data is going to be in two places, either partitioned or replicated.
• If the former, what happens when one site is down?
• If the latter, what happens when users hitting each site see slightly
different versions of the current state?
• These are very complex problems.
• Which is why I try to steer customers away from active/active and into
an active/passive model with DR from active to passive.
• But they always feel like they are wasting hardware!
Data Center Utilization Urban Legends
• Legend
• Active/Active Improves Utilization
• Reality
• An Active/Active Topology at 40-50% Utilization in Each DC Is
Equivalent to An Active/Passive Datacenter Deployment with One
Active at 80% to 90% Utilization and the Other Passive
• Running Active/Active at Greater Than 50% Of Total (both
Datacenters) Capacity Can Often Result in a Complete Loss of Service
When a Data Center Outage Occurs
o Insufficient Capacity in Remaining Data Center to Handle > 100% Capacity
Results in
• Poor Response Time (at best)
• Network and Server Overload, Resulting in a Complete Crash
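The utilization argument can be checked with a few lines of arithmetic. This sketch assumes two data centers of equal capacity and that all load from the failed site lands on the survivor; the function names are illustrative.

```python
def surviving_utilization(util_dc_a: float, util_dc_b: float) -> float:
    """Utilization of the surviving DC after it absorbs the other's load.

    Utilizations are fractions of one data center's capacity (0.45 = 45%).
    Assumes both data centers have the same capacity.
    """
    return util_dc_a + util_dc_b


def can_absorb_failover(util_dc_a: float, util_dc_b: float) -> bool:
    """True if one DC alone can carry the combined workload."""
    return surviving_utilization(util_dc_a, util_dc_b) <= 1.0


if __name__ == "__main__":
    for a, b in ((0.45, 0.45), (0.60, 0.60)):
        load = surviving_utilization(a, b)
        print(f"{a:.0%} + {b:.0%} -> survivor at {load:.0%}, "
              f"ok: {can_absorb_failover(a, b)}")
```

Running each site at 45% leaves the survivor at 90%, the same as an active/passive pair with one site at 90%. Running each at 60% asks the survivor for 120%, which is the overload scenario described above.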
Active/Active - What’s Wrong With This Picture ?
A former employer of mine had two data centers, running active/active
at two facilities approximately 2.6 miles (or 4.2 km) apart.
• Close Proximity Addressed Data Consistency Concerns... But
What Happens When?
• There’s an earthquake
• There’s a Civil Insurrection
• A Hazardous Chemical Spill Occurs
• And The Wind Is Blowing the Chemical Cloud from West to East (or vice versa)
• Your DC May Not Be Located in a Locale Prone to Earthquakes
• But what about the other catastrophes ???
• They can, *and* will happen !!
• There’s No Substitute For Isolation Between Data Centers
• Data Centers Should Be Sufficiently Distant So That a Single Event Doesn’t
Impact Both !!
• This Likely Mandates Asynchronous Replication
• Active/Active No Longer Practical
Network Latency and Application Data Consistency – A 3rd
Party Perspective
• Since the latency or round trip time for a network is usually correlated to the
length of the network, or the physical distance between the two end points (in
this case the primary and standby), Maximum Protection and Maximum
Availability modes are not recommended for Data Guard deployments over a
Wide Area Network (WAN). Note that this recommendation is driven by the
laws of physics (speed of light limitation) - the greater the distance of a
network, the longer it will take for data packets to traverse the network, and
hence the longer it will take for primary database transactions to commit.
• http://www.oracle.com/technology/deploy/availability/htdocs/dataguardnetwork.htm
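The speed-of-light limit mentioned in the quote can be turned into a rough lower bound on round-trip time. This is a back-of-the-envelope sketch: the propagation speed of about 200,000 km/s in optical fibre (roughly two thirds of the speed of light in vacuum) is an approximation, and real networks add routing, switching and protocol overhead on top.

```python
# Approximate signal speed in optical fibre (assumption, ~2/3 of c).
SPEED_IN_FIBRE_KM_PER_S = 200_000.0


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on network round-trip time over a fibre path, in ms."""
    if distance_km < 0:
        raise ValueError("distance must be non-negative")
    return 2 * distance_km / SPEED_IN_FIBRE_KM_PER_S * 1000.0


def min_sync_commit_penalty_ms(distance_km: float,
                               round_trips_per_commit: int = 1) -> float:
    """Minimum latency added to each synchronously replicated commit.

    round_trips_per_commit is a hypothetical parameter; the real number
    depends on the replication protocol in use.
    """
    return round_trips_per_commit * min_round_trip_ms(distance_km)


if __name__ == "__main__":
    for km in (4.2, 100, 1000):
        print(f"{km} km -> at least {min_round_trip_ms(km):.3f} ms per round trip")
```

At 1000 km every synchronous commit waits at least about 10 ms for the network alone, before any disk or protocol overhead, which is why long distances push designs towards asynchronous replication.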
Multiple Cells and Data Centers
• Your Network Team Assures You That They Can (or Have) Constructed a Network Link
Between Data Centers
• For Argument's Sake, We’ll Agree: It Is Possible to Construct a Network So That Latency Is
NOT an Issue Under Normal Conditions
• Even So, WANs Are Less Reliable Than LANs
o And Much Harder To Fix!
• But You’re Missing The Point!
• Network Interdependency Between Data Centers Means That the Data Centers are Not
Independent
• Question
• Do You Want to Have to Explain to Your CIO Why A Problem In One Data
Center Impacted The Other and Resulted in an Outage Because You Didn’t
Have Cells Aligned to Data Center Boundaries?
How Do I Recover WebSphere Application Server ?
• File System or OS Backup and Recovery
• Disk or Tape
• WAS backupConfig/restoreConfig
– WAS_PROFILE/properties, ../etc, WAS_ROOT/java/jre/lib/*properties,
WAS_ROOT/java/jre/lib/security,
• Build from Scratch
o Only a Realistic Option with Complete Set of Scripts and Rigorous Change Control
• Best Options
• File System/OS Backup & Recovery
• backupConfig/restoreConfig for Deployment Manager
– WAS V8.0 and above, addNode -asExistingNode can Reconfigure Each Node After
restoreConfig of Deployment Manager Configuration
• Both From Last Known Working Production Configuration
• Otherwise No Assurance Recovery Will Succeed
• Same Concern with “Build From Scratch”
‒ If Using Virtualization Consider VM Cloning for Install and Configuration
‒ Consider SmartCloud Orchestrator for Automated Install and Configuration
o Provision Both Primary Site and DR Site in a Consistent Manner with SCO
o Note: Don’t Deploy to Backup to DR Site over a WAN !
• May Need to Change Cell and Host Names
• Will the Original Data Center Be Restored, Or Is it Gone (for Good)?
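The backupConfig/restoreConfig and addNode -asExistingNode sequence is easier to repeat reliably when the command lines are generated by a script rather than typed. The sketch below only builds the argument lists; it does not run anything. The `.sh` suffix and the `bin` directory layout assume a Unix-style installation, and every path, host name and port shown is a hypothetical placeholder.

```python
from pathlib import PurePosixPath
from typing import List


def _script(profile_root: str, name: str) -> str:
    """Path of a command script under a profile's bin directory (assumed layout)."""
    return str(PurePosixPath(profile_root) / "bin" / name)


def backup_config(profile_root: str, backup_zip: str) -> List[str]:
    """Command line to back up a profile's configuration to a zip file."""
    return [_script(profile_root, "backupConfig.sh"), backup_zip]


def restore_config(profile_root: str, backup_zip: str) -> List[str]:
    """Command line to restore a profile's configuration from a zip file."""
    return [_script(profile_root, "restoreConfig.sh"), backup_zip]


def add_existing_node(profile_root: str, dmgr_host: str, dmgr_port: int) -> List[str]:
    """Command line to re-attach a node to a restored deployment manager."""
    return [_script(profile_root, "addNode.sh"), dmgr_host, str(dmgr_port),
            "-asExistingNode"]


if __name__ == "__main__":
    # Hypothetical locations and endpoint, for illustration only.
    print(backup_config("/opt/was/profiles/Dmgr01", "/backup/dmgr.zip"))
    print(restore_config("/opt/was/profiles/Dmgr01", "/backup/dmgr.zip"))
    print(add_existing_node("/opt/was/profiles/AppSrv01", "dmgr.example.com", 8879))
```

Each list can be handed to `subprocess.run` once the paths have been verified for the installation at hand; keeping the generation separate makes the procedure easy to review and to rehearse.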
WAS Full Profile DR Recovery
• Transaction Recovery on Separate (Physical) Server
• Access the Transaction Logs
– Move/Mount the Transaction Logs to Physical Server Hosting Application Server with
Access to Same Resources (e.g. JDBC, JMS)
– V8.x Optional Use of DB for Transaction logs
• If Recovery Occurs in Different Cell use wsadmin to Configure the Same JAAS
Alias for Accessing XA Resources
– With the admin console, the node name gets prefixed to the alias.
• No Longer Required to Have Same Hostname and IP Address
– Different IPs with Multiple Host Aliases Typical
http://www-01.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.doc/ae/tjta_mvelog.html
• WAS Messaging Engine
• recoverMEConfig AdminTask Command Retrieves MEUUID From Persistent
Message Store and Updates Messaging Engine Configuration
• Allows Recovery of Stranded Messages After Catastrophic ME Failure in WAS
V8.5.0 and above
• http://www-01.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.doc/ae/rjk_recoverme_config.html
• Some Small Capacity in DR Site Needs to be set Aside for Recovery
• In Addition to Production Workload
• Or Production Workload Not Processed Until Recovery is Complete
WAS Liberty Profile DR Recovery
• Transaction Recovery on Separate (Physical) Server
• Access the Transaction Logs
– Move/Mount the Transaction Logs to Physical Server Hosting Application Server with
Access to Same Resources (e.g. JDBC, JMS)
– V8.x Optional Use of DB for Transaction logs (Not supported for Production)
• Restore server.xml Backup (e.g. zip)
• No Longer Required to Have Same Hostname and IP Address
– Different IPs with Multiple Host Aliases Typical
• WAS Messaging Engine
• Restore server.xml Backup (e.g. zip)
• Point to Copy of Messages on File System (Could also employ zip/unzip)
• Some Small Capacity in DR Site Needs to be set Aside for Recovery
• In Addition to Production Workload
• Or Production Workload Not Processed Until Recovery is Complete
Classic DR for Stateful Applications: full cell replication
[Figure: Primary Datacenter and Secondary Datacenter, each holding a complete copy of the cell: IP sprayer, IHS web servers, DMgr, Node1 and Node2 with node agents, AppTarget (App.mem1, App.mem2), Support (Sup.mem1, Sup.mem2) and Messaging (Msg.mem1, Msg.mem2, marked A/P) clusters, user registry, NFS filesystem with the WAS transaction logs, and database on SAN storage. SAN replication (consistency group) carries application data between the sites; file copy carries install and config data.]
DR via Stray Nodes & Database Managed replication
[Figure: Primary Datacenter with IP sprayer, IHS web servers, DMgr, Node1 and Node2 (node agents; App.mem1, App.mem2, Sup.mem1, Sup.mem2; Msg.mem1 and Msg.mem2 marked A/P), user registry, WAS transaction logs and database. Secondary Datacenter with IP sprayer, IHS web servers, DMgr, Node3 and Node4 (node agents; App.mem3, App.mem4, Sup.mem3, Sup.mem4; Msg.mem3 and Msg.mem4 marked A/P), user registry, WAS transaction logs and database. DB-managed replication carries application data from the primary database to the secondary.]
IBM BPM: Licensing Guidance for HA/DR Configurations
For each configuration: what is active on the backup nodes (DB2, WAS ND, BPM) and what licenses are needed for the backup nodes.
• “Classic” Disaster Recovery (DR): SAN-based replication
– Active: DB2 off, WAS ND off, BPM off
– Files in the backup data center are being synchronized automatically by a SAN. But there is no DB2, WAS ND, or BPM program active.
– No extra DB2, WAS ND, or BPM licenses needed for backup nodes.
• DR configuration with OS replication for config data & DB2 replication for runtime data
– Active: DB2 ON, WAS ND off, BPM off
– Active DB2 HADR Standby setup considered warm standby – licenses for 100 DB2 PVUs required to cover warm standby servers
– BPM and WAS ND are inactive – no extra WAS ND or BPM licenses needed for backup nodes.
• DR configuration with WAS ND replication for config data & DB2 replication for runtime data
– Active: DB2 ON, WAS ND ON, BPM off
– Active DB2 HADR Standby setup considered warm standby – licenses for 100 DB2 PVUs required to cover warm standby servers
– WebSphere process in the backup nodes used for synchronization – WAS ND licenses are required.
– BPM is inactive* – no extra BPM licenses needed for backup nodes.
• High Availability (HA)
– Active: DB2 ON, WAS ND ON, BPM ON
– IBM BPM is active in all nodes – full BPM licenses are required.
– WAS ND and DB2 licensing based on Supporting Programs terms of BPM
Note: For any HA or DR configuration, any node(s) running DMGR only will also require a WAS ND license.
(*Inactive: BPM server JVMs in the remote datacenter are not started)
REFERENCES
• IBM Program License Agreement licensing for Backup Use: This document explains that extra licenses are not required for cold or warm backup
nodes. However, if there is a program actively “doing work” to keep the backup node synchronized with the primary site, then that program must
be licensed. E.g., when the DB2 and WAS ND node agents are actively “doing work” (replication), they must be licensed, but if IBM BPM
services are not active, no BPM licenses are required in backup nodes.
• DeveloperWorks article on “Stray Node” DR Configuration: This article describes a “better” Stray Node DR configuration. It is different from a
Classic DR configuration in that it keeps the WAS ND environment up-to-date as well as the DB2 environment, in order to reduce the server
recovery time after a disaster. In this “better” Stray Node DR configuration, WAS ND node agents are active, but IBM BPM is not active.
• DeveloperWorks article on licensing DB2 10.1 servers in a HA environment
What Your Mother Didn’t Tell You
About Disaster Recovery
• Transaction Log replication & Network configuration requirements
• Historically, the WebSphere transaction service required IP Address & Hostname at the target server match the
source
• This is because, for some types of transactions, the server network information is written into the logs
themselves
• As of WAS v8 servers, this requirement is relaxed a bit – IP addresses no longer need to match
• High Availability and Disaster Recovery for the Deployment Manager
• Techniques are available that leverage hardware clustering or replication of the DMgr’s cell configuration to an
alternate server. These are described in detail at:
https://www.ibm.com/developerworks/websphere/techjournal/1001_webcon/1001_webcon.html
• Beginning in version 8.5.5 WebSphere supports High Availability features for the Deployment Manager, using a
shared filesystem. WAS installations (including BPM) running on applicable versions can leverage this feature:
http://pic.dhe.ibm.com/infocenter/wasinfo/v8r5/topic/com.ibm.websphere.nd.doc/ae/twve_xdsoconfig.html
• Because these HA techniques rely on DMgr replication, they apply unchanged to DR scenarios
– Generally, we recommend recovering application servers before bringing up a replacement DMgr
• A note about logical corruption – data integrity problems that get replicated to the DR environment
• We recommend using storage system tooling (for example, FlashCopy) to periodically copy the system state
• This can be done at the replica, to avoid interfering with normal operations
• If the Primary data and its replica are both corrupted, then state can be restored to a copy made before the
corruption
• For DR purposes, why can’t I just make one WebSphere ND cell with members running in both of my datacenters? That
way, if one datacenter is lost, another can carry the load
• See ‘Active/Active Antipattern’ discussion on the following slides
Active/Active anti-pattern (Cells Spanning Data Centers)
[Figure: a single cell spanning a Primary Datacenter and a Secondary Datacenter. Node1 and Node2 (App.mem1, App.mem2, Sup.mem1, Sup.mem2) sit in the primary; Node3 and Node4 (App.mem3, App.mem4, Sup.mem3, Sup.mem4) sit in the secondary; all belong to the same AppTarget, Support and Messaging clusters (Msg.mem1 to Msg.mem4, marked A/P/P/P). Each site has an IP sprayer, IHS web servers, user registry, NFS filesystem, database and SAN storage in a consistency group, with SAN replication for application data and shared WAS logs.]
Why is this type of topology considered an Anti-Pattern?
• Active/Active approaches introduce new complexities that undermine the stability of the system
• Issues/Problems Can Propagate From One DC to the Other
• This Compromises Redundancy and Resiliency
• Worst Case, an Outage Cascades Across Both Data Centers
• Frequently these negate the advantages that led the customer to consider the approach in the first place
• Increased risk of network instability can lead to partitioned network (‘split brain’)
– Independent Transaction “Recovery” in Both Data Centers By HA Manager
– The two data centers could move to inconsistent transactional states!!
• Increased network latency can limit system performance during normal operations
– Latency between the Application Server and its databases
– Latency among cluster members communicating via the WebSphere HA Manager
component
• Desire to automate failover increases risk of false failover & rapid cycling
• A system more than 50% utilized introduces the risk that losing a single component will
compromise the entire system, turning what could have been a (simple) HA event into a true
disaster
• In practice, many Active/Active topologies do not deliver Disaster Recovery capability at all:
• Attempts to limit latency lead to datacenters physically near each other, increasing the risk that
a single disaster will eliminate the entire system
• Many disasters arise from human error and data corruption. Tight coupling between DR
resources does not provide protection from this type of failure at all
• A WAS-ND Cell Spanning Data Centers will actually interfere with Zero RTO
• Refer to
• http://www.ibm.com/developerworks/websphere/techjournal/0606_col_alcott/0606_col_alcott.html#sec1d
• http://www.ibm.com/developerworks/websphere/techjournal/1004_webcon/1004_webcon.html
What are the recommended alternatives?
• Properly plan a High Availability solution distinct from Disaster Recovery
• Eliminate single points of failure through redundancy in network and software components
• HA features allow rapid and automatic recovery from loss of a single component. Utilize them!
• Improve RTO by reducing complexity, scripting operational procedures and drilling
• Automate Processes for Repeatability and Consistency
– Scripting
– “Point and Click” is Not Repeatable
• Discipline and Practice are Essential
• Well Defined Procedures for Every Contingency
– You Do Not Want to Learn During an Outage
– Practice Those Procedures
– So You Won’t Make Mistakes in a Crisis
– Validates that Procedures Actually Work
– Practice Backup and Recovery, System Failures, Disaster Recovery, etc.
• Goal: Make Daily Operations Boring
• Improve electricity distribution via Uninterruptible Power Supply
• Utilize application design patterns like loose coupling in order to improve application flexibility
• In cases where RTO between 1 and 4 hours is necessary, without the requirement to process new work,
consider the Stray Node pattern
Is This Different for the Liberty Profile and Liberty Collectives?
• No
• Same Fundamentals for Effective Redundancy and the
Requirements for Isolation and Independence Apply
• Though Liberty May Make It Easier to Ignore or Believe That
the Fundamentals Don’t Apply
Disaster Recovery
• Develop a Disaster Recovery Plan
• Group Business Needs and Associated Applications into Tiers
• Group into tiers based on the hard/soft dollar impact on the
organization
• Categorize by RPO and RTO.
• The top tier likely includes zero data loss and either no downtime or
perhaps just a few minutes of down time
• Subsequent tiers have an RTO of 24 hours, then 48 to 72 hours,
then perhaps 72 to 96 hours
• Essential Part of Any Plan
• Who approves DR move/recovery ?
• Automated site failover is a bad idea
o Typically triggering DR is very expensive
o You do not want to trigger a DR by accident because of some transient issue – just makes
the situation worse
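The tiering idea from this slide (grouping applications by RTO) can be sketched as a simple classification. The tier boundaries below loosely follow the ranges mentioned above, but the exact cut-offs, including the one-hour limit for the top tier, are assumptions for illustration; each organization sets its own based on business impact.

```python
def dr_tier(rto_hours: float) -> int:
    """Map an application's Recovery Time Objective to a DR tier (1 = most critical).

    Cut-offs are illustrative assumptions, not prescribed values.
    """
    if rto_hours < 0:
        raise ValueError("RTO cannot be negative")
    if rto_hours < 1:      # no downtime, or just a few minutes
        return 1
    if rto_hours <= 24:    # recover within a day
        return 2
    if rto_hours <= 72:    # recover within 48 to 72 hours
        return 3
    return 4               # 72 to 96 hours or longer


if __name__ == "__main__":
    # Hypothetical application inventory with RTOs in hours.
    inventory = {"payments": 0.1, "order-entry": 24, "reporting": 48, "archive": 96}
    for app, rto in inventory.items():
        print(f"{app}: tier {dr_tier(rto)}")
```

A table like this is only an input to the plan: the decision to actually trigger a move to the DR site stays with the people who approve it.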
Disaster Recovery Objectives
• Recovery Time Objective
• How quickly the system will be able to accept traffic after the disaster
• Shorter times require progressively more expensive techniques
o e.g., a tape backup and restore is relatively inexpensive
o e.g., a fully redundant fully operational data center is very expensive
• One challenge is detection time
• It takes time to determine you are in a disaster state and trigger
disaster procedures
o While you are deciding if you are down, you are probably missing your SLA.
o Does the RTO include detection time?
Disaster Recovery Objectives
• Recovery Point Objective
• How much data you are willing to lose when there is a disaster
• Limiting data loss raises costs
o e.g., restoring from tape is relatively inexpensive but you'll lose everything
since the last backup
o e.g., asynchronous replication of data and system state requires significant
network bandwidth to prevent falling far behind
o e.g., synchronous replication to the backup data center guarantees no data
loss but requires VERY fast and reliable network and will significantly harm
performance
• Warning: results in increased latency which means capacity must be increased at
all layers
Disaster Recovery Objectives
• Most RTO and RPO goals will deeply impact application and infrastructure
architecture and can't be done “after the fact”
• e.g., if data is shared across data centers, your database and application
design will have to be careful to avoid conflicting database updates and/or
tolerate them
• e.g., application upgrades have to account for multiple versions of the
application running at once which can affect user interface design, database
layout, etc
• Extreme RTO and RPO goals tend to conflict
• e.g., using synchronous disk replication of data gives you a zero RPO but that
means the second system can't be operational, which raises RTO
• A Zero RTO *and* a Zero RPO Are Mutually Exclusive Goals
Disaster Recovery Testing
• The DR hardware Should Be Put Into Actual Production Usage
• Otherwise How Can You Be Sure It Will Work When You REALLY Need It.
• A Corollary of Murphy’s Law
• The larger the numbers, the less likely all Tier 1 machines can be successfully
restored to operations.
• DR Testing Options
• “Saturday Afternoon Surprise”
– Unannounced DR Test
– Only If You Can Tolerate an Outage
• Progressively More Realistic and Complex Tests
– Startup of Remote Infrastructure
– Remote Startup with Simulated Workload
– Remote Startup with Production Workload Shift
• Other Issues In a Real Disaster
• Will your key staff want to travel?
• Will they be able to travel?
Example and (Very High Level) DR Plan
• Executive/Management Approval for Activation of DR
• Isolate Data Centers
• Halt Incoming Network Traffic
– Static “Temporarily Unavailable Web Page”
• Break Disk Synchronization
• Sever Network Links Between Data Centers
• Start and Recovery of Surviving Center
• Restore/Recover Hardware and Middleware
• Start DB, Messaging and Application Servers
• Examine DB and Message provider logs for pending transactions
• Recover Pending transactions and messages
• Start Accepting New Work in Surviving Data Center
• Enable Network
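The high-level plan above can be captured as an ordered, scripted runbook with an explicit approval gate, so the sequence is the same every time it is rehearsed. This is a minimal sketch: the step texts come from the plan, while `run_dr_plan` and the `execute` callback are hypothetical names, and a real runbook would attach an automated action or checklist to each step.

```python
from typing import Callable, List

# Ordered steps taken from the high-level DR plan.
DR_STEPS: List[str] = [
    "Halt incoming network traffic (serve static 'temporarily unavailable' page)",
    "Break disk synchronization",
    "Sever network links between data centers",
    "Restore/recover hardware and middleware",
    "Start DB, messaging and application servers",
    "Examine DB and message provider logs for pending transactions",
    "Recover pending transactions and messages",
    "Enable network and start accepting new work",
]


def run_dr_plan(approved_by: str, execute: Callable[[str], None]) -> List[str]:
    """Run every step in order and return the list of completed steps.

    Refuses to start without a named approver. An exception raised by
    `execute` propagates and stops the run at that step.
    """
    if not approved_by.strip():
        raise PermissionError("DR activation requires explicit management approval")
    completed: List[str] = []
    for step in DR_STEPS:
        execute(step)
        completed.append(step)
    return completed


if __name__ == "__main__":
    # Dry run: print each step instead of performing it.
    run_dr_plan("operations director", print)
```

Passing `print` as the callback gives a dry run suitable for a tabletop exercise; the same function can later drive real actions without changing the order of operations.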
Other Aspects to Consider
• An HA, CA or DR Deployment Architecture is Not a Product Feature.
• WAS and the WebSphere portfolio products
– Provide HA Features and Function
– Can Be Employed in an HA Architecture
– The Appropriate Environment Varies by Customer
– One Size Does Not Fit All !
• Optimizing WebSphere HA Capabilities into a Robust Deployment
• Requires In-depth Understanding
– Of Environment
– Of Applications
– Of Operational Requirements (Service Levels)
• Architectural Advice May Require ISSW Assistance
Learn from Your Mistakes
• Mistakes and failures will occur, learn from them
• What separates mediocre organizations from the good and great isn't so
much perfection as it is the constant striving to get better – to not repeat
mistakes
• After every outage perform
• Root cause analysis
– Capture diagnostic information
– Meet as a team including all key players to discuss
– Determine precisely what went wrong
• Wrong doesn't mean “Bob made an error.”
• Find the process flaw that led to the problem
– Determine a corrective action that will prevent this from happening again
• If you can't, determine what diagnostic information is needed next time this happens and ensure it is collected
– Implement that corrective action
• All too often this last step isn't done
• Verify that action corrected problem
• A senior manager must own this process
Indispensable When Planning for Catastrophe
• Think !
Questions?
Notices and Disclaimers
Copyright © 2015 by International Business Machines Corporation (IBM). No part of this document may be reproduced or
transmitted in any form without written permission from IBM.
U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with
IBM.
Information in these presentations (including information relating to products that have not yet been announced by IBM) has been
reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM
shall have no responsibility to update this information. THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY,
EITHER EXPRESS OR IMPLIED. IN NO EVENT SHALL IBM BE LIABLE FOR ANY DAMAGE ARISING FROM THE USE OF
THIS INFORMATION, INCLUDING BUT NOT LIMITED TO, LOSS OF DATA, BUSINESS INTERRUPTION, LOSS OF PROFIT
OR LOSS OF OPPORTUNITY. IBM products and services are warranted according to the terms and conditions of the
agreements under which they are provided.
Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without
notice.
Performance data contained herein was generally obtained in a controlled, isolated environments. Customer examples are
presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual
performance, cost, savings or other results in other operating environments may vary.
References in this document to IBM products, programs, or services does not imply that IBM intends to make such products,
programs or services available in all countries in which IBM operates or does business.
Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not
necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither
intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation.
It is the customer’s responsibility to insure its own compliance with legal requirements and to obtain advice of competent legal
counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s
business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or
represent or warrant that its services or products will ensure that the customer is in compliance with any law.
Notices and Disclaimers (con’t)
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products in connection with this
publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM
products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to
interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED,
INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE.
The provision of the information contained herein is not intended to, and does not, grant any right or license under any
IBM patents, copyrights, trademarks or other intellectual property right.
• IBM, the IBM logo, ibm.com, Bluemix, Blueworks Live, CICS, Clearcase, DOORS®, Enterprise Document
Management System™, Global Business Services ®, Global Technology Services ®, Information on Demand,
ILOG, Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON, OpenPower, PureAnalytics™,
PureApplication®, pureCluster™, PureCoverage®, PureData®, PureExperience®, PureFlex®, pureQuery®,
pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®, SoDA, SPSS, StoredIQ, Tivoli®, Trusteer®,
urban{code}®, Watson, WebSphere®, Worklight®, X-Force® and System z® Z/OS, are trademarks of
International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and
service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on
the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.
Thank You
Your Feedback is
Important!
Access the InterConnect 2015
Conference CONNECT Attendee
Portal to complete your session
surveys from your smartphone,
laptop or conference kiosk.
Backup Slides
Shameless Self Promotion
IBM WebSphere Deployment and Advanced Configuration
By Roland Barcia, Bill Hines, Tom Alcott and Keys Botzum
ISBN: 0131468626
Another Recommended Book
IBM WebSphere v5.0 System Administration
By Leigh Williamson, Lavena Chan, Roger Cundiff, Shawn Lauzon and
Christopher C. Mitchell
ISBN: 0131446045
Licensing Servers as Back Up Servers
From IBM Contracts and Practices Database
• The policy is to charge for HOT, and not for WARM or COLD backups. The following are definitions of what constitutes
HOT-WARM-COLD backups:
• All programs running in backup mode must be under the customer's control, even if running at another enterprise's location.
• COLD - a copy of the program may be stored for backup purposes on a machine as long as the program has not been started.
• There is no charge for this copy.
• WARM - a copy of the program may reside for backup purposes on a machine and is started, but is "idling", and is not doing any
work of any kind.
• There is no charge for this copy.
• HOT - a copy of the program may reside for backup purposes on a machine, is started and is doing work. However, this
program must be ordered.
• There is a charge for this copy.
• "Doing Work", includes, for example, production, development, program maintenance, and testing. It also could include other
activities such as mirroring of transactions, updating of files, synchronization of programs, data or other resources (e.g. active
linking with another machine, program, data base or other resource, etc.) or any activity or configurability that would allow an
active hot-switch or other synchronized switch-over between programs, data bases, or other resources to occur
Refer to http://www-03.ibm.com/software/sla/sladb.nsf/pdf/policies/$file/Feb-2003-IPLA-backup.pdf for more information
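The three definitions above reduce to two questions: has the backup copy been started, and is it doing work? A minimal sketch of that decision follows. The function name `classify_backup` and its boolean inputs are hypothetical, and the sketch is a simplified reading of the slide, not of the policy document itself; whether an activity counts as "doing work" still has to be judged against the policy's own list.

```python
from typing import Tuple


def classify_backup(started: bool, doing_work: bool) -> Tuple[str, bool]:
    """Return (category, chargeable) following the slide's definitions."""
    if not started:
        # Stored but never started: COLD, no charge.
        return ("COLD", False)
    if not doing_work:
        # Started but idling, doing no work of any kind: WARM, no charge.
        return ("WARM", False)
    # Started and doing work (e.g. mirroring transactions): HOT, charged.
    return ("HOT", True)


if __name__ == "__main__":
    print(classify_backup(started=True, doing_work=False))
```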

Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 

Recently uploaded (20)

TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 

Planning For Catastrophe with IBM WAS and IBM BPM

  • 1. © 2015 IBM Corporation Planning for Catastrophe with IBM WebSphere Application Server & IBM Business Process Manager Tom Alcott STSM Chris Richardson STSM
  • 2. This Session • This session will focus on the architectural and operational issues that need to be considered when planning and implementing a Disaster Recovery plan with WebSphere Application Server and IBM BPM. Topics will include use of multiple data centers, geographic separation constraints, supporting software components, disaster recovery and other common deployment issues. Though focused primarily on WebSphere Application Server and IBM BPM, this session also applies to IBM Middleware that is deployed on WebSphere Application Server as well as PureApplication System. • While not a prerequisite, attendees should be familiar with the material covered in “Preparing to Fail, Practical WebSphere Application Server High Availability”
  • 3. Introduction • Why Are We Here? • To Avoid This
  • 4. Agenda • Concepts • Disaster Recovery • Multiple Cells and Data Centers • WebSphere Application Server Recovery • IBM BPM Recovery • Final Thoughts
  • 5. Definitions • Redundancy • The provision of additional or duplicate systems, equipment, etc., that function in case an operating part or system fails, as in a spacecraft. • Isolated • Separated from other persons or things; alone; solitary • Independent • Not dependent; not depending or contingent upon something else for existence, operation, etc. • All of the Above are Fundamental for Effective High Availability and Disaster Recovery
  • 6. Definitions High Availability (HA) • Ensuring that the system can continue to process work within one location after routine single component failures • Usually we assume a single failure • Usually the goal is very brief disruptions for only some users for unplanned events Continuous Operations • Ensuring that the system is never unavailable during planned activities • E.g., if the application is upgraded to a new version, we do it in a way that avoids all downtime
  • 7. Definitions • Continuous Availability (CA) • High Availability coupled with Continuous Operations • No tolerance for planned downtime • As little unplanned downtime as possible • Very expensive • Note that while achieving CA almost always requires an aggressive DR plan, they are not the same thing
  • 8. Background: High Availability in one picture [Diagram: IP Sprayer, Web Servers (IHS), a WAS-ND cell with a DMgr and two nodes (Node1 and Node2, each with a Node Agent) hosting Cluster “A”, Cluster “B” and Cluster “C”, a User Registry, a Database, and a Shared Filesystem holding the WAS Txn Logs on SAN storage] • Clustered IP Sprayer and Firewalls (not depicted) • Clustered HTTP Servers • WAS-ND Cell with Clustered Application Servers • User Registry (LDAP) Hardware Clustered with Shared Disk • Database Hardware Clustered With Shared Disk • JMS Provider (Not Depicted) • WAS Messaging Engine with Shared Disks/DB • External JMS with Hardware Cluster and Shared Disk • Transaction Logs on Shared File System • Clusters of “2” Provide High Availability • Don’t Forget the “Rule of 3” • With Clusters of 2, an Outage (Planned or Unplanned) Reduces Capacity by 50% • And the Cluster Is No Longer Fault Tolerant
  • 9. Definitions • Disaster Recovery (DR) • Ensuring that the system can be reconstituted and/or activated at another location and can process work after an unexpected catastrophic failure at one location • Often multiple single failures (which are normally handled by high availability techniques) are considered catastrophic • There may or may not be significant downtime as part of a disaster recovery • This environment may be substantially smaller than the entire production environment, as only a subset of production applications demand DR • Normally based on justifiable business need • Recovery Time Objective (RTO) • Service Recovery with little to no interruption • Recovery Point Objective (RPO) • Data Recovery and acceptable data loss
  • 10. Definitions • Service Levels (SLAs) cover many things; our focus is the availability aspects • You need a clear set of requirements that define precisely the availability requirements of the system, taking into account • Components of the system – A system has many pieces and business aspects, how do their requirements differ? • Responsiveness and throughput requirements – 100% of requests aren't going to work perfectly 100% of the time • Degraded services requirements – Does everything have to meet the responsiveness requirements ALL the time? • Dependent system requirements – What are the implications if a system on which you depend is down? • Data Loss • Application Data • Application State (Is this Critical in a Disaster?) • Maintenance – Change occurs, how does that affect availability? • Disaster Recovery – The unimaginable happens, then what?
  • 11. HA Service Level Example (values given as SLA External Commitment / SLA Internal Target) • Service Timeframe: 7 x 24 / 7 x 24 • Application Processing Availability: 99.5% per month / 99.7% per month • Recovery Time Objective: 4 Hours / 1 Hour • Maintenance Window: Tue-Thurs 3:00 - 6:00 am / Tue-Thurs 3:00 - 6:00 am • 99.5% = 3.60 Hours Downtime/Month • 99.7% = 2.16 Hours Downtime/Month
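The downtime figures on this slide can be reproduced with a short calculation. The sketch below assumes a 30-day month (720 hours); the slide does not state the period length, but that assumption matches its 3.60 and 2.16 hour figures. The function name is illustrative only.

```python
def allowed_downtime_hours(availability_pct: float, hours_in_period: float = 720.0) -> float:
    """Hours of downtime permitted in one period at the given availability.

    hours_in_period defaults to 720 (a 30-day month), which is an assumption.
    """
    return (100.0 - availability_pct) / 100.0 * hours_in_period


if __name__ == "__main__":
    for pct in (99.5, 99.7):
        print(f"{pct}% availability -> {allowed_downtime_hours(pct):.2f} hours downtime/month")
```

Note that a budget computed this way covers all unplanned outages in the month, so a single incident that uses the full 4 hour external RTO would already exceed the 99.5% commitment.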
  • 12. DR Service Level Example (values given as SLA External Commitment / SLA Internal Target) • Recovery Time Objective: 16 Hours / 4 Hours • Recovery Point Objective: ~ 0 (No Data Loss) / ~ 0 (No Data Loss) • Note: This is Recovery of an Entire Data Center with 100’s of Servers, Application, Database, Messaging, etc.
  • 13. Agenda • Concepts • Disaster Recovery • Multiple Cells and Data Centers • WebSphere Application Server Recovery • IBM BPM Recovery • Final Thoughts
  • 14. Stage 0 DR – a sound HA strategy • HA is Cheaper and Less Complex Than DR • A Robust HA Solution prevents small failures from becoming disasters • Don’t let a (relatively) minor failure become a catastrophe • Eliminate all single points of failure in your primary datacenter • Spread Workloads Across Multiple Servers (and Hypervisors!) • Add 2nd Production WAS-ND Cell • Consider DB Replication, DB2 HADR, Oracle RAC in conjunction with hardware clustering • LDAP/Registry Replication • Otherwise, an HA event could force you to enact your DR procedure • Database is only replicated HA in Different Data Center [Diagram: the same HA topology as slide 8]
  • 15. Multiple Data Center Options (1/4) • Classic DR • Active/Passive • Two Data Centers, one Serving Requests the other Idling • Independent Cells • Easier Than Active/Active • User and Application State Synchronization are Less Critical • Asynchronous Replication Is Likely Sufficient • Lower Cost for Network and Hardware Capacity • From a Capacity perspective One Data Center is Being Underutilized. • Typically Does Not Incur S/W License Charges When Idle • If You Don’t Pay for S/W Licenses Is Cost and Underutilization Still a Concern? • WebSphere License Provides for o Hot – Processing Requests License Required o Warm – Started But Not Processing Requests, License Not Required o Cold – Installed, But Not Started, License Not Required • DB2 and MQ Require < 100 % of Hot Licenses for Replication
  • 16. Multiple Data Center Options (2/4) • “Active/Active” with Single Set of Active Databases • Two Data Centers • Independent Application Cells • Serving Requests for Same Applications • Database(s) Only Active in One Data Center – Additional Latency for Application Data Requests from Remote Data Center – Request Processing Interruption When Data Replica is Promoted to Primary
  • 17. Multiple Data Center Options (3/4) • Classic “Active/Active” • Two Data Centers • Independent Cells and Synchronized Resource Managers (DBs) • Serving Requests for Same Applications • Requires Shared Application Data o Application Data Consistency is prerequisite to any other planning o Simultaneous Reads/Writes = Geographic Synchronous Disk Replication • Additional Hardware and Disk Capacity Required • e.g. IBM High Availability Geographic Cluster (HAGEO), Sun Cluster Geographic Edition • Expectation of Continuous Availability and Transparent Failover o Requires Sharing Application State • Expectation Seldom Realized • Outage of One Data Center, Stops Disk Writes in Both, No Longer “Transparent” • Synchronous Disk Replication Limits Geographic Separation • Hardest and Costliest to Achieve • Note: Disk Replication is only employed for Application Data and Application State; WAS-ND cell configuration, software updates, and application maintenance should be maintained independently in order to ensure isolation (and availability)
  • 18. Multiple Data Center Options (4/4) • Hybrid “Active/Active” (Partitioned by Applications) • Two Data Centers • Independent Cells with replicated Resource Managers • Both DC’s Serving Requests, Both DC’s Configured for All Applications • Running Different Applications (With Different Application Data) – New Application Tests – One DC Performing Updates, One DC Performing Inquiry Only (e.g. data warehouse) • No Shared Application State, No Shared Application Data – Asynchronous Replication Sufficient • Global Network Switch Used to Partition/Distribute Traffic • In the Event of a Disaster • Users failover from one DC to the other • Likely Some Interruption – As Data Replica is Promoted to Primary – During Failover Workload Startup • Provides Most of the Benefits of “Classic Active/Active” without the Cost and Complexity
  • 19. The CAP Theorem • In a distributed environment, especially spanning data centers across LANs and WANs, there are three core requirements for a service: • Consistency – Either the service works or fails – Traditional ACID of databases provides consistency and isolation • Availability – Extremely important in web business model – In a large distributed system, one may have to compromise with consistency for the sake of availability • Partition Tolerance – Network partition will happen when not all machines are connected – “No set of failures less than the total network failure is allowed to cause the system to respond incorrectly” – Gilbert and Lynch – Quorum is used to guard against split brain syndrome • Brewer’s CAP conjecture states that • One can achieve only two, not all three, of the above-mentioned requirements • http://en.wikipedia.org/wiki/CAP_theorem
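To illustrate the quorum point on this slide, here is a minimal sketch of a strict-majority check. It is a generic illustration, not the WebSphere HA Manager algorithm; the function name and the member counts are assumptions made for the example.

```python
def has_quorum(reachable_members: int, total_members: int) -> bool:
    """Return True only if a strict majority of the group is reachable.

    Counting the local member as reachable, at most one side of a network
    partition can satisfy this test, which is what prevents split brain.
    """
    if total_members <= 0 or not 0 <= reachable_members <= total_members:
        raise ValueError("member counts are out of range")
    return 2 * reachable_members > total_members


if __name__ == "__main__":
    # A 4-member group split evenly across two data centers: neither half has quorum.
    print(has_quorum(2, 4))
    # A 3-member group that loses one member keeps quorum.
    print(has_quorum(2, 3))
```

The even split is the case that matters for two data centers: with members divided equally, a link failure leaves neither site with a majority, so a quorum-based system stops rather than letting both sides proceed independently.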
  • 20. Multiple Active Data Centers and the CAP Theorem • Active/Active requires you to sacrifice either consistency, availability or partition tolerance. • All three aren’t possible • If you choose full availability, then you are going to lose guaranteed consistency. • So you need to design with this in mind, and build in mechanisms (typically involving queuing technologies) that enable your system to “tend towards” consistency. • Your data is going to be in two places, either partitioned or replicated. • If the former, what happens when one site is down? • If the latter, what happens when users hitting each site see slightly different versions of the current state? • These are very complex problems. • Which is why I try to steer customers away from active/active and into an active/passive model with DR from active to passive. • But they always feel like they are wasting hardware!
  • 21. Data Center Utilization Urban Legends • Legend • Active/Active Improves Utilization • Reality • An Active/Active Topology at 40-50% Utilization in Each DC Is Equivalent to An Active/Passive Datacenter Deployment with One Active at 80% to 90 % Utilization and the Other Passive • Running Active/Active at Greater Than 50% Of Total (both Datacenters) Capacity Can Often Result in a Complete Loss of Service When a Data Center Outage Occurs o Insufficient Capacity in Remaining Data Center to Handle > 100% Capacity Results in • Poor Response Time (at best) • Network and Server Overload, Resulting in a Complete Crash
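The utilization argument on this slide can be checked with simple arithmetic. The sketch below assumes equally sized data centers and that the load of a failed data center is redistributed evenly across the survivors; both are simplifying assumptions, and the function name is illustrative.

```python
def surviving_utilization(per_dc_utilization: float, data_centers: int = 2) -> float:
    """Utilization of each surviving data center after one data center is lost.

    Assumes equally sized data centers and even redistribution of the lost load.
    A result above 1.0 means the survivors are asked for more than 100% capacity.
    """
    if data_centers < 2:
        raise ValueError("need at least two data centers to survive an outage")
    return per_dc_utilization * data_centers / (data_centers - 1)


if __name__ == "__main__":
    for utilization in (0.45, 0.60):
        after = surviving_utilization(utilization)
        status = "overloaded" if after > 1.0 else "within capacity"
        print(f"{utilization:.0%} per DC -> {after:.0%} on the survivor ({status})")
```

With two data centers at 45% each, the survivor runs at 90%, which is the Active/Passive case on the slide; at 60% each, the survivor is asked for 120% and cannot carry the load.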
  • 22. Active/Active - What’s Wrong With This Picture ? A former employer of mine had two data centers, running active/active at two facilities approximately 2.6 miles (or 4.2 KM) apart. • Close Proximity Addressed Data Consistency Concerns ……..But………
  • 23. What Happens When ? • There’s an earthquake • There’s a Civil Insurrection • A Hazardous Chemical Spill Occurs • And The Wind Is Blowing the Chemical Cloud from West to East (or vice versa) • Your DC May Not Be Located in a Locale Prone to Earthquakes • But what about the other catastrophes ??? • They can, *and* will happen !! • There’s No Substitute For Isolation Between Data Centers • Data Centers Should Be Sufficiently Distant So That a Single Event Doesn’t Impact Both !! • This Likely Mandates Asynchronous Replication • Active/Active No Longer Practical
  • 24. Network Latency and Application Data Consistency – A 3rd Party Perspective • Since the latency or round trip time for a network is usually correlated to the length of the network, or the physical distance between the two end points (in this case the primary and standby), Maximum Protection and Maximum Availability modes are not recommended for Data Guard deployments over a Wide Area Network (WAN). Note that this recommendation is driven by the laws of physics (speed of light limitation) - the greater the distance of a network, the longer it will take for data packets to traverse the network, and hence the longer it will take for primary database transactions to commit. • http://www.oracle.com/technology/deploy/availability/htdocs/dataguardnetwork.htm
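The speed-of-light limit in the quotation above can be turned into a rough lower bound on round-trip time. This sketch assumes light travels through optical fiber at about 200,000 km/s (roughly two-thirds of its speed in a vacuum) and ignores routing, switching and protocol overhead, so real latencies will be higher. The constant and function names are illustrative.

```python
# Approximate propagation speed of light in optical fiber; an assumption for this sketch.
SPEED_OF_LIGHT_IN_FIBER_KM_PER_S = 200_000.0


def min_round_trip_ms(distance_km: float) -> float:
    """Physical lower bound on network round-trip time, in milliseconds.

    Only propagation delay over the given cable distance is counted.
    """
    return 2.0 * distance_km / SPEED_OF_LIGHT_IN_FIBER_KM_PER_S * 1000.0


if __name__ == "__main__":
    for km in (4.2, 100.0, 1000.0):
        print(f"{km:>7.1f} km -> at least {min_round_trip_ms(km):.3f} ms per round trip")
```

With synchronous replication, each commit waits for at least one such round trip, which is why the distance between the primary and the standby directly limits transaction commit time.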
  • 25. Multiple Cells and Data Centers • Your Network Team Assures You That They Can (or Have) Constructed a Network Link Between Data Centers • For Argument’s Sake, We’ll Agree: It Is possible to construct a network so that latency is NOT an issue Under Normal Conditions • Even so, WANs are Less Reliable than LANs. o And Much Harder To Fix ! • But You’re Missing The Point ! • Network Interdependency Between Data Centers Means That the Data Centers are Not Independent • Question • Do You Want to Have to Explain to Your CIO Why A Problem In One Data Center Impacted The Other and Resulted in an Outage Because You Didn’t Have Cells Aligned to Data Center Boundaries ?
  • 26. How Do I Recover WebSphere Application Server ? • File System or OS Backup and Recovery • Disk or Tape • WAS backupConfig/restoreConfig – WAS_PROFILE/properties, ../etc, WAS_ROOT/java/jre/lib/*properties, WAS_ROOT/java/jre/lib/security • Build from Scratch o Only a Realistic Option with Complete Set of Scripts and Rigorous Change Control • Best Options • File System/OS Backup & Recovery • backupConfig/restoreConfig for Deployment Manager – WAS V8.0 and above, addNode -asExistingNode can Reconfigure Each Node After restoreConfig of Deployment Manager Configuration • Both From Last Known Working Production Configuration • Otherwise No Assurance Recovery Will Succeed • Same Concern with “Build From Scratch” – If Using Virtualization Consider VM Cloning for Install and Configuration – Consider Smart Cloud Orchestrator for Automated Install and Configuration o Provision Both Primary Site and DR Site in a Consistent Manner with SCO o Note: Don’t Deploy to Backup to DR Site over a WAN ! • May Need to Change Cell and Host Names • Will the Original Data Center Be Restored, Or Is it Gone (for Good)?
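The restoreConfig plus addNode -asExistingNode sequence mentioned on this slide could be scripted along the following lines. This is only a sketch: the profile paths, backup file name and host name are hypothetical, the default SOAP port of 8879 is an assumption about the target cell, and security options such as -username and -password are omitted. The function only builds the command lines; it does not run them.

```python
from pathlib import PurePosixPath
from typing import List


def dmgr_recovery_commands(
    dmgr_profile: str,
    node_profiles: List[str],
    backup_zip: str,
    dmgr_host: str,
    dmgr_soap_port: int = 8879,  # assumed default Deployment Manager SOAP port
) -> List[List[str]]:
    """Build (but do not execute) the commands for a Deployment Manager restore.

    Step 1 restores the Deployment Manager configuration from a backupConfig archive.
    Step 2 re-registers each node with addNode -asExistingNode (WAS V8.0 and above);
    the Deployment Manager must be started again before step 2 is run.
    """
    commands = [[str(PurePosixPath(dmgr_profile) / "bin" / "restoreConfig.sh"), backup_zip]]
    for profile in node_profiles:
        commands.append([
            str(PurePosixPath(profile) / "bin" / "addNode.sh"),
            dmgr_host,
            str(dmgr_soap_port),
            "-asExistingNode",
        ])
    return commands


if __name__ == "__main__":
    # All paths and host names below are hypothetical examples.
    for command in dmgr_recovery_commands(
        "/opt/IBM/WebSphere/AppServer/profiles/Dmgr01",
        ["/opt/IBM/WebSphere/AppServer/profiles/Node01"],
        "/backups/dmgr_last_known_good.zip",
        "dmgr.dr.example.com",
    ):
        print(" ".join(command))
```

Keeping the sequence in a script rather than in an operator's memory is in line with the slide's point that recovery should start from the last known working production configuration and be repeatable.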
  • 27. WAS Full Profile DR Recovery • Transaction Recovery on Separate (Physical) Server • Access the Transaction Logs – Move/Mount the Transaction Logs to Physical Server Hosting Application Server with Access to Same Resources (e.g. JDBC, JMS) – V8.x Optional Use of DB for Transaction logs • If Recovery Occurs in Different Cell use wsadmin to Configure the Same JAAS Alias for Accessing XA Resources – With adminconsole the node name gets prefixed to the alias. • No Longer Required to Have Same Hostname and IP Address – Different IPs with Multiple Host Aliases Typical http://www-01.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.doc/ae/tjta_mvelog.html • WAS Messaging Engine • recoverMEConfig AdminTask Command Retrieves MEUUID From Persistent Message Store and Updates Messaging Engine Configuration • Allows Recovery of Stranded Messages After Catastrophic ME Failure in WAS V8.5.0 and above • http://www-01.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.doc/ae/rjk_recoverme_config.html • Some Small Capacity in DR Site Needs to be set Aside for Recovery • In Addition to Production Workload • Or Production Workload Not Processed Until Recovery is Complete
  • 28. WAS Liberty Profile DR Recovery • Transaction Recovery on Separate (Physical) Server • Access the Transaction Logs – Move/Mount the Transaction Logs to Physical Server Hosting Application Server with Access to Same Resources (e.g. JDBC, JMS) – V8.x Optional Use of DB for Transaction logs (Not supported for Production) • Restore server.xml Backup (e.g. zip) • No Longer Required to Have Same Hostname and IP Address – Different IPs with Multiple Host Aliases Typical • WAS Messaging Engine • Restore server.xml Backup (e.g. zip) • Point to Copy of Messages on File System (Could also employ zip/unzip) • Some Small Capacity in DR Site Needs to be set Aside for Recovery • In Addition to Production Workload • Or Production Workload Not Processed Until Recovery is Complete
  • 29. Classic DR for Stateful Applications: full cell replication [Diagram: a Primary Datacenter and a Secondary Datacenter, each containing a complete cell: IP Sprayer, Web Servers (IHS), DMgr, Node1 and Node2 (each with a Node Agent) hosting the AppTarget, Messaging and Support cluster members (App.mem1-2, Msg.mem1-2 labeled A and P, Sup.mem1-2), a User Registry, a Database on SAN storage, and an NFS Filesystem holding the WAS Txn Logs. The two sites are linked by Consistency Group SAN Replication for Application Data and by File Copy for Install & Config Data.]
  • 30. DR via Stray Nodes & Database Managed replication [Diagram: a Primary Datacenter with IP Sprayer, Web Servers (IHS), DMgr, Node1 and Node2 hosting App.mem1-2, Sup.mem1-2 and Msg.mem1-2 (labeled A and P), and a Secondary Datacenter with IP Sprayer, Web Servers (IHS), DMgr, Node3 and Node4 hosting App.mem3-4, Sup.mem3-4 and Msg.mem3-4 (labeled A and P). Each site has Node Agents, a User Registry, WAS Txn Logs and a Database; the databases are linked by DB-managed Replication for Application Data.]
  • 31. IBM BPM: Licensing Guidance for HA/DR Configurations (for each configuration: what’s active, and what licenses are needed for the backup nodes?) • “Classic” Disaster Recovery (DR): SAN-based replication (DB2 off, WAS ND off, BPM off) – Files in the backup data center are being synchronized automatically by a SAN. But there is no DB2, WAS ND, or BPM program active. – No extra DB2, WAS ND, or BPM licenses needed for backup nodes. • DR configuration with OS replication for config data & DB2 replication for runtime data (DB2 ON, WAS ND off, BPM off) – Active DB2 HADR Standby setup considered warm standby; licenses for 100 DB2 PVUs required to cover warm standby servers – BPM and WAS ND are inactive; no extra WAS ND or BPM licenses needed for backup nodes. • DR configuration with WAS ND replication for config data & DB2 replication for runtime data (DB2 ON, WAS ND ON, BPM off) – Active DB2 HADR Standby setup considered warm standby; licenses for 100 DB2 PVUs required to cover warm standby servers – WebSphere process in the backup nodes used for synchronization; WAS ND licenses are required. – BPM is inactive*; no extra BPM licenses needed for backup nodes. • High Availability (HA) (DB2 ON, WAS ND ON, BPM ON) – IBM BPM is active in all nodes; full BPM licenses are required. – WAS ND and DB2 licensing based on Supporting Programs terms of BPM • Note: For any HA or DR configuration, any node(s) running DMGR only will also require a WAS ND license. (*Inactive: BPM server JVMs in the remote datacenter are not started) • REFERENCES • IBM Program License Agreement licensing for Backup Use: This document explains that extra licenses are not required for cold or warm backup nodes. However, if there is a program actively “doing work” to keep the backup node synchronized with the primary site, then that program must be licensed. E.g., when the DB2 and WAS ND node agents are actively “doing work” (replication), they must be licensed, but if IBM BPM services are not active, no BPM licenses are required in backup nodes. • DeveloperWorks article on “Stray Node” DR Configuration: This article describes a “better” Stray Node DR configuration. It is different from a Classic DR configuration in that it keeps the WAS ND environment up-to-date as well as the DB2 environment, in order to reduce the server recovery time after a disaster. In this “better” Stray Node DR configuration, WAS ND node agents are active, but IBM BPM is not active. • DeveloperWorks article on licensing DB2 10.1 servers in a HA environment
  • 32. What Your Mother Didn’t Tell You About Disaster Recovery • Transaction Log replication & Network configuration requirements • Historically, the WebSphere transaction service required IP Address & Hostname at the target server match the source • This is because, for some types of transactions, the server network information is written into the logs themselves • As of WAS v8 servers, this requirement is relaxed a bit – IP addresses no longer need to match • High Availability and Disaster Recovery for the Deployment Manager • Techniques are available that leverage hardware clustering or replication of the DMgr’s cell configuration to an alternate server. These are described in detail at: https://www.ibm.com/developerworks/websphere/techjournal/1001_webcon/1001_webcon.html • Beginning in version 8.5.5 WebSphere supports High Availability features for the Deployment Manager, using a shared filesystem. WAS installations (including BPM) running on applicable versions can leverage this feature: http://pic.dhe.ibm.com/infocenter/wasinfo/v8r5/topic/com.ibm.websphere.nd.doc/ae/twve_xdsoconfig.html • Because these HA techniques rely on DMgr replication, they apply unchanged to DR scenarios – Generally, we recommend recovering application servers before bringing up a replacement DMgr • A note about logical corruption – data integrity problems that get replicated to the DR environment • We recommend using storage system tooling (for example, FlashCopy) to periodically copy the system state • This can be done at the replica, to avoid interfering with normal operations • If the Primary data and its replica are both corrupted, then state can be restored to a copy made before the corruption • For DR purposes, why can’t I just make one WebSphere ND cell with members running in both of my datacenters? That way, if one datacenter is lost, another can carry the load • See ‘Active/Active Antipattern’ discussion on the following slides
  • 33. Active/Active anti-pattern (Cells Spanning Data Centers) [Diagram: a single cell spanning a Primary Datacenter and a Secondary Datacenter. The AppTarget, Messaging and Support clusters have members on Node1 and Node2 in the primary site (App.mem1-2, Sup.mem1-2, Msg.mem1-2) and on Node3 and Node4 in the secondary site (App.mem3-4, Sup.mem3-4, Msg.mem3-4), with one messaging member labeled A and the other three labeled P. Each site has an IP Sprayer, Web Servers (IHS), DMgr, Node Agents, a User Registry, a Database, SAN storage and an NFS Filesystem (the WAS Logs are shown on NFS); the sites are linked by Consistency Group SAN Replication for Application Data.]
  • 34. Why is this type of topology considered an Anti-Pattern? • Active/Active approaches introduce new complexities that undermine the stability of the system • Issues/Problems Can Propagate From One DC to the Other • This Compromises Redundancy and Resiliency • Worst Case, an Outage Cascades Across Both Data Centers • Frequently these negate the advantages that led the customer to consider the approach in the first place • Increased risk of network instability can lead to partitioned network (‘split brain’) – Independent Transaction “Recovery” in Both Data Centers By HA Manager – The two data centers could move to inconsistent transactional states!! • Increased network latency can limit system performance during normal operations – Latency between the Application Server and its databases – Latency among cluster members communicating via the WebSphere HA Manager component • Desire to automate failover increases risk of false failover & rapid cycling • A system more than 50% utilized introduces the risk that losing a single component will compromise the entire system, turning what could have been a (simple) HA event into a true disaster • In practice, many Active/Active topologies do not deliver Disaster Recovery capability at all: • Attempts to limit latency lead to datacenters physically near each other, increasing the risk that a single disaster will eliminate the entire system • Many disasters arise from human error and data corruption. Tight coupling between DR resources does not provide protection from this type of failure at all • A WAS-ND Cell Spanning Data Centers will actually interfere with Zero RTO • Refer to: – http://www.ibm.com/developerworks/websphere/techjournal/0606_col_alcott/0606_col_alcott.html#sec1d – http://www.ibm.com/developerworks/websphere/techjournal/1004_webcon/1004_webcon.html
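The 50% utilization point above is simple arithmetic, sketched here under explicit assumptions: all sites are the same size, load is spread evenly, and the full load moves to the survivors when one site is lost. The function name `utilization_after_site_loss` and the sample figures are hypothetical, not taken from the presentation.

```python
def utilization_after_site_loss(per_site_utilization: float, sites: int = 2) -> float:
    """Utilization of each surviving site after one equally sized site is lost.

    Assumes the total load is unchanged and is redistributed evenly across
    the remaining sites. A result above 1.0 means the survivors are overloaded.
    """
    if sites < 2:
        raise ValueError("need at least two sites to survive the loss of one")
    return per_site_utilization * sites / (sites - 1)


# Two active datacenters, each running at 60% (hypothetical figure):
print(utilization_after_site_loss(0.60))  # 1.2, i.e. 120% of surviving capacity

# The same two datacenters at exactly 50% just fit on the survivor:
print(utilization_after_site_loss(0.50))  # 1.0
```

Under these assumptions, two active sites above 50% each cannot absorb the loss of either one, which is how a simple HA event turns into a full outage.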
  • 35. What are the recommended alternatives? • Properly plan a High Availability solution distinct from Disaster Recovery • Eliminate single points of failure through redundancy in network and software components • HA features allow rapid and automatic recovery from loss of a single component. Utilize them! • Improve RTO by reducing complexity, scripting operational procedures and running drills • Automate Processes for Repeatability and Consistency – Scripting – “Point and Click” is Not Repeatable • Discipline and Practice are Essential • Well Defined Procedures for Every Contingency – You Do Not Want to Learn During an Outage – Practice Those Procedures – So You Won’t Make Mistakes in a Crisis – Validates that Procedures Actually Work – Practice Backup and Recovery, System Failures, Disaster Recovery, etc. • Goal: Make Daily Operations Boring • Improve electricity distribution via Uninterruptible Power Supply • Utilize application design patterns like loose coupling in order to improve application flexibility • In cases where RTO between 1 and 4 hours is necessary, without the requirement to process new work, consider the Stray Node pattern
  • 36. Is This Different for the Liberty Profile and Liberty Collectives? • No • Same Fundamentals for Effective Redundancy and the Requirements for Isolation and Independence Apply • Though Liberty May Make It Easier to Ignore or Believe That the Fundamentals Don’t Apply
  • 38. Disaster Recovery • Develop a Disaster Recovery Plan • Group Business Needs and Associated Applications into Tiers • Group into tiers based on the hard/soft dollar impact on the organization • Categorize by RPO and RTO. • The top tier likely includes zero data loss and either no downtime or perhaps just a few minutes of downtime • Subsequent tiers have an RTO of 24 hours, then 48 to 72 hours, then perhaps 72 to 96 hours • Essential Part of Any Plan • Who approves DR move/recovery? • Automated site failover is a bad idea o Typically triggering DR is very expensive o You do not want to trigger a DR by accident because of some transient issue – that just makes the situation worse
  • 39. Disaster Recovery Objectives • Recovery Time Objective • How quickly the system will be able to accept traffic after the disaster • Shorter times require progressively more expensive techniques o e.g., a tape backup and restore is relatively inexpensive o e.g., a fully redundant fully operational data center is very expensive • One challenge is detection time • It takes time to determine you are in a disaster state and trigger disaster procedures o While you are deciding if you are down, you are probably missing your SLA. o Does the RTO include detection time?
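The question above, whether the RTO includes detection time, can be made concrete with a short worked example. The sketch below is an assumption-laden illustration: the three-phase breakdown (detection, decision, recovery) and every duration in it are hypothetical, chosen only to show how a plan that looks compliant on recovery time alone can still miss the objective.

```python
from datetime import timedelta


def total_outage(detection: timedelta, decision: timedelta, recovery: timedelta) -> timedelta:
    """Outage as users experience it: detect, decide to declare a disaster, then recover."""
    return detection + decision + recovery


def meets_rto(rto: timedelta, detection: timedelta, decision: timedelta, recovery: timedelta) -> bool:
    """True when the whole outage, not just the recovery procedure, fits within the RTO."""
    return total_outage(detection, decision, recovery) <= rto


# Hypothetical figures for a tier with a 4 hour RTO.
rto = timedelta(hours=4)
detection = timedelta(minutes=45)   # time to realize this is a disaster
decision = timedelta(minutes=30)    # time to obtain approval to activate DR
recovery = timedelta(hours=3)       # time to execute the recovery procedure

print(recovery <= rto)                                  # True: recovery alone fits
print(total_outage(detection, decision, recovery))      # 4:15:00
print(meets_rto(rto, detection, decision, recovery))    # False: the full outage does not
```

Stating up front which of these phases the RTO covers avoids discovering the gap during a real event.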
  • 40. Disaster Recovery Objectives • Recovery Point Objective • How much data you are willing to lose when there is a disaster • Limiting data loss raises costs o e.g., restoring from tape is relatively inexpensive but you'll lose everything since the last backup o e.g., asynchronous replication of data and system state requires significant network bandwidth to prevent falling far behind o e.g., synchronous replication to the backup data center guarantees no data loss but requires a VERY fast and reliable network and will significantly harm performance • Warning: synchronous replication results in increased latency, which means capacity must be increased at all layers
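The trade-off above between asynchronous and synchronous replication can be approximated with two back-of-the-envelope formulas. Both are simplifying assumptions rather than a model of any particular storage product: the first treats unreplicated data as the gap between write rate and link bandwidth over a burst, the second assumes each synchronous commit waits for exactly one network round trip. All names and figures are hypothetical.

```python
def replication_backlog_mb(write_rate_mb_s: float, link_mb_s: float, burst_seconds: float) -> float:
    """Data not yet replicated after a write burst (asynchronous replication).

    This backlog is the data that would be lost if the primary site failed
    at the end of the burst. Zero when the link keeps up with the writes.
    """
    return max(0.0, (write_rate_mb_s - link_mb_s) * burst_seconds)


def sync_commit_latency_ms(local_write_ms: float, round_trip_ms: float) -> float:
    """Commit latency with synchronous replication, assuming one round trip per commit."""
    return local_write_ms + round_trip_ms


# Asynchronous: a 10 minute burst writing 50 MB/s over a 40 MB/s link.
print(replication_backlog_mb(50.0, 40.0, 600.0))  # 6000.0 MB at risk

# Synchronous: a 2 ms local write plus a 10 ms round trip to the remote site.
print(sync_commit_latency_ms(2.0, 10.0))  # 12.0 ms per commit instead of 2.0
```

In this example the synchronous commit takes six times as long as the local write, which is the latency cost that drives the extra capacity mentioned in the warning above.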
  • 41. Disaster Recovery Objectives • Most RTO and RPO goals will deeply impact application and infrastructure architecture and can't be done “after the fact” • e.g., if data is shared across data centers, your database and application design will have to be careful to avoid conflicting database updates and/or tolerate them • e.g., application upgrades have to account for multiple versions of the application running at once which can affect user interface design, database layout, etc. • Extreme RTO and RPO goals tend to conflict • e.g., using synchronous disk replication of data gives you a zero RPO but that means the second system can't be operational, which raises RTO • A Zero RTO *and* a Zero RPO Are Mutually Exclusive Goals
  • 42. Disaster Recovery Testing • The DR Hardware Should Be Put Into Actual Production Usage • Otherwise How Can You Be Sure It Will Work When You REALLY Need It? • A Corollary of Murphy’s Law • The larger the number of Tier 1 machines, the less likely it is that all of them can be successfully restored to operations. • DR Testing Options • “Saturday Afternoon Surprise” – Unannounced DR Test – Only If You Can Tolerate an Outage • Progressively More Realistic and Complex Tests – Startup of Remote Infrastructure – Remote Startup with Simulated Workload – Remote Startup with Production Workload Shift • Other Issues In a Real Disaster • Will your key staff want to travel? • Will they be able to travel?
  • 43. Example (Very High Level) DR Plan • Executive/Management Approval for Activation of DR • Isolate Data Centers • Halt Incoming Network Traffic – Static “Temporarily Unavailable” Web Page • Break Disk Synchronization • Sever Network Links Between Data Centers • Start and Recover the Surviving Data Center • Restore/Recover Hardware and Middleware • Start DB, Messaging and Application Servers • Examine DB and Message provider logs for pending transactions • Recover Pending transactions and messages • Start Accepting New Work in Surviving Data Center • Enable Network
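The plan above is an ordered procedure with an approval gate, which is the kind of thing the earlier slides recommend scripting rather than performing by hand. The sketch below is a minimal illustration of that structure only. The runner `run_dr_plan`, the step list `PLAN`, and the placeholder actions are all hypothetical; a real runbook would replace each placeholder with site-specific commands.

```python
from typing import Callable, List, Tuple

# A step is a name plus an action that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_dr_plan(approved: bool, steps: List[Step]) -> List[str]:
    """Run DR steps strictly in order and return the names of completed steps.

    Refuses to start without explicit approval, and stops at the first
    failing step so that later steps never run against a half-recovered site.
    """
    if not approved:
        raise PermissionError("DR activation requires executive/management approval")
    completed: List[str] = []
    for name, action in steps:
        if not action():
            raise RuntimeError(f"DR step failed: {name}; completed so far: {completed}")
        completed.append(name)
    return completed


def placeholder() -> bool:
    """Stand-in for a real, site-specific procedure (hypothetical)."""
    return True


# Step names follow the slide; every action here is a placeholder.
PLAN: List[Step] = [
    ("Halt incoming network traffic", placeholder),
    ("Break disk synchronization", placeholder),
    ("Sever network links between data centers", placeholder),
    ("Restore/recover hardware and middleware", placeholder),
    ("Start DB, messaging and application servers", placeholder),
    ("Examine DB and message provider logs for pending transactions", placeholder),
    ("Recover pending transactions and messages", placeholder),
    ("Enable network and accept new work", placeholder),
]

for step_name in run_dr_plan(approved=True, steps=PLAN):
    print("done:", step_name)
```

Keeping approval as an explicit argument reflects the earlier point that site failover should be a deliberate human decision, not an automatic trigger.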
  • 44. Other Aspects to Consider • An HA, CA or DR Deployment Architecture is Not a Product Feature. • WAS and the WebSphere portfolio products – Provide HA Features and Functions – Can Be Employed in an HA Architecture – The Appropriate Environment Varies by Customer – One Size Does Not Fit All! • Optimizing WebSphere HA Capabilities into a Robust Deployment • Requires In-depth Understanding – Of Environment – Of Applications – Of Operational Requirements (Service Levels) • Architectural Advice May Require ISSW Assistance
  • 45. Learn from Your Mistakes • Mistakes and failures will occur, learn from them • What separates mediocre organizations from the good and great isn't so much perfection as it is the constant striving to get better – to not repeat mistakes • After every outage perform • Root cause analysis – Capture diagnostic information – Meet as a team including all key players to discuss – Determine precisely what went wrong • Wrong doesn't mean “Bob made an error.” • Find the process flaw that led to the problem – Determine a corrective action that will prevent this from happening again • If you can't, determine what diagnostic information is needed next time this happens and ensure it is collected – Implement that corrective action • All too often this last step isn't done • Verify that the action corrected the problem • A senior manager must own this process
  • 46. Indispensable When Planning for Catastrophe • Think !
  • 50. Thank You Your Feedback is Important! Access the InterConnect 2015 Conference CONNECT Attendee Portal to complete your session surveys from your smartphone, laptop or conference kiosk.
  • 52. Shameless Self Promotion • IBM WebSphere Deployment and Advanced Configuration, by Roland Barcia, Bill Hines, Tom Alcott and Keys Botzum, ISBN: 0131468626
  • 53. Another Recommended Book • IBM WebSphere v5.0 System Administration, by Leigh Williamson, Lavena Chan, Roger Cundiff, Shawn Lauzon and Christopher C. Mitchell, ISBN: 0131446045
  • 54. Licensing Servers as Backup Servers (From IBM Contracts and Practices Database) • The policy is to charge for HOT, and not for WARM or COLD backups. The following are definitions of what constitutes HOT, WARM and COLD backups: • All programs running in backup mode must be under the customer's control, even if running at another enterprise's location. • COLD - a copy of the program may be stored for backup purposes on a machine as long as the program has not been started. • There is no charge for this copy. • WARM - a copy of the program may reside for backup purposes on a machine and is started, but is "idling", and is not doing any work of any kind. • There is no charge for this copy. • HOT - a copy of the program may reside for backup purposes on a machine, is started and is doing work. However, this program must be ordered. • There is a charge for this copy. • "Doing Work" includes, for example, production, development, program maintenance, and testing. It also could include other activities such as mirroring of transactions, updating of files, synchronization of programs, data or other resources (e.g. active linking with another machine, program, data base or other resource, etc.) or any activity or configurability that would allow an active hot-switch or other synchronized switch-over between programs, data bases, or other resources to occur. • Refer to http://www-03.ibm.com/software/sla/sladb.nsf/pdf/policies/$file/Feb-2003-IPLA-backup.pdf for more information