Utilising the Cloud for Disaster Recovery
Craig Scott – Head of ICT Services
South Tyneside College
Supported by AoC
Introduction
Disaster recovery is something that IT managers spend a considerable amount of time planning
and preparing for, in the hope that they will never have to implement those plans. Over the years users
have come to expect IT to be “always on” and available 24/7 to allow them to study or carry out
the duties associated with their job role. These availability and reliability expectations also impact
on disaster recovery provision: it is no longer sufficient to rely on restoration from backup; instead,
redundant hardware and facilities are required. This paper discusses factors that must be considered
when planning for disaster recovery and identifies how cloud services can be used as a disaster
recovery solution.
Determining Project Scope
Disaster Recovery – what is it?
The most important starting point for the project is to define what you mean by “Disaster Recovery”.
To you and your team, is a disaster the failure of a single server? A fire in your data centre? A power
outage affecting your entire site? Or all of the above?
Until you know what you’re trying to protect yourself from, it’s difficult to ensure that you have adequate
processes and procedures in place. A risk-based approach can help you to identify potential disasters, the
impact they will have on your services and the likelihood of their occurrence.
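The risk-based approach described above can be sketched as a simple scoring exercise. The scenarios and the likelihood/impact scores below are purely illustrative assumptions, not figures from any real assessment:

```python
# A minimal sketch of risk-based scoping for DR planning.
# Scenarios and scores (1-5 scales) are illustrative assumptions.

scenarios = [
    # (scenario, likelihood 1-5, impact 1-5)
    ("Single server hardware failure", 5, 2),
    ("Data centre fire", 1, 5),
    ("Site-wide power outage", 2, 4),
]

# Rank by risk score (likelihood x impact), highest first
ranked = sorted(scenarios, key=lambda s: s[1] * s[2], reverse=True)

for name, likelihood, impact in ranked:
    print(f"{name}: risk score {likelihood * impact}")
```

Ranking the scenarios in this way makes it easier to decide where DR effort and budget should be concentrated first.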
Disaster Recovery vs. High Availability
High Availability (HA) is typically used to describe systems which are connected by high-speed, low-
latency links and often have shared components. Many vendors provide failover clustering technologies
that provide high availability solutions, such as Microsoft Windows Failover Clustering, Oracle Real
Application Clusters, etc…
HA solutions are designed to minimise the downtime of business critical services and can protect against
hardware failure of specific components. HA clusters generally offer automated failover with minimal
data loss. Typically the constituent parts of a failover cluster are located in the same data centre, or are all
located on the same LAN (i.e. multiple datacentres within the same building/campus).
[Diagram: High Availability]
As a general rule Disaster Recovery refers to the provision of offsite facilities that are geographically
separate from the primary facilities. A consequence of the geographic separation is the introduction of
higher latency links. The high levels of latency, and potential unreliability of these links makes them
unsuitable for use by many clustering technologies.
The lines between HA and DR do become blurred by some newer technologies, which can be used to
provide the levels of failover and reliability typically associated with HA over WAN links; Microsoft
Exchange Database Availability Groups are a typical example.
Defence in Depth
HA and DR are not mutually exclusive options and can be combined to further reduce the risk of
service outage.
[Diagram: Disaster Recovery]
Objectives
The success of any project is dependent upon clearly defined and understood objectives, without which
it is impossible to measure the success or effectiveness of the project. The exact objectives will vary from
project to project but at a minimum you should consider:
Physical Separation
Based on your risk assessment of the potential disasters what is the minimum level of physical
separation you require between your live and DR systems? Options to consider include:
• Different building
• Different campus
• Different town/city
• Different area of the country
• Different country
• Different continent
Acceptable Downtime
The initial reaction from many IT managers and business managers is that no downtime is acceptable.
However, if the building containing your primary data centre and finance department burns to the
ground, it will take time for the finance team to be relocated to different premises, and time to
find computers for them to use. How quickly, therefore, do you really need to restore access to your
finance system?
Acceptable Data Loss Window
Whilst zero data loss is certainly desirable, as the level of synchronisation between live and DR systems
increases so do the costs, either in terms of the technology required or the bandwidth utilised to maintain
synchronisation.
Databases which handle real-time transactions, such as on-line or face-to-face enrolments, normally
require a small data loss window; ideally the window should be no more than a handful of transactions.
If you lose a day of transactions, can you recreate that data? Does the person who enrolled via your
website know you have lost their data? Do you even know who they are?
For other systems a larger window may be more acceptable: what would be the impact of losing the last
3-4 hours of data from your file servers? Is this any different from someone forgetting to press save and
losing a file?
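One way to sanity-check your plans is to compare each system's replication interval against its acceptable data loss window. The systems, windows and intervals below are illustrative assumptions:

```python
# Sketch: relating replication interval to worst-case data loss.
# Systems, windows and intervals are illustrative assumptions.
from datetime import timedelta

acceptable_loss = {
    "enrolment database": timedelta(minutes=30),
    "file servers": timedelta(hours=4),
}

replication_interval = {
    "enrolment database": timedelta(minutes=15),  # e.g. frequent log shipping
    "file servers": timedelta(hours=24),          # e.g. nightly file sync
}

for system, window in acceptable_loss.items():
    ok = replication_interval[system] <= window
    print(f"{system}: {'meets' if ok else 'misses'} data loss window")
```

In this example the nightly file sync misses the 4-hour window, flagging a decision point: accept the larger window or replicate more often.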
Capacity/Performance
What sort of capacity and performance is acceptable for your DR services? Thought needs to be given as
to whether your DR services need to give your users the same level of performance as your live systems.
Your DR system may introduce new bottlenecks to the mix such as available WAN/internet bandwidth
between DR facilities and users. The amount of expansion capacity and historical capacity also needs to
be considered.
Acceptable Restoration Time
If you have had to activate your DR services, at some point you’ll want to switch back to your live
services. How will you do this? Will the failback result in any downtime?
The answers to many of the questions you will need to ask yourself will vary from system to system.
The Cloud Options
Maintaining DR facilities can be expensive, both in terms of investment in hardware, hardware which
you hope you will never need to use, and time to maintain and administer the DR hardware. Use of the
cloud to host your DR facilities can eliminate or reduce a number of these costs.
Most major cloud providers have globally dispersed, redundant data centres which will generally
be hundreds of miles away from your facilities.
• Infrastructure as a Service (IaaS)
Selection of an IaaS option will remove the need to invest in hardware and construct a secondary
server room/data centre. An IaaS DR solution involves renting sufficient computing resources from
a cloud provider to allow you to create a “virtual data centre” in the cloud. You are then responsible
for creating and maintaining the virtual machines which provide your DR facilities.
• Platform as a Service (PaaS)
With PaaS the cloud provider is responsible for the hardware, operating systems and services.
This removes the need for you to maintain and patch virtual machines. An example of PaaS is
the Microsoft Azure SQL Database service: Microsoft are responsible for the hardware, operating
systems and SQL Server installation; you need only be concerned with your database.
In some cases you may be forced down an IaaS route due to the need to install 3rd party software on a
server, in other cases PaaS may be appropriate. For example, you may need to use IaaS for your finance
system DR as you need to install a 3rd party finance server product but you can use PaaS to provide DR
for your website.
Alternatives to Disaster Recovery - Software as a Service (SaaS)
When looking at the services for which you need to provide DR facilities it is worth asking whether
there is a better way to deliver those services. By moving services such as e-mail from
traditional on-premises hosted solutions to cloud hosting, you obviate the need to invest time and money
in providing DR facilities for those services; the availability and accessibility of those services becomes
the cloud provider’s concern.
Selecting a Cloud Provider
Platform
The cloud is a rapidly expanding area of the IT sector, both in terms of the services
offered and the companies providing those services. Some providers have invested in the development of
proprietary platforms, such as Amazon EC2 or Windows Azure, whilst other providers have developed
services based on “off the shelf” products, such as VMware.
Compatibility
Compatibility between your cloud provider’s platform and your on-premises virtualisation platform can
affect the options available for your data replication strategy. If the two platforms are compatible or can
be managed by the same virtualisation management platform, such as Microsoft System Centre Virtual
Machine Manager, you may be able to move, or replicate, data and virtual machines between your on-
premises solution and your cloud solution.
Compliance
The requirements of the Data Protection Act (1998) are often cited as a barrier to the use of the
cloud, in particular the need to obtain subject consent prior to transferring data outside of the EU. You
should not assume that because a cloud provider is based in the UK or Europe your data will be
stored within the EU.
Most major cloud providers have data centres located within the EU and some allow you to select the
“region” or even individual data centre that will be used to store your data.
Security
Physical
Reputable cloud service providers should be able to provide information on the levels of security
accreditation to which their services and data centres comply. Many providers will be delivering services
to customers in the financial, health care, defence sectors as well as local and national governments and
as such will already comply with extremely stringent security requirements.
Connectivity
For your data to reach the data centres of your chosen cloud provider it will probably need to travel
across the public internet. It is important to ensure that the data is protected in transit.
Most SaaS and PaaS solutions have been developed from the ground up as internet services and will
make use of encrypted protocols to provide secure connectivity; for example, HTTPS to connect to a web-based
SaaS e-mail solution or SFTP to transfer files to a PaaS-hosted website.
IaaS services typically require Virtual Private Networks (VPNs) to connect the hosted virtual machines to
your on-premises LAN. Site-to-site VPNs require a device at both sites to “terminate” the connection,
therefore it is important to confirm that you have a suitable end-point device capable of handling your
end of the connection and that the device will work with your cloud provider’s VPN implementation.
Pricing Model and Contract Offerings
Is it necessary for all of your DR assets to be operational 24x7, or do you simply need them ready and
waiting to be fired up?
Most cloud providers’ pricing is based on the size, allocated storage and hours of usage of a virtual
machine. Applications which are built around an n-tier model will have application servers that host
websites or application software. You may only need to fire up the virtual machines hosting these
application server roles for a few hours a month for testing and patching. Does your cloud provider’s
pricing structure reflect this usage model?
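As a rough illustration of why this matters, compare the cost of an always-on DR server with one powered up only for monthly patching and testing. The hourly rate used here is an assumption, not any provider's actual pricing:

```python
# Sketch: always-on vs on-demand running costs under hourly pricing.
# The rate and hours are illustrative assumptions.

rate_per_hour = 0.08          # assumed hourly VM rate, GBP
hours_in_month = 24 * 30      # approximate month

always_on = rate_per_hour * hours_in_month
# Application-tier DR server powered up only for monthly patching/testing
on_demand = rate_per_hour * 4

print(f"Always on: £{always_on:.2f}/month")
print(f"On demand: £{on_demand:.2f}/month")
```

Even at a modest hourly rate, the gap between the two models is large enough to make the provider's billing granularity a genuine selection criterion.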
Understanding Risk
An analysis of the roles and workloads of your systems will help you to identify the level of risk that the
loss of a system poses and therefore the level of DR protection and effort that it warrants.
Systems are often comprised of multiple servers each fulfilling distinct roles. The impact of loss, and ease
of restoration, will vary depending upon the role of the server.
Suggested roles are listed in the table below:
Data Replication Strategy
Obviously it is necessary for the data in each of your DR systems to be updated regularly and to be no
older than the acceptable data loss window you have identified for that system. It is important to select a
replication method that is appropriate for the level of risk and the acceptable data loss window.
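A simple freshness check of this kind can be scripted to confirm a replica is no older than its window. The timestamps and window below are illustrative:

```python
# Sketch: check a replica's last replication time against the
# acceptable data loss window for that system. Times are illustrative.
from datetime import datetime, timedelta

def replica_is_fresh(last_replicated: datetime, window: timedelta,
                     now: datetime) -> bool:
    """True if the replica is no older than the acceptable window."""
    return now - last_replicated <= window

now = datetime(2013, 6, 1, 9, 0)
# Nightly file-server sync finished at 02:00; window is one working day
print(replica_is_fresh(datetime(2013, 6, 1, 2, 0), timedelta(hours=24), now))
```

Running such a check on a schedule, and alerting when it fails, turns the data loss window from a planning figure into something that is actively monitored.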
Approaches
Application Replication
Many enterprise class applications incorporate their own replication technologies, for example,
Microsoft Exchange Database Availability Groups, Oracle Data Guard, MySQL master/slave replication
etc… Where application replication technologies are available they should be considered as the preferred
option as they are designed to replicate data in a manner that makes sense to the application.
File System Level
In some cases simply copying files from the live systems to the DR systems will suffice to replicate
the data.
Tools such as “robocopy” and “rsync” are able to intelligently determine what differences exist between
source and destination locations and only copy new or changed files to the DR location as well as
removing redundant files from the DR site. Services such as the “Distributed File System” (DFS) built
into Windows server can be used to automate and manage file replication.
It is important to check that a file system copy is appropriate for the type of data being replicated.
Using file system replication to copy the data files of your SQL Server whilst it is running could result
in data corruption.
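To illustrate what mirroring tools such as robocopy (with its mirroring switch) or rsync do, the sketch below copies new or changed files to the DR location and removes redundant ones. Real tools also handle attributes, locked files and retries, so this is an illustration of the idea rather than a replacement for them:

```python
# A minimal sketch of file-system mirroring: copy new/changed files to
# the DR copy and remove files deleted at source. Illustrative only.
import filecmp
import os
import shutil
import tempfile

def mirror(src: str, dst: str) -> None:
    os.makedirs(dst, exist_ok=True)
    src_names = set(os.listdir(src))
    # Remove redundant files from the DR copy
    for name in set(os.listdir(dst)) - src_names:
        target = os.path.join(dst, name)
        if os.path.isdir(target):
            shutil.rmtree(target)
        else:
            os.remove(target)
    # Copy new or changed files only
    for name in src_names:
        s, d = os.path.join(src, name), os.path.join(dst, name)
        if os.path.isdir(s):
            mirror(s, d)
        elif not (os.path.exists(d) and filecmp.cmp(s, d, shallow=False)):
            shutil.copy2(s, d)

# Demonstration with temporary directories
src, dst = tempfile.mkdtemp(), tempfile.mkdtemp()
with open(os.path.join(src, "policy.txt"), "w") as f:
    f.write("DR plan v1")
mirror(src, dst)
print(sorted(os.listdir(dst)))
```

The same caveat from the text applies here: mirroring the data files of a running database in this way risks copying them in an inconsistent state.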
| Role | Description | Data Change Frequency | Ease of Recreating Data | Acceptable Data Loss | Examples |
|---|---|---|---|---|---|
| Data Storage | Servers holding non-transactional data | High | Moderate | Moderate (4 hours) | File servers, mailbox servers etc… where users can recreate documents |
| Database | Database servers | High | Low | Low (30 minutes) | SQL Server, Oracle, MySQL etc… especially on-line systems where it may not be possible to recreate data (i.e. e-registers, on-line enrolment) |
| Application | Servers which do not store volatile data | Low | High | High | Web servers, middle-tier servers etc… static content updated infrequently (i.e. software upgrades, website redesign etc…) |
Virtual Machine Replication
Replication or cloning of entire virtual machines is also a strategy that should be considered. This is
especially useful for cases where all the components of a single system are located on a distinct virtual
machine. This approach should also be considered for application/middle-tier servers where significant
time and effort has been expended customising or configuring the middle-tier components.
Best Approach
Complex systems often consist of multiple servers, each of which has a distinct role within that system.
Consider a student records system: this will probably consist of a database server, two identical
application servers and a client application. Your database will be experiencing constant changes
and you need to ensure that in the event of a disaster you don’t lose any records; on the other hand,
the software on the application servers is updated via a controlled process every 6 months when the
software vendor releases an update. In this scenario it would be appropriate to make use of the database
system’s inbuilt replication technology to protect your database and to use virtual machine replication to
replicate one of the application servers; you might only replicate the virtual machine once a month as it
has a low degree of data volatility.
Software Licences
Typically when you create a virtual machine in the cloud the machine will be based on a template which
has a cost associated with it, usually charged hourly, weekly or monthly. In most cases these prices
include the cost of the licence for the operating system used by the template.
The same usually applies to PaaS, in that the charge for the period will include the licence costs for all the
components of that service. For example, you don’t need to purchase licences for Microsoft SQL Server
to use Microsoft’s Azure SQL Database platform.
| Approach | Pros | Cons | Data Granularity | Recommended For |
|---|---|---|---|---|
| Application | Application aware; transaction rollback; corruption detection; automatic failover | Can be complicated to set up; requires two installations of application software; may require additional licences; may introduce additional overhead on live systems | Variable but appropriate for the application (i.e. database transaction, Active Directory object, e-mail message etc…) | Databases; mailboxes; LDAP (inc. Active Directory) |
| File System | Simple to set up | Excludes open files; requires scripts and/or additional software | File level | File shares |
| VM Replication | Replicates entire server | Can be complicated to set up; lots of data to transfer; servers may require reconfiguration once activated | Virtual machine (though some solutions allow block level) | Application servers; “1 server” systems |
You do need to ensure that you are adequately licensed for any software you install on the virtual
machines you create in the cloud. Consider a scenario where you create a virtual machine to host
Microsoft Exchange Server because you want to use Exchange Database Availability Groups to provide
application-level replication for your e-mail system. In this scenario you probably wouldn’t need to
purchase a licence for Windows Server (as this will be included in the cost you are paying for the virtual
machine), but you will need to buy a licence for the copy of Microsoft Exchange you have installed on
that server.
Some software vendors incorporate provision in their education and volume licensing schemes that
allows you to install additional copies of their software for disaster recovery purposes.
Obviously you don’t want to spend money on licences you don’t need. Try checking the software
vendor’s website for licensing FAQs, contacting the reseller from whom you purchased the software, or
contacting the vendor directly if you are unsure about what you are or aren’t allowed to do with your
existing licences.
Considering Failover
If you have to activate your DR facilities how will your users and client devices know where to find the
systems they need to connect to?
Most modern networks make use of DNS to locate servers and services, though in some cases you may be using
IP addresses to locate services. It is probable that your DR facilities will be on a different IP subnet from
your live systems, and your clients need to be informed of this to allow them to connect to your DR facilities.
Active Directory and DNS
Assuming that you are utilising Microsoft Active Directory (AD) the servers on your DR site will need
access to the AD and associated DNS in order to operate. Therefore it is recommended that you maintain
at least one operational Domain Controller in your DR facilities. This will also provide inherent DR for
your AD and DNS infrastructures without any further work on your part.
IP Address Allocation
If you have chosen to replicate virtual machines to your DR site, do these virtual machines have static IP
addresses assigned? If so you will need to log in to each VM as you bring it online and assign a new IP
address. Consider whether you can use DHCP to assign IP addresses to your servers.
Application Aware Failover
If an application has some form of application-level replication it may also have application-level
failover. Microsoft Exchange Database Availability Groups (DAGs) are one such example: with DAGs the
Exchange client access servers automatically connect to the mailbox server which is hosting the active
database.
Distributed File System (DFS)
Switching to an alternate file server normally involves finding all references to the UNC path of the
failed file server and replacing them with references to the new file server.
DFS allows the creation of a fault tolerant file share containing folders that refer to one or more real file
shares. By configuring an active and inactive referral for each file share, one referencing your live system
and the other your DR system, all you need do to failover is change the referrals appropriately.
DNS for Failover
It is assumed that you have created a Domain Controller in your DR site that is also a DNS server, thus
providing resilience for your DNS. Most of your clients will be using DNS to locate the servers and
services to which they connect, in many cases switching to your DR facilities may involve no more than
changing DNS entries so they point at the DR system.
Consideration needs to be given to the TTL value of the DNS entries as these determine the length of
time your clients will cache the returned DNS data. If your records have a TTL of an hour it could take
that long before some of your clients can access your DR services. You should ensure that the TTL values
for the critical DNS records are set to values that are consistent with your failover objectives.
When planning for DR it is recommended that you review the way your clients currently locate their servers;
where possible try to avoid the use of IP addresses or server names and use DNS alias (CNAME)
records instead. For example, instead of using http://servername.college.ac.uk/ebs create a DNS CNAME for
ebs-live.college.ac.uk which refers to servername.college.ac.uk; that way, if you have to switch to your DR
system all you need do is update the CNAME record.
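The alias approach can be illustrated with a simulated resolution table. The names mirror the ebs-live example from the text; the DR server name is hypothetical and the lookup tables stand in for real DNS records:

```python
# Sketch of CNAME indirection for failover: clients reference a stable
# alias, and failover repoints only the alias. Tables simulate DNS
# records; the DR server name is a hypothetical example.

cnames = {"ebs-live.college.ac.uk": "servername.college.ac.uk"}
addresses = {
    "servername.college.ac.uk": "10.0.1.20",       # live server
    "dr-servername.cloud.example": "172.16.5.20",  # DR replica (assumed)
}

def resolve(name: str) -> str:
    """Follow CNAME records until an address record is found."""
    while name in cnames:
        name = cnames[name]
    return addresses[name]

print(resolve("ebs-live.college.ac.uk"))  # resolves to the live server

# Failover: repoint the alias; clients keep using the same name
cnames["ebs-live.college.ac.uk"] = "dr-servername.cloud.example"
print(resolve("ebs-live.college.ac.uk"))  # now resolves to the DR server
```

In a real deployment the TTL caveat above still applies: clients keep their cached answer until the record's TTL expires, so low TTLs on these critical records are what make the repointing quick.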
Replicated Virtual Machines
In most cases failover of replicated VM’s will be as simple as powering on the VM, checking it has an
appropriate IP address and ensuring that DNS reflects the current IP address.
Where the VM is part of a multi-tier application and you have also failed over database-tier
components you may need to update the application with the new address of the database server. This
process can be simplified through the use of DNS aliases and application-specific redirects. For example,
you might create a DNS alias for “studentrecords-live.college.ac.uk” which points at your live database
server, and use this address when installing/configuring application-tier components; in the
event of a failure all you need to do is change where the DNS alias points.
Network Load Balancers
Network load balancers (NLBs) provide an option for failover of some services; good quality load
balancers will be able to detect server and application failure automatically and redirect traffic. However,
you also need to consider DR for your NLB: if you position an NLB on your live site which is configured
to redirect traffic to your DR site, what will you do if your NLB is out of action?
Planning
Once you’ve carried out your risk assessment you will have a better idea of the disasters that you may
encounter and the probability of each. As you have hopefully realised, you are
more likely to encounter situations where one, or a small number, of related systems have
failed, probably as a result of hardware failure or a software problem. The level of detail involved in your
DR plan should reflect how critical the system is and how quickly it needs to be recovered.
You may have generic processes that apply across multiple systems, for example, if you have multiple
database servers with identical DR processes a single process is probably sufficient.
Whilst it is possible to create detailed scripts and automated procedures that can be used to activate DR
facilities, every disaster tends to be different and needs to be assessed individually. The process to fix a
disaster of type A may in fact make a disaster of type B worse.
The best approach is a scenario-based one: start with the highest-probability, highest-
impact risks and work down to those with the lowest probability and impact.
An important consideration in your planning is who has the authority to declare a “disaster” and invoke
the DR plan. In some cases invoking the DR plan may result in more overall disruption than it would to
leave a particular service offline for an hour while you fix it.
Testing
It is essential to test your DR processes regularly. The scope of testing needs to be considered on a system-
by-system basis; also consider whether you need to test every system. Again, if you have 20 servers with an
identical process, do you need to test them all regularly?
For systems with transparent application-level replication and failover, testing should be straightforward
and can be done regularly. In cases where a failover would be disruptive, is simulating failover sufficient
for the system in question?
Example Implementation
Background
Until the summer of 2011 South Tyneside College (STC) operated across two major campuses (Westoe and
Hebburn) and a third specialist campus (MSTC). STC’s primary data centre was located on the main
campus (Westoe) with a smaller server room at Hebburn; the MSTC had only a single server. Systems had
been established for some time to replicate data and services between Westoe and Hebburn, allowing
either campus to act as a DR site for the other.
For reasons of operational efficiency a decision was taken to close Hebburn. Due to the high cost of
creating the necessary facilities and upgrading the data links it was not feasible to establish DR facilities
for Westoe at the MSTC. A redundant server room in a separate building on the Westoe campus was
refurbished for DR use.
Challenge
The primary data centre supports 46 physical servers and 69 virtual machines, a further 16 physical
servers are located in the secondary server room providing support for DR. The hardware in the
secondary server room had previously been the “live” hardware from Hebburn and was planned for
replacement in summer 2013. Estimated costs for replacing this equipment were expected to be in the
region of £50,000 - £60,000. Examination of the available options indicated that the use of the cloud for
our DR facilities would result in savings of around 10-15% and provide a truly offsite solution. The work
involved would also allow us to gradually migrate a number of live services from on-premises to the
cloud in future, producing further cost savings.
Planning
Numbers
Due to the levels of resilience and HA provided by the equipment in the primary data centre, we only
expect to need to activate the DR facilities in the event of a disaster which renders our main
campus unusable (fire, flood, prolonged power outage etc…). Under these circumstances we anticipate
that the major performance bottleneck will be the available bandwidth of the internet connection(s) used
to connect to the virtual data centre.
Based on this supposition the following criteria were applied to determine if a system or server was
within scope of the project.
• Where multiple load balanced application servers for the same service existed we would only
provide one DR server
• Where we had split large workloads across non-load balanced servers (i.e. file servers) we would
consolidate these workloads on one DR server
• Servers in the DMZ would be excluded where their services duplicated LAN servers which are in scope
• Servers which were used to support physical equipment which would likely be inaccessible during a
disaster would be excluded from scope. This was on the grounds that if our buildings are out
of action, so too will be the equipment they contain; therefore print servers, wi-fi controllers etc… would
not be required.
Analysis of the roles and workloads of our servers indicated that our disaster recovery strategy needed
to support a minimum of 29 servers.
Workloads
Of the 29 systems within project scope we identified 9 database servers and 3 data store servers (file
server, mailbox server, Active Directory). The remaining servers fit into the application server category.
Replication and Failover Strategies
Based on the workloads of the systems in scope, a combination of application-level replication, file-system-level
replication and virtual machine cloning was adopted. For a small number of cases it was recognised that the best
option was to build a new application server in the cloud, due to the comprehensive application-level
functionality provided by that system; for example, Microsoft Exchange Client Access Servers.
Application Level Replication
Application level replication was selected for Active Directory (AD has inherent replication), Microsoft
SQL Server, MySQL Server and Microsoft Exchange. All of these applications have built-in multi-server
replication mechanisms which allowed for recovery windows of less than 15 minutes.
Failover procedures for these systems are either automatic/inherent (i.e. Active Directory, Exchange),
or require a flag to be set within the application to indicate the primary server (SQL Server, MySQL).
In the case of SQL Server and MySQL Server it is also necessary to update the configurations of the
application servers/client applications to reference the DR servers as opposed to the live servers.
File Level Replication
File level replication was used to replicate data from the 4 on-premises file servers to the single cloud
based file server using the built in “robocopy” command and its mirroring/synchronization option. The
synchronization was scheduled to run overnight as a one working day recovery window was deemed
adequate for file services.
As Microsoft DFS is used for all links and paths that reference the file shares on the file servers, failover
involves disabling the referral to the on-premises servers and enabling the referral to the cloud servers.
Virtual Machine Replication – Database Servers
A small number of simple systems, some quite critical, have all their components installed on a single
virtual machine. These applications either do not have a high workload, or do not have scalable
architectures. Systems falling within this category include the payroll system, library management
system, Active Directory Certificate Services and an Oracle Express server used for teaching purposes. For
these systems virtual machine level replication was selected with a nightly replication interval.
Failover requires the virtual machines to be brought on-line; they will automatically register their new IP
address with DNS.
Virtual Machine Replication – Application Servers
The remaining systems all fulfilled application front end/middle tier roles, therefore virtual machine
replication was selected as the replication strategy. As updates and changes to the live servers are
carried out via a controlled change management process a weekly virtual machine refresh was deemed
sufficient.
Failover requires the virtual machines to be brought on-line; they will automatically register their new IP
address with DNS. In some cases it is also necessary to update the database server references to refer to
the DR database servers.
Cloud Provider Selection
Once the workloads, replication and failover strategies had been decided upon a review of the services
offered by various cloud service providers was undertaken.
As it was identified that 60% of the virtual machines required for the DR solution would only need to be
powered up for testing and patching for a couple of hours each month providers with an hourly pricing
model were favoured.
Compatibility with existing systems was also a factor in provider selection. The virtualisation
infrastructure at STC is based on Microsoft Hyper-V (Windows 2008 R2) managed by Microsoft System
Centre Virtual Machine Manager (MSCVMM) 2012. Therefore solutions that offered management
integration with MSCVMM and virtual machine migration from Hyper-V were favoured.
Consideration of the above factors, plus pricing, resulted in the selection of the Microsoft Windows
Azure platform; Microsoft were able to offer favourable educational pricing. However, as we were the
first UK institution to sign up to Azure via an education agreement, we discovered Microsoft’s signup
procedures were not fully developed, which resulted in delays of many months. It should be noted that
we have been assured by Microsoft that these procedures are now fully developed and have been used
successfully by other institutions.
Implementation Process
Implementation of the solution was approached via the following sequence:
1. Establish VPN connectivity
STC uses a pair of Smoothwall UTM-3000 appliances to provide internet content filtering and firewall services. The Smoothwall UTM-3000 supports IPSec site-to-site VPNs, as does Windows Azure. Establishing a site-to-site VPN between the two systems was relatively straightforward.
2. Build and commission Domain Controller in Azure
The first server created in Azure was a Domain Controller to provide Active Directory and DNS services to our other servers. This was accomplished by installing Windows 2008 R2 on a new virtual machine, promoting the server to a Domain Controller and installing the DNS server role.
3. Build database, mailbox and file servers
Servers were built to host these roles and the appropriate application software installed (e.g. Microsoft Exchange, Microsoft SQL Server).
4. Establish Replication
Application level replication was established for:
• Exchange – the DR server was added to the Exchange Database Availability Group and the existing mailbox databases within the DAG had new replication targets added.
• SQL Server – database log shipping was selected as the most appropriate replication method and, using the wizards built into SQL Server Management Studio, new log shipping partnerships were created.
• File Servers – initial replication of file data was accomplished via the "robocopy" command line tool; subsequent replication runs made use of the "/mir" switch to synchronise the data on the replica servers.
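The effect of robocopy's "/mir" switch — copy new or changed files to the replica and delete anything there that no longer exists at the source — can be sketched in a few lines of Python. This is an illustration of the mirroring behaviour only, not the tool we used; the real robocopy additionally preserves NTFS attributes and ACLs, retries on failure and produces logs.

```python
# Sketch of what "robocopy /mir" does: make dst an exact mirror of src.
# Illustrative only -- the real tool also handles NTFS ACLs, retries, logging.
import os
import shutil

def mirror(src: str, dst: str) -> None:
    os.makedirs(dst, exist_ok=True)
    src_names = set(os.listdir(src))
    # Delete anything on the replica that no longer exists at the source.
    for name in os.listdir(dst):
        if name not in src_names:
            path = os.path.join(dst, name)
            shutil.rmtree(path) if os.path.isdir(path) else os.remove(path)
    # Copy new or changed entries from the source.
    for name in src_names:
        s, d = os.path.join(src, name), os.path.join(dst, name)
        if os.path.isdir(s):
            mirror(s, d)
        elif not os.path.exists(d) or os.path.getmtime(s) > os.path.getmtime(d):
            shutil.copy2(s, d)  # copy2 preserves timestamps for change checks
```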
5. Establish virtual machine replication
Virtual machine replication was initially achieved by copying backups of the VHD files of live virtual machines to Azure using the "csupload" command line tool. However, work is ongoing to use System Centre App Controller and System Centre Orchestrator to accomplish these tasks in future.
Future Developments
Partly as a result of our experiences with this project, it is the intention of STC to make significantly more use of cloud computing services. In some cases we have identified that increased adoption of cloud services may in fact increase costs, but it offers us significantly better functionality.
• Office 365
A project is underway to migrate all staff and student e-mail content, 500GB of SharePoint content, and the contents of staff and student “My Documents” folders (approximately 1TB of files) to Office 365.
• Hyper-V Replica
Windows Server 2012 introduced the ability to maintain active/passive replicas of individual virtual machines. An Azure implementation of this technology, which will allow Azure to participate as one side of such a partnership, is in development. Once available, this solution will be used to accomplish VM replication to Azure.
• Server Migration
Work carried out to date has proven that it is feasible and practical for us to host servers in Windows
Azure. Over the next 3 years an increasing proportion of our server infrastructure will be moved
from on-premises hardware to Azure. The migration to Office 365 is the first step of this process as it
eliminates the need for on-premises e-mail, file storage and SharePoint servers.
Appendix 1
A blank worksheet for comparing cloud providers, with one row per provider and operating system and the following columns:
• Virtual Machines – CPU, RAM, HDD and Price/hr for each VM size (Small, Medium, Large)
• Storage – Space (Price/GB per month) and IOPS (Price/million per month)
• Bandwidth – In and Out (Price/GB per month)
• VPN – Price per hour
• Requirements – No. hours per VM per month for each VM size, storage (GB per month), IOPS per month, bandwidth in and out (GB per month) and VPN hours per month
• Est Cost Per Month and Annual Cost
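The arithmetic behind the worksheet's "Est Cost Per Month" column can be sketched as below. All rates and requirement figures are illustrative placeholders, not real provider prices.

```python
# Sketch of the Appendix 1 cost arithmetic: monthly cost from hourly VM
# rates plus storage, bandwidth and VPN charges. All figures below are
# illustrative placeholders, not real provider prices.

def estimate_monthly_cost(vm_hours, vm_rates, storage_gb, storage_rate,
                          egress_gb, egress_rate, vpn_hours, vpn_rate):
    """vm_hours and vm_rates map a size name to hours/month and price/hour."""
    vm_cost = sum(vm_hours[size] * vm_rates[size] for size in vm_hours)
    return (vm_cost + storage_gb * storage_rate
            + egress_gb * egress_rate + vpn_hours * vpn_rate)

monthly = estimate_monthly_cost(
    vm_hours={"small": 8, "medium": 4, "large": 720},         # assumed hours
    vm_rates={"small": 0.05, "medium": 0.10, "large": 0.20},  # assumed GBP/hr
    storage_gb=500, storage_rate=0.05,  # assumed GBP per GB-month
    egress_gb=50, egress_rate=0.08,     # assumed GBP per GB out
    vpn_hours=720, vpn_rate=0.03,       # assumed GBP per hour
)
print(f"Estimated: £{monthly:.2f}/month, £{monthly * 12:.2f}/year")
```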
Appendix 2
Disaster Scope Impact Assessment
A blank worksheet for the risk-based approach described in the introduction. Each row records the disaster scope (College, Campus, Building or Service), an impact assessment (Downtime, Likelihood, Impact, Score), the controls in place, and the residual risk after those controls (Downtime, Likelihood, Impact, Score).
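The Score columns in a worksheet like this are typically the product of the likelihood and impact ratings; the sketch below assumes 1–5 scales for both, which is our assumption rather than a prescribed scheme.

```python
# Sketch of the risk scoring behind the Appendix 2 worksheet.
# The 1-5 scales and the likelihood x impact product are assumptions;
# use whatever scoring scheme your institution has standardised on.

def risk_score(likelihood: int, impact: int) -> int:
    """Score a risk on assumed 1-5 likelihood and impact scales."""
    if not (1 <= likelihood <= 5 and 1 <= impact <= 5):
        raise ValueError("likelihood and impact must be in 1..5")
    return likelihood * impact

# Example: a building fire, unlikely (2) but severe (5), scores 10; after
# controls (sprinklers, a DR site) the impact may drop to 3, scoring 6.
print(risk_score(2, 5), risk_score(2, 3))
```

Comparing the score before and after controls gives the residual risk column and helps prioritise which controls are worth their cost.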
Appendix 3
A blank worksheet for the server workload review, recording for each server: System; Workload; Scope (and, if excluded, Why Not in Scope?); Replication Strategy; Recovery Window; State; Failover; Size (CPU, RAM and Storage); and Operating System.
Appendix 4
A failover plan recording, for each system, the downtime trigger, failover authorisation and the ordered sequence of roles and actions:
System: Student Records (downtime trigger: 30 minutes; failover authorisation: IT Manager)
1. Database: activate standby mirror
2. Application: update HKLM\Software\Adatum\StudentRecords\DatabaseServer
3. Clients: advise users to reboot
System: Finance System (downtime trigger: 4 hours; failover authorisation: IT Manager)
1. Database: activate standby mirror
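A failover plan like the one above can be captured as data so that the sequence, roles and actions are executed, or at least logged, in order. The structure below is an assumed representation for illustration; the registry path mirrors the example in the plan.

```python
# Sketch: represent a failover plan's steps as ordered data and walk them
# in sequence. The structure and step contents are illustrative only.

STUDENT_RECORDS_PLAN = [
    (1, "Database", "Activate standby mirror"),
    (2, "Application",
     r"Update HKLM\Software\Adatum\StudentRecords\DatabaseServer"),
    (3, "Clients", "Advise users to reboot"),
]

def run_failover(plan):
    """Yield each step's description in sequence order."""
    for seq, role, action in sorted(plan):
        yield f"Step {seq} [{role}]: {action}"

for line in run_failover(STUDENT_RECORDS_PLAN):
    print(line)
```

Keeping the runbook as data also makes monthly DR testing easier: the same steps can be replayed against the test environment and each result recorded.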
Association of Colleges
2-5 Stedham Place
London
WC1A 1HU
Telephone: 020 7034 9900
Facsimile: 020 7034 9950
Email: sharedservices@aoc.co.uk
Or visit our web site
www.aoc.co.uk