Utilising the Cloud for
Disaster Recovery
Craig Scott – Head of ICT Services
South Tyneside College
Supported by AoC
Disaster recovery toolkit final version
Introduction
Disaster recovery is something that IT managers spend a considerable amount of time planning
and preparing for, in the hope that they will never have to implement those plans. Over the years users
have come to expect IT to be “always on” and available 24/7 to allow them to study or carry out
the duties associated with their job role. These availability and reliability expectations also affect
disaster recovery provision: it is no longer sufficient to rely on restoration from backup; instead,
redundant hardware and facilities are required. This paper discusses factors that must be considered
when planning for disaster recovery and identifies how cloud services can be used as a disaster
recovery solution.
Determining Project Scope
Disaster Recovery – what is it?
The most important starting point for the project is to define what you mean by “Disaster Recovery”.
To you and your team is a disaster the failure of a single server? A fire in your data centre? A power
outage to your entire site or all of the above?
Until you know what you’re trying to protect yourself from, it’s difficult to ensure that you have adequate
processes and procedures in place. A risk-based approach can help you to identify potential disasters, the
impact they will have on your services and the likelihood of their occurrence.
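As an illustration of the risk-based approach, the sketch below scores each potential disaster as likelihood multiplied by impact and ranks the results. The 1 to 5 scales and the example scores are assumptions made for illustration, not values taken from this paper.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int  # hypothetical scale: 1 (rare) to 5 (almost certain)
    impact: int      # hypothetical scale: 1 (minor) to 5 (severe)

    @property
    def score(self) -> int:
        # Simple risk matrix: likelihood multiplied by impact.
        return self.likelihood * self.impact


def rank_risks(risks):
    """Return the risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda risk: risk.score, reverse=True)


# Example register; the scores are invented for illustration.
risks = [
    Risk("Single server failure", likelihood=4, impact=2),
    Risk("Data centre fire", likelihood=1, impact=5),
    Risk("Site-wide power outage", likelihood=2, impact=4),
]

for risk in rank_risks(risks):
    print(f"{risk.score:>2}  {risk.name}")
```

Ranking in this way gives a starting order for deciding which disasters your processes and procedures need to address first.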
Disaster Recovery vs. High Availability
High Availability (HA) is typically used to describe systems which are connected by high speed low
latency links and often have shared components. Many vendors provide failover clustering technologies
that provide high availability solutions, such as Microsoft Windows Failover Clustering, Oracle Real
Application Clusters, etc…
HA solutions are designed to minimise the downtime of business critical services and can protect against
hardware failure of specific components. HA clusters generally offer automated failover with minimal
data loss. Typically the constituent parts of a failover cluster are located in the same data centre, or are all
located on the same LAN (e.g. multiple data centres within the same building/campus).
[Figure: High Availability]
As a general rule Disaster Recovery refers to the provision of offsite facilities that are geographically
separate from the primary facilities. A consequence of the geographic separation is the introduction of
higher latency links. The high levels of latency, and potential unreliability, of these links make them
unsuitable for use by many clustering technologies.
The lines between HA and DR do become blurred by some newer technologies which can be used to
provide the levels of failover and reliability typically associated with HA over WAN links; Microsoft
Exchange Database Availability Groups are a typical example.
Defence in Depth
HA and DR are not mutually exclusive options and can be combined to further reduce the risk of
service outage.
[Figure: Disaster Recovery]
Objectives
The success of any project is dependent upon clearly defined and understood objectives, without which
it is impossible to measure the success or effectiveness of the project. The exact objectives will vary from
project to project but at a minimum you should consider:
Physical Separation
Based on your risk assessment of the potential disasters what is the minimum level of physical
separation you require between your live and DR systems? Options to consider include:
•	 Different building
•	 Different campus
•	 Different town/city
•	 Different area of the country
•	 Different country
•	 Different continent
Acceptable Downtime
The initial reaction from many IT managers and business managers is that no downtime is acceptable.
However, if the building containing your primary data centre and finance department burns to the
ground, it will take time for the finance team to be relocated to different premises and to
find computers for them to use. How quickly, therefore, do you really need to restore access to your
finance system?
Acceptable Data Loss Window
Whilst zero data loss is certainly desirable, as the level of synchronicity between live and DR systems
increases so do the costs, either in terms of the technology required or the bandwidth utilised to maintain
synchronicity.
Databases which handle real time transactions, such as on-line or face-to-face enrolments, normally
require a small data loss window; ideally the window should be no more than a handful of transactions.
If you lose a day of transactions can you recreate that data? Does the person who enrolled via your
website know you have lost their data? Do you even know who they are?
For other systems a larger window may be acceptable. What would be the impact of losing the last
3-4 hours of data from your file servers? Is this any different from someone forgetting to press save and
losing a file?
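The relationship between replication frequency and the acceptable data loss window can be checked with simple arithmetic. The sketch below assumes that the worst-case loss is one full replication interval plus the time a replication run takes; the function names and the example intervals are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(replication_interval, transfer_time=timedelta(0)):
    """Oldest the DR copy can be: one full interval plus the time a run takes."""
    return replication_interval + transfer_time


def meets_data_loss_window(replication_interval, acceptable_loss,
                           transfer_time=timedelta(0)):
    """True if the worst-case data loss stays within the acceptable window."""
    return worst_case_data_loss(replication_interval, transfer_time) <= acceptable_loss


# Hypothetical systems: (replication interval, acceptable data loss window).
systems = {
    "enrolment database": (timedelta(minutes=15), timedelta(minutes=30)),
    "file server": (timedelta(hours=24), timedelta(hours=4)),
}

for name, (interval, window) in systems.items():
    verdict = "OK" if meets_data_loss_window(interval, window) else "too infrequent"
    print(f"{name}: replicate every {interval}, window {window} -> {verdict}")
```

In this invented example a nightly copy of the file server would not satisfy a 4 hour window, so either the replication interval or the stated window would need to change.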
Capacity/Performance
What sort of capacity and performance is acceptable for your DR services? Thought needs to be given as
to whether your DR services need to give your users the same level of performance as your live systems.
Your DR system may introduce new bottlenecks to the mix such as available WAN/internet bandwidth
between DR facilities and users. The amount of expansion capacity and historical capacity also needs to
be considered.
Acceptable Restoration Time
If you have had to activate your DR services at some point you’ll want to switch back to your live
services. How will you do this? Will the failback result in any downtime?
The answers to many of the questions you will need to ask yourself will vary from system to system.
The Cloud Options
Maintaining DR facilities can be expensive, both in terms of investment in hardware, hardware which
you hope you will never need to use, and time to maintain and administer the DR hardware. Use of the
cloud to host your DR facilities can eliminate or reduce a number of these costs.
Most major cloud providers have globally dispersed, redundant data centres which will generally
be hundreds of miles away from your facilities.
•	 Infrastructure as a Service (IaaS)
	Selection of an IaaS option will remove the need to invest in hardware and construct a secondary
server room/data centre. An IaaS DR solution involves renting sufficient computing resources from
a cloud provider to allow you to create a “virtual data centre” in the cloud. You are then responsible
for creating and maintaining the virtual machines which provide your DR facilities.
•	 Platform as a Service (PaaS)
	With PaaS the cloud provider is responsible for the hardware, operating systems and services.
This removes the need for you to maintain and patch virtual machines. An example of PaaS is
the Microsoft Azure SQL Database service: Microsoft are responsible for the hardware, operating
systems and SQL Server installation, and you need only be concerned with your database.
In some cases you may be forced down an IaaS route due to the need to install 3rd party software on a
server, in other cases PaaS may be appropriate. For example, you may need to use IaaS for your finance
system DR as you need to install a 3rd party finance server product but you can use PaaS to provide DR
for your website.
Alternatives to Disaster Recovery - Software as a Service (SaaS)
When looking at the services for which you need to provide DR facilities it is worth asking the question
of whether there is a better way to deliver those services. By moving services such as e-mail from
traditional on-premises hosted solutions to cloud hosting you obviate the need to invest time and money
in providing DR facilities for those services; the availability and accessibility of those services become
the cloud provider’s concern.
Selecting a Cloud Provider
Platform
The cloud is a growth area within the IT sector that is rapidly expanding, both in terms of services
offered and companies providing those services. Some providers have invested in the development of
proprietary platforms, such as Amazon EC2 or Windows Azure, whilst other providers have developed
services based on “off the shelf” products, such as VMWare.
Compatibility
Compatibility between your cloud provider’s platform and your on-premises virtualisation platform can
affect the options available for your data replication strategy. If the two platforms are compatible or can
be managed by the same virtualisation management platform, such as Microsoft System Centre Virtual
Machine Manager, you may be able to move, or replicate, data and virtual machines between your on-
premises solution and your cloud solution.
Compliance
The requirements of the Data Protection Act (1998) are often cited as being a barrier to the use of the
cloud, in particular the need to obtain subject consent prior to transferring data outside of the EU. You
should not assume that because a cloud provider is based in the UK, or Europe, that your data will be
stored within the EU.
Most major cloud providers have data centres located within the EU and some allow you to select the
“region” or even individual data centre that will be used to store your data.
Security
Physical
Reputable cloud service providers should be able to provide information on the levels of security
accreditation to which their services and data centres comply. Many providers will be delivering services
to customers in the financial, health care, defence sectors as well as local and national governments and
as such will already comply with extremely stringent security requirements.
Connectivity
For your data to reach the data centres of your chosen cloud provider it will probably need to travel
across the public internet. It is important to ensure that the data is protected in transit.
Most SaaS and PaaS solutions have been developed from the ground up as internet services and will
make use of protocols such as SSL & HTTPS to provide secure connectivity, for example HTTPS to connect
to a web based SaaS e-mail solution or SFTP to transfer files to a PaaS hosted website.
IaaS services typically require Virtual Private Networks (VPN) to connect the hosted virtual machines to
your on-premises LAN. Site-to-site VPNs require a device at both sites to “terminate” the connection;
it is therefore important to confirm that you have a suitable end point device capable of handling your
end of the connection and that the device will work with your cloud provider’s VPN implementation.
Pricing Model & Contract Offerings
Is it necessary for all of your DR assets to be operational 24x7, or do you simply need them ready and
waiting to be fired up?
Most cloud providers’ pricing is based on the size, allocated storage and hours of usage of a virtual
machine. Applications which are built around an n-tier model will have application servers that host
websites or application software. You may only need to fire up the virtual machines hosting these
application server roles for a few hours a month for testing and patching. Does your cloud provider’s
pricing structure reflect this usage model?
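The effect of an hourly pricing model on a standby virtual machine can be estimated as follows. This is a rough sketch only: the hourly and storage rates are invented, and it assumes that storage is charged for the whole month whether or not the machine is running, which you should confirm with your provider.

```python
HOURS_PER_MONTH = 730  # average month: 8760 hours / 12


def monthly_vm_cost(hourly_rate, hours_running, storage_gb, storage_rate_per_gb):
    """Charge for the hours the VM runs plus storage held for the whole month."""
    return hourly_rate * hours_running + storage_gb * storage_rate_per_gb


# Hypothetical prices, for illustration only.
HOURLY_RATE = 0.10          # per VM hour
STORAGE_RATE_PER_GB = 0.05  # per GB per month

always_on = monthly_vm_cost(HOURLY_RATE, HOURS_PER_MONTH, 100, STORAGE_RATE_PER_GB)
standby = monthly_vm_cost(HOURLY_RATE, 4, 100, STORAGE_RATE_PER_GB)

print(f"Always on: {always_on:.2f} per month")
print(f"Powered up 4 hours for testing and patching: {standby:.2f} per month")
```

Under these assumed prices the standby machine costs a small fraction of the always-on machine, which is why the charging unit (hourly, weekly or monthly) matters when comparing providers.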
Understanding Risk
An analysis of the roles and workloads of your systems will help you to identify the level of risk that the
loss of a system poses and therefore the level of DR protection and effort that it warrants.
Systems are often comprised of multiple servers each fulfilling distinct roles. The impact of loss, and ease
of restoration, will vary depending upon the role of the server.
Suggested roles are listed in the table below:
Data Replication Strategy
Obviously it is necessary for the data in each of your DR systems to be updated regularly and to be no
older than the acceptable data loss window you have identified for that system. It is important to select a
replication method that is appropriate for the level of risk and acceptable data loss window.
Approaches
Application Replication
Many enterprise class applications incorporate their own replication technologies, for example,
Microsoft Exchange Database Availability Groups, Oracle Data Guard, MySQL master/slave replication
etc… Where application replication technologies are available they should be considered as the preferred
option as they are designed to replicate data in a manner that makes sense to the application.
File System Level
In some cases simply copying files from the live systems to the DR systems will suffice to replicate
the data.
Tools such as “robocopy” and “rsync” are able to intelligently determine what differences exist between
source and destination locations and only copy new or changed files to the DR location as well as
removing redundant files from the DR site. Services such as the “Distributed File System” (DFS) built
into Windows server can be used to automate and manage file replication.
It is important to check that a file system copy is appropriate for the type of data being replicated.
Using file system replication to copy the data files of your SQL Server whilst it is running could result
in data corruption.
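The behaviour described above (copy new or changed files, remove redundant ones) can be sketched with Python's standard library. This is an illustrative one-way mirror, not a replacement for robocopy or rsync: it ignores open files, permissions and entries that are a file on one side and a folder on the other, and the folder names in the demonstration are hypothetical.

```python
import filecmp
import os
import shutil
import tempfile


def mirror(source, destination):
    """One-way mirror of source into destination.

    Copies new or changed files, removes entries that no longer exist in the
    source, and returns (copied, removed) lists of destination paths.
    """
    os.makedirs(destination, exist_ok=True)
    comparison = filecmp.dircmp(source, destination)
    copied, removed = [], []

    # New entries (files or folders) plus files whose contents differ.
    # Files with identical size and timestamp are assumed to be unchanged.
    for name in comparison.left_only + comparison.diff_files:
        src = os.path.join(source, name)
        dst = os.path.join(destination, name)
        if os.path.isdir(src):
            shutil.copytree(src, dst)
        else:
            shutil.copy2(src, dst)  # copy2 preserves timestamps
        copied.append(dst)

    # Entries that exist only at the DR location are redundant.
    for name in comparison.right_only:
        dst = os.path.join(destination, name)
        if os.path.isdir(dst):
            shutil.rmtree(dst)
        else:
            os.remove(dst)
        removed.append(dst)

    # Recurse into folders present on both sides.
    for name in comparison.common_dirs:
        sub_copied, sub_removed = mirror(os.path.join(source, name),
                                         os.path.join(destination, name))
        copied.extend(sub_copied)
        removed.extend(sub_removed)

    return copied, removed


# Demonstration using throwaway folders.
with tempfile.TemporaryDirectory() as root:
    live, dr = os.path.join(root, "live"), os.path.join(root, "dr")
    os.makedirs(live)
    os.makedirs(dr)
    with open(os.path.join(live, "report.txt"), "w") as handle:
        handle.write("current version")
    with open(os.path.join(dr, "obsolete.txt"), "w") as handle:
        handle.write("no longer on the live server")
    copied, removed = mirror(live, dr)
    print(f"copied {len(copied)} item(s), removed {len(removed)} item(s)")
```

Running the mirror a second time with no changes on the live side copies nothing, which is the property that keeps regular replication runs cheap.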
| Role | Description | Data Change Frequency | Ease of Recreating Data | Acceptable Data Loss | Examples |
| --- | --- | --- | --- | --- | --- |
| Data Storage | Servers holding non-transactional data | High | Moderate | Moderate (4 hours) | File servers, mailbox servers etc… where users can recreate documents |
| Database | Database servers | High | Low | Low (30 minutes) | SQL Server, Oracle, MySQL etc… especially on-line systems where it may not be possible to recreate data (e.g. e-registers, on-line enrolment) |
| Application | Servers which do not store volatile data | Low | High | High | Web servers, middle-tier servers etc… static content updated infrequently (e.g. software upgrades, website redesign etc…) |
Virtual Machine Replication
Replication or cloning of entire virtual machines is also a strategy that should be considered. This is
especially useful for cases where all the components of a single system are located on a distinct virtual
machine. This approach should also be considered for application/middle-tier servers where significant
time and effort has been expended customising or configuring the middle-tier components.
Best Approach
Complex systems often consist of multiple servers each of which has a distinct role within that system.
Consider a student records system: this will probably consist of a database server, two identical
application servers and a client application. Your database will be experiencing constant changes
and you need to ensure that in the event of a disaster you don’t lose any records. On the other hand,
the software on the application servers is updated via a controlled process every 6 months when the
software vendor releases an update. In this scenario it would be appropriate to make use of the database
system’s inbuilt replication technology to protect your database and to use virtual machine replication to
replicate one of the application servers. You might only replicate the virtual machine once a month as it
has a low degree of data volatility.
Software Licences
Typically when you create a virtual machine in the cloud the machine will be based on a template which
has a cost associated with it, usually charged hourly, weekly or monthly. In most cases these prices
include the cost of the licence for the operating system used by the template.
The same usually applies to PaaS in that the charge for the period will include the licence costs for all the
components of that service. For example, you don’t need to purchase licences for Microsoft SQL Server
to use Microsoft’s Azure SQL Database platform.
| Approach | Pros | Cons | Data Granularity | Recommended For |
| --- | --- | --- | --- | --- |
| Application | Application aware; transaction rollback; corruption detection; automatic failover | Can be complicated to set up; requires two installations of application software; may require additional licences; may introduce additional overhead on live systems | Variable but appropriate for application (e.g. database transaction, Active Directory object, e-mail message etc…) | Databases; mailboxes; LDAP (inc Active Directory) |
| File System | Simple to set up | Excludes open files; requires scripts and/or additional software | File level | File shares |
| VM Replication | Replicates entire server | Can be complicated to set up; lots of data to transfer; servers may require reconfiguration once activated | Virtual machine (though some solutions allow block level) | Application servers; “1 server” systems |
You do need to ensure that you are adequately licensed for any software you install on the virtual
machines you create in the cloud. Consider a scenario where you create a virtual machine to host
Microsoft Exchange Server because you want to use Exchange Database Availability Groups to provide
application level replication for your e-mail system. In this scenario you probably wouldn’t need to
purchase a licence for Windows Server (as this will be included in the cost you are paying for the virtual
machine) but you will need to buy a licence for the copy of Microsoft Exchange you have installed on
that server.
Some software vendors incorporate provision in their education and volume licensing schemes that
allows you to install additional copies of their software for disaster recovery purposes.
Obviously you don’t want to spend money on licences you don’t need. Try checking the software
vendor’s website for licensing FAQs, contacting the retailer from whom you purchased the software or
contacting the vendor directly if you are unsure about what you are or aren’t allowed to do with your
existing licences.
Considering Failover
If you have to activate your DR facilities how will your users and client devices know where to find the
systems they need to connect to?
Most modern networks make use of DNS to locate servers and services, although in some cases you may be
using IP addresses to locate services. It is probable that your DR facilities will be on a different IP subnet from
your live systems, so your clients need to be informed of this to allow them to connect to your DR facilities.
Active Directory & DNS
Assuming that you are utilising Microsoft Active Directory (AD) the servers on your DR site will need
access to the AD and associated DNS in order to operate. Therefore it is recommended that you maintain
at least one operational Domain Controller in your DR facilities. This will also provide inherent DR for
your AD and DNS infrastructures without any further work on your part.
IP Address Allocation
If you have chosen to replicate virtual machines to your DR site do these virtual machines have static IP
addresses assigned? If so you will need to log in to each VM as you bring it online and assign a new IP
address. Consider whether you can use DHCP to assign IP addresses to your servers.
Application Aware Failover
If an application has some form of application level replication it may also have application level
failover. Microsoft Exchange Database Availability Groups (DAGs) are one such example: with DAGs the
Exchange client access servers automatically connect to the mailbox server which is hosting the active
database.
Distributed File System (DFS)
Switching to an alternate file server normally involves finding all references to the UNC path of the
failed file server and replacing them with references to the new file server.
DFS allows the creation of a fault tolerant file share containing folders that refer to one or more real file
shares. By configuring an active and inactive referral for each file share, one referencing your live system
and the other your DR system, all you need do to failover is change the referrals appropriately.
DNS for Failover
It is assumed that you have created a Domain Controller in your DR site that is also a DNS server, thus
providing resilience for your DNS. Most of your clients will be using DNS to locate the servers and
services to which they connect, in many cases switching to your DR facilities may involve no more than
changing DNS entries so they point at the DR system.
Consideration needs to be given to the TTL value of the DNS entries as these determine the length of
time your clients will cache the returned DNS data. If your records have a TTL of an hour it could take
that long before some of your clients can access your DR services. You should ensure that the TTL values
for the critical DNS records are set to values that are consistent with your failover objectives.
When planning for DR it is recommended that you review the way your clients currently locate their servers.
Where possible, try to avoid the use of IP addresses or server names and use DNS alias (CNAME)
records. For example, instead of using http://servername.college.ac.uk/ebs, create a DNS CNAME for
ebs-live.college.ac.uk which refers to servername.college.ac.uk. That way, if you have to switch to your DR
system all you need do is update the CNAME record.
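The effect of a CNAME-based failover can be modelled with a toy lookup table. The sketch below is not a DNS client; it simply follows alias records held in a dictionary. The DR host name and both IP addresses are invented for illustration.

```python
def resolve(name, records, max_hops=8):
    """Follow CNAME records until an A record is reached; return its address."""
    for _ in range(max_hops):
        record_type, value = records[name]
        if record_type == "A":
            return value
        name = value  # CNAME: look up the target name next
    raise RuntimeError("too many CNAME hops (possible alias loop)")


# Hypothetical zone data: the DR host name and both addresses are invented.
records = {
    "ebs-live.college.ac.uk": ("CNAME", "servername.college.ac.uk"),
    "servername.college.ac.uk": ("A", "10.0.0.10"),
    "servername-dr.college.ac.uk": ("A", "10.1.0.10"),
}

print(resolve("ebs-live.college.ac.uk", records))  # address of the live server

# Failover: repoint the alias; clients keep using the same name.
records["ebs-live.college.ac.uk"] = ("CNAME", "servername-dr.college.ac.uk")
print(resolve("ebs-live.college.ac.uk", records))  # address of the DR server
```

Only one record changes during failover; nothing that refers to ebs-live.college.ac.uk needs to be edited. In a real deployment clients would only see the change once their cached copy of the record has expired, as governed by its TTL.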
Replicated Virtual Machines
In most cases failover of replicated VMs will be as simple as powering on the VM, checking it has an
appropriate IP address and ensuring that DNS reflects the current IP address.
Where the VM is a part of a multi-tier application and you have also failed over database tier
components you may need to update the application with the new address of the database server. This
process can be simplified through the use of DNS aliases and application specific redirects. For example,
you might create a DNS alias for “studentrecords-live.college.ac.uk” which points at your live database
server and then use this address when installing/configuring application-tier components. In the
event of failure all you need to do is change where the DNS alias points.
Network Load Balancers
Network load balancers (NLB) provide an option for failover of some services; good quality load
balancers will be able to detect server and application failure automatically and redirect traffic. However,
you also need to consider DR for your NLB: if you position an NLB on your live site which is configured
to redirect traffic to your DR site, what will you do if your NLB is out of action?
Planning
Once you’ve carried out your risk assessment you will have a better idea of the disasters that you may
encounter and the probability of each. As you have hopefully realised, you are
probably more likely to encounter situations where one, or a small number, of related systems have
failed, probably as a result of hardware failure or software problem. The level of detail involved in your
DR plan should reflect how critical the system is and how quickly it needs to be recovered.
You may have generic processes that apply across multiple systems, for example, if you have multiple
database servers with identical DR processes a single process is probably sufficient.
Whilst it is possible to create detailed scripts and automated procedures that can be used to activate DR
facilities, every disaster tends to be different and needs to be assessed individually. The process to fix a
disaster of type A may in fact make a disaster of type B worse.
The best approach is scenario based: start with the highest probability, highest
impact risks and work down to those with the lowest probability and impact.
An important consideration in your planning is who has the authority to declare a “disaster” and invoke
the DR plan. In some cases invoking the DR plan may result in more overall disruption than
leaving a particular service offline for an hour while you fix it.
Testing
It is essential to test your DR processes regularly. The scope of testing needs to be considered on a system
by system basis. Also consider whether you need to test every system: again, if you have 20 servers with an
identical process, do you need to test them all regularly?
For systems with transparent application level replication and failover, testing should be straightforward
and can be done regularly. In cases where a failover would be disruptive, is simulating failover sufficient
for the system in question?
Example Implementation
Background
Until the summer of 2011 South Tyneside College (STC) operated across two major campuses (Westoe &
Hebburn) and a third specialist campus (MSTC). STC’s primary data centre was located on the main
campus (Westoe) with a smaller server room at Hebburn; the MSTC has only a single server. Systems had
been established for some time to replicate data and services between Westoe and Hebburn, allowing
either campus to act as DR site for the other.
For reasons of operational efficiency a decision was taken to close Hebburn. Due to the high cost of
creating the necessary facilities and upgrading the data links it was not feasible to establish DR facilities
for Westoe at the MSTC. A redundant server room in a separate building on the Westoe campus was
refurbished for DR use.
Challenge
The primary data centre supports 46 physical servers and 69 virtual machines, a further 16 physical
servers are located in the secondary server room providing support for DR. The hardware in the
secondary server room had previously been the “live” hardware from Hebburn and was planned for
replacement in summer 2013. Estimated costs for replacing this equipment were expected to be in the
region of £50,000 - £60,000. Examination of the available options indicated that the use of the cloud for
our DR facilities would result in savings of around 10-15% and provide a truly offsite solution. The work
involved would also allow us to gradually migrate a number of live services from on-premises to the
cloud in future, producing further cost savings.
Planning
Numbers
Due to the levels of resilience and HA provided by the equipment in the primary data centre,
we only expect to need to activate the DR facilities in the event of a disaster which renders our main
campus unusable (fire, flood, prolonged power outage etc…). Under these circumstances we anticipate
that the major performance bottleneck will be the available bandwidth of the internet connection(s) used
to connect to the virtual data centre.
Based on this supposition the following criteria were applied to determine if a system or server was
within scope of the project.
•	Where multiple load balanced application servers for the same service existed we would only
provide one DR server
•	Where we had split large workloads across non-load balanced servers (i.e. file servers) we would
consolidate these workloads on one DR server
•	 Servers in DMZ would be excluded where these services duplicated LAN servers which are in scope
•	Servers which were used to support physical equipment which would likely be inaccessible during a
disaster would be excluded from scope. This was based on the grounds that if our buildings are out
of action so will be the equipment they contain therefore print servers, wi-fi controllers etc… would
not be required.
Analysis of the roles and workloads of our servers indicated that our disaster recovery strategy needed
to support a minimum of 29 servers.
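The scoping criteria above lend themselves to a simple filter over a server inventory. The sketch below is a hypothetical illustration: the inventory, the field names and the resulting count are invented and do not reproduce the STC analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    service: str
    load_balanced: bool = False               # one of several identical servers
    consolidatable: bool = False              # split workload, e.g. file servers
    dmz_duplicate_of_lan: bool = False        # DMZ copy of an in-scope LAN service
    supports_physical_equipment: bool = False  # print servers, wi-fi controllers


def dr_servers_required(inventory):
    """Count the DR servers needed after applying the scoping criteria."""
    services_already_counted = set()
    required = 0
    for server in inventory:
        # Excluded outright: DMZ duplicates and servers tied to on-site equipment.
        if server.dmz_duplicate_of_lan or server.supports_physical_equipment:
            continue
        # Load balanced or consolidatable services need only one DR server each.
        if server.load_balanced or server.consolidatable:
            if server.service in services_already_counted:
                continue
            services_already_counted.add(server.service)
        required += 1
    return required


# Invented inventory for illustration.
inventory = [
    Server("web1", "website", load_balanced=True),
    Server("web2", "website", load_balanced=True),
    Server("fs1", "file shares", consolidatable=True),
    Server("fs2", "file shares", consolidatable=True),
    Server("sql1", "student records database"),
    Server("print1", "printing", supports_physical_equipment=True),
    Server("dmz-web", "website", dmz_duplicate_of_lan=True),
]

print(dr_servers_required(inventory))
```

Keeping the criteria in one function makes it straightforward to re-run the count as the live estate changes.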
Workloads
Of the 29 systems within project scope we identified 9 database servers and 3 data store servers (file
server, mailbox & Active Directory). The remaining servers fit into the application server category.
Replication & Failover Strategies
Based on the workloads of the systems in scope a combination of application level, file system level
and virtual machine cloning was adopted. For a small number of cases it was recognised that the best
option was to build a new application server in the cloud due to the comprehensive application level
functionality provided by that system, for example, Microsoft Exchange Client Access Servers.
Application Level Replication
Application level replication was selected for Active Directory (AD has inherent replication), Microsoft
SQL Server, MySQL Server and Microsoft Exchange. All of these applications have built in multi-server
replication mechanisms which allowed for recovery windows of less than 15 minutes.
Failover procedures for these systems are either automatic/inherent (i.e. Active Directory & Exchange),
or require a flag to be set within the application to indicate the primary server (SQL Server & MySQL).
In the case of SQL Server and MySQL Server it is also necessary to update the configurations of the
application servers/client applications to reference the DR servers as opposed to the live servers.
File Level Replication
File level replication was used to replicate data from the 4 on-premises file servers to the single cloud
based file server using the built in “robocopy” command and its mirroring/synchronization option. The
synchronization was scheduled to run overnight as a one working day recovery window was deemed
adequate for file services.
As Microsoft DFS is used in all links and paths that reference the file shares on the file servers, failover
involves disabling the referral to the on-premises servers and enabling the referral to the cloud servers.
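A scheduled mirroring job of the kind described above might be wrapped as follows. This is a sketch under stated assumptions rather than the script used at STC: the share paths, log location and retry settings are hypothetical. It relies on robocopy's documented /MIR, /R, /W and /LOG switches and on its convention that exit codes of 8 or above indicate that at least one copy failed.

```python
import subprocess


def build_robocopy_command(source, destination, log_file=None):
    """Build a robocopy command line that mirrors source to destination."""
    command = [
        "robocopy", source, destination,
        "/MIR",   # mirror: copy changes and delete files removed from source
        "/R:2",   # retry a failed copy twice (assumed setting)
        "/W:5",   # wait 5 seconds between retries (assumed setting)
    ]
    if log_file:
        command.append(f"/LOG:{log_file}")
    return command


def robocopy_succeeded(exit_code):
    """Robocopy exit codes below 8 mean no copy failures were reported."""
    return exit_code < 8


def run_mirror(source, destination, log_file=None):
    """Run the mirror (Windows only) and report whether it succeeded."""
    result = subprocess.run(build_robocopy_command(source, destination, log_file),
                            check=False)
    return robocopy_succeeded(result.returncode)


# Hypothetical paths, shown without executing robocopy.
print(" ".join(build_robocopy_command(r"\\fileserver1\shares", r"D:\shares",
                                      r"C:\logs\dr-mirror.log")))
```

Note that robocopy returns non-zero codes for successful runs that copied files, so a scheduler that treats any non-zero exit code as failure would raise false alarms without a check such as `robocopy_succeeded`.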
Virtual Machine Replication – Database Servers
A small number of simple systems, some quite critical, have all their components installed on a single
virtual machine. These applications either do not have a high workload, or do not have scalable
architectures. Systems falling within this category include the payroll system, library management
system, active directory certificate services and an Oracle Express server used for teaching purposes. For
these systems virtual machine level replication was selected with a nightly replication interval.
Failover requires the virtual machines to be brought on-line; they will then automatically register their new IP
address with DNS.
Virtual Machine Replication – Application Servers
The remaining systems all fulfilled application front end/middle tier roles, therefore virtual machine
replication was selected as the replication strategy. As updates and changes to the live servers are
carried out via a controlled change management process a weekly virtual machine refresh was deemed
sufficient.
Failover requires the virtual machines to be brought on-line; they will then automatically register their new IP
address with DNS. In some cases it is also necessary to update the database server references to refer to
the DR database servers.
Cloud Provider Selection
Once the workloads, replication and failover strategies had been decided upon a review of the services
offered by various cloud service providers was undertaken.
As it was identified that 60% of the virtual machines required for the DR solution would only need to be
powered up for testing and patching for a couple of hours each month providers with an hourly pricing
model were favoured.
Compatibility with existing systems was also a factor in provider selection. The virtualisation
infrastructure at STC is based on Microsoft Hyper-V (Windows 2008 R2) managed by Microsoft System
Centre Virtual Machine Manager (MSCVMM) 2012. Therefore solutions that offered management
integration with MSCVMM and virtual machine migration from Hyper-V were favoured.
Consideration of the above factors, plus pricing, resulted in the selection of the Microsoft Windows
Azure platform, as Microsoft were able to offer favourable educational pricing. However, as we were the
first UK institution to sign up to Azure via an education agreement we discovered Microsoft’s signup
procedures were not fully developed, which resulted in delays of many months. It should be noted that
we have been assured by Microsoft that these procedures are now fully developed and have been used
successfully by other institutions.
Implementation Process
Implementation of the solution was approached via the following sequence:
1.	 Establish VPN connectivity
	STC uses a pair of Smoothwall UTM-3000 appliances to provide internet content filtering and
firewall services. The Smoothwall UTM-3000 supports IPSec site-to-site VPNs, as does Windows
Azure. Establishing a site-to-site VPN between the two systems was relatively straightforward.
2.	 Build and commission Domain Controller in Azure
	The first server created in Azure was a Domain Controller to provide Active Directory and DNS
services to our other servers. This was accomplished by installing Windows Server 2008 R2 on
a new virtual machine, promoting that server to a Domain Controller and installing the
DNS server role.
3.	 Build database, mailbox and file servers
	Servers were built to host these roles and the appropriate application software was installed (e.g.
Microsoft Exchange, Microsoft SQL Server).
4.	 Establish Replication
	 Application level replication was established for:
	 •	Exchange – the DR server was added to the Exchange Database Availability Group (DAG) and each
existing mailbox database within the DAG had a new replication target added.
	 •	SQL Server – database log shipping was selected as the most appropriate replication method and
new log shipping partnerships were created using the wizards built into SQL Server Management
Studio.
	•	File Servers – initial replication of file data was accomplished via the “robocopy” command line
tool; subsequent replication runs made use of the “/mir” switch to synchronise the data on the
replica servers.
5.	 Establish virtual machine replication
	Virtual machine replication was initially achieved by copying backups of the VHD files of
live virtual machines to Azure using the “csupload” command line tool. However, work is ongoing
to use System Center App Controller and System Center Orchestrator to accomplish these tasks in
future.
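The file server replication described in step 4 could be scripted along the following lines. This is a sketch only: the function names are hypothetical, and the retry settings and log file are illustrative choices rather than STC's configuration. `/MIR`, `/R`, `/W` and `/LOG` are standard robocopy switches; note that `/MIR` also deletes files on the replica that no longer exist on the source.

```python
import subprocess


def build_robocopy_command(source: str, replica: str, log_file: str) -> list[str]:
    """Build the argument list for a mirrored robocopy run."""
    return [
        "robocopy", source, replica,
        "/MIR",          # mirror the directory tree, including deletions
        "/R:2", "/W:5",  # retry failed copies twice, waiting 5 seconds between tries
        f"/LOG:{log_file}",
    ]


def replication_succeeded(exit_code: int) -> bool:
    """Robocopy exit codes below 8 mean no files failed to copy."""
    return exit_code < 8


def replicate(source: str, replica: str, log_file: str) -> bool:
    """Run robocopy (Windows only) and report whether the run succeeded."""
    command = build_robocopy_command(source, replica, log_file)
    result = subprocess.run(command, check=False)
    return replication_succeeded(result.returncode)
```

Robocopy reports success with non-zero exit codes, so `check=False` is used and the exit code is interpreted explicitly rather than treating any non-zero value as a failure.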
Future Developments
Partly as a result of our experiences with this project, it is the intention of STC to make significantly
more use of cloud computing services. In some cases we have identified that increased adoption of cloud
services may in fact increase costs, but offers us significantly better functionality.
•	 Office 365
	A project is underway to migrate all staff and student e-mail content, 500GB of SharePoint content, and
the contents of staff and student “My Documents” folders (approximately 1TB of files) to Office 365.
•	 Hyper-V Replica
	Windows Server 2012 introduced the ability to have active/passive replicas of individual virtual
machines. An Azure implementation of this technology, which will allow Azure to participate as one
side of this partnership, is in development. Once available, this solution will be used to accomplish
VM replication to Azure.
•	 Server Migration
	Work carried out to date has proven that it is feasible and practical for us to host servers in Windows
Azure. Over the next 3 years an increasing proportion of our server infrastructure will be moved
from on-premises hardware to Azure. The migration to Office 365 is the first step of this process as it
eliminates the need for on-premises e-mail, file storage and SharePoint servers.
Appendix 1
Column headings of the provider comparison table:
Provider
OS
Virtual Machines
	Small: CPU, RAM, HDD, Price/hr
	Medium: CPU, RAM, HDD, Price/hr
	Large: CPU, RAM, HDD, Price/hr
Storage
	Space: Price/GB per month
	IOPS: Price/million per month
Bandwidth
	In: Price/GB per month
	Out: Price/GB per month
VPN
	Price Per Hour
Requirements
	Small VMs: No. Hours Per VM Per Month
	Medium VMs: No. Hours Per VM Per Month
	Large VMs: No. Hours Per VM Per Month
	Storage: GB per month, IOPS per month
	Bandwidth: In GB per month, Out GB per month
	VPN: Hours Per Month
Est Cost Per Month
Annual Cost
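Appendix 2
Column headings of the risk assessment table:
Top-level columns: Disaster, Scope, Impact Assessment, Controls, Residual Risk
Sub-columns, in order: College, Campus, Building, Service; Downtime, Likelihood, Impact, Score;
Downtime, Likelihood, Impact, Score
The template contains eight blank rows, each with both Score cells showing 0.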
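The Score columns in this template are presumably calculated from the likelihood and impact ratings. A common convention, assumed here rather than taken from the toolkit, is to multiply the two. The sketch below follows that convention; the class names, the 1 to 5 rating scales and the example row are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RiskRating:
    """One likelihood/impact pair, each rated on an assumed 1-5 scale."""
    likelihood: int
    impact: int

    @property
    def score(self) -> int:
        # Assumed convention: score = likelihood x impact.
        return self.likelihood * self.impact


@dataclass
class RiskEntry:
    disaster: str
    inherent: RiskRating  # rating before controls are applied
    residual: RiskRating  # rating after controls are applied

    @property
    def reduction(self) -> int:
        """How far the controls lower the score."""
        return self.inherent.score - self.residual.score


# Hypothetical example row.
entry = RiskEntry(
    disaster="Fire in data centre",
    inherent=RiskRating(likelihood=2, impact=5),
    residual=RiskRating(likelihood=2, impact=2),
)
print(entry.inherent.score, entry.residual.score, entry.reduction)
```

Under this convention an unrated row (likelihood and impact both zero) scores 0, which would be consistent with the zeros shown in the blank template rows.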
Appendix 3
Column headings of the server inventory table, in order:
Server, System, Workload, Scope, Why Not in Scope?, Replication Strategy, Recovery Window, State,
Failover, Size, CPU, RAM, Storage, Operating System
Appendix 4
System | Downtime Trigger | Failover Authorisation | Sequence | Role | Action
Student Records | 30 minutes | IT Manager | 1 | Database | Activate standby mirror
 | | | 2 | Application | Update HKLM\Software\Adatum\StudentRecords\DatabaseServer
 | | | 3 | Clients | Advise users to reboot
Finance System | 4 hours | IT Manager | 1 | Database | Activate standby mirror
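A failover plan of this shape lends itself to a simple data structure. The sketch below is illustrative only: the class and method names are hypothetical, the step descriptions paraphrase the Student Records rows, and treating the trigger as "downtime of at least this long" is an assumption.

```python
from dataclasses import dataclass, field
from datetime import timedelta


@dataclass
class FailoverStep:
    sequence: int
    role: str
    action: str


@dataclass
class FailoverPlan:
    system: str
    downtime_trigger: timedelta
    authorised_by: str
    steps: list[FailoverStep] = field(default_factory=list)

    def should_fail_over(self, downtime: timedelta) -> bool:
        """True once the outage has lasted at least as long as the trigger."""
        return downtime >= self.downtime_trigger

    def ordered_steps(self) -> list[FailoverStep]:
        """Steps in the order they must be carried out."""
        return sorted(self.steps, key=lambda step: step.sequence)


student_records = FailoverPlan(
    system="Student Records",
    downtime_trigger=timedelta(minutes=30),
    authorised_by="IT Manager",
    steps=[
        # Deliberately listed out of order to show the sorting.
        FailoverStep(3, "Clients", "Advise users to reboot"),
        FailoverStep(1, "Database", "Activate standby mirror"),
        FailoverStep(2, "Application", "Update database server registry value"),
    ],
)

if student_records.should_fail_over(timedelta(minutes=45)):
    for step in student_records.ordered_steps():
        print(step.sequence, step.role, step.action)
```

Keeping the sequence number on each step, rather than relying on list order, mirrors the table and makes it harder to run the client step before the database is available.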
Association of Colleges
2-5 Stedham Place
London
WC1A 1HU
Telephone: 020 7034 9900
Facsimile: 020 7034 9950
Email: sharedservices@aoc.co.uk
Or visit our web site
www.aoc.co.uk

 
Practical Research 1: Lesson 8 Writing the Thesis Statement.pptx
Practical Research 1: Lesson 8 Writing the Thesis Statement.pptxPractical Research 1: Lesson 8 Writing the Thesis Statement.pptx
Practical Research 1: Lesson 8 Writing the Thesis Statement.pptx
 
How to Add a many2many Relational Field in Odoo 17
How to Add a many2many Relational Field in Odoo 17How to Add a many2many Relational Field in Odoo 17
How to Add a many2many Relational Field in Odoo 17
 
How to Manage Cross-Selling in Odoo 17 Sales
How to Manage Cross-Selling in Odoo 17 SalesHow to Manage Cross-Selling in Odoo 17 Sales
How to Manage Cross-Selling in Odoo 17 Sales
 
Education and training program in the hospital APR.pptx
Education and training program in the hospital APR.pptxEducation and training program in the hospital APR.pptx
Education and training program in the hospital APR.pptx
 

Disaster recovery toolkit final version

  • 1. Utilising the Cloud for Disaster Recovery Craig Scott – Head of ICT Services South Tyneside College Supported by AoC
  • 3. Introduction: Disaster recovery is something that IT managers spend a considerable amount of time planning and preparing for, in the hope that they will never have to implement those plans. Over the years users have come to expect IT to be “always on” and available 24/7 so that they can study or carry out the duties associated with their job role. These availability and reliability expectations also affect disaster recovery provision: it is no longer sufficient to rely on restoration from backup; instead, redundant hardware and facilities are required. This paper discusses factors that must be considered when planning for disaster recovery and identifies how cloud services can be used as a disaster recovery solution. Determining Project Scope. Disaster Recovery – what is it? The most important starting point for the project is to define what you mean by “Disaster Recovery”. To you and your team, is a disaster the failure of a single server? A fire in your data centre? A power outage to your entire site? Or all of the above? Until you know what you’re trying to protect yourself from, it’s difficult to ensure that you have adequate processes and procedures in place. A risk-based approach can help you to identify potential disasters, the impact they will have on your services and the likelihood of their occurrence. Disaster Recovery vs. High Availability: High Availability (HA) is typically used to describe systems which are connected by high-speed, low-latency links and often have shared components. Many vendors provide failover clustering technologies that deliver high availability, such as Microsoft Windows Failover Clustering and Oracle Real Application Clusters. HA solutions are designed to minimise the downtime of business-critical services and can protect against hardware failure of specific components. HA clusters generally offer automated failover with minimal data loss. 
Typically the constituent parts of a failover cluster are located in the same data centre, or are all located on the same LAN (i.e. multiple data centres within the same building/campus). (Figure: High Availability)
  • 4. As a general rule, Disaster Recovery refers to the provision of offsite facilities that are geographically separate from the primary facilities. A consequence of the geographic separation is the introduction of higher-latency links. The high latency and potential unreliability of these links make them unsuitable for use by many clustering technologies. The lines between HA and DR are blurred by some newer technologies which can provide the levels of failover and reliability typically associated with HA over WAN links, Microsoft Exchange Database Availability Groups being a typical example. Defence in Depth: HA and DR are not mutually exclusive options and can be combined to further reduce the risk of service outage. (Figure: Disaster Recovery)
  • 5. Objectives: The success of any project depends upon clearly defined and understood objectives, without which it is impossible to measure the success or effectiveness of the project. The exact objectives will vary from project to project, but at a minimum you should consider the following. Physical Separation: Based on your risk assessment of the potential disasters, what is the minimum level of physical separation you require between your live and DR systems? Options to consider include: • Different building • Different campus • Different town/city • Different area of the country • Different country • Different continent Acceptable Downtime: The initial reaction from many IT managers and business managers is that no downtime is acceptable. However, if the building containing your primary data centre and finance department burns to the ground, it will take time for the finance team to be relocated to different premises and to find computers for them to use, so how quickly do you really need to restore access to your finance system? Acceptable Data Loss Window: Whilst zero data loss is certainly desirable, as the level of synchronicity between live and DR systems increases so do the costs, either in terms of the technology required or the bandwidth used to maintain synchronicity. Databases which handle real-time transactions, such as on-line or face-to-face enrolments, normally require a small data loss window; ideally the window should be no more than a handful of transactions. If you lose a day of transactions, can you recreate that data? Does the person who enrolled via your website know you have lost their data? Do you even know who they are? For other systems a larger window may be acceptable: what would be the impact of losing the last 3-4 hours of data from your file servers? Is this any different from someone forgetting to press save and losing a file?
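The acceptable data loss window can be expressed as a simple check on the age of the DR copy. The following is a minimal sketch only: the system names, the window values and the function name are hypothetical and chosen purely for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical data loss windows per system; the values are illustrative only.
DATA_LOSS_WINDOWS = {
    "enrolment-db": timedelta(minutes=30),
    "file-server": timedelta(hours=4),
}


def within_data_loss_window(system, last_replicated, now):
    """Return True if the DR copy is no older than the system's agreed window."""
    return now - last_replicated <= DATA_LOSS_WINDOWS[system]


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    # File server replica is 3 hours old; database replica is 1 hour old.
    print(within_data_loss_window("file-server", datetime(2024, 1, 1, 9, 0), now))
    print(within_data_loss_window("enrolment-db", datetime(2024, 1, 1, 11, 0), now))
```

A check of this kind could be run on a schedule so that a replica drifting outside its window is noticed before a disaster rather than during one.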
  • 6. Capacity/Performance: What sort of capacity and performance is acceptable for your DR services? Thought needs to be given as to whether your DR services need to give your users the same level of performance as your live systems. Your DR system may introduce new bottlenecks, such as the available WAN/internet bandwidth between the DR facilities and your users. The amount of expansion capacity and historical capacity also needs to be considered. Acceptable Restoration Time: If you have had to activate your DR services, at some point you’ll want to switch back to your live services. How will you do this? Will the failback result in any downtime? The answers to many of the questions you will need to ask yourself will vary from system to system. The Cloud Options: Maintaining DR facilities can be expensive, both in terms of investment in hardware (hardware which you hope you will never need to use) and the time needed to maintain and administer it. Use of the cloud to host your DR facilities can eliminate or reduce a number of these costs. Most major cloud providers have globally dispersed redundant data centres which will generally be hundreds of miles away from your facilities. • Infrastructure as a Service (IaaS): Selection of an IaaS option will remove the need to invest in hardware and construct a secondary server room/data centre. An IaaS DR solution involves renting sufficient computing resources from a cloud provider to allow you to create a “virtual data centre” in the cloud. You are then responsible for creating and maintaining the virtual machines which provide your DR facilities. • Platform as a Service (PaaS): With PaaS the cloud provider is responsible for the hardware, operating systems and services. This removes the need for you to maintain and patch virtual machines. 
An example of PaaS is the Microsoft Azure SQL Database service: Microsoft is responsible for the hardware, operating systems and SQL Server installation, and you need only be concerned with your database. In some cases you may be forced down an IaaS route by the need to install 3rd party software on a server; in other cases PaaS may be appropriate. For example, you may need to use IaaS for your finance system DR because you need to install a 3rd party finance server product, but you can use PaaS to provide DR for your website. Alternatives to Disaster Recovery - Software as a Service (SaaS): When looking at the services for which you need to provide DR facilities, it is worth asking whether there is a better way to deliver those services. By moving services such as e-mail from traditional on-premises hosting to cloud hosting you remove the need to invest time and money in providing DR facilities for those services; their availability and accessibility become the cloud provider’s concern.
  • 7. Selecting a Cloud Provider. Platform: The cloud is a rapidly expanding area of the IT sector, both in terms of the services offered and the companies providing them. Some providers have invested in the development of proprietary platforms, such as Amazon EC2 or Windows Azure, whilst other providers have developed services based on “off the shelf” products, such as VMware. Compatibility: Compatibility between your cloud provider’s platform and your on-premises virtualisation platform can affect the options available for your data replication strategy. If the two platforms are compatible, or can be managed by the same virtualisation management platform, such as Microsoft System Centre Virtual Machine Manager, you may be able to move or replicate data and virtual machines between your on-premises solution and your cloud solution. Compliance: The requirements of the Data Protection Act (1998) are often cited as a barrier to the use of the cloud, in particular the need to obtain subject consent prior to transferring data outside of the EU. You should not assume that because a cloud provider is based in the UK, or Europe, your data will be stored within the EU. Most major cloud providers have data centres located within the EU and some allow you to select the “region”, or even the individual data centre, that will be used to store your data. Security. Physical: Reputable cloud service providers should be able to provide information on the security accreditations with which their services and data centres comply. Many providers will be delivering services to customers in the financial, health care and defence sectors as well as to local and national governments, and as such will already comply with extremely stringent security requirements. Connectivity: For your data to reach the data centres of your chosen cloud provider it will probably need to travel across the public internet. 
It is important to ensure that the data is protected in transit. Most SaaS and PaaS solutions have been developed from the ground up as internet services and will make use of encrypted protocols such as HTTPS (SSL/TLS) to provide secure connectivity: for example, HTTPS to connect to a web-based SaaS e-mail solution, or SFTP to transfer files to a PaaS-hosted website.
  • 8. IaaS services typically require Virtual Private Networks (VPNs) to connect the hosted virtual machines to your on-premises LAN. Site-to-site VPNs require a device at both sites to “terminate” the connection; it is therefore important to confirm that you have a suitable end point device capable of handling your end of the connection and that the device will work with your cloud provider’s VPN implementation. Pricing Model. Contract Offerings: Is it necessary for all of your DR assets to be operational 24x7, or do you simply need them ready and waiting to be fired up? Most cloud providers’ pricing is based on the size, allocated storage and hours of usage of a virtual machine. Applications which are built around an n-tier model will have application servers that host websites or application software. You may only need to fire up the virtual machines hosting these application server roles for a few hours a month for testing and patching. Does your cloud provider’s pricing structure reflect this usage model? Understanding Risk: An analysis of the roles and workloads of your systems will help you to identify the level of risk that the loss of a system poses, and therefore the level of DR protection and effort that it warrants. Systems often comprise multiple servers, each fulfilling a distinct role. The impact of loss, and the ease of restoration, will vary depending upon the role of the server.
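The effect of an hourly pricing model on standby DR servers can be estimated with some simple arithmetic. The sketch below assumes a price made up of an hourly compute charge plus a monthly storage charge; the rates, the storage size and the function name are hypothetical and do not reflect any particular provider's price list.

```python
# Approximate number of hours in an average month (assumption for illustration).
HOURS_PER_MONTH = 730


def monthly_cost(hourly_rate, hours_running, storage_gb, storage_rate_per_gb):
    """Estimate one VM's monthly cost: compute hours used plus allocated storage."""
    return hourly_rate * hours_running + storage_gb * storage_rate_per_gb


if __name__ == "__main__":
    # Hypothetical rates: 0.10 per compute hour, 0.05 per GB of storage per month.
    always_on = monthly_cost(0.10, HOURS_PER_MONTH, 100, 0.05)
    standby = monthly_cost(0.10, 4, 100, 0.05)  # powered on 4 hours for patching
    print(f"Always on: {always_on:.2f}  Standby: {standby:.2f}")
```

Under these assumed rates the storage charge dominates the cost of a standby server, which is why the hours-of-usage element of a provider's pricing matters for application servers that are rarely powered on.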
  • 9. Suggested roles are listed below:
Role | Description | Data Change Frequency | Ease of Recreating Data | Acceptable Data Loss | Examples
Data Storage | Servers holding non-transactional data | High | Moderate | Moderate (4 hours) | File servers, mailbox servers etc… where users can recreate documents
Database | Database servers | High | Low | Low (30 minutes) | SQL Server, Oracle, MySQL etc… especially on-line systems where it may not be possible to recreate data (i.e. e-registers, on-line enrolment)
Application | Servers which do not store volatile data | Low | High | High | Web servers, middle-tier servers etc… static content updated infrequently (i.e. software upgrades, website redesign etc…)
Data Replication Strategy: The data in each of your DR systems must be updated regularly and be no older than the acceptable data loss window you have identified for that system. It is important to select a replication method that is appropriate for the level of risk and the acceptable data loss window. Approaches. Application Replication: Many enterprise-class applications incorporate their own replication technologies, for example Microsoft Exchange Database Availability Groups, Oracle Data Guard and MySQL master/slave replication. Where application replication technologies are available they should be considered the preferred option, as they are designed to replicate data in a manner that makes sense to the application. File System Level: In some cases simply copying files from the live systems to the DR systems will suffice to replicate the data. Tools such as “robocopy” and “rsync” are able to determine what differences exist between source and destination locations, copy only new or changed files to the DR location and remove redundant files from the DR site. Services such as the “Distributed File System” (DFS) built into Windows Server can be used to automate and manage file replication. It is important to check that a file system copy is appropriate for the type of data being replicated. Using file system replication to copy the data files of your SQL Server whilst it is running could result in data corruption.
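The file system level approach can be illustrated with a small mirroring routine. This is a simplified sketch of the idea behind tools such as “robocopy” and “rsync” (copy new or changed files, remove files that no longer exist at the source); it is not how those tools are implemented, and the function name is hypothetical. It compares file contents in full, which real tools generally avoid for performance reasons.

```python
import filecmp
import shutil
from pathlib import Path


def mirror(source: Path, dest: Path) -> None:
    """Make dest match source: copy new or changed files, remove redundant ones."""
    dest.mkdir(parents=True, exist_ok=True)
    source_names = {item.name for item in source.iterdir()}

    # Remove anything at the DR location that no longer exists on the live side.
    for item in dest.iterdir():
        if item.name not in source_names:
            if item.is_dir():
                shutil.rmtree(item)
            else:
                item.unlink()

    for item in source.iterdir():
        target = dest / item.name
        if item.is_dir():
            if target.is_file():
                target.unlink()
            mirror(item, target)  # recurse into sub-folders
        else:
            if target.is_dir():
                shutil.rmtree(target)
            # Copy only when the file is missing or its contents differ.
            if not target.exists() or not filecmp.cmp(item, target, shallow=False):
                shutil.copy2(item, target)
```

As the text notes, a copy of this kind is only safe for files that are not being modified while the copy runs; it is not suitable for the data files of a running database.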
  • 10. Virtual Machine Replication: Replication or cloning of entire virtual machines is also a strategy that should be considered. This is especially useful where all the components of a single system are located on one virtual machine. This approach should also be considered for application/middle-tier servers where significant time and effort has been expended customising or configuring the middle-tier components. Best Approach: Complex systems often consist of multiple servers, each of which has a distinct role within that system. Consider a student records system: this will probably consist of a database server, two identical application servers and a client application. Your database will be experiencing constant changes and you need to ensure that in the event of a disaster you don’t lose any records; on the other hand, the software on the application servers is updated via a controlled process every 6 months when the software vendor releases an update. In this scenario it would be appropriate to use the database system’s inbuilt replication technology to protect your database and to use virtual machine replication to replicate one of the application servers. You might only replicate the virtual machine once a month, as it has a low degree of data volatility. Software Licences: Typically when you create a virtual machine in the cloud the machine will be based on a template which has a cost associated with it, usually charged hourly, weekly or monthly. In most cases these prices include the cost of the licence for the operating system used by the template. The same usually applies to PaaS, in that the charge for the period will include the licence costs for all the components of that service. For example, you don’t need to purchase licences for Microsoft SQL Server to use Microsoft’s Azure SQL Database platform. 
Approach | Pros | Cons | Data Granularity | Recommended For
Application | Application aware; transaction rollback; corruption detection; automatic failover | Can be complicated to set up; requires two installations of application software; may require additional licences; may introduce additional overhead on live systems | Variable but appropriate for application (i.e. database transaction, Active Directory object, e-mail message etc…) | Databases; mailboxes; LDAP (inc Active Directory)
File System | Simple to set up | Excludes open files; requires scripts and/or additional software | File level | File shares
VM Replication | Replicates entire server | Can be complicated to set up; lots of data to transfer; servers may require reconfiguration once activated | Virtual machine (though some solutions allow block level) | Application servers; “1 server” systems
  • 11. You do need to ensure that you are adequately licensed for any software you install on the virtual machines you create in the cloud. Consider a scenario where you create a virtual machine to host Microsoft Exchange Server because you want to use Exchange Database Availability Groups to provide application level replication for your e-mail system. In this scenario you probably wouldn’t need to purchase a licence for Windows Server (as this will be included in the cost you are paying for the virtual machine), but you will need to buy a licence for the copy of Microsoft Exchange you have installed on that server. Some software vendors include provisions in their education and volume licensing schemes that allow you to install additional copies of their software for disaster recovery purposes. You don’t want to spend money on licences you don’t need, so if you are unsure about what you are or aren’t allowed to do with your existing licences, try checking the software vendor’s website for licensing FAQs, contacting the retailer from whom you purchased the software or contacting the vendor directly. Considering Failover: If you have to activate your DR facilities, how will your users and client devices know where to find the systems they need to connect to? Most modern networks make use of DNS to locate servers and services; in some cases you may be using IP addresses to locate services. It is probable that your DR facilities will be on a different IP subnet from your live systems, and your clients need to be informed of this to allow them to connect to your DR facilities. Active Directory and DNS: Assuming that you are using Microsoft Active Directory (AD), the servers on your DR site will need access to the AD and associated DNS in order to operate. It is therefore recommended that you maintain at least one operational Domain Controller in your DR facilities. 
This will also provide inherent DR for your AD and DNS infrastructures without any further work on your part. IP Address Allocation: If you have chosen to replicate virtual machines to your DR site, do these virtual machines have static IP addresses assigned? If so, you will need to log in to each VM as you bring it online and assign a new IP address. Consider whether you can use DHCP to assign IP addresses to your servers. Application Aware Failover: If an application has some form of application level replication it may also have application level failover. Microsoft Exchange Database Availability Groups (DAGs) are one such example: with DAGs the Exchange client access servers automatically connect to the mailbox server which is hosting the active database. Distributed File System (DFS): Switching to an alternate file server normally involves finding all references to the UNC path of the failed file server and replacing them with references to the new file server. DFS allows the creation of a fault tolerant file share containing folders that refer to one or more real file shares. By configuring an active and an inactive referral for each file share, one referencing your live system and the other your DR system, all you need do to fail over is change the referrals appropriately.
  • 12. DNS for Failover: It is assumed that you have created a Domain Controller in your DR site that is also a DNS server, thus providing resilience for your DNS. Most of your clients will be using DNS to locate the servers and services to which they connect, so in many cases switching to your DR facilities may involve no more than changing DNS entries so that they point at the DR system. Consideration needs to be given to the TTL value of the DNS entries, as this determines the length of time your clients will cache the returned DNS data. If your records have a TTL of an hour it could take that long before some of your clients can access your DR services. You should ensure that the TTL values for the critical DNS records are set to values that are consistent with your failover objectives. When planning for DR it is recommended that you review the way your clients currently locate their servers; where possible, avoid the use of IP addresses or server names and use DNS alias (CNAME) records instead. For example, instead of using http://servername.college.ac.uk/ebs create a DNS CNAME for ebs-live.college.ac.uk which refers to servername.college.ac.uk; that way, if you have to switch to your DR system, all you need do is update the CNAME record. Replicated Virtual Machines: In most cases failover of replicated VMs will be as simple as powering on the VM, checking it has an appropriate IP address and ensuring that DNS reflects the current IP address. Where the VM is part of a multi-tier application and you have also failed over database tier components, you may need to update the application with the new address of the database server. 
This process can be simplified through the use of DNS aliases and application specific redirects. For example, you might create a DNS alias for “studentrecords-live.college.ac.uk” which points at your live database server and then use this address when installing or configuring application-tier components; in the event of failure all you need to do is change where the DNS alias points. Network Load Balancers: Network load balancers (NLB) provide an option for failover of some services; good quality load balancers will be able to detect server and application failure automatically and redirect traffic. However, you also need to consider DR for your NLB: if you position an NLB on your live site which is configured to redirect traffic to your DR site, what will you do if your NLB is out of action? Planning: Once you’ve carried out your risk assessment you will have a better idea of the disasters that you may encounter and the probability of each. As you have hopefully realised, you are more likely to encounter situations where one, or a small number, of related systems have failed, probably as a result of a hardware failure or software problem. The level of detail in your DR plan should reflect how critical the system is and how quickly it needs to be recovered. You may have generic processes that apply across multiple systems; for example, if you have multiple database servers with identical DR processes, a single process is probably sufficient. Whilst it is possible to create detailed scripts and automated procedures that can be used to activate DR facilities, every disaster tends to be different and needs to be assessed individually. The process to fix a disaster of type A may in fact make a disaster of type B worse.
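The DNS alias approach to failover, and the effect of the TTL on how quickly clients follow the change, can be sketched with a toy in-memory model. This is not a DNS client or server: the record structure, the function names, the 300 second TTL and the DR host name “dr-servername.college.ac.uk” are all assumptions made for illustration.

```python
# Toy model of DNS alias (CNAME) records; the TTL value is hypothetical.
records = {
    "ebs-live.college.ac.uk": {"cname": "servername.college.ac.uk", "ttl": 300},
}


def fail_over(dns_records, alias, dr_target):
    """Repoint an alias at the DR server; clients keep using the alias name."""
    dns_records[alias]["cname"] = dr_target


def worst_case_switch_seconds(dns_records, alias):
    """A client that cached the record just before the change keeps it for the TTL."""
    return dns_records[alias]["ttl"]


if __name__ == "__main__":
    fail_over(records, "ebs-live.college.ac.uk", "dr-servername.college.ac.uk")
    print(records["ebs-live.college.ac.uk"]["cname"])
    print(worst_case_switch_seconds(records, "ebs-live.college.ac.uk"), "seconds")
```

The point of the model is that only the alias changes; application configuration that references the alias needs no edits, and the TTL bounds how long clients may continue to use the old address.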
  • 13. The best approach is scenario based: start with the highest probability, highest impact risks and work down to those with the lowest probability and impact. An important consideration in your planning is who has the authority to declare a “disaster” and invoke the DR plan. In some cases invoking the DR plan may result in more overall disruption than leaving a particular service offline for an hour while you fix it. Testing: It is essential to test your DR processes regularly. The scope of testing needs to be considered on a system by system basis. Also consider whether you need to test every system: if you have 20 servers with an identical process, do you need to test them all regularly? For systems with transparent application level replication and failover, testing should be straightforward and can be done regularly. In cases where a failover would be disruptive, is simulating failover sufficient for the system in question? Example Implementation. Background: Until the summer of 2011 South Tyneside College (STC) operated across two major campuses (Westoe and Hebburn) and a third specialist campus (MSTC). STC’s primary data centre was located on the main campus (Westoe) with a smaller server room at Hebburn; the MSTC has only a single server. Systems had been established for some time to replicate data and services between Westoe and Hebburn, allowing either campus to act as DR site for the other.
  • 14. For reasons of operational efficiency a decision was taken to close Hebburn. Due to the high cost of creating the necessary facilities and upgrading the data links it was not feasible to establish DR facilities for Westoe at the MSTC. A redundant server room in a separate building on the Westoe campus was refurbished for DR use. Challenge: The primary data centre supports 46 physical servers and 69 virtual machines; a further 16 physical servers are located in the secondary server room providing support for DR. The hardware in the secondary server room had previously been the “live” hardware from Hebburn and was planned for replacement in summer 2013. The cost of replacing this equipment was estimated to be in the region of £50,000 - £60,000. Examination of the available options indicated that the use of the cloud for our DR facilities would result in savings of around 10-15% and provide a truly offsite solution. The work involved would also allow us to gradually migrate a number of live services from on-premises to the cloud in future, producing further cost savings.
  • 15. Planning. Numbers: Because of the levels of resilience and HA provided by the equipment in the primary data centre, we only expect to need to activate the DR facilities in the event of a disaster which renders our main campus unusable (fire, flood, prolonged power outage etc…). Under these circumstances we anticipate that the major performance bottleneck will be the available bandwidth of the internet connection(s) used to connect to the virtual data centre. Based on this supposition the following criteria were applied to determine whether a system or server was within the scope of the project. • Where multiple load balanced application servers for the same service existed we would only provide one DR server • Where we had split large workloads across non-load balanced servers (i.e. file servers) we would consolidate these workloads on one DR server • Servers in the DMZ would be excluded where these services duplicated LAN servers which are in scope • Servers used to support physical equipment which would likely be inaccessible during a disaster would be excluded from scope. This was on the grounds that if our buildings are out of action so is the equipment they contain; therefore print servers, wi-fi controllers etc… would not be required. Analysis of the roles and workloads of our servers indicated that our disaster recovery strategy needed to support a minimum of 29 servers. Workloads: Of the 29 systems within project scope we identified 9 database servers and 3 data store servers (file server, mailbox and Active Directory). The remaining servers fit into the application server category. Replication and Failover Strategies: Based on the workloads of the systems in scope, a combination of application level replication, file system level replication and virtual machine cloning was adopted. 
For a small number of cases it was recognised that the best option was to build a new application server in the cloud due to the comprehensive application level functionality provided by that system, for example Microsoft Exchange Client Access Servers. Application Level Replication: Application level replication was selected for Active Directory (AD has inherent replication), Microsoft SQL Server, MySQL Server and Microsoft Exchange. All of these applications have built-in multi-server replication mechanisms which allowed for recovery windows of less than 15 minutes. Failover procedures for these systems are either automatic/inherent (i.e. Active Directory and Exchange) or require a flag to be set within the application to indicate the primary server (SQL Server and MySQL). In the case of SQL Server and MySQL Server it is also necessary to update the configurations of the application servers/client applications to reference the DR servers as opposed to the live servers.
  • 16. 16 Utilising the Cloud for Disaster Recovery File Level Replication File level replication was used to replicate data from the 4 on-premises file servers to the single cloud based file server using the built in “robocopy” command and its mirroring/synchronization option. The synchronization was scheduled to run overnight as a one working day recovery window was deemed adequate for file services. As Microsoft DFS is used in all links and paths that reference the file shares on the file servers failover involves disabling the referral to the on-premises servers and enabling the referral to the cloud servers. Virtual Machine Replication – Database Servers A small number of simple systems, some quite critical, have all their components installed on a single virtual machine. These applications either do not have a high workload, or do not have scalable architectures. Systems falling within this category include the payroll system, library management system, active directory certificate services and an Oracle Express server used for teaching purposes. For these systems virtual machine level replication was selected with a nightly replication interval. Failover requires the virtual machines be brought on-line, they will automatically register their new IP address with DNS. Virtual Machine Replication – Application Servers The remaining systems all fulfilled application front end/middle tier roles, therefore virtual machine replication was selected as the replication strategy. As updates and changes to the live servers are carried out via a controlled change management process a weekly virtual machine refresh was deemed sufficient. Failover requires the virtual machines be brought on-line, they will automatically register their new IP address with DNS. In some cases it is also necessary to update the database server references to refer to the DR database servers. 
Cloud Provider Selection
Once the workloads, replication and failover strategies had been decided upon, a review of the services offered by various cloud service providers was undertaken. As it was identified that 60% of the virtual machines required for the DR solution would only need to be powered up for testing and patching for a couple of hours each month, providers with an hourly pricing model were favoured.
Compatibility with existing systems was also a factor in provider selection. The virtualisation infrastructure at STC is based on Microsoft Hyper-V (Windows Server 2008 R2) managed by Microsoft System Center Virtual Machine Manager (MSCVMM) 2012. Therefore solutions that offered management integration with MSCVMM and virtual machine migration from Hyper-V were favoured.
Consideration of the above factors, plus pricing, resulted in the selection of the Microsoft Windows Azure platform, as Microsoft were able to offer favourable educational pricing. However, as we were the first UK institution to sign up to Azure via an education agreement, we discovered that Microsoft’s signup procedures were not fully developed, which resulted in delays of many months. It should be noted that we have been assured by Microsoft that these procedures are now fully developed and have been used successfully by other institutions.
  • 17. Implementation Process
Implementation of the solution was approached via the following sequence:
1. Establish VPN connectivity
STC uses a pair of Smoothwall UTM-3000 appliances to provide internet content filtering and firewall services. The Smoothwall UTM-3000 supports IPSec site-to-site VPNs, as does Windows Azure. Establishing a site-to-site VPN between the two systems was relatively straightforward.
2. Build and commission Domain Controller in Azure
The first server created in Azure was a Domain Controller to provide Active Directory and DNS services to our other servers. This was accomplished by installing Windows Server 2008 R2 on a new virtual machine, which we then promoted to a Domain Controller and on which we installed the DNS server role.
3. Build database, mailbox and file servers
Servers were built to host these roles and the appropriate application software installed (i.e. Microsoft Exchange, Microsoft SQL Server etc.).
4. Establish replication
Application level replication was established for:
• Exchange: the DR server was added to the Exchange Database Availability Group (DAG) and the existing mailbox databases within the DAG had new replication targets added.
• SQL Server: database log shipping was selected as the most appropriate replication method, and new log shipping partnerships were created using the wizards built into SQL Server Management Studio.
• File Servers: initial replication of file data was accomplished via the “robocopy” command line tool; subsequent replication runs made use of the “/mir” switch to synchronize the data on the replica servers.
5. Establish virtual machine replication
Virtual machine replication was initially achieved by copying backups of the VHD files of live virtual machines to Azure using the “csupload” command line tool. However, work is on-going to use System Center App Controller and System Center Orchestrator to accomplish these tasks in future.
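The nightly file server synchronization from step 4 could be scripted along the following lines. This is a minimal sketch rather than the script used at STC: the server and share names (FILE01 to FILE04, DRFILE01) and the log folder are hypothetical, and the retry and logging switches are illustrative choices. Note that “/MIR” deletes files on the destination that no longer exist on the source, so the destination paths must be checked carefully before a first run.

```python
import subprocess


def build_mirror_command(source, destination, log_file):
    """Return the robocopy argument list that mirrors source to destination."""
    return [
        "robocopy", source, destination,
        "/MIR",              # mirror: copy new/changed files, purge removed ones
        "/R:2",              # retry a failed copy twice
        "/W:5",              # wait 5 seconds between retries
        "/NP",               # keep per-file progress percentages out of the log
        f"/LOG:{log_file}",  # write output to a log file for later review
    ]


def robocopy_succeeded(returncode):
    """robocopy exit codes are bit flags; values of 8 or above indicate failures."""
    return returncode < 8


def run_mirror(command):
    """Run one mirror job (Windows only) and report whether it succeeded."""
    result = subprocess.run(command)
    return robocopy_succeeded(result.returncode)


if __name__ == "__main__":
    # Hypothetical names: four on-premises file servers, one cloud file server.
    for name in ("FILE01", "FILE02", "FILE03", "FILE04"):
        command = build_mirror_command(
            rf"\\{name}\Shares",
            rf"\\DRFILE01\Shares\{name}",
            rf"C:\Logs\mirror-{name}.log",
        )
        # Printed rather than executed so the sketch is safe to run anywhere.
        print(" ".join(command))
```

A script of this kind would typically be run overnight by a scheduled task, with `run_mirror` called for each server and any failure raised as an alert.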
  • 18. Future Developments
Partially as a result of our experiences with this project, it is the intention of STC to make significantly more use of cloud computing services. In some cases we have identified that increased adoption of cloud services may in fact increase costs but offers us significantly better functionality.
• Office 365
A project is underway to migrate all staff and student e-mail content, 500GB of SharePoint content, and the contents of staff and student “My Documents” folders (approximately 1TB of files) to Office 365.
• Hyper-V Replica
Windows Server 2012 introduced the ability to have active/passive replicas of individual virtual machines. An Azure implementation of this technology is in development which will allow Azure to participate as one side of this partnership. Once available, this solution will be used to accomplish VM replication to Azure.
• Server Migration
Work carried out to date has proven that it is feasible and practical for us to host servers in Windows Azure. Over the next 3 years an increasing proportion of our server infrastructure will be moved from on-premises hardware to Azure. The migration to Office 365 is the first step of this process, as it eliminates the need for on-premises e-mail, file storage and SharePoint servers.
  • 19. Appendix 1
Cloud provider cost comparison template (blank), one row per provider, with the following column groups:
Provider; OS
Virtual Machines (Small, Medium, Large): CPU, RAM, HDD and Price/hr for each size
Storage: Space (Price/GB per month); IOPS (Price/million per month)
Bandwidth: In (Price/GB per month); Out (Price/GB per month)
VPN: Price Per Hour
Requirements: Small VMs, Medium VMs and Large VMs (No. Hours Per VM Per Month for each); Storage (GB per month, IOPS per month); Bandwidth (In GB per month, Out GB per month); VPN (Hours Per Month)
Est Cost Per Month; Annual Cost
  • 20. Appendix 2
Risk assessment template (blank).
Column groups: Disaster; Scope; Impact Assessment; Controls; Residual Risk
Sub-columns, in order: College, Campus, Building, Service, Downtime, Likelihood, Impact, Score, Downtime, Likelihood, Impact, Score
All cells in the blank template contain 0.
  • 21. Appendix 3
Server and workload inventory template (blank).
Column headings, in order: Server, System, Workload, Scope, Why Not in Scope?, Replication Strategy, Recovery Window, State, Failover, Size, CPU, RAM, Storage, Operating System
  • 22. Appendix 4
System           Downtime Trigger  Failover Authorisation  Sequence  Role         Action
Student Records  30 minutes        IT Manager              1         Database     Activate standby mirror
                                                           2         Application  Update HKLM\Software\Adatum\StudentRecords\DatabaseServer
                                                           3         Clients      Advise users to reboot
Finance System   4 hours           IT Manager              1         Database     Activate standby mirror
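A failover plan of this shape can also be held as structured data so that the steps are always presented in sequence order. The sketch below is illustrative only: the class and field names are invented for the example, the downtime trigger is interpreted as the length of outage at which failover should be considered, and the action text is paraphrased from the Student Records row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverStep:
    sequence: int  # order in which the step is carried out
    role: str      # tier the step applies to (Database, Application, Clients)
    action: str    # what the operator has to do


@dataclass(frozen=True)
class FailoverPlan:
    system: str
    downtime_trigger_minutes: int  # assumed meaning: outage length that triggers failover
    authoriser: str                # who must approve the failover
    steps: tuple

    def should_fail_over(self, downtime_minutes):
        """True once the outage has reached the plan's downtime trigger."""
        return downtime_minutes >= self.downtime_trigger_minutes

    def ordered_steps(self):
        """Return the steps sorted by sequence number."""
        return sorted(self.steps, key=lambda step: step.sequence)


# Entries deliberately listed out of order to show the sorting.
student_records = FailoverPlan(
    system="Student Records",
    downtime_trigger_minutes=30,
    authoriser="IT Manager",
    steps=(
        FailoverStep(3, "Clients", "Advise users to reboot"),
        FailoverStep(1, "Database", "Activate standby mirror"),
        FailoverStep(2, "Application", "Update database server registry value"),
    ),
)

if __name__ == "__main__":
    if student_records.should_fail_over(downtime_minutes=45):
        print(f"Seek authorisation from: {student_records.authoriser}")
        for step in student_records.ordered_steps():
            print(f"{step.sequence}. [{step.role}] {step.action}")
```

Keeping the plan as data means the same record can drive both the printed runbook and any later automation of individual steps.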
  • 24. Association of Colleges 2-5 Stedham Place London WC1A 1HU Telephone: 020 7034 9900 Facsimile: 020 7034 9950 Email: sharedservices@aoc.co.uk Or visit our web site www.aoc.co.uk