This lesson looks at Backup and Recovery. Backup and Recovery are a major part of the planning for Business Continuity.
The lessons in this course will provide an overview of Backup and Recovery including the business and technical aspects.
This lesson provides an overview of the business drivers for backup and recovery and introduces some of the common terms used when developing a backup and recovery plan.
A Backup is a copy of the online data that resides on primary storage. The backup copy is created and retained for the sole purpose of recovering deleted, broken, or corrupted data on the primary disk.
The backup copy is usually retained over a period of time, depending on the type of the data and on the type of backup. There are three purposes for backup: disaster recovery, archival, and operational backup. We will review them in more detail on the next slide.
The data that is backed up may be written to media such as disk or tape, depending on the backup purpose the customer is targeting. For example, backing up to disk may be more efficient than backing up to tape in operational backup environments.
Several choices are available to get the data written to the backup media.
You can simply copy the data from the primary storage to secondary storage (disk or tape) onsite. This strategy is simple and easily implemented, but it impacts the production server where the data is located, since the copy consumes that server's resources. This may be tolerated for some applications, but not for high-demand ones.
To avoid an impact on the production application and to perform serverless backups, you can mirror (or snapshot) a production volume. For example, you can mount the mirror or snapshot on a separate server and then copy it to the backup media (disk or tape). This option completely frees up the production server, at the cost of the additional infrastructure it requires.
Remote backup can be used to comply with offsite requirements. A copy from the primary storage is written directly to backup media located at another site. The backup media can be a real library, a virtual library, or even a remote file system.
You can also copy to a first set of backup media, kept onsite for operational restore requirements, and then duplicate it to another set of media for offsite purposes. To simplify this procedure, you can replicate the first copy to an offsite location, which removes the manual procedures associated with moving backup media to another site.
Disaster Recovery addresses the requirement to be able to restore all, or a large part of, an IT infrastructure in the event of a major disaster.
Archival is a common requirement used to preserve transaction records, email, and other business work products for regulatory compliance. The regulations could be internal, governmental, or perhaps derived from specific industry requirements.
Operational backup is typically the collection of data for the eventual purpose of restoring, at some point in the future, data that has become lost or corrupted.
Reasons for a backup plan include:
Physical damage to a storage element (such as a disk) that can result in data loss.
People make mistakes, and unhappy employees or external hackers may breach security and maliciously destroy data.
Software failures can destroy or lose data and viruses can destroy data, impact data integrity, and halt key operations.
Physical security breaches can destroy equipment that contains data and applications.
Natural disasters and other events such as earthquakes, lightning strikes, floods, tornadoes, hurricanes, accidents, chemical spills, and power grid failures can cause not only the loss of data but also the loss of an entire computer facility. Offsite data storage is often justified to protect a business from these types of events.
Government regulations may require certain data to be kept for extended timeframes. Corporations may establish their own extended retention policies for intellectual property to protect them against litigation. The regulations and business requirements that drive data archiving generally require data to be retained at an offsite location.
Backup products vary, but they do have some common characteristics. The basic architecture of a backup system is client-server, with a backup server and some number of backup clients or agents. The backup server directs the operations and owns the backup catalog (the information about the backup). The catalog contains the table-of-contents for the data set. It also contains information about the backup session itself.
The backup server depends on the backup client to gather the data to be backed up. The backup client can be local, or it can reside on another system, typically to back up the data visible to that system. The backup server receives backup metadata from the backup clients to perform its activities.
There is another component called a storage node. The storage node is the entity responsible for writing the data set to the backup device. Typically, a storage node is packaged with the backup server, and the backup device is attached directly to the backup server's host platform. Storage nodes play an important role in backup planning, as they can be used to consolidate backup servers.
The following represents a typical Backup process:
The Backup Server initiates the backup process (starts the backup application).
The Backup Server sends a request to a server to “send me your data”.
The server sends the data to the Backup Server and/or Storage Node.
The Storage Node sends the data to the tape storage device and the Backup Server begins building the catalog (metadata) of the backup session.
When all of the data has been transferred from the server to the Backup Server, the Backup Server writes the catalog to a disk file and closes the connection to the tape device.
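The five steps above can be sketched as a small in-memory model. This is a minimal illustration under stated assumptions: the class and method names (BackupClient, StorageNode, BackupServer, run_backup) are hypothetical, the "tape" is a Python list, and the catalog is kept in memory rather than written to a disk file. It is not the API of any backup product.

```python
from dataclasses import dataclass, field


@dataclass
class BackupClient:
    """Hypothetical client agent on the host whose data is being protected."""
    hostname: str
    files: dict

    def send_data(self):
        # The client answers the server's "send me your data" request.
        return list(self.files.items())


@dataclass
class StorageNode:
    """Hypothetical storage node that writes save sets to a simulated tape."""
    tape: list = field(default_factory=list)

    def write(self, name, data):
        self.tape.append((name, data))
        return len(self.tape) - 1  # record position on the tape


@dataclass
class BackupServer:
    """Hypothetical backup server that directs the session and owns the catalog."""
    storage_node: StorageNode
    catalog: dict = field(default_factory=dict)

    def run_backup(self, client, session_id):
        entries = []
        # The server initiates the session and requests the client's data.
        for name, data in client.send_data():
            # The storage node writes the data; the server records catalog metadata.
            position = self.storage_node.write(name, data)
            entries.append({"file": name, "size": len(data), "tape_position": position})
        # Session complete: store the catalog (in memory here, a disk file in practice).
        self.catalog[session_id] = {"client": client.hostname, "entries": entries}
        return self.catalog[session_id]


node = StorageNode()
server = BackupServer(node)
client = BackupClient("db01", {"a.txt": b"hello", "b.txt": b"world!"})
session = server.run_backup(client, "mon-full")
print(session["entries"][1])  # {'file': 'b.txt', 'size': 6, 'tape_position': 1}
```

The point of the split is that the data path (client to storage node to device) and the metadata path (client to server catalog) are separate, which is what later makes LAN-free topologies possible.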
Several important decisions need to be considered before implementing a Backup/Restore solution. Some examples include:
The Recovery Point Objective (RPO)
The Recovery Time Objective (RTO)
The media type to be used (disk or tape)
Where and when the restore operations will occur – especially if an alternative host will be used to receive the restore data.
When to perform backups.
The granularity of backups – Full, Incremental, or Cumulative.
How long to keep the backup – for example, some backups need to be retained for 4 years, others for just 1 month.
Whether copies of the backup need to be taken.
Location: Many organizations have dozens of heterogeneous platforms that support a complex application. Consider a data warehouse where data from many sources is fed into the warehouse. When this scenario is viewed as “The Data Warehouse Application”, it easily fits this model. Some of the issues are:
How the backups for subsets of the data are synchronized
How these applications are restored
Size: Backing up a large amount of data that consists of a few big files may have less system overhead than backing up a large number of small files. If a file system contains millions of small files, the very nature of searching the file system structures for changed files can take hours, since the entire file structure is searched.
Number: A file system containing one million files with a ten-percent daily change rate will potentially have to create 100,000 entries in the backup catalog. This raises other issues, such as:
How a massive file system search impacts the system
Search time/Media impact
Is there an impact on tape start/stop processing?
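The Size and Number considerations can be put into numbers. This sketch assumes, as described above, that an incremental backup walks the entire file system to find changes and that each changed file adds exactly one catalog entry; the function name is hypothetical.

```python
def incremental_scan_cost(total_files: int, daily_change_rate: float):
    """Return (files scanned, catalog entries) for one incremental backup.

    Assumes the whole file system is walked to find changes, and that each
    changed file adds exactly one catalog entry.
    """
    files_scanned = total_files
    catalog_entries = round(total_files * daily_change_rate)
    return files_scanned, catalog_entries


print(incremental_scan_cost(1_000_000, 0.10))  # (1000000, 100000)
```

Note that the scan cost grows with the total number of files, not with the number of changed files, which is why high-density file systems are slow to back up even when little has changed.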
Many backup devices, such as tape drives, have built-in hardware compression technologies. To use these technologies effectively, it is important to understand the characteristics of the data. Some data, such as application binaries, do not compress well. Text data can compress very well, while other data, such as JPEG and ZIP files, are already compressed and gain little from further compression.
As mentioned before, there are three types of backup models (Operational, Disaster Recovery, and Archive), and each can be defined by its retention period. A retention period is the length of time that a particular version of a dataset is available to be restored.
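The effect of data characteristics on compression is easy to demonstrate in software. This sketch uses Python's zlib (DEFLATE) purely as a stand-in; tape drives implement their own hardware compression algorithms, so absolute ratios will differ, but the pattern of text compressing well and already-compressed data not compressing is the same.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size; lower means better compression."""
    return len(zlib.compress(data, 6)) / len(data)


# Repetitive text compresses very well.
text = b"Backup and recovery planning for business continuity. " * 200
# Compressed random bytes stand in for a JPEG or ZIP file that is already compressed.
already_compressed = zlib.compress(os.urandom(8192))

print(f"text:               {compression_ratio(text):.2f}")
print(f"already compressed: {compression_ratio(already_compressed):.2f}")
```

A ratio near (or slightly above) 1.0 for the already-compressed sample means the device gains nothing and may even add a little overhead.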
Retention periods are driven by the type of recovery the business is trying to achieve:
For operational restore, data sets could be maintained on a disk primary backup storage target for a period of time, where most restore requests are likely to be achieved, and then moved to a secondary backup storage target, such as tape, for long term offsite storage.
For disaster recovery, backups must be done and moved to an offsite location.
For archiving, requirements usually will be driven by the organization’s policy and regulatory conformance requirements. Tapes can be used for some applications, but for others a more robust and reliable solution, such as disks, may be more appropriate.
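Retention periods determine which backup versions can still be restored on a given day. The sketch below assumes hypothetical retention values per backup model (30 days, 90 days, 4 years); real values come from business and regulatory policy.

```python
from datetime import date, timedelta

# Hypothetical retention periods; real values come from business and regulatory policy.
RETENTION = {
    "operational": timedelta(days=30),
    "disaster_recovery": timedelta(days=90),
    "archive": timedelta(days=4 * 365),
}


def is_restorable(backup_date: date, purpose: str, on: date) -> bool:
    """A backup version is restorable until its retention period expires."""
    return on <= backup_date + RETENTION[purpose]


def restorable_versions(backups, on: date):
    """Filter (backup_date, purpose) pairs down to those still within retention."""
    return [(d, p) for d, p in backups if is_restorable(d, p, on)]


backups = [
    (date(2024, 1, 1), "operational"),
    (date(2024, 2, 20), "operational"),
    (date(2024, 1, 1), "archive"),
]
print(restorable_versions(backups, on=date(2024, 3, 1)))
```

On March 1 the January operational backup has expired, while the February operational backup and the January archive copy remain restorable.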
In this lesson we reviewed the business and data considerations when planning for Backup and Recovery including:
What is a Backup and Recovery?
What is the Backup and Recovery process?
Business recovery needs
RPO: Recovery Point Objective
RTO: Recovery Time Objective
Data characteristics
Files, compression, retention
We've discussed the importance of a Backup Plan and the considerations for creating one; this lesson now provides an overview of the different methods for creating a backup set.
Databases can be backed up using two different methods:
A Hot backup, which means that the application is still up and running, with users accessing it, while backup is taking place.
A Cold backup, which means that the application will be shut down for the backup to take place.
Most backup applications offer various Backup Agents to do these kinds of operations. There will be different agents for different types of data and applications.
The granularity and levels for backups depend on business needs, and, to some extent, technological limitations. Some backup strategies define as many as ten levels of backup. IT organizations use a combination of these to fulfill their requirements. Most use some combination of Full, Cumulative, and Incremental backups.
A Full backup is a backup of all data on the target volumes, regardless of any changes made to the data itself.
An Incremental backup contains the changes made since the most recent backup, whatever its type.
A Cumulative backup, also known as a Differential backup, is a type of incremental backup that contains all changes made since the last full backup.
Following is an example of an incremental backup and restore:
A full backup of the business data is taken on Monday evening. Each day after that, an incremental backup is taken. These incremental backups only back up files that are new or that have changed since the last full or incremental backup.
On Tuesday, a new file is added, File 4. No other files have changed. Since File 4 is a new file added after the previous backup on Monday evening, it will be backed up Tuesday evening.
On Wednesday, there are no new files added since Tuesday, but File 3 has changed. Since File 3 was changed after the previous evening backup (Tuesday), it will be backed up Wednesday evening.
On Thursday, no files have changed but a new file has been added, File 5. Since File 5 was added after the previous evening backup, it will be backed up Thursday evening.
On Friday morning, there is a data corruption, so the data must be restored from tape.
The first step is to restore the full backup from Monday evening. Then, every incremental backup that was done since the last full backup must be applied, which, in this example, means the:
Tuesday,
Wednesday, and
Thursday incremental backups.
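The incremental example can be simulated to show both the selection rule and the restore order. This is a sketch under assumptions: days are indexed from 0 (Monday), Files 1 to 3 are assumed to exist before Monday's full backup, and the function names are hypothetical.

```python
DAYS = ["Mon", "Tue", "Wed", "Thu"]

# (file, day index on which it was created or changed); 0 = Monday.
# Files 1-3 are assumed to exist before Monday's full backup.
EVENTS = [("File 1", 0), ("File 2", 0), ("File 3", 0),
          ("File 4", 1),   # Tuesday: new file
          ("File 3", 2),   # Wednesday: File 3 changes
          ("File 5", 3)]   # Thursday: new file


def incremental_schedule(events, days):
    """Full backup on day 0, then an incremental each evening."""
    backups = []
    for i, day in enumerate(days):
        if i == 0:
            files = {f for f, d in events if d <= 0}
            backups.append((day, "full", sorted(files)))
        else:
            # Reference point: the previous backup, whatever its type.
            files = {f for f, d in events if i - 1 < d <= i}
            backups.append((day, "incremental", sorted(files)))
    return backups


def restore_chain(backups):
    """The last full backup plus every incremental taken after it, in order."""
    last_full = max(i for i, (_, kind, _) in enumerate(backups) if kind == "full")
    return [day for day, _, _ in backups[last_full:]]


schedule = incremental_schedule(EVENTS, DAYS)
for day, kind, files in schedule:
    print(day, kind, files)
print("Restore order:", restore_chain(schedule))
```

Each incremental is small, but the restore must apply every one of them in order after the full backup; applying Wednesday after Monday is what makes the newer version of File 3 win.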
The following is an example of cumulative backup and restore:
A full backup of the data is taken on Monday evening. Each day after that, a cumulative backup is taken. These cumulative backups back up ALL FILES that have been added or changed since the LAST FULL BACKUP.
On Tuesday, File 4 is added. Since File 4 is a new file that has been added since the last full backup, it will be backed up Tuesday evening.
On Wednesday, File 5 is added. Now, since both File 4 and File 5 are files that have been added or changed since the last full backup, both files will be backed up Wednesday evening.
On Thursday, File 6 is added. Again, File 4, File 5, and File 6 are files that have been added or changed since the last full backup; all three files will be backed up Thursday evening.
On Friday morning, there is a corruption of the data, so the data must be restored from tape.
The first step is to restore the full backup from Monday evening.
Then, only the backup from Thursday evening is restored because it contains all the new/changed files from Tuesday, Wednesday, and Thursday.
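The same kind of simulation for the cumulative example shows why only two backups are needed at restore time. As before, this is a sketch with hypothetical function names, days indexed from 0 (Monday), and Files 1 to 3 assumed to exist before the full backup.

```python
DAYS = ["Mon", "Tue", "Wed", "Thu"]

# (file, day index on which it was created or changed); 0 = Monday.
# Files 1-3 are assumed to exist before Monday's full backup.
EVENTS = [("File 1", 0), ("File 2", 0), ("File 3", 0),
          ("File 4", 1), ("File 5", 2), ("File 6", 3)]


def cumulative_schedule(events, days):
    """Full backup on day 0, then a cumulative backup each evening."""
    backups = []
    for i, day in enumerate(days):
        if i == 0:
            files = {f for f, d in events if d <= 0}
            backups.append((day, "full", sorted(files)))
        else:
            # Reference point: always the last full backup (day 0).
            files = {f for f, d in events if 0 < d <= i}
            backups.append((day, "cumulative", sorted(files)))
    return backups


def restore_chain(backups):
    """The last full backup plus only the most recent cumulative after it."""
    last_full = max(i for i, (_, kind, _) in enumerate(backups) if kind == "full")
    later = backups[last_full + 1:]
    return [backups[last_full][0]] + ([later[-1][0]] if later else [])


schedule = cumulative_schedule(EVENTS, DAYS)
for day, kind, files in schedule:
    print(day, kind, files)
print("Restore order:", restore_chain(schedule))
```

The trade-off is visible in the output: each cumulative backup grows during the week, but the restore needs only the full backup and the latest cumulative.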
This lesson provided an introduction to Backup methods and granularity levels.
So far, we have discussed the importance of the Backup plan and the different methods used when creating a backup set. This lesson provides an overview of the different topologies and media types that are used to support creating a backup set.
There are three basic topologies that are used in a backup environment: Direct Attached Based Backup, LAN Based Backup, and SAN Based Backup.
There is also a fourth topology, called “Mixed”, which is formed when mixing two or more of these topologies in a given situation.
In a Direct Attached Based Backup, the backup data flows directly from the host being backed up to the tape, without utilizing the LAN. In this model, there is no centralized management, and it is difficult to grow the environment.
In a LAN Based Backup, the backup data flows from the host being backed up to the tape through the LAN. There is centralized management, but LAN utilization may become an issue, since all backup data goes through it.
A SAN based backup, also known as LAN Free backup, is achieved when there is no backup data movement over the LAN. In this case, all backup data travels through a SAN to the destination backup device.
This type of backup still requires network connectivity from the Storage Node to the Backup Server, since metadata always has to travel through the LAN.
A SAN/LAN Mixed Based Backup environment is achieved by using two or more of the topologies described in the previous slides. In this example, some servers are SAN based while others are LAN based.
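The difference between the topologies comes down to which path carries the backup data and which carries the metadata. This sketch models that with a lookup table; treating direct-attached metadata as "local" (no network hop) is an assumption based on the description of that model, and the data sizes are hypothetical.

```python
# Which path carries backup data and catalog metadata in each topology.
# "local" = direct attachment with no network hop (an assumption for direct-attached).
TOPOLOGIES = {
    "direct_attached": {"data": "local", "metadata": "local"},
    "lan_based": {"data": "LAN", "metadata": "LAN"},
    "san_based": {"data": "SAN", "metadata": "LAN"},  # LAN-free: only metadata on the LAN
}


def lan_traffic_gb(servers):
    """Total GB crossing the LAN for a possibly mixed set of servers.

    servers: list of (topology, data_gb, metadata_gb) tuples.
    """
    total = 0
    for topology, data_gb, metadata_gb in servers:
        paths = TOPOLOGIES[topology]
        if paths["data"] == "LAN":
            total += data_gb
        if paths["metadata"] == "LAN":
            total += metadata_gb
    return total


# A mixed environment: one large server backs up over the SAN, one small one over the LAN.
mixed = [("san_based", 500, 1), ("lan_based", 100, 1)]
print(lan_traffic_gb(mixed))  # 102
```

Moving the large server to the SAN keeps its 500 GB off the LAN, while its metadata still travels to the Backup Server over the LAN.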
Tape drive streaming is recommended by all vendors in order to keep the drive busy. If the drive is not kept busy while writing during the backup process, performance will suffer. Multiple streaming (multiplexing several backup streams onto one drive) improves performance drastically, but it also creates an issue: the backup data becomes interleaved, and recovery times therefore increase.
Backup to disk replaces tape and its associated devices with disk as the primary target for backup. Backup-to-disk systems offer major advantages over tape systems of equivalent scale in terms of capital costs, operating costs, support costs, and quality of service. Backup to disk can be implemented fully on day one or in phases.
This example shows a typical recovery scenario using tape and disk. As you can see, recovery from disk is much faster than recovery from tape.
The diagram shows typical recovery scenarios using different technical solutions. As you can see, recovery with a local replica or clones is the quickest recovery method.
It is important to note that using clones on disk enables you to make more copies of your data, more often. This improves RPO (the point from which you can recover). It also improves RTO, because the log files to be replayed will be smaller, which reduces log playback time.
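Why interleaving slows recovery can be shown with a toy tape. This sketch assumes round-robin multiplexing of equal-sized blocks and a restore that reads the tape sequentially up to the last block of the wanted stream; both function names are hypothetical.

```python
from itertools import zip_longest


def multiplex(streams):
    """Interleave blocks from several backup streams onto one tape, round-robin.

    Each record is tagged with its stream id so it can be separated on restore.
    """
    ids = list(streams)
    tape = []
    for blocks in zip_longest(*(streams[s] for s in ids)):
        for stream_id, block in zip(ids, blocks):
            if block is not None:  # shorter streams run out first
                tape.append((stream_id, block))
    return tape


def restore_stream(tape, stream_id):
    """Read the tape sequentially up to the stream's last block.

    Returns the stream's blocks and the number of tape records that had to be read.
    """
    last = max(i for i, (sid, _) in enumerate(tape) if sid == stream_id)
    blocks = [block for sid, block in tape[:last + 1] if sid == stream_id]
    return blocks, last + 1


streams = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"], "C": ["c1", "c2", "c3"]}
tape = multiplex(streams)
print(restore_stream(tape, "A"))  # (['a1', 'a2', 'a3'], 7)
```

Restoring stream A alone means reading 7 records to recover 3 blocks, because the other streams' blocks sit in between; written on its own, stream A would have needed only 3 reads.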
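The effect of copy frequency on RPO and RTO can be estimated with simple arithmetic. This is a hypothetical model: the first function bounds data loss when you can only go back to the last copy, and the second estimates roll-forward time when the logs generated since the last copy are replayed at a fixed rate. The log and replay rates are made-up example values.

```python
def worst_case_data_loss_hours(copies_per_day: int) -> float:
    """If a failure occurs just before the next copy, one full interval is lost."""
    return 24 / copies_per_day


def worst_case_log_replay_hours(copies_per_day, log_gb_per_hour, replay_gb_per_hour):
    """Hypothetical model: logs generated since the last copy are replayed on restore."""
    logs_gb = worst_case_data_loss_hours(copies_per_day) * log_gb_per_hour
    return logs_gb / replay_gb_per_hour


for copies in (1, 6):  # nightly backup vs. a disk clone every 4 hours
    print(copies, worst_case_data_loss_hours(copies),
          worst_case_log_replay_hours(copies, log_gb_per_hour=2, replay_gb_per_hour=10))
```

Going from one copy a day to six cuts the worst-case interval from 24 hours to 4 hours, and under these example rates cuts worst-case log replay from 4.8 hours to 0.8 hours.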
In a traditional approach for backup and archive, businesses take a backup of production. Typically backup jobs use weekly full backups and nightly incremental backups. Based on business requirements, they will then copy the backup jobs and eject the tapes to have them sent offsite, where they will be stored for a specified amount of time.
The problem with this approach is simple - as the production environment grows, so does the backup environment.
Backup/Recovery and Archiving support different business goals. This slide compares and contrasts some of the significant differences.
The recovery process is much more important than the backup process. It is based on the appropriate recovery-point objectives (RPOs) and recovery-time objectives (RTOs). The process usually drives a decision to have a combination of technologies in place, from online local replicas, to backup to disk, to backup to tape for long-term, passive RPOs.
Archive processes are determined not only by the required retention times, but also by retrieval-time service levels and the availability requirements of the information in the archive.
For both processes, a combination of hardware and software is needed to deliver the appropriate service level. The best way to discover the appropriate service level is to classify the data and align the business applications with it.
This lesson provided an overview of the different topologies and media types that support creating a backup set.
We have discussed the planning and operations of creating a Backup. Now, this lesson provides an overview of Management activities and applications that help manage the Backup and Recovery process.
This lesson provided an overview of Backup and Recovery management activities and tools.
This lesson looked at Backup and Recovery. Backup and Recovery are a major part of the planning for Business Continuity.
NetWorker’s installed base of more than 20,000 customers worldwide is a testament to the product’s market leadership.
Data-growth rates are accelerating, and the data and systems in IT environments run the gamut from key applications that are central to the business to other types of information that may be less important.
What is interesting is that the industry has been somewhat stuck for several years at a one-size-fits-all strategy to backup and recovery. We’re referring to a “basic” backup scenario, or traditional tape backup.
Tape backup serves a noble purpose and is working very well for some companies; it has been EMC's core business for some time, so EMC knows it well. But shifting market dynamics, as well as more demanding business environments, have led to other important choices for backup.
Today, traditional tape faces the challenge of meeting service-level requirements for protection and availability of an ever-increasing quantity of enterprise data. This is why EMC has built into NetWorker key options to meet the needs of a wide range of environments. This includes the ability to use disk for backup, as well as to take advantage of advanced-backup capabilities that connect backup with array-based snapshot and replication management. These provide you with essentially the highest-possible performance levels for backup and recovery. As the value of information changes over time, you may choose any one of these, or a combination thereof, to meet your needs.
The first key focus is on providing complete coverage. Enterprise protection means the ability to provide coverage for all the components in the environment. NetWorker provides data protection for the widest heterogeneous support of operating systems, and is integrated with leading databases and applications for complete data protection.
A single NetWorker server can be used to protect all clients and servers in the environment—or secondary servers can be employed, which EMC calls Storage Nodes, as a conduit for additional processing power or to protect large critical servers directly across a SAN without having to take data back over the network. Such LAN-free backup is standard with NetWorker.
NetWorker can easily back up LAN, SAN, or WAN environments, with coverage for key storage such as NAS. In fact, NetWorker's NAS-protection capabilities, leveraging the Network Data Management Protocol (NDMP), are unequaled.
The key here is that NetWorker can easily grow and scale as needed in the environment and provide advanced functionality, including clustering technologies, open-file protection and compatibility with tape hardware and the new class of virtual-tape and virtual-disk libraries.
While NetWorker encompasses all these pieces in the environment, EMC has made sure there is a common set of management tools.
With NetWorker, EMC has focused on what it takes within environments both large and small to get the best performance possible, in terms of both speed and reliability. This means the inclusion of capabilities such as multiplexing to protect data as quickly as possible while making use of the backup storage’s maximum bandwidth. It also means ensuring that the way in which EMC indexes and manages the saving of data is designed to provide not only the best performance, but also stability and reliability.
Applications can be backed up either offline or online. NetWorker by itself can back up closed applications as flat files. During an offline, or cold, backup, the application is shut down, backed up and restarted after the backup is finished.
This is fine, but during the shutdown and backup period, the application will be unavailable. This is not acceptable in today’s business environments. This is why EMC has worked to integrate NetWorker with applications to provide online backup—specifically, with the use of NetWorker in conjunction with NetWorker Modules.
During an online, or hot, backup, the application is open and is backed up while open. The NetWorker Module extracts data for backup with an API; the application need not be shut down, and remains open while the backup finishes.
NetWorker supports a wide range of applications for online backup with granular-level recovery, including:
Oracle
Microsoft Exchange
Microsoft SQL Server
Lotus Notes
Sybase
Informix
IBM DB2
EMC Documentum
One key advantage of NetWorker is its media-management features.
The first feature is Open Tape Format. It is NetWorker’s way of recording data to tape, specifically designed to provide several advantages:
Data can be multiplexed, or interleaved, for performance. This essentially means data can be accepted and written to the backup media as it arrives, in whatever order it arrives, so the tape drives can keep spinning. This enables you to back up faster and also reduces wear and tear on the tape hardware, which is more susceptible to error if it is continually stopping and starting.
Tapes created by NetWorker are self-describing, so if everything else is gone except for the tape, you’ll be able to load it and understand what data is there to be restored.
As the image on the right indicates, Open Tape Format allows you to move tape media between systems and servers on unlike operating systems—with Open Tape Format, a tape that began life on a UNIX-based system can easily be read on a Windows-based system. This is key not just for disaster recovery, but for the entire environment, as you go through a regular system lifecycle and adopt new platforms.
Also, with Open Tape Format, NetWorker can skip bad spots on tape and continue data access. When other solutions on the market encounter any error on tape, they are unable to do anything further with the tape. Imagine if there is a bad spot 100 MB into a backup tape…
Finally, NetWorker can broker tape devices on a SAN to get the best use and performance out of the hardware investment. So, instead of hard-assigning tape drives to a backup server or Storage Node, you can dynamically allocate any drive on demand.
The focus here is the resolution of the top pain points around traditional tape-based backup.
Performance—NetWorker backup to disk allows for simultaneous-access operations to a volume, both reads (restore, staging, cloning) and writes (backups). With NetWorker, as opposed to with traditional tape-only backup, you don’t "pay a penalty on restore."
Also, cloning from disk to tape is up to 50% faster. Why? As soon as the Save Set (backup job) is complete, the cloning process can begin without the Administrator having to wait for all the backup jobs to complete. NetWorker can back up to disk and clone to tape at the same time. You don’t have to spend 12–16 hours a day running clone operations (tape-to-tape copies)—in fact, you might actually be able to eliminate the clone jobs. Some NetWorker customers have seen cloning times reduced from 12–16 hours daily to three to four hours daily.
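Why starting each clone as soon as its save set completes shortens the overall window can be shown with a simplified timing model. This sketch assumes backups run one after another to disk and a single clone stream (one tape drive); the job durations are hypothetical, and the model does not attempt to reproduce the "up to 50%" figure, which depends on the real environment.

```python
def clone_after_all_backups(backup_hours, clone_hours):
    """Traditional approach: cloning starts only after every backup job finishes."""
    return sum(backup_hours) + sum(clone_hours)


def clone_as_save_sets_finish(backup_hours, clone_hours):
    """Disk-based approach: each save set is cloned as soon as its own backup completes.

    Assumes backups run one after another and a single clone stream (one tape drive).
    """
    backup_done = 0.0
    clone_free_at = 0.0
    for b, c in zip(backup_hours, clone_hours):
        backup_done += b                          # this save set is now on disk
        start = max(backup_done, clone_free_at)   # wait for the save set and the drive
        clone_free_at = start + c
    return clone_free_at


jobs = [2, 2, 2]    # hypothetical backup durations (hours)
clones = [2, 2, 2]  # hypothetical clone durations (hours)
print(clone_after_all_backups(jobs, clones))    # 12
print(clone_as_save_sets_finish(jobs, clones))  # 8.0
```

The overlap is only possible because the disk volume can be read (cloned) while other save sets are still being written to it, which tape cannot do.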
Cloning from disk to tape also augments the disaster-recovery strategy for tape. As data grows, more copies must be sent offsite. Because NetWorker backup to disk improves cloning performance, you can now continue to meet the daily service-level agreements to get tapes offsite to a vaulting provider.
Taking the idea of leveraging disk even further leads us to NetWorker's advanced backup capability, which also leverages disk-based technologies.
Disk-solution providers, like EMC, provide array-based abilities to perform snapshots and replication. These "point-in-time" copies of data allow for instant recovery of disk and data volumes. Many of you are likely familiar with array-based replication or snapshot capabilities.
NetWorker is engineered to take advantage of these capabilities by providing direct tie-ins with EMC offerings such as CLARiiON with SnapView, or Symmetrix with TimeFinder/Snap. This will enable you to begin to meet the most stringent recovery requirements.
In a study done in the spring of 2004, the Taneja Group identified that the market intends to rely on snapshots for ensuring application-data availability and rapid recoveries. The figures represent a scale of one to five, with one as the low point, five as the high point:
Rapid application recovery (4.34)
Ability to automate backup to tape (4.13)
Instant backup (3.98)
Roll back to point in time (3.88)
Integration with backup strategy (3.87)
Flexibility to leverage hardware (3.61)
Multiple fulls throughout day (3.49)
In addition to traditional backup-and-recovery application modules for disk and tape, the snapshot management capability called NetWorker PowerSnap enables you to meet the demanding service-level agreement requirements in both tape and disk environments by seamlessly integrating snapshot technology and applications. NetWorker PowerSnap software works with NetWorker Modules to enable snapshot backups of applications—with consistency.
PowerSnap performs snapshot management by policy—just like standard backup policies to tape or disk. It uses these policies to determine how many snapshots to create, how long to retain the snapshots, when to do backups to tape from specified snapshots…all based on business needs that you define.
For example, snapshots might be taken every few hours, and the three most recent are retained. You can easily leverage any of those snapshots to back up to tape in an off-host fashion—i.e., with no impact to the application servers.
PowerSnap manages the full life cycle of snapshots, including creation, scheduling, backups, and expiration. This, along with its orchestration with applications, provides a comprehensive solution for complete application-data protection to help you meet the most stringent of RTOs and RPOs.
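A snapshot policy like the one in the example (a snapshot every few hours, the three most recent retained, some rolled off to tape) can be sketched as a rotation. The policy fields and function are hypothetical illustrations of the idea, not PowerSnap's actual configuration or behavior.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class SnapshotPolicy:
    """Hypothetical policy fields; not PowerSnap's actual configuration."""
    interval_hours: int = 4   # take a snapshot every few hours
    retain: int = 3           # keep only the most recent snapshots
    backup_every: int = 6     # roll every Nth snapshot off to tape


def run_policy(policy: SnapshotPolicy, hours: int):
    retained = deque(maxlen=policy.retain)  # appending beyond maxlen expires the oldest
    sent_to_tape = []
    times = range(0, hours, policy.interval_hours)
    for n, t in enumerate(times, start=1):
        snapshot = f"snap@{t:02d}h"
        retained.append(snapshot)
        if n % policy.backup_every == 0:
            # Off-host backup: the tape copy is made from the snapshot, not the live volume.
            sent_to_tape.append(snapshot)
    return list(retained), sent_to_tape


print(run_policy(SnapshotPolicy(), hours=24))
# (['snap@12h', 'snap@16h', 'snap@20h'], ['snap@20h'])
```

Creation, retention, expiration, and the tape copy all follow from the policy values, which is the sense in which snapshot management is "by policy".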
If there are servers with lots of files and lots of directories—what we refer to as high-density file systems—backup and recovery are particularly challenging. With so many files, traditional backup struggles to keep up with backup windows.
NetWorker SnapImage enables block-level backup of these file systems while maintaining the ability to restore a single file. SnapImage is intelligent enough to also support sparse backups.
Sparse files contain data with portions of empty blocks, or “zeroes.”
NetWorker backs up only the non-zero blocks, thereby reducing:
Time for backup
Amount of backup-media space consumed
Sparse-file examples:
Large database files with deleted data or unused database fields
Files from image applications
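Skipping zero-filled blocks can be illustrated at the byte level. This is a generic sketch of sparse-file backup and restore, not SnapImage's implementation; the 4 KB block size and function names are assumptions.

```python
BLOCK_SIZE = 4096  # assumed block size


def sparse_backup(data: bytes, block_size: int = BLOCK_SIZE):
    """Keep only blocks containing at least one non-zero byte, with their offsets."""
    saved = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        if any(block):  # iterating bytes yields ints; all-zero blocks are skipped
            saved.append((offset, block))
    return len(data), saved


def sparse_restore(length: int, saved) -> bytes:
    """Rebuild the file: skipped blocks come back as zeroes."""
    buf = bytearray(length)  # a new bytearray is zero-filled
    for offset, block in saved:
        buf[offset:offset + len(block)] = block
    return bytes(buf)


data = b"header" + bytes(3 * BLOCK_SIZE) + b"tail"
length, saved = sparse_backup(data)
print([offset for offset, _ in saved])       # [0, 12288]
print(sparse_restore(length, saved) == data)  # True
```

Only 2 of the 4 blocks are stored, yet the restored file is byte-for-byte identical, because the file length is recorded and the gaps are refilled with zeroes.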
With the NetWorker SnapImage Module, backup and recovery performance for servers with high-density file systems is significantly improved:
The time required to back up 18.8 million 1 KB files in a 100 GB file system with a block size of 4 KB can be reduced from 31 to seven hours.
The time required to perform a Save Set restore of one million 4 KB files on a 5.36 GB internal disk can be reduced from 72 to seven minutes.
EMC has worked with a large telecommunications company to meet its most demanding IT challenges:
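The two quoted results translate into speedups and throughput as follows. This is plain arithmetic on the figures stated above; no additional measurements are implied.

```python
def speedup(before: float, after: float) -> float:
    return before / after


FILES = 18_800_000
backup_speedup = speedup(31, 7)    # hours, before vs. after
restore_speedup = speedup(72, 7)   # minutes, before vs. after
rate_before = FILES / (31 * 3600)  # files backed up per second
rate_after = FILES / (7 * 3600)

print(f"backup {backup_speedup:.1f}x faster, restore {restore_speedup:.1f}x faster")
print(f"{rate_before:.0f} -> {rate_after:.0f} files/s")
```

The backup figure corresponds to roughly 4.4 times faster (about 168 to 746 files per second), and the restore figure to roughly 10.3 times faster.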
Complex application environment—Oracle, and lots of data
No backup window
Recovery-time objective: Restore 24 TB in two hours.
They chose to implement NetWorker, along with other key EMC offerings, to achieve a superior level of protection and recovery management—and confidence in the ability to recover.
Solution:
NetWorker PowerSnap with Symmetrix and TimeFinder/Snap
Server-free backup and rapid recovery
NetWorker DiskBackup with CLARiiON with ATA disks
Rapid primary-site protection and recovery
NetWorker and SRDF/S
Disaster recovery, offsite protection
Here is what they have been able to achieve with the above:
Zero backup time for their applications
Zero data loss
Significantly reduced management overhead
Not all environments will be this complex or demanding, but NetWorker can meet any backup and recovery requirements, and can easily be upgraded to meet more stringent requirements as needed.