Sorting Through the Confusion
Replacing Tape with Disk for Backups




              WHITE PAPER
Table of Contents
Introduction
Considerations When Examining Disk-Based Backup Approaches
Backup Requirements
Backup or Cloud Services
Disk Staging
Primary Storage SNAPS
Backup Application Deduplication in the Media Server
Backup Application Client Side Deduplication
Purpose-Built Target Side Deduplication Appliances
Summary
About ExaGrid




                   Sorting Through the Confusion                                                                      Page 1|
Introduction
The reason a 50-year-old technology like tape is still around is simple: it’s CHEAP. But there is
increasing pressure on businesses to fix their backups, as detailed in many sources including the
report, “Best Practices for Addressing the Broken State of Backup” by Dave Russell, research
vice president at Gartner. He found that “for many organizations, backup has become an
increasingly daunting and brittle task fraught with significant challenges.”

The pressure of data growth has increased sharply as businesses need to store both onsite and
offsite copies of their data. This can mean storing 40 to 100 times the volume of their primary
dataset, due to storing weeks of retention onsite and weeks, months and, in some cases, years of
retention offsite.

Onsite copies are kept in order to recover a deleted or overwritten file or to recover from a
system outage, hardware failure or data corruption that goes unnoticed until the data is needed
again, perhaps weeks later. Offsite copies are kept in order to recover data if the primary site
has a disaster.

Maintaining more copies with longer term retention is being driven by business needs such as
SEC audits, regulations such as the Gramm-Leach-Bliley Act (GLBA), Health Insurance Portability
and Accountability Act (HIPAA) and Sarbanes-Oxley (SOX), legal requirements such as the need
for legal discovery, Service Level Agreements (SLA), contractual reasons and many other
business and legal reasons. Labeling tapes, transporting them, storing them and ultimately
finding the right tape when requested is a challenge in itself. This is compounded
by the fact that the data may not even be on the tapes, since 30% of tapes are found corrupted,
damaged or blank.

Tape has some intrinsic problems
      • The number of simultaneous jobs that can be writing to tape is determined by the number
        of drives in the tape library, resulting in unnecessarily long backup times.
      • Restores fail about 30% of the time from tapes, which can be missing files, have
        corrupted files or be unreadable or blank.
      • Tapes are physically transported repeatedly and can be lost, misplaced, or stolen in
        transit.
      • Tape labels are handwritten, making them subject to human error. They can also fall off
        or be unreadable.
      • Tapes can be damaged by wear through overuse in the rotation, heat, humidity, magnetic
        fields, dirt and other environmental conditions.
      • Data at rest on tapes is not encrypted. Tape encryption dramatically increases the backup
        window. Tapes can be password-protected but require a system to track passwords, which
        is subject to human error.
      • It takes time to restore from tapes and even more time if, for any reason, a tape set is
        bad. In that case, you have to fall back to an earlier tape set and start the restore again.
      • The time to first find, then retrieve, tapes in transit or at remote locations must be
        factored in as well.

Inertia and confusion have kept tape alive
The market is full of affordable alternatives to tape, but at the same time there is a lot of
confusion about technologies like deduplication. Disk has been making steady inroads on tape as
the primary target for backup software, for reasons that go much deeper than price or being
faster and easier to manage than tape. Because of the ambiguity surrounding tape alternatives,
some businesses are confused when looking to replace tape and tape solutions.

Disk is just one part of a whole new equation that has emerged where near real-time business
continuity and disaster recovery are the new desired end results. Disk eliminates the daily grind
and uncertainty that typically surrounds backup to tape. Instead, IT staffs get relief from
worrying whether backups and restores are completing successfully, or that backup jobs have
failed.

Disk by its very nature fixes all the intrinsic problems of tape
 • Volumes or NAS shares are virtual and can have a large number of backup jobs being written
   in parallel, reducing the backup window.
 • Backups and restores are reliable with disk. With tape, up to 15% of backups fail and
   up to 30% of restores fail. With disk, virtually 100% of backups complete and 100% of
   restores succeed.
 • Disks are in a hermetically sealed case inside a temperature and humidity controlled data
   center, eliminating the environmental degradation issues of tape.
 • Disks reside in a rack in a data center, which is in turn secured by physical and network
   security. Therefore, the security issues of tape moving around are eliminated.
 • There are no handwritten labels that can fall off. The software automatically tags all jobs that
   have been written to disk.
 • In addition to physical and network data center security, data stored on disk can be encrypted
   with only a 3-4% performance reduction. Encrypting while writing to tape dramatically slows
   down the backups.
 • Restoring data from backups, including incremental backups, is fast from disk. No time is lost
   to finding or retrieving tapes in transit or at remote locations. Not only is disk more reliable to
   restore from, but it is random access, whereas tape is sequential access.

Over the last decade, a range of technologies has emerged that makes it feasible for disk to
replace tape. Disk-based solutions now offer the benefits that only tape once offered, such as
infinite capacity, portability and manageability.

Make better use of resources
The use of disk storage for augmenting tape, or of disk storage and deduplication either
augmenting or eliminating tape, is becoming a more logical investment for organizations.
Scarce resources once used to deliver "just" data protection can be repurposed for the strategic
business initiatives of disaster recovery and business continuity.

Those responsible for planning or carrying out backups are looking for tape alternatives that
offer:
      • Less IT staff time spent on backups, resulting in time to focus on other valuable IT
        initiatives
      • Faster backups
      • More reliable backups
      • Faster and more reliable restores
      • Ability to meet all financial, governmental and legal retention requirements
      • Ability to achieve all of the above without making any major changes to the current
        environment that could create work, risk or change


Considerations When Examining Disk-Based Backup Approaches
Now that it is economically feasible to move from a tape-based to disk-based backup approach,
a large number of vendors with varying approaches have emerged. This has caused a great
amount of confusion for IT managers looking to adopt a disk-based backup system for their
organization.

To help clear away this confusion, this white paper will first present a general overview of
several different deduplication approaches. This section will show how deduplication can store
far less data on a given amount of disk using new technologies that minimize the amount of disk
required. This results in a cost for disk that is about the same as the cost of tape.

For reference, a chart is included that lists the backup requirements of each of these
approaches. Next, each of the six potential solutions that are often considered to replace tape
with disk will be presented in turn. Information about each approach will be shown, including the
pros and cons of each approach. These six approaches are as follows:

      • Backup services or cloud backup services
      • Disk staging – storing data on disk that has been inserted between the media servers and
        the tape libraries
      • Primary storage SNAPs
      • Backup application data deduplication in the media server writing to standard disk
      • Backup application data deduplication on server agents (client side) writing to standard
        disk
      • Purpose-built target side appliance with deduplication

Data Deduplication Overview
One of the few remaining arguments for tape is that tape libraries will technically never "run out
of retention capacity". As soon as a tape cartridge fills up, it can be replaced with another tape
cartridge and the full cartridges can be stored. When writing to disk, storing the same amount of
data that is stored on tape would require a massive amount of disk, resulting in high cost.

However, if you could use a fraction of the space required to store the data on disk and bring the
cost of disk storage close to the cost of tape, then disk is clearly the better alternative. From
week to week, only about 2% of the bytes change. With tape backup, however, the 98% of the
data that is unchanged is backed up repeatedly, so identical data is saved dozens or even
hundreds of times.

With disk, deduplication software can intelligently save only the 2% of the data that changes
from week to week. The net result of using disk storage and data deduplication together is that
you need only 1/20th to 1/50th of the storage you would need on tape.

Figure 1 - Data Deduplication Taxonomy

Since tape costs about 1/20th the price of disk per TB of usable capacity, data deduplication
effectively neutralizes the price gap between tape and disk by using far less disk space than is
required to store the same data on tape. There are many methods of data deduplication,
including:

      • Fixed data block (64KB to 128KB) - used in backup software applications
      • Changed storage blocks - used in primary storage SNAPS
      • Byte level - used in target side appliances
      • Data block with variable content splitting - used in target side appliances
      • Zone-byte level - used in target side appliances

All of these methods reduce redundant data in backups. For example, if a full backup of 50TB of
data is completed every Friday night, and 10 weeks of backups are kept onsite, it would take 500TB
of disk space to store them. However, most of the full backup is unchanged from week to week.
Only the data that has been changed, edited or created that week needs to be stored. On
average, only about 2% of the data changes from week to week. In this example, 2% is about
1TB per week.



If you were to take out all of the redundant data, over time the storage required can be reduced
by as much as 50:1, depending on the deduplication method used.
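
The arithmetic in this example can be sketched in a few lines of Python. This is a deliberately simple model, not any vendor's sizing method: it assumes one complete copy of the first full backup plus only the changed data for each later week, and it ignores compression and nightly backups. The function names are hypothetical.

```python
def raw_storage_tb(full_tb, weeks):
    """Disk needed to keep every weekly full backup with no deduplication."""
    return full_tb * weeks


def deduplicated_storage_tb(full_tb, weeks, weekly_change=0.02):
    """One complete copy plus only the changed data for each later week."""
    return full_tb + full_tb * weekly_change * (weeks - 1)


def reduction_ratio(full_tb, weeks, weekly_change=0.02):
    """Raw storage divided by deduplicated storage for the same retention."""
    raw = raw_storage_tb(full_tb, weeks)
    return raw / deduplicated_storage_tb(full_tb, weeks, weekly_change)


if __name__ == "__main__":
    # 50TB weekly full backup, as in the example above.
    for weeks in (4, 10, 18, 52):
        raw = raw_storage_tb(50, weeks)
        dedup = deduplicated_storage_tb(50, weeks)
        ratio = reduction_ratio(50, weeks)
        print(f"{weeks:>2} weeks: {raw:6.0f}TB raw, {dedup:5.1f}TB deduplicated, {ratio:4.1f}:1")
```

Under these assumptions the ratio rises with the number of weeks retained and approaches 50:1 only for very long retention, since each added week costs 2% of a full backup. Ratios observed in practice also depend on the data mix and the deduplication method.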

Factors Impacting Deduplication Results
In general, the higher the deduplication ratio, the better. A higher deduplication ratio uses
less disk space over time and needs far less WAN bandwidth to replicate data to the offsite
disaster recovery site.

Figure 2 - Deduplication Reduces Storage over Time

Deduplication Approach
The deduplication approach selected impacts the amount of storage savings that will result.

       • 64KB to 128KB fixed block will average about 7:1
       • Byte, segment-block and zone will average from about 20:1 to 50:1
         reduction in data storage
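
To make the fixed-block approach concrete, here is a minimal sketch in Python. It is an illustration only, not the implementation used by any backup product: the 64KB block size comes from the text, while the use of SHA-256 fingerprints, the in-memory dictionary and the synthetic test data are all assumptions.

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # 64KB fixed blocks, as described above


def split_fixed(data, block_size=BLOCK_SIZE):
    """Cut the backup stream into fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def store_backup(data, store):
    """Add only unseen blocks to the store; return how many were new."""
    new_blocks = 0
    for block in split_fixed(data):
        fingerprint = hashlib.sha256(block).hexdigest()
        if fingerprint not in store:
            store[fingerprint] = block
            new_blocks += 1
    return new_blocks


# Synthetic data: a "full backup" of 50 distinct blocks.
week1 = b"".join(bytes([i]) * BLOCK_SIZE for i in range(50))
# The next week, exactly one block (2% of the data) has changed.
week2 = week1[:10 * BLOCK_SIZE] + bytes([200]) * BLOCK_SIZE + week1[11 * BLOCK_SIZE:]

store = {}
new_week1 = store_backup(week1, store)  # every block is new
new_week2 = store_backup(week2, store)  # only the changed block is new
print(new_week1, new_week2, len(store))
```

In this idealized case the second full backup adds a single block. Real data rarely changes in neatly aligned blocks, which is one reason results differ between deduplication approaches.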

Data Mix Affects Results
The deduplication ratio can range from 10:1 to as much as 50:1, depending on the mix of data
types being backed up. Databases can get very high deduplication ratios of over 100:1.
Unstructured file data will see an average ratio of 7-10:1. Deduplicating compressed or
encrypted files does not yield a high ratio or significant space savings.

Retention Period
The longer the retention period, the higher the deduplication ratio will be.

Getting the Best Results
The best deduplication ratios will be achieved in environments that are:
    • Using byte, data block or zone-level deduplication
    • Backing up no compressed or encrypted data
    • Retaining data for longer-term periods, on the order of 18 weeks

The worst deduplication ratios will be achieved in environments that are:
    • Using 64KB or 128KB fixed block deduplication
    • Backing up a large amount of compressed or encrypted data
    • Retaining data for shorter-term periods, on the order of 4 weeks or less

The net result is that not all deduplication approaches achieve the same results. Deduplication ratios
are clearly impacted by data types and retention periods. All of these factors need to be taken
into consideration when choosing the proper disk backup approach.

Backup Requirements
The chart below shows the top backup requirements of most IT shops, arranged in priority order. Each of the
approaches, including staying with tape, is shown in its own column. As you can see, not all approaches can meet all
requirements. The key is to list your requirements and match them against each of the solutions to see which solutions
best meet your requirements. The following sections show the strengths and limitations of each of the six disk solutions.
Backup or Cloud Services
 There are many backup or cloud services to which backup can be outsourced, and the market is
 evolving as new players enter the field. These services require replacing the server agents used
 by the backup application. The service can then remotely manage the backup environment.

 At the start, one complete backup of the data needs to be sent to the backup server. The
 logistics of doing this data transfer can be troublesome, due to the large, sustained
 bandwidth required. After the initial full backup is transferred, just the changes in the
 data need to be uploaded to the outsourced service. Most of these agents only move
 changed bytes once the initial full backup is at the service provider (in the cloud).

 Figure 3 - Typical Cloud Backup

 Before a cloud backup recovery strategy is implemented, two key factors should be
 considered. First, one should ask what the recovery point objective (RPO) is for the
 business service that is being considered. Second, one should ask what the recovery time
 objective (RTO) is for the business service.

 Be sure to evaluate carefully the claims made in cloud service contracts. The most important of
 these contractual promises are the availability of the service, the provider’s service level
 agreements (SLAs), and the security of your data.

 According to a Yankee Group report 1, “cloud contracts are rife with disclaimers, misleading
 uptime guarantees, and questionable privacy policies…”

 Strengths
        • Frees up IT staff to do other core/critical IT tasks

 Weaknesses
        • Requires changing all the server agents from your existing backup application to the
          outsourced service backup agents. Any changes to the agents will require weeks or
          months of tweaking.



 1 http://www.yankeegroup.com/about_us/press_releases/2010-04-21.html

        • Good for small amounts of data, typically under 1TB. Best fit for small IT shops or a large
          company’s small remote office, but not for multi-TB environments. This limitation is due to
          the time needed to recover the data over the Internet. Under normal operation only the
          changed bytes or blocks get sent for replication. However, if a full restore is
          required, it would take about 31 days to pull 1TB of data over a 3 Mbps Internet
          connection. Note that what matters is not the bandwidth between your sites but
          rather your bandwidth to the Internet.

        • If the data is over a few TB, most service providers need to place a hardware appliance
          (cache) in the IT environment to keep at least one week of backups (including a full
          backup) on-site to overcome the recovery bottleneck presented by bandwidth to the
          Internet. The cost of the cache appliance plus the monthly fees makes a backup or cloud
          service the most expensive backup choice if you have more than a few TBs of data to
          protect.
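
The restore-time estimate in the weaknesses above can be checked with a short calculation. The sketch below assumes a decimal terabyte (10^12 bytes), a link that is fully available to the restore and no protocol overhead. The 31-day figure is consistent with a connection of about 3 megabits per second. The function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400
TB = 10**12  # decimal terabyte, in bytes (assumption)


def transfer_days(data_bytes, link_bits_per_second):
    """Days needed to move data_bytes over a fully utilized link."""
    seconds = data_bytes * 8 / link_bits_per_second
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # Full restore of 1TB over a 3 megabit-per-second Internet connection.
    print(f"{transfer_days(1 * TB, 3_000_000):.1f} days")
```

Because the time scales linearly with the amount of data, a multi-TB restore over the same link would take months, which is why providers place a cache appliance onsite.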

 Summary
        • For consumers, small IT environments (<1TB) and small remote offices with a small or
          nonexistent IT staff, a small data center (if any) and low bandwidth, a backup service is
          the best way to go.

        • If there is a reasonable amount of data (>1TB), services become too cumbersome and too
          costly.




Disk Staging
 Disk staging places a disk between the media server or storage nodes and the tape library.
 This is also considered tape augmentation.

 All backup applications can write directly to a disk volume or NAS share, so disk staging
 works natively with all backup applications. Disk staging reduces the perceived backup window
 at the client level, reduces the backup verification window at the server level, and provides
 high speed recovery of files from disk, rather than tape.

 Figure 4 - Disk Staging Concept Overview

 Strengths
        • By placing disk between the media servers/storage nodes and the tape library many
          problems are solved:

              ◦ Multiple parallel jobs can be handled, without being limited to the number of
                physical tape drives. This results in faster backups, assuming that media servers
                can keep up.

              ◦ Reliable backups and reliable restores for the data are assured using disk.

 Weaknesses
        • Disk staging becomes expensive very quickly:
              ◦ Disk staging does not eliminate the use of tape onsite or offsite. It simply augments
                tape onsite.
              ◦ There is no data deduplication when using disk staging, so the amount of disk grows
                very quickly and becomes extremely expensive with any level of retention.

        For example, two weeks of nightly backups and weekly full backups require storing four
        times the size of the primary data on disk. This assumes a rotation of full backups for
        databases and email nightly, incremental backups on files nightly and full backups on
        Friday.


        Each night, a combination of incremental backups of files and full backups of databases
        and email will equal about 25% of a full backup. These Monday through Thursday nightly
        backups will add up roughly to the size of a full backup.


        Using 40TB of data as an example, the nightly backups after four nights will total 40TB and
        the Friday full backup will be 40TB. Together, they will require a total of 80TB of disk storage.
        After two weeks, this expands to 160TB of disk storage required. Therefore, 90% of
        customers using disk staging keep between one and two weeks of data on disk.
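
The disk staging example above reduces to a small formula. The sketch below uses the figures from the text (nightly backups at about 25% of a full backup, four nights per week, one full backup on Friday) and assumes no deduplication or compression. The function name and parameter names are hypothetical.

```python
def staging_disk_tb(primary_tb, weeks, nightly_fraction=0.25, nights_per_week=4):
    """Disk needed to stage backups with no deduplication.

    Each week holds four nightly backups (about 25% of a full each)
    plus one Friday full backup.
    """
    nightly_total = primary_tb * nightly_fraction * nights_per_week
    weekly_total = nightly_total + primary_tb
    return weekly_total * weeks


if __name__ == "__main__":
    for weeks in (1, 2, 4, 8):
        print(f"{weeks} week(s) of retention: {staging_disk_tb(40, weeks):.0f}TB")
```

With 40TB of primary data this gives 80TB for one week and 160TB for two, and the requirement keeps doubling with each doubling of retention.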



 Summary
        • Disk staging is good for one to two weeks of onsite retention on disk.

              ◦ It is estimated that about 70% of tape users use disk staging.

        • For retention over one or two weeks, or tape replacement onsite, an organization must
          use data deduplication to store only unique data (not the redundant data) and so
          use far less disk, thus reducing the cost impact.




Primary Storage SNAPS
 Primary storage SNAPS (a quick logical copy or snapshot) are useful primarily for short-term
 retention. They are just the first line of defense in a layered backup scheme that includes
 long-term backups. SNAPS save changed storage blocks on a periodic basis (e.g. hourly) that
 allow for roll back to the last period. Primary storage SNAPS are not intended for long-term
 or historical backup.

 Figure 5 - SNAPS Concept

 Strengths
        • SNAPS allow rolling back to earlier points and are more granular than a nightly backup

        • SNAPS can be replicated offsite for disaster recovery of short-term, periodic SNAP points
 Weaknesses
        • SNAPs write into the same volume as the primary data, so they do not offer protection
          against a system crash, virus attack, data corruption or other event that destroys the
          primary data. The SNAPs would get destroyed along with the primary data. This is why
          99% of IT environments keep a backup copy on a separate system onsite (tape or disk).

        • SNAPs are not good for long-term retention uses such as legal discovery, regulatory
          compliance or SEC audits. When years of retention are needed, a traditional backup
          approach is required because data must be stored at specific points in time but not at
          every interval in between, such as monthly backups for 3 years and then yearly backups
          for 4 additional years.

 Summary
        • Primary Storage SNAPS and long-term traditional backup can co-exist as part of a multi-
          layered approach to backup tailored to the specific requirements of the business.
          Primary Storage SNAPS provide for fine-granularity backup points onsite and also offsite,
          if replicated.
        • It is estimated that 99% of IT environments use a traditional, longer-term backup system.
          About 50% of IT environments deploy some type of Primary SNAPs as well.




Backup Application Deduplication in the Media Server
 Some backup applications have a data deduplication feature that can be deployed as an agent in
 the media server. The intent is to be able to eliminate tape using standard disk in conjunction
 with the backup application.

 Data deduplication is a very compute intensive process. If deduplication is run in the media
 server, resource utilization will increase significantly. This can slow backups down
 dramatically.

 Figure 6 - Running Deduplication on Media Server

 To avoid this hit to overall backup performance, backup software uses a form of deduplication
 that results in a lower reduction rate. Using the least possible processor and memory
 resources for the deduplication process avoids starving the media server tasks of resources, but
 at the cost of a lower deduplication ratio.

 Typically this approach uses 64KB or 128KB fixed blocks and will yield a data reduction ratio of
 about 6-7:1. By comparison, target-side appliances that use byte, zone-byte or segment-block
 with variable length content-splitting average from about a 20:1 to as much as a 50:1 data
 reduction ratio, or a minimum of approximately three times that of the software approach.
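
To see what these ratios mean in disk terms, the sketch below converts a logical backup volume into the physical disk needed at each ratio. The 500TB figure is borrowed from the earlier example of ten weekly 50TB full backups and is only illustrative. The ratios are the averages quoted in the text, not measurements, and the function name is hypothetical.

```python
def stored_tb(logical_tb, ratio):
    """Physical disk needed to hold logical_tb at a given deduplication ratio."""
    return logical_tb / ratio


if __name__ == "__main__":
    logical_tb = 500  # illustrative: ten weekly 50TB full backups
    for label, ratio in (
        ("fixed block, about 7:1", 7),
        ("target-side appliance, 20:1", 20),
        ("target-side appliance, 50:1", 50),
    ):
        print(f"{label}: {stored_tb(logical_tb, ratio):.1f}TB of disk")
```

At 7:1 the same backups need roughly 71TB of disk, against 25TB at 20:1, which is the "approximately three times" difference described above. The same proportion applies to the data that must cross the WAN for offsite replication.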

 In addition, software deduplication can only process data that comes from its own proprietary
 agents. It cannot deduplicate data from other sources including other backup applications,
 utilities or database dumps.

 Some vendors bundle the media server software on a storage server that includes a CPU,
 memory and disk. This does not change the deduplication rate or the heterogeneous nature of
 the solution.

 Strengths
        • Relatively simple to manage through the backup application
        • Good for environments that have less than 3TB of data to back up, use a single backup
          application and do not plan to replicate to a second site for disaster recovery

 Weaknesses
        • Disk usage is high as the deduplication ratio is only 6-7:1. Over time the disk space
          required grows sharply.

        • Bandwidth needed to send backups to a second site is high as the deduplication ratio is
          only 6-7:1. By comparison, target-side appliances that use byte, zone-byte or segment-block
          with variable length content-splitting average from about a 20:1 to as much as a
          50:1 data reduction ratio, or a minimum of approximately three times that of the software
          approach.

        • Cannot deduplicate data from:

              ◦ Veeam, VizionCore
              ◦ Lightspeed, SQL Safe, Redgate
              ◦ Direct SQL Dumps, Direct Oracle RMAN Dumps
              ◦ Bridgehead for Meditech data
              ◦ Direct UNIX TAR files
              ◦ Other traditional backup applications

 Summary
        • Deduplication in the backup software is good for short-term retention and low amounts of
          data in environments that are not heterogeneous and where offsite disaster recovery data
          is not required.




Backup Application Client Side Deduplication
 Some industry backup applications offer a form of data deduplication in the application server
 agents or clients. The intent is to be able to eliminate tape using standard disk along with the
 backup application. The deduplication occurs at the backup agent/client on each application
 server.

 Data deduplication is a very compute intensive process. Resource utilization will increase
 significantly if deduplication is run in the application server (client side), and backups will
 slow down dramatically.

 Figure 7 - Client Side Deduplication

 To minimize this impact, client side deduplication software approaches use a less-efficient form
 of deduplication. Typically they use 64KB or 128KB fixed blocks, with which they achieve a data
 reduction rate of about 6-7:1. By comparison, target-side appliances that use byte, zone-byte or
 segment-block with variable length content-splitting average from about a 20:1 to as much as a
 50:1 data reduction ratio, or a minimum of approximately three times that of the software
 approach.

 Running a compute intensive deduplication process on your application servers creates other
 performance and availability challenges. Furthermore, databases and email, which are 80% of
 the Monday through Thursday backups, are still sent as full backups. This means that only 20%
 of the nightly data is actually deduplicated by client side deduplication during the week. The
 true impact is on the Friday night full backup, where 80% of the data is unstructured file data.

 In addition, the software approach to deduplication can only process data that comes from its
 own proprietary agents. It cannot deduplicate data from other sources including other backup
 applications, utilities or database dumps.

 Strengths
        • Great fit for deduplicating data from small remote sites, then replicating it back to a
          corporate datacenter for backup.

        • This approach can shorten the backup window, but only on the Friday full backup. During
          the week, backups are still full backups for database and email.

 Weaknesses
        • Requires new agents on servers; added risk and cost of changing agents.

        • Deduplication ratio is only 6-7:1 and the disk space required increases quickly.

        • Bandwidth usage to a second site is high as the deduplication ratio is only 6-7:1. By
          comparison, target-side appliances that use byte, zone-byte or segment-block with variable
          length content-splitting average from about a 20:1 to as much as a 50:1 data reduction
          ratio, or at a minimum three times that of software deduplication.

        • Cannot deduplicate data from:

              ◦ Veeam, VizionCore
              ◦ Lightspeed, SQL Safe, Redgate
              ◦ Direct SQL Dumps, Direct Oracle RMAN Dumps
              ◦ Bridgehead for Meditech data
              ◦ Direct UNIX TAR files
              ◦ Other traditional backup applications

 Summary
        • Very good for replicating remote site data back to a corporate datacenter

        • Very few businesses actually use this approach due to its risk to application servers and
          its other weaknesses




Purpose-Built Target Side Deduplication Appliances
 Target-side deduplication appliances are built specifically to replace the tape library in the
 backup process onsite and, optionally, offsite. Because they are dedicated appliances, the
 hardware and the deduplication methods used can be optimized for that single purpose. Future
 disk space requirements to deal with data growth are drastically reduced because deduplication
 ratios from 20:1 to as much as 50:1 can be achieved. Only the data that changes, about 2% of
 the backup size, is replicated offsite, which requires far less bandwidth.

 In addition, target-side appliances can process
 data from a variety of utilities and backup
 applications.

 Figure 8 - Target Side Deduplication Appliance

 Strengths
        • No change to your backup environment. Use all backup applications, utilities and
          dumps you are currently using.

              ◦ Can take in data from:
                   ▪ Traditional backup applications
                   ▪ Veeam, VizionCore
                   ▪ Lightspeed, Redgate, SQLSafe
                   ▪ SQL Dumps, Oracle RMAN dumps
                   ▪ Direct UNIX TAR files
                   ▪ Many other backup applications and utilities

        • 20:1 to as much as 50:1 deduplication ratios use less disk space and far less bandwidth
          for replication.

        • Special features for:
              ◦ Tracking data to offsite Disaster Recovery
              ◦ Improving Disaster Recovery RPO (recovery point objective) and RTO (recovery time
                objective)
              ◦ Purging data as the retention policy calls for aging out data

 Weaknesses
        • Backup window improves over using a tape library, but not by as much as client side
          deduplication for the Friday night full backup




Summary
 When evaluating different approaches to replacing tape with disk, take the time to ask the right questions and
 understand the strengths and weaknesses of each alternative.




About ExaGrid
 ExaGrid is the leader in cost-effective disk-based backup solutions. A highly scalable system that
 works with existing backup applications, the ExaGrid system is ideal for companies looking to
 quickly eliminate the hassles of tape backup while reducing their existing backup windows.
 ExaGrid’s innovative approach minimizes the amount of data to be stored by providing standard
 data compression for the most recent backups along with zone-level data deduplication
 technology for all previous backups. Customers can deploy ExaGrid at primary and secondary
 sites to supplement or eliminate offsite tapes with live data repositories or for disaster recovery.

 With offices and distribution worldwide, ExaGrid has more than 3,500 systems installed and
 hundreds of published customer success stories and testimonial videos available at
 www.exagrid.com.




ExaGrid Systems, Inc | 2000 West Park Drive | Westborough, MA 01581 | 1-800-868-6985 |
www.exagrid.com

© 2011 ExaGrid Systems, Inc. All rights reserved.
ExaGrid is a registered trademark of ExaGrid Systems, Inc.

Sorting through the confusion white paper

  • 1. Sorting Through the Confusion Replacing Tape with Disk for Backups WHITE PAPER
  • 2. Table of Contents Introduction .................................................................................................................... 2 Considerations When Examining Disk- Based Backup Approaches........................................... 4 Backup Requirements ....................................................................................................... 7 Backup or Cloud Services .................................................................................................. 8 Disk Staging .................................................................................................................. 10 Primary Storage SNAPS .................................................................................................. 12 Backup Application Deduplication in the Media Server ......................................................... 13 Backup Application Client Side Deduplication ..................................................................... 15 Purpose-Built Target Side Deduplication Appliances ............................................................ 17 Summary ...................................................................................................................... 18 About ExaGrid ............................................................................................................... 19 Sorting Through the Confusion Page 1|
  • 3. Introduction The reason a 50-year old technology like tape is still around is simple; it’s CHEAP. But there is increasing pressure on businesses to fix their backups, as detailed in many sources including the report, “Best Practices for Addressing the Broken State of Backup” by Dave Russell, research vice president at Gartner. He found that “for many organizations, backup has become an increasingly daunting and brittle task fraught with significant challenges.” The pressure of data growth has increased sharply as businesses need to store both onsite and offsite copies of their data. This can mean storing 40 to 100 times the volume of their primary dataset, due to storing weeks of retention onsite and weeks, months and in some cases, years of retention off site. Onsite copies are kept in order to recover a deleted or overwritten file or to recover from a system outage, hardware failure or data corruption that goes unnoticed until the data is needed again, perhaps weeks later. Offsite copies are kept in order to recover data if the primary site has a disaster. Maintaining more copies with longer term retention is being driven by business needs such as SEC Audits, regulations such as the Gramm Leach Bliley Act (GLBA), Health Insurance Portability and Accountability Act (HIPAA) and Sarbanes Oxley (SOX), legal requirements such as the need for legal discovery, Service Level Agreements (SLA), contractual reasons and many other business and legal reasons. The challenge of labeling tapes, transporting tapes, storing them and ultimately finding the right tape when requested is a challenge in itself. This is compounded by the fact that the data may not even be on the tapes, since 30% of tapes are found corrupted, damaged or blank. Tape has some intrinsic problems  The number of simultaneous jobs that can be writing to tape is determined by the number of drives in the tape library resulting in unnecessarily long backup times. 
 Restores fail about 30% of the time from tapes, which can be missing files, have corrupted files or can be unreadable or blank.  Tapes are physically transported repeatedly and can be lost, misplaced, or stolen in transit.  Tape labels are handwritten, making them subject to human error. They can also fall off or be unreadable.  Tapes can be damaged by wear though overuse in the rotation, heat, humidity, magnetic fields, dirt and other environmental conditions.  Data at rest on tapes is not encrypted. Tape encryption dramatically increases the backup window. Tapes can be password-protected but require a system to track passwords, which is subject to human error.  It takes time to restore from tapes and even more time if, for any reason, a tape set is bad. In that case, you have to fall back to an earlier tape set and start the restore again. Sorting Through the Confusion Page 2|
  • 4. The time to first find, then retrieve, tapes in transit or at remote locations must be factored in, as well. Inertia and confusion have kept tape alive The market is full of affordable alternatives to tape, but at the same time there is a lot of confusion about technologies like deduplication. Disk has been making steady inroads on tape as the primary target for backup software for reasons that go much deeper than price with respect to tape, or being faster and easier to manage than tape. Because of the ambiguity surrounding tape alternatives, some businesses are confused when looking to replace tape and tape solutions. Disk is just one part of a whole new equation that has emerged where near real- time business continuity and disaster recovery are the new desired end results. Disk eliminates the daily grind and uncertainty that typically surrounds backup to tape. Instead, IT staffs get relief from worrying whether backups and restores are completing successfully, or that backup jobs have failed. Disk by its very nature fixes all the intrinsic problems of tape  Volumes or NAS shares are virtual and can have a large number of backup jobs being written in parallel, reducing the backup window.  Backups and restores are reliable with disk. With tape, up to 15 percent of backups fail and up to 30% of restores fail. With disk virtually 100% of backups complete and 100% of restores succeed.  Disks are in a hermetically sealed case inside a temperature and humidity controlled data center, eliminating the environmental degradation issues of tape.  Disks reside a rack, in a data center, which is in turn is secured by physical and network security. Therefore, the security issues of tape moving around are eliminated.  There are no handwritten labels that can fall off. The software automatically tags all jobs that have been written to disk. 
 In addition to physical and network data center security, data stored on disk can be encrypted with only a 3 - 4% performance reduction. Encrypting while writing to tape dramatically slows down the backups.  Restoring data from backups, including incremental backups, is fast from disk. No time is lost to finding or retrieving tapes in transit or at remote locations. Not only is disk more reliable to restore from, but it is random access versus tape which is sequential access. Over the last decade, a range of technologies has emerged that makes it feasible for disk to replace tape. Disk-based solutions now offer the benefits that only tape once offered, such as infinite capacity, portability and manageability. Make better use of resources The use of disk storage for augmenting tape, or of disk storage and deduplication either augmenting or eliminating tape, is becoming a more logical investment for organizations. . Sorting Through the Confusion Page 3|
  • 5. Scarce resources once used to deliver "just" data protection can be repurposed for the strategic business initiatives of disaster recovery and business continuity. Those responsible for planning or carrying out backups are looking for tape alternatives that offer:  Less IT staff time spent on backups, resulting in time to focus on other valuable IT initiatives  Faster backups  More reliable backups  Faster and more reliable restores  Ability to meet all financial, governmental and legal retention requirements  Achieve all of the above without making any major changes to the current environment that could create work, risk or change Considerations When Examining Disk- Based Backup Approaches Now that it is economically feasible to move from a tape-based to disk-based backup approach, a large number of vendors with varying approaches have emerged. This has caused a great amount of confusion for IT managers looking to adopt a disk-based backup system for their organization. To help clear away this confusion, this white paper will first present a general overview of several different deduplication approaches. This section will show how deduplication can store far less data on a given amount of disk using new technologies that minimize the amount of disk required. This results in a cost for disk that is about the same cost as tape. For reference, a chart is included that lists the backup requirements of each of these approaches. Next, each of the six potential solutions that are often considered to replace tape with disk will be presented in turn. Information about each approach will be shown, including the pros and cons of each approach. 
These six approaches are as follows:  Backup services or cloud backup services  Disk staging – storing data on disk that has been inserted between the media servers and the tape libraries  Primary storage SNAPs  Backup application data deduplication in the media server writing to standard disk  Backup application data deduplication on server agents (client side) writing to standard disk  Purpose-built target side appliance with deduplication Data Deduplication Overview One of the few remaining arguments for tape is that tape libraries will technically never "run out of retention capacity". As soon as a tape cartridge fills up, it can be replaced with another tape Sorting Through the Confusion Page 4|
  • 6. cartridge and the full cartridges can be stored. When writing to disk, storing the same amount of data that is stored on tape would require a massive amount of disk, resulting in high cost. However, if you could use a fraction of the space required to store the data on disk and bring the cost of disk storage close to the cost of tape, then disk is clearly the better alternative. From week to week, only about 2% of the bytes change. However, with tape backup 98% of the unchanged data is backed up repeatedly, resulting in saving the identical data dozens and even hundreds of times. With disk, deduplication software can intelligently save only the 2% of the data that changes from week to week, saving only the changed data. The net result of using disk storage and data deduplication together is you only need 1/20th to 1/50th of the storage you would need on tape. Figure 1 - Data Deduplication Taxonomy Since tape costs about 1/20th the price of disk per TB of usable capacity, using data deduplication effectively neutralizes the price gap between tape and disk by using far less disk space than is required to store the same data on tape. There are many methods to data deduplication including:  Fixed data block (64KB to 128KB) - used in Backup Software Applications  Changed storage blocks - used in primary storage SNAPS  Byte level - used in target side appliances  Data block with variable content splitting - used in target side appliances  Zone-byte level - used in target side appliances All of these methods reduce redundant data in backups. For example, if a full backup of 50TB of data is completed every Friday night, and 10 weeks are kept onsite, it would take 500TB of disk space to store the backup. However, most of the full backup is unchanged from week to week. Only the data that has been changed, edited or created that week needs to be stored. On average, only about 2% of the data changes from week to week. In this example, 2% is about 1TB per week. 
If you were to take out all of the redundant data, the storage required can be reduced over time by as much as 50:1, depending on the deduplication method used.

Factors Impacting Deduplication Results
In general, the higher the deduplication ratio, the better. A higher deduplication ratio uses less disk space over time and needs far less WAN bandwidth to replicate data to the offsite disaster recovery site.

Deduplication Approach
The deduplication approach selected affects the amount of storage savings that will result.
• 64KB to 128KB fixed block will average about 7:1
• Byte, segment-block and zone will average from about 20:1 to 50:1 reduction in data storage

Figure 2 - Deduplication Reduces Storage over Time

Data Mix Affects Results
The deduplication ratio can range from 10:1 to as much as 50:1, depending on the mix of data types being backed up. Databases can get very high deduplication ratios of over 100:1. Unstructured file data will see an average ratio of 7-10:1. Deduplicating compressed or encrypted files does not yield a high ratio or significant space savings.

Retention Period
The longer the retention period, the higher the deduplication ratio will be.

Getting the Best Results
The best deduplication ratios will be achieved in environments that are:
• Using byte, data block or zone-level deduplication
• Backing up no compressed or encrypted data
• Retaining data for longer-term periods, on the order of 18 weeks

The worst deduplication ratios will be achieved in environments that are:
• Using 64KB or 128KB fixed block deduplication
• Backing up a large amount of compressed or encrypted data
• Retaining data for shorter-term periods, on the order of 4 weeks or less

The net is that not all deduplication approaches achieve the same results. Deduplication ratios are clearly affected by data types and retention periods. All of these factors need to be taken into consideration when choosing the proper disk backup approach.
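The link between retention period and deduplication ratio can be illustrated with the 50TB example from the overview. The sketch below is a deliberately simple model, not a description of any product: it assumes one full backup per week, stores the first full once, and adds only the changed 2% for each later week. The function name and parameters are hypothetical. Because it ignores compression and nightly backups, it does not reproduce the 20:1 to 50:1 figures quoted above; it only shows the ratio rising as more weeks are retained.

```python
def dedup_ratio(full_tb, weekly_change, weeks):
    """Ratio of raw to deduplicated storage after `weeks` weekly full backups."""
    if weeks < 1:
        raise ValueError("weeks must be at least 1")
    # Without deduplication every weekly full is stored in its entirety.
    raw_tb = full_tb * weeks
    # With deduplication the first full is stored once; each later week
    # adds only the data that changed (assumed to be a constant fraction).
    stored_tb = full_tb + full_tb * weekly_change * (weeks - 1)
    return raw_tb / stored_tb


for weeks in (4, 10, 18):
    print(f"{weeks:2d} weeks retained: {dedup_ratio(50, 0.02, weeks):.1f}:1")
```

Under these assumptions the ten-week case stores 59TB instead of 500TB, and the ratio keeps improving with every additional week kept on disk.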
Backup Requirements
The chart below shows the top backup requirements of most IT shops, arranged in priority order. Each of the approaches, including staying with tape, is shown in its own column. As you can see, not all approaches can meet all requirements. The key is to list your requirements and match them against each of the solutions to see which solutions best meet your requirements. The following sections show the strengths and limitations of each of the six disk solutions.
Backup or Cloud Services
There are many backup or cloud services to which backup can be outsourced, and the market is evolving as new players enter the field. These services require replacing the server agents used by the backup application. The service can then remotely manage the backup environment.

At the start, one complete backup of the data needs to be sent to the backup server. The logistics of this data transfer can be troublesome because of the large, sustained bandwidth required. After the initial full backup is transferred, just the changes in the data need to be uploaded to the outsourced service. Most of these agents only move changed bytes once the initial full backup is at the service provider (in the cloud).

Figure 3 - Typical Cloud Backup

Before a cloud backup recovery strategy is implemented, two key factors should be considered. First, one should ask what the recovery point objective (RPO) is for the business service that is being considered. Second, one should ask what the recovery time objective (RTO) is for the business service.

Be sure to evaluate carefully the claims made in cloud service contracts. The most important of these contractual promises are the availability of the service, the provider’s service level agreements (SLAs), and the security of your data. According to a Yankee Group report [1], "cloud contracts are rife with disclaimers, misleading uptime guarantees, and questionable privacy policies…"

[1] http://www.yankeegroup.com/about_us/press_releases/2010-04-21.html

Strengths
• Frees up IT staff to do other core/critical IT tasks

Weaknesses
• Requires changing all the server agents from your existing backup application to the outsourced service backup agents. Any changes to the agents will require weeks or months of tweaking.
• Good for small amounts of data, typically under 1TB. Best fit for small IT shops or a large company’s small remote office, but not for multi-TB environments. This limitation is due to the time needed to recover the data over the Internet. Under normal operation only the changed bytes or blocks are sent for replication. However, if a full backup restore is required, it would take about 31 days to retrieve 1TB of data over a 3Mbps Internet connection. Note that the limiting factor is not the bandwidth between your sites but your bandwidth to the Internet.
• If the data is over a few TB, most service providers need to place a hardware appliance (cache) in the IT environment to keep at least one week of backups (including a full backup) onsite, to overcome the recovery bottleneck presented by bandwidth to the Internet. The cost of the cache appliance plus the monthly fees makes a backup or cloud service the most expensive backup choice if you have more than a few TB of data to protect.

Summary
• For consumers, small IT environments (<1TB) and small remote offices with a small or nonexistent IT staff, a small data center (if any) and low bandwidth, a backup service is the best way to go.
• If there is a reasonable amount of data (>1TB), services become too cumbersome and too costly.
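The 31-day restore estimate can be checked with a short calculation. This sketch assumes the link figure means 3 megabits per second, that 1TB is a decimal terabyte, and that the link runs at full speed with no protocol overhead; the function name is hypothetical.

```python
def restore_days(data_bytes, link_bits_per_second):
    """Days needed to download `data_bytes` over a fully used link."""
    seconds = data_bytes * 8 / link_bits_per_second
    return seconds / 86_400  # seconds per day


ONE_TB = 1e12  # decimal terabyte in bytes (assumption)
print(f"{restore_days(ONE_TB, 3e6):.1f} days to restore 1TB at 3 Mbit/s")
```

Because restore time scales linearly with data size, the same link would need roughly ten times as long for 10TB, which is why providers place a local cache appliance onsite for larger environments.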
Disk Staging
Disk staging places a disk between the media server or storage nodes and the tape library. This is also considered tape augmentation. All backup applications can write directly to a disk volume or NAS share, so disk staging works natively with all backup applications. Disk staging reduces the perceived backup window at the client level, reduces the backup verification window at the server level, and provides high-speed recovery of files from disk rather than tape.

Figure 4 - Disk Staging Concept Overview

Strengths
• Placing disk between the media servers/storage nodes and the tape library solves many problems:
   • Multiple parallel jobs can be handled without being limited by the number of physical tape drives. This results in faster backups, assuming that the media servers can keep up.
   • Reliable backups and reliable restores for the data are assured using disk.

Weaknesses
• Disk staging becomes expensive very quickly:
   • Disk staging does not eliminate the use of tape onsite or offsite. It simply augments tape onsite.
   • There is no data deduplication when using disk staging, so the amount of disk grows very quickly and becomes extremely expensive with any level of retention. For example, two weeks of nightly backups and weekly full backups require storing four times the size of the primary data on disk. This assumes a rotation of full backups for databases and email nightly, incremental backups on files nightly and full backups on Friday. Each night, a combination of incremental backups of files and full backups of databases and email will equal about 25% of a full backup. These Monday through Thursday nightly backups will add up to roughly the size of a full backup.
Using 40TB of data as an example, the nightly backups after four nights will total 40TB and the Friday full backup will be another 40TB. Together, they require 80TB of disk storage. After two weeks, this expands to 160TB of disk storage. For this reason, 90% of customers using disk staging keep between one and two weeks of data on disk.

Summary
• Disk staging is good for one to two weeks of onsite retention on disk.
• It is estimated that about 70% of tape users use disk staging.
• For retention beyond one or two weeks, or for tape replacement onsite, an organization must use data deduplication so that only unique data (not the redundant data) is stored. This uses far less disk and reduces the cost impact.
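The disk staging arithmetic above can be written out as a small calculation. This is a sketch under the rotation the paper describes (four nightly backups of about 25% of a full each, plus one weekly full, with no deduplication); the function and parameter names are hypothetical.

```python
def staging_disk_tb(primary_tb, weeks, nightly_fraction=0.25, nights=4):
    """Disk needed to keep `weeks` of staged backups without deduplication."""
    weekly_nightlies = primary_tb * nightly_fraction * nights  # Monday to Thursday
    weekly_full = primary_tb                                    # Friday full backup
    return weeks * (weekly_nightlies + weekly_full)


print(staging_disk_tb(40, 1))  # 80.0
print(staging_disk_tb(40, 2))  # 160.0
```

Each additional week of retention adds another two full copies of the primary data, which is why staged disk is rarely kept beyond one or two weeks.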
Primary Storage SNAPS
Primary storage SNAPs (quick logical copies, or snapshots) are useful primarily for short-term retention. They are just the first line of defense in a layered backup scheme that includes long-term backups. SNAPs save changed storage blocks on a periodic basis (e.g. hourly), which allows rolling back to the last period. Primary storage SNAPs are not intended for long-term or historical backup.

Figure 5 - SNAPS Concept

Strengths
• SNAPs allow rolling back to earlier points and are more granular than a nightly backup.
• SNAPs can be replicated offsite for disaster recovery of short-term, periodic SNAP points.

Weaknesses
• SNAPs write into the same volume as the primary data, so they do not offer protection against a system crash, virus attack, data corruption or other event that destroys the primary data. The SNAPs would be destroyed along with the primary data. This is why 99% of IT environments keep a backup copy on a separate system onsite (tape or disk).
• SNAPs are not good for long-term retention uses such as legal discovery, regulatory compliance or SEC audits. When years of retention are required, a traditional backup approach is needed because data must be stored at specific points in time but not at every interval in between, such as monthly backups for 3 years and then yearly backups for 4 additional years.

Summary
• Primary storage SNAPs and long-term traditional backup can co-exist as part of a multi-layered approach to backup tailored to the specific requirements of the business. Primary storage SNAPs provide fine-granularity backup points onsite and also offsite, if replicated.
• It is estimated that 99% of IT environments use a traditional, longer-term backup system. About 50% of IT environments deploy some type of primary storage SNAPs as well.
Backup Application Deduplication in the Media Server
Some backup applications have a data deduplication feature that can be deployed as an agent in the media server. The intent is to eliminate tape by using standard disk in conjunction with the backup application.

Data deduplication is a very compute-intensive process. If deduplication is run in the media server, resource utilization will increase significantly. This can slow backups down dramatically. To avoid this hit to overall backup performance, backup software uses a form of deduplication that results in a lower reduction rate. Using the least possible processor and memory resources for the deduplication process avoids starving the media server tasks of resources, but at the cost of lower deduplication performance. Typically this approach uses 64KB or 128KB fixed blocks and will yield a data reduction ratio of about 6-7:1. By comparison, target-side appliances that use byte, zone-byte or segment-block with variable length content-splitting average from about 20:1 to as much as 50:1, or a minimum of approximately three times that of the software approach.

Figure 6 - Running Deduplication on Media Server

In addition, software deduplication can only process data that comes from its own proprietary agents. It cannot deduplicate data from other sources, including other backup applications, utilities or database dumps.

Some vendors bundle the media server software on a storage server that includes a CPU, memory and disk. This does not change the deduplication rate or the heterogeneous nature of the solution.

Strengths
• Relatively simple to manage through the backup application
• Good for environments that have less than 3TB of data to back up, use a single backup application and do not plan to replicate to a second site for disaster recovery

Weaknesses
• Disk usage is high as the deduplication ratio is only 6-7:1. Over time the disk space required grows sharply.
• Bandwidth needed to send backups to a second site is high as the deduplication ratio is only 6-7:1. By comparison, target-side appliances that use byte, zone-byte or segment-block with variable length content-splitting average from about 20:1 to as much as 50:1, or a minimum of approximately three times that of the software approach.
• Cannot deduplicate data from:
   • Veeam, VizionCore
   • Lightspeed, SQL Safe, Redgate
   • Direct SQL dumps, direct Oracle RMAN dumps
   • Bridgehead for Meditech data
   • Direct UNIX TAR files
   • Other traditional backup applications

Summary
• Deduplication in the backup software is good for short-term retention and low amounts of data in environments that are not heterogeneous and where offsite disaster recovery data is not required.
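One commonly cited reason fixed-block deduplication tends to find fewer duplicates than variable-length content splitting is that inserting even a single byte shifts every later fixed-block boundary, so previously stored blocks no longer match. The toy sketch below illustrates that effect only. It is not any vendor's algorithm: the chunk sizes are tiny for speed (64 bytes rather than 64KB to 128KB), the cut-point rule hashes a 16-byte window with BLAKE2 instead of using a true rolling hash, and all function names are hypothetical.

```python
import hashlib
import random


def fixed_chunks(data, size=64):
    """Split at fixed offsets, as fixed-block deduplication does."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_chunks(data, window=16, mask=0x3F):
    """Split wherever a hash of the previous `window` bytes ends in six zero bits."""
    chunks, start = [], 0
    for end in range(window, len(data) + 1):
        digest = hashlib.blake2b(data[end - window:end], digest_size=4).digest()
        if int.from_bytes(digest, "big") & mask == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last cut point
    return chunks


def new_bytes(old, new, chunker):
    """Bytes of `new` whose chunks were not already stored for `old`."""
    stored = {hashlib.sha256(c).digest() for c in chunker(old)}
    return sum(len(c) for c in chunker(new) if hashlib.sha256(c).digest() not in stored)


rng = random.Random(0)
old = bytes(rng.getrandbits(8) for _ in range(16 * 1024))
new = b"X" + old  # the same data with one byte inserted at the front

print("fixed blocks, bytes to store: ", new_bytes(old, new, fixed_chunks))
print("content-split, bytes to store:", new_bytes(old, new, content_chunks))
```

With fixed blocks nearly the whole second copy has to be stored again, while content-defined cut points realign after the first chunk and only that chunk is new. Real backup streams are less hostile than this worst case, so the gap in practice depends on the data.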
Backup Application Client Side Deduplication
Some industry backup applications offer a form of data deduplication in the application server agents or clients. The intent is to eliminate tape by using standard disk along with the backup application. The deduplication occurs at the backup agent/client on each application server.

Data deduplication is a very compute-intensive process. If deduplication is run in the application server (client side), resource utilization will increase significantly and backups will slow down dramatically. To minimize this impact, client side deduplication software uses a less efficient form of deduplication. Typically it uses 64KB or 128KB fixed blocks and achieves a data reduction ratio of about 6-7:1. By comparison, target-side appliances that use byte, zone-byte or segment-block with variable length content-splitting average from about 20:1 to as much as 50:1, or a minimum of approximately three times that of the software approach. Running a compute-intensive deduplication process on your application servers also creates other performance and availability challenges.

Figure 7 - Client Side Deduplication

Furthermore, databases and email, which are 80% of the Monday through Thursday backups, are still sent as full backups. This means that only 20% of the nightly data is actually deduplicated by client side deduplication during the week. The true impact is on the Friday night full backup, where 80% of the data is unstructured file data.

In addition, the software approach to deduplication can only process data that comes from its own proprietary agents. It cannot deduplicate data from other sources, including other backup applications, utilities or database dumps.

Strengths
• Great fit for deduplicating data from small remote sites, then replicating it back to a corporate datacenter for backup.
• This approach can shorten the backup window, but only on the Friday full backup. During the week, backups are still full backups for databases and email.

Weaknesses
• Requires new agents on servers; added risk and cost of changing agents.
• Deduplication ratio is only 6-7:1 and the disk space required increases quickly.
• Bandwidth usage to a second site is high as the deduplication ratio is only 6-7:1. By comparison, target-side appliances that use byte, zone-byte or segment-block with variable length content-splitting average from about 20:1 to 50:1, or at a minimum three times that of software deduplication.
• Cannot deduplicate data from:
   • Veeam, VizionCore
   • Lightspeed, SQL Safe, Redgate
   • Direct SQL dumps, direct Oracle RMAN dumps
   • Bridgehead for Meditech data
   • Direct UNIX TAR files
   • Other traditional backup applications

Summary
• Very good for replicating remote site data back to a corporate datacenter
• Very few businesses actually use this approach due to its risk to application servers and its other weaknesses
Purpose-Built Target Side Deduplication Appliances
Target-side deduplication appliances are built specifically to replace the tape library in the backup process onsite and, optionally, offsite. Because they are dedicated appliances, the hardware and the deduplication methods used can be optimized for that single purpose. Future disk space requirements to deal with data growth are drastically reduced because deduplication ratios from 20:1 to as much as 50:1 can be achieved. Only the data that changes, about 2% of the backup size, is replicated offsite, which requires far less bandwidth. In addition, target-side appliances can process data from a variety of utilities and backup applications.

Figure 8 - Target Side Deduplication Appliance

Strengths
• No change to your backup environment. Use all backup applications, utilities and dumps you are currently using.
• Can take in data from:
   • Traditional backup applications
   • Veeam, VizionCore
   • Lightspeed, Redgate, SQLSafe
   • SQL dumps, Oracle RMAN dumps
   • Direct UNIX TAR files
   • Many other backup applications and utilities
• 20:1 to as much as 50:1 deduplication ratios use less disk space and far less bandwidth for replication.
• Special features for:
   • Tracking data to offsite disaster recovery
   • Improving disaster recovery RPO (recovery point objective) and RTO (recovery time objective)
   • Purging data as the retention policy calls for aging out data

Weaknesses
• Backup window improves over using a tape library, but not by as much as client side deduplication for the Friday night full backup
Summary
When evaluating different approaches to replacing tape with disk, take the time to ask the right questions and understand the strengths and weaknesses of each alternative.
About ExaGrid
ExaGrid is the leader in cost-effective disk-based backup solutions. A highly scalable system that works with existing backup applications, the ExaGrid system is ideal for companies looking to quickly eliminate the hassles of tape backup while reducing their existing backup windows. ExaGrid’s innovative approach minimizes the amount of data to be stored by providing standard data compression for the most recent backups along with zone-level data deduplication technology for all previous backups. Customers can deploy ExaGrid at primary and secondary sites to supplement or eliminate offsite tapes with live data repositories or for disaster recovery. With offices and distribution worldwide, ExaGrid has more than 3,500 systems installed and hundreds of published customer success stories and testimonial videos available at www.exagrid.com.
ExaGrid Systems, Inc | 2000 West Park Drive | Westborough, MA 01581 | 1-800-868-6985 | www.exagrid.com
© 2011 ExaGrid Systems, Inc. All rights reserved. ExaGrid is a registered trademark of ExaGrid Systems, Inc.