Practical framework to help SharePoint Administrators understand and develop a successful SharePoint recovery plan. Focus is on pragmatic "what and how" for an administrator rather than academic theory or burdensome project management structure.
Intros….
We created this webinar because we realized there’s not a lot out there on the subject. We’re going to be talking about the meat and potatoes of SharePoint backup and restore – what is necessary to prepare, what are some gotchas, and what are some solutions. I want you all to be able to either create or validate your disaster recovery solution for SharePoint. We all may be in different situations at work – maybe your systems or infrastructure teams are handling the backup process. But you are the one who owns SharePoint, so you need to make sure that your baby is protected.
That is precisely the use case that Michael will be sharing with us in a bit.
I promise you, this webinar will not be a product pitch but at the end, I will briefly show how a 3rd party product like Metalogix backup can help fill in the gaps.
Alrighty, on to the subject at hand…
Click…
That’s really going to be the theme of today’s webinar: to get involved in the strategy. Backup and recovery can be a dry subject. Let’s be honest, it’s not the sexiest thing to talk about. It’s pretty doom and gloom. It’s like buying life insurance. But if you’re prepared, not only technically with what steps you’re going to take, but have also prepared the business, then there’s nothing to worry about. You can sleep well knowing your baby is safe.
Click…
So today, we’ll either validate the steps you’ve already taken or we’ll fill in the gaps that have been left out of SharePoint’s recovery objectives. We’ll discuss:
How you’re involved in the process as a SharePoint admin, architect, DBA, or whatever your role may be. A lot of us may have just had SharePoint dumped on us and we’re treating it like any other application we have. If you haven’t realized it yet, SharePoint is a massive platform with a lot of moving parts. It does not fit into a one-size-fits-all recovery plan.
We’ll go into about 8 or so pieces that will make you successful. The theme here is going to be focused on creating the right backup strategy for the business. What you’ll notice is that from a technology point of view, backup is pretty simple and you have a lot of options. I always say that backup is a science while recovery is an art. I tell clients and prospects this all the time because anyone can back up SharePoint with a few clicks or scripts. But surprisingly few think about preparing for the recovery. Can I restore everything I need? How long will it take to recover from a complete hardware malfunction, versus a more common scenario like restoring lost content? When we go through creating a plan here in a bit, the backup piece involves working with the business to come up with a solution while the recovery piece is heavily driven by the technology you choose.
Click…
From there, we’ll go into what options you have (technology-wise) and what will be a good fit for you. Everything from what’s free OOTB to product add ons.
Click…
Then, once you’ve weighed your options and understand how many man hours you are willing to put in and how much you’re willing to spend, we’ll discuss who to involve so that your backup and recovery plan becomes a sharply honed process.
The goal is to have the smallest RPO possible
Let’s look at an example and discuss some of Microsoft’s best practices. First off, MSFT recommends not using OOTB backup tools for content DBs > 200 GB due to the risk of missing backup windows and Recovery Point Objectives (RPOs).
Microsoft tested native SharePoint and SQL backup and was able to back up only 600 gigabytes in six hours using a high-end server. In my opinion, a 600 GB database is smaller than a typical SharePoint farm these days.
MS even states that if you’re using OOTB techniques, limit your content database size to 100 GB and site collection backup should not be used on anything larger than 85 GB.
That means you can’t granularly restore content reliably if a site collection is too large. That’s because the process is too resource intensive and will take too long.
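To put the "600 gigabytes in six hours" figure in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes throughput scales linearly from that one quoted number, which real backups rarely do, and the function names are purely illustrative.

```python
# Back-of-the-envelope backup window estimate.
# Assumption: throughput scales linearly from the quoted figure
# (600 GB in 6 hours). Real-world throughput varies with hardware and load.

QUOTED_GB = 600
QUOTED_HOURS = 6
THROUGHPUT_GB_PER_HOUR = QUOTED_GB / QUOTED_HOURS  # 100 GB per hour


def estimated_backup_hours(database_gb: float) -> float:
    """Hours needed to back up a database of the given size."""
    return database_gb / THROUGHPUT_GB_PER_HOUR


def fits_window(database_gb: float, window_hours: float) -> bool:
    """True if the estimated backup finishes inside the backup window."""
    return estimated_backup_hours(database_gb) <= window_hours


if __name__ == "__main__":
    for size_gb in (100, 200, 600, 1000):
        hours = estimated_backup_hours(size_gb)
        print(f"{size_gb} GB: about {hours:.1f} h, "
              f"fits a 6 h window: {fits_window(size_gb, 6)}")
```

Plug in your own database sizes and backup window to see quickly which content DBs are at risk of blowing past the window.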
You need to understand your role. Are you responsible if the power goes out and you can’t even communicate with your servers? That’s probably not your responsibility and it will be handled by the infrastructure team. But that team is taking backups and you need to know what state they can get you back to after a disaster.
Ask them what tools they’re using and work with them to find the limitations they have with SharePoint
Maybe they don’t support only restoring individual content dbs, or individual objects like sites, lists and items in full fidelity- with workflow, versions, and such.
This is a true test of a good recovery strategy. End users can be demanding, and often don’t care about the limitations of SharePoint and its toolset. There are always particular industries – I’m looking at the world of finance and law and such verticals – whose users are not very understanding and demand a very low RTO.
So where do you start?
There’s a lot that must be prepared so we’ll run through the major planning aspects from who to involve to analyzing what you can do to lessen your risk
Get yourself an Executive sponsor
Find a tech savvy executive to roll out your plan as part of overall SharePoint governance
As I said in the opening objectives, the backup piece of your strategy involves more human interaction than working with the technology.
You just need to lay out what is possible: here are the RPOs and RTOs we can technically meet. The business must then decide if that aligns with what the knowledge workers need.
This way, you’re not making decisions in a vacuum.
Create the dreaded Service Level Agreement (SLA) and get it signed off by executive sponsors and other stakeholders
Define acceptable levels of what can be recovered and how long that recovery process will take.
Define certain locations of SharePoint that you know contain documents that are heavily edited and constantly changing and may need to be backed up more often than stale locations.
Keep in mind to separate disaster from day to day recovery…The majority of recovery work you’ll do is simple document, list or site recovery. These are simple to back up and restore on the surface but they will probably be different from environment to environment. Maybe there’s a customization you didn’t even know about because it was implemented before you got there. Are there certain dependencies on this content that need to be restored as well, or is the business OK with simply getting the document back online?
All of this needs to be considered and documented in the SLA
Document the ownership of tasks, responsibilities, demarcation points, and handoffs
This will involve different parts of the business. We need to ask questions and assign responsibilities to the different parts of the business that have a stake in SharePoint. Have quarterly meetings with key stakeholders in each department. Go over their section of the farm to know what is important to them.
Give ownership so the SLA can be updated and signed off by these folks. These people can also be your QA after a restore is finished.
Taking these steps will involve making the SLA a public document. The onus does not fall squarely on you once this document is public. You’ve provided steps to take along the way from a backup plan to a restore methodology but it’s a team effort that all are aware of.
Next, Conduct an impact and risk analysis of current environment
For example, I had a client who had a site collection that housed financial reports and data for the executive team. Content was added to it every day between 8am and noon. The admin knew he needed the shortest RPO possible on this section of the farm. After discussing the actual business case with the executives, he realized that the content was only being added to the site for read only publication and was not being edited AND it wasn’t the only system of record of the document. It wasn’t SharePoint’s responsibility to immediately back up these documents and once a day would suffice for the business. He saved himself a lot of time and money by having a conversation.
On the other hand, you may have some databases or sites that need to be backed up more often than the rest of your farm. Ensure that these sections of the farm get special treatment and have shorter RPOs.
Yes, you must prove it! All this planning and research you’ve done needs to be tested and constantly reviewed….
Proving out your SLA goes a long way so you won’t have to act like this guy in the cartoon and scream for help.
Did I mention you need to prove it?! I can’t stress this point enough- mostly because it’s rarely accomplished: Conduct ongoing fire drills!
Continually review the outcomes. SharePoint has probably changed since you last did a test and you don’t want to be caught with your pants down.
Once you conduct fire drills, and based on their outcomes, update the SLA document with the changes to the farm and changes to RPOs and RTOs…..Then you can start the process all over again…
It will be worth it in the end. You’re not ignoring the fact that a loss of data will happen. It’s when, not if. And if you’ve put these processes into place, you’ll be prepared.
My clients always find something to change when conducting fire drills. I once worked with someone who noticed that their backup sets were corrupted but there was no clue in the actual backup file itself. It only surfaced once the restore failed. This ended up being especially damaging because he was using incremental backups. To remind you, incremental backups go back to the last backup taken, no matter what type, and only back up what has changed. This means that if one backup is corrupted, every backup that follows it in the chain is useless, since they’re all dependent on each other.
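The incremental chain problem is easy to see in a small sketch. This is a simplified Python model, not how any particular backup tool works: the `Backup` class and `restorable_backups` function are made up for illustration, and it assumes a full backup stands on its own while each incremental depends on every backup back to the last full.

```python
from dataclasses import dataclass


@dataclass
class Backup:
    name: str
    kind: str  # "full" or "incremental"
    corrupted: bool = False


def restorable_backups(chain: list[Backup]) -> list[str]:
    """Return the names of backups that can still be restored.

    A full backup starts a new chain. An incremental is only usable if
    the full backup and every incremental before it in that chain are intact.
    """
    usable = []
    chain_intact = False
    for backup in chain:
        if backup.kind == "full":
            chain_intact = not backup.corrupted  # a new chain begins here
        elif backup.corrupted:
            chain_intact = False  # everything after this point is lost
        if chain_intact:
            usable.append(backup.name)
    return usable


if __name__ == "__main__":
    week = [
        Backup("Sun-full", "full"),
        Backup("Mon-inc", "incremental"),
        Backup("Tue-inc", "incremental", corrupted=True),
        Backup("Wed-inc", "incremental"),
        Backup("Thu-inc", "incremental"),
    ]
    print(restorable_backups(week))
```

In this model, one bad Tuesday backup takes Wednesday and Thursday down with it, which is exactly why the fire drill has to include an actual restore.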
A best practice I usually recommend is to conduct these fire drills on a secondary recovery environment. In a true disaster, you can even restore the databases to this recovery environment and redirect users there.
Don’t forget that a large portion of a successful strategy involves communication and ensuring the expectations of the business are set. The point of preparing these documents and doing all this planning is that the expectations of the users don’t outstep reality. Your job is on the line and you need a way to ensure end users do not get angry because of their outlandish expectations.
Funny story, I had a client who would send an email that said “Site back, relax and grab a cup of coffee. Your content will be with you shortly” when he received a ticket to restore lost data. Attached to the email was the SLA.
Setting expectations with the business is key to your strategy. It’s not all about the technology.
A review of what we’ve been discussing but use this as your checklist…
Attain an Executive sponsor
Service Level Agreement (SLA) signed off by executive sponsor and stakeholders
Documented ownership of tasks, responsibilities, demarcation points, and handoffs
Conduct risk analysis of current environment
Fully tested, documented, and signed off
Ongoing fire drills and updates to SLA
So those recommendations are well and good but they don’t address the root of the problem. BLOBs or Binary Large Objects make up 90-95% of your databases. This is the content. It’s considered the unstructured data while the metadata of the document is the structured portion. Basically, BLOBs are what is growing your environment so rapidly and extending your RPOs.
The interesting thing about them is that BLOBs are immutable. They are never updated, only created and deleted. If they never change, why do we have to back them up every time? You don’t. Think about it- every time you back your farm up for granular recovery, you’re backing up data you know has not changed. In reality, you only need to back up the BLOBs that have been added to SharePoint because new documents were added or existing documents were edited.
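Here is a minimal Python sketch of that idea: since a BLOB never changes once written, a backup only has to copy the BLOB IDs it has not seen before. The dictionaries and function names are hypothetical stand-ins for illustration, not the API of SharePoint or any backup product.

```python
def blobs_needing_backup(farm_blobs: dict[str, bytes],
                         backed_up_ids: set[str]) -> dict[str, bytes]:
    """Return only the BLOBs that have never been backed up.

    Because a BLOB is never modified in place, an ID that is already in
    the backup store never needs to be copied again.
    """
    return {
        blob_id: data
        for blob_id, data in farm_blobs.items()
        if blob_id not in backed_up_ids
    }


def run_backup(farm_blobs: dict[str, bytes],
               backup_store: dict[str, bytes]) -> int:
    """Copy new BLOBs into the backup store and return how many were copied."""
    new_blobs = blobs_needing_backup(farm_blobs, set(backup_store))
    backup_store.update(new_blobs)
    return len(new_blobs)


if __name__ == "__main__":
    farm = {"blob-1": b"report v1", "blob-2": b"budget"}
    store: dict[str, bytes] = {}
    print(run_backup(farm, store))  # first run copies everything
    farm["blob-3"] = b"report v2"   # an edit creates a new BLOB
    print(run_backup(farm, store))  # second run copies only the new one
```

Note that editing a document shows up as a brand new BLOB rather than a change to an old one, so the second run has exactly one item to copy.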
Microsoft provides a couple of different libraries to externalize these BLOBs outside of the content database. Docs can be put on devices that have faster read/writes, i/o, and are much cheaper. This is a win for performance and cost and if done correctly, can drastically shorten your RPOs.
Metalogix backup integrates with our RBS product called StoragePoint. With StoragePoint, BLOBs are backed up continuously as they are added to SharePoint. They are not included again as part of each database backup because they have not changed. Thus, once a restore is initiated, a call will be made to grab the BLOB based on the pointer to it in the database.
Your backup time is drastically reduced because you only have to back up your databases now. And guess what, once BLOBs are externalized, your database shrinks by as much as 95%. Thus your backup window shrinks drastically.
So back to our example a couple of slides ago, your 1 TB content database has become 50GB. BLOBs are immediately backed up and ready at any point to be restored.
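The arithmetic behind that 1 TB example can be sketched in a few lines of Python. It treats 1 TB as 1000 GB and takes the BLOB share as an input, so you can try both ends of the 90-95% range; the function name is just illustrative.

```python
def database_size_after_externalizing(total_gb: float, blob_percent: float) -> float:
    """Size left in the content database once BLOBs are moved out.

    blob_percent is the share of the database taken up by BLOBs,
    for example 95 for 95%.
    """
    return total_gb * (100 - blob_percent) / 100


if __name__ == "__main__":
    for percent in (90, 95):
        remaining = database_size_after_externalizing(1000, percent)
        print(f"BLOBs at {percent}%: about {remaining:.0f} GB left in the database")
```

At 95% the 1000 GB database drops to 50 GB, and even at the low end of the range it drops to 100 GB.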
Clearly this changes your backup strategy so any RBS product you look at should do a couple of things:
1) Retain blobs even if they’re deleted in SharePoint for a specified amount of time. This means that you can continue to use out of the box methods for restores. Let’s say you define a blob retention period of 30 days. This means you can restore your SQL or SharePoint backups from anytime within the last 30 days and those backups will have references to blobs that have been retained.
Or
2) Ensure your RBS product integrates with your 3rd party backup product. Obviously this provides more automation and can really be the subject of a whole other webinar.
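For option 1, the retention check boils down to a date comparison. This is a minimal Python sketch under the assumption that a deleted BLOB is kept for a fixed number of days after deletion; the function name and the 30 day default are illustrative, not settings from any specific RBS product.

```python
from datetime import date, timedelta


def backup_is_restorable(backup_date: date, today: date,
                         retention_days: int = 30) -> bool:
    """True if every BLOB a backup points to should still be retained.

    Any BLOB deleted after the backup was taken was deleted no longer ago
    than the backup's age, so a backup no older than the retention period
    can still find all of its BLOBs.
    """
    age = today - backup_date
    return timedelta(0) <= age <= timedelta(days=retention_days)


if __name__ == "__main__":
    today = date(2024, 3, 31)
    for taken in (date(2024, 3, 25), date(2024, 3, 1), date(2024, 2, 15)):
        print(taken, backup_is_restorable(taken, today))
```

If your SLA promises restores from further back than the retention period, one of the two numbers has to change.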