A cautionary tale of relying on the automatic backups in SharePoint Online

Microsoft Stores Backups For 14 Days, But Restores Them in 15

So Microsoft keeps 14-day rolling backups of your SharePoint Online sites. That’s awesome – no need to take backups anymore, right?

Not so fast. It’s not always so easy, and by just relying on these backups, you risk losing your data. Forever, I might add.

This cautionary tale is about SharePoint Online, but I’d say you should exercise caution anytime you’re dealing with Microsoft’s automatic backups. The story starts with the client doing something unwise – in this case, removing the root web of their classic SharePoint Site Collection (don’t do that!).

Preface

A few days ago I got an interesting email newsletter from the folks at SPTechCon. They shared a blog post by Backupify, a cloud backup company. The blog post is pretty commercial, and normally, I’d dismiss the post as marketing, but this time it actually reminded me of a fairly difficult experience I had with Microsoft’s included backup functionality in SharePoint Online.

Data security is not just about being safe from data breaches; it’s also about preserving the data – keeping it safe from accidental removal! In that process, backups are extremely important, and luckily they are conveniently included in your normal Office 365 subscription.

Or maybe conveniently is a bit of a strong word here…

The root site was removed, the client is agitated, what’s next?

This is an action that’s tricky to undo – you can’t access the recycle bin, and Restore-SPODeletedSite doesn’t seem to do much. You’ll need to contact Microsoft Support to resolve the case!

This’ll be quite a ranty post (sorry about that!), but for a timeline and overview, you can skip to the TL;DR version at the end.

On to the case!

Well, I jumped in, noticed that Restore-SPODeletedSite didn’t do anything, and opened a support ticket on 29.10. By then the site had been gone for about 6 days (some of that time we were waiting for Restore-SPODeletedSite to finish, some of it was just general confusion after the removal). At this point, it’s a good idea to emphasize that the site was a non-critical test/dev system – not important enough for the customer to completely lose their mind, but important enough to try and restore it.

The next day I got a response:

Please verify if you need help with a SharePoint site collection, or public facing site (web site) issue

I didn’t even think public-facing sites were supported in SharePoint Online anymore, but I replied with the URL, explained that I needed the site restored to a certain date from their backups, and told them that Restore-SPODeletedSite didn’t work. The next day I got a response stating only that I’d need to restore the site myself by browsing to the recycle bin (not possible) or by running Restore-SPODeletedSite (didn’t work, which I had already told them). This exchange happened 30-31.10.

I was getting confused. I replied again that I wanted the backup to be restored and couldn’t use the recycle bin, to which the support told me to browse to the recycle bin. All in all, this continued for a few days (and a lot of emails), after which I got a call late at night from someone working in Microsoft Support. I was able to convince him that I actually needed the backup to be restored. I was finally put into some kind of backup restore queue, and got a long-ish email asking me for the requested restore date and the URL of the site.

Let’s play this safe!

Since I knew at that point that the site had been removed 8 days earlier, I asked for us to go back 10 days. I requested “21.10.2017 12:00 (UTC).” This was late on 31.10.

I thought that would be it. But for about 6 days, the site was still unrestored, and my emails went unanswered. I then got an email confirming my request had been received(!), which prompted me to ask them to hurry. I got the following response:

The data in the recycle bin will be secure for that date 11-2-17

I will send you another update for this issue since we are still working on the restore process.

This was on 6.11. Soon after, I got another email from them on 7.11.:

I need another date and time from you since 10-21-17 is out of range. I need another date today as soon as possible to submit the restore.

After they had sat on the ticket for over a week, and after assuring me the restore was OK, my original requested date was (of course) outside the 2-week restore window, and they requested a new date. I was quite disappointed – so much for the 14-day backups!

So – all is now lost?

By now we had already blasted through the 14-day backup limit, and all hope seemed lost.

Still, I didn’t want to give up at this point. I responded “the oldest possible”, of course – to which they responded “24.10.” That was already too late, as the site had been deleted around 23.10. We gave up and started recreating the site that was removed, but I still gave Microsoft Support the go-ahead to restore 24.10. just to see what would happen.

The next response kind of blew my mind – they actually sat on my email for a day, and on 8.11. replied:

I need you to confirm the date so we can use it for the restore process for 10-25-17 which puts you in the (2 week time period)

For example: today is 11-7-17 and 2 weeks ago was 10-25-17

Please reply to this email and just list this date today.

I am sorry for the delay and will submit your request again today

Still more stalling. I just responded “10-25-17”, as I had given up on the content but was interested in seeing what would happen. This was on 8.11., so roughly 16 days after the site had been removed and 10 days after the ticket had been opened. Microsoft Support had still failed to do anything about the restore, having already missed one restore date by sitting on the ticket for 6 days.
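To make the moving window concrete, here’s a minimal Python sketch of the date arithmetic. It assumes a strict 14-day rolling window counted in whole days, which is my reading of the advertised retention and not Microsoft’s actual implementation; the names RETENTION, oldest_restorable and in_window are made up for this illustration.

```python
from datetime import date, timedelta

# Assumption: the advertised 14-day rolling window, counted in whole days.
RETENTION = timedelta(days=14)


def oldest_restorable(today: date) -> date:
    """Oldest backup date still inside the window on the given day."""
    return today - RETENTION


def in_window(requested: date, today: date) -> bool:
    """True if the requested restore point is still available on the given day."""
    return oldest_restorable(today) <= requested <= today


requested = date(2017, 10, 21)  # the restore point I originally asked for
deleted = date(2017, 10, 23)    # roughly when the root site was removed

# The day I made the request, and the two days support came back to me.
for handled_on in (date(2017, 10, 31), date(2017, 11, 7), date(2017, 11, 8)):
    oldest = oldest_restorable(handled_on)
    print(
        f"{handled_on}: oldest restorable {oldest}, "
        f"request in window: {in_window(requested, handled_on)}, "
        f"pre-deletion backup left: {oldest < deleted}"
    )
```

Under that assumption, the request was comfortably inside the window on 31.10., but by 7.11. the oldest restorable date had moved to 24.10., and by 8.11. to 25.10., both of which are after the deletion. Every day a ticket sits idle moves the window one day further from the data you actually want.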

Can even escalating the case help us this time around?

Weirdly enough, I now got an email from a second person:

My name is X, an escalations engineer with Office 365 support. I’m writing to let you know that I’ve filed an internal ticket with our SharePoint engineers to investigate and resolve your site recovery issue. Just to be clear, unfortunately there was a miscommunication regarding the time frame mentioned earlier; the 14-day limit wouldn’t apply here. This seems to be a provisioning issue as the site collection was undeleted. However, it may take up to 48 hours at most before I have any new info, but as soon as I do you’ll be the first to know. If you have any additional questions or concerns, please don’t hesitate to reach out.

Oh – that sounded promising! I thanked him for getting in touch and continued rebuilding anyway. This happened late on 8.11.

On 14.11. the 25.10. version of the site was seemingly restored. Weirdly enough, that was an empty publishing portal. That’s not really what should’ve happened – you’d think we’d have gotten an empty SharePoint site collection without a root site if anything, but nope.

On 16.11. I got another message from the escalations engineer – they’d found a duplicate site (caused by a failed Restore-SPODeletedSite), removed that, and somehow restored the RIGHT site instead. This was already 24 days after the site was removed.

Too long, didn’t read version

Okay, so this is for those of you that don’t like rants… 🙂

Key takeaways

  1. Microsoft actually seems to have backups for a longer time than 14 days – but don’t rely on them!
  2. Restoring backups might take weeks! If you have a premier support plan, it might be a good idea to use that…
  3. For any critical data, use an external backup solution! There are multiple to choose from, and I’m not in a position to give any recommendations.
  4. This story was probably more or less a worst-case scenario – as far as I understand, restoring backups usually goes without a hitch.

TL;DR timeline

  • 23.10. – the client removed the root site.
  • A few days after that, they contacted me, and I tried restoring it with Restore-SPODeletedSite
  • 29.10. we opened the ticket to restore the site, since Restore-SPODeletedSite didn’t work
  • A lot of back-and-forth, since the support personnel wouldn’t believe Restore-SPODeletedSite didn’t work and I couldn’t access the Site Collection Recycle bin
  • 31.10. I got an OK for a restore to an earlier version of the site, and requested the 21.10. version of the site to be restored
  • 6.11. the support person let me know that the site restore was progressing and the date should be OK
  • 7.11. the support person let me know that the requested restore date was outside the window, and proposed a new date, which I accepted
    • At this point the restore is already too late. I continue out of curiosity.
  • 8.11. the support person let me know that their proposed date was now outside the window, and proposed a new date, which I (begrudgingly) accepted
  • 8.11. an escalation engineer promises to look into it
  • 14.11. the escalations team seemingly restored the 25.10. version, i.e. an empty site. I pointed this out, and they continued the investigation.
  • 16.11. the escalations team restored the original version (pre-23.10.)! All is good, except for the long wait (and hence downtime)

So, the 100% downtime-causing issue was fixed after 3 weeks, by restoring a backup that was over 3 weeks old, even though Microsoft only promises to save the backups for 2 weeks.

This blog post could’ve been titled “Restore-SPODeletedSite failed – what to do”

Unfortunately, I think this is ‘just one of those things’. I even got confirmation of this from the escalations team – there was no real reason why the cmdlet to restore the site originally failed. So using Restore-SPODeletedSite is fine, and should usually work, but it seems to have just gotten stuck this time around.

However, Restore-SPODeletedSite DID do something. It apparently created an empty site! This should simply lead to an error, but according to the escalations team, this time it led to “both a new one and the old one stuck together – that’s why [they] were able to restore the content when [they] did.”

I’m not sure if the explanation I got made any sense, but at least the restore finally worked. Which, in the end, is confusing, as we were over the restore window already!

Antti Koskela is a proud digital native nomadic millennial full stack developer (is that enough funny buzzwords? That's definitely enough funny buzzwords!), who works as a Solutions Architect for Valo Intranet, the product that will make you fall in love with your intranet. Working with the global partner network, he's responsible for the success of Valo deployments happening all around the world. He's been a developer since 2004 (starting with PHP and Java), and he's been bending and twisting SharePoint into different shapes since MOSS. Nowadays he's not only working on SharePoint, but also on .NET projects, Azure, Office 365 and a lot of other stuff. This is his personal professional (professional, but definitely personal) blog.
