It has now been over a month since OVHcloud’s SBG2 data centre in Strasbourg went up in flames, sending shockwaves through the IT services industry. We have come to expect HDDs to fail at some point and accept that our home broadband may be a little unstable, but years of stability have meant that far too many of us have come to assume that data centres are bulletproof. The OVH fire has proved that even huge data centres can be wiped off the map in a matter of minutes. As we rely more and more on digital services, we can hope for the best but need to plan for the worst.
Whilst the fire was still raging, OVH founder Octave Klaba tweeted: “We recommend to activate your Disaster Recovery Plan”. That scared many business owners who, up until that point, may not even have considered that they should have a disaster recovery plan in place. Those who did have one probably had little idea how to activate it. So, what have we learned from this disaster?
We have a major incident on SBG2. The fire declared in the building. Firefighters were immediately on the scene but could not control the fire in SBG2. The whole site has been isolated which impacts all services in SGB1-4. We recommend to activate your Disaster Recovery Plan.
— Octave Klaba (@olesovhcom) March 10, 2021
You need to create a disaster recovery plan *now*
Businesses need to plan for a number of eventualities, and the OVH fire has demonstrated that one such eventuality is a complete data centre disaster. Data centres are very reliable and these sorts of incidents are thankfully very rare, but the scale of the problem means that every business needs to get a disaster recovery plan in place as soon as possible.
All IT industry veterans know that whilst we strive for 100 per cent reliability, it will never be possible in reality. We need plans and processes in place to make sure any issues that do arise can be isolated and managed so that services are running again quickly. We have backup power supplies and fire prevention systems in place, but if something big happens, digital businesses need to be able to shift their entire operations to a different data centre and carry on.
A disaster recovery plan needs to take account of all the possible risks and create a path to resuming operations if everything goes wrong. In general, only one or two problems on that list are likely to happen at once, but you need to be prepared if everything goes down together.
Understand the risks
If your business doesn’t have such a plan in place, this sort of planning needs to be done today, and if you don’t have the expertise in-house then this is where IT service consultants are worth every penny. This plan needs to work.
You need to know what your most important files and databases are and where and how they are stored. And then you need to know where the most recent backups (hopefully daily) are located and where any longer-term backups (monthly) are stored, and how best to restore that data so your company can resume operations.
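One way to keep track of this is a simple backup inventory that can be checked automatically. The following is a minimal sketch, not a definitive implementation: the `Backup` record, the `MAX_AGE` thresholds (one day for daily backups, 31 days for monthly ones) and the asset names are all assumptions made for illustration, and a real plan would set its own.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed maximum age per backup tier; adjust to match your own plan.
MAX_AGE = {"daily": timedelta(days=1), "monthly": timedelta(days=31)}


@dataclass
class Backup:
    asset: str          # what is backed up, e.g. "orders-db" (hypothetical name)
    tier: str           # "daily" or "monthly"
    location: str       # where this copy is stored
    taken_at: datetime  # when the backup was taken


def stale_backups(backups, now):
    """Return the backups that are older than their tier allows."""
    return [b for b in backups if now - b.taken_at > MAX_AGE[b.tier]]


def missing_tiers(backups, assets):
    """Return (asset, tier) pairs for which no backup is recorded at all."""
    present = {(b.asset, b.tier) for b in backups}
    return sorted((a, t) for a in assets for t in MAX_AGE if (a, t) not in present)
```

Running both checks against the list of files and databases your business cannot operate without shows at a glance which ones have no recent copy, or no copy at all.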
Critically, you also need to understand how likely each problem is to arise. HDD failures are relatively common, so it is useful to have your data backed up locally to another disk, so that switching between them after such a failure is quick and relatively seamless.
However, a total system collapse, such as a data centre fire, is different: if you have only backed up to another disk on the same server, or even to another server at the same facility, then you are out of luck. You need offsite backups as well as server-based and in-facility options.
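The same reasoning can be written down as a check on where each copy lives. This is a hedged sketch under simple assumptions: a copy is described only by its facility and server, and the names used in the usage note (such as `offsite-1` and `web-01`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    facility: str  # data centre or site, e.g. "SBG2"
    server: str    # server within that facility (hypothetical name)


def survives_server_loss(primary, copies):
    """True if at least one copy is on a different server from the primary."""
    return any(
        (c.facility, c.server) != (primary.facility, primary.server)
        for c in copies
    )


def survives_facility_loss(primary, copies):
    """True if at least one copy is stored outside the primary's facility."""
    return any(c.facility != primary.facility for c in copies)
```

For a primary at `Copy("SBG2", "web-01")`, a backup on `Copy("SBG2", "backup-01")` passes the server check but fails the facility check; only a copy such as `Copy("offsite-1", "vault-01")` passes both.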
See what your provider offers
What is interesting in the case of the OVH fire is the question of who is expected to keep the backups. You should *always* keep some backups yourself, but most cloud providers will also offer some form of backup service.
In the case of the SBG2 data centre, however, many companies used the facility to host their own “bare metal” servers. In this case, OVH provides no form of backups – these are entirely private servers that individual businesses have full control of, and those businesses are on the hook for all the backups themselves. Sadly, some of those who used such bare metal servers did not plan for a data centre fire, and they are the ones facing complete data loss.
OVH has pulled out all the stops for its cloud customers and is recovering as much data as it possibly can, so those hosted by OVHcloud in SBG2 have another line of defence. But if the fire has demonstrated anything, it is that we all need multiple lines of defence and multiple backups stored around the world.