Disaster Recovery Plan: Would Your SMB Survive a Crash Tomorrow Morning?

A DRP isn't a 200-page document. It's knowing what to do when everything goes down.

TL;DR: A disaster recovery plan (DRP) defines how your organization gets back up and running after a major incident. You don't need a 200-page document: identify your critical systems, set your recovery objectives (RPO/RTO), document your restoration procedures, and test the whole thing at least once a year. If your DRP is "we'll call the IT guy," you don't have a DRP.

Your server catches fire tomorrow morning at 3 a.m. Your accountant arrives at the office at 8 a.m. and nothing works. Your email is dead, your ERP won't respond, your website is showing an error. What happens next?

If the answer is "I don't know," you're not alone. Nearly half of all SMBs have no formal plan for this kind of situation. And fire isn't even the most likely scenario: ransomware that encrypts everything, a hard drive that fails, an update that goes sideways, or simply someone deleting the wrong folder by accident. It happens every day.

The problem is that most people confuse "having backups" with "having a recovery plan." It's like saying "I have a fire extinguisher, so I have an evacuation plan." Backups are one ingredient. The DRP is the full recipe.

A DRP, the short version

A disaster recovery plan (DRP) is a document that answers one simple question: if everything goes down, who does what, in what order, to get the organization running again?

That includes backups, yes. But also: who has the access to restore? Which machine do we restore to if the server is dead? How long does it take? Do we lose data, and how much? Who calls the clients to warn them? Who contacts the hosting provider?

A DRP isn't a theoretical document you write once and stick in a drawer. It's a living tool that has to be tested, kept up to date, and known by the people who will execute it. If the only person who knows how to restore the backups is on vacation in Mexico when the server crashes, your DRP is worthless.

The scenarios that actually happen

We tend to think of "disaster" as a fire or a flood. It's possible, but it's far from the most common. Here's what we see in the field:

Ransomware: an employee clicks on the wrong attachment, and within a few hours all your files are encrypted. The attacker demands a ransom. Even if you pay (which we don't recommend), there's no guarantee you'll get your data back. It's the most common and most devastating scenario for SMBs. If your backups are on the same network, they're probably encrypted too.

Hardware failure: a hard drive that fails, a RAID controller that goes bad, a power supply that burns out. It happens without warning. If your server is 5 years old and has never been replaced, it's a question of when, not if.

Human error: someone deletes the wrong database, overwrites a critical file, or runs an update that breaks everything. It's more frequent than you'd think, and it's often the hardest to detect quickly.

Cloud provider outage: your host has a problem, your SaaS provider is down, or your email provider goes offline. You have no control over it, but you suffer the impact anyway. And no, Google and Microsoft don't guarantee your data: under their shared responsibility models, backing it up remains your job.

Physical disaster: fire, flood, water damage, equipment theft. Less frequent, but when it happens, it's total. If your only backup is on a hard drive in the same office as your server, you lose everything.

RPO and RTO: the two numbers to know

Before you plan anything, you have to answer two fundamental questions:

RPO (Recovery Point Objective): how much data can you afford to lose? If your last backup was last night and the server goes down at 4 p.m., you lose a full day of work. For some organizations, that's acceptable. For others, losing an hour of transactions is catastrophic. The RPO dictates how often you back up.
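Here is a minimal Python sketch of that arithmetic. It assumes a single nightly backup that finishes at 2 a.m. (a hypothetical time) and counts the work lost as the time between the last completed backup and the failure. The helper names are illustrative and don't come from any particular tool.

```python
from datetime import datetime, timedelta


def data_lost(backup_times: list[datetime], failure: datetime) -> timedelta:
    """Work lost = everything written since the last backup that finished before the failure."""
    completed = [t for t in backup_times if t <= failure]
    if not completed:
        raise ValueError("no backup completed before the failure")
    return failure - max(completed)


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst case, the failure hits just before the next backup, so the loss approaches the interval."""
    return backup_interval <= rpo


nightly = [datetime(2024, 5, 6, 2, 0)]  # hypothetical nightly backup, done at 2 a.m.
failure = datetime(2024, 5, 6, 16, 0)   # the server goes down at 4 p.m.
print(data_lost(nightly, failure))                         # 14:00:00, i.e. the whole workday
print(meets_rpo(timedelta(hours=24), timedelta(hours=1)))  # False: nightly backups can't meet a 1-hour RPO
```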

RTO (Recovery Time Objective): how long can you afford to be down? If it takes 4 hours to restore everything, will your organization survive? Will your clients accept 4 hours without service? The RTO dictates your recovery strategy: do you restore to the same server (slower), fail over to a standby server (faster), or run real-time replication (the fastest, but the most expensive)?

The thing is, the RPO and RTO aren't the same for all your systems. Your marketing website can probably be down for 24 hours without a catastrophe. Your billing system or your email is another story.
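One way to keep per-system objectives honest is to record them next to what you actually do and what your last test measured. The sketch below uses a hypothetical inventory format: the three systems and every duration in it are made-up values. The checks simply compare each system's backup frequency against its RPO and its last measured restore time against its RTO.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class System:
    name: str
    rpo: timedelta               # maximum acceptable data loss
    rto: timedelta               # maximum acceptable downtime
    backup_interval: timedelta   # how often the system is actually backed up
    measured_restore: timedelta  # restore time observed in the last test


def fmt(td: timedelta) -> str:
    """Show a duration in hours, e.g. '24 h'."""
    return f"{td.total_seconds() / 3600:g} h"


def gaps(systems: list[System]) -> list[str]:
    """List every system whose current setup cannot meet its objectives."""
    problems = []
    for s in systems:
        # Worst case, a failure strikes just before the next backup runs.
        if s.backup_interval > s.rpo:
            problems.append(f"{s.name}: backed up every {fmt(s.backup_interval)}, RPO is {fmt(s.rpo)}")
        if s.measured_restore > s.rto:
            problems.append(f"{s.name}: restore takes {fmt(s.measured_restore)}, RTO is {fmt(s.rto)}")
    return problems


def restoration_order(systems: list[System]) -> list[str]:
    """Shortest RTO first: the system you can least afford to lose comes back first."""
    return [s.name for s in sorted(systems, key=lambda s: s.rto)]


H = timedelta(hours=1)
inventory = [  # hypothetical values
    System("billing", rpo=1 * H, rto=4 * H, backup_interval=24 * H, measured_restore=3 * H),
    System("email", rpo=4 * H, rto=2 * H, backup_interval=1 * H, measured_restore=5 * H),
    System("website", rpo=24 * H, rto=24 * H, backup_interval=24 * H, measured_restore=2 * H),
]
for problem in gaps(inventory):
    print(problem)
print(restoration_order(inventory))  # ['email', 'billing', 'website']
```

Sorting by RTO gives a first draft of the restoration order; the flagged gaps tell you where the objectives and the reality disagree.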

How much does an outage cost?

Recent studies estimate that downtime costs between $8,000 and $25,000 an hour for a typical SMB. But that figure hides a lot of nuance.

To calculate your own cost, think about:

Factor	How to calculate it
Lost productivity	Number of affected employees x average hourly wage x duration of the outage
Direct revenue loss	Daily revenue / business hours per day x duration of the outage
Restoration costs	Hours of technical work (in-house + external consultants)
Lost clients	Hard to quantify, but real: a client who can't reach you calls a competitor
Reputational damage	Long-term impact on the trust of clients and partners
Contractual penalties	If you have service-level agreements (SLAs) with your clients

For a 20-person SMB with an average wage of $30/h, lost productivity alone represents $600/h. Add in lost revenue and restoration costs, and a full day of downtime can easily exceed $10,000. That's often more than the cost of a complete DRP.
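The quantifiable rows of the table can be turned into a quick calculator. This is a sketch under assumptions: the daily revenue of $8,000 over 8 business hours and the $1,500 of restoration work (say, 10 hours at $150) are hypothetical inputs, and lost clients, reputational damage, and contractual penalties are left out because they don't reduce to a simple formula.

```python
def outage_cost(
    hours: float,
    employees: int,
    hourly_wage: float,
    daily_revenue: float,
    business_hours_per_day: float,
    restoration_cost: float = 0.0,
) -> dict[str, float]:
    """Estimate the quantifiable cost of an outage with the formulas from the table above."""
    productivity = employees * hourly_wage * hours
    revenue = daily_revenue / business_hours_per_day * hours
    return {
        "productivity": productivity,
        "revenue": revenue,
        "restoration": restoration_cost,
        "total": productivity + revenue + restoration_cost,
    }


# 20 employees at $30/h as in the example; revenue and restoration figures are hypothetical.
one_hour = outage_cost(1, employees=20, hourly_wage=30, daily_revenue=8000, business_hours_per_day=8)
print(one_hour["productivity"])  # 600
full_day = outage_cost(8, employees=20, hourly_wage=30, daily_revenue=8000,
                       business_hours_per_day=8, restoration_cost=1500)
print(full_day["total"])  # 14300.0
```

With these inputs, an 8-hour outage comes to $14,300: $4,800 of lost productivity, $8,000 of lost revenue, and $1,500 of restoration work.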

The minimum viable DRP for an SMB

A 20-person SMB doesn't need the same DRP as a bank. But it does need a DRP. Here's the bare minimum:

1. Inventory of critical systems. Make a list of everything your organization needs to function: email, ERP, accounting, file sharing, website, telephony. Rank them by priority. What has to come back first?

2. Recovery objectives. For each critical system, define your RPO and your RTO. Be realistic: a 15-minute RTO when your only strategy is to restore a backup onto a new server is fiction.

3. A backup strategy that follows the 3-2-1 rule. Three copies of your data, on two different types of media, with one copy off-site. If all your backups are on the same server, or in the same office, it's not a plan, it's wishful thinking. We'll cover the 3-2-1 rule in detail in an upcoming article.

4. Documented restoration procedures. Not "we'll restore the backup." Actual steps: which server, with which account, which command, in what order. Detailed enough that someone other than your lead technician could do it.

5. Emergency contact list. Who do you call first? The IT lead, the hosting provider, the backup provider, management. With phone numbers, not just emails (because if the email server is down...).

6. Communication plan. How do you let your employees know the systems are down? How do you warn your clients? Having a message template prepared in advance keeps you from panicking and sending an awkward email in the middle of a crisis.
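The 3-2-1 rule from step 3 is easy to check mechanically once you list where each copy of your data lives. In this sketch, following the usual reading of the rule, the production data counts as one of the three copies. The media labels are whatever names you give your storage types, and both example setups are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str
    media: str     # your own label for the storage type
    offsite: bool  # stored in a different physical location from the office


def check_3_2_1(copies: list[Copy]) -> list[str]:
    """Return the parts of the 3-2-1 rule this setup violates (empty list = compliant)."""
    problems = []
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies, need 3")
    if len({c.media for c in copies}) < 2:
        problems.append("all copies on the same type of media")
    if not any(c.offsite for c in copies):
        problems.append("no off-site copy")
    return problems


# Hypothetical setups; the production data counts as the first copy.
typical = [
    Copy("production data", media="server disk", offsite=False),
    Copy("USB drive on the shelf", media="external drive", offsite=False),
]
better = typical + [Copy("nightly Restic backup", media="cloud object storage", offsite=True)]

print(check_3_2_1(typical))  # ['only 2 copies, need 3', 'no off-site copy']
print(check_3_2_1(better))   # []
```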

Roles and responsibilities: who does what?

A DRP with no designated owner is an orphaned document. For every step of the plan, someone has to be named. And you need a backup person, because the primary one could be unavailable when disaster strikes.

At a minimum, you need:

A decision owner: the person who declares that yes, we are in a disaster situation, and who authorizes execution of the DRP. Usually the general manager or the IT lead.

A technical owner: the person who carries out the restorations, fails systems over, and coordinates the technical aspects. It's often your in-house IT person or your managed services provider.

A communications owner: the person who informs the employees, the clients, the partners. It's a role that often gets forgotten, but in the middle of a crisis, communication makes all the difference.

Each of these roles needs a clearly identified backup person, with the same access and the same knowledge.

Testing the plan: everyone's weak spot

The reality is that most SMBs that have a DRP have never tested it. It's like having a fire extinguisher you've never checked. Maybe it works. Maybe not.

Testing a DRP means simulating a disaster and following the procedures for real. Not reading the document in a meeting: actually restoring a backup, verifying the data is there, measuring how long it takes.
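Verifying that the data is there can go further than a glance at the folder. A common approach, sketched here in Python, is to record a SHA-256 hash of every file when the backup is taken, then compare the restored files against that manifest and time the restore. The shutil.copytree call is only a stand-in for your real restore step (a Borg or Restic restore, for example), and the file in the demo is made up.

```python
import hashlib
import shutil
import tempfile
import time
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB blocks so large files don't have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its hash. Run this at backup time."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(manifest: dict[str, str], restored_root: Path) -> tuple[list[str], list[str]]:
    """Return (missing files, files whose content differs from the backup-time hash)."""
    missing, corrupted = [], []
    for rel, expected in manifest.items():
        path = restored_root / rel
        if not path.is_file():
            missing.append(rel)
        elif sha256_of(path) != expected:
            corrupted.append(rel)
    return missing, corrupted


def timed(step) -> float:
    """Run one restore step and return how long it took, in seconds."""
    start = time.monotonic()
    step()
    return time.monotonic() - start


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp, "data")
    source.mkdir()
    (source / "invoices.csv").write_text("id,amount\n1,100\n")
    manifest = build_manifest(source)  # taken when the backup runs

    restored = Path(tmp, "restored")
    # Stand-in for the real restore command, for example a Borg or Restic restore.
    elapsed = timed(lambda: shutil.copytree(source, restored))
    missing, corrupted = verify_restore(manifest, restored)
    print(f"restore took {elapsed:.2f}s, missing={missing}, corrupted={corrupted}")
```

The manifest has to be built from the source data at backup time and stored alongside the backup; building it from the restored copy would prove nothing.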

Ideally, you test at least once a year. Each test will reveal problems: a backup that didn't contain what you thought, a procedure that no longer works because the system changed, a password that's been modified since the last version of the document.

After each test, update the DRP. A DRP that's 2 years old and has never been tested gives a false sense of security: it's almost worse than not having one at all.

The documentation: what to write, where to put it

The content of the DRP should be concise and practical. Not a novel: a procedures guide. Here's what it should contain:

System inventory with connection details, server locations, software versions. For each system: RPO, RTO, owner, restoration procedure.

Restoration procedures step by step, with the exact commands if it's technical. Someone who has never done the procedure should be able to follow it.

Emergency contacts with phone numbers (not just email), availability, and alternates.

Communication templates ready to send to employees, clients, partners.

And the crucial point: don't store the DRP only on the server it's supposed to protect. If your DRP is on the server that just caught fire, it doesn't help you much. Keep a printed copy accessible, a copy in the cloud (separate from your main infrastructure), and ideally a copy held by the general manager or the IT lead.

Law 25: your obligations in case of an incident

If your disaster involves a breach of personal information (ransomware that exfiltrates data before encrypting it, a server theft, etc.), Quebec's Law 25 imposes legal obligations on you.

If the incident presents a risk of serious injury to the people affected, you have to notify the Commission d'accès à l'information (CAI) as soon as possible, and notify the affected individuals. You also have to keep a register of all confidentiality incidents and retain it for 5 years.

The fines are significant: administrative monetary penalties can reach $10 million or 2% of worldwide turnover, whichever is greater. That's no small thing for an SMB.

The connection to the DRP? Your plan should include a specific procedure for incidents involving personal information: who assesses whether it's a reportable incident, who drafts the notice to the CAI, who contacts the affected individuals. In the middle of a crisis, you don't have time to figure out how to do it.

The open-source tools that help

You don't need $50,000 software to put a solid DRP in place. Several open-source tools do the job very well:

BorgBackup: backup with deduplication and encryption. Extremely efficient for incremental backups. If you back up 500 GB of data but only 2 GB change per day, Borg only stores the new chunks, roughly those 2 GB, instead of another 500 GB. Ideal for local and SSH backups.
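Deduplication is what makes this possible: data is cut into chunks, each chunk is identified by a hash, and only chunks the repository hasn't seen before are written. The toy sketch below illustrates the idea with fixed 4-byte chunks. It is not Borg's implementation: Borg uses content-defined chunking (so inserting data doesn't shift every following chunk boundary), much larger chunks, compression, and encryption.

```python
import hashlib

CHUNK_SIZE = 4  # toy value so the example is readable; real tools use far larger chunks


class ChunkStore:
    """A toy deduplicating repository: each unique chunk is stored once, keyed by its hash."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}

    def backup(self, data: bytes) -> tuple[list[str], int]:
        """Store a snapshot; return its recipe (chunk hashes in order) and the bytes newly written."""
        recipe: list[str] = []
        new_bytes = 0
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            key = hashlib.sha256(chunk).hexdigest()
            if key not in self.chunks:  # unseen chunk: the only data actually written
                self.chunks[key] = chunk
                new_bytes += len(chunk)
            recipe.append(key)
        return recipe, new_bytes

    def restore(self, recipe: list[str]) -> bytes:
        """Rebuild a snapshot by concatenating its chunks in order."""
        return b"".join(self.chunks[key] for key in recipe)


store = ChunkStore()
monday, tuesday = b"AAAABBBBCCCCDDDD", b"AAAABBBBXXXXDDDD"  # one 4-byte block changed
monday_recipe, written = store.backup(monday)
print(written)  # 16: the first backup stores everything
tuesday_recipe, written = store.backup(tuesday)
print(written)  # 4: only the changed chunk is new
print(store.restore(tuesday_recipe) == tuesday)  # True
```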

Restic: similar to Borg, but with native support for cloud storage (S3, Backblaze B2, Azure, SFTP). Simpler to deploy in a multi-site context. A good choice if your off-site backup goes to the cloud.

Proxmox VE: open-source virtualization that lets you take snapshots of your virtual machines. If your server is virtualized, a snapshot before an update lets you roll back in a few minutes if it goes wrong. Proxmox Backup Server (PBS) rounds it out with incremental VM backups.

Ansible: server configuration automation. If your server is configured with Ansible, you can rebuild an identically configured server from scratch by running a playbook, then restore your data from backups. Instead of spending 8 hours reinstalling everything by hand, the rebuild takes about 30 minutes. It's an upfront investment, but in a disaster, it changes everything.
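What makes this safe to rerun is idempotence: each task describes a desired state and only changes the machine when it differs from that state, so running the same playbook twice does no harm. The Python sketch below illustrates the idea with a hypothetical ensure_line helper, loosely similar in spirit to Ansible's lineinfile module. It is not Ansible's API, and the sshd_config lines are example content written to a temporary directory.

```python
import tempfile
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is present in the file; return True only if the file had to change."""
    lines = path.read_text().splitlines() if path.exists() else []
    if line in lines:
        return False  # already in the desired state: touch nothing
    lines.append(line)
    path.write_text("\n".join(lines) + "\n")
    return True


def apply_state(desired: dict[Path, list[str]]) -> int:
    """Bring every file to its desired state and return the number of changes made."""
    changes = 0
    for path, wanted in desired.items():
        for line in wanted:
            if ensure_line(path, line):
                changes += 1
    return changes


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical hardening settings, written to a scratch file, not the real /etc/ssh/sshd_config.
    desired = {Path(tmp, "sshd_config"): ["PermitRootLogin no", "PasswordAuthentication no"]}
    print(apply_state(desired))  # 2: the first run brings the file to the desired state
    print(apply_state(desired))  # 0: the second run finds nothing to change
```

A real setup would also replace conflicting lines (an existing "PermitRootLogin yes", for example), which this sketch does not do.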

These tools don't replace the DRP: they're the building blocks you use to build it.

What a DRP doesn't solve

A well-built DRP is essential. But you have to be honest about its limits:

A DRP doesn't prevent disasters. It prepares you to respond to them. Prevention comes from hardening your systems, good antimalware, and training your employees.

A DRP on paper that has never been tested gives a false sense of security. A simple, tested DRP beats an exhaustive document that no one has validated.

A DRP doesn't make up for inadequate backups. If your backups don't contain the right data, or if they aren't tested regularly, the best plan in the world won't save you.

Finally, a DRP is only as good as its last update. If you've changed servers, providers, or software since the last version, it's probably already out of date.

Minimum DRP checklist for an SMB:

Inventory of all your critical systems with RPO/RTO
Automated backups following the 3-2-1 rule (3 copies, 2 media, 1 off-site)
Documented and tested restoration procedures
Emergency contact list accessible offline (printed)
Roles and backup persons clearly assigned
Communication plan (employees, clients, CAI if personal information)
Full restoration test at least once a year
Review and update of the DRP after every major change

What we put in place for our clients

At Blue Fox, when we deploy an infrastructure for a client, the DRP is part of the project from the start. We use Proxmox for virtualization, BorgBackup or Restic for encrypted off-site backups, and Ansible for configuration automation. Every deployment includes a tested and documented off-site backup.

We don't promise zero downtime: it doesn't exist, except at costs that make no sense for an SMB. What we promise is a clear RPO and RTO, documented and tested. When something goes down, we know exactly what to do and how long it will take.

Your current infrastructure has no DRP? We can do an assessment and build a plan suited to your reality. Let's talk.

In summary

A DRP isn't a luxury and it isn't a 6-month project. It's a living document that answers the question: "if everything goes down tomorrow, what do we do?" If you can't answer that question in less than 5 minutes, it's a sign that it's time to get started.

Start simple: identify your 3 most critical systems, verify that your backups work (really, by testing them), and write the broad strokes of your restoration procedure. You can refine later. The important thing is to start.

Need a hand building your DRP? Let's talk about your situation.
