My Backup Strategy

Saturday 16 May 2026


Tiered backups

I'm certain most people will have heard of the 3-2-1 backup rule. That is:

  * keep at least 3 copies of your data,
  * on 2 different types of storage media,
  * with 1 copy off-site.

That's definitely a good rule of thumb, and one I will be following for the most vital of my data. However, I wanted to introduce tiers into my backup system to save a bit of cost.

Most of the backups (by volume, anyway) I'll be making here are not mission critical. They're designed to be used as a quick restore point should the worst happen, but they aren't the last line of defence. For example, I'll be taking a full snapshot of the VM that acts as my NAS: all its data, OS, configuration, and so on. That makes for a really nice restore process, but I don't want to send an entire Rocky Linux installation up to the cloud and pay for it to sit there when, should the worst happen, I could just grab an ISO and have a new VM running in less than an hour.

To that end, I'm going to be rolling out an on-prem backup server with high capacity for quick restores, then sending the absolutely critical parts off-site for a true SHTF situation.

Hardware + Software

My main server is PVE01. It's the 3D printed server I rolled out earlier this year. This is the main target for most of my data as I create it. Most of that syncing happens via Nextcloud, but it also houses Git repositories, and a load of other services I use from day to day.

Each VM that contains critical data has rclone installed; that will come into play later when we start making off-site backups.

The flow

I'll start where most of the data gets created: this laptop. As an example, I'm currently writing into a markdown file under /home/jake/Documents/repos/website-content/. However, my Documents directory is actually a symlink:

lrwxrwxrwx 1 jake jake 31 Oct 26 20:40 Documents -> /home/jake/Nextcloud/Documents/

So each time I stop typing, VSCode auto-saves, and the Nextcloud desktop client syncs the file to my main server. That immediately covers 2 of the 3 copies required by 3-2-1. That server is then backed up in two ways:

Mission critical data

Any VMs where I really don't want to lose data (things like documents, photos, passwords, etc.) have a local installation of rclone, which is used to encrypt the data and send it up to a Google Cloud Storage bucket. This is done regularly by Jenkins (a CI/CD tool, but also useful for any automated task). I won't go into huge detail here as it isn't that interesting; it's essentially just a slightly fancier rsync running in cron.
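For the curious, the rclone side of that is a crypt remote layered over a GCS remote, so data is encrypted client-side before it ever leaves the VM. A sketch of the config fragment and the command (the remote names, bucket, and paths here are placeholders, not my real setup):

```shell
# ~/.config/rclone/rclone.conf (fragment, generated via `rclone config`)
#
# [gcs]
# type = google cloud storage
#
# [gcs-crypt]
# type = crypt
# remote = gcs:my-backup-bucket/critical
# password = <obscured password from `rclone config`>

# What the Jenkins job then effectively runs on a schedule:
rclone sync /srv/critical-data gcs-crypt: --transfers 4 --log-level INFO
```

`rclone sync` makes the remote match the source, which is the "slightly fancier rsync" behaviour; the crypt layer means the bucket only ever sees ciphertext.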

Important, but not critical data

For VMs where it'd be annoying to lose data, but not an absolute disaster, another Jenkins job runs weekly, and it's slightly more interesting. My old HP Microserver (now known as PVE03) sits in my home office, under my desk. So it's not off-site, but it is in a different room, meaning anything physical that destroys both servers at once would probably leave me with bigger things to worry about than backups. It's switched off most of the time (electricity in the UK is expensive) but gets woken up once a week to cover the remaining backups. I have a small Python script, also managed by Jenkins, that will:

  1. Boot PVE03 using iLO, and wait for it to become available.
  2. Start a VM running Proxmox Backup Server (PBS), and wait again for that to be available.
  3. Start a backup job on my main PVE nodes over to PBS, and wait for them to complete.
  4. Shut down PVE03.
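The script behind those steps is essentially a poll-and-wait loop around two HTTP APIs (iLO's Redfish API for power, the Proxmox API for VMs and backups). A minimal sketch of its shape, where the ilo and pve helpers are hypothetical stand-ins for those API calls rather than my actual code:

```python
import time


def wait_for(check, timeout=600, interval=10):
    """Poll check() until it returns True, or raise after timeout seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return
        time.sleep(interval)
    raise TimeoutError(f"condition not met within {timeout}s")


def run_weekly_backup(ilo, pve, pbs_vmid, nodes):
    # Each helper here is a placeholder for an HTTP call (Redfish for
    # power control, the Proxmox API for VM/backup control).
    ilo.power_on()                                     # 1. boot PVE03 via iLO
    wait_for(pve.is_reachable)                         #    ...wait for Proxmox
    pve.start_vm(pbs_vmid)                             # 2. start the PBS VM
    wait_for(lambda: pve.vm_is_up(pbs_vmid))           #    ...wait for PBS
    jobs = [pve.start_backup(node) for node in nodes]  # 3. back up each node
    wait_for(lambda: all(j.finished() for j in jobs))
    ilo.power_off()                                    # 4. shut PVE03 down
```

The wait_for helper does most of the heavy lifting; everything else is just sequencing API calls in the order listed above.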

Is it enough?

I'm reasonably happy that all my critical data is following (and exceeding) the 3-2-1 rule. I have:

  1. The original working copy on this laptop.
  2. A synced copy on my main server, PVE01, via Nextcloud.
  3. A weekly Proxmox Backup Server snapshot on PVE03.
  4. An encrypted, off-site copy in a Google Cloud Storage bucket.

While I am yet to test the various recovery options in a disaster, I'm pretty confident that at least one of them would get me unstuck. PBS has some quirks around restoration (the actual files sit inside an image of the full VM, so I'd need to restore to a hypervisor and have networking in place before I could access it and copy data out), and resorting to GCS could be costly, but I'm confident I'd be able to get something to work out of the options if I were to lose one of the copies.

My less critical data doesn't quite meet 3-2-1. That consists of:

  1. The live data on the VMs themselves, on PVE01.
  2. A weekly Proxmox Backup Server snapshot on PVE03.

In reality, though, those are nice-to-haves. They're more about saving me the time of rebuilding the service than preserving data, so the worst outcome here is that I have to spend a few days spinning up new VMs and installing services before calling on one of the critical backup locations to get the data back.

Next steps

The one thing this setup is still missing is a tested recovery process. While it's definitely nice having four copies of your data, if you've never actually restored from any of them, you're really just hoping for the best. I have a few ideas for testing this, potentially restoring VMs to a spare host, or adding some storage to my PC to see if I can pull files, but I haven't fully figured this out yet. When I do, it may be worth a post of its own.