Personal backups

In early 2021 I cobbled together some scripts to create a personal backup solution for my data. I decided to use Borg for creating backups and rclone for syncing to Backblaze. I want to explain my approach.

What makes a good backup?

A backup is simply a copy of data, but more important is where the backup lives. If a backup lives on the same storage as the source data, both will be lost when the storage fails. That's not a good backup!

Moving the backup to another storage device is a start. Moving the backup to a different medium (like a CD or tape) is an improvement, and moving the backup to different latitude and longitude is even better. Both the medium for storage and the location are important qualities for a good backup.

Another (sometimes overlooked) quality of a good backup is what data is retained. Often not all the data is necessary. It can be easiest to copy everything, but it might not be practical. Cost and space restrictions put limits on backups, and planning for this detail can extend the usefulness of a backup plan.

Which data to backup

My plan is to create backups for my computer's local files. Let's say I accidentally delete everything with rm -rf /; I should be able to restore my home directory with little pain.

My $HOME directory isn't heavily curated, so I plan to archive everything, including:

  • Programming projects
  • Config files
  • Screenshots / Screen recordings
  • Downloads
  • Documents

Some of this isn't critical, but over many years I've enjoyed being able to look through my old filesystems as they were, reminding me of where I spent time, or what my interests were.

On the flip side, some items in my home folder are too large to be worth keeping and can be excluded: things like virtual machine images, Steam games, and file caches.

Storage plan

Archive       Primary          Secondary      Remote       Frequency
Local Files   Home Computer    Separate HDD   Backblaze    Hourly

My $HOME directory is the primary storage. It's the source of truth: anything created there will eventually end up in my backups.

For secondary storage I chose to use a separate hard drive on the same computer. All the backups will be stored there. Ideally this would be a different type of storage than the primary, but I'm not getting into tape archives or optical storage. I'd be happy if I could replace this with local network storage in the future.

For remote storage I decided to use Backblaze B2. This is an object storage service, similar to Amazon S3, with much lower costs for both storage and data transfer.

Remote Object Storage Price Comparison

Service        Storage ($/GB/Month)   Download ($/GB)
Amazon S3      $0.021                 $0.05
Backblaze B2   $0.005                 $0.01

I'll run my backup scripts every hour using cron, which means I could lose up to an hour of data, but that's alright with me.

Creating backups and deduplicating data

I chose to use Borg for creating backups mainly for its ability to deduplicate data. Deduplication means Borg can save storage space by storing only one copy of identical chunks of data. Because a large portion of my home directory remains the same day to day, this saves tons of space!

With Borg, first create a repository for storing archives:

borg init /path/to/repo

Next create an archive using the current date, and specify the files to store:

borg create /path/to/repo::'{now:%Y-%m-%d}' ~/Downloads ~/Documents

When run again (maybe the next hour, or the next day) it will be much quicker; Borg will only store the parts of the files that have changed.
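
A quick way to confirm the incremental archives are piling up is to list them and check the repository stats (the repository path here is a placeholder):

borg list /path/to/repo
borg info /path/to/repo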

Borg can also remove old backups that are no longer needed. This will keep a daily backup for the past 7 days, a weekly backup for 4 weeks, and a monthly backup for 12 months:

borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 12 /path/to/repo

My storage savings so far

$ borg info /media/machina/archives
----------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
All archives:              878.48 GB            506.72 GB             63.87 GB

Using only 63 GB to functionally store 878 GB worth of backups is awesome!

Borg automation

I put the following into a script to create and prune the backups. Most of this is taken directly from the incredibly well-written borg documentation.

#!/usr/bin/env bash

# Backup my home folder
HOME=/home/ylambda
PATH=$PATH:/home/ylambda/.local/bin
REPOSITORY=/media/machina/archives

if pidof -x `which borg` > /dev/null; then
    echo "Borg already running."
    exit;
fi

# save $HOME directory excluding a select few
borg create -v --stats                          \
    --exclude "$HOME/.cache"                    \
    --exclude "$HOME/.local/share/Steam"        \
    --exclude "$HOME/VirtualBox VMs"            \
    --exclude "$HOME/.vagrant.d"                \
    $REPOSITORY::'{hostname}-{now:%Y-%m-%d}'    \
    $HOME

backup_exit=$?

# remove older backups
borg prune                          \
    --list                          \
    --prefix '{hostname}-'          \
    --show-rc                       \
    --keep-daily    7               \
    --keep-weekly   4               \
    --keep-monthly  12              \
    --keep-yearly   10              \
    $REPOSITORY

prune_exit=$?

# use highest exit code as global exit code
global_exit=$(( backup_exit > prune_exit ? backup_exit : prune_exit ))

exit ${global_exit}

Remote sync

I decided to use the fantastic rclone to push the borg backups into remote storage. This tool has excellent support for remote storage APIs. It works very similarly to rsync, if you're familiar. First you configure it with a remote storage provider, which is easy enough to figure out from their documentation. Then you're ready to go!
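
For reference, a Backblaze B2 remote in rclone.conf ends up looking roughly like this once configured (the remote name and credentials below are placeholders, not my real ones):

[remote]
type = b2
account = YOUR_KEY_ID
key = YOUR_APPLICATION_KEY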

Use the sync command to push the data out:

rclone sync /path/to/source remote:path/to/dest

I ended up with something like this:

#!/usr/bin/env bash

PATH=$PATH:/home/ylambda/.local/bin
SOURCE=/media/machina/archives
DESTINATION=desktop:bhassinger/desktop/archives
CONFIG=/home/ylambda/.config/rclone/rclone.conf
HOME=/home/ylambda

if pidof -x rclone > /dev/null; then
    echo "Rclone already running."
    exit;
fi

rclone -v --config "$CONFIG" sync "$SOURCE" "$DESTINATION"

Cron

Finally, I want both of these to run every hour. I also want to be sure borg will run and finish before I start pushing data to remote storage. To accomplish this, I'll throw one more script together:

#!/usr/bin/env bash

cd /home/ylambda/dev/personal/backups/

./backup.sh
./sync.sh

And finally, throw this into cron!

0 * * * * /home/ylambda/dev/personal/backups/run.sh
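
If I ever want the output saved somewhere, redirecting it to a log file in the crontab entry would work too (the log path here is just an example):

0 * * * * /home/ylambda/dev/personal/backups/run.sh >> /home/ylambda/backup.log 2>&1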

Restoration procedure

First, use rclone mount to make the remote storage available as a local filesystem. If you already have the backups locally, you can skip this step.

rclone mount remote:path/to/files /path/to/remote/mount

Next mount the specific backup (here called 02-27-22) with Borg.

borg mount /path/to/remote/mount::02-27-22 /path/to/restored/mount
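
From there, restoring is just copying files out of the mounted archive and unmounting both mounts when finished (the paths below are placeholders):

cp -a /path/to/restored/mount/home/ylambda/Documents ~/
borg umount /path/to/restored/mount
fusermount -u /path/to/remote/mount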

Shortcomings

My solution works well enough for me, but I know of a few areas where it could be improved.

For starters, the remote storage is dependent on Borg working. Let's say something goes horribly wrong with Borg and it prunes all the backups. When my sync script runs, I'd lose my remote copies too. This is addressable, and should be something I tackle soon. Changing the sync script to avoid deleting old data could help; one option is sketched below.
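
One idea (not something I've adopted yet) is to swap rclone's sync command for copy, which uploads new and changed files but never deletes anything on the remote. The trade-off is that the remote repository would grow over time, since pruned data is never removed:

rclone -v --config "$CONFIG" copy "$SOURCE" "$DESTINATION"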

Another issue is reporting. There's no way to know when any of these scripts stop working. A common approach I've seen is sending an email every time the job runs, whether it succeeded or not. That's a bit spammy; ideally I'd record all the times it worked, but only alert when it fails. I would love to build my own solution for this! A rough sketch of the idea is below.
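
Something along these lines would do it, assuming run.sh exits nonzero when either script fails and a working mail command is set up (the address is a placeholder):

#!/usr/bin/env bash

# run the backups, and only send mail when the run fails
if ! /home/ylambda/dev/personal/backups/run.sh; then
    echo "Backup run failed on $(hostname) at $(date)" | mail -s "Backup failure" me@example.com
fi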

And finally, my last pain point is portability. These scripts are highly tailored to my own setup; I'd like to make them a little more general purpose so I could quickly set them up on a VPS, for example.

For now this is what I'm using for my own backups. If you made it this far, thank you for reading!