Backups with Duplicity

Sat 27 Jul 2019
Sun 28 Jul 2019

Duplicity is a great program that I use in my environment to back up my projects. There is no need to reinvent the wheel or try to best well-written software, and it will save you a lot of headaches.

In this article I will shed some light on what my setup looks like. The key requirements for me are Duplicity, GPG encryption and the possibility to upload everything to an off-site backup. I need to be able to back up my workstation, my server and some client environments, which means dealing with both Windows and Linux.

In a nutshell my backup strategy needs to be:

  • OS Agnostic
  • Encrypted with GPG
  • Full/Inc strategy
  • Option to push to remote - Google Drive

We are going to set up incremental backups and a weekly full backup. The best thing about Duplicity is that it takes care of creating full and incremental backups by default. We just have to configure the interval.

There are a few pitfalls when working with Duplicity that you need to know of. In production, always work with the stable release. This goes without saying for me, yet it is good to point out before we start. Also, you might already have Duplicity installed. If so, it is important to remove the package and the binary before you continue. Strange mismatches may occur when a previous version is not properly removed.

To remove previous versions of Duplicity you need to remove the binary and the pip package. Duplicity is written in Python, so it may also be installed as its own pip package. To show and possibly remove previous versions use:

# Show where the binary is
rob@Rathalos ~ $ which duplicity
# Remove the binary
rob@Rathalos ~ $ rm /usr/local/bin/duplicity

# Show where the pip package is located
rob@Rathalos ~ $ pip show duplicity
# Let pip do the work :-)
rob@Rathalos ~ $ pip uninstall duplicity

Then there is Python. Use Python 2.7 for this: at the time of writing, Duplicity is not compatible with Python 3 yet. If Python 3 is your main Python installation, make sure pip points to the Python 2 installation as well :)
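
A quick way to see which Python the plain `python` command resolves to is to look at the major version it reports. The helper below is only a small sketch; the function name `python_major` is made up for this example.

```shell
# Print the major version from a "Python X.Y.Z" version string.
python_major() {
    # Strip the leading "Python " and everything after the first dot.
    version="${1#Python }"
    printf '%s\n' "${version%%.*}"
}

# Usage (Python 2 prints its version to stderr, hence the 2>&1):
#   python_major "$(python --version 2>&1)"
```

If this prints 3, the `python` and `pip` commands point to Python 3 and you will want to call the Python 2 binaries explicitly.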

Installing Duplicity

You can get the latest stable release from their website. I find it easiest to just download the tarball and install from source. At the time of writing, 0.7.19 is the latest version of Duplicity.

# Download the official package
rob@Rathalos ~ $ wget

# Unpack with tar
rob@Rathalos ~ $ tar -xzf duplicity-0.7.19.tar.gz
# Go into the directory
rob@Rathalos ~ $ cd duplicity-0.7.19
# Install with Python
rob@Rathalos ~ $ python setup.py install

# Check availability of Duplicity
rob@Rathalos ~ $ duplicity --version

Next up will be preparing the remote or local storage for your backups.

Generating a GPG key

Generating a GPG key is standard procedure. It's a good practice to create a dedicated key for this procedure. You will be prompted with a series of questions that will configure the key pair.

# Generate a new GPG Key
rob@Rathalos ~ $ gpg --gen-key

# ... Answer to the prompted questions
# Check your newly made key
rob@Rathalos ~ $ gpg --list-keys
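
If you want to pick up the key ID in a script instead of copying it by hand, GPG can print its key list in a machine-readable colon format. The sketch below assumes that the first public key in the keyring is the one you want; the function name `first_key_id` is made up for this example.

```shell
# Print the long key ID of the first public key found in
# `gpg --list-keys --with-colons` output read from stdin.
first_key_id() {
    # In the colon format, "pub" records carry the key ID in field 5.
    awk -F: '$1 == "pub" { print $5; exit }'
}

# Usage:
#   gpg --list-keys --with-colons | first_key_id
```

With more than one key in the keyring it is safer to look the ID up manually.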

Now you have a key that you can use for the backup procedure. Copy the GPG key ID for later; we will need it when setting up the command. Always back up this key! I have made another article on a good way to back up your GPG key. If you have set up the GPG key with a passphrase, use the PASSPHRASE environment variable to pass it to Duplicity.
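
For scheduled runs nobody is around to type the passphrase. One possible approach, sketched below, is to keep it in a file that only your user can read and export it right before the backup. The function name and the file location are assumptions for this example, not something Duplicity prescribes.

```shell
# Export PASSPHRASE from the first line of a file.
# Argument: path to the passphrase file (a hypothetical location).
load_passphrase() {
    passphrase_file="$1"
    if [ ! -r "$passphrase_file" ]; then
        echo "cannot read $passphrase_file" >&2
        return 1
    fi
    # read -r keeps backslashes; IFS= keeps leading and trailing spaces.
    IFS= read -r PASSPHRASE < "$passphrase_file"
    export PASSPHRASE
}

# Usage:
#   load_passphrase ~/.duplicity/passphrase
#   ... run duplicity ...
#   unset PASSPHRASE
```

Treat that file like the key itself: give it 0600 rights and keep it out of the folders you back up unencrypted.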

Preparing the storage location

Let's start with the local storage option. It is the simplest option of all and does not require a lot of prepping other than setting the correct rights. You simply use file://~/backups, for example, as the target argument of the duplicity command.

Google Drive is a whole other story. We will need to set up credentials and install PyDrive. PyDrive is the Python library that Duplicity uses to push the backups to Google Drive.

First you need to head to the Google developer console. There, create a new project and add the Drive API. Once that is done you can create the credentials we need. In the menu, go to 'Credentials' and click the 'Create credentials' button. Choose the option 'OAuth client ID'. You might be prompted to configure the consent screen; fill in the required fields and continue with creating the OAuth client ID. Choose 'Other' as the application type. After creating it, a popup shows the client ID and secret. Copy these, because we need to put them in a credentials file. The file below is in the format that Duplicity understands.

client_config_backend: settings
client_config:
   client_id: "CLIENT ID HERE"
   client_secret: "CLIENT SECRET HERE"
save_credentials: True
save_credentials_backend: file
save_credentials_file: gdrive.cache
get_refresh_token: True

Now, let's put the content above inside a file. This tutorial uses a sample location in the home folder, which is an obvious place to put these files. In practice we just need a location to point to when executing the backup command.

# Make a directory for this credential file
rob@Rathalos ~ $ mkdir ~/.duplicity
# Nano the file and put the above content in there
rob@Rathalos ~ $ nano ~/.duplicity/credentials

# Set appropriate rights, these are sensitive after all!
# The directory needs the execute bit to stay accessible,
# so it gets 0700 while the credentials file gets 0600.
rob@Rathalos ~ $ chmod 0700 ~/.duplicity
rob@Rathalos ~ $ chmod 0600 ~/.duplicity/credentials
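
If you prefer to script this step, the sketch below creates the directory and the credentials file in one go, with a restrictive umask so the rights are correct from the start. The function name `write_credentials` and its arguments are made up for this example; the file content follows the format shown earlier.

```shell
# Create a private directory and an owner-only credentials file.
# Arguments: target directory, client ID, client secret.
write_credentials() {
    dir="$1"
    (
        # umask 077 gives newly created files 0600 and directories 0700.
        # Existing files and directories keep their current rights.
        umask 077
        mkdir -p "$dir"
        cat > "$dir/credentials" <<EOF
client_config_backend: settings
client_config:
    client_id: "$2"
    client_secret: "$3"
save_credentials: True
save_credentials_backend: file
save_credentials_file: gdrive.cache
get_refresh_token: True
EOF
    )
}

# Usage:
#   write_credentials ~/.duplicity "CLIENT ID HERE" "CLIENT SECRET HERE"
```

The subshell keeps the umask change from leaking into the rest of your session.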

Now, the only thing you need is a Google account that we can use to store the backups. You will need to set this up once by giving consent to these credentials.

Configuring Duplicity

Now, let's bring everything together. Duplicity recognizes the GOOGLE_DRIVE_SETTINGS environment variable, so we need to set it. If you are just testing, you can use export to create the variable in your current shell session. Once you are done configuring, you can add the variable to your ~/.bashrc file.

# Make the environment var in current shell session
rob@Rathalos ~ $ export GOOGLE_DRIVE_SETTINGS=~/.duplicity/credentials

# Run the basic duplicity command
rob@Rathalos ~ $ duplicity --encrypt-key GPG_KEY_ID /home/rob/duplicity/test pydrive://YOUR_GOOGLE_MAIL/backups

You will be prompted the first time you run this command to give consent to the OAuth credential to be able to create and delete files. You can further lock down the rights of the credentials back in the console.

After this we need to take a look at some options to further refine our procedure. I recommend going through the man page for all available options. You might use a different remote storage method or have different requirements for your procedure.

--full-if-older-than 1M is a great option to add. 1M stands for 1 month; change it to whatever interval you want between full backups, for example 1W for the weekly full backup mentioned earlier. Do a restore cycle and see what fits best in your situation.

--file-prefix, --file-prefix-manifest, --file-prefix-archive, --file-prefix-signature are great options if you need to add some naming convention to these files when folders are not enough.

--include and --exclude are also great options to use. These include or exclude certain files and folders. You will need them if there is non-essential data, like a tmp folder, inside the application or folder that you mean to back up. Shell-style glob patterns can be used to further define these rules, and the --include-regexp and --exclude-regexp variants accept regular expressions.

--dry-run is also available. Use this if you are working with APIs that enforce quotas. It is sensible to do a dry run before actually configuring this in your scheduler.

Combining all the options mentioned would give you something like the following:

# More advanced duplicity command
rob@Rathalos ~ $ duplicity --encrypt-key GPG_KEY_ID --full-if-older-than 1M --file-prefix "desktop-rob-" --exclude /home/rob/duplicity/test/test01 /home/rob/duplicity/test pydrive://YOUR_GOOGLE_MAIL/backups
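
When the command gets this long it can help to wrap it in a small function, so the scheduler and your manual test runs use exactly the same options. This is only a sketch: the names `run_backup`, `DUPLICITY_BIN`, `GPG_KEY_ID` and `FILE_PREFIX` are made up for this example, and the options are the ones discussed above.

```shell
# Binary to call; override it (for example with "echo") to preview the command.
DUPLICITY_BIN="${DUPLICITY_BIN:-duplicity}"

# Back up SOURCE to TARGET. Any extra arguments, such as --dry-run or
# --exclude PATH, are passed straight through to duplicity.
# GPG_KEY_ID and FILE_PREFIX are expected to be set by the caller.
run_backup() {
    source_dir="$1"
    target_url="$2"
    shift 2
    "$DUPLICITY_BIN" \
        --encrypt-key "$GPG_KEY_ID" \
        --full-if-older-than 1M \
        --file-prefix "$FILE_PREFIX" \
        "$@" \
        "$source_dir" "$target_url"
}

# Usage:
#   GPG_KEY_ID=GPG_KEY_ID FILE_PREFIX="desktop-rob-" \
#     run_backup /home/rob/duplicity/test pydrive://YOUR_GOOGLE_MAIL/backups --dry-run
```

Setting DUPLICITY_BIN=echo prints the assembled arguments instead of running them, which is a cheap sanity check before a real run.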

Cleaning up old backups

You might end up with a script that you schedule to do all these jobs at once. Cleaning up after older backups is important. It will keep your storage clean and mean!

Duplicity already has this built in. It's magical, really. Simply schedule this alongside your backup command to regulate how many backups are kept in storage. Note that without --force Duplicity only lists the backup sets it would delete. To remove everything older than 3 months use:

duplicity remove-older-than "3M" --force --file-prefix "desktop-rob-" pydrive://YOUR_GOOGLE_MAIL/backups
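
In a scheduled script the cleanup could live in its own function next to the backup call. The sketch below assumes the same file prefix as the backup; the names `run_cleanup`, `DUPLICITY_BIN` and `FILE_PREFIX`, and the script path in the cron line, are made up for this example.

```shell
# Binary to call; override it (for example with "echo") to preview the command.
DUPLICITY_BIN="${DUPLICITY_BIN:-duplicity}"

# Delete backup sets older than AGE (for example 3M) from TARGET.
# Without --force duplicity only lists what it would delete.
run_cleanup() {
    age="$1"
    target_url="$2"
    "$DUPLICITY_BIN" remove-older-than "$age" --force \
        --file-prefix "$FILE_PREFIX" "$target_url"
}

# Usage:
#   FILE_PREFIX="desktop-rob-" run_cleanup 3M pydrive://YOUR_GOOGLE_MAIL/backups
#
# Example cron line for a nightly run at 02:00 (hypothetical script path):
#   0 2 * * * /home/rob/bin/backup.sh
```

Running the cleanup after the backup, and only when the backup succeeded, avoids deleting old sets on a night where no new one was made.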

Recovering a backup

Now, let's go through the recovery cycle at least once, to flex our backup muscles so we won't be completely inexperienced in restoring backup files. I always find it good practice to fully execute a procedure before putting it in place. This lets you know that everything works and gives you a feel for whether it needs tinkering for a smoother, easier and quicker experience.

Restoring a backup is actually quite simple due to Duplicity's grace in doing so. The -t option lets you define the point in time that you want to restore; 3D means 3 days back. Duplicity will look for the closest full backup and then apply incremental backups until the given time has been reached. Ideal!

duplicity restore -t 3D --file-prefix "desktop-rob-" pydrive://YOUR_GOOGLE_MAIL/backups ~/duplicity/test-restore
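
To make the restore drill repeatable, you could restore into a scratch folder and compare it with the original. The sketch below uses plain `diff -r` for the comparison; the function names and `DUPLICITY_BIN` / `FILE_PREFIX` are made up for this example.

```shell
# Binary to call; override it (for example with "echo") to preview the command.
DUPLICITY_BIN="${DUPLICITY_BIN:-duplicity}"

# Restore TARGET as it was at TIME (for example 3D) into DEST.
# Arguments: time, target URL, destination directory.
restore_at() {
    "$DUPLICITY_BIN" restore -t "$1" --file-prefix "$FILE_PREFIX" "$2" "$3"
}

# Succeed only if two directory trees have identical contents.
same_tree() {
    diff -r "$1" "$2" > /dev/null
}

# Usage:
#   FILE_PREFIX="desktop-rob-" \
#     restore_at 3D pydrive://YOUR_GOOGLE_MAIL/backups ~/duplicity/test-restore
#   same_tree /home/rob/duplicity/test ~/duplicity/test-restore && echo "restore matches"
```

Keep in mind that files changed since the restored point in time will show up as differences, and that excluded folders are missing from the restore. Duplicity also has a verify action that is worth a look in the man page.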

OS Agnostic

Now for the part that makes this setup OS agnostic. In my personal setup I use Docker for this, on both my Windows machine and my Linux machine. We can replicate the steps above in an image of choice. When your Docker image has been completely set up, we can simply run the duplicity command through a temporary container.

I will update this article with a link on how to properly set up Docker on a Windows host. When my Ryzen build is done I will get to that part :)

# Run command through temp container
rob@Rathalos ~ $ docker run --rm -it -P my-own-image /usr/local/bin/duplicity --encrypt-key GPG_KEY_ID --full-if-older-than 1M --file-prefix "desktop-rob-" --exclude /home/rob/duplicity/test/test01 /home/rob/duplicity/test pydrive://YOUR_GOOGLE_MAIL/backups
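
A container only sees what you hand it, so the folder to back up, the GPG keyring and the credentials file have to be mounted or baked into the image. The sketch below mounts them from the host. It assumes the container runs as root, that the image has no shell entrypoint, and that the host paths match the earlier steps; the function name `docker_duplicity`, the `/data` mount point and `DOCKER_BIN` are made up for this example.

```shell
# Binary to call; override it (for example with "echo") to preview the command.
DOCKER_BIN="${DOCKER_BIN:-docker}"

# Run duplicity inside a throwaway container.
# Arguments: image name, host directory to back up, then duplicity arguments.
# The host directory is mounted read-only at /data inside the container.
docker_duplicity() {
    image="$1"
    host_dir="$2"
    shift 2
    "$DOCKER_BIN" run --rm \
        -v "$host_dir:/data:ro" \
        -v "$HOME/.gnupg:/root/.gnupg" \
        -v "$HOME/.duplicity:/root/.duplicity" \
        -e GOOGLE_DRIVE_SETTINGS=/root/.duplicity/credentials \
        -e PASSPHRASE \
        "$image" /usr/local/bin/duplicity "$@"
}

# Usage:
#   docker_duplicity my-own-image /home/rob/duplicity/test \
#     --encrypt-key GPG_KEY_ID --full-if-older-than 1M \
#     --file-prefix "desktop-rob-" /data pydrive://YOUR_GOOGLE_MAIL/backups
```

The -e PASSPHRASE form passes the variable through from the host shell, so the passphrase never ends up in the image. On a Windows host the volume paths will look different.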

We are done!

To summarize, Duplicity is a great tool. No need to reinvent the wheel when they have written software that does it all. You have support for encryption, all the major cloud storage platforms are supported, and it only takes an hour or so to install and fine-tune this setup.

Backups are important. You need to be comfortable creating setups like this and putting procedures into place. Restoring backups should be an easy procedure too; with Duplicity you can train virtually anyone to restore a backup for when disaster strikes!

A good rule of thumb is to keep everything simple! Don't overcomplicate your setup. Keep it simple, stupid! It is a good mantra. You will get complexity soon enough by adding components, so there is no need to add complexity yourself.