Sunday, December 30, 2012

A shell recipe for backups with logs and history

I wrote a shell script for a cron job that grabs backups of some remote files. It has a few nice features:

  • Output from the backup commands is logged, with timestamps.
  • cron will send me email if one of the commands fails.
  • The history of each backup is saved in Git. Nothing sucks more than corrupting an important file and then syncing that corruption to your one and only backup.

Here's how it works.

#!/bin/bash -e

cd /home/keegan/backups
log="$(pwd)"/log

exec 3>&2 > >(ts >> "$log") 2>&1

You may have seen exec used to tail-call a command, but here we use it differently. When no command is given, exec applies file redirections to the current shell process.

We apply timestamps by redirecting output through ts (from moreutils), and append that to the log file. I would write exec | ts >> $log, except that pipe syntax is not supported with exec.

Instead we use process substitution. >(cmd) expands to the name of a file, whose contents will be sent to the specified command. This file name is a fine target for normal file output redirection with >. (It might name a temporary file created by the shell, or a special file under /dev/fd/.)
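To see what that file name looks like on a given machine, a minimal sketch like the following can help. The variable name subst_name is only for illustration, and the exact name printed depends on the system.

```shell
#!/bin/bash
# Print the file name that a process substitution expands to.
# The command inside >(...) is irrelevant here; "true" just exits.
# On many systems the result is a path under /dev/fd/, but that is
# system-dependent, so no particular value is assumed.
subst_name="$(echo >(true))"
echo "process substitution expanded to: $subst_name"
```

Because the expansion is an ordinary word, it can go anywhere a file name is accepted, including as the target of a > redirection.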

We also redirect standard error to the same place with 2>&1. But first we open the original standard error as file descriptor 3, using 3>&2.
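The whole redirection can be tried in isolation. This is a sketch under a few assumptions: sed stands in for ts so that moreutils is not required, demo_log is a throwaway temporary file, and the exec runs in a subshell so that the surrounding shell keeps its own output.

```shell
#!/bin/bash
# Throwaway log file for the demo (hypothetical name).
demo_log="$(mktemp)"

(
    # Save the original stderr as fd 3, then send stdout and stderr
    # through a filter (sed here, in place of ts) into the log.
    exec 3>&2 > >(sed 's/^/[stamp] /' >> "$demo_log") 2>&1

    echo "to stdout"                         # ends up in the log
    echo "to stderr" >&2                     # ends up in the log too
    echo "straight to the original stderr" >&3   # bypasses the log
)

# The filter runs asynchronously, so give it a moment to finish writing.
sleep 1
cat "$demo_log"
```

The first two lines land in the log with the prefix added, while the third goes to wherever standard error pointed before the exec.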

function handle_error {
    echo 'Error occurred while running backup' >&3
    tail "$log" >&3
    exit 1
}
trap handle_error ERR

Since we specified bash -e in the first line of the script, Bash will exit as soon as any command fails. We use trap to register a function that gets called when this happens. The function writes a message and the tail of the log file to file descriptor 3, which is the script's original standard error. cron will capture that output and mail it to the system administrator.
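Here is a small sketch of how -e and the ERR trap interact. It runs the failing script in a child bash so the demo itself keeps going, and false stands in for a backup command that fails. The variables status and output exist only for the demo.

```shell
#!/bin/bash
# Run a tiny script under "bash -e" and capture what it prints.
status=0
output="$(bash -e -c '
    handle_error() {
        echo "Error occurred while running backup"
        exit 1
    }
    trap handle_error ERR

    echo "step 1 ok"
    false                  # stands in for a failing backup command
    echo "never reached"
')" || status=$?

echo "$output"
echo "exit status: $status"
```

The trap fires on the failing command, the handler prints its message and exits, and the last echo in the child script never runs.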

Now we come to the actual backup commands.

cd foo
git pull

cd ../bar
rsync -v otherhost:bar/baz .
git commit --allow-empty -a -m '[AUTO] backup'
git repack -da

foo is a backup of a Git repo, so we just update a clone of that repo. If you want to be absolutely sure to preserve all commits, you can configure the backup repo to disable automatic garbage collection and keep infinite reflog.
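One way to configure that, sketched here against a throwaway repository rather than the real foo clone (the variable name backup_repo is hypothetical), is with these three settings:

```shell
#!/bin/bash
# Throwaway repository so the sketch is safe to run anywhere;
# in practice these commands would be run inside the backup clone.
backup_repo="$(mktemp -d)"
git init -q "$backup_repo"
cd "$backup_repo"

git config gc.auto 0                         # disable automatic garbage collection
git config gc.reflogExpire never             # never expire reflog entries
git config gc.reflogExpireUnreachable never  # ...including unreachable ones

# Show the resulting gc settings.
git config --get-regexp '^gc\.'
```

These settings are per-repository, so they only affect the clone they are set in.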

bar is a local-only Git repo storing history of a file synced from another machine. Semantically, Git stores each version of a file as a separate blob object. If the files you're backing up are reasonably large, this can waste a lot of space quickly. But Git supports "packed" storage, where the objects in a repo are compressed together. By repacking the repo after every commit, we can save a ton of space.
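To see what the repack step does, the following sketch commits a few versions of a file in a throwaway repository and counts loose objects before and after. The repository, file name, and commit identity are all made up for the demo. It shows only that loose objects get moved into a pack, not how much space that saves in any particular setup.

```shell
#!/bin/bash
# Throwaway repository with three versions of one file.
repo="$(mktemp -d)"
cd "$repo"
git init -q

for i in 1 2 3; do
    seq 1 2000 > data.txt
    echo "revision $i" >> data.txt
    git add data.txt
    # Identity is passed inline so the demo does not depend on global config.
    git -c user.name=demo -c user.email=demo@example.invalid \
        commit -q -m "version $i"
done

# "count" in the output of count-objects is the number of loose objects.
loose_before="$(git count-objects -v | awk '/^count:/ {print $2}')"
git repack -q -da
loose_after="$(git count-objects -v | awk '/^count:/ {print $2}')"

echo "loose objects before repack: $loose_before"
echo "loose objects after repack:  $loose_after"
```

Running git count-objects -vH in a real backup repo before and after the cron job is a quick way to check whether the repack is worth its cost there.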

3 comments:

  1. #!/bin/bash -e
    already buggy; use "#!/bin/sh" and "set -e".
    If you must use bash prefer "#!/usr/bin/env bash"

    function handle_error {
    buggy. The correct syntax is
    handle_error() {
    "function" is not a POSIX keyword

    "By repacking the repo after every commit, we can save a ton of space."
    Also buggy and slow. Git should do this automatically if required.

    1. POSIX says that a script starting with "#!" has implementation-defined behavior. POSIX does not provide a way to force a script to be run with a POSIX shell. "#!/bin/sh" is likely to work, but doesn't on some systems. POSIX requires "set -e" to work, so I agree that it should be preferred over using -e in the she-bang line.

      The env trick fails on esoteric systems. A different set than the ones that fail on "#!/bin/bash -e" sure, but it's roughly the same number of systems in the wild. It's a wash and "#!/bin/bash" is shorter, so whatever.

      Agreed on POSIX shell function syntax. But, the she-bang did request bash and bash takes the syntax in the script. I think it's a wash, but it would be nice if these non-bash-specific techniques used non-bash-specific syntax.

      Yeah, repacking the repo after every commit is unlikely to save much space at all. Git automatically repacks as needed during normal operations. If you do massive history rewriting with filter-branch or something like that, you might want to repack, but only if you are very sure of the results and have properly removed the backup refs and pruned the old refs from the reflogs. The repack in the article is more likely to waste resources than to save them.

  2. I think that `git commit [..] -a [..]' is not enough and you should add `git add .'? Otherwise, I'm not sure if I understand the relation between "foo", "bar" and "baz".
