I wrote a shell script for a cron job that grabs backups of some remote files. It has a few nice features:
- Output from the backup commands is logged, with timestamps.
- cronwill send me email if one of the commands fails.
- The history of each backup is saved in Git. Nothing sucks more than corrupting an important file and then syncing that corruption to your one and only backup.
Here's how it works.
#!/bin/bash -e
cd /home/keegan/backups
log="$(pwd)"/log
exec 3>&2 > >(ts >> "$log") 2>&1You may have seen exec used to tail-call a command, but here we use it differently. When no command is given, exec applies file redirections to the current shell process.
We apply timestamps by redirecting output through ts (from moreutils), and append that to the log file. I would write exec | ts >> $log, except that pipe syntax is not supported with exec.
Instead we use process substitution. >(cmd) expands to the name of a file, whose contents will be sent to the specified command. This file name is a fine target for normal file output redirection with >. (It might name a temporary file created by the shell, or a special file under /dev/fd/.)
We also redirect standard error to the same place with 2>&1. But first we open the original standard error as file descriptor 3, using 3>&2.
function handle_error {
    echo 'Error occurred while running backup' >&3
    tail "$log" >&3
    exit 1
}
trap handle_error ERRSince we specified bash -e in the first line of the script, Bash will exit as soon as any command fails. We use trap to register a function that gets called if this happens. The function writes some of the log file to the script's original standard output. cron will capture that and send mail to the system administrator.
Now we come to the actual backup commands.
cd foo
git pull
cd ../bar
rsync -v otherhost:bar/baz .
git commit --allow-empty -a -m '[AUTO] backup'
git repack -dafoo is a backup of a Git repo, so we just update a clone of that repo. If you want to be absolutely sure to preserve all commits, you can configure the backup repo to disable automatic garbage collection and keep infinite reflog.
bar is a local-only Git repo storing history of a file synced from another machine. Semantically, Git stores each version of a file as a separate blob object. If the files you're backing up are reasonably large, this can waste a lot of space quickly. But Git supports "packed" storage, where the objects in a repo are compressed together. By repacking the repo after every commit, we can save a ton of space.
