I wrote a shell script for a cron job that grabs backups of some remote files. It has a few nice features:
- Output from the backup commands is logged, with timestamps.
- `cron` will send me email if one of the commands fails.
- The history of each backup is saved in Git. Nothing sucks more than corrupting an important file and then syncing that corruption to your one and only backup.
Here's how it works.
```bash
#!/bin/bash -e
cd /home/keegan/backups
log="$(pwd)"/log
exec 3>&2 > >(ts >> "$log") 2>&1
```
You may have seen `exec` used to tail-call a command, but here we use it differently. When no command is given, `exec` applies file redirections to the current shell process, so they stay in effect for every command the script runs afterwards.
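To make the distinction concrete, here is a small sketch (a demo of my own, not part of the backup script) showing that a bare `exec` redirection applies to every later command in the shell, child processes included. The scratch file comes from `mktemp`, and fd 4 is just an arbitrary spare descriptor used to restore the original stdout afterwards.

```bash
#!/bin/bash
# Scratch file for the demo, removed when the script exits.
out=$(mktemp)
trap 'rm -f "$out"' EXIT

exec 4>&1          # keep a copy of the original stdout on fd 4
exec > "$out"      # no command given: only this shell's stdout is redirected
echo "written by the shell itself"
sh -c 'echo "written by a child process"'   # children inherit the new stdout

exec 1>&4 4>&-     # restore the original stdout and close fd 4
echo "The file now contains:"
cat "$out"
```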
We apply timestamps by redirecting output through `ts` (from moreutils), which prefixes each line with the current time, and append the result to the log file. I would write `exec | ts >> $log`, except that `exec` doesn't support pipe syntax.
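If moreutils isn't available, a rough stand-in for `ts` can be written in plain Bash. This is a sketch under a few assumptions: the function name `stamp` is hypothetical, it needs Bash 4.2 or newer for `printf`'s `%(...)T` time format, and the `%b %d %H:%M:%S` format is only meant to resemble `ts`'s default output, not match it exactly.

```bash
#!/bin/bash
# Hypothetical stand-in for moreutils' ts: prefix each input line with the
# current local time. Requires Bash 4.2+ for printf's %(...)T directive.
stamp() {
    local line
    # IFS= and -r keep leading whitespace and backslashes intact; the
    # [[ -n $line ]] check also handles a final line without a newline.
    while IFS= read -r line || [[ -n $line ]]; do
        # An argument of -1 tells %(...)T to use the current time.
        printf '%(%b %d %H:%M:%S)T %s\n' -1 "$line"
    done
}

printf 'first\nsecond\n' | stamp
```

In the backup script it could replace `ts` as `exec 3>&2 > >(stamp >> "$log") 2>&1`, since the process substitution runs in a subshell that inherits the function.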
Instead we use process substitution. `>(cmd)` expands to the name of a file; anything written to that file is sent to the standard input of `cmd`. This file name is a fine target for normal file output redirection with `>`. (Depending on the system, it names either a FIFO created by the shell or a special file under `/dev/fd/`.)
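A quick way to see both halves of this is the sketch below, with `tr` standing in for `ts`. It assumes Bash 4.4 or newer, where `$!` is set to the PID of the most recent process substitution so that `wait $!` can wait for it.

```bash
#!/bin/bash
# Assumes Bash 4.4+ for `wait $!` on a process substitution.

# >(cmd) is replaced by a file name (typically /dev/fd/63 or similar on Linux).
echo "the name is:" >(cat)

out=$(mktemp)
trap 'rm -f "$out"' EXIT

# An ordinary '>' redirection to that name feeds tr's standard input.
echo "hello" > >(tr 'a-z' 'A-Z' > "$out")
wait $!    # the substituted command runs asynchronously; wait for it

cat "$out"
```

The `wait` matters because the substituted command runs asynchronously. The same is true of `ts` in the backup script, so the log may lag slightly behind what the script has already written.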
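We also redirect standard error to the same place with `2>&1`. But first we save a copy of the original standard error as file descriptor 3, using `3>&2`. Bash applies redirections from left to right, so `3>&2` has to come before `2>&1` to capture the original standard error rather than the log.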
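Here is a small sketch of the same redirection trick without `ts`, run inside a subshell whose standard error is captured in a file so the effect is visible. The file names come from `mktemp`, and the function name `demo` is hypothetical.

```bash
#!/bin/bash
log=$(mktemp)
errs=$(mktemp)
trap 'rm -f "$log" "$errs"' EXIT

demo() {
    # Same shape as the backup script's exec line, minus ts:
    # fd 3 = original stderr, then stdout -> log, then stderr -> stdout.
    exec 3>&2 >"$log" 2>&1
    echo "ordinary output"                  # goes to the log
    echo "an error message" >&2             # also goes to the log
    echo "note for the original stderr" >&3 # bypasses the log
}

# Run in a subshell so the exec doesn't affect this shell; its original
# stderr is the $errs file.
( demo ) 2>"$errs"

echo "log:";             cat "$log"
echo "original stderr:"; cat "$errs"
```

Swapping the order to `>"$log" 2>&1 3>&2` would make fd 3 a copy of the already-redirected standard error, so the message meant for cron would end up in the log instead.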
```bash
function handle_error {
    echo 'Error occurred while running backup' >&3
    tail "$log" >&3
    exit 1
}
trap handle_error ERR
```
Since we specified `bash -e` in the first line of the script, Bash will exit as soon as any command fails. We use `trap` to register a function that gets called if this happens. The function writes the last ten lines of the log file (the default for `tail`) to the script's original standard error, which we saved as file descriptor 3. `cron` will capture that output and mail it to the owner of the crontab (or to the address in `MAILTO`, if set).
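The same `set -e` plus `trap ... ERR` pattern in miniature, as a sketch: the job runs in a subshell (the function body uses `( ... )` rather than `{ ... }`) so that the handler's `exit 1` ends only the job, not the calling shell. `run_job` is a hypothetical name.

```bash
#!/bin/bash
run_job() (
    set -e
    handle_error() {
        echo "Error occurred while running job" >&2
        exit 1
    }
    trap handle_error ERR

    echo "step 1 ok"
    false                 # fails: the ERR trap runs and the handler exits with 1
    echo "never reached"
)

run_job
echo "run_job exited with status $?"
```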
Now we come to the actual backup commands.
```bash
cd foo
git pull
cd ../bar
rsync -v otherhost:bar/baz .
git commit --allow-empty -a -m '[AUTO] backup'
git repack -da
```
`foo` is a backup of a Git repo, so we just update a clone of that repo with `git pull`. If you want to be absolutely sure to preserve all commits, you can configure the backup repo to disable automatic garbage collection and to never expire reflog entries.
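A sketch of that configuration, written as a hypothetical helper `preserve_history` that takes the path of the backup clone. These are standard `git config` keys: `gc.auto 0` turns off automatic `git gc --auto` runs, the two `reflogExpire` settings stop reflog entries from ever expiring, and `gc.pruneExpire never` additionally stops a manual `git gc` from pruning unreachable objects.

```bash
#!/bin/bash
# Hypothetical helper: configure a backup clone so Git never discards
# history on its own. Takes the repository path as its only argument.
preserve_history() {
    local repo=$1
    git -C "$repo" config gc.auto 0                         # no automatic `git gc --auto`
    git -C "$repo" config gc.reflogExpire never             # keep reflog entries for reachable commits
    git -C "$repo" config gc.reflogExpireUnreachable never  # ...and for unreachable ones
    git -C "$repo" config gc.pruneExpire never              # never prune unreachable objects
}
```

For the layout above, that would be `preserve_history /home/keegan/backups/foo`, run once after cloning.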
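`bar` is a local-only Git repo storing the history of a file synced from another machine with `rsync`. The `--allow-empty` flag keeps `git commit` from failing (and tripping the error handler) when the file hasn't changed since the last run; note that `-a` only stages files Git already tracks, so `baz` must have been added once with `git add`. Conceptually, Git stores each version of a file as a separate blob object. If the files you're backing up are reasonably large, this can waste a lot of space quickly. But Git also supports "packed" storage, where objects are stored together in a pack file and similar objects are delta-compressed against each other. Successive versions of a backed-up file usually differ only a little, so repacking the repo after every commit (`git repack -a -d` packs all objects into a single pack and deletes the now-redundant loose objects and old packs) can save a ton of space.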