Snapshots using rsync and hardlinks
At the moment I'm using rsync and hardlinks. This lets you keep a lot of snapshots without needing much storage. On the other hand, since hardlinks are used, there is really only one copy of each file: if the disk fails, all versions of that file are gone. So this is not really a backup solution.
What are hardlinks? If you make a hardlink copy of a file, you don't actually copy the content; you just create another pointer to the same data on your hard drive. So say you have a file "a" and make a hardlink copy of it called "b": if you now delete "a", you can still access the file through "b". The actual data won't be deleted or overwritten until you delete all the hardlinks to it. Also, if you change either "a" or "b", the other one changes too, because on your disk there is only one file. Why is this useful? Because it makes creating copies of files that have not changed really cheap, both in disk space and in the time needed for copying.
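You can see this on the command line with two throwaway files (the names are just for illustration):

echo "hello" > a
ln a b      # "b" is a hardlink to the same data as "a"
ls -li a b  # both show the same inode number and a link count of 2
rm a
cat b       # still prints "hello" -- the data stays until the last link is gone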
Rsync
is a program that synchronizes two directories. It can be set up so that it only copies files that have actually changed, and you can also tell it to compare files against an already existing third directory and to create hardlinks to the files that are identical there.
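For example (the paths here are just placeholders), the two modes that the scripts below rely on look roughly like this:

# copy only files that have changed since the last run
rsync -a /source/dir/ /backup/current/

# additionally compare against an older snapshot and hardlink files that are identical
rsync -a --delete --link-dest=/backup/previous /source/dir/ /backup/current/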
With the above it is fairly easy to set up a script that creates a backup every N minutes/hours/days/weeks/months, so that you not only have the last backup at hand, but also the M backups before that. So say you do backups every hour: you will have a copy of all files as they looked an hour ago, but also as they looked two, 3, 4, 5, 6, ..., M hours ago. And recovering a file just takes a copy command.
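To illustrate with the variable names used in the script below (the file path is made up):

# the backup directory after a few runs:
#   $DEST/hourly.0   <- newest snapshot
#   $DEST/hourly.1
#   $DEST/hourly.2
#   $DEST/hourly.3   <- oldest snapshot, removed on the next run

# restoring a file from two snapshots back really is just a copy
cp $DEST/hourly.2/some/path/to/file /where/you/want/it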
So here is the script to set up an hourly backup, keeping the last 3 backups before that too:
#!/bin/bash

# paths and progs I use
DEST=/path/to/where/the/backup/should/go
ORIG=/dir/that/should/be/backed/up
RM=/bin/rm
MV=/bin/mv
RSYNC=/usr/bin/rsync
TOUCH=/usr/bin/touch

# don't use anything else
unset PATH

# delete oldest snapshot
if [ -d $DEST/hourly.3 ] ; then
    $RM -rf $DEST/hourly.3
fi

# rotate other snapshots
if [ -d $DEST/hourly.2 ] ; then
    $MV $DEST/hourly.2 $DEST/hourly.3
fi

if [ -d $DEST/hourly.1 ] ; then
    $MV $DEST/hourly.1 $DEST/hourly.2
fi

if [ -d $DEST/hourly.0 ] ; then
    $MV $DEST/hourly.0 $DEST/hourly.1
fi

# create new snapshot, use hard links to hourly.1 if possible
if [ -d $DEST/hourly.1 ] ; then
    $RSYNC -a -v --numeric-ids --delete --link-dest=$DEST/hourly.1 $ORIG $DEST/hourly.0
else
    $RSYNC -a -v --numeric-ids --delete $ORIG $DEST/hourly.0
fi

# update time stamp
$TOUCH $DEST/hourly.0
The magic is in the rsync command: it backs up all files, but only transfers those that are not already in the DEST directory, and for files that are identical to the ones in DEST/hourly.1 it just creates hardlinks. This means that on each backup, only the files that changed are actually copied.
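If you want to convince yourself that the hardlinking works, you can compare inode numbers and check the disk usage (the file path is made up):

# unchanged files in consecutive snapshots share an inode (same number, link count > 1)
ls -li $DEST/hourly.0/some/unchanged/file $DEST/hourly.1/some/unchanged/file

# du counts each inode only once, so all snapshots together stay close to the size of one full copy
du -chs $DEST/hourly.*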
Once you have this, it is easy to also do daily backups for the last N days:
#!/bin/bash

# paths and progs I use
DEST=/path/to/backup/dir
RM=/bin/rm
MV=/bin/mv
CP=/bin/cp
TOUCH=/usr/bin/touch

# don't use anything else
unset PATH

# delete oldest snapshot
if [ -d $DEST/daily.3 ] ; then
    $RM -rf $DEST/daily.3
fi

# rotate other snapshots
if [ -d $DEST/daily.2 ] ; then
    $MV $DEST/daily.2 $DEST/daily.3
fi

if [ -d $DEST/daily.1 ] ; then
    $MV $DEST/daily.1 $DEST/daily.2
fi

if [ -d $DEST/daily.0 ] ; then
    $MV $DEST/daily.0 $DEST/daily.1
fi

# take a hardlink copy of the oldest hourly snapshot
if [ -d $DEST/hourly.3 ] ; then
    $CP -al $DEST/hourly.3 $DEST/daily.0
fi
It just takes one of the hourly backups once a day and makes a hardlink copy of it (cp -al links instead of copying the data).
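The crontab below also calls a weekly script. It would follow exactly the same pattern, rotating weekly.0 through weekly.3 and taking a hardlink copy of daily.3 once a week; here is a sketch assuming the same layout:

#!/bin/bash

# a sketch of the weekly rotation, assuming the same layout as the daily script
DEST=/path/to/backup/dir
RM=/bin/rm
MV=/bin/mv
CP=/bin/cp

# don't use anything else
unset PATH

# delete oldest snapshot
if [ -d $DEST/weekly.3 ] ; then
    $RM -rf $DEST/weekly.3
fi

# rotate other snapshots
if [ -d $DEST/weekly.2 ] ; then
    $MV $DEST/weekly.2 $DEST/weekly.3
fi

if [ -d $DEST/weekly.1 ] ; then
    $MV $DEST/weekly.1 $DEST/weekly.2
fi

if [ -d $DEST/weekly.0 ] ; then
    $MV $DEST/weekly.0 $DEST/weekly.1
fi

# take a hardlink copy of the oldest daily snapshot
if [ -d $DEST/daily.3 ] ; then
    $CP -al $DEST/daily.3 $DEST/weekly.0
fi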
Now all you need to do is set up cron jobs to run those scripts at the correct times:
13 */4 * * * /path/to/bin/make_hourly_web_backup >/dev/null
15 13 * * * /path/to/bin/make_daily_web_backup >/dev/null
27 2 * * 0 /path/to/bin/make_weekly_web_backup >/dev/null
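These lines go into your crontab (crontab -e); the five fields are minute, hour, day of month, month, and day of week, so the entries mean:

# 13 */4 * * *  -> at minute 13 of every fourth hour (the hourly script)
# 15 13 * * *   -> every day at 13:15 (the daily script)
# 27 2 * * 0    -> every Sunday at 02:27 (the weekly script)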