Back Up a Web Server
So you're a good little monkey and you have a backup strategy for all your local PCs. Perhaps you use rsync, Time Machine or similar backup software mirroring your files to external drives on a regular basis. That is some safe thinking that deserves a nice big pat on the back. But wait, what about your remote web server?
Today we'll take a look at ways you can back up the HTML files, stylesheets, application files and databases on your remote web host.
The only thing you'll need is a remote web hosting service allowing SSH connections to the remote shell.
Ready to lay your fears of web server annihilation to rest? Good, let's dive in.
This page is a wiki. If you have any other methods of backing up your web server, log on in and add them yourself.
The Backup Tools
We're going to focus on web hosts using Linux machines. What can we say? They're popular.
If your host uses Solaris, there are equivalents to all the Linux apps we're going to use. Likewise, if you have a Windows-based web hosting service, feel free to log in and add the equivalent tools for Windows users to the wiki.
The first thing we'll do is use tar and bzip2, two command-line tools for making compressed file copies, to back up HTML, CSS, JavaScript or any other text files. Let's say your web host stores all your public files in a directory named /path/to/html_folder. In that case, we're going to do something like this:
tar -cf `date +%F`.tar /path/to/html_folder
and then
bzip2 `date +%F`.tar
That's all good and well, but we don't want to type those commands in the shell all the time so let's make a shell script. Log into your web server via SSH and enter this command:
emacs backup.sh
We're using the emacs text editor in this example. Substitute the editor of your choice. If you are more comfortable with a GUI editor, just log in via FTP, create and edit the file there.
Now paste this code into your backup.sh file, adjusting the file paths to work with your setup:
#!/bin/bash
DATE=`date +%F`
TARFILE=$DATE.tar
tar -cf $TARFILE /path/to/html_folder
bzip2 $TARFILE
The first line is just the standard bash script header. Adjust this line if you're using a tcsh shell. From there we grab the date and use it to build the name of the tar file. Technically this could all be one line, but I split it up for readability.
Then we just create the actual .tar archive and compress it with bzip2 (of course, feel free to use gzip or any other compression tool you like).
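If you do go with gzip, tar can archive and compress in a single step. The sketch below is one way to do it, not the only one. It builds a throwaway directory with mktemp so it can run anywhere; demo_root and archive are illustrative names standing in for your real paths.

```shell
#!/bin/bash
# Sketch: create and compress the archive in a single tar call.
# demo_root stands in for the parent of /path/to/html_folder; it is a
# throwaway directory so the example can run anywhere.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/html_folder"
echo '<h1>hello</h1>' > "$demo_root/html_folder/index.html"

archive="$demo_root/$(date +%F).tar.gz"

# -c create, -z filter through gzip, -f write to this file.
# -C changes into demo_root first so the archive stores relative paths.
tar -czf "$archive" -C "$demo_root" html_folder

# List the contents to confirm the archive is readable.
tar -tzf "$archive"
```

Swapping -z for -j (and .tar.gz for .tar.bz2) gives the bzip2 equivalent of the two-step version above.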
Now we have a backup of our flat files, but what about the database? Most shared web hosts have decent database redundancy setups, but I still prefer to have a flat-file backup. Here's another bash script to backup a PostgreSQL database using pg_dump. Copy this text into a new file named backup_db.sh:
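#!/bin/bash
# database_name is a placeholder for the database you want to dump.
# --column-inserts is the long form of the older -D flag, which newer pg_dump versions no longer accept.
/path/to/pg_dump -x --column-inserts -U username -f path/to/`date +%F`.sql database_name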
This just calls pg_dump and outputs the database (including insert statements) to a file named after today's date, with a .sql extension, in whatever directory you specify. For those using MySQL, mysqldump can do roughly the same thing. Also note that with either tool you can compress the dump by piping its output through gzip and saving the result with a .gz extension.
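Here is a rough sketch of compressing a dump on the fly. The function name dump_compressed is made up for this example, and printf stands in for the real dump tool so the script runs without a database. Note that the dump command has to write to standard output, so there is no -f option in the commented pg_dump line.

```shell
#!/bin/bash
# Make the pipeline fail if the dump command fails, not only if gzip does.
set -o pipefail

# dump_compressed OUTFILE COMMAND [ARGS...]
# Runs COMMAND and writes its gzip-compressed output to OUTFILE.
dump_compressed() {
    local outfile=$1
    shift
    "$@" | gzip > "$outfile"
}

# Real use would look something like this (paths and names are placeholders):
#   dump_compressed "path/to/$(date +%F).sql.gz" /path/to/pg_dump -x -U username database_name

# Stand-in demo: printf plays the role of the dump tool.
demo_dir=$(mktemp -d)
dump_compressed "$demo_dir/demo.sql.gz" printf 'CREATE TABLE t (id int);\n'

# Decompress to standard output to check the round trip.
gzip -dc "$demo_dir/demo.sql.gz"
```

If you go this route, the backup file ends in .sql.gz, so any script that fetches it later needs to look for that name.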
One potential gotcha here: invoking pg_dump without entering a password won't work unless your password is stored somewhere. For Postgres you would use the ~/.pgpass file. See the manual for more info on the format and permissions of .pgpass.
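As a quick illustration, a .pgpass file holds one connection per line in the form hostname:port:database:username:password, and it has to be readable by its owner only. The sketch below writes one to a throwaway directory; every value in it is a placeholder, and the real file belongs at ~/.pgpass.

```shell
#!/bin/bash
# Sketch of a .pgpass file. Each line has the form
#   hostname:port:database:username:password
# The values below are placeholders, and the file is written to a
# throwaway directory here; the real one lives at ~/.pgpass.
pgpass_dir=$(mktemp -d)
pgpass_file="$pgpass_dir/.pgpass"

printf '%s\n' 'localhost:5432:database_name:username:secret' > "$pgpass_file"

# PostgreSQL ignores the file unless it is private to its owner.
chmod 600 "$pgpass_file"
ls -l "$pgpass_file"
```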
The last step is to make our shell scripts executable. Make sure to change the permissions so cron can execute them. In most shells it looks something like this:
chmod u+x filename
Automation
Now we have a couple of bash scripts we can invoke from our terminal prompt to back up our files. Great, but who wants to do it manually? Instead, let's set them up to run automatically once a day.
From the terminal open up your crontab using this command:
crontab -e
Now add these lines:
0 1 * * * /path/to/backup.sh > /path/to/log_files/backup.out 2>&1
30 1 * * * /path/to/backup_db.sh > /path/to/log_files/backup_db.out 2>&1
Hit save and we're done: automated backups. The crontab above will run your backup scripts once a day at 1 AM for the flat-file script and at 1:30 AM for the database script. If there are any problems or the scripts aren't running, check the output in the .out files.
Locking
Long-running scripts like backups started through cron have a common problem. A new one may get fired off before the previous one is complete. This problem can be solved using a simple lock file as shown in the example below.
#!/bin/bash

# Where should we store the lock file (Eg: /tmp)
LOCKDIR=/tmp

# Figure out the name of this script (without the path, etc)
SCRIPTNAME="$(basename "$0")"

# The lock file uses the name of the script with extension .lck in the /tmp directory
LOCKFILE="$LOCKDIR/$SCRIPTNAME.lck"

# See whether or not we have a lock file. If we do then abort
if [ -f "$LOCKFILE" ]; then
    exit 1
fi

# Create the lock file
/bin/touch "$LOCKFILE"

# ALL SCRIPT PROCESSING HAPPENS HERE

# Done, remove lock file
rm -f "$LOCKFILE"
Note: This implementation is somewhat naive. See [1] and [2] for a couple of more sophisticated options.
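One slightly sturdier variation, sketched here as an assumption rather than as what those references describe, is to use a lock directory. mkdir either creates the directory or fails in a single step, so two runs cannot both pass the check, and a trap removes the lock if the script exits early. The names acquire_lock, release_lock and run_backup are made up for this example.

```shell
#!/bin/bash
# Sketch: a lock directory instead of a lock file.
# lock_base is a throwaway directory for the demo; a real script
# would more likely use /tmp as in the example above.
lock_base=$(mktemp -d)
LOCKDIR="$lock_base/backup.lock"

# mkdir fails if the directory already exists, so this is test-and-set in one step.
acquire_lock() {
    mkdir "$LOCKDIR" 2>/dev/null
}

release_lock() {
    rmdir "$LOCKDIR" 2>/dev/null
}

run_backup() {
    if ! acquire_lock; then
        echo "another run is still in progress, skipping" >&2
        return 1
    fi
    # Remove the lock even if the script exits part-way through.
    trap release_lock EXIT

    # ALL SCRIPT PROCESSING HAPPENS HERE
    echo "backup running"

    release_lock
    trap - EXIT
}

run_backup                      # first run gets the lock and releases it
mkdir "$LOCKDIR"                # simulate a run that is still going
run_backup || echo "second run was refused"
rmdir "$LOCKDIR"                # the simulated run finishes
```

A lock left behind by a process that was killed outright still has to be cleared by hand, so this is an improvement rather than a complete answer.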
Logging
Automated scripts should keep a log of what they do in case things go awry. This is an example of writing to a log file, with a start and end time.
#!/bin/bash

# Where to store the log files. Eg: /var/log/scripts
LOGDIR=/tmp

# Figure out the name of this script (without the path, etc)
SCRIPTNAME="$(basename "$0")"

# Make sure that the directory exists
mkdir -p "$LOGDIR/$SCRIPTNAME"

# The file goes in that directory
LOGFILE="$LOGDIR/$SCRIPTNAME/"`date +%Y_%m_%d`'.log'

# Append start date to the log file
date +[START\ TIME:\ %H:%M] >> $LOGFILE

# Do your thing here and log any output by adding '>> $LOGFILE' after the command
# Example:
# ls -lah >> $LOGFILE

# Append end time to the log
date +[END\ TIME:\ %H:%M] >> $LOGFILE
For a greater degree of automation you can also add the ability to clean up old log files. Having a ton of old log files lying around may cause performance problems and can be confusing too. Here is a simple way to clean up old logs:
#!/bin/bash

# At max - how many days old can the log files be?
MAX_DAYS_OLD="7"

# Where are the log files stored. Eg: /var/log/scripts
LOGDIR=/tmp

# Find and delete .log files older than $MAX_DAYS_OLD days
find $LOGDIR/ -type f -name '*.log' -mtime +$MAX_DAYS_OLD -delete
This need not be a separate script. It can easily be included with the previous one.
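A combined version might look roughly like the sketch below. It logs to a throwaway directory and plants one deliberately stale log file so the clean-up step has something to remove; in a real script LOGDIR would point at your actual log folder and the planted file would go.

```shell
#!/bin/bash
# Sketch: logging and log clean-up in one script.
# LOGDIR is a throwaway directory here; point it at your real log folder.
LOGDIR=$(mktemp -d)
MAX_DAYS_OLD=7
LOGFILE="$LOGDIR/$(date +%Y_%m_%d).log"

# Demo only: plant a stale log dated 1 January 2000.
touch -t 200001010000 "$LOGDIR/2000_01_01.log"

# Append start time to the log file.
date '+[START TIME: %H:%M]' >> "$LOGFILE"

# Do your thing here, appending output to the log.
echo "backup finished" >> "$LOGFILE"

# Delete .log files last modified more than MAX_DAYS_OLD days ago.
find "$LOGDIR" -type f -name '*.log' -mtime +"$MAX_DAYS_OLD" -delete

# Append end time to the log file.
date '+[END TIME: %H:%M]' >> "$LOGFILE"

cat "$LOGFILE"
```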
Fancier Automatic Backups
Want to get really fancy and have your home machine automatically log in to your server and download those backup files for safe, off-site keeping?
It's not too hard to do. The first step is to write the script. Create a new text file wherever you'd like and name it grab_backup.sh or something similar. Now copy and paste this code, adjusting the settings to match the location of the backup files we created in the last step.
#!/bin/bash
DATE=`date +%F`
FILE1=$DATE.tar.bz2
FILE2=$DATE.sql

# we'll connect with SCP to copy the files
scp username@example.com:path/to/$FILE1 ~/web/backup/folder
scp username@example.com:path/to/$FILE2 ~/web/backup/folder

# if you want to delete the backup files from the server just uncomment these lines:
#ssh username@example.com rm -f path/to/$FILE1
#ssh username@example.com rm -f path/to/$FILE2
We need to make this script executable, so go ahead and do that the same way we did above. Give it a test run from the command line. Once you enter your remote login password, scp should start downloading the files.
Note: if you get a message like scp: .: not a regular file, which is a fairly common error, make sure there aren't any spaces between the colon at the end of the login info and the file you're trying to copy.
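If you plan to uncomment the lines that delete the server-side copies, it may be worth checking that the downloaded archive is intact first. The sketch below is one way to do that, using the integrity test built into bzip2 and gzip. The function name verify_archive and the demo files are made up for this example.

```shell
#!/bin/bash
# verify_archive FILE
# Returns 0 if the compressed file passes its built-in integrity test.
verify_archive() {
    case "$1" in
        *.bz2) bzip2 -t "$1" 2>/dev/null ;;
        *.gz)  gzip -t "$1" 2>/dev/null ;;
        *)     echo "unknown archive type: $1" >&2; return 2 ;;
    esac
}

# Demo with throwaway files standing in for downloaded backups.
demo_dir=$(mktemp -d)
echo 'backup data' | gzip > "$demo_dir/good.sql.gz"
echo 'not really compressed' > "$demo_dir/bad.sql.gz"

for f in "$demo_dir/good.sql.gz" "$demo_dir/bad.sql.gz"; do
    if verify_archive "$f"; then
        echo "$(basename "$f"): ok, safe to remove the server copy"
    else
        echo "$(basename "$f"): FAILED, keep the server copy"
    fi
done
```

In grab_backup.sh you could call the function on the freshly downloaded .tar.bz2 file and only run the ssh rm line when it succeeds.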
What to do about the password
Astute readers are probably wondering how we're going to automate a script needing a password before it can do its job.
Well, both ways of handling it involve SSH public/private key authentication. Essentially we need to create a key pair and then add the public key to our remote server's list of authorized keys.
But it doesn't entirely sidestep the password problem, since SSH keys are normally protected by a passphrase of their own.
There are two ways around it: one is a really bad idea (but it works) and the other is a bit more complex (but much more secure).
Let's walk through the first method, creating a passwordless SSH key-pair, and then we'll talk about why it's a bad idea.
First create an SSH key like so:
ssh-keygen -t rsa
When prompted for a password just hit enter and leave it blank.
When ssh-keygen is done you should see a message like:
Your identification has been saved in /home/yourusername/.ssh/id_rsa.
Your public key has been saved in /home/yourusername/.ssh/id_rsa.pub.
We need to add the public key (id_rsa.pub) to our web server. You can either do it using FTP and cut and paste the info into ~/.ssh/authorized_keys - or since you are still in the shell, try this line (substituting your login info):
cat ~/.ssh/id_rsa.pub | ssh username@server.com 'cat >> .ssh/authorized_keys'
It will add the SSH key we just generated to your web server's list of authorized keys, which means you can now log in to your remote server from your home machine without needing to enter a password -- perfect for an automated script.
However, it's not so perfect from a security standpoint. The way things stand now, if an attacker gets hold of your home machine they have unlimited access to your remote machine as well. Your private key is compromised and there's no password to protect it.
We can limit potential damage somewhat by adding restrictions to what we can do with our remote login. Open up the file .ssh/authorized_keys on your web server and you should see something like:
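ssh-rsa AAAAB3Nza[..huge string of gibberish..] = user@localhost
Just before the ssh-rsa bit add this, followed by a space:
from="0.0.0.0",command="/home/user/path/to/backup_wrapper.sh"
Change the "from" IP address to the IP of your home network or computer. The command option names a program the server runs in place of whatever the client asked for, so it has to point to an executable rather than a folder. Here backup_wrapper.sh is a hypothetical script of your own that inspects $SSH_ORIGINAL_COMMAND and only allows copies out of the folder where our earlier cron script stores the backups. You can also drop the user@localhost comment at the end of the line; leave the key string itself untouched.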
There now. Do you feel safer? Good.
Actually, it's a little safer, but still not good enough for many. In any security scenario there is always a weakness. If you use a password, it's also the weakness. In our case the private SSH key is the weakness. If you're confident you can keep your private key secure then what we've done may satisfy you.
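Part of keeping the private key secure is making sure nobody else on your machine can read it. The sketch below shows one way to check that; key_is_private is a made-up helper name, and the demo runs against a throwaway file rather than your real ~/.ssh/id_rsa.

```shell
#!/bin/bash
# key_is_private FILE
# Succeeds only if group and others have no permissions on FILE.
key_is_private() {
    local perms
    # Characters 5-10 of the ls mode string are the group and other bits.
    perms=$(ls -ld "$1" | cut -c5-10)
    [ "$perms" = "------" ]
}

# Demo on a throwaway file; the real key would be ~/.ssh/id_rsa.
key_dir=$(mktemp -d)
demo_key="$key_dir/id_rsa"
echo 'not a real key' > "$demo_key"

chmod 644 "$demo_key"
key_is_private "$demo_key" || echo "too open, tightening permissions"

chmod 600 "$demo_key"
key_is_private "$demo_key" && echo "key file is private"
```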
If you think the whole set of instructions above is insane (and generally speaking it is), there is a far more secure option. The trick is to use ssh-agent, which is complex enough that it warranted its own tutorial. Don't worry, despite being complex in theory, ssh-agent actually isn't all that hard to implement. Have a read through our tutorial, Automate a Remote Login Using SSH-Agent, and once you're up and running, jump back over here and we'll hook up our script.
Finishing Touches
Now you should have a shell script set up and a way to log in to your remote server sans password (whether by the insecure method above or the ssh-agent method). The last step in our automation process is to create a cron job on our local machine.
Open up a shell window and repeat the steps above like so:
crontab -e
Now add this line:
0 2 * * * /path/to/grab_backup.sh > /path/to/log_files/grab_backup.out 2>&1
We now have a cron job on our local machine that will reach out to our remote server and grab a copy of all our backup files and dump them on our local machine. In our case the dump will happen at 2 AM, though you can adjust it to a good time for your setup. Just change the second number to whatever hour you desire. (24-hour time is used - 1pm is equal to 13.) Note: for this to work, obviously, your local machine needs to be running.
There we go. That pretty much covers it. You've now got two database and flat-file backups, one local, one remote. Now make sure you back up that local copy on another drive along with the rest of the files. And then back that drive up to one in a blast vault built to withstand a nuclear hit, and you should be able to keep your website running smoothly throughout the apocalypse.
- This page was last modified 18:46, 8 August 2008.
