Wednesday, November 17, 2010

Unix folder synchronization

OK... this is my first tech article. It is not a huge thing, but someone trying to do the same may find it useful, since it saves the time spent reading help pages and trying things out. I will try to make this a practical recipe.
As the title says, this article is about automatically synchronizing the contents of two Unix folders (usually on separate machines). This makes it well suited to periodic backups. The Unix utility used here is rsync.

Let's assume that the two folders are on two separate machines (the most likely case), machine A and machine B, and that the folder that needs to be backed up is on A. B is then the backup machine. The first step is to verify that the required utilities are installed on each machine. In this case we need rsync, ssh and expect on machine A, and rsync and ssh (including the ssh server, so that A can log in) on machine B. To verify, just type the command and see whether it is available. If it is not, install it with apt-get install <package-name> (on Debian or Ubuntu, usually run with sudo). On Ubuntu, the package name is suggested when you try to run a command that is not installed.
We'll implement this by running rsync as a daemon on machine B (it can run standalone or under the inet daemon, inetd) and periodically running an rsync command on machine A. This transfers any changes in the folder on A to the specified backup folder on B. The steps are as follows.
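Instead of typing each command by hand, the check can be scripted with the POSIX command -v builtin, which succeeds only when a command can be found on the PATH. The function name check_tools is hypothetical, made up for this sketch.

```shell
#!/bin/sh
# check_tools NAME...: report every command in the list that cannot be
# found on this machine; return non-zero if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

On machine A you would run check_tools rsync ssh expect, and on machine B check_tools rsync ssh; each missing tool is printed on standard error.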

Step 1

Create the following expect script on machine A.

#!/usr/bin/expect -f
# rsync may run for a long time; -1 disables expect's 10-second default timeout
set timeout -1
spawn rsync --rsh=ssh --compress --recursive --times --perms --links --delete <source_folder> user@<ip_of_B>:<backup_folder_in_B>
expect "*password:*"
send "<passwd>\r"
expect eof

Let's examine the purpose of each option. --rsh=ssh tells rsync to use ssh as its remote shell, so the data travels securely between the machines. In the destination user@<ip_of_B>:<backup_folder_in_B>, replace user with a user name on B (probably root) and <ip_of_B> with the IP address of B. --compress compresses data before the transfer, and --recursive backs up the contents of all subfolders recursively. --times preserves file modification times and --perms preserves permissions. --links copies symbolic links as links, and --delete tells rsync to delete files from the backup when they are deleted from the original folder. You can omit any of these options according to your requirements.
After that we use expect to enter the ssh password (the root password) automatically. Replace <passwd> with the actual password; \r sends it followed by Enter. The set timeout -1 line makes expect wait until rsync finishes, because with the default timeout expect eof would stop waiting after 10 seconds. Also log in once by hand with ssh user@<ip_of_B> before scheduling the script, so that B's host key is accepted; otherwise ssh asks a yes/no question that this script does not answer.
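Save the script and give it execute permission, for example with chmod 700 <script_name>, which also keeps other users from reading the password stored in it.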
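The options above can be tried out locally before involving machine B. The sketch below mirrors one temporary folder into another with the same options as Step 1 (minus --rsh=ssh and --compress, since no network is involved), deletes a file from the original and syncs again, so that --delete removes it from the copy. The function names sync_folders and demo_delete are made up for this illustration. Note the trailing / on the source: rsync then copies the contents of the source folder into the backup folder, whereas without the slash it creates a subfolder named after the source inside the backup folder.

```shell
#!/bin/sh
# Local illustration of the Step 1 options: both "machines" are
# temporary folders on the same host, so no ssh or machine B is needed.

# sync_folders SRC DST: mirror the contents of SRC into DST.
# The trailing "/" on SRC copies what is inside SRC instead of
# creating DST/<name of SRC>.
sync_folders() {
    rsync --recursive --times --perms --links --delete "$1/" "$2/"
}

# demo_delete: build a small tree, back it up, delete a file from the
# original, back it up again, then print the backup folder's path.
demo_delete() {
    src=$(mktemp -d)
    dst=$(mktemp -d)
    mkdir "$src/sub"
    echo one > "$src/a.txt"
    echo two > "$src/sub/b.txt"
    ln -s sub/b.txt "$src/link_to_b"   # copied as a link because of --links
    sync_folders "$src" "$dst"
    rm "$src/a.txt"                    # deleted from the original folder...
    sync_folders "$src" "$dst"         # ...so --delete removes it from the backup
    rm -rf "$src"
    echo "$dst"
}
```

After d=$(demo_delete), the backup folder $d still contains sub/b.txt and the link link_to_b, while a.txt is gone.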

Step 2

In this step we schedule the script as a cron job. To do this, type crontab -e and add a line that executes the script. For example, I wanted to run the synchronization script hourly, so I entered:
1 * * * * expect ~/p/scripts/backup_files
backup_files is my script name. The five fields 1 * * * * mean minute 1 of every hour, every day.
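As an alternative to editing the crontab interactively with crontab -e, the line can be added from a script. The sketch below is a hypothetical helper, add_cron_line, that reads an existing crontab on standard input and prints it with the new line appended, skipping the line if an identical one is already present, so running it twice does not schedule the backup twice.

```shell
#!/bin/sh
# add_cron_line LINE: copy the crontab read on stdin to stdout and
# append LINE, unless an identical line is already there.
add_cron_line() {
    existing=$(cat)
    if [ -n "$existing" ]; then
        printf '%s\n' "$existing"
    fi
    # -F: fixed string, -x: whole-line match, -q: no output
    if ! printf '%s\n' "$existing" | grep -Fqx -- "$1"; then
        printf '%s\n' "$1"
    fi
}
```

A possible use is crontab -l 2>/dev/null | add_cron_line '1 * * * * expect ~/p/scripts/backup_files' | crontab - (crontab - reads the new table from standard input on common Linux cron implementations). The single quotes stop your shell from expanding * and ~ before the line reaches cron.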

Step 3

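Now we need to run rsync as a daemon on machine B. To do this, type rsync --daemon; the daemon reads its settings, such as the modules (exported folders) it serves, from /etc/rsyncd.conf. Note that a command like the one in Step 1, which uses --rsh=ssh and a single colon before the destination folder, does not actually talk to this daemon: rsync logs in to B through ssh and starts its own copy there. The daemon is only used when the destination is written with a double colon, as in <ip_of_B>::<module>. To start the daemon at boot as well, create a shell script containing the rsync --daemon command, put it in the /etc/init.d folder on machine B and give it execute permission. Then type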
update-rc.d <script_name> defaults
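Because rsync --daemon reads /etc/rsyncd.conf, B also needs a configuration file that exports the backup folder as a module. The sketch below writes a minimal one. The helper name write_rsyncd_conf, its arguments and the choice of uid = root and gid = root are assumptions for illustration; the parameter names (pid file, uid, gid, use chroot, path, comment, read only, hosts allow) are standard rsyncd.conf settings. Since a writable module is open to anyone who can reach the daemon, the sketch limits access to one host with hosts allow.

```shell
#!/bin/sh
# Hypothetical helper (not from the original article): write a minimal
# rsyncd.conf that exports one folder as a writable module.
# Usage: write_rsyncd_conf <conf_file> <module_name> <backup_folder_in_B> <ip_of_A>
write_rsyncd_conf() {
    conf_file=$1
    module=$2
    backup_dir=$3
    allowed_host=$4
    # Unquoted EOF so that the variables below are expanded.
    cat > "$conf_file" <<EOF
# Minimal rsync daemon configuration (sketch; adjust before use)
pid file = /var/run/rsyncd.pid
# Assumption: the daemon writes files as root on B
uid = root
gid = root
use chroot = yes

[$module]
    path = $backup_dir
    comment = backups received from machine A
    read only = no
    hosts allow = $allowed_host
EOF
}
```

For example, running write_rsyncd_conf /etc/rsyncd.conf backup <backup_folder_in_B> <ip_of_A> as root on B and then rsync --daemon would make the folder reachable as <ip_of_B>::backup.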
