Wednesday, November 17, 2010

Unix folder synchronization

Ok... This is my first tech article. It is not a big thing, but someone trying to do the same may find it useful, since it can save the time otherwise spent reading help pages and trying things out. I will try to make this a practical recipe.
As the title says, this article is about automatically synchronizing the contents of two Unix folders (possibly on separate machines). This makes it well suited to periodic backups. The Unix utility used here is rsync.

Let's assume that the two folders are on two separate machines (the most likely case), machine A and machine B, and that the folder to be backed up is on A. B is then the backup machine. The first step is to verify that the required utilities are installed on each machine: rsync, ssh and expect on machine A, and rsync and ssh on machine B (with the ssh server running, since A logs in to it). To check, just type the command and see whether it runs. If it does not, install the utility with apt-get install <package-name>. The package name is suggested by the OS when you try to run a command that is not available.
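To check several tools at once instead of typing each command, a small shell function like the one below can be used. It is a sketch assuming a POSIX sh: check_tools is a hypothetical name, and it relies only on the shell's built-in command -v to test whether a command is on the PATH.

```shell
#!/bin/sh
# Report which of the given commands are not on the PATH.
# Prints one "missing: <name>" line per absent tool and returns 1 if any is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    # command -v succeeds only if the shell can find the command
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

On machine A, check_tools rsync ssh expect prints a line for each tool that still needs installing; on machine B, check_tools rsync ssh does the same.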
We will implement this by running rsync as a daemon (inside the inet daemon) on machine B and periodically running an rsync command on machine A. This transfers any changes in the folder on machine A to the specified backup folder on machine B. The steps are as follows.

Step1

Create the following expect script on machine A.

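#!/usr/bin/expect -f
# Wait as long as the transfer takes; the default 10-second timeout could end the script before rsync finishes
set timeout -1
# Push <source_folder> to the backup folder on B over ssh
spawn rsync --rsh=ssh --compress --recursive --times --perms --links --delete <source_folder> user@<ip_of_B>:<backup_folder_in_B>
# Answer the ssh password prompt
expect "*password:*"
send "<passwd>\r"
# Wait for rsync to finish
expect eof

Let's examine the purpose of each option. --rsh=ssh tells rsync to use ssh as the remote shell, so data is transferred securely between the machines. In the destination user@<ip_of_B>:<backup_folder_in_B>, replace user with a user name on B (probably root) and <ip_of_B> with the IP address of B. --compress compresses data before transfer, and --recursive backs up the contents of all subfolders recursively. --times preserves file modification times and --perms preserves permissions. --links copies symbolic links as links, and --delete tells rsync to delete files from the backup when they are deleted from the original folder. You can omit any of these options according to your requirements. Note that a trailing slash on <source_folder> matters: without it, rsync creates the folder itself inside <backup_folder_in_B>; with it, only the folder's contents are copied.
The script then uses expect to enter the ssh password (the root password) automatically, and set timeout -1 keeps expect waiting until rsync has finished. Replace <passwd> with the actual password.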
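Save the script and give it execute permission with chmod. Since it contains the password in plain text, chmod 700 is a good choice: it makes the script executable while keeping it readable only by its owner.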
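Before wrapping the command in expect, it can help to run it by hand once (ssh will then ask for the password itself). The function below is a minimal sketch of the same rsync call in plain shell, assuming a POSIX sh. run_backup and the RSYNC variable are hypothetical conveniences, not part of rsync: setting RSYNC=echo prints the command instead of running it.

```shell
#!/bin/sh
# Push a local folder to a backup folder on another machine over ssh,
# using the same rsync options as the expect script above.
# RSYNC is a hypothetical override: RSYNC=echo prints the command instead of running it.
run_backup() {
  src=$1    # folder to back up on machine A
  user=$2   # login user on machine B
  host=$3   # IP address of machine B
  dest=$4   # backup folder on machine B
  "${RSYNC:-rsync}" --rsh=ssh --compress --recursive --times --perms --links --delete \
    "$src" "$user@$host:$dest"
}
```

For example, with a hypothetical folder and address, RSYNC=echo run_backup /home/me/data root 192.168.1.20 /backup shows the exact command, and dropping RSYNC=echo runs it. On a first run it is also worth adding rsync's own --dry-run option to the command line, since it reports what would be copied or deleted without changing anything on B.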

Step 2

In this step we schedule the script as a cron job. Type crontab -e and add a line that runs the script. For example, I wanted to run the synchronization script hourly, so I entered
1 * * * * expect ~/p/scripts/backup_files
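where backup_files is my script name. The five fields before the command are minute, hour, day of month, month and day of week, so this entry runs at one minute past every hour.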
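If you prefer not to edit the crontab interactively, the entry can also be appended from a script. The sketch below assumes POSIX sh and awk; add_cron_line is a hypothetical helper that copies the existing crontab from standard input and appends the entry only if an identical line is not already there, so running it twice does not schedule the backup twice.

```shell
#!/bin/sh
# Copy crontab lines from stdin to stdout and append the given entry once.
# add_cron_line is a hypothetical helper; it only filters text and never calls crontab itself.
add_cron_line() {
  awk -v line="$1" '
    { print }                      # keep every existing entry
    $0 == line { found = 1 }       # remember if the entry already exists
    END { if (!found) print line } # add the entry only if it was missing
  '
}
```

It can be combined with crontab like this (the 2>/dev/null hides the error crontab -l prints when no crontab exists yet):
crontab -l 2>/dev/null | add_cron_line '1 * * * * expect ~/p/scripts/backup_files' | crontab -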

Step 3

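Now we need to run rsync as a daemon on machine B. To do this, run rsync --daemon. (Strictly speaking, with the single-colon <ip_of_B>:<backup_folder_in_B> destination used in Step 1, rsync starts its own rsync process on B over the ssh connection; the daemon is only contacted when a destination uses the host::module or rsync:// form.) To also start the daemon at boot, create a shell script containing this command, put it in the /etc/init.d folder on machine B and give it execute permission. Then run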
update-rc.d <script_name> defaults
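The init script itself can be very small. The function below is a sketch that writes one, assuming a Debian-style system where update-rc.d reads the LSB header block at the top of the script; the script name rsync-daemon and the helper write_rsync_init_script are hypothetical. It uses a quoted here-document so that $1 and $0 are written literally into the generated script.

```shell
#!/bin/sh
# Write a minimal init script that starts rsync in daemon mode, and make it executable.
# write_rsync_init_script and the name rsync-daemon are hypothetical; pass /etc/init.d as the
# target directory on machine B (as root), or a scratch directory to inspect the result first.
write_rsync_init_script() {
  target="$1/rsync-daemon"
  cat > "$target" <<'EOF'
#!/bin/sh
### BEGIN INIT INFO
# Provides:          rsync-daemon
# Required-Start:    $network $remote_fs
# Required-Stop:     $network $remote_fs
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Start rsync in daemon mode
### END INIT INFO
case "$1" in
  start) rsync --daemon ;;
  # No pid file is configured, so stop the daemon by matching its command line
  stop)  pkill -f 'rsync --daemon' ;;
  *)     echo "Usage: $0 {start|stop}" >&2; exit 1 ;;
esac
EOF
  chmod 755 "$target"
}
```

On machine B, running write_rsync_init_script /etc/init.d as root and then update-rc.d rsync-daemon defaults registers the script, and /etc/init.d/rsync-daemon start starts the daemon without a reboot.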

About Me...

I am a software engineer living in Sri Lanka, a beautiful island in South Asia. My full name is Parana Widanaralalage Dileepa Chiranthana Jayathilake, just to show how descriptive Sri Lankans are in using symbols. I was born in a beautiful small village on the southern coast of the island, where I spent the first ten years of my life. I had my primary education at Devananda Vidyalaya in Ambalangoda. Then I moved to Colombo, the economic and political center of the country, for my secondary education, and attended Nalanda Vidyalaya. I did well in the GCE Advanced Level examination (which may be the most competitive examination in the world: only the best 2% are selected for government-funded university education), placing 5th in the island. That opened the path to the best engineering academy in Sri Lanka, the University of Moratuwa. After graduating as a computer engineer I entered the software industry, where I have been working for the past 7 years.
As a student, math was my first love, and you know, the first love is special. Though I do not get much math in my current professional life, I still have a soft spot for it. I am a bit worried that present-day software engineering has drifted too far from computer science, where math plays a major role. I love low-level programming, algorithm tuning and formal software design, where I can at least do something like math. I am also very interested in computer languages and natural languages. I have done some work in these areas, which I will explain in this blog later on.
In my spare time I love going to the gym, dancing and swimming, and I do my best to keep them up regularly. I like to have fun with friends occasionally (well, this word is subjective) and sometimes find myself going to extremes on those occasions. Apart from that, I enjoy reading short stories, novels and poems, and watching stage plays, operas and movies. My favourite author is Franz Kafka, and my favourite local author is Ajith Thilakasena. I also enjoy reading Nishshanka Wijemanna, Samara Wijesinghe and many others. I have a taste for movies by directors like David Lynch, David Cronenberg and Krzysztof Kieslowski. The best ones I have watched (well, there are quite a lot) are Mulholland Drive, Lost Highway, M. Butterfly, Crash and Blue.