Monday, March 28, 2011

If you don't swim, you miss!!


I'm trying to write an inspirational story. It is based on a recent experience of mine. I stole my theme from Osho. The heading of one of his beautiful articles was "If you swim, you miss". I twisted the title to match my topic.

I have been trying to learn swimming for at least the past seven years. I made several attempts from time to time. Each time I started with much enthusiasm and tried hard to learn the basics during the first few days. However, I hardly made any progress beyond floating in the water and drifting a couple of meters for as long as I could hold one breath. I thought that swimming was not my thing and that there was something terribly wrong with my body which hindered me from becoming a swimmer. This thought led me to give up every time.

However, many incidents compelled me to give learning it another try. Whenever I got into a swimming pool, maybe after a poolside party or at a hotel during a trip, I had to keep myself in the shallow end, standing in the pool like a kid, which was utterly embarrassing. Last year I bought a new swimming pool membership and thought to myself, "This time you are either going to learn swimming or drown in the pool and die". This time I made a change: I got the help of a professional trainer.

I started out in the same desperate condition. However, my coach gave me the most important piece of advice for learning to swim: "Don't expect progressive results and never give up". This meant that I should not be discouraged if I did not see any progress within the first days of training. I asked him how long I should keep trying to become a swimmer with the basic skills. He said "about 30 days". At that point I realized why I had not been successful in my previous attempts: in none of them had I tried for anywhere close to 30 days.

Having received the right advice, I started training with great willpower. I did not care whether people laughed at me or whether I looked ridiculous making a lot of noise and drinking pool water. I believed there had to be light at the end of the tunnel. Even on the twentieth day of my training I did not perform much better than in my first day's workout. However, my coach was right!! When it came close to thirty days... suddenly and totally unexpectedly... I could swim a good length while performing all the basics well. I could not believe it for a moment. It was one of my happiest days that year. After all those years of struggling, I had become a swimmer.

It is interesting to note that many changes in the human body take place in leaps, not ramps. Many people misread this behavior as unresponsiveness. A medical book I recently read, "The Secrets of Miracle Doctors", suggests that this is true of many aspects of health. What you need to do is keep making small quantitative changes even if you don't see any response. Those small changes add up and trigger a significant qualitative change somewhere down the line. It might be an amazing coincidence that Marx and Engels said that "Continuous quantitative changes lead to sudden qualitative changes in the society". This might be what keeps courageous socialist leaders agitating even when they get almost no response from society.

Learning to swim inspired me to try out the same strategy in other things too. Even if none of them works out, I'm a happy regular swimmer now.

Saturday, March 19, 2011

Automatic log file analysis


keywords: log data extraction, record expert knowledge, mind maps, expert systems, Application Verifier

I'm currently engaged in research on automatic log file analysis. I came across this idea during my MSc research on software quality verification. When it comes to black box testing, there are many handy tools that analyse a certain aspect of an application, such as CPU utilization, memory consumption, IO efficiency or low-level API call failures. One prominent problem is the expertise required to use these tools; even for experts, the process takes a lot of time.

For example, I have been using a free Microsoft tool called Application Verifier, which keeps an eye on an application's virtual memory errors, heap errors, access failures due to improper access rights, incorrect usage of locks (which may result in hangs or crashes), exceptions, corrupt Windows handles, etc. It is a very useful tool for capturing application errors that are impossible or extremely difficult to identify in a manual QA process. Even with experience, it takes me about two days to test a product with this tool before a release. Given the hectic schedules close to a release, what happens more often than not is that I do not get a chance to do this test.

Another problem is that there is no good way to record my analysis knowledge so that someone else, or "something" else, can perform the analysis if I'm busy with other work. Sequential text, the popular form of recording knowledge, is not a good option in this case for several reasons. First, it is difficult to write documents in sequential text form (I think most developers agree with me on this). Second, it is difficult for someone else to understand such a document, owing to the inherently ambiguous nature of natural language. Furthermore, a program (this is the "something" I was referring to) cannot understand it at all, so it cannot perform an automated analysis.

Almost all the analysis tools out there generate some form of a log file. The big majority of them are text files, either XML or flat text. If we can come up with a mechanism to extract the information from these log files, then the analysis procedure can be partly automated. The challenge is to devise a scheme that can deal with the wide variety of proprietary structures of these log files. Though there are a bunch of tools available for log data extraction, all of them are bound to a specific log file structure. All the log analysis tools I found are web log analyzers: they analyze the logs generated by either the Apache web server or IIS, and cannot be used to analyze any other log file. An additional restriction is that the reports generated after the analysis are predefined; one cannot craft customized reports for a specific need.
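To make this concrete, here is a minimal Python sketch (not part of my research; the log layout, field names and pattern are invented for illustration) of the kind of one-off extraction script that is tied to a single log structure and would have to be rewritten for every new format:

```python
import re
from collections import Counter

# Hypothetical flat-text log lines such as:
#   2011-03-19 10:42:13 ERROR HeapCorruption at 0x0045AF3C
# The pattern below is tied to this one layout; a differently
# structured log would need a completely different script.
LINE_PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>\w+) (?P<message>.*)"
)

def extract(path):
    """Yield one dict per log line that matches the expected layout."""
    with open(path) as log:
        for line in log:
            match = LINE_PATTERN.match(line)
            if match:
                yield match.groupdict()

def summarize(path):
    """Count entries per severity level - a trivial 'analysis'."""
    return dict(Counter(record["level"] for record in extract(path)))

if __name__ == "__main__":
    print(summarize("application.log"))
```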

There's one more dimension that highlights the importance of automated log file analysis. The majority of software products themselves generate log files, and these logs are analyzed by product experts during troubleshooting. Each product has its own log file format, and the knowledge required for reading the logs and drawing conclusions lies only within a limited group of product experts. As a product matures, it is highly likely that some troubleshooting patterns emerge over time. However, there is no means of recording the knowledge about these recurring patterns for later use by the same expert, by others, or by an automated program.

The tasks associated with log file analysis are information extraction, inference, report generation and expert knowledge recording. What I'm working on is a unified mechanism to automate all these tasks. I'm trying to do it with a new, simple scripting language based on mind maps; a rough sketch of how the tasks fit together appears after the list below. I will write more about the solution in the future as my research progresses. Please keep me posted (dilj220@gmail.com) about:

  • Any automated log analysis tool known to you
  • Any other reason or scenario that comes to your mind for automated log file analysis
  • The features that you expect as a developer / QA engineer / product expert / manager from an automatic log file analysis tool
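As a rough illustration of how those four tasks fit together, here is a small Python sketch. It is not the mind-map-based language itself, and the patterns and conclusions in it are invented; the point is only that the expert knowledge is recorded as data (the rule table) so that a program, not just a person, can apply it:

```python
import re

# Expert knowledge recorded as data rather than prose: each rule names a
# pattern to look for and the conclusion to draw when it is found.
# Both the patterns and the conclusions are invented for illustration.
RULES = [
    {"pattern": r"heap corruption", "conclusion": "Possible memory overwrite; run a heap checker."},
    {"pattern": r"access denied",   "conclusion": "Check the access rights of the service account."},
    {"pattern": r"wait timed out",  "conclusion": "Possible deadlock or lock contention."},
]

def extract_lines(path):
    """Information extraction: read raw lines from a flat-text log."""
    with open(path) as log:
        return log.readlines()

def infer(lines):
    """Inference: apply the recorded rules to the extracted lines."""
    findings = []
    for number, line in enumerate(lines, start=1):
        for rule in RULES:
            if re.search(rule["pattern"], line, re.IGNORECASE):
                findings.append((number, rule["conclusion"]))
    return findings

def report(findings):
    """Report generation: turn the findings into human-readable output."""
    if not findings:
        return "No known problem patterns found."
    return "\n".join(f"line {number}: {conclusion}" for number, conclusion in findings)

if __name__ == "__main__":
    print(report(infer(extract_lines("product.log"))))
```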

Friday, March 4, 2011

Software Quality Verifier Framework


I completed my MSc thesis a couple of weeks back. My project was the development of a Software Quality Verification Framework. Given the value of early bug detection in the software life cycle, the framework addresses both white box and black box testing.

White box testing
White box testing is implemented in two phases.
1. Commit-time analysis - The code is automatically analyzed when a developer tries to commit new code or code changes to the repository. Quality verification is performed by running tools against a predefined set of rules, and the commit is rejected if the code does not conform to them. The developer is informed of the reasons for rejection in the svn client interface. This functionality is implemented using svn hooks; a minimal hook sketch is shown after item 2 below.

2. Offline analysis - A more thorough analysis is performed in an offline fashion, in the context of a nightly build, for example. Results of the analysis are displayed in a dashboard which shows various analytics and provides violation drill-downs to code level. Automatic emails can be configured to inform various stakeholders about the overall health of the system and developer technical debt. This is implemented using a tool named Sonar (http://www.sonarsource.org/).
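For item 1, the post does not reproduce the actual hook, so the following is only a minimal sketch of how such a pre-commit hook could look in Python. The .java filter and the two toy rules (unresolved TODOs, tab indentation) are placeholders for the real rule set; svnlook is the standard Subversion tool for inspecting the pending transaction, and whatever the hook writes to stderr on rejection is what the developer sees in the svn client.

```python
#!/usr/bin/env python
# pre-commit hook sketch: Subversion passes REPOS and TXN as arguments.
import subprocess
import sys

def svnlook(subcommand, repos, txn, *args):
    """Run svnlook against the pending transaction and return its output."""
    command = ["svnlook", subcommand, "-t", txn, repos] + list(args)
    return subprocess.check_output(command).decode("utf-8", "replace")

def violations(content):
    """Stand-in quality rules: flag TODO markers and tab indentation."""
    problems = []
    for number, line in enumerate(content.splitlines(), start=1):
        if "TODO" in line:
            problems.append(f"line {number}: unresolved TODO")
        if line.startswith("\t"):
            problems.append(f"line {number}: tab indentation")
    return problems

def main(repos, txn):
    failed = []
    for entry in svnlook("changed", repos, txn).splitlines():
        status, path = entry[:4].strip(), entry[4:].strip()
        if status in ("A", "U") and path.endswith(".java"):
            for problem in violations(svnlook("cat", repos, txn, path)):
                failed.append(f"{path}: {problem}")
    if failed:
        # Anything written to stderr is shown to the developer when
        # the commit is rejected.
        sys.stderr.write("Commit rejected by quality check:\n")
        sys.stderr.write("\n".join(failed) + "\n")
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

In the real framework the checks would of course be delegated to proper analysis tools rather than hand-written string tests; the sketch only shows where the rejection happens.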

Black box testing
It was identified during the research that quite a number of tools have recently become available for evaluating a software product during its run, without looking at the source code from which it was built. Different tools evaluate a product on different aspects such as memory usage (corruptions, leaks), IO usage, operating system calls, performance, access right violations, etc. However, there is no tool that combines the results generated by these individual tools into an automatically generated product health profile, the way Sonar does with the white box testing tools. There are two main problems with using individual tools manually to perform tests.
1. Tool usage requires expertise and is laborious.
2. There is no way to record or automate a troubleshooting (or evaluation) procedure once it has been identified.

I thought about different solutions to this. Noting that almost all the tools generate textual output in the form of a log file, I decided to implement a way to automatically extract the information of interest in a given context from those log files and generate reports for consumption by various parties such as project managers, developers and technical leads. The result was a simple scripting language based on mind maps. Developers can write scripts in this language to extract information from various log files, derive conclusions based on them and generate reports.
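The scripting language itself is not shown in this post, so the following is only a rough Python sketch, with invented log file names and match patterns, of the kind of aggregation such scripts are meant to automate: pulling a handful of findings out of each tool's log and rendering them as a single product health report.

```python
import re

def count_matches(path, pattern):
    """Count lines in one tool's log file that match a pattern."""
    with open(path) as log:
        return sum(1 for line in log if re.search(pattern, line, re.IGNORECASE))

def health_profile():
    """Combine findings from several (hypothetical) tool logs into one profile."""
    return {
        "memory errors":       count_matches("memcheck.log", r"memory (leak|corruption)"),
        "handle errors":       count_matches("verifier.log", r"invalid handle"),
        "lock order warnings": count_matches("lockcheck.log", r"lock order"),
    }

def render(profile, audience="manager"):
    """Very small report generator with audience-dependent detail."""
    lines = [f"{metric}: {value}" for metric, value in sorted(profile.items())]
    if audience == "manager":
        lines.append(f"total findings: {sum(profile.values())}")
    return "\n".join(lines)

if __name__ == "__main__":
    print(render(health_profile()))
```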

Following is the architecture of the framework. I will blog more about the framework later.