
Sunday, October 23, 2011

The toilet model of life


One thing that fascinates me in the modern civilized world is the use of commode toilets. They are radically different from the means people used in olden days. What happens when a person wants to use a commode toilet? He has a need, sits on the toilet seat, does the thing and presses a button (or pushes a handle)...and the stuff is gone...it disappears into another reality, after which he is NOT RESPONSIBLE for it. Nobody can point at something and say, "Hey, look, this is your stuff". However, the case with earlier practices was very different. If we take, for example, the practice of going into the jungle, "it" stays in the same reality after he does it. He sees it, smells it, and may even step on it the next time he walks into the jungle.

When observing the changes happening in today's world with respect to politics, education, technology and human relationships, I have been smelling a paradigm shift. A lot of things are changing their shape radically. The change in technology is the most prominent. If you ask an analyst he will load you with a lot of facts about the new trends and how you should be dealing with them. However, these are mere facts. A logical man needs something more than facts. He needs a concrete philosophy through which the new changes can be understood, described and predicted. I have been searching for this philosophy for quite some time, and finally, I discovered it...not in books...not even on the web...but in the toilet. I call this new philosophy "The Toilet Model of Life".

The tiny toy software
Let's take a few examples from modern technology first. In the olden days software came as big packages. The installation was heavy and users were supposed to go through a significant learning curve before starting to use them. In order to make a certain window appear, a user had to go to the Options menu, select the "Advanced" screen, navigate to the "Advanced" tab in it (which some people call the super-duper advanced screen) and check a configuration. New versions came every 2-3 years and had a whole bunch of new features added. Users of these software packages were specialists who knew, through years of experience, the recipe needed to get something useful done with them. They were very much dependent on the few software products they were working with and were emotionally attached to them. Software had to be used with great RESPONSIBILITY and discipline. A long uninstallation procedure had to be followed when the user no longer needed the software. Even after the uninstallation, software usually left a lot of traces behind in the system. Briefly said, the software was at the center and the users were at the periphery.

However, what we see today is software with very short release cycles (as short as half an hour) that shape the product gradually according to fast-changing market needs. Since the world wide web has opened up a super fast channel for software delivery, users no longer need to wait months or years for a particular software product to show up. There are hundreds of software products available to serve a certain need. Therefore the user has taken the center and the software has been pushed to the periphery. A typical user consumes at least a few dozen software products on a daily basis. Software vendors, therefore, cannot assume an expert user. Their product may be just the 'costume-of-the-day' for the user. In order to tackle these new conditions, the features need to be provided at the user's fingertips. Any configuration setting should be displayed only at the appropriate instance, and the behavior of the software should adapt to the user. Installation and uninstallation should be quick, light and clean.

Smartphone apps are typical examples of this new domain of software. They are light in size and easily installed. They come with only a handful of features that are readily available through simple gestures. Removing them is just a matter of a few taps. Even better, they do not leave any trace behind. Anyone with a smartphone can download an app, use it for a while and flush it from the device. No responsibility is left behind; just like with the commode toilet.

Things get even more aligned with the toilet mechanism when it comes to the cloud. What the cloud means is that users no longer need to keep their heavy data on their devices. They are simply not responsible for how and where it is kept, backed up and so on. It is readily available for them to CONSUME, DISPOSE AND FORGET. The same toilet theory applies when you are the software vendor who delivers cloud-based products. Not only do you not need to take on the responsibility of managing the data, you also do not have to carry the weight of a subscription to a data hosting service. You pay only when you consume. CONSUME-DISPOSE-FORGET is in action even in the software vendors' domain.

No more experts
In the past, not only the software users were experts, but also the software producers. They were specialists in certain technologies: a C++ guy, a Java guy and so on. They were emotionally attached to their technology and were ready to go to war to protect it. However, with the current rate of technology mutations, one can hardly become an expert in a technology. By the time somebody acquires expertise in a technology, it becomes obsolete. Therefore the viable model for software producers is to be open-minded and flexible enough to move between technologies rapidly. Quick learning ability, adaptability and flexibility are becoming the defining qualities of successful professionals. The cadres in this new workforce will not be emotionally attached to any piece of technology like their predecessors were. The new model for the producer is LOAD-PRODUCE-UNLOAD-FORGET.

Frienditutes
Showing the toilet model in action in the field of modern human relationships is a trivial task. I will take a simple example. Guys usually like to have a lot of good-looking female friends. And there are times when a guy really needs to show this to others. However, like any other valuable thing in the world, this doesn't come for free. To have good-looking female friends a guy has to pay a certain price, because females expect a lot from their male friends. He has to buy them expensive gifts, be a driver at times, keep spending hours jabbering girly crap with them, etc. Dilbert creator Scott Adams suggested a modern solution for this, based on the fact that most friend stuff now happens on Facebook. The concept is named the "frienditute": either a good-looking female on Facebook or anyone smart enough to appear like one on Facebook. If you are a guy, all you need to do is hire a frienditute for the period when you need to show that you have gorgeous girl friends. They will comment on your messages, write on your FB wall, say nice things about your pictures and so on. They will pretend to be good friends of yours during the hired period. No long-term costs in keeping relationships with beautiful girls...a simple analogy for the CONSUMABLE-DISPOSABLE-FORGETTABLE human relationships which are becoming the norm in the modern world.

China - the rising sun
If there is a paradigm shift happening in the world affecting every facet of human life, shouldn't it be triggered and backed by a political body? It should and, of course, it is. To identify it we only need to answer an easy question. Which political regime will dominate the world both politically and economically in a few decades? Undoubtedly, it's the Chinese regime...and Chinese politics is based on their most popular philosophy: Taoism. What does Taoism say? It asks us to 'live in this moment'; not the moment before, not the moment after. Taoism asks you to do whatever you are doing right now to your fullest potential...and then forget it...do not get emotionally attached to it...do not be responsible, because by that time what you are currently engaged in will be the past, which is insignificant. This is exactly what the Chinese are doing all around the world and that is why they are so successful. Take a Chinese product and you will see this philosophy in it. It comes at a very low price with almost all the features known for that kind of product. However, it is not attached to a big trade name and you should not talk about the durability of the product. Simple enough, isn't it? Now compare this with the Western world based on Christianity, which says "God is watching everything you do and you are responsible for what you do". This Western dominance is now becoming history at the speed of light. China is the rising sun.

Let's summarize. When you are the consumer: CONSUME-DISPOSE-FORGET. When you are the producer: LOAD-PRODUCE-UNLOAD-FORGET. As a final remark I ask my readers to be ready for this new world, both emotionally and intellectually. Do not be surprised if the friend who warmly shares your feelings today behaves like a complete stranger tomorrow.

Saturday, August 27, 2011

Should QA learn programming concepts?



Recently one of the QA engineers at Eurocenter started an interesting discussion on the skills required from a current-day QA engineer. The discussion was opened in the LinkedIn group of Eurocenter which we call "InsideOut". It is an open group and you can join from: http://alturl.com/qtron
One of the questions raised was whether QA engineers should learn programming concepts in order to dig into the code written by developers to review its quality. Though this idea was favored by some other colleagues, my opinion was that this is not required and that there are far more valuable areas a modern QA engineer can look into for improving the quality of a product. Following is my elaboration on this, copied directly from my comment in the group.


It is agreeable that the industry demands more from QA than conventional testing. The major reason behind this demand is that many quality aspects required from a software product are not covered by conventional testing, which includes functional testing and traditional non-functional testing like load testing. In the native world, for example, memory corruptions might not surface during normal testing. However, they can cause disasters in a production environment. This creates the need for a QA engineer to have more insight into the operation of a software application than what he observes in the application interface. There are two main avenues for achieving this.
1. Code level analysis
2. Operation level analysis
The former can be called white box testing and the latter, black box testing. The important thing to note here is that peeping into the code or writing supplementary code is not the only way of getting more insight into an application.

Operation level analysis deals with analyzing the operation of a software product in the context of a system (an operating system, for example), where the software is evaluated with respect to the changes it causes on the system, how it is affected by changes in the system and how it performs under various conditions in the system. The good thing about this domain is that many sophisticated tools for doing this have started appearing with the latest versions of operating systems. The Microsoft Application Compatibility Toolkit, for example, can monitor the operation of a software product and provide a detailed analysis of security problems, user access control problems, memory problems, etc. Another good example is Microsoft Application Verifier, which can detect memory corruptions, memory leaks, low level operating system call failures, I/O overhead, resource allocation problems, improper locks, unsafe parallel access of data, dangerous API calls, etc. This is vital information that helps in deciding the quality of software without looking at a single line of code. Having been a native developer for a few years, I still cannot detect most problems revealed by Application Verifier by examining code. There are a bunch of other useful tools of this kind that are bundled with the operating system itself. These tools are little known and are not given the attention they deserve. Even when they are used, it is done by developers. However, I think QA engineers are the best people to use them to evaluate software.

Even if we think about tools like Sonar, they provide a lot of metrics that manifest the quality of software without going to the code level. Once we go into the code level there are a whole lot of peculiarities like design patterns, compiler optimizations, API tweaks, hacks, etc. Since the software product is more important than the code, I think it is more productive to analyze the quality of the software itself utilizing modern tools. Having said that, I repeat that it's vital for a QA engineer to have the knowledge to automate things using simple programming.
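For instance, a simple script is often enough to turn a tool's exported log into a pass/fail signal. The following Python sketch scans an exported log for suspicious keywords and summarizes them; the file name and the keyword list are hypothetical and would have to be adjusted to the actual output of the tool you use.

# Minimal sketch: scan an exported tool log for failure keywords and summarize.
# The file name and keyword list are hypothetical; adjust them to the real tool output.
import sys
from collections import Counter

KEYWORDS = ["corruption", "leak", "access violation", "invalid handle", "deadlock"]

def summarize(log_path):
    counts = Counter()
    with open(log_path, errors="ignore") as log:
        for line in log:
            lowered = line.lower()
            for keyword in KEYWORDS:
                if keyword in lowered:
                    counts[keyword] += 1
    return counts

if __name__ == "__main__":
    counts = summarize(sys.argv[1] if len(sys.argv) > 1 else "tool_export.log")
    for keyword, count in counts.items():
        print(f"{keyword}: {count} occurrence(s)")
    sys.exit(1 if counts else 0)   # non-zero exit fails the check when suspicious entries are found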

Wednesday, July 6, 2011

Speech in ICSCA 2011

I presented a paper at the "2011 International Conference on Software and Computer Applications". Following is my speech along with the corresponding slides.




Good Afternoon! I'm Dileepa from the University of Moratuwa, Sri Lanka, and I'm going to talk about a framework I developed for automated log file analysis.




First, I'll explain the background and then the problem identification. After that, I'll talk about the overview of the solution, which is the new framework, and then its design and implementation. This section will include an experiment I did as a proof of concept. Finally I will conclude the work.


Software log files are analyzed for many reasons by different professionals. Testers use them to check the conformance of a software product to a given functionality. For example, in a system where messages are passed between different processes, a QA engineer can perform a certain action and then check the log to see whether the correct messages are generated. Developers analyze logs mainly for troubleshooting. When something goes wrong at production sites, or even when a bug is reported by an outsourced QA firm, the application log file is most of the time the most useful resource available to the developer for troubleshooting. Domain experts also use logs sometimes for troubleshooting, and system admins monitor logs to confirm that everything is working fine at the overall system level.



Now we see that it's always a human user who analyzes a log file in a given scenario. However, with the increasing complexity of software systems and the demands for high-speed, high-volume operations, this completely manual process has become a near impossibility. First, one needs an expert for log file analysis, which incurs a cost, and even with expertise it's a labor-intensive task. More often than not, log file analysis is a repetitive and boring task, resulting in human errors. It's highly likely that when analyzing a certain log for a period of time one can identify recurring patterns. Ideally those patterns should be automated. In most cases it is essential to automate at least a part of the analysis process.



However, automation is not free of challenges. One big problem is that log files have different structures and formats. To make things worse, the structure and format change over time. There's no platform to automate log analysis in a generic way. When automating analysis, one needs to create rules and put them in a machine readable form. Then, to manage those rules or to reuse them, they need to be kept in a human readable form too. Keeping things both machine and human readable is not an easy task. Because of these challenges, most organizations completely abandon automation and others go for proprietary implementations in general purpose languages. That incurs a significant cost, because every log analysis procedure needs to be implemented from scratch without reuse. When implemented in a general purpose language, the rules are not readable, particularly for non-developers. Unless designed properly to deal with changes, at an additional cost, it will be difficult to add new rules later and to handle log file format and structure changes. Another significant problem is that proprietary automations come with fixed reports which cannot be customized.


So there are many facts that support the need for a common platform for generic log file analysis. Some level of support already exists. For example, we have XML, which is a universal format used everywhere. It's a good candidate for keeping log information, and many tools are freely available to process XML. However, XML comes with a cost: the spatial cost of metadata. This makes it inappropriate for certain kinds of logs. In addition, it is not very human readable. There are many languages available for processing it, but they look almost like other general purpose languages; they are not for non-developers. And not every log file is in XML. There are a lot of other text formats, plus binary formats.

Researchers have done some work on creating formal definitions for log files. These are based on regular expressions and assume a log file consisting of line entries. Therefore the existing definitions do not help with log files with complex structures, which are very common. They are also unable to handle difficult syntax that cannot be resolved with a regular grammar, even in line logs. Another flaw is that these definitions do not take any advantage of XML.
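To make the limitation concrete, here is a minimal Python sketch of that line-oriented, regular-expression style of extraction; the timestamp/level/message format is an assumption for illustration. Anything that spans multiple lines, nests, or is binary simply falls outside what a single line pattern can describe.

# Minimal sketch of the line-oriented approach: one regular expression per line entry.
# The timestamp/level/message format used here is a hypothetical example.
import re

LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<message>.*)$"
)

def parse_line_log(lines):
    """Yield one dict per line that matches the expected single-line format."""
    for line in lines:
        match = LINE_PATTERN.match(line.rstrip("\n"))
        if match:
            yield match.groupdict()
        # Multi-line entries, nested structures and binary records simply fall
        # through here -- exactly the cases a regular grammar cannot capture.

sample = [
    "2011-07-06 14:03:21 ERROR Connection to peer lost",
    "2011-07-06 14:03:22 INFO Retrying",
]
for entry in parse_line_log(sample):
    print(entry["level"], "-", entry["message"])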


What are the expected features of a framework for generic log file analysis? First, it needs to be able to handle the different and changing log file structures and formats. It also needs to come with a knowledge representation schema which is both human and machine readable. It is also important to have the ability to convert to and from XML, to exploit the power of existing XML tools. For the reasons I mentioned earlier, the new framework must be friendly to non-developers and be capable of generating custom reports.


Ok; this is the high level picture of the solution. Mainly it comprises three modules that lie on top of the new knowledge representation schema. The input to the system is a set of log files and the output is a set of reports. The first module, the Interpretation module, is supposed to provide a "Unified mechanism for extracting information of interest from both text and binary log files with arbitrary structure and format". In other words, it is the part of the framework that helps one express the structure and format of a log file and point to the information of interest. The output of this module is the extracted information expressed in the knowledge representation mechanism. The Processing module is the one that keeps the expert knowledge base to make inferences from this information. As mentioned here, it is supposed to provide an "Easy mechanism to build and maintain a rule base for inferences". What comes out of this module is a set of conclusions drawn from the information. After that it is a matter of presenting these findings to various stakeholders. This is exactly the responsibility of the next module, the "Presentation" module. It should provide "Flexible means for generating custom reports from inferences".
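A rough Python sketch of the flow between the three modules is shown below; the function names are hypothetical and only mirror the responsibilities just described, not the framework's actual API.

# Hypothetical sketch of the three-module flow; not the framework's real interface.

def interpret(log_path, interpretation_script):
    """Interpretation module: extract the information of interest from one log
    file (text or binary, arbitrary structure) into the knowledge representation."""
    ...

def process(extracted_maps, rule_base):
    """Processing module: apply the expert rule base to the extracted
    information and return the conclusions drawn from it."""
    ...

def present(inferences, report_template):
    """Presentation module: turn the inferences into a custom report."""
    ...

def analyze(log_paths, interpretation_script, rule_base, report_template):
    """End-to-end: a set of log files in, a set of reports out."""
    extracted = [interpret(path, interpretation_script) for path in log_paths]
    inferences = process(extracted, rule_base)
    return present(inferences, report_template)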


One important selection here is the way of representing knowledge. This decision must be made carefully because the rest of the solution depends heavily on it. If there is a single factor that determines the success or failure of the entire solution, it is this. After analyzing the drawbacks of existing knowledge representation schemas and current-day requirements, I decided to use the mind map as the knowledge unit in the framework. Mind mapping is a popular activity people use to quickly organize day-to-day actions, thoughts, plans and even lecture notes. Research shows that mind maps resemble the organization of knowledge in the human brain more closely than sequential text does. Therefore it is a good form for human readability. Because of its structured form, it is easy to change and visualize the contents of a mind map. On the other hand, computers can also process mind maps easily because they can be represented by a tree, a popular data structure that has been there from the beginning of computer programming. All the power of existing tree algorithms can be exploited when processing them. Since XML too can be mapped to a tree, mind maps are easily convertible to and from XML, which opens the door to utilizing existing XML tools in processing. In addition, mind maps can be combined with each other at the node level, which is a desirable feature when mixing data from different sources.
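The following small Python sketch illustrates why the mapping is so natural: a mind map is just a tree of nodes, and a tree converts to and from XML in a few lines. The node structure shown here is hypothetical and only for illustration.

# Small sketch of a mind map as a tree and its round trip to XML.
# The node structure is hypothetical; it only illustrates the mapping described above.
import xml.etree.ElementTree as ET

class MindMapNode:
    def __init__(self, text, children=None):
        self.text = text
        self.children = children or []

    def add(self, text):
        child = MindMapNode(text)
        self.children.append(child)
        return child

def to_xml(node):
    element = ET.Element("node", text=node.text)
    for child in node.children:
        element.append(to_xml(child))
    return element

def from_xml(element):
    node = MindMapNode(element.get("text"))
    node.children = [from_xml(child) for child in element]
    return node

root = MindMapNode("Log analysis")
errors = root.add("Errors")
errors.add("Memory leak at shutdown")
root.add("Warnings")

xml_text = ET.tostring(to_xml(root), encoding="unicode")
print(xml_text)                          # the same tree, now consumable by XML tools
restored = from_xml(ET.fromstring(xml_text))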


This diagram shows the architecture of the entire system. The Parser, Execution Engine, Metadata and Data types constitute the new scripting language, which I will explain later. The text and binary file readers serve the Interpretation module. The system exposes its functionality via a programming interface, marked here as the Control Code. In addition to the users of the generated reports, external systems can also interact with the system to use the analyzed data.


The framework includes a new scripting language targeting the three main phases of log file analysis. It is centered on mind maps and offers many convenient operations to handle them easily. All the syntax is configurable, which means one can define one's own syntax to make it look like a totally new language. One main application of this can be localized syntax. The syntax configuration is kept in a separate file on a per-script basis. Since mind maps can grow very large when used for analyzing huge logs, it is desirable to have strong filtering capabilities to bring out a set of nodes of interest at a glance. Our new language comes with advanced filtering capabilities for this; most of them are similar to the filtering features in jQuery. One other interesting feature is statement chaining. With this, one can write a long statement like a story in one line and perform operations on many nodes with a single function call. I'll demonstrate this in the next slide. The new language also supports built-in and custom data types and functions, like all other languages.


The scripting language is specially designed to promote a programming model which I call the "Horizontal Programming Model". This is inspired by the pattern of referencing in natural language. In a text written in natural language, each sentence can refer to something mentioned in the previous sentence, but not to something said many sentences before. This neighbor-referencing model results in a human-friendly flow of ideas, much like a story. Horizontal programming is implemented by statement chaining coupled with filtering. A complete idea is expressed in only one or two lines of code. This small snippet is independent of the rest of the script. If we consider the script as the complete rule base, then a snippet can be a single inference rule. This is more favorable for a non-developer because it is closer to how an idea is expressed in human language. However, the typical general purpose language programming style, which I call the "Vertical Programming Model", is also supported in case someone prefers it. This model is different because it promotes distant memory calls and the growth of code in the vertical direction. In the example provided in the blue box, the variable "Found" is defined in the 1st line and referred to only in the 10th line thereafter. This model is better for expressing advanced logic, since not everything can be done using the horizontal model.
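Since the slide's blue box is not reproduced here, the following Python analogy illustrates the two styles; the framework's own scripting syntax is different, and the class and method names below are hypothetical. The horizontal rule fits in a single chained statement, while the vertical version defines a variable that is only consumed many lines later.

# Python analogy only -- the framework's scripting syntax differs and these
# class/method names are hypothetical.

class Nodes:
    """A selectable set of mind map nodes, filterable jQuery-style."""

    def __init__(self, nodes):
        self._nodes = list(nodes)

    def __iter__(self):
        return iter(self._nodes)

    def filter(self, predicate):
        return Nodes(n for n in self._nodes if predicate(n))

    def mark(self, label):
        for n in self._nodes:
            n.setdefault("labels", []).append(label)
        return self                      # returning self is what enables chaining

nodes = Nodes([{"text": "heap corruption in module X"}, {"text": "startup ok"}])

# Horizontal model: the whole inference rule is one chained, self-contained line.
nodes.filter(lambda n: "corruption" in n["text"]).mark("critical")

# Vertical model: 'found' is defined here and only consumed further down.
found = []
for n in nodes:
    if "corruption" in n["text"]:
        found.append(n)
# ... potentially many unrelated statements later ...
for n in found:
    n.setdefault("labels", []).append("reviewed")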



This diagram briefs the final solution with respect to the solution overview we saw earlier. We have selected mind maps as the knowledge representation schema, and the three modules of the solution offer the features mentioned here. All three modules are driven by the new programming language and a set of complementary tools. It's important to note that the same unified mechanism is capable of serving significantly different needs that arise inside these three modules.


This diagram illustrates an example use case for the system. Software applications and monitoring tools generate log files and each log file is interpreted through a script. As a result we get a mind map for each log file containing the data extracted from it. Then another script is used to aggregate these data in a meaningful way into a single mind map. We can call this the data map. Now we apply the rule base on this data map to generate inferences. This may result in an inference mind map which can then be used either by external systems for their use or by the presentation script to generate a set of reports to be used by various stakeholders. Though this is not the only way to use the framework, this scenario covers most actions that are involved in a typical log analysis procedure.




With this we can conclude that "The new framework provides a unified platform for generic log analysis. It enables users to perform different tasks in a homogeneous fashion. In addition, it formulates the infrastructure for a shared rule base". The possibility of a shared rule base is important because it gives organizations and communities dealing with the same tools and software the power to reuse expert knowledge.
 

There are a few possible improvements to make the framework more useful in the domain. Since some software applications and tools are widely used in software development, the framework can be accompanied by a set of scripts to interpret them, so that not everyone has to come up with their own version. One drawback in using the framework's scripting language for interpreting log files is that the script does not reflect the format and structure of the log file and its mapping to the mind map; therefore the readability is poor. A solution for this would be developing a new declarative language to map the information of interest in a log file into a mind map and generate the script from the declaration under the hood. I have already done some work on this and have submitted a paper to another conference. Apparently most expert rules are more easily put in vague terms than expressed in crisp logic. Therefore it would be a good idea to add the capability to work with fuzzy rules to the framework as well. Although it's possible in the current implementation to write a script to generate custom reports, the task would be much more intuitive if the report format could be designed in an integrated development environment with a designer. Developing such a designer is one more interesting future improvement.



That ends the presentation and thanks for listening.

Wednesday, June 15, 2011

Smart with a smartphone


Recently I started using an iPhone 4, as we're in the business of iPhone app development. I really like the technology stack used for developing iPhone apps. However, I was wondering whether there is real value in the phone in day-to-day life compared to the relatively high cost of the device. The ability to check email any time, take a picture or shoot a video, listen to music while on the road, and spend otherwise unproductive time engaged in an interesting activity (read a book, play a game) sounds interesting, but does not account for the high price. There should be a lot more uses for a smartphone in current-day life. Gradually I started downloading more and more apps and inventing ways of integrating them into my way of living, which, I think, has improved my quality of life at least by a small fraction. Following are my favorite apps and how I practically use them.

1. Notes
Now I do not carry paper to meetings, trainings, etc. The "Notes" app on the iPhone provides a simple interface to type important stuff and recall it easily. However, I need more integration for this, such as an easy search and a link with my mind maps.

2. SimpleMind 
A great mind mapping app to quickly put thoughts into mind maps. Mind mapping is a very productive modern way of organizing and recording knowledge for easier recall and processing. I expect more from mind mapping apps. It would be really nice to have the ability to combine mind maps and selectively show / hide content. A community mind map may be a great idea for sharing knowledge.

3. Mom Planner
A simple app to write down tasks at the start of the day and mark them off as and when they're done. Before using the iPhone I used to write down the tasks (both professional and personal) I'm supposed to complete within the day on paper. This is a very effective way of managing daily tasks for 2 reasons. First, it helps you remember the tasks. Second, something written down is always more effective and more likely to get done than something just kept in mind. Now the smartphone provides an easier interface for this, with many more capabilities than paper.

4. Daily Deeds
This is an app targeted at people who want to develop little good habits and track their improvement. You can track what you did each day in favor of the new good habit and keep a record in a goal-oriented fashion.

5. Google maps
Very useful for finding places while driving. One flaw is that places in Sri Lanka are not properly geo-coded yet, so you cannot enter an address and find the geographic location correctly. However, it is still useful for finding roads.

6. Eurocenter maps
This is a map application we developed for offline maps. Great when there is no internet connection. In addition, it has the capability to record the paths you have traversed. I tend to forget places I visit just once, so this is really smart functionality.

7. Flash light 
Usage is obvious.

8. Calorie counter
Useful for finding the approximate calorie count of restaurant food. This might be more useful in a country with a better food culture, where you get exactly what you ordered in a restaurant. However, it is still useful in Sri Lanka too, to get a rough idea about the caloric load of food before ordering in a good restaurant.

9. Dictionary
I use a normal dictionary app called "English" and a WordNet dictionary app named "Wordweb". Both are highly useful. The WordNet dictionary helps to view the semantic relationships between words in the form of a taxonomy.

10. RunKeeper
This is an app to record the duration, length and calorie consumption of jogging sessions. It uses GPS to record the path and provides a simple interface to save the data both locally and on the web. The most interesting part is that it is done as a social networking activity, so I get connected with other guys living in geographically close locations who use the same app. You see the progression of each other's activities. This is a great way to motivate yourself to go for a workout.

11. Knots
A quick guide for tying various knots.

Although there are hundreds of thousands of apps available, obviously only a very small fraction of them are useful in any particular person's life. It highly depends on his or her interests and way of living. I'm eager to know how others use smartphones productively in their lives. Currently I use only utility apps, but business apps can add a lot too. Apparently smartphones together with social networking will revolutionize people's lives. It is logical to expect to see more and more people using smartphones as an essential tool in their daily lives. Prices should go down and data connections have to become more ubiquitous. Meanwhile, I will try to do my homework to be ready for that new shift.

C++ Rocks!!


Windows 8 is on the horizon
while sad .Net fades away
Java fallen into devils hands
The devil they worship as Oracle

I'm still not completely hopeless
In the forsaken land of collapsed giants
'cos I'm under the shadow of a behemoth
whoz fertile from gone to unborn

Immortal native to every platform
where others come and go
They got no druthers but only you
when performance is on the go

Ending the wait for Godot
0x flaunts colors
Mesmorize I scream in joy
Here's the way to fly

Learnt to dive without scratchin'
you gave light through darkness
heart of a faithful developer
bestow I on you

Saturday, March 19, 2011

Automatic log file analysis


keywords: log data extraction, record expert knowledge, mind maps, expert systems, Application Verifier

I'm currently engaged in research on automatic log file analysis. I came across this idea during my MSc research on software quality verification. When it comes to black box testing, there are many handy tools that analyse a certain aspect of an application. These aspects may be CPU utilization, memory consumption, IO efficiency or low level API call failures. One prominent associated problem is the expertise required to use these tools. Even for experts the process takes a lot of time. For example, I have been using a free Microsoft tool called Application Verifier which keeps an eye on an application's virtual memory errors, heap errors, access failures due to improper access rights, incorrect usage of locks (which may result in hangs or crashes), exceptions, corrupt Windows handles, etc. It is a very useful tool for capturing application errors that are impossible or extremely difficult to identify in a manual QA process. Even with experience it takes me about 2 days to test a product with this tool before a release. Given the hectic schedules close to a release, what happens more often than not is that I do not get a chance to do this test. One other problem is that there is no good way to record my analysis knowledge so that someone else or "something" else can perform the analysis if I'm busy with other stuff. Sequential text, which is the popular form of recording knowledge, is not a good option in this case for several reasons. First, it is difficult to write documents in sequential text form (I think most developers agree with me on this). Then it is difficult for someone to understand it due to the inherently ambiguous nature of natural language. Furthermore, a program (this is the "something" I was referring to) cannot understand it for performing an automated analysis.

Almost all the analysis tools that are out there generate some form of a log file. The big majority of them are text files, either XML or flat text. If we can come up with a mechanism to extract the information from these log files, then the analysis procedure can be partly automated. The challenge here is to devise a scheme that can deal with the wide variety of proprietary structures of these log files. Though there are a bunch of tools available for log data extraction, all of them are bound to a specific log file structure. All the log analysis tools I found are web log analyzers. They analyze the logs generated by either the Apache web server or IIS. One cannot use them to analyze any other log file. An additional restriction is that the reports generated after the analysis are predefined. One cannot craft customized reports for a specific need.

There's one more dimension that highlights the importance of automated log file analysis. The majority of software products themselves generate log files. These logs are analyzed by product experts in troubleshooting. Each product has its own log file format, and the knowledge required for reading the logs and making conclusions lies only within a limited group of product experts. With the maturity of a product, it is highly likely that some troubleshooting patterns emerge over time. However, there is no means for recording the knowledge of these recurring patterns for later use by the same expert, by others or by an automation program.

The tasks associated with log file analysis are information extraction, inference, report generation and expert knowledge recording. What I'm working on is a unified mechanism to automate all these tasks. I'm trying to do it with a new simple scripting language based on mind maps. I will write more about the solution in the future as my research progresses. Please keep me posted (dilj220@gmail.com) about:

  • Any automated log analysis tool known to you
  • Any other reason or scenario that comes to your mind for automated log file analysis
  • The features that you expect as a developer / QA engineer / product expert / manager from an automatic log file analysis tool

Friday, March 4, 2011

Software Quality Verifier Framework


I completed my MSc thesis a couple of weeks back. My project was the development of a Software Quality Verification Framework. Given the value of early bug detection in the software life cycle, the framework addresses both white box and black box testing.

White box testing
White box testing is implemented in two phases.
1. Commit time analysis - In this, the code is automatically analyzed when the developer tries to add new code or code changes to the code repository. Quality verification is done by employing tools against a predefined set of rules. The commit is rejected if the code does not conform to the rules, and the developer is informed of the reasons for rejection in the svn client interface. This functionality is implemented using svn hooks; a sketch of such a hook appears below, after item 2. Example output is as follows.

2. Offline analysis - A more thorough analysis is performed in an offline fashion, in the context of a nightly build, for example. Results of the analysis are displayed in a dashboard which shows various analytics and provides violation drill-downs to the code level. Automatic emails can be configured to inform various stakeholders of the overall health of the system and developer technical debt. This is implemented using a tool named Sonar (http://www.sonarsource.org/).
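As referenced under commit time analysis above, the following is a minimal sketch of what such an svn pre-commit hook could look like, written in Python and using svnlook; the quality rule itself is a hypothetical placeholder for the real analysis tools.

#!/usr/bin/env python
# Sketch of an svn pre-commit hook: reject the commit and report the reasons
# back to the svn client if the changed files violate the quality rules.
# The rule check below is a placeholder; plug in the real analysis tools here.
import subprocess
import sys

def changed_files(repos, txn):
    out = subprocess.check_output(["svnlook", "changed", "-t", txn, repos], text=True)
    # Each line looks like "U   trunk/src/foo.cpp"; keep added/updated files only.
    return [line[4:].strip() for line in out.splitlines() if line and line[0] in "AU"]

def file_contents(repos, txn, path):
    return subprocess.check_output(["svnlook", "cat", "-t", txn, repos, path], text=True)

def violations(path, contents):
    """Placeholder rule: flag any TODO left in committed code."""
    return [f"{path}: line {i}: TODO found"
            for i, line in enumerate(contents.splitlines(), 1) if "TODO" in line]

if __name__ == "__main__":
    repos, txn = sys.argv[1], sys.argv[2]
    problems = []
    for path in changed_files(repos, txn):
        if path.endswith((".cpp", ".h", ".java")):
            problems.extend(violations(path, file_contents(repos, txn, path)))
    if problems:
        sys.stderr.write("Commit rejected:\n" + "\n".join(problems) + "\n")
        sys.exit(1)        # a non-zero exit aborts the commit; stderr reaches the svn client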

Black box testing
It was identified during the research that quite a number of tools have recently become available for evaluating a software product during its run, without looking at the source code that generated it. Different tools evaluate a product on different aspects such as memory usage (corruptions, leaks), IO usage, operating system calls, performance, access right violations, etc. However, there's no tool that combines the results generated by these individual tools to automatically produce a product health profile the way Sonar does with white box testing tools. There are two main problems associated with the approach of using individual tools manually to perform tests.
1. Tool usage requires expertise and is laborious
2. There's no way to record or automate a once-identified troubleshooting (or evaluation) procedure

I thought about different solutions for this. Noting that almost all the tools generate textual output in the form of a log file, I decided to implement a way to automatically extract the information of interest in a given context from those log files and generate reports for consumption by various parties like project managers, developers and technical leads. The output was a simple scripting language based on mind maps. Developers can write scripts in this language to extract information from various log files, derive conclusions based on it and generate reports.

Following is the architecture of the framework. I will blog more about the framework later.

Sunday, January 23, 2011

Installer for an IE browser helper object


Recently I developed an unmanaged browser helper object (BHO, commonly known as a plugin) for Internet Explorer. A good tutorial to follow for this is: http://msdn.microsoft.com/en-us/library/bb250489(v=vs.85).aspx#setup

However, neither this tutorial nor any other document I could find on the internet provided direct guidance on creating an installer for the BHO. The challenging part is adding all the registry keys required for Internet Explorer to identify our BHO. Although installer templates are provided with Visual Studio for Microsoft Office plugins, I could not find an installer template for an IE BHO.

When the Visual Studio project for the BHO is built for the first time, VS automatically registers the BHO on the developer machine. Therefore, what I did was identify the registry changes caused by this process using a registry change tracker tool. The tool I used was Regshot (http://sourceforge.net/projects/regshot/), which is an open source tool.

Then I used WiX (http://wix.sourceforge.net/) to build the installer. Following is the full code for the installer. Someone using WiX to build the installer will be able to use this code directly after changing the stuff specific to his BHO (name, location, etc). If some other installer builder tool is used, this will still be useful for finding the exact registry keys that need to be added during installation.

Do not forget to replace the strings in all caps with the appropriate values for your BHO. Also change all the GUIDs to new values. Replace 99C27324-6DA7-4340-A44F-2A57F3EF9A63 with the GUID given for your BHO class and D37FECE4-5E7C-4956-A2E1-23605AFFC70C with the GUID given for your type library. Both these values can be found in the .idl file for your plugin.

In addition to the plugin, I'm installing the C runtime using the merge modules provided by Visual Studio. These merge module packages can be found in YOUR_PROGRAM_FILES_FOLDER\Common Files\Merge Modules.

<?xml version='1.0'?>
<?define  VC_Runtime_Path = 'FOLDER_CONTAINING_VC_RUNTIME_MERGE_MODULE_PACKAGES' ?>
<Wix xmlns='http://schemas.microsoft.com/wix/2006/wi'>
   <Product Id='038D6D6B-FD9C-4ea2-A82C-EE77F3FA0D87' Name='PRODUCT_NAME' Language='1033'
            Version='1.0.0.0' Manufacturer='YOUR_NAME' UpgradeCode='A_GUID_FOR_UPGRADE_CODE'>
      <Package Description='DESCRIPTION'
                Comments='COMMENTS'
                Manufacturer='YOUR_NAME' InstallerVersion='300' Compressed='yes'/>

      <Media Id='1' Cabinet='product.cab' EmbedCab='yes'/>

      <Directory Id='TARGETDIR' Name='SourceDir'>
         <Directory Id='ProgramFilesFolder' Name='PFiles'>
            <Directory Id='BHO_FOLDER' Name='BHO_FOLDER_NAME'>
               <Component Id='COMPONENT_ID' Guid='BE13C927-63EC-4d61-9ED7-5338A3965D42'>
                  <File Id='dll' Name='PLUGIN_DLL_NAME' DiskId='1' Source='PATH_TO_PLUGIN_DLL\PLUGIN_DLL.dll' />
                  <RegistryKey Root='HKLM' Key='SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\Browser Helper Objects\{99C27324-6DA7-4340-A44F-2A57F3EF9A63}' Action='createAndRemoveOnUninstall'>
                      <RegistryValue Type='string' Value='BHO_NAME'/>
                      <RegistryValue Type='integer' Name='NoExplorer' Value='1'/>
                  </RegistryKey>
                  <RegistryKey Root='HKCR' Key='CLSID\{99C27324-6DA7-4340-A44F-2A57F3EF9A63}\InprocServer32'>
                      <RegistryValue Type='string' Value='[BHO_FOLDER]PLUGIN_DLL.dll'/>
                      <RegistryValue Type='string' Name='ThreadingModel' Value='Apartment'/>
                  </RegistryKey>
                  <RegistryKey Root='HKCR' Key='CLSID\{99C27324-6DA7-4340-A44F-2A57F3EF9A63}\ProgID'>
                      <RegistryValue Type='string' Value='PLUGIN_NAME.BHO_NAME.1'/>
                   </RegistryKey>
                 <RegistryKey Root='HKCR' Key='CLSID\{99C27324-6DA7-4340-A44F-2A57F3EF9A63}\TypeLib'>
                   <RegistryValue Type='string' Value='{D37FECE4-5E7C-4956-A2E1-23605AFFC70C}'/>
                 </RegistryKey>
                 <RegistryKey Root='HKCR' Key='CLSID\{99C27324-6DA7-4340-A44F-2A57F3EF9A63}\VersionIndependentProgID'>
                   <RegistryValue Type='string' Value='PLUGIN_NAME.BHO_NAME'/>
                 </RegistryKey>
                 <RegistryKey Root='HKCR' Key='CLSID\{99C27324-6DA7-4340-A44F-2A57F3EF9A63}'>
                   <RegistryValue Type='string' Value='PLUGIN_DESCRIPTION' />
                 </RegistryKey>
                 <RegistryKey Root='HKLM' Key='SOFTWARE\Microsoft\Internet Explorer'>
                   <RegistryValue Type='string' Name ='DownloadUI' Value='{99C27324-6DA7-4340-A44F-2A57F3EF9A63}'/>
                 </RegistryKey>
                 <RegistryKey Root='HKCU' Key='Software\Microsoft\Internet Explorer'>
                   <RegistryValue Type='string' Name ='DownloadUI' Value='{99C27324-6DA7-4340-A44F-2A57F3EF9A63}'/>
                 </RegistryKey>
                 <RegistryKey Root='HKCR' Key='PLUGIN_NAME.BHO_NAME'>
                   <RegistryValue Type='string' Value='PLUGIN_DESCRIPTION'/>
                 </RegistryKey>
                 <RegistryKey Root='HKCR' Key='PLUGIN_NAME.BHO_NAME\CLSID'>
                   <RegistryValue Type='string' Value='{99C27324-6DA7-4340-A44F-2A57F3EF9A63}'/>
                 </RegistryKey>
                 <RegistryKey Root='HKCR' Key='PLUGIN_NAME.BHO_NAME\CurVer'>
                   <RegistryValue Type='string' Value='PLUGIN_NAME.BHO_NAME.1'/>
                 </RegistryKey>
                 <RegistryKey Root='HKCR' Key='PLUGIN_NAME.BHO_NAME.1'>
                   <RegistryValue Type='string' Value='PLUGIN_DESCRIPTION' />
                 </RegistryKey>
                 <RegistryKey Root='HKCR' Key='PLUGIN_NAME.BHO_NAME.1\CLSID'>
                   <RegistryValue Type='string' Value='{99C27324-6DA7-4340-A44F-2A57F3EF9A63}'/>
                 </RegistryKey>
                 <RegistryKey Root='HKCR' Key='AppID\PLUGIN_DLL.DLL'>
                   <RegistryValue Type='string' Name ='AppID' Value='{5B774CD0-B31A-4F2D-9991-7CE99B68926A}'/>
                 </RegistryKey>
                 <RegistryKey Root='HKCR' Key='AppID\{5B774CD0-B31A-4F2D-9991-7CE99B68926A}'>
                   <RegistryValue Type='string' Value='PLUGIN_NAME'/>
                 </RegistryKey>
                 <RegistryKey Root='HKLM' Key='SOFTWARE\Wow6432Node\Microsoft\Windows\CurrentVersion\explorer\Browser Helper Objects\{99C27324-6DA7-4340-A44F-2A57F3EF9A63}'>
                   <RegistryValue Type='string' Value='PLUGIN_NAME'/>
                   <RegistryValue Type='integer' Name='NoExplorer' Value='1'/>
                 </RegistryKey>
               </Component>
            </Directory>
         </Directory>
      </Directory>

     <DirectoryRef Id="TARGETDIR">
       <Merge Id="VCRuntimeBase" SourceFile="$(var.VC_Runtime_Path)\Microsoft_VC90_CRT_x86.msm" DiskId="1" Language="0"/>
       <Merge Id="VCRuntimeBase64" SourceFile="$(var.VC_Runtime_Path)\Microsoft_VC90_CRT_x86_x64.msm" DiskId="1" Language="0"/>
       <Merge Id="VCRuntimeBasePolicy" SourceFile="$(var.VC_Runtime_Path)\policy_9_0_Microsoft_VC90_CRT_x86.msm" DiskId="1" Language="0"/>
       <Merge Id="VCRuntimeBasePolicy64" SourceFile="$(var.VC_Runtime_Path)\policy_9_0_Microsoft_VC90_CRT_x86_x64.msm" DiskId="1" Language="0"/>
       <Merge Id="VCRuntimeATL" SourceFile="$(var.VC_Runtime_Path)\Microsoft_VC90_ATL_x86.msm" DiskId="1" Language="0"/>
       <Merge Id="VCRuntimeATL64" SourceFile="$(var.VC_Runtime_Path)\Microsoft_VC90_ATL_x86_x64.msm" DiskId="1" Language="0"/>
       <Merge Id="VCRuntimeATLPolicy" SourceFile="$(var.VC_Runtime_Path)\policy_9_0_Microsoft_VC90_ATL_x86.msm" DiskId="1" Language="0"/>
       <Merge Id="VCRuntimeATLPolicy64" SourceFile="$(var.VC_Runtime_Path)\policy_9_0_Microsoft_VC90_ATL_x86_x64.msm" DiskId="1" Language="0"/>
     </DirectoryRef>

      <Feature Id='BHO' Title='BHO Feature' Level='1'>
         <ComponentRef Id='COMPONENT_ID' />
      </Feature>

     <Feature Id="Runtimex86" Level="1">
       <MergeRef Id="VCRuntimeBase"/>
       <MergeRef Id="VCRuntimeBasePolicy"/>
       <MergeRef Id="VCRuntimeATL"/>
       <MergeRef Id="VCRuntimeATLPolicy"/>
     </Feature>

     <Feature Id="Runtimex64" Level="1">
       <Condition Level="1">VersionNT64</Condition>
       <MergeRef Id="VCRuntimeBase64"/>
       <MergeRef Id="VCRuntimeBasePolicy64"/>
       <MergeRef Id="VCRuntimeATL64"/>
       <MergeRef Id="VCRuntimeATLPolicy64"/>
     </Feature>
    
   </Product>
</Wix>