
Saturday, August 27, 2011

Should QA learn programming concepts?



Recently one of the QA engineers at Eurocenter started an interesting discussion on the skills required from a present-day QA engineer. The discussion was opened in Eurocenter's LinkedIn group, which we call "InsideOut". It is an open group and you can join from: http://alturl.com/qtron
One of the questions raised was whether QA engineers should learn programming concepts in order to dig into the code written by developers and review its quality. Though some of my colleagues favored this idea, my opinion was that it is not required and that there are many more valuable areas a modern QA engineer can look into for improving the quality of a product. Following is my elaboration on this, copied directly from my comment in the group.


It is agreeable that the industry demands more from QA than conventional testing. The major reason behind this demand is that many quality aspects required from a software product are not covered by conventional testing, which includes functional testing and traditional non-functional testing such as load testing. In the native-code world, for example, memory corruptions might not surface during normal testing, yet they can cause disasters in a production environment. This creates the need for a QA engineer to have more insight into the operation of a software application than what he observes in the application interface. There are two main avenues for achieving this.
    1. Code level analysis
    2. Operation level analysis
The former can be called white box testing and the latter black box testing. The important thing to note here is that peeping into the code or writing supplementary code is not the only way of getting more insight into an application.

Operation level analysis deals with analyzing the operation of a piece of software in the context of a system (an operating system, for example), where the software is evaluated with respect to the changes it causes on the system, how it is affected by changes in the system, and how it performs under various conditions in the system. The good thing about this domain is that many sophisticated tools for doing this have started appearing with the latest versions of operating systems. The Microsoft Application Compatibility Toolkit, for example, can monitor the operation of a piece of software and provide a detailed analysis of security problems, user access control problems, memory problems, etc. Another good example is Microsoft Application Verifier, which can detect memory corruptions, memory leaks, low-level operating system call failures, I/O overhead, resource allocation problems, improper locks, unsafe parallel access of data, dangerous API calls, etc. This is vital information that helps in judging the quality of software without looking at a single line of code. Having been a native developer for a few years, I still cannot detect most problems revealed by Application Verifier by examining code. There are a bunch of other useful tools of this kind that are bundled with the operating system itself. These tools are little known and are not given the attention they deserve. Even when they are used, it is usually by developers. However, I think QA engineers are the best people to use them to evaluate software.

Even tools like Sonar provide a lot of metrics that manifest the quality of software without going down to code level. Once we go down to code level there are a whole lot of peculiarities like design patterns, compiler optimizations, API tweaks, hacks, etc. Since the software product is more important than the code, I think it is more productive to analyze the quality of the software itself using modern tools. Having said that, I repeat that it is vital for a QA engineer to have the knowledge to automate things using simple programming.
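To illustrate that last point, here is a minimal sketch of the sort of lightweight automation I have in mind: run an analysis tool against a build and flag suspicious lines in its output. The tool name, its arguments and the keywords below are hypothetical placeholders, not any real product's interface.

```python
# Minimal sketch of "simple programming" for operation-level QA automation.
# The tool name, its arguments and the keywords are hypothetical placeholders.
import subprocess
import sys

SUSPICIOUS_KEYWORDS = ["memory leak", "heap corruption", "access violation"]

def run_analysis(command):
    """Run an external analysis tool and return its combined text output."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.stdout + result.stderr

def check_output(output):
    """Return the suspicious lines found in the tool output."""
    return [line.strip() for line in output.splitlines()
            if any(keyword in line.lower() for keyword in SUSPICIOUS_KEYWORDS)]

if __name__ == "__main__":
    # Hypothetical invocation; substitute whatever analysis tool you actually use.
    findings = check_output(run_analysis(["memcheck_tool", "--target", "my_app.exe"]))
    for line in findings:
        print("SUSPICIOUS:", line)
    sys.exit(1 if findings else 0)
```

A script like this can run after every nightly build, so the check happens without anyone reading the tool output by hand.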

Wednesday, July 6, 2011

Speech in ICSCA 2011

I presented a paper at the "2011 International Conference on Software and Computer Applications". Following is my speech along with the corresponding slides.




Good afternoon! I'm Dileepa from the University of Moratuwa, Sri Lanka, and I'm going to talk about a framework I developed for automated log file analysis.




First, I'll explain the background and then the problem identification. After that, I'll give an overview of the solution, which is the new framework, and then its design and implementation. This section will include an experiment I did as a proof of concept. Finally, I will conclude the work.


Software log files are analyzed for many reasons by different professionals. Testers use them to check the conformance of a piece of software to a given functionality. For example, in a system where messages are passed between different processes, a QA engineer can perform a certain action and then check the log to see whether the correct messages are generated. Developers analyze logs mainly for troubleshooting. When something goes wrong at a production site, or even when a bug is reported by an outsourced QA firm, the most useful resource available to the developer for troubleshooting is, most of the time, the application log file. Domain experts also use logs occasionally for troubleshooting, and system admins monitor logs to confirm that everything is working fine at the overall system level.



Now we see that it is always a human user who analyzes a log file in a given scenario. However, with the increasing complexity of software systems and the demand for high-speed, high-volume operations, this completely manual process has become a near impossibility. First, one needs an expert for log file analysis, which incurs a cost, and even with expertise it is a labor-intensive task. More often than not, log file analysis is a repetitive and boring task, which invites human errors. It is highly likely that when analyzing a certain log over a period of time one will identify recurring patterns. Ideally those patterns should be automated. In most cases it is essential to automate at least a part of the analysis process.



However, automation is not free of challenges. One big problem is that log files have different structures and formats. To make things worse, the structure and format change over time. There is no platform to automate log analysis in a generic way. When automating analysis, one needs to create rules and put them in a machine-readable form. Then, to manage or reuse those rules, they need to be kept in a human-readable form too. Keeping things both machine and human readable is not an easy task. Because of these challenges, most organizations abandon automation completely, and the rest go for proprietary implementations in general purpose languages. That incurs a significant cost because every log analysis procedure has to be implemented from scratch, without reuse. When implemented in a general purpose language, the rules are not readable, particularly for non-developers. Unless the implementation is properly designed to deal with changes, at an additional cost, it will be difficult to add new rules later and to handle changes in log file format and structure. Another significant problem is that proprietary automations produce fixed reports which cannot be customized.


So there are many facts that argue for the need for a common platform for generic log file analysis. Some level of support already exists. For example, we have XML, which is a universal format used everywhere. It is a good candidate for keeping log information, and many tools are freely available to process XML. However, XML comes with a cost: the spatial cost of metadata. This makes it inappropriate for certain kinds of logs. In addition, it is not very human readable. There are many languages available for processing it, but they look almost like other general purpose languages; they are not for non-developers. And not every log file is in XML. There are lots of other text formats, plus binary formats.
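As a small illustration of how convenient existing XML tooling is, the sketch below counts the error entries of a log kept in XML. The layout (a <log> root with <entry level="..." time="..."> children) is a hypothetical example, not a standard format.

```python
# Sketch of slicing an XML-based log with standard tooling, assuming a
# hypothetical layout: <log><entry level="ERROR" time="...">text</entry></log>
import xml.etree.ElementTree as ET

def count_errors(path):
    """Print and count the ERROR entries in a hypothetical XML log file."""
    tree = ET.parse(path)
    errors = [e for e in tree.getroot().iter("entry")
              if e.get("level") == "ERROR"]
    for entry in errors:
        print(entry.get("time"), entry.text)
    return len(errors)
```

Note how every entry repeats its tag and attribute names; that repetition is exactly the spatial cost of metadata mentioned above.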

Researchers have done some work on creating formal definitions for log files. These are based on regular expressions and assume a log file consisting of line entries. Therefore, the existing definitions do not help with log files with complex structures, which are very common. They are also unable to handle difficult syntax that cannot be resolved with a regular grammar, even in line-based logs. Another flaw is that these definitions do not take any advantage of XML.
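To make the limitation concrete, here is a tiny illustration of the line-oriented, regex-based approach those definitions take. The log line format is hypothetical; the point is that each line is assumed to match one flat pattern.

```python
# One flat regular expression per line entry; the line format is hypothetical.
import re

LINE_PATTERN = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<level>\w+)\s+(?P<message>.*)$")

sample = [
    "10:32:01 ERROR connection timed out",
    "10:32:02 INFO  retrying",
]

for line in sample:
    match = LINE_PATTERN.match(line)
    if match:
        print(match.group("level"), "->", match.group("message"))

# A multi-line stack trace, or a nested block embedded in the log, breaks this
# one-line-one-entry assumption, which is the limitation described above.
```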


What are the features expected from a framework for generic log file analysis? First, it needs to be able to handle the different and changing log file structures and formats. It also needs to come up with a knowledge representation schema which is both human and machine readable. It is also important to have the ability to convert to and from XML, to exploit the power of existing XML tools. For the reasons I mentioned earlier, the new framework must be friendly to non-developers and be capable of generating custom reports.


OK; this is the high-level picture of the solution. It mainly comprises three modules that sit on top of the new knowledge representation schema. The input to the system is a set of log files and the output is a set of reports. The first module, the Interpretation module, is supposed to provide a "Unified mechanism for extracting information of interest from both text and binary log files with arbitrary structure and format". In other words, it is the part of the framework that helps one express the structure and format of a log file and point to the information of interest. The output of this module is the extracted information expressed in the knowledge representation mechanism. The Processing module is the one that keeps the expert knowledge base used to make inferences from this information. As mentioned here, it is supposed to provide an "Easy mechanism to build and maintain a rule base for inferences". What comes out of this module is a set of conclusions drawn from the information. After that, it is a matter of presenting these findings to various stakeholders. This is exactly the responsibility of the next module, the Presentation module, which should provide "Flexible means for generating custom reports from inferences".


One important selection here is the way of representing knowledge. This decision must be made carefully because the rest of the solution depends heavily on it. If there is a single factor that determines the success or failure of the entire solution, it is this. After analyzing the drawbacks of existing knowledge representation schemas and present-day requirements, I decided to use the mind map as the knowledge unit in the framework. Mind mapping is a popular activity people use to quickly organize day-to-day actions, thoughts, plans and even lecture notes. Research shows that mind maps resemble the organization of knowledge in the human brain more closely than sequential text does, so they are a good form for human readability. Thanks to this form, it is easy to change a mind map and to visualize its contents. On the other hand, computers can also process mind maps easily, because a mind map can be represented by a tree, a well-known data structure that has been around since the beginning of computer programming. All the power of existing tree algorithms can be exploited when processing them. Since XML too maps to a tree, mind maps are easily convertible to and from XML, which opens the door to using existing XML tools in processing. In addition, mind maps can be combined with each other at node level, which is a desirable feature when mixing data from different sources.
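The following minimal sketch illustrates why a mind map maps so naturally onto a tree and onto XML. It is only an illustration of the idea, not the framework's actual data model; the class and node names are made up for the example.

```python
# Minimal illustration (not the framework's data model) of a mind map as a
# tree of nodes that converts to and from XML.
import xml.etree.ElementTree as ET

class MindMapNode:
    def __init__(self, text, children=None):
        self.text = text
        self.children = children or []

    def add(self, text):
        """Attach a child node and return it for further building."""
        child = MindMapNode(text)
        self.children.append(child)
        return child

    def to_xml(self):
        """Serialize this node and its subtree as an XML element."""
        element = ET.Element("node", text=self.text)
        for child in self.children:
            element.append(child.to_xml())
        return element

    @staticmethod
    def from_xml(element):
        """Rebuild a mind map subtree from an XML element."""
        node = MindMapNode(element.get("text"))
        node.children = [MindMapNode.from_xml(c) for c in element]
        return node

# Usage: build a small map, serialize it, and read it back.
root = MindMapNode("ServerLog")
errors = root.add("Errors")
errors.add("Timeout at 10:32")
xml_text = ET.tostring(root.to_xml(), encoding="unicode")
restored = MindMapNode.from_xml(ET.fromstring(xml_text))
```

Because both directions of the conversion are trivial, any existing XML tool can be dropped into the processing pipeline without losing the mind map structure.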


This diagram shows the architecture of the entire system. The Parser, Execution Engine, Metadata and Data Types components constitute the new scripting language, which I will explain later. The text and binary file readers serve the Interpretation module. The system exposes its functionality via a programming interface, marked here as the Control Code. In addition to the users of the generated reports, external systems can also interact with the system to use the analyzed data.


The framework includes a new scripting language targeting the three main phases of log file analysis. It is centered on mind maps and offers many convenient operations for handling them easily. All the syntax is configurable, which means one can define his own syntax and make it look like a totally new language; one main application of this is localized syntax. The syntax configuration is kept in a separate file on a per-script basis. Since mind maps can grow very big when used for analyzing huge logs, strong filtering capabilities are desirable to bring out a set of nodes of interest at a glance. Our new language comes with advanced filtering capabilities for this; most of them are similar to the filtering features in jQuery. One other interesting feature is statement chaining. With this, one can write a long statement like a story in one line and perform operations on many nodes with a single function call. I'll demonstrate this in the next slide. The new language also supports built-in and custom data types and functions, like other languages.


The scripting language is specially designed to promote a programming model which I call the "Horizontal Programming Model". This is inspired by the pattern of referencing in natural language. In a text written in natural language, each sentence can refer to something mentioned in the previous sentence, but not to something said many sentences before. This neighbor-referencing model results in a human-friendly flow of ideas, much like a story. Horizontal programming is implemented by statement chaining coupled with filtering. A complete idea is expressed in only one or two lines of code, and this small snippet is independent of the rest of the script. If we consider the script as the complete rule base, then a snippet is a single inference rule. This is friendlier to non-developers because it is closer to how an idea is expressed in human language. However, the typical general purpose programming style, which I call the "Vertical Programming Model", is also supported in case someone prefers it. This model is different because it promotes distant references and growth of code in the vertical direction. In the example provided in the blue box, the variable "Found" is defined in the 1st line and referred to again only in the 10th line. This model is better for expressing advanced logic, since not everything can be done using the horizontal model.
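Since the slides carry the actual syntax, here is only a Python analogy of the horizontal style: one chained statement over a filtered set of nodes expresses a complete rule. The node collection type and its methods are invented for the illustration and are not the framework's own API.

```python
# Python analogy of the "horizontal" style: one chained statement expresses a
# complete inference rule. The Nodes type is a made-up stand-in, not the
# framework's real scripting language.
class Nodes:
    def __init__(self, items):
        self.items = list(items)

    def filter(self, predicate):
        """Return a new collection containing only the matching nodes."""
        return Nodes(n for n in self.items if predicate(n))

    def mark(self, label):
        """Attach a label to every node and return the collection for chaining."""
        for n in self.items:
            n["labels"] = n.get("labels", []) + [label]
        return self

    def count(self):
        return len(self.items)

log_nodes = Nodes([
    {"type": "error", "text": "timeout"},
    {"type": "error", "text": "disk full"},
    {"type": "info", "text": "started"},
])

# Horizontal style: neighbour references only, one complete rule per statement.
timeouts = (log_nodes.filter(lambda n: n["type"] == "error")
                     .filter(lambda n: "timeout" in n["text"])
                     .mark("needs-investigation")
                     .count())
```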



This diagram summarizes the final solution with respect to the solution overview we saw earlier. We have selected mind maps as the knowledge representation schema, and the three modules of the solution offer the features mentioned here. All three modules are driven by the new programming language and a set of complementary tools. It is important to note that the same unified mechanism is capable of serving the significantly different needs that arise inside these three modules.


This diagram illustrates an example use case for the system. Software applications and monitoring tools generate log files, and each log file is interpreted through a script. As a result we get a mind map for each log file containing the data extracted from it. Then another script is used to aggregate these data in a meaningful way into a single mind map; we can call this the data map. Now we apply the rule base to this data map to generate inferences. This may result in an inference mind map which can then be used either by external systems or by the presentation script to generate a set of reports for various stakeholders. Though this is not the only way to use the framework, this scenario covers most actions involved in a typical log analysis procedure.
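Compressed into plain Python, the use case looks roughly like the pipeline below. The function names are hypothetical stand-ins for the interpretation, aggregation, inference and presentation scripts, and the dictionaries stand in for the mind maps.

```python
# Rough sketch of the use case; function names and file names are hypothetical
# stand-ins for the framework's scripts, not its real API.
def interpret(log_path):
    """Extract the information of interest from one log file into a per-log map."""
    with open(log_path) as f:
        return {"source": log_path,
                "errors": [line.strip() for line in f if "ERROR" in line]}

def aggregate(maps):
    """Merge the per-log maps into a single data map."""
    return {"errors": [e for m in maps for e in m["errors"]]}

def infer(data_map):
    """Apply a trivial rule base and return conclusions."""
    return {"unstable": len(data_map["errors"]) > 10}

def present(inferences):
    """Render a minimal report for stakeholders."""
    print("System status:", "UNSTABLE" if inferences["unstable"] else "OK")

logs = ["app.log", "monitor.log"]   # hypothetical input files, assumed to exist
present(infer(aggregate(interpret(p) for p in logs)))
```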




With this we can conclude that "The new framework provides a unified platform for generic log analysis. It enables users to perform different tasks in a homogeneous fashion. In addition, it formulates the infrastructure for a shared rule base". The possibility of a shared rule base is important because it gives organizations and communities dealing with the same tools and software great power to reuse expert knowledge.
 

There are a few possible improvements that would make the framework more useful in the domain. Since some software applications and tools are widely used in software development, the framework could be accompanied by a set of scripts to interpret their logs, so that not everyone has to come up with their own version. One drawback of using the framework's scripting language for interpreting log files is that the script does not reflect the format and structure of the log file and its mapping to the mind map; its readability is therefore poor. A solution would be to develop a new declarative language that maps the information of interest in a log file onto a mind map, and to generate the script from that declaration under the hood. I have already done some work on this and have submitted a paper to another conference. Apparently, most expert rules are easier to put in vague terms than to express in crisp logic, so it would be a good idea to add the capability to work with fuzzy rules as well. Although it is possible in the current implementation to write a script to generate custom reports, the task would be much more intuitive if the report format could be designed in an integrated development environment with a designer. Developing such a designer is one more interesting future improvement.



That ends my presentation. Thank you for listening.

Monday, May 2, 2011

Bolero: The beauty of repetition


Some time back I got the opportunity to work in Goteborg (Gothenburg in English), the second largest city in Sweden. I was amazed by the rich culture of the city and used to go to some event every day after work, with the intention of exploring the city as much as possible. The Goteborg opera house was one of my favorite places; I watched about 10 shows there. One of them was a ballet named 3xBolero, which comprised three dance performances to Ravel's Bolero. I particularly liked the third performance, called "Episode 17", choreographed by Alexander Ekman. I guess the other two performances were too advanced for me to grasp. It was the first time I had listened to Bolero, and I started liking it.

Bolero is an orchestral work by Maurice Ravel, known for being ultimately romantic. The special thing about it is that it repeats the same short theme over and over again. Instead of a complicated long theme, we hear the same piece repeating with varying orchestration; each repetition adds a new color to the work, and the listener does not get bored by the recurrence. I do not know much about music, and I can only describe my feeling as "a masterpiece highlighting the beauty of the fine details of a simple thing through repetition". Bolero is said to have been inspired by a painting by Valentin Serov (the painting at the beginning of this article). A brilliant performance of Bolero is available here:


In the modern world we are continuously bombarded with a lot of stuff and are usually not given a chance to stop and look at something closely. Repetition is regarded as a waste of time and people always ask for "new" things. We have been trained by fast-moving TV screens for a long time. However, my experience is that I need some time to feel or understand something properly, even a really simple thing. I get emotionally attached to a thing only after looking at it for a long time. The same goes for problem solving. I solve a math or engineering problem only when I have the patience to think about it repeatedly, a hundred times over. After the same problem has revolved in my head for a prolonged period, the patterns associated with it and the very reasons that made it a problem start emerging. After this stage the solution appears like an obvious one that should have been uncovered at first sight.

My opinion is that the overwhelming amount of data we are exposed to does more harm than good. A serious encounter with anything needs close attention and patience. Works like Bolero simply remind this fact to those of us who are compelled to run at gigabyte speeds.

Friday, April 29, 2011

Did somebody say "It Depends"?


In a Toastmasters club, there is a role named "Ah Counter". The duty of this person is to listen carefully to the speakers during a session and mark the gap-filler words and sounds that appear between meaningful phrases. They call them "clutch words". These include sounds and words like "ah", "um", "so", "but", etc. In addition, if a speaker habitually puts words such as "kind of", "sort of" and "you know" everywhere, those are also regarded as clutch words. After the speech, the Ah Counter provides a report of the number of clutch words uttered by each speaker. This helps a speaker identify the clutch words he uses most frequently (mostly unknowingly) and put in a conscious effort during the next speech to minimize them. I know from my own experience that this feedback approach helps greatly. I was surprised by the high clutch word counts reported for my initial speeches in the Toastmasters program (conducted in our company), because I did not utter any of them intentionally. However, after becoming cautious about them, I was able to reduce the number significantly. During my last prepared speech I uttered only one or two clutch words.

The Toastmasters community says that the clutch word count is an indicator of the preparedness of a speech. The argument is that a speaker who is not well prepared will tend to use those words to fill gaps in his speech while thinking about the next thing to say. This is a sound argument, and I realized the truth in it after noticing that I always get a significantly higher clutch word count in my unprepared speeches (there is a separate session in Toastmasters for quick topics) than in prepared ones. Almost every speaker in the club showed an improvement in clutch word usage throughout the program, indicating that they were preparing better for speeches and becoming better public speakers.

I think there is another meaning to the clutch word count too. In my opinion, a speaker may use a gap-filler word like "kind of" or "sort of" when he is not sure about what he is saying. This form of clutch word can appear even in written forms like articles. For example, I used the term "gap-filler words" at the beginning of this article when introducing clutch words. If I had not been certain about the appropriateness of that term, I would have written "sort of gap-filler words". Ideally, what I should do in such a doubtful situation is put in some extra effort to verify the appropriateness of the term or to find a better alternative. Instead of doing that, I hide the uncertainty inside the term "sort of", so that I am not responsible even if the term that follows is not a good fit. This is nothing other than cheating. I am cheating the listener or the reader by concealing my laziness. After getting to know about clutch words, I noticed that many of my previous speeches and writings exploited this trick, and every day I see other speakers and writers doing the same thing. The general rule is that if you use terms like "kind of" and "sort of" unnecessarily, you really do not know what you are saying.

We have all heard people using the term "it depends" during technical discussions. Some put it at the beginning of every fact they state, particularly when answering questions. Is there any meaning to this term whatsoever? As a listener, I already know that "it depends". Specifically, I know that everything in the world depends on something else; I do not need to read Stephen Hawking's "A Brief History of Time" to understand that. What, then, is the need for a speaker to utter this term? I guess the reason is the same as mentioned in the previous paragraph: they say "it depends" because they do not really know what they are talking about. After saying "it depends" one can say anything. He is shielded from any criticism or questioning of what he says, because he has diluted it with those first two words and hence does not stand for it. I suggest regarding "it depends" as a clutch word in technical speeches. It is okay to use it when the dependency really matters, in which case the speaker is responsible for explaining each dependency and its effect.

The Toastmasters program helped me get rid of unnecessary words in public speaking and to identify unprepared and dishonest speakers (and writers). I hope this article helps you, the reader, to be cautious about this too. Your feedback is much appreciated, so please use the comments section.

Friday, April 22, 2011

Brain teasers


Here are two interesting problems I came across recently. Take my word; they are worth a try. You might not come up with a correct answer quickly. The key is to keep trying and never give up. Giving the brain this sort of exercise from time to time is a very good practice, particularly for fighting the decline of thinking power with age.

(1) The problem of circles

I read this problem on a website, and it is really interesting. The challenge is to find the maximum number of closed regions that can be formed by overlapping a given number of equal-sized circles. Each region that counts must fall within at least one circle. The answer for the case with two circles is obvious: we can form 3 regions. It doesn't even deserve a pictorial illustration. Three circles is also a pretty simple case; it is shown in the picture at the beginning of the article, and the result is 7 regions. What about the case with 4 circles? Now the problem is getting more challenging. However, I guess anyone with average thinking ability can come up with the answer after a few minutes of brainstorming. The answer is illustrated below.

What we have done is place the fourth circle in the middle of the three-circle arrangement from the previous case. The result is 13 regions. However, there is a little problem. How can one be certain that this is the orientation yielding the maximum number of regions? 13 is the correct answer in this case, but how can we prove it?
We clearly observe that the difficulty of the problem grows rapidly with the number of circles. Furthermore, we have no proof even if we come up with a decent solution. The issue is that the ad hoc thinking we used for the simple cases just does not work when the problem gets more complex, and ad hoc methods generally do not generate proofs either. The problem asks us to step out of the frame and think smarter to arrive at an enlightening solution. Can we do it? Yes, it is possible. All one needs to do is defeat the inertia of engaging in thinking and continue with willpower.
Here we go... My challenge is: what is the maximum number of regions that can be formed by overlapping 10 circles?

(2) The problem of Aluminium plates

This is a practical engineering problem. One of my friends, who works in an Aluminium factory, came up with this question regarding a problem encountered in their day-to-day operations. They get fairly big square-shaped Aluminium plates and are asked to cut rectangular pieces of various sizes out of them. The requests come at different times, so they do not know in advance the sizes of all the pieces they will need to cut from a single square. When they get a request for an Aluminium piece, they cut it from one of the many partly used Aluminium squares, guessing the best one to cut from. When a few pieces have been cut from a square, it might look like this.


At any given time our friends have many plates with shapes of this kind. The pieces are always cut with edges parallel to those of the initial square, so there are no slanted cuts. After a square is consumed to a level where it is no longer usable (no piece of a given minimum size can be cut from the remainder), it goes into a recycling phase where the remainders are melted to make new squares. Understandably, this melting process is quite costly, and optimal usage of an Aluminium square before recycling would definitely save them a good amount of money. The problem, therefore, is to come up with a systematic way (an algorithm; forget this word if you are not a computer engineer) to cut pieces from Aluminium squares optimally.

We can formulate the problem like this: given the shapes of all the remaining Aluminium squares and the size of the rectangular piece to cut, how do we find the best square to cut from and the best place in it to cut the piece? Remember that figuring out the optimality criterion is also part of the solution. After all, what our friends in the Aluminium factory need is to cut more pieces from their squares.

I will post my analysis and solutions to these problems in a later post. Until then, happy thinking.

Monday, March 28, 2011

If you don't swim, you miss!!


I'm trying to write an inspirational story. It is based on a recent experience of mine. I stole my theme from Osho; the heading of one of his beautiful articles was "If you swim, you miss". I twisted the title to match my topic.

I have been trying to learn to swim for at least the past 7 years. I made several attempts from time to time. Each time I started with much enthusiasm and tried hard to learn the basics during the first few days. However, I hardly made any progress beyond floating in the water and drifting a couple of meters, for as long as I could hold one breath. I thought that swimming was not my thing and that there was something terribly wrong with my body which hindered me from becoming a swimmer. This thought led me to give up every time.

However, many incidents compelled me to give it another try. Whenever I got into a swimming pool, maybe after a poolside party or at a hotel during a trip, I had to keep myself in the shallow end, standing in the pool like a kid, which was an utter disgrace. Last year I bought a new swimming pool membership and thought to myself, "This time you are either going to learn to swim or drown in the pool and die". This time I made a change: I got the help of a professional trainer.

I started in the same desperate condition. However, my coach gave me the most important advice in learning to swim: "Don't expect progressive results, and never give up". This meant that I should not be discouraged if I did not see any progress within days of training. I asked him how long I should keep trying to become a swimmer with basic skills. He said "about 30 days". At that point I realized why I had not been successful in my previous attempts: in none of them had I tried anywhere close to 30 days.

Having received the right advice, I started training with huge willpower. I did not care whether people laughed at me or whether I looked ridiculous, making a lot of noise and drinking pool water. I believed that there had to be a light at the end of the tunnel. Even on the twentieth day of my training I did not perform much better than in my first day's workout. However, my coach was right! When it came close to thirty days, suddenly and totally unexpectedly, I could swim a good length while performing all the basics well. I could not believe it for a moment. It was one of my happiest days of that year. After all those years of struggling, I had become a swimmer.

It is interesting to note that many changes in the human body take place in leaps, not as ramps. Many people misread this behavior as unresponsiveness. A medical book I recently read, "The Secrets of Miracle Doctors", suggests that this is true for many aspects of health. What you need to do is keep making small quantitative changes even if you do not see any response. Those small changes add up and trigger a significant qualitative change somewhere down the line. It might be an amazing coincidence that Marx and Engels said that "continuous quantitative changes lead to sudden qualitative changes in society". This might be what keeps courageous socialist leaders agitating even when they get almost no response from society.

Learning to swim inspired me to try the same strategy on other things too. Even if none of them works, I am a happy regular swimmer now.

Friday, March 4, 2011

Software Quality Verifier Framework


I completed my MSc thesis a couple of weeks back. My project was the development of a Software Quality Verification Framework. Given the value of early bug detection in the software life cycle, the framework addresses both white box and black box testing.

White box testing
White box testing is implemented in two phases.
1. Commit time analysis - Here, the code is automatically analyzed when the developer tries to commit new code or code changes to the code repository. Quality verification is done by employing tools against a predefined set of rules, and the commit is rejected if the code does not conform to them. The developer is informed of the reasons for rejection in the svn client interface. This functionality is implemented using svn hooks (a rough sketch of such a hook appears after this list). Example output is as follows.

2. Offline analysis - A more thorough analysis is performed offline, in the context of a nightly build, for example. The results of the analysis are displayed in a dashboard which shows various analytics and provides violation drill-downs to code level. Automatic emails can be configured to notify various stakeholders of the overall health of the system and of developer technical debt. This is implemented using a tool named Sonar (http://www.sonarsource.org/).
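The sketch below shows the general shape of a Subversion pre-commit hook of the kind used in phase 1; it is a minimal illustration, not the thesis implementation, and the rule check is a hypothetical placeholder. Subversion invokes the hook with the repository path and a transaction id, and rejects the commit if the hook exits with a non-zero status; anything written to stderr is shown to the developer in the svn client.

```python
# Rough sketch of an svn pre-commit hook in the spirit of phase 1 above;
# not the thesis implementation. The rule check is a hypothetical placeholder.
import subprocess
import sys

def changed_files(repos, txn):
    """List the files touched by the pending transaction via 'svnlook changed'."""
    out = subprocess.run(["svnlook", "changed", "-t", txn, repos],
                         capture_output=True, text=True, check=True).stdout
    paths = []
    for line in out.splitlines():
        parts = line.split(None, 1)   # "U   trunk/foo.c" -> ["U", "trunk/foo.c"]
        if len(parts) == 2:
            paths.append(parts[1].strip())
    return paths

def violates_rules(path):
    """Placeholder for the real rule check (static analysis, style rules, ...)."""
    return path.endswith(".tmp")      # hypothetical rule, for illustration only

if __name__ == "__main__":
    repos, txn = sys.argv[1], sys.argv[2]
    bad = [p for p in changed_files(repos, txn) if violates_rules(p)]
    if bad:
        sys.stderr.write("Commit rejected, rule violations in:\n")
        sys.stderr.write("\n".join(bad) + "\n")
        sys.exit(1)                   # non-zero exit status rejects the commit
    sys.exit(0)
```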

Black box testing
It was identified during the research that quite a number of tools have recently become available for evaluating a software product while it runs, without looking at the source code from which it was built. Different tools evaluate a product on different aspects such as memory usage (corruptions, leaks), I/O usage, operating system calls, performance, access right violations, etc. However, there is no tool that combines the results generated by these individual tools to automatically produce a product health profile, the way Sonar does with white box testing tools. There are two main problems with using individual tools manually to perform tests.
1. Tool usage requires expertise and is laborious
2. There is no way to record or automate a troubleshooting (or evaluation) procedure once it has been identified

I thought about different solutions for this. Noting that almost all of these tools generate textual output in the form of a log file, I decided to implement a way to automatically extract the information of interest in a given context from those log files and generate reports for consumption by various parties such as project managers, developers and technical leads. The outcome was a simple scripting language based on mind maps. Developers can write scripts in this language to extract information from various log files, derive conclusions from it and generate reports.

Following is the architecture of the framework. I will blog more about the framework later.