
Sunday, October 23, 2011

The toilet model of life


One thing that fascinates me about the modern civilized world is the use of commode toilets. They are radically different from the means people used in olden days. What happens when a person wants to use a commode toilet? He has a need, sits on the toilet seat, does the thing and presses a button (or pushes a handle)...and the stuff is gone...it disappears into another reality, after which he is NOT RESPONSIBLE for it. Nobody can point at something and say, "Hey, look, this is your stuff". The case with earlier practices, however, was very different. If we take, for example, the practice of going into the jungle, "it" is there in the same reality after he does it. He sees it, smells it, and may even step on it the next time he walks into the jungle.

When observing the changes happening in today's world with respect to politics, education, technology and human relationships, I have been smelling a paradigm shift. A lot of things are changing their shape radically. The change in technology is the most prominent. If you ask an analyst, he will load you with a lot of facts about the new trends and how you should be dealing with them. However, these are mere facts. A logical man needs something more than facts. He needs a concrete philosophy through which the new changes can be understood, described and predicted. I have been searching for this philosophy for quite some time, and finally, I discovered it...not in books...not even on the web...but in the toilet. I call this new philosophy "The Toilet Model of Life".

The tiny toy software
Let's take a few examples from modern technology first. In the olden days software came as big packages. The installation was heavy and users were supposed to go through a significant learning curve before starting to use them. In order to make a certain window appear, a user had to go to the Options menu, select the "Advanced" screen, navigate to the "Advanced" tab in it (which some people call the super-duper advanced screen) and check a configuration. New versions came every 2-3 years and had a whole bunch of new features added. Users of this software were specialists who knew, through years of experience, the recipe needed to get something useful done with it. They were very much dependent on the few software products they were working with and were emotionally attached to them. Software had to be used with great RESPONSIBILITY and discipline. A long uninstallation procedure had to be followed when the user no longer needed the software. Even after uninstallation, software usually left a lot of traces behind in the system. Briefly said, the software was at the center and the users were at the periphery.

However, what we see today are software products with very short release cycles (as short as half an hour) that shape the product gradually according to fast-changing market needs. Since the world wide web has opened up a super fast channel for software delivery, users no longer need to wait for months or years for a particular software product to show up. There are hundreds of software products available to serve a certain need. Therefore the user has taken the center and the software has been pushed to the periphery. A typical user consumes at least a few dozen software products on a daily basis. Software vendors, therefore, cannot assume an expert user. Their product may be just the 'costume-of-the-day' for the user. In order to tackle these new conditions, the features need to be provided at the user's fingertips. Any configuration setting should be displayed only at the appropriate moment and the behavior of the software should adapt to the user. Installation and uninstallation should be quick, light and clean.

Smartphone software is a typical example of this new breed of software. Apps are light in size and easily installed. They come with only a handful of features that are readily available through simple gestures. Removing them is just a matter of a few taps. Even better, they do not leave any trace behind. Anyone with a smartphone can download an app, use it for a while and flush it from the device. No responsibility is left behind; just like with the commode toilet.

Things get even more aligned with the toilet mechanism when it comes to the cloud. What the cloud means is that users no longer need to keep their heavy data on their devices. They are simply not responsible for how and where it is kept, backed up and so on. It is readily available for them to CONSUME, DISPOSE AND FORGET. The same toilet theory applies when you are a software vendor who delivers cloud-based products. Not only do you not need to take on the responsibility of managing the data, you also do not have to carry the weight of a subscription to a data hosting service. You pay only when you consume it. CONSUME-DISPOSE-FORGET is in action even in the software vendor's domain.

No more experts
In the past, not only the software users but also the software producers were experts. They were specialists in certain technologies: a C++ guy, a Java guy and so on. They were emotionally attached to their technology and were ready to go to war to protect it. However, with the current rate of technology mutation, one can hardly become an expert in a technology. By the time somebody acquires expertise in a technology, it becomes obsolete. Therefore the viable model for software producers is to be open-minded and flexible enough to move between technologies rapidly. Quick learning ability, adaptability and flexibility are becoming the defining qualities of successful professionals. The cadres of this new workforce will not be emotionally attached to any piece of technology like their predecessors were. The new model for the producer is LOAD-PRODUCE-UNLOAD-FORGET.

Frienditutes
Showing the toilet model in action in the field of modern human relationships is a trivial task. I will take a simple example. Guys usually like to have a lot of good-looking female friends. And there are times when a guy really needs to show this to others. However, like any other valuable thing in the world, this doesn't come for free. To have good-looking female friends a guy has to pay a certain price, because females expect a lot from their male friends. He has to buy them expensive gifts, be a driver at times, keep spending hours jabbering girly crap with them, etc. Dilbert creator Scott Adams suggested a modern solution for this, based on the fact that most friendship stuff now happens on Facebook. The concept is named the "frienditute": either a good-looking female on Facebook or anyone smart enough to appear as a good-looking female on Facebook. If you are a guy, all you need to do is hire a frienditute for the period when you need to show that you have gorgeous girl friends. They will comment on your messages, write on your FB wall, say nice things about your pictures and so on. They will pretend to be good friends of yours during the hired period. No long-term costs of keeping relationships with beautiful girls...a simple analogy for the CONSUMABLE-DISPOSABLE-FORGETTABLE human relationships that are becoming the norm in the modern world.

China - the rising sun
If there is a paradigm shift happening in the world affecting every facet of human life, shouldn't it be triggered and backed by a political body? It should, and of course it is. In order to identify it we only need to answer an easy question: which political regime will dominate the world both politically and economically in a few decades? Undoubtedly, it's the Chinese regime...and Chinese politics is based on their most popular philosophy: Taoism. What does Taoism say? It asks us to 'live in this moment'; not the moment before, not the moment after. Taoism asks you to do whatever you are doing right now to your fullest potential...and then forget it...do not get emotionally attached to it...do not be responsible, because by then what you are currently engaged in will be the past, which is insignificant. This is exactly what the Chinese are doing all around the world and that is why they are so successful. Take a Chinese product and you will see this philosophy in it. It comes for a very low price with almost all the features known for that kind of product. However, it is not attached to a big trade name and you should not talk about the durability of the product. Simple enough, isn't it? Now compare this with the Western world based on Christianity, which says "God is watching everything you do and you are responsible for what you do". This Western dominance is now becoming history at the speed of light. China is the rising sun.

Let's summarize. When you are the consumer: CONSUME-DISPOSE-FORGET. When you are the producer: LOAD-PRODUCE-UNLOAD-FORGET. As a final remark I ask my readers to be ready for this new world, both emotionally and intellectually. Do not be surprised if the friend who warmly shares your feelings today behaves like a complete stranger tomorrow.

Wednesday, July 6, 2011

Speech at ICSCA 2011

I presented a paper at the "2011 International Conference on Software and Computer Applications". Following is my speech along with the corresponding slides.




Good afternoon! I'm Dileepa from the University of Moratuwa, Sri Lanka, and I'm going to talk about a framework I developed for automated log file analysis.




First, I'll explain the background and then the problem identification. After that, I'll give an overview of the solution, which is the new framework, and then its design and implementation. This section will include an experiment I did as a proof of concept. Finally, I will conclude the work.


Software log files are analyzed for many reasons by different professionals. Testers use them to check the conformance of a piece of software to a given functionality. For example, in a system where messages are passed between different processes, a QA engineer can perform a certain action and then check the log to see whether the correct messages are generated. Developers analyze logs mainly for troubleshooting. When something goes wrong at a production site, or even when a bug is reported by an outsourced QA firm, the most useful resource available to the developer for troubleshooting is, most of the time, the application log file. Domain experts also use logs sometimes for troubleshooting, and system admins monitor logs to confirm that everything is working fine at the overall system level.



Now we see that it's always a human user who analyzes a log file in a given scenario. However, with the increasing complexity of software systems and the demand for high-speed, high-volume operations, this completely manual process has become a near impossibility. First, one needs an expert for log file analysis, which incurs a cost, and even with expertise it's a labor-intensive task. More often than not log file analysis is a repetitive and boring task, resulting in human errors. It's highly likely that when analyzing a certain log for a period of time one can identify recurring patterns. Ideally those patterns should be automated. In most cases it is essential to automate at least a part of the analysis process.



However, automation is not free of challenges. One big problem is that log files have different structures and formats. To make things worse, the structure and format change over time. There's no platform for automating log analysis in a generic way. When automating analysis, one needs to create some rules and put them in a machine-readable form. Then, to manage those rules or to reuse them, they need to be kept in a human-readable form too. Keeping things both machine and human readable is not an easy task. Because of these challenges, most organizations completely abandon automation and others go for proprietary implementations in general purpose languages. That incurs a significant cost because every log analysis procedure needs to be implemented from scratch without reuse. When implemented in a general purpose language, the rules are not readable, particularly for non-developers. Unless the implementation is designed, at additional cost, to deal with change, it will be difficult to add new rules later and to handle log file format and structure changes. Another significant problem is that proprietary automations come with fixed reports which cannot be customized.


So there are many facts that stand for the need for a common platform for generic log file analysis. Some level of support already exists. For example, we have XML, which is a universal format used everywhere. It's a good candidate for keeping log information, and many tools are freely available to process XML. However, XML comes with a cost: the spatial cost of metadata. This makes it inappropriate for certain kinds of logs. In addition, it is not very human readable. There are many languages available for processing it, but they look almost like other general purpose languages; they are not for non-developers. And not every log file is in XML. There are a lot of other text formats, plus binary formats.

Researchers have done some work on creating formal definitions for log files. These are based on regular expressions and assume a log file consisting of line entries. Therefore the existing definitions do not help with log files with complex structures, which are very common. They are also unable to handle difficult syntax that cannot be resolved with a regular grammar, even in line-based logs. Another flaw is that these definitions do not take any advantage of XML.


What are the expected features of a framework for generic log file analysis? First, it needs to be able to handle the different and changing log file structures and formats. It also needs to come with a knowledge representation schema that is both human and machine readable. It is also important to have the ability to convert to and from XML in order to exploit the power of existing XML tools. For the reasons I mentioned earlier, the new framework must be friendly to non-developers and be capable of generating custom reports.


OK; this is the high-level picture of the solution. It mainly comprises three modules that lie on top of the new knowledge representation schema. The input to the system is a set of log files and the output is a set of reports. The first module, the Interpretation module, is supposed to provide a "Unified mechanism for extracting information of interest from both text and binary log files with arbitrary structure and format". In other words, it is the part of the framework that helps one express the structure and format of a log file and point to the information of interest. The output of this module is the extracted information expressed in the knowledge representation mechanism. The Processing module is the one that keeps the expert knowledge base used to make inferences from this information. As mentioned here, it is supposed to provide an "Easy mechanism to build and maintain a rule base for inferences". What comes out of this module is a set of conclusions drawn from the information. After that it is a matter of presenting these findings to various stakeholders. This is exactly the responsibility of the next module, the "Presentation" module. It should provide "Flexible means for generating custom reports from inferences".
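To make the three module boundaries a little more concrete, here is a minimal, purely illustrative sketch in Python; the framework itself uses its own scripting language, and none of these names belong to its real API. A nested dict stands in for a mind map.

```python
import re

# Illustrative only: plain Python with a nested dict standing in for a mind
# map; none of these names belong to the framework's actual API.

def interpret(log_lines):
    """Interpretation: extract the information of interest (here, counts
    of log levels) from raw log lines into a data 'mind map'."""
    counts = {}
    for line in log_lines:
        match = re.match(r"\[(\w+)\]", line)
        if match:
            level = match.group(1)
            counts[level] = counts.get(level, 0) + 1
    return {"log": {"levels": counts}}

def process(data_map):
    """Processing: apply a (tiny) rule base to the extracted data."""
    errors = data_map["log"]["levels"].get("ERROR", 0)
    verdict = "unhealthy" if errors else "healthy"
    return {"inference": {"errors": errors, "verdict": verdict}}

def present(inference_map):
    """Presentation: render the inferences as a custom report."""
    inf = inference_map["inference"]
    return "System is {} ({} ERROR entries).".format(inf["verdict"], inf["errors"])

log = ["[INFO] started", "[ERROR] disk full", "[INFO] retrying"]
print(present(process(interpret(log))))   # System is unhealthy (1 ERROR entries).
```

The point of the sketch is only the data flow: log files go in, a data map comes out of Interpretation, conclusions come out of Processing, and a report comes out of Presentation.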


One important choice here is the way of representing knowledge. This decision must be made carefully because the rest of the solution depends heavily on it. If there is a single factor that determines the success or failure of the entire solution, it is this. After analyzing the drawbacks of existing knowledge representation schemas and present-day requirements, I decided to use the mind map as the knowledge unit in the framework. Mind mapping is a popular activity people use to quickly organize day-to-day actions, thoughts, plans and even lecture notes. Research shows that mind maps resemble the organization of knowledge in the human brain more closely than sequential text does. Therefore it is a good form for human readability. Because of its visual form it is easy to change and visualize the contents of a mind map. On the other hand, computers can also process mind maps easily because they can be represented by a tree, a popular data structure that has been around since the beginning of computer programming. All the power of existing tree algorithms can be exploited when processing them. Since XML too can be mapped to a tree, mind maps are easily convertible to and from XML, which opens the door to utilizing existing XML tools in processing. In addition, mind maps can be combined with each other at node level, which is a desirable feature when mixing data from different sources.
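As a rough illustration of why the tree view matters, here is a small Python sketch; the `Node` class is hypothetical, not the framework's data type. It shows a mind-map node as a plain tree that converts to XML and merges with another map at node level.

```python
import xml.etree.ElementTree as ET

# Hypothetical mind-map node: a plain tree that maps one-to-one onto XML
# elements and can be combined with another map at the node level.

class Node:
    def __init__(self, text, children=None):
        self.text = text
        self.children = children or []

    def to_xml(self):
        elem = ET.Element("node", text=self.text)
        for child in self.children:
            elem.append(child.to_xml())
        return elem

    @staticmethod
    def from_xml(elem):
        return Node(elem.get("text"), [Node.from_xml(c) for c in elem])

    def merge(self, other):
        """Graft another map's children under this node (node-level mixing)."""
        self.children.extend(other.children)

build = Node("Build #42", [Node("warnings", [Node("deprecated API: 3")])])
tests = Node("Build #42", [Node("tests", [Node("failed: 1")])])
build.merge(tests)
print(ET.tostring(build.to_xml(), encoding="unicode"))
```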


This diagram shows the architecture of the entire system. The Parser, Execution Engine, Metadata and Data Types constitute the new scripting language, which I will explain later. The text and binary file readers serve the Interpretation module. The system exposes its functionality via a programming interface, which is marked here as the Control Code. In addition to the users of the generated reports, external systems can also interact with the system to use the analyzed data.


The framework includes a new scripting language targeting the three main phases of log file analysis. It is centered on mind maps and offers many convenient operations to handle them easily. All the syntax is configurable, which means one can define one's own syntax to make it look like a totally new language. One main application of this could be localized syntax. The syntax configuration is kept in a separate file on a per-script basis. Since mind maps can grow very large when used for analyzing huge logs, it is desirable to have strong filtering capabilities to bring out a set of nodes of interest at a glance. Our new language comes with advanced filtering capabilities for this; most of them are similar to the filtering features in jQuery. One other interesting feature is statement chaining. With this one can write a long statement like a story in one line and perform operations on many nodes with a single function call. I'll demonstrate this in the next slide. The new language also supports built-in and custom data types and functions, like all other languages.
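The language's actual syntax is not reproduced in this transcript, so the following Python sketch only mimics the idea of jQuery-style filtering over mind-map nodes combined with statement chaining; all names here are hypothetical.

```python
# Hypothetical fluent wrapper mimicking the filtering + chaining idea;
# a mind map is again just a nested dict here.

class NodeSet:
    def __init__(self, nodes):
        self.nodes = list(nodes)

    def children(self):
        return NodeSet(c for n in self.nodes for c in n.get("children", []))

    def where(self, predicate):
        """Keep only the nodes of interest, like a jQuery selector."""
        return NodeSet(n for n in self.nodes if predicate(n))

    def set(self, key, value):
        """Apply one operation to every node in the set with a single call."""
        for n in self.nodes:
            n[key] = value
        return self

root = {"text": "log", "children": [
    {"text": "ERROR: timeout", "children": []},
    {"text": "INFO: started", "children": []},
]}

# One chained, story-like statement: select, filter and act in a single line.
NodeSet([root]).children().where(lambda n: n["text"].startswith("ERROR")).set("flag", "investigate")
```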


The scripting language is specially designed to promote a programming model which I call the "Horizontal Programming Model". It is inspired by the pattern of referencing in natural language. In a text written in natural language, each sentence can refer to something mentioned in the previous sentence, but not to something said many sentences before. This neighbor-referencing model results in a human-friendly flow of ideas, much like a story. Horizontal programming is implemented by statement chaining coupled with filtering. A complete idea is expressed in only one or two lines of code, and this small snippet is independent of the rest of the script. If we consider the script as the complete rule base, then a snippet can be a single inference rule. This suits a non-developer better because it is closer to how an idea is expressed in human language. However, the typical general purpose programming style, which I call the "Vertical Programming Model", is also supported in case someone prefers it. This model is different because it promotes distant memory calls and the growth of code in the vertical direction. In the example provided in the blue box, the variable "Found" is defined on the 1st line and referred to again only on the 10th line. This model is better for expressing advanced logic, since not everything can be done using the horizontal model.
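Since the blue-box example from the slide is not reproduced here, the contrast between the two models is sketched below in plain Python with made-up data: the horizontal snippet stands alone, while the vertical version relies on a variable defined far from where it is used.

```python
root = {"children": [{"text": "ERROR: timeout"}, {"text": "INFO: started"}]}

# Horizontal model: one self-contained expression, written as a chain purely
# to mimic the story-like style; it refers only to its immediate neighbours.
[c.update(flag="investigate") for c in root["children"] if "ERROR" in c["text"]]

# Vertical model: a variable defined here...
found = [c for c in root["children"] if "ERROR" in c["text"]]
# ...possibly many intervening lines of unrelated logic...
for node in found:  # ...is referred to again only much later ("distant memory call").
    node["flag"] = "investigate"
```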



This diagram summarizes the final solution with respect to the solution overview we saw earlier. We have selected mind maps as the knowledge representation schema, and the three modules of the solution offer the features mentioned here. All three modules are driven by the new programming language and a set of complementary tools. It's important to note that the same unified mechanism is capable of serving the significantly different needs that arise inside these three modules.


This diagram illustrates an example use case for the system. Software applications and monitoring tools generate log files, and each log file is interpreted through a script. As a result we get a mind map for each log file containing the data extracted from it. Then another script is used to aggregate these data in a meaningful way into a single mind map; we can call this the data map. Now we apply the rule base to this data map to generate inferences. This results in an inference mind map, which can then be used either by external systems for their own purposes or by the presentation script to generate a set of reports for various stakeholders. Though this is not the only way to use the framework, this scenario covers most actions involved in a typical log analysis procedure.
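For the aggregation step in particular, here is a hypothetical sketch (plain dicts again, invented file names) of how per-file mind maps might be merged into a single data map before the rule base runs:

```python
# Per-log-file mind maps produced by the interpretation scripts (invented data).
per_file_maps = [
    {"app.log":    {"errors": 2}},
    {"server.log": {"errors": 0, "warnings": 5}},
]

# Aggregation script: graft each file's sub-tree under one "data map" root.
data_map = {"data": {}}
for file_map in per_file_maps:
    data_map["data"].update(file_map)

print(data_map)
# {'data': {'app.log': {'errors': 2}, 'server.log': {'errors': 0, 'warnings': 5}}}
```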




With this we can conclude that "The new framework provides a unified platform for generic log analysis. It enables users to perform different tasks in a homogeneous fashion. In addition it formulates infrastructure for a shared rule base". The possibility of a shared rule base is important because it gives organizations and communities dealing with the same tools and software great power to reuse expert knowledge.
 

There are a few possible improvements that would make the framework more useful in the domain. Since some software applications and tools are widely used in software development, the framework could be accompanied by a set of scripts to interpret their logs, so that not everyone has to come up with their own version. One drawback of using the framework's scripting language for interpreting log files is that the script does not reflect the format and structure of the log file or its mapping to the mind map, so readability is poor. A solution would be to develop a new declarative language that maps the information of interest in a log file to a mind map and to generate the script from that declaration under the hood. I have already done some work on this and have submitted a paper to another conference. Apparently most expert rules are easier to state in vague terms than to express in crisp logic. Therefore it would be a good idea to add the capability to work with fuzzy rules as well. Although it's possible in the current implementation to write a script to generate custom reports, the task would be much more intuitive if the report format could be designed in an integrated development environment with a visual designer. Developing such a designer is one more interesting future improvement.



That ends the presentation. Thank you for listening.

Friday, April 29, 2011

Did somebody say "It Depends"?


In a Toastmasters club, there is a role named "Ah Counter". The duty of this person is to listen carefully to the speakers during a session and note the gap-filler words and sounds that appear between meaningful phrases. They call them "crutch words". These include sounds and words like "ah", "um", "so", "but", etc. In addition, if a speaker habitually puts words such as "kind of", "sort of", "you know", etc. everywhere, those are also regarded as crutch words. After the speech, the Ah Counter provides a report of the number of crutch words uttered by each speaker. This helps a speaker to identify the crutch words he uses most frequently (mostly unknowingly) and to put in a conscious effort during the next speech to minimize them. I know from experience that this feedback approach helps greatly. I was surprised by the high crutch word counts reported for my initial speeches in the Toastmasters program (conducted in our company) because I did not utter any of them intentionally. However, after becoming conscious of them I was able to reduce the number significantly. During my last prepared speech I uttered only one or two crutch words.

The Toastmasters community says that the crutch word count is an indicator of the preparedness of a speech. The argument is that, if he is not well prepared, a speaker will tend to use those words to fill gaps in his speech while thinking about the next thing to say. This is a sound argument, and I realized the truth of it after noticing that I always get a significantly higher crutch word count in my unprepared speeches (there is a separate Toastmasters session for impromptu topics) than in prepared ones. Almost every speaker in the club showed an improvement in crutch word usage throughout the program, indicating that they were preparing better for their speeches and becoming better public speakers.

I think there is another meaning to the crutch word count too. In my opinion, a speaker may use a gap-filler word like "kind of" or "sort of" when he is not sure about what he is saying. This form of crutch word can appear even in written forms like articles. For example, I used the term "gap-filler words" at the beginning of this article when introducing crutch words. If I had not been certain about the appropriateness of that term, I would have written "sort of gap-filler words". Ideally, what I should do in such a doubtful situation is put in some extra effort to verify the appropriateness of the term or to find a better alternative. Instead of doing that, I hide the uncertainty inside the term "sort of" so that I am not responsible even if the term that follows is not a good fit. This is none other than cheating. I am cheating the listener or the reader by concealing my laziness. After getting to know about crutch words, I noticed that many of my previous speeches and writings exploited this cheap trick, and every day I see other speakers and writers doing the same thing. The general rule is that if you use terms like "kind of" and "sort of" unnecessarily, you really do not know what you are saying.

We have all heard people using the term "it depends" during technical discussions. Some of them put it at the beginning of every fact they state, particularly when answering questions. Is there any meaning to this term whatsoever? As a listener, I already know that "it depends". Specifically, I know that everything in the world depends on something else. I do not need to read Stephen Hawking's "A Brief History of Time" to understand that. What the heck is the need for a speaker to utter this term? I guess the reason is the same as the one mentioned in the previous paragraph. They say "it depends" because they do not really know what they are talking about. After saying "it depends" one can say anything. He is shielded from any criticism of or questioning about what he says, because he has diluted it with those first two words and hence does not stand behind it. I suggest regarding "it depends" as a crutch word in technical speeches. It is okay to use it when the dependency really counts, in which case the speaker is responsible for explaining each dependency and its effect.

The Toastmasters program helped me get rid of unnecessary words in public speaking and to identify unprepared and dishonest speakers (and writers). I hope this article has helped you, the reader, to be cautious about this too. Your feedback is much appreciated, so please use the comments section.

Saturday, March 19, 2011

Automatic log file analysis


keywords: log data extraction, record expert knowledge, mind maps, expert systems, Application Verifier

I'm currently engaged in research on automatic log file analysis. I came across this idea during my MSc research on software quality verification. When it comes to black box testing, there are many handy tools that analyze a certain aspect of an application. These aspects may be CPU utilization, memory consumption, IO efficiency or low-level API call failures. One prominent problem is the expertise required to use these tools. Even for experts the process takes a lot of time. For example, I have been using a free Microsoft tool called Application Verifier which keeps an eye on an application's virtual memory errors, heap errors, access failures due to improper access rights, incorrect usage of locks (which may result in hangs or crashes), exceptions, corrupt Windows handles, etc. It is a very useful tool for capturing application errors that are impossible or extremely difficult to identify in a manual QA process. Even with experience it takes me about 2 days to test a product with this tool before a release. Given the hectic schedules close to a release, what happens more often than not is that I do not get a chance to do this test. Another problem is that there is no good way to record my analysis knowledge so that someone else, or "something" else, can perform the analysis if I'm busy with other work. Sequential text, the popular form of recording knowledge, is not a good option in this case for several reasons. First, it is difficult to write documents in sequential text form (I think most developers agree with me on this). Then it is difficult for someone else to understand it, due to the inherent ambiguity of natural language. Furthermore, a program (this is the "something" I was referring to) cannot understand it and so cannot perform an automated analysis.

Almost all the analysis tools out there generate some form of log file. The big majority of them are text files, either XML or flat text. If we can come up with a mechanism to extract the information from these log files, then the analysis procedure can be partly automated. The challenge is to devise a scheme that can deal with the wide variety of proprietary structures of these log files. Though there are a bunch of tools available for log data extraction, all of them are bound to a specific log file structure. All the log analysis tools I found are web log analyzers; they analyze the logs generated by either the Apache web server or IIS. One cannot use them to analyze any other log file. An additional restriction is that the reports generated after the analysis are predefined. One cannot craft customized reports for a specific need.
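As a rough illustration of the kind of mechanism meant here (not an existing tool, and not the eventual solution itself), extraction can be made format-agnostic by describing each proprietary log layout as configuration rather than code:

```python
import re

# The format is described as data (field name -> regex with one capture group),
# so the same extractor can be pointed at different proprietary log layouts.
FORMAT = {
    "timestamp": r"^(\S+ \S+)",
    "level":     r"\[(\w+)\]",
    "message":   r"\]\s+(.*)$",
}

def extract(line, fmt):
    record = {}
    for field, pattern in fmt.items():
        match = re.search(pattern, line)
        if match:
            record[field] = match.group(1)
    return record

print(extract("2011-03-19 10:04:11 [ERROR] connection refused", FORMAT))
# {'timestamp': '2011-03-19 10:04:11', 'level': 'ERROR', 'message': 'connection refused'}
```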

There's one more dimension that highlights the importance of automated log file analysis. The majority of software products themselves generate log files. These logs are analyzed by product experts during troubleshooting. Each product has its own log file format, and the knowledge required for reading the logs and drawing conclusions lies only within a limited group of product experts. As a product matures, it is highly likely that some troubleshooting patterns emerge over time. However, there is no means of recording the knowledge about these recurring patterns for later use by the same expert, by others or by an automation program.

The tasks associated with log file analysis are information extraction, inference, report generation and expert knowledge recording. What I'm working on is a unified mechanism to automate all these tasks. I'm trying to do it with a new, simple scripting language based on mind maps. I will write more about the solution as my research progresses. Please keep me posted (dilj220@gmail.com) about:

  • Any automated log analysis tool known to you
  • Any other reason or scenario that comes to your mind for automated log file analysis
  • The features that you expect as a developer / QA engineer / product expert / manager from an automatic log file analysis tool

Friday, March 4, 2011

Software Quality Verifier Framework


I completed my MSc thesis a couple of weeks back. My project was developing a Software Quality Verification Framework. Given the value of early bug detection in the software life cycle, the framework addresses both white box and black box testing.

White box testing
White box testing is implemented in two phases.
1. Commit-time analysis - Here, the code is automatically analyzed when the developer tries to commit new code or code changes to the code repository. Quality verification is done by running tools against a predefined set of rules. The commit is rejected if the code does not conform to the rules, and the developer is informed of the reasons for rejection in the svn client interface. This functionality is implemented using svn hooks (a rough hook sketch follows this list). Example output is as follows.

2. Offline analysis - A more thorough analysis is performed offline, for example in the context of a nightly build. The results of the analysis are displayed on a dashboard which shows various analytics and provides violation drill-downs to the code level. Automatic emails can be configured to inform various stakeholders about the overall health of the system and the developers' technical debt. This is implemented using a tool named Sonar (http://www.sonarsource.org/).
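For the commit-time phase mentioned in point 1, here is a rough sketch of how such an svn pre-commit hook could be wired up in Python; the check shown is a placeholder, not the rule set used in the thesis.

```python
#!/usr/bin/env python3
# Subversion runs the pre-commit hook as: pre-commit <repository> <txn-name>.
# A non-zero exit rejects the commit, and stderr is shown in the svn client.
import subprocess
import sys

repo, txn = sys.argv[1], sys.argv[2]

changed = subprocess.run(["svnlook", "changed", "-t", txn, repo],
                         capture_output=True, text=True, check=True).stdout

violations = []
for entry in changed.splitlines():
    if len(entry) < 5:
        continue
    action, path = entry[0], entry[4:]            # e.g. "A   trunk/src/Foo.java"
    if action in ("A", "U") and path.endswith(".java"):
        source = subprocess.run(["svnlook", "cat", "-t", txn, repo, path],
                                capture_output=True, text=True, check=True).stdout
        # Placeholder rule: a real hook would run static-analysis tools here.
        if "System.out.println" in source:
            violations.append(path + ": debug output is not allowed")

if violations:
    sys.stderr.write("Commit rejected:\n" + "\n".join(violations) + "\n")
    sys.exit(1)
sys.exit(0)
```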

Black box testing
It was identified during the research that quite a number of tools have recently become available for evaluating a software product at run time, without looking at the source code that generated it. Different tools evaluate a product on different aspects such as memory usage (corruption, leaks), IO usage, operating system calls, performance, access right violations, etc. However, there's no tool that combines the results generated by these individual tools to automatically produce a product health profile, as Sonar does with the white box testing tools. There are two main problems associated with the approach of using individual tools manually to perform tests.
1. Tool usage requires expertise and is also laborious
2. There's no way to record or automate a once-identified troubleshooting (or evaluation) procedure

I thought about different solutions for this. Noting that almost all the tools generate textual output in the form of a log file, I decided to implement a way to automatically extract the information of interest in a given context from those log files and to generate reports for consumption by various parties such as project managers, developers and technical leads. The output was a simple scripting language based on mind maps. Developers can write scripts in this language to extract information from various log files, derive conclusions based on it and generate reports.

The following is the architecture of the framework. I will blog more about the framework later.