Saturday, March 19, 2011

Automatic log file analysis


keywords: log data extraction, record expert knowledge, mind maps, expert systems, Application Verifier

I'm currently doing research on automatic log file analysis. I came across the idea during my MSc research on software quality verification. When it comes to black box testing, there are many handy tools that analyze a particular aspect of an application, such as CPU utilization, memory consumption, I/O efficiency or low-level API call failures. One prominent problem with these tools is that using them requires expertise, and even for experts the process takes a lot of time. For example, I have been using a free Microsoft tool called Application Verifier, which keeps an eye on an application's virtual memory errors, heap errors, access failures due to improper access rights, incorrect usage of locks (which may result in hangs or crashes), exceptions, corrupt Windows handles, etc. It is a very useful tool for capturing application errors that are impossible or extremely difficult to identify in a manual QA process. Even with experience, it takes me about two days to test a product with this tool before a release. Given the hectic schedules close to a release, more often than not I do not get a chance to run this test.

Another problem is that there is no good way to record my analysis knowledge so that someone else, or "something" else, can perform the analysis when I'm busy with other work. Sequential text, the most common way of recording knowledge, is not a good option here for several reasons. First, documents in sequential text form are difficult to write (I think most developers would agree with me on this). Second, they are difficult for others to understand because natural language is inherently ambiguous. Furthermore, a program (this is the "something" I was referring to) cannot interpret them well enough to perform an automated analysis.

Almost all the analysis tools out there generate some form of log file, and the vast majority of these are text files, either XML or flat text. If we can come up with a mechanism to extract the information from these log files, the analysis procedure can be partly automated. The challenge is to devise a scheme that can deal with the wide variety of proprietary structures these log files have. Although several tools are available for log data extraction, each of them is bound to a specific log file structure. All the log analysis tools I found are web log analyzers: they analyze the logs generated by either the Apache web server or IIS, and cannot be used to analyze any other kind of log file. An additional restriction is that the reports they generate are predefined, so one cannot craft customized reports for a specific need.
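One way to approach the extraction problem is to describe each log format as data rather than as code. The Python sketch below is only an illustration under stated assumptions: the format name `example_app`, its line layout, the XML element name `entry`, and the helpers `extract_text` and `extract_xml` are all hypothetical, and the XML shape is not the actual Application Verifier log schema. A flat-text format is described by a regular expression with named groups; an XML format by the tag of the elements whose attributes carry the data.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List

# Hypothetical format descriptions: each flat-text format is a regular
# expression with named groups, so supporting a new format means adding
# data here rather than writing a new parser.
TEXT_FORMATS: Dict[str, str] = {
    "example_app": r"^(?P<time>\d{2}:\d{2}:\d{2}) \[(?P<level>\w+)\] (?P<message>.*)$",
}


def extract_text(lines: Iterable[str], pattern: str) -> List[Dict[str, str]]:
    """Return one record (field name -> value) per line matching the pattern."""
    regex = re.compile(pattern)
    records = []
    for line in lines:
        match = regex.match(line.rstrip("\n"))
        if match:  # lines that do not fit the format are skipped
            records.append(match.groupdict())
    return records


def extract_xml(text: str, element: str) -> List[Dict[str, str]]:
    """Return the attributes of every element with the given tag as a record."""
    root = ET.fromstring(text)
    return [dict(node.attrib) for node in root.iter(element)]


if __name__ == "__main__":
    sample_lines = [
        "10:15:02 [ERROR] Heap corruption detected",
        "this line does not follow the format",
    ]
    print(extract_text(sample_lines, TEXT_FORMATS["example_app"]))

    # Hypothetical XML layout, for illustration only.
    sample_xml = '<log><entry level="ERROR" code="7"/><entry level="INFO"/></log>'
    print(extract_xml(sample_xml, "entry"))
```

Both functions produce the same kind of output, a list of field-to-value dictionaries, so later steps such as inference and reporting need not care which format a record came from. Real log formats (multi-line entries, nested XML) would need richer descriptions than a single pattern or tag name.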

There is one more dimension that highlights the importance of automated log file analysis. Most software products generate log files of their own, and product experts analyze these logs when troubleshooting. Each product has its own log file format, and the knowledge required to read the logs and draw conclusions lies only with a limited group of product experts. As a product matures, recurring troubleshooting patterns are highly likely to emerge. However, there is no means of recording knowledge of these patterns for later use by the same expert, by others, or by an automation program.
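To illustrate what recording a troubleshooting pattern in machine-readable form could look like, here is a minimal Python sketch that stores each pattern as a simple rule and checks it against records extracted from a log. It is a plain keyword-and-count rule check chosen for illustration, not the mind-map based language mentioned below; the `Rule` fields, the `apply_rules` function, and the sample rules are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Rule:
    """One recorded troubleshooting pattern (hypothetical structure)."""
    name: str
    level: str       # log level the pattern looks at, e.g. "ERROR"
    keyword: str     # substring that must appear in the message
    min_count: int   # number of matching records needed to trigger the rule
    conclusion: str  # what the expert would conclude from the pattern


def apply_rules(records: List[Dict[str, str]], rules: List[Rule]) -> List[str]:
    """Return a finding for every rule whose pattern occurs often enough."""
    findings = []
    for rule in rules:
        hits = sum(
            1
            for record in records
            if record.get("level") == rule.level
            and rule.keyword in record.get("message", "")
        )
        if hits >= rule.min_count:
            findings.append(f"{rule.name}: {rule.conclusion} ({hits} occurrences)")
    return findings


if __name__ == "__main__":
    # Hypothetical records, shaped like the output of a log extraction step.
    records = [
        {"level": "ERROR", "message": "Connection to database timed out"},
        {"level": "ERROR", "message": "Connection to database timed out"},
        {"level": "INFO", "message": "Retrying request"},
    ]
    # Hypothetical expert knowledge, written down once and reused on every run.
    rules = [
        Rule("db-timeout", "ERROR", "timed out", 2,
             "database server is probably unreachable or overloaded"),
        Rule("disk-full", "ERROR", "No space left", 1,
             "the log volume has run out of disk space"),
    ]
    for finding in apply_rules(records, rules):
        print(finding)
```

Because the rules are data, the same expert, a colleague, or a scheduled job can rerun them on every new log; a real knowledge-recording scheme would need richer conditions (ordering, time windows, correlations across logs) than this sketch supports.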

The tasks associated with log file analysis are information extraction, inference, report generation and expert knowledge recording. What I'm working on is a unified mechanism to automate all of these tasks, which I'm trying to build around a new, simple scripting language based on mind maps. I will write more about the solution as my research progresses. Please let me know (dilj220@gmail.com) about:

  • Any automated log analysis tool known to you
  • Any other reason or scenario that comes to your mind for automated log file analysis
  • The features you would expect from an automatic log file analysis tool as a developer / QA engineer / product expert / manager
