Passive learning with logging frameworks

title:Passive learning with logging frameworks
keywords:passive learning logs
contact:J.J.G. Meijer MSc & M. Gerhold MSc & dr. M.I.A. Stoelinga


Passive learning is the process of generating (probabilistic) state machines from finite traces. These traces are typically the outputs of a running system, for instance a sequence of responses from a web server. The learned state machines can be used to identify the behavior of the observed system. Once the behavior is identified, bugs can be fixed, or features added or changed where necessary.
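
As an illustration of what "generating a state machine from traces" can look like, the sketch below builds a prefix tree with transition counts from a handful of traces. This is only an assumed starting point (state-merging learners commonly begin from such a tree and then merge states); it is not the method of any particular tool. The function name `build_prefix_tree` and the example events are hypothetical.

```python
from collections import defaultdict


def build_prefix_tree(traces):
    """Build a prefix tree from a list of traces (hypothetical helper).

    Returns (transitions, counts):
      transitions maps (state, event) -> next state, with state 0 as the root;
      counts maps (state, event) -> number of traces using that transition.
    """
    transitions = {}
    counts = defaultdict(int)
    next_state = 1
    for trace in traces:
        state = 0
        for event in trace:
            key = (state, event)
            if key not in transitions:
                # First time this prefix is seen: allocate a fresh state.
                transitions[key] = next_state
                next_state += 1
            counts[key] += 1
            state = transitions[key]
    return transitions, dict(counts)


# Made-up traces: each one is a request followed by a response code.
traces = [
    ["GET", "200"],
    ["GET", "404"],
    ["GET", "200"],
]
transitions, counts = build_prefix_tree(traces)
```

Dividing each count by the total number of transitions leaving the same state gives an estimate of the transition probabilities, which is where the "probabilistic" part comes in.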

Passive learning often produces better state machines when many traces are available that characterize the behavior of the system. These traces are typically not long (~100 events), but there should be many of them (> 10,000).

We would like to have a procedure that can produce state machines from the events logged by popular logging frameworks such as log4j, log4net, or even rsyslogd (which collects system log messages on Linux, including those from the kernel). The main problem is that the logs generated by these frameworks are, without preprocessing, single traces: they are extremely long, and there are few of them. These logs may be rotated and stored as one log file per day, but the start and end of a day typically do not mark the start and end of a particular behavior, such as starting and stopping a service.

The goal is to automate, to the largest degree feasible, the splitting of such log files, e.g. by manually placing a few "barriers" in the log file and having an algorithm extend these barriers to the rest of the log file. The barriers mark the beginning and end of individual traces; these traces can then be fed to passive learning tools such as DFASAT or ProM.
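
A minimal sketch of the barrier idea, under strong simplifying assumptions: a barrier is treated as the start of a new trace only, and a manually marked barrier is "extended" to every other line that has the same shape once digit runs are masked. The log lines, the function names `template` and `split_at_barriers`, and the masking rule are all hypothetical; a real solution would need a more robust notion of similarity between log lines.

```python
import re


def template(line):
    """Reduce a log line to a crude template by masking every run of digits."""
    return re.sub(r"\d+", "<N>", line).strip()


def split_at_barriers(lines, marked_indices):
    """Split one long log into traces (hypothetical helper).

    marked_indices are the positions of the manually placed barriers.
    Every line whose template equals that of a marked line starts a new trace.
    """
    barrier_templates = {template(lines[i]) for i in marked_indices}
    traces, current = [], []
    for line in lines:
        if template(line) in barrier_templates and current:
            # A barrier closes the trace collected so far.
            traces.append(current)
            current = []
        current.append(line)
    if current:
        traces.append(current)
    return traces


# Made-up log: only the first line is marked as a barrier by hand.
log = [
    "10:00:01 service started pid=101",
    "10:00:02 request handled",
    "10:05:00 service stopped",
    "11:30:10 service started pid=202",
    "11:30:12 request handled",
    "11:31:00 request handled",
    "11:40:00 service stopped",
]
traces = split_at_barriers(log, marked_indices=[0])
```

Here the single marked line generalizes to the second "service started" line, so the log is split into two traces. The resulting traces would still have to be converted to the input format expected by the chosen learner.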