Anyone interested is welcome to try some tools for rolling logs up system debug hills; see http://www.cs.sandia.gov/sisyphus/. Here's the README:

Welcome to the sisyphus toolkit!
Version 0.9beta (Nov 5, 2004)

This is a snapshot of some tools created by a project with the following charter:

  With the specific goal of increasing supercomputer RAS (reliability, availability, and serviceability), we intend to produce a machine-learning analysis system which enables content-novice analysts to efficiently understand evolving trends, identify anomalies, and investigate cause-effect hypotheses in large multiple-source event log sets.

Currently it provides two independent tools (teirify and slctify) which address the first two items above by automatically generating regular expressions of messages in your logfiles, categorized by increasing anomaly: common, deviant, and anomalous. Common are those types which occur at least k times (k is an input argument), deviant are messages which appear fewer than k times but are similar in content to common messages, and anomalous are messages which are anomalous in both content and occurrence (fewer than k times, and unlike any common message). A simple GUI is included for efficient review of results. This provides an efficient means to define "normal", and thus provides a basis to detect "abnormal". See pdfs in doc/ieee_cluster04 for more details.

The included version of SLCT is pretty heavily modified, but it still generates some non-ideal "message types" imho (because it is clustering frequent words, not frequent word combinations). Teiresias mines combinations, and so does a bit better in this regard. However, SLCT's successor (http://kodu.neti.ee/~risto/loghound/) also specifically addresses this issue. Am investigating further...

I'm actively working on using latent semantic analysis to analyze logs.
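To make the common/deviant/anomalous split in the README concrete, here is a minimal Python sketch. It is not the toolkit's algorithm: teirify and slctify derive message types with Teiresias and SLCT, whereas this stand-in simply masks digit-bearing tokens and compares word sets. The function names, the Jaccard similarity measure, and the `min_sim` threshold are all assumptions made for illustration; only `k` comes from the README.

```python
from collections import Counter


def template(line):
    """Crude stand-in for a message type: mask any token containing a digit."""
    return " ".join("*" if any(c.isdigit() for c in tok) else tok
                    for tok in line.split())


def jaccard(a, b):
    """Word-set overlap between two templates, from 0.0 to 1.0."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def categorize(lines, k, min_sim=0.5):
    """Label each template as common, deviant, or anomalous.

    common:    template occurs at least k times
    deviant:   fewer than k times, but similar to some common template
    anomalous: fewer than k times and unlike every common template
    min_sim is a hypothetical similarity cutoff, not a toolkit parameter.
    """
    counts = Counter(template(line) for line in lines)
    common = {t for t, n in counts.items() if n >= k}
    labels = {}
    for t in counts:
        if t in common:
            labels[t] = "common"
        elif any(jaccard(t, c) >= min_sim for c in common):
            labels[t] = "deviant"
        else:
            labels[t] = "anomalous"
    return labels
```

Calling `categorize(lines, k=3)` on a list of log lines returns a dict mapping each masked template to its label. Raising k moves more message types out of "common", which is the knob the README describes.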
I've currently got code to generate term-doc matrices, perform various term weightings, compute SVD and doc-doc similarities, and produce novel visualization (http://www.cs.sandia.gov/projects/VxInsight.html). These tools (except vxinsight) will be included in the 1.0 version of the sisyphus toolkit, but they're not quite ready for release. Contact me for more info if really interested.

This toolkit is pretty much a hack. I find it useful and interesting, so I thought I'd make it available to others. I'll add more documentation etc. as interest is expressed and time allows. Feedback and/or questions are welcome.

Jon Stearley
jrstear@private

_______________________________________________
LogAnalysis mailing list
LogAnalysis@private
http://lists.shmoo.com/mailman/listinfo/loganalysis
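For readers unfamiliar with the latent semantic analysis steps named in the post (term-doc matrix, term weighting, SVD, doc-doc similarity), the following Python sketch shows one way they fit together. It is not the unreleased sisyphus code. It assumes NumPy is available, treats each log message as a "document", and uses a smoothed IDF weighting chosen purely for illustration; all function names are hypothetical.

```python
import numpy as np


def term_doc_matrix(docs):
    """Rows are terms, columns are documents, entries are raw counts."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            A[index[w], j] += 1
    return A, vocab


def idf_weight(A):
    """One possible term weighting: scale each term by log(1 + N / df)."""
    df = np.count_nonzero(A, axis=1)      # number of docs containing each term
    idf = np.log(1.0 + A.shape[1] / df)   # every vocabulary term has df >= 1
    return A * idf[:, None]


def lsa_doc_similarity(A, rank):
    """Cosine similarity between documents in a rank-reduced SVD space."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    coords = (np.diag(s[:rank]) @ Vt[:rank]).T   # one row per document
    norms = np.linalg.norm(coords, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                      # avoid dividing by zero
    unit = coords / norms
    return unit @ unit.T
```

A typical call chain would be `A, vocab = term_doc_matrix(docs)` followed by `lsa_doc_similarity(idf_weight(A), rank=2)`. The rank is a tuning choice: smaller values merge related messages more aggressively.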