[logs] sisyphus toolkit

From: Jon Stearley (jrstear@private)
Date: Fri Nov 05 2004 - 11:08:32 PST


Anyone interested is welcome to try some tools for rolling logs up
system debug hills, see http://www.cs.sandia.gov/sisyphus/.  
Here's the README:


		Welcome to the sisyphus toolkit!
		 Version 0.9beta (Nov 5, 2004)

This is a snapshot of some tools created by a project with the
following charter:
  With the specific goal of increasing supercomputer RAS (reliability,
  availability, and serviceability), we intend to produce a
  machine-learning analysis system which enables content-novice
  analysts to efficiently understand evolving trends, identify
  anomalies, and investigate cause-effect hypotheses in large
  multiple-source event log sets.

Currently it provides two independent tools (teirify and slctify)
which address the first two items above by automatically generating
regular expressions for the messages in your logfiles, categorized by
increasing anomaly: common, deviant, and anomalous.  Common are those
types which occur at least k times (k is an input argument), deviant
are messages which appear fewer than k times but are similar in
content to common messages, and anomalous are messages which appear
fewer than k times and are unlike common messages in content.  A
simple GUI is included for quick review of results.  This provides an
efficient means to define "normal", and thus a basis to detect
"abnormal".  See pdfs in doc/ieee_cluster04 for more details.

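The three categories above can be illustrated with a minimal sketch.
This is not how teirify or slctify work internally: the digit-masking
"message type", the Jaccard similarity, and the min_sim threshold are
all assumptions made for illustration.  Only the role of k comes from
the description above.

```python
import re
from collections import Counter

def message_type(line):
    """Crude stand-in for a message type: mask tokens containing digits.
    (Hypothetical; the real tools mine patterns with Teiresias or SLCT.)"""
    return " ".join("*" if re.search(r"\d", tok) else tok
                    for tok in line.split())

def jaccard(a, b):
    """Token-set similarity between two message types (an assumption)."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def categorize(lines, k, min_sim=0.5):
    """Label each message type as common, deviant, or anomalous."""
    counts = Counter(message_type(line) for line in lines)
    common = {t for t, n in counts.items() if n >= k}
    labels = {}
    for t in counts:
        if t in common:
            labels[t] = "common"          # occurs at least k times
        elif any(jaccard(t, c) >= min_sim for c in common):
            labels[t] = "deviant"         # rare, but resembles a common type
        else:
            labels[t] = "anomalous"       # rare and unlike any common type
    return labels

logs = [
    "sshd[101]: session opened for user alice",
    "sshd[102]: session opened for user alice",
    "sshd[103]: session opened for user alice",
    "sshd[104]: session closed for user alice",
    "kernel: machine check exception",
]
categories = categorize(logs, k=3)
for msg_type, label in categories.items():
    print(label, "|", msg_type)
```

With k=3 the "session opened" type is common, the lone "session
closed" line is deviant, and the kernel message is anomalous.
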
The included version of SLCT is pretty heavily modified, but it still
generates some non-ideal "message types" imho (because it is
clustering frequent words, not frequent word combinations).  Teiresias
mines combinations, and so does a bit better in this regard.
However, SLCT's successor (http://kodu.neti.ee/~risto/loghound/) also
specifically addresses this issue.  Am investigating further...
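
The difference between frequent words and frequent word combinations
can be seen in a small sketch.  It is a simplified illustration in the
spirit of a word-frequency pass, not SLCT's actual algorithm (its
later passes are omitted), and the function names, the support value,
and the sample lines are assumptions.

```python
from collections import Counter

def frequent_words(lines, support):
    """Return the (position, word) pairs seen in at least `support` lines."""
    counts = Counter()
    for line in lines:
        for pos, word in enumerate(line.split()):
            counts[(pos, word)] += 1
    return {pw for pw, n in counts.items() if n >= support}

def pattern(line, frequent):
    """Keep frequent words in place and replace everything else with '*'."""
    return " ".join(w if (i, w) in frequent else "*"
                    for i, w in enumerate(line.split()))

lines = [
    "disk error on node1",
    "disk error on node2",
    "fan ok on node3",
    "fan ok on node4",
    "disk ok on node5",
]
freq = frequent_words(lines, support=2)

# Count how many lines actually support each resulting pattern.
pattern_support = Counter(pattern(line, freq) for line in lines)
for pat, n in pattern_support.items():
    print(n, pat)
```

Every word in "disk ok on *" is frequent on its own, yet that
combination is supported by a single line, so a type built only from
word frequencies can look more established than it is.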

I'm actively working on using latent semantic analysis to analyze
logs.  I've currently got code to generate term-doc matrices, perform
various term weightings, compute SVD and doc-doc similarities, and
produce novel visualizations
(http://www.cs.sandia.gov/projects/VxInsight.html).
These tools (except vxinsight) will be included in the 1.0 version of
the sisyphus toolkit, but they're not quite ready for release.
Contact me for more info if really interested.
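
For readers unfamiliar with these steps, here is a minimal sketch of a
term-doc matrix, one possible term weighting (tf-idf), and doc-doc
cosine similarity.  It is not the unreleased code: the function names
and sample messages are assumptions, and the SVD step is left out to
keep the sketch dependency-free.  In latent semantic analysis the
similarities would be computed after projecting the documents into the
reduced space given by a truncated SVD.

```python
import math
from collections import Counter

def term_doc_matrix(docs):
    """Rows are terms (sorted), columns are documents, cells are counts."""
    vocab = sorted({w for d in docs for w in d.split()})
    counts = [Counter(d.split()) for d in docs]
    return vocab, [[c[t] for c in counts] for t in vocab]

def tfidf(matrix):
    """Weight each count by log(N / document frequency) of its term."""
    n_docs = len(matrix[0])
    weighted = []
    for row in matrix:
        df = sum(1 for x in row if x > 0)
        idf = math.log(n_docs / df)
        weighted.append([x * idf for x in row])
    return weighted

def cosine(matrix, i, j):
    """Cosine similarity between document columns i and j."""
    a = [row[i] for row in matrix]
    b = [row[j] for row in matrix]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

docs = [
    "link down on port",
    "link up on port",
    "fan speed low",
]
vocab, counts = term_doc_matrix(docs)
weights = tfidf(counts)
print(cosine(weights, 0, 1), cosine(weights, 0, 2))
```

The two link messages share weighted terms and so score above zero,
while the fan message shares none with them and scores zero.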

This toolkit is pretty much a hack.  I find it useful and interesting
so thought I'd make it available to others.  I'll add more
documentation etc as interest is expressed and time allows.  Feedback
and/or questions are welcome.

Jon Stearley	jrstear@private

_______________________________________________
LogAnalysis mailing list
LogAnalysis@private
http://lists.shmoo.com/mailman/listinfo/loganalysis


