Re: [logs] what is normal ?

From: Jon Stearley (jrstearat_private)
Date: Wed Oct 30 2002 - 09:52:08 PST


    On Wed, Oct 30, 2002 at 07:27:12AM -0500, Chris Brenton wrote:
    > On Tue, 2002-10-29 at 23:04, Marcus J. Ranum wrote:
    > > Dale.Drewat_private wrote:
    > > >You need to be able to look for
    > > >"abnormal" patterns in log data
    > > 
    > > I'd like to know how to do this. Any pointers?
    > 
    > ;-)
    > 
    > I think Marcus himself has probably posted the best ideas along this
    > thread, namely his whole concept of "stateful logging". Key in on what
    > you know and understand to be normal, question everything else. The
    > entries might in fact be "abnormal", or they could be false positives.
    > Best way to tell is to have a clueful human on the back end sorting it
    > out. From there it's just a matter of tweaking the system to reduce the
    > false positive rate.
    
    i've been tinkering with an ai-aided approach and will toss in my 2c.  
    
    i'm using the teiresias algorithm
    (http://cbcsrv.watson.ibm.com/Tspd.html) to classify log lines.  this
    does pretty well at classifying the lines (teiresias calls the patterns
    it finds motifs; i basically use it to write functional equivalents of
    regular expressions for me).  those lines which don't match any motif
    (which motifs are found depends on the l,w,k inputs to the algorithm)
    are "anomalous".  the total number of unique words and motifs in a
    given set of lines is easy to calculate and provides two additional
    (high-level) metrics.
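
    a minimal sketch of the classify-then-flag idea above.  this is NOT
    teiresias: as a stand-in it builds a crude template per line by
    wildcarding every token that contains a digit, and treats any template
    seen at least k times as a "motif".  the digit rule, the names
    template() and classify(), and the way k is used here are all
    assumptions made for illustration; l and w are not modelled at all.

```python
import re
from collections import Counter

# Hypothetical rule: a token containing any digit is treated as variable.
_VARIABLE = re.compile(r"\d")


def template(line):
    """Replace digit-bearing tokens with '*' to get a crude line template."""
    return " ".join("*" if _VARIABLE.search(tok) else tok for tok in line.split())


def classify(lines, k=2):
    """Return (motifs, anomalous_lines, n_unique_words, n_motifs).

    A template counts as a motif when at least k lines share it; lines
    whose template is not a motif are reported as anomalous.
    """
    counts = Counter(template(line) for line in lines)
    motifs = {t for t, n in counts.items() if n >= k}
    anomalous = [line for line in lines if template(line) not in motifs]
    unique_words = {tok for line in lines for tok in line.split()}
    return motifs, anomalous, len(unique_words), len(motifs)


if __name__ == "__main__":
    sample = [
        "sshd[101]: accepted password for root",
        "sshd[202]: accepted password for root",
        "kernel: out of memory",
    ]
    motifs, anomalous, n_words, n_motifs = classify(sample, k=2)
    print(motifs, anomalous, n_words, n_motifs)
```

    the two counts returned at the end correspond to the unique-word and
    motif totals mentioned above; a real motif finder would discover the
    wildcard positions itself rather than rely on a digit rule.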
    
    i'm currently working on how these metrics change over time, and on
    calculating the time characteristics of the motifs (ie- this motif
    occurs (bursty, constant, periodic, etc) at some rate), aiming to
    cluster them into "event" groups (ie- these X lines usually appear
    together, at the following rate, over the following duration (rigidly
    fixed, or not), etc).  i expect that comparing all this over time
    and/or across (sets of) hosts should yield some pretty useful anomaly
    analysis, we'll see...
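
    one possible way to put a number on the "time characteristics" of a
    single motif, sketched under assumptions: take the timestamps at which
    the motif occurred, look at the gaps between consecutive occurrences,
    and use the coefficient of variation (stddev / mean) of those gaps as
    a rough regularity score.  the function name rate_profile(), the
    labels, and both thresholds are hypothetical choices, not something
    the approach above prescribes.

```python
from statistics import mean, pstdev


def rate_profile(timestamps, periodic_cv=0.1, bursty_cv=1.5):
    """Summarise how one motif occurs over time.

    timestamps: occurrence times in seconds (any order).
    Returns a dict with the mean rate (events per second), the coefficient
    of variation of the inter-arrival gaps, and a rough label.
    """
    ts = sorted(timestamps)
    if len(ts) < 3:
        return {"rate": None, "cv": None, "label": "too few events"}

    gaps = [b - a for a, b in zip(ts, ts[1:])]
    avg = mean(gaps)
    if avg == 0:
        # Every occurrence shares one timestamp: a single burst.
        return {"rate": None, "cv": None, "label": "bursty"}

    cv = pstdev(gaps) / avg
    if cv <= periodic_cv:
        label = "periodic"    # gaps are nearly identical
    elif cv >= bursty_cv:
        label = "bursty"      # short gaps separated by long silences
    else:
        label = "irregular"   # neither clearly periodic nor clearly bursty
    return {"rate": 1.0 / avg, "cv": cv, "label": label}


if __name__ == "__main__":
    print(rate_profile([0, 60, 120, 180, 240]))
    print(rate_profile([0, 1, 2, 3, 1000, 1001, 1002, 1003]))
```

    motifs with similar profiles that also overlap in time would be
    natural candidates for the same "event" group, though the grouping
    step itself is not shown here.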
    
    current code is rough but works, help is welcome (lemme know if
    interested)...
    
    --
    +--------------------------------------------------------------+
    | Jon Stearley                  (505) 845-7571  (FAX 844-2067) |
    | Compaq Federal LLC            High Performance Solutions     |
    | Sandia National Laboratories  Scalable Systems Integration   |
    +--------------------------------------------------------------+
    
    _______________________________________________
    LogAnalysis mailing list
    LogAnalysisat_private
    http://lists.shmoo.com/mailman/listinfo/loganalysis
    