Re: [logs] AI/adaptive/heuristic syslog analysis

From: dgillettat_private
Date: Fri Dec 21 2001 - 02:25:15 PST

      I have three basic concerns with this approach:
    
    1.  A stealthy/patient attacker might be able to stay "below the radar" 
    while the system acclimates to his presence.  That is, normal/routine 
    may not equate to *authorized*.
    
    2.  Anent the recent thread about court admissibility, it is likely 
    to become necessary to explain why such a system flagged some 
    particular traffic.  I haven't followed the field closely, but my 
    impression has long been that reporting/reproducing the learned 
    "reasoning" is a particularly thorny issue.
    
    3.  There remain persistent anecdotes to the effect that some 
    automated British defence system, during the 1982 Falklands war, 
    detected an incoming missile, identified it as an Exocet, and on that 
    basis classified it as "friendly" -- even though it was rapidly 
    closing on a British ship.  I think there has to remain some human 
    interface to the ruleset, so that for instance an administrator can 
    revoke permissions previously granted to some traffic.  I'm not sure 
    how else to get such a learning system to converge on policy changes 
    in an acceptable time.
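
The three concerns above can be made concrete with a small sketch. This is only an illustration under stated assumptions, not a description of EMERALD or of any existing tool: the class name `LearnedBaseline`, the `min_count` threshold, and the sample message keys are all hypothetical. It shows that a purely frequency-based notion of "normal" accepts anything seen often enough, authorized or not (point 1), that returning a reason with every verdict is one way to keep decisions explainable (point 2), and that an explicit administrator revocation takes effect immediately instead of waiting for the learned counts to drift (point 3).

```python
from collections import Counter


class LearnedBaseline:
    """Toy frequency baseline with an administrator override (hypothetical)."""

    def __init__(self, min_count=3):
        self.min_count = min_count   # sightings needed before a key counts as "normal"
        self.seen = Counter()        # learned: how often each message key has appeared
        self.revoked = set()         # policy: keys an administrator has explicitly revoked

    def observe(self, key):
        """Record one occurrence of a message key during training."""
        self.seen[key] += 1

    def revoke(self, key):
        """Administrator override: stop treating this key as normal at once."""
        self.revoked.add(key)

    def classify(self, key):
        """Return (label, reason) so that every verdict can be explained."""
        if key in self.revoked:
            return "flag", "revoked by administrator"
        count = self.seen[key]
        if count >= self.min_count:
            return "normal", f"seen {count} times (threshold {self.min_count})"
        return "flag", f"seen only {count} times (threshold {self.min_count})"


baseline = LearnedBaseline(min_count=3)
for _ in range(5):
    baseline.observe("sshd: accepted password for backup")

print(baseline.classify("sshd: accepted password for backup"))
baseline.revoke("sshd: accepted password for backup")
print(baseline.classify("sshd: accepted password for backup"))
print(baseline.classify("kernel: unexpected interrupt"))
```

Note that the revocation check runs before the learned counts are consulted, so a policy change never has to compete with the accumulated training history.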
    
    Dave Gillett
    
    
    On 20 Dec 2001, at 17:21, Tina Bird wrote:
    
    > Hi Jon --
    > 
    > Just in case you haven't yet seen this (but you might have,
    > given the SRI address in your headers):
    > 
    > http://www.sdl.sri.com/projects/emerald
    > 
    > is the first thing I've found in this category, in the
    > current round of revisions of my log analysis notes...
    > 
    > On Thu, 20 Dec 2001, Jon Stearley wrote:
    > 
    > > What experience, thinking/dreaming, and interest do people have in
    > > making the computer learn what is and isn't "normal" in syslog output?
    > > i.e., having the computer process/classify syslog output (or an
    > > arbitrary stream) and present it in a high signal/noise ratio manner?
    > > I'm not talking about writing regexps, I'm talking about having the
    > > computer infer/learn the regexps (characterization information,
    > > regardless of its form) over some training period (i.e., ongoing), and
    > > then presenting the analysis in a high signal/noise ratio manner.  It
    > > could then tie into some action/response mechanism of which there are
    > > many to choose from, but my interests are mainly in the learning
    > > process.
    > > 
    > > My thinking/researching/hacking has ranged from using simple
    > > statistics, LZ77, natural language modelling (WHIRL, SRILM, others),
    > > and sequencing algorithms (only TEIRESIAS yet) in this effort.  It's
    > > basically a statistics/signal-processing problem imho.  I don't get
    > > paid for this and haven't put sufficient personal time on it to make a
    > > huge amount of progress/success, but it looks quite likely that I'll
    > > be able to spend more time on it in the coming year.
    > > 
    > > http://www.counterpane.com/log-analysis.html and other loggy spots I
    > > know of are mostly "expert system" based (i.e., we enumerate the expert
    > > knowledge).  I know this AI/buzzword/etc. approach is not particularly
    > > new, but it does appear to me to be unsolved - interesting, at least.
    > > 
    > > I'm basically polling for pointers, experience/advice, and
    > > collaborators.  Thanks!
    > > 
    > > 
    > 
    > 
    > ---------------------------------------------------------------------
    > To unsubscribe, e-mail: loganalysis-unsubscribeat_private
    > For additional commands, e-mail: loganalysis-helpat_private
    > 
    
    
    
    



    This archive was generated by hypermail 2b30 : Fri Dec 21 2001 - 10:04:24 PST