I have three basic concerns with this approach:

1. A stealthy/patient attacker might be able to stay "below radar" while
   the system acclimates to his presence, i.e. normal/routine may not
   equate to *authorized*.

2. Anent the recent thread about court admissibility, it is likely to
   become necessary to explain why such a system flagged some particular
   traffic. I haven't followed the field closely, but my impression has
   long been that reporting/reproducing the learned "reasoning" is a
   particularly thorny issue.

3. There remain persistent anecdotes to the effect that some automated
   British defence system, during the 1982 Falklands war, detected an
   incoming missile, identified it as an Exocet, and on that basis
   classified it as "friendly" -- even though it was rapidly closing on
   a British ship.

I think there has to remain some human interface to the ruleset, so that
for instance an administrator can revoke permissions previously granted
to some traffic. I'm not sure how else to get such a learning system to
converge on policy changes in an acceptable time.

Dave Gillett

On 20 Dec 2001, at 17:21, Tina Bird wrote:

> Hi Jon --
>
> Just in case you haven't yet seen this (but you might have,
> given the SRI address in your headers):
>
> http://www.sdl.sri.com/projects/emerald
>
> is the first thing I've found in this category, in the
> current round of revisions of my log analysis notes...
>
> On Thu, 20 Dec 2001, Jon Stearley wrote:
>
> > What experience, thinking/dreaming, and interest do people have in
> > making the computer learn what is and isn't "normal" in syslog output?
> > ie- having the computer process/classify syslog output (or, an
> > arbitrary stream) and present it in a high signal/noise ratio manner?
> >
> > I'm not talking about writing regexps, I'm talking about having the
> > computer infer/learn the regexps (characterization information,
> > regardless of its form) over some training period (ie- ongoing), and
> > then presenting the analysis in a high signal/noise ratio manner. it
> > could then tie into some action/response mechanism of which there are
> > many to choose from, but my interests are mainly in the learning
> > process.
> >
> > My thinking/researching/hacking has ranged from using simple
> > statistics, LZ77, natural language modelling (WHIRL, SRILM, others),
> > and sequencing algorithms (only TEIRESIAS yet) in this effort. It's
> > basically a statistics/signal-processing problem imho. I don't get
> > paid for this and haven't put sufficient personal time on it to make a
> > huge amount of progress/success, but it looks quite likely that I'll
> > be able to spend more time on it in the coming year.
> >
> > http://www.counterpane.com/log-analysis.html and other loggy spots I
> > know of are mostly "expert system" based (ie- we enumerate the expert
> > knowledge). I know this ai/buzzword/etc approach is not particularly
> > new, but it does appear to me to be unsolved - interesting, at least.
> >
> > I'm basically polling for pointers, experience/advice, and
> > collaborators. Thanks!
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: loganalysis-unsubscribeat_private
> For additional commands, e-mail: loganalysis-helpat_private
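
For concreteness, here is a minimal sketch of the "simple statistics" end of what this thread discusses, combined with the human override and the explanation requirement raised in the reply. It is not taken from EMERALD or any tool mentioned above. The names `template`, `NormalcyModel`, `revoke`, and `explain`, the digit-masking heuristic, and the `min_count` threshold are all assumptions made for illustration.

```python
import re
from collections import Counter


def template(line):
    """Reduce a log line to a rough template by masking variable fields.

    Hypothetical heuristic: hex literals and runs of digits are treated as
    variable data (PIDs, ports, addresses); everything else is kept.
    """
    line = re.sub(r"\b0x[0-9a-fA-F]+\b", "<HEX>", line)
    line = re.sub(r"\d+", "<N>", line)
    return line.strip()


class NormalcyModel:
    """Frequency-based notion of "normal", with an administrator override."""

    def __init__(self, min_count=3):
        self.counts = Counter()     # how often each template was seen in training
        self.min_count = min_count  # templates seen fewer times than this are flagged
        self.revoked = set()        # templates an administrator has explicitly revoked

    def train(self, lines):
        """Count templates over a training stream (can be called repeatedly)."""
        for line in lines:
            self.counts[template(line)] += 1

    def revoke(self, line):
        """Human interface: stop treating this kind of message as normal,
        no matter how often it was seen during training."""
        self.revoked.add(template(line))

    def is_flagged(self, line):
        """Flag a line if its template was revoked or is rare."""
        t = template(line)
        if t in self.revoked:
            return True
        return self.counts[t] < self.min_count

    def explain(self, line):
        """Return a human-readable reason for the current verdict on a line."""
        t = template(line)
        if t in self.revoked:
            return f"template {t!r} was revoked by an administrator"
        return (f"template {t!r} was seen {self.counts[t]} time(s) in training "
                f"(threshold {self.min_count})")
```

A model this simple can at least report why it flagged something, and `revoke` takes effect immediately rather than waiting for retraining. It does nothing about the patient attacker, though: anything repeated often enough during training becomes "normal".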
This archive was generated by hypermail 2b30 : Fri Dec 21 2001 - 10:04:24 PST