Hi, Stefano.

Interesting to see somebody take a whack at this from an AI standpoint. I think you'll find it essential to scope the problem down to certain logs and certain situations; syslogs are generally free-form strings, and you will have trouble evaluating them for the same reason that the rule-based approaches do.

My first instinct is that you would want several conventional rule-based systems watching several logs and feeding into a single "meta-log" that would serve as input to the NN. The rule-based systems would be responsible for replacing strings with numbers in real time.

As a crude example, suppose you are interested in intrusion detection, so you want an input row of normalized values corresponding to "log-value app-value msg-value server-value". You would assign a value for each log file, then for each process that writes to the logs, then for various keywords found in strings, then for the server that generated the message. For example, a message in /var/log/secure coming from sshd, containing the word "bad", and originating on a machine located outside of your firewall might have high values in each of these categories, whereas a message in /var/log/cron.log containing the keyword "run-parts" and emanating from an internal workstation might be low across the board.

You might also consider grammar matches rather than (or in addition to) keywords, if you can define a good pattern ("error: noun verb noun") or something.

These are just random thoughts -- it has been years since I trained a backprop. But I hope it helps get you started....

On Thu, 20 Jun 2002, Stefano Zanero wrote:

> Good afternoon (or whatever) everyone,
>
> I need to excuse myself if my first post on this list will have a sort of
> "Life, the Universe and Everything" approach, but reading the last few days
> of posting and discussing briefly with Tina, you are just about the right
> audience for asking my questions.
>
> I'm currently working around an academic project to evaluate how and if
> neural network (NN) systems can be used as outlyer detectors on system logs,
> to spot potential security breaches or anomalies.
>
> Some fixed points in my approach are, currently:
> 1) avoid trying to compete with rule-based or signature-based systems (so
> called "misuse detection") on their ground: if an attack can be described
> with a signature, it should be looked for with a signature-based system, not
> an NN
>
> 2) trying to develop an approach as general as possible, keeping my
> opportunities open until prototype development really begins
>
> 3) the chosen approach, for those with experience with neural algorithms, is
> unsupervised learning, but this could change if we feel that supervised
> learning is appropriate and feasible.
>
> I was thus reading with great interest your posts about log "normalization",
> but I think that either I missed the beginning of the discussion or you
> didn't discuss an important point:
> WHAT DOES REALLY MATTER to be analyzed.
>
> A NN-based algorithm has serious performance issues to consider, and the
> input to feed into it should be as compact and also with as few "fields" as
> possible. In addition, these fields should allow me to easily manipulate
> them, and to convert them in numeric value with an adequate mapping (to be
> studied, this is in fact the core problem of the whole work).
>
> So, my questions are:
> - WHAT should be analyzed ?
> - HOW do you suggest to structure the "normalized" format of logs extracted
> from various sources
> - IF and HOW you suggest to integrate this log with the results from a
> network sniffer directly observing raw packets on the network
>
> I will gladly hear any input and if I was unclear in my message, please
> expose any questions or doubt.
>
> Just a warning - I will be giving a lecture over the weekend so I could
> delay of one or two days my answers - please don't get pissed of :P
>
> Stefano
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: loganalysis-unsubscribeat_private
> For additional commands, e-mail: loganalysis-helpat_private
>

--
Chip Seraphine
Unix Systems Administrator
NeuStar, Inc
charles.seraphineat_private
V: 312 928 4643
M: 312 420 7049
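
The "log-value app-value msg-value server-value" row suggested in the reply could be sketched roughly as follows, in Python. This is only an illustration under loud assumptions: the `LogRecord` type, the `encode` function, the score tables, and every number in them are hypothetical placeholders, not values taken from any real tool or recommended by the thread.

```python
import re
from dataclasses import dataclass

# Hypothetical scores in [0, 1]; higher means "more interesting" for
# intrusion detection. These numbers are made up for illustration.
LOG_SCORES = {"/var/log/secure": 0.9, "/var/log/cron.log": 0.1}
APP_SCORES = {"sshd": 0.9, "crond": 0.1}
KEYWORD_SCORES = {"bad": 0.9, "run-parts": 0.1}
UNKNOWN = 0.5  # assumed fallback for anything the rules do not know


@dataclass
class LogRecord:
    log_file: str           # which log the message was written to
    app: str                # process that wrote the message
    message: str            # free-form message text
    outside_firewall: bool  # whether the source host is outside the firewall


def encode(record: LogRecord) -> list[float]:
    """Turn one log record into a 4-value numeric row for the NN."""
    log_value = LOG_SCORES.get(record.log_file, UNKNOWN)
    app_value = APP_SCORES.get(record.app, UNKNOWN)

    # Tokenize the message and keep the highest-scoring known keyword.
    tokens = re.findall(r"[a-z0-9_-]+", record.message.lower())
    hits = [KEYWORD_SCORES[t] for t in tokens if t in KEYWORD_SCORES]
    msg_value = max(hits) if hits else UNKNOWN

    server_value = 1.0 if record.outside_firewall else 0.0
    return [log_value, app_value, msg_value, server_value]


if __name__ == "__main__":
    suspicious = LogRecord("/var/log/secure", "sshd",
                           "Bad password for root", True)
    routine = LogRecord("/var/log/cron.log", "crond",
                        "(root) CMD (run-parts /etc/cron.hourly)", False)
    print(encode(suspicious))
    print(encode(routine))
```

With these placeholder tables, the sshd message from outside the firewall scores high in every column and the cron message scores low in every column, which is the contrast the reply describes. A real rule set would need one such table per log source, and choosing the mapping is the hard part.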
This archive was generated by hypermail 2b30 : Thu Jun 20 2002 - 08:18:34 PDT