Re: [logs] Logs & the great unification theory

From: Chip Seraphine (charles.seraphineat_private)
Date: Thu Jun 20 2002 - 06:53:25 PDT

Next message: Stefano Zanero: "[logs] Logs & the great unification theory"

Previous message: Raistlin: "Re: [logs] Logs & the great unification theory"
Reply: Raistlin: "Re: [logs] Logs & the great unification theory"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi, Stefano.  Interesting to see somebody try taking a whack at this from an 
AI standpoint.  I think you'll find that it is essential for you to scope 
the problem down to certain logs and certain situations; syslogs are 
generally free-form strings, and you will have trouble evaluating them for 
the same reason that the rule-based approaches do.

My first instinct is that you would want several conventional 
rule-based systems watching several logs and feeding into a single 
"meta-log" that would serve as input to the NN.  The rule-based systems 
would be responsible for replacing strings with numbers in real time.
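
To make the meta-log idea a bit more concrete, here is a rough Python 
sketch of one rule-based front end.  Everything in it is an assumption for 
illustration: the rule table, the numeric codes, the helper names 
(encode_line, build_meta_log) and the line layout it expects (the classic 
"Mon DD HH:MM:SS host proc[pid]: msg" form).  Only the message text is 
turned into a number here; host and process names are passed through as-is.

```python
import re

# Classic BSD-style syslog line: "Jun 20 06:53:25 host proc[pid]: message".
# Real logs vary a lot; this pattern is only an assumption for the sketch.
SYSLOG_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s"
    r"(?P<proc>[^\s\[:]+)(?:\[\d+\])?:\s(?P<msg>.*)$"
)

# Hypothetical rule table: (regex on the message text, numeric code).
RULES = [
    (re.compile(r"failed password", re.I), 3),
    (re.compile(r"accepted", re.I), 1),
]
UNKNOWN = 0  # code used when no rule matches


def encode_line(log_id, line):
    """Turn one raw syslog line into a meta-log record, or None."""
    m = SYSLOG_RE.match(line)
    if m is None:
        return None  # free-form line that this rule set cannot parse
    code = UNKNOWN
    for pattern, value in RULES:
        if pattern.search(m.group("msg")):
            code = value
            break  # first matching rule wins
    return (log_id, m.group("host"), m.group("proc"), code)


def build_meta_log(sources):
    """Merge several logs (log_id -> iterable of lines) into one list."""
    meta = []
    for log_id, lines in sources.items():
        for line in lines:
            record = encode_line(log_id, line)
            if record is not None:
                meta.append(record)
    return meta
```

Lines that do not parse are simply dropped in this sketch; in practice you 
would probably want to count or keep them, since unparseable input may be 
interesting in its own right.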

As a crude example, suppose you decide you are interested in intrusion 
detection, so you want an input row of normalized values corresponding to 
"log-value app-value msg-value server-value".  You would assign a value 
for each log file, then for each process that writes to the logs, then for 
various keywords found in strings, then for which server generated the 
message.  For example, a message in /var/log/secure coming from sshd, 
containing the word "bad", and generated on a machine located outside of 
your firewall might have high values for each of these categories, whereas 
a message in /var/log/cron.log containing the keyword "run-parts" and 
emanating from an internal workstation might be low across the board.
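
A minimal Python sketch of that input row might look like the following.  
The tables, the weights in them, the DEFAULT value, the server class names 
and the to_row helper are all made up for illustration; choosing real 
values is the hard part and is not addressed here.

```python
# Hypothetical weight tables in [0.0, 1.0]; every value here is made up.
LOG_VALUE = {"/var/log/secure": 0.9, "/var/log/cron.log": 0.1}
APP_VALUE = {"sshd": 0.9, "crond": 0.1}
KEYWORD_VALUE = {"bad": 0.8, "run-parts": 0.1}
SERVER_VALUE = {"outside-firewall": 1.0, "internal-workstation": 0.2}
DEFAULT = 0.5  # assumed value for anything not listed in a table


def keyword_score(message):
    """Return the highest value of any known keyword found in the message."""
    text = message.lower()
    hits = [value for word, value in KEYWORD_VALUE.items() if word in text]
    return max(hits) if hits else DEFAULT


def to_row(log_file, app, message, server_class):
    """Build the 'log-value app-value msg-value server-value' input row."""
    return [
        LOG_VALUE.get(log_file, DEFAULT),
        APP_VALUE.get(app, DEFAULT),
        keyword_score(message),
        SERVER_VALUE.get(server_class, DEFAULT),
    ]


high = to_row("/var/log/secure", "sshd",
              "Bad protocol version identification", "outside-firewall")
low = to_row("/var/log/cron.log", "crond",
             "run-parts /etc/cron.hourly", "internal-workstation")
```

Note that the keyword test is a plain substring match, so "bad" would also 
fire on a word like "badge"; word-boundary matching would be a reasonable 
refinement.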

You might also consider grammar matches rather than (or in addition to) 
keywords, if you can define a good pattern (something like "error: noun 
verb noun").
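
One very rough way to try the grammar idea in Python is to tag each word 
with a coarse part of speech and compare the tag sequence against a 
pattern.  The tiny lexicon, the tag names, and the tag/matches helpers 
below are assumptions for illustration only; a real attempt would need a 
much larger word list or a proper part-of-speech tagger.

```python
# Tiny hand-made lexicon; all entries are illustrative assumptions.
LEXICON = {
    "disk": "noun", "user": "noun", "quota": "noun", "connection": "noun",
    "exceeded": "verb", "refused": "verb", "lost": "verb",
}


def tag(message):
    """Map each word to a coarse tag; unknown words become 'other'."""
    # Split the colon off so "error:" becomes the two tokens "error", ":".
    words = message.lower().replace(":", " : ").split()
    return [w if w in ("error", ":") else LEXICON.get(w, "other")
            for w in words]


def matches(message, pattern):
    """True if the message's tag sequence starts with the given pattern."""
    tags = tag(message)
    return tags[:len(pattern)] == pattern


# The "error: noun verb noun" shape, written as a tag sequence.
ERROR_NVN = ["error", ":", "noun", "verb", "noun"]
```

With this lexicon, matches("error: user exceeded quota", ERROR_NVN) is 
true, while a message made of unknown words is not matched.  Trailing 
punctuation ("quota.") is not handled and would need to be stripped first.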

These are just random thoughts-- been years since I trained a backprop.  But 
I hope it helps get you started....

On Thu, 20 Jun 2002, Stefano Zanero wrote:

> Good afternoon (or whatever) everyone,
> 
> I need to excuse myself if my first post on this list has a sort of
> "Life, the Universe and Everything" approach, but after reading the last
> few days of posts and discussing briefly with Tina, I think you are just
> about the right audience for my questions.
> 
> I'm currently working on an academic project to evaluate whether and how
> neural network (NN) systems can be used as outlier detectors on system
> logs, to spot potential security breaches or anomalies.
> 
> Some fixed points in my approach are, currently:
> 1) avoid trying to compete with rule-based or signature-based systems
> (so-called "misuse detection") on their ground: if an attack can be
> described with a signature, it should be looked for with a
> signature-based system, not an NN
> 
> 2) develop an approach as general as possible, keeping my options open
> until prototype development really begins
> 
> 3) the chosen approach, for those with experience with neural
> algorithms, is unsupervised learning, but this could change if we feel
> that supervised learning is appropriate and feasible.
> 
> I was thus reading with great interest your posts about log
> "normalization", but I think that either I missed the beginning of the
> discussion or you didn't discuss an important point:
> WHAT REALLY MATTERS for the analysis.
> 
> An NN-based algorithm has serious performance issues to consider, and the
> input fed into it should be as compact as possible, with as few "fields"
> as possible. In addition, these fields should be easy to manipulate and
> to convert to numeric values with an adequate mapping (still to be
> studied; this is in fact the core problem of the whole work).
> 
> So, my questions are:
> - WHAT should be analyzed?
> - HOW do you suggest structuring the "normalized" format of logs
> extracted from various sources?
> - IF and HOW do you suggest integrating this log with the results from a
> network sniffer directly observing raw packets on the network?
> 
> I will gladly hear any input, and if I was unclear in my message, please
> raise any questions or doubts.
> 
> Just a warning - I will be giving a lecture over the weekend, so my
> answers could be delayed by one or two days - please don't get pissed
> off :P
> 
> Stefano
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: loganalysis-unsubscribeat_private
> For additional commands, e-mail: loganalysis-helpat_private
> 

-- 

Chip Seraphine
Unix Systems Administrator
NeuStar, Inc
charles.seraphineat_private
V: 312 928 4643
M: 312 420 7049



This archive was generated by hypermail 2b30 : Thu Jun 20 2002 - 08:18:34 PDT