Re: [logs] Logs & the great unification theory

From: Chip Seraphine (charles.seraphineat_private)
Date: Thu Jun 20 2002 - 06:53:25 PDT


    Hi, Stefano.  Interesting to see somebody try taking a whack at this from an 
    AI standpoint.  I think you'll find that it is essential for you to scope 
    the problem down to certain logs and certain situations; syslogs are 
    generally free-form strings, and you will have trouble evaluating them for 
    the same reason that the rule-based approaches do.
    
    My first instinct would be to have several conventional rule-based
    systems watching several logs and feeding into a single "meta-log" that
    would serve as input to the NN. The rule-based systems would be
    responsible for replacing strings with numbers in real time.
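    One way to picture this is the minimal Python sketch below, which is not a
    tested design. Each watcher is a hypothetical list of (regex, number)
    rules for one log, and the watchers' outputs are merged by timestamp into
    a single meta-log of (timestamp, source, value) rows. The rule patterns,
    values, sample lines and function names are all made up for illustration.

```python
import heapq
import re
from typing import Callable, Iterable, Iterator

# Hypothetical rule: a compiled regex paired with the number it stands for.
Rule = tuple[re.Pattern, float]


def make_watcher(rules: list[Rule], default: float = 0.0) -> Callable[[str], float]:
    """Map one free-form log line to a number; the first matching rule wins."""
    def score(line: str) -> float:
        for pattern, value in rules:
            if pattern.search(line):
                return value
        return default  # line matched no rule
    return score


def _tag(stream: Iterable[tuple[float, str]], source: int,
         watcher: Callable[[str], float]) -> Iterator[tuple[float, int, float]]:
    # Replace each line with its numeric value, remembering which log it came from.
    for ts, line in stream:
        yield (ts, source, watcher(line))


def meta_log(streams, watchers) -> Iterator[tuple[float, int, float]]:
    """Merge time-sorted (timestamp, line) streams into one (ts, source, value) stream."""
    return heapq.merge(*(_tag(s, i, w) for i, (s, w) in enumerate(zip(streams, watchers))))


# Illustrative rules and sample lines (hypothetical).
secure_rules = [(re.compile(r"sshd.*\b(bad|failed)\b", re.I), 1.0)]
cron_rules = [(re.compile(r"run-parts"), 0.1)]
secure = [(100.0, "sshd[42]: bad password for root"),
          (105.0, "sshd[43]: session opened")]
cron = [(101.0, "CRON[7]: (root) CMD (run-parts /etc/cron.hourly)")]

rows = list(meta_log([secure, cron],
                     [make_watcher(secure_rules), make_watcher(cron_rules)]))
print(rows)  # [(100.0, 0, 1.0), (101.0, 1, 0.1), (105.0, 0, 0.0)]
```

    heapq.merge assumes each input stream is already in timestamp order, which
    is usually true of a single log file but would need checking for logs
    collected from several hosts.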
    
    A crude example: suppose you decide you are interested in intrusion
    detection. You might then want an input row of normalized values
    corresponding to "log-value app-value msg-value server-value". You would
    assign a value to each log file, then to each process that writes to the
    logs, then to various keywords found in strings, then to the server that
    generated the message. For example, a message in /var/log/secure coming
    from sshd, containing the word "bad", and originating on a machine outside
    your firewall might have high values in each of these categories, whereas
    a message in /var/log/cron.log containing the keyword "run-parts" and
    emanating from an internal workstation might be low across the board.
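    The encoding described above might look roughly like this in Python,
    assuming hand-picked lookup tables on a 0.0 (benign) to 1.0 (suspicious)
    scale. The table contents, the 0.5 fallback for unknown items and the
    function names are hypothetical choices, not a recommended weighting.

```python
import re

# Hypothetical lookup tables; every entry and weight is illustrative only.
LOG_VALUES = {"/var/log/secure": 0.9, "/var/log/cron.log": 0.1}
APP_VALUES = {"sshd": 0.9, "cron": 0.1}
KEYWORD_VALUES = {"bad": 0.9, "failed": 0.8, "run-parts": 0.1}
UNKNOWN = 0.5  # neutral value for anything not listed in a table


def msg_value(message: str) -> float:
    """Score a message by the most suspicious keyword it contains."""
    words = re.findall(r"[a-z0-9-]+", message.lower())  # strips punctuation like "("
    scores = [KEYWORD_VALUES[w] for w in words if w in KEYWORD_VALUES]
    return max(scores, default=UNKNOWN)


def encode(log_file: str, app: str, message: str,
           outside_firewall: bool) -> tuple[float, float, float, float]:
    """Build the (log, app, msg, server) input row for the network."""
    return (
        LOG_VALUES.get(log_file, UNKNOWN),
        APP_VALUES.get(app, UNKNOWN),
        msg_value(message),
        1.0 if outside_firewall else 0.0,
    )


print(encode("/var/log/secure", "sshd", "bad password for root", True))
# (0.9, 0.9, 0.9, 1.0)
print(encode("/var/log/cron.log", "cron",
             "(root) CMD (run-parts /etc/cron.hourly)", False))
# (0.1, 0.1, 0.1, 0.0)
```

    Taking the maximum keyword score is only one option; summing or averaging
    the matched keywords would be just as plausible and would change how
    messages with several keywords are ranked.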
    
    You might also consider grammar matches rather than (or in addition to)
    keywords, if you can define a good pattern ("error: noun verb noun") or
    something similar.
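    A rough Python sketch of such a grammar match follows. It assumes a tiny
    hand-built noun/verb vocabulary stands in for a real part-of-speech tagger:
    each word after the first colon is reduced to a class letter (N, V, or X
    for unknown), and the resulting shape is checked against named templates.
    The word lists, template name and function names are hypothetical.

```python
import re

# Hypothetical word classes; a real system would likely use a POS tagger.
NOUNS = {"user", "root", "host", "file", "connection", "password"}
VERBS = {"denied", "refused", "closed", "opened", "failed", "lost"}

# "error: noun verb noun" expressed over the word-class string.
TEMPLATES = {"error_noun_verb_noun": re.compile(r"^error: NVN$")}


def word_class(word: str) -> str:
    if word in NOUNS:
        return "N"
    if word in VERBS:
        return "V"
    return "X"  # word not in either vocabulary


def grammar_matches(line: str) -> list[str]:
    """Return the names of all templates whose shape matches the line."""
    prefix, sep, rest = line.lower().partition(":")
    if not sep:
        return []  # no "keyword:" prefix to anchor on
    shape = prefix.strip() + ": " + "".join(
        word_class(w) for w in re.findall(r"[a-z]+", rest))
    return [name for name, pattern in TEMPLATES.items() if pattern.match(shape)]


print(grammar_matches("error: host refused connection"))  # ['error_noun_verb_noun']
print(grammar_matches("error: password failed"))          # []
```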
    
    These are just random thoughts; it's been years since I trained a
    backprop network. But I hope it helps get you started....
    
    On Thu, 20 Jun 2002, Stefano Zanero wrote:
    
    > Good afternoon (or whatever) everyone,
    > 
    > I must apologize if my first post on this list has a sort of "Life,
    > the Universe and Everything" approach, but after reading the last few
    > days of postings and briefly discussing things with Tina, I think you
    > are just the right audience for my questions.
    > 
    > I'm currently working on an academic project to evaluate whether and
    > how neural network (NN) systems can be used as outlier detectors on
    > system logs, to spot potential security breaches or anomalies.
    > 
    > Some fixed points in my approach are, currently:
    > 1) avoid trying to compete with rule-based or signature-based systems
    > (so-called "misuse detection") on their own ground: if an attack can
    > be described with a signature, it should be looked for with a
    > signature-based system, not an NN
    > 
    > 2) try to develop an approach that is as general as possible, keeping
    > my options open until prototype development really begins
    > 
    > 3) the chosen approach, for those with experience of neural
    > algorithms, is unsupervised learning, but this could change if we feel
    > that supervised learning is appropriate and feasible.
    > 
    > I was therefore reading your posts about log "normalization" with
    > great interest, but I think that either I missed the beginning of the
    > discussion or you didn't discuss an important point:
    > WHAT REALLY MATTERS and should be analyzed.
    > 
    > An NN-based algorithm has serious performance issues to consider, so
    > its input should be as compact as possible and have as few "fields"
    > as possible. In addition, these fields should be easy to manipulate
    > and to convert into numeric values with an adequate mapping (still to
    > be studied; this is in fact the core problem of the whole work).
    > 
    > So, my questions are:
    > - WHAT should be analyzed?
    > - HOW do you suggest structuring the "normalized" format of logs
    >   extracted from various sources?
    > - IF and HOW do you suggest integrating these logs with the results
    >   from a network sniffer directly observing raw packets on the network?
    > 
    > I will gladly hear any input, and if I was unclear in my message,
    > please raise any questions or doubts.
    > 
    > Just a warning: I will be giving a lecture over the weekend, so my
    > answers may be delayed by a day or two. Please don't get pissed off :P
    > 
    > Stefano
    > 
    > 
    > ---------------------------------------------------------------------
    > To unsubscribe, e-mail: loganalysis-unsubscribeat_private
    > For additional commands, e-mail: loganalysis-helpat_private
    > 
    
    -- 
    
    Chip Seraphine
    Unix Systems Administrator
    NeuStar, Inc
    charles.seraphineat_private
    V: 312 928 4643
    M: 312 420 7049
    
    
    
    



    This archive was generated by hypermail 2b30 : Thu Jun 20 2002 - 08:18:34 PDT