Re: [logs] What "should" be logged? (long)

From: John Rowan Littell (littejoat_private)
Date: Tue Aug 20 2002 - 08:34:15 PDT

    tbird> 1) What sort of state changes "should" applications and operating systems
    tbird> log in the first place?  --> A standard for programmers
    [and big list of possible categories]
    
    I'll add one to that: resource utilization.  I'm thinking of questions
    like "how much memory did it take to complete this task" (or
    system/user time, or whatever).  This is going to be a lot more
    application specific, but I want it in the interest of planning.
    If I'm going to build a better server for foo, then these sorts of
    questions are the ones I need answered.
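
    A minimal sketch of the kind of per-task record I mean, in Python
    and assuming a Unix host (the standard `resource` module is not
    available elsewhere).  The names run_and_measure and format_usage
    are made up for illustration, and the field layout is only one
    possibility.

```python
import resource
import time


def run_and_measure(task, *args):
    """Run task(*args); return (result, usage) for this process."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    start = time.monotonic()
    result = task(*args)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_SELF)
    usage = {
        "wall_s": wall,
        "user_s": after.ru_utime - before.ru_utime,
        "sys_s": after.ru_stime - before.ru_stime,
        # Peak resident set size of the process so far, not just this
        # task; reported in kilobytes on Linux and bytes on macOS.
        "maxrss": after.ru_maxrss,
    }
    return result, usage


def format_usage(name, usage):
    """Render one log line suitable for syslog or a flat file."""
    return "task=%s wall=%.3fs user=%.3fs sys=%.3fs maxrss=%d" % (
        name, usage["wall_s"], usage["user_s"], usage["sys_s"],
        usage["maxrss"])
```

    Something like format_usage("foo", usage) could then be handed to
    whatever logging channel the application already uses.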
    
    tbird> - Object access: failed and successful attempts to read files, start or
    tbird> stop processes, etc (understanding that most organizations will not need
    tbird> or want this level of detail)
    
    I'll expand this one: object processing.  What was done to an object
    and the outcome of that action.  Mail queue ID was passed on to
    recipient after passing through virus filter; packet was dropped
    according to rule n...  I think that most organizations -would-
    want this level of detail, at least for some applications, if only
    for the pretty graphs they can generate.
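
    To make "what was done to an object and the outcome" concrete,
    here is a rough Python sketch of a one-line record.  The function
    name, the key=value layout, and the sample identifiers are all
    assumptions for illustration, not any existing standard.

```python
def object_event(obj_id, action, outcome, **detail):
    """Format one object-processing event as key=value pairs."""
    fields = [("object", obj_id), ("action", action), ("outcome", outcome)]
    # Extra details are sorted so the same event always renders the same way.
    fields += sorted(detail.items())
    return " ".join("%s=%s" % (key, value) for key, value in fields)


# Hypothetical events matching the two cases in the text.
print(object_event("queue-id-1", "virus-filter", "passed"))
print(object_event("packet", "filter", "dropped", rule=12))
```

    Fixed field names like these are what make the pretty graphs easy
    to generate later.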
    
    tbird> 3) Given a particular operating system and/or system purpose, what are
    tbird> (pick your favorite integer) 15 messages that pretty much always mean bad
    tbird> news: that the system has been compromised, that a catastrophic failure
    tbird> has happened, however we choose to define "bad news" for that "typical"
    tbird> environment?  What >>is<< "bad news"?  Do we have sample data?
    
    The problem here is that we can define a very small number of states
    in which a machine can be thought of as working properly.  However,
    there are many more states in which it is working improperly.  I'm
    sure we could come up with 15 great signs for really bad news, but
    I would argue that if you see one of those in your log file, you're
    already hosed.  What I want is the news 15 minutes prior that tells
    me the system has slipped out of optimal state.  Rarely does a
    system go from fully functional to critical (except when I get my
    rock hammer out) -- it slips, bit by bit, and we should be able to
    detect this (we can't always -- go back to question 1 and start
    logging the appropriate data).  The new red background on your
    website was probably preceded by a number of those innocuous-looking
    login failures, possibly from strange locations.  The disk failure
    was likely preceded by SCSI bus errors.  And so on.
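
    One way to catch the slide rather than the crash is to count the
    innocuous events over a window.  A minimal Python sketch, assuming
    timestamps in seconds; the class name is hypothetical, the
    15-minute window echoes the figure above, and the threshold of 5
    is an arbitrary placeholder that would need tuning per site.

```python
from collections import deque


class FailureWindow:
    """Count events (e.g. login failures) inside a sliding time window."""

    def __init__(self, window_s=900, threshold=5):
        self.window_s = window_s
        self.threshold = threshold
        self.events = deque()

    def record(self, ts):
        """Record an event at time ts; return True once the number of
        events within the last window_s seconds reaches the threshold."""
        self.events.append(ts)
        # Drop events that have aged out of the window.
        while self.events and self.events[0] <= ts - self.window_s:
            self.events.popleft()
        return len(self.events) >= self.threshold
```

    The same counter would work for SCSI bus errors or anything else
    that is harmless once and worrying in bulk.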
    
    I'd be willing to crunch more sample log data, if you'd like.  Of
    course, we could start the log parsing debate up again as well.  This
    is a question of where we're going vs. where we are.  Perhaps my
    suggestion would be to have a bunch of sample signatures that we could
    pop into swatch or logcheck that would weed out (or in) some of the
    most common messages.  Include comments.
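
    As a starting point, a commented signature list might look
    something like the Python sketch below.  This is not swatch or
    logcheck syntax, and the patterns are illustrative guesses at
    common message shapes rather than a vetted set; real messages vary
    by OS and daemon version.

```python
import re

# (pattern, label, comment) -- lines we want weeded IN.
SIGNATURES = [
    (re.compile(r"Failed password for"), "auth-failure",
     "sshd-style login failure; watch for bursts or odd source hosts"),
    (re.compile(r"scsi.*(error|reset|timeout)", re.I), "disk-warning",
     "SCSI trouble often precedes a disk failure"),
]

# Routine chatter we want weeded OUT.
IGNORE = [
    re.compile(r"session (opened|closed) for user"),
]


def classify(line):
    """Return a label for a log line, None if it should be ignored,
    or "unknown" if nothing matched and a human should take a look."""
    for pattern in IGNORE:
        if pattern.search(line):
            return None
    for pattern, label, _comment in SIGNATURES:
        if pattern.search(line):
            return label
    return "unknown"
```

    Anything that comes back "unknown" is a candidate for a new
    signature, which is how the list would grow.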
    
      --rowan
    
    -- 
    John "Rowan" Littell
    Systems Administrator
    Earlham College Computing Services
    http://www.earlham.edu/~littejo/
    
    
    



