[logs] What "should" be logged? (long)

From: Tina Bird (tbird@precision-guesswork.com)
Date: Tue Aug 20 2002 - 07:03:08 PDT


    Good morning, Bennett,
    
    I think that some of the definitional difficulties we're having will be
    made clearer by my simply jumping into the fray and filling out my
    thoughts from last night, so I'm going to talk about issue #1, the sets of
    changes that applications and operating systems "ought" to log.
    
    On Tue, 20 Aug 2002, Bennett Todd wrote:
    
    > 2002-08-20-02:42:49 Tina Bird:
    > > 1) What sort of state changes "should" applications and operating systems
    > > log in the first place?  --> A standard for programmers
    >
    > Perhaps it's my Unix upbringing, but I think it's best to have
    > adjustable logging levels; certainly, "alert", "info", and "debug"
    > are reasonable. A program should log an alert (log a descriptive
    > message at alert priority) when a human really urgently needs to
    > take a look. "info" is good for logging routine behavior, when
    > routine actions are something for which it's likely that some sites
    > would want to do reporting or stats or whatever. And "debug" should
    > dump enough details to help track down when you've mis-configured
    > something --- when the gizmo is doing what you told it, rather than
    > what you wanted it to do.
    >
    I'm not arguing with this, but this answer doesn't particularly address
    the question I was trying to ask.  This is more a set of directives on how to
    prioritize the messages once they've been generated.  But I believe
    there's a set of generic-enough administrative and error conditions that
    can be defined to provide guidelines for developers who are curious about
    what they ought to do.  Once that list exists, developers and
    administrators can customize the severities to their own environment.
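Here is a minimal Python sketch of that division of labor. It assumes a hypothetical catalog in which the developer ships a default severity for each event category and each site overrides only the categories it cares about. The category names, `DEFAULT_SEVERITY`, and `SITE_OVERRIDES` are illustrative and do not come from any existing standard. The severity numbers are the usual syslog ones, where a smaller number means more severe.

```python
# Standard syslog severity numbers: lower number = more severe.
EMERG, ALERT, CRIT, ERR, WARNING, NOTICE, INFO, DEBUG = range(8)

# Hypothetical developer-supplied defaults, one per event category.
DEFAULT_SEVERITY = {
    "login.failure": NOTICE,
    "login.success": INFO,
    "account.created": NOTICE,
    "audit_log.cleared": ALERT,
    "filesystem.full": CRIT,
}

# Hypothetical site configuration: this site treats failed logins more seriously.
SITE_OVERRIDES = {"login.failure": WARNING}


def severity_for(category, overrides=SITE_OVERRIDES):
    """Return the site's severity for a category, falling back to the default."""
    return overrides.get(category, DEFAULT_SEVERITY[category])


def should_emit(category, threshold=INFO, overrides=SITE_OVERRIDES):
    """Emit only events at least as severe as the threshold (numerically <=)."""
    return severity_for(category, overrides) <= threshold


if __name__ == "__main__":
    for cat in DEFAULT_SEVERITY:
        print(cat, severity_for(cat), should_emit(cat, threshold=NOTICE))
```

With `threshold=NOTICE`, this site would see failed logins, which it has raised to `WARNING`. It would not see routine successful logins, which stay at `INFO`.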
    
    So what's on the list?  Here's my start for operating systems, with notes
    and queries -- bearing in mind that one of the tasks at hand is to
    generate these conditions on everyone's favorite systems so we can do
    things like answer question #5 (what >doesn't< get logged) in an orderly
    fashion:
    
    - System startup: are there multiple run levels?  If so, the system
    should record which level is starting in a way that a human can make
    sense of
    - System shutdown: are there multiple modes of shutdown?  Does the system
    have any capacity to send "oh my god i'm going down" messages in the case
    of an emergency crash or power loss?  Are there distinctions between
    normal and abnormal shutdowns that can be differentiated in the logs?
    - File system full: including thresholds (default or user defined) -- boy
    wouldn't it be nice if the logs "automagically" included the three (or
    however many) biggest culprits in terms of file size or space consumed by
    a directory or folder in an error message?
    - Hardware failures: power supplies, network interfaces, etc.  I am
    relatively uneducated about hardware diagnostics, other than Cisco gear...
    - Logins: failed and successful; console, remote (what protocol if
    remote); anonymous account, unprivileged user account, privileged user
    account, including switches to other users (unprivileged, privileged) from
    user accounts
    - Account creation: failed and successful; adding new user ID, assigning
    rights and privileges to new user, adding password to new user
    - Account modification: failed and successful; assigning or removing
    rights and privileges, resetting password; privileged user or unprivileged
    user
    - Account removal: failed and successful
    - Account disabled: too many failed logins, account expired, etc.
    - Password/security information copied: failed and successful
    - System configuration change: failed and successful; including access
    control, network addressing, audit policy; who made change, what changed,
    from system kernel on out to user-level applications
    - Operating system patch applied: who applied patch, what system
    components changed, source of patch (?)
    - Network connections: failed and successful connection attempts;
    anonymous service, user-specific service, access to administrative tools
    or control connection; DNS zone transfers, etc.
    - Audit logs: failed and successful attempts to modify or clear audit logs
    - Object access: failed and successful attempts to read files, start or
    stop processes, etc. (understanding that most organizations will not need
    or want this level of detail)
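The "biggest culprits" idea from the file-system-full item might look something like the sketch below. It uses Python's standard `shutil.disk_usage` and `os.walk`. The function names and the 90% default threshold are hypothetical. Walking a large file system is slow, so a real tool would probably scan only selected directories. That is why the scan root is a separate parameter from the mount point.

```python
import heapq
import os
import shutil


def largest_files(root, n=3):
    """Return the n largest non-symlink files under root as (size, path) pairs."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    sizes.append((os.path.getsize(path), path))
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return heapq.nlargest(n, sizes)


def filesystem_full_message(mount, scan_root, threshold_pct=90.0, n=3):
    """Return a log message if `mount` usage is at or above threshold_pct, else None."""
    usage = shutil.disk_usage(mount)
    if usage.total == 0:
        return None  # pseudo file systems can report zero size
    pct = 100.0 * usage.used / usage.total
    if pct < threshold_pct:
        return None
    culprits = ", ".join(
        f"{path} ({size} bytes)" for size, path in largest_files(scan_root, n)
    )
    return (f"filesystem {mount} at {pct:.1f}% (threshold {threshold_pct}%); "
            f"largest files: {culprits}")
```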
    
    *whew*
    
    I'm sure I've left things out, and I'm sure this can be sorted into a less
    intimidating list of message categories.  But in addition to worrying
    about format and how to handle the expected flow of data and how to
    protect audit data traveling across networks, we need to worry about what
    we expect to see.
    
    With regard to Marcus' post, this list represents my
    (not-sufficiently-caffeinated) first stab at a set of messages that could
    have standardized tokens across a variety of operating system platforms.
    
    In addition to these conditions, specific applications "should" record
    explicit errors when they fail to start due to misconfiguration (syslogd,
    anyone?), and log messages when they receive incorrect or unexpected input
    (yes, I know that to do this the programmer has to manage to detect
    incorrect or unexpected input, and failing to do so is what creates buffer
    overflows in the first place, but since this is tbird daydreaming I'm
    allowed).
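A minimal sketch of both behaviors with Python's standard `logging` module follows. It assumes a hypothetical daemon called `exampled` with a `port` setting and a simple command protocol that accepts only short alphanumeric words. The misconfiguration error names the offending setting and value. The rejected-input message records what arrived without letting that input forge extra log lines.

```python
import logging

log = logging.getLogger("exampled")  # hypothetical daemon name

MAX_COMMAND_LEN = 64  # hypothetical protocol limit


def load_port(config):
    """Read the listening port from a config dict; log an explicit error and exit if invalid."""
    raw = config.get("port")
    try:
        port = int(raw)  # TypeError if missing (None), ValueError if not a number
        if not 1 <= port <= 65535:
            raise ValueError("out of range")
    except (TypeError, ValueError) as exc:
        log.error("startup aborted: invalid 'port' setting %r (%s)", raw, exc)
        raise SystemExit(1)
    return port


def handle_command(line):
    """Accept only short alphanumeric commands; log and reject anything else."""
    if len(line) > MAX_COMMAND_LEN or not line.isalnum():
        # %r escapes newlines and control characters, and the input is truncated,
        # so hostile input cannot forge extra log lines or flood the log.
        log.warning("rejected unexpected input (%d chars): %r",
                    len(line), line[:MAX_COMMAND_LEN])
        return False
    return True
```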
    
    What I'll do whilst everyone is discussing this list -- and fixing it ;-)
    -- is to start collecting samples of these messages from the data I've got
    and the machines in my lab, and posting them on the Web site.
    
    Go to!  Bennett, does this clarify what I was getting at?
    
    tbird
    
    _______________________________________________
    LogAnalysis mailing list
    LogAnalysisat_private
    https://lists.shmoo.com/mailman/listinfo/loganalysis
    



    This archive was generated by hypermail 2b30 : Tue Aug 20 2002 - 07:08:41 PDT