[logs] Logging: World Domination

From: Tina Bird (tbird@precision-guesswork.com)
Date: Mon Aug 19 2002 - 23:42:49 PDT


    I run into a variety of list members at conferences and such.  Those of
    you who have seen me in person in the last six months have probably heard
    parts of my "why logging sucks" rant, and may have heard me threaten to
    start a couple of list discussions related to those issues.  I've been
    threatening to start consensus building (i.e. stating my claims on the
    list and watching those of you with strong opinions correct me) on the
    following issues:
    
    1) What sort of state changes "should" applications and operating systems
    log in the first place?  --> A standard for programmers
    2) Given a particular operating system and/or system purpose (such as a
    UNIX mail server, or a Windows Domain Controller, or whatever), what are
    the (pick your favorite integer) 15 most frequently logged messages in
    the elusive "typical" environment?  What do they mean?  Do we have sample
    data?
    3) Given a particular operating system and/or system purpose, what are
    (pick your favorite integer) 15 messages that pretty much always mean bad
    news: that the system has been compromised, that a catastrophic failure
    has happened, however we choose to define "bad news" for that "typical"
    environment?  What >>is<< "bad news"?  Do we have sample data?
    4) If you're a new system administrator and you're just starting to
    integrate machines into a central logging infrastructure, where should you
    start?
    5) What sort of situations do >>not<< create log data for default
    configurations of a particular operating system or application?
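
    As a rough illustration of question 2, here is a minimal Python sketch
    of counting the most frequently logged message types.  The
    normalization rules and the sample lines are assumptions made up for
    the example, not data from any real environment; it also assumes the
    timestamp and hostname have already been stripped from each line.

```python
import re
from collections import Counter

# Hypothetical normalization rules: collapse variable fields so that
# messages differing only in PID, IP address, or number count as one type.
_RULES = [
    (re.compile(r"\[\d+\]"), "[PID]"),
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "IP"),
    (re.compile(r"\b\d+\b"), "N"),
]

def normalize(message):
    """Replace variable fields in a log message with fixed placeholders."""
    for pattern, token in _RULES:
        message = pattern.sub(token, message)
    return message

def top_messages(lines, n=15):
    """Return the n most common normalized message types with their counts."""
    counts = Counter(normalize(line.strip()) for line in lines if line.strip())
    return counts.most_common(n)

# Invented sample data, for illustration only.
sample = [
    "sshd[812]: Failed password for root from 10.0.0.5 port 4022",
    "sshd[977]: Failed password for root from 10.0.0.9 port 5111",
    "sendmail[1203]: starting daemon",
]

for message, count in top_messages(sample):
    print(count, message)
```

    The order of the rules matters: IP addresses are replaced before bare
    numbers so that an address is not split into four separate "N" tokens.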
    
    We spend a lot of energy worrying about what syslog server application to
    use, how to transport the data, how to archive it, but there are a lot of
    issues bigger even than getting the logs out of the damn originating
    applications and servers.  If we can reach any sort of consensus on
    these issues then we can actually build >>useful<< templates for swatch,
    logsurfer, and the other log parsing tools out there.  And we can work on
    tools that can find deviations from baseline numbers if we can come up
    with a guess for what set of messages defines the baseline.
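
    To make the baseline idea concrete, here is a minimal Python sketch,
    assuming we already have per-message-type counts for a baseline period
    and for the period under review.  The function name, the labels, and
    the threshold of 3x are arbitrary choices for the example, not
    recommendations.

```python
def deviations(baseline, current, ratio=3.0):
    """Flag message types that are new, missing, or much more frequent.

    baseline and current map a message type to its count for a period of
    the same length. ratio is an assumed, arbitrary spike threshold.
    """
    flagged = {}
    for msg, count in current.items():
        if count == 0:
            continue
        base = baseline.get(msg, 0)
        if base == 0:
            flagged[msg] = "new"        # never seen in the baseline
        elif count > base * ratio:
            flagged[msg] = "spike"      # far more frequent than usual
    for msg, base in baseline.items():
        if base > 0 and current.get(msg, 0) == 0:
            flagged[msg] = "missing"    # expected message went quiet
    return flagged

# Invented counts, for illustration only.
baseline = {"cron: job started": 24, "sshd: Failed password": 5}
current = {"sshd: Failed password": 40, "kernel: Out of memory": 1}

for msg, reason in deviations(baseline, current).items():
    print(reason, msg)
```

    Flagging messages that disappear is deliberate: a routine message that
    stops arriving can matter as much as a new one showing up.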
    
    It's hard to tell people to look for "weird things" in their log files
    when we've got absolutely no resources -- other than the logs themselves
    -- that help describe what normal things look like.  Maybe it's
    because I live in California now, but the idea of a "quest for normal"
    really appeals to me ;-)
    
    I suppose it's possible that a couple of the commercial log management
    systems -- NetForensics or Intellitactics -- may already have the answers
    to these questions, but I bet they don't have visibility into as large a
    number and variety of networks as we have here.
    
    Over the next couple of days, now that I've finally admitted to working on
    this in public, I will be documenting my first pass at answers to these
    questions, based on my own research and on the data in Counterpane's
    customer base (suitably sanitized, of course).  Please rev up your engines
    for the discussion...and I'll warn the Log Analysis Webmistress about the
    sort of chaos we're likely to be creating.
    
    cheers -- tbird
    
    "Wine is strong, the King is stronger, women are strongest, but TRUTH
              conquers all."
    -----     Inscription in the Rosslyn Chapel (near Edinburgh, Scotland)
    
    http://www.shmoo.com/~tbird
    Log Analysis http://www.counterpane.com/log-analysis.html
    VPN http://vpn.shmoo.com
    
    _______________________________________________
    LogAnalysis mailing list
    LogAnalysisat_private
    https://lists.shmoo.com/mailman/listinfo/loganalysis
    


