[logs] What's normal?

From: Tina Bird (tbird@precision-guesswork.com)
Date: Tue Aug 20 2002 - 07:35:00 PDT


    I am gleefully being extremely particular about the bits of people's
    messages to which I respond...
    
    On Tue, 20 Aug 2002, Paul Ebersman wrote:
    
    >
    > tbird> 2) Given a particular operating system and/or system purpose
    > tbird> (such as a UNIX mail server, or a Windows Domain Controller, or
    > tbird> whatever), what are the (pick your favorite integer) 15 most
    > tbird> frequently logged messages in the elusive "typical"
    > tbird> environment?  What do they mean?  Do we have sample data?
    >
    > I think the key problem is that in order to do this, you have to
    > define "standard" platform/usage profiles. Most of us can't even get
    > that for the company we're working for at the time, much less across
    > the Internet. B^)
    >
    Yeah, yeah, yeah, and for decades, system administrators have talked
    themselves out of doing the experiment with exactly this kind of argument.
    
    Look, everyone, presumably at least some of us have access to log data on
    a "live server."  I posit that we'll learn really really interesting
    things by taking a day's or a week's worth of data and looking at the
    messages.  We're not building a standard here; I'm after something quick
    and dirty: guidance for someone just starting out.
    
    So if everyone gets their logs into a text format (for those of you who
    aren't on UNIX boxen) and does something like:
    
    # strip the timestamp/hostname prefix and "[pid]" bits, then count messages
    cat /your/log/files* \
    |sed -e "s/^... ........... $HOSTNAME //" -e "s/\[[0-9]*\]:/:/" \
    |sort |uniq -c |sort -nr > uniq.sorted.freq
    
    we'll get actual observational data on what shows up on production
    machines.
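    For folks who aren't on a UNIX box, or who'd rather have a script, here's
    a rough equivalent sketch in Python.  The regular expressions, the script
    name, and the output format are just illustration -- adjust them to
    whatever your logs actually look like:

    #!/usr/bin/env python
    # count-messages.py (name is made up) -- same idea as the pipeline above:
    # normalize each line, then count how often each distinct message appears
    import re
    import sys
    from collections import Counter

    # "Mmm dd hh:mm:ss hostname " prefix and "[pid]:" suffix, syslog-style
    prefix = re.compile(r"^\w{3} [ \d]\d \d\d:\d\d:\d\d \S+ ")
    pid = re.compile(r"\[\d+\]:")

    counts = Counter()
    for line in sys.stdin:
        line = prefix.sub("", line.rstrip("\n"))
        line = pid.sub(":", line)
        counts[line] += 1

    # most frequent messages first, like "sort | uniq -c | sort -nr"
    for msg, n in counts.most_common():
        print("%7d %s" % (n, msg))

    Feed it your text-format logs on stdin ("python count-messages.py <
    your.log > uniq.sorted.freq") and you should end up with the same kind of
    count-sorted list the pipeline produces.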
    
    I'm not claiming it will be the same for everyone.  I'm claiming it will
    teach us something, and that by providing that kind of "here's what shows
    up" view of things we'll make it easier for the newbies.
    
    My hope is that by providing this sort of information we'll make it easier
    for people to get up to speed on what is and is not typical >>for them<<.
    
    
    > We could probably come up with some guesses. I've always been a big
    > believer that I'd rather have an 80% solution than wait two years for
    > a 95% solution but I'm not sure we'd be anywhere near that accurate
    > across all reasonable configurations. I'd also be willing to bet that
    > in some cases, guessing wrong would be worse than doing nothing.
    >
    I don't think this is likely -- at least if someone guesses wrong about
    what normal is, they're presumably using that information to further
    understand their data, which means they're looking at their data and
    therefore a little likelier to notice something evil happening.  Doing
    nothing means continuing to ignore it.
    
    > I'll posit the next straw man to torch: what would be useful is a
    > standardized methodology for how to turn two weeks of verbose logging
    > into a template against which to compare "normal", "abnormal" and
    > "catastrophic" at your particular site/application. This does assume
    > your first point of somewhat standardized logging being available on
    > all critical OSs and Apps.
    
    Remember that by "standardized logging" I'm not >>even<< worrying about
    log formats or severities -- just message categories.  Taking yet another
    step back.
    
    This strawman is certainly one of the goals.
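    Just to make that strawman a little more concrete -- and this is a toy
    sketch, not the methodology itself -- one obvious first cut is: build a
    baseline from a couple of weeks of the uniq.sorted.freq output above,
    then flag anything in a new day's output that the baseline has never
    seen.  The file names and the "never seen before" test are purely
    illustrative:

    #!/usr/bin/env python
    # compare-to-baseline.py (name is made up) -- flag messages that don't
    # appear anywhere in the baseline profile
    import sys

    def messages(path):
        # read a uniq.sorted.freq-style file: "  count message" per line
        msgs = {}
        for line in open(path):
            count, _, msg = line.strip().partition(" ")
            if msg:
                msgs[msg] = int(count)
        return msgs

    baseline = messages(sys.argv[1])   # e.g. two weeks of "normal" logs
    today = messages(sys.argv[2])      # the day you want to check

    # anything we've never seen before is a candidate for a human to read
    for msg, count in sorted(today.items(), key=lambda kv: -kv[1]):
        if msg not in baseline:
            print("NEW %6d %s" % (count, msg))

    Obviously "new" isn't the same as "abnormal", never mind "catastrophic",
    but it's the kind of quick and dirty comparison we can get to pretty fast
    once the raw frequency data exists.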
    
    t.
    
    _______________________________________________
    LogAnalysis mailing list
    LogAnalysisat_private
    https://lists.shmoo.com/mailman/listinfo/loganalysis
    


