Re: [logs] Logging: World Domination


2002-08-20-02:42:49 Tina Bird:
> 1) What sort of state changes "should" applications and operating systems
> log in the first place?  --> A standard for programmers

Perhaps it's my Unix upbringing, but I think it's best to have
adjustable logging levels; certainly, "alert", "info", and "debug"
are reasonable. A program should log an alert (log a descriptive
message at alert priority) when a human really urgently needs to
take a look. "info" is good for logging routine behavior, when
routine actions are something for which it's likely that some sites
would want to do reporting or stats or whatever. And "debug" should
dump enough details to help track down when you've mis-configured
something --- when the gizmo is doing what you told it, rather than
what you wanted it to do.
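
To make the adjustable-levels idea concrete, here is a minimal sketch
in Python, assuming the standard logging module. The names LEVELS,
make_logger, and alert are made up for illustration, and since Python
has no "alert" level, CRITICAL stands in for it.

```python
import logging

# Hypothetical mapping from the three priorities discussed above to stdlib
# levels. Python's logging has no "alert" level, so CRITICAL stands in for it.
LEVELS = {"alert": logging.CRITICAL, "info": logging.INFO, "debug": logging.DEBUG}


def make_logger(name, threshold, handler=None):
    """Return a logger that drops everything below the named threshold."""
    logger = logging.getLogger(name)
    logger.setLevel(LEVELS[threshold])
    logger.propagate = False   # keep the example from double-logging via root
    logger.handlers.clear()    # make repeated calls idempotent
    handler = handler or logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger


def alert(logger, msg, *args):
    """Log at the stand-in 'alert' priority: a human needs to look, urgently."""
    logger.log(LEVELS["alert"], msg, *args)
```

With the threshold at "info", a call like log.debug(...) is dropped
while log.info(...) and alert(log, ...) get through; set it to
"debug" when you're chasing a misconfiguration.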

> 2) Given a particular operating system and/or system purpose (such as a
> UNIX mail server, or a Windows Domain Controller, or whatever), what are
> the (pick your favorite integer) 15 most frequently logged messages in
> the elusive "typical" environment?  What do they mean?  Do we have sample
> data?

Those vary wildly from application to application, and often vary at
least a little from version to version. Mail servers will log lots
of stuff related to handling email, in formats dependent on the
version of the MTA you're running. If you're running a server with
packet filtering on, and you tell it to log rejected packets, in
many environments that will dominate the logs.

> 3) Given a particular operating system and/or system purpose, what are
> (pick your favorite integer) 15 messages that pretty much always mean bad
> news: that the system has been compromised, that a catastrophic failure
> has happened, however we choose to define "bad news" for that "typical"
> environment?  What >>is<< "bad news"?  Do we have sample data?

I think for many sites, the best approach is to hand-craft, for each
special-purpose server, a swatchrc with ignore lines for each normal
routine message, and alert lines for anything that doesn't match the
normal stuff. Combine that with daily reporting of summaries of the
routine stuff, suitable for monitoring long-term trends for capacity
planning, and some availability monitoring stuff to catch when the
box keels over altogether and when it gets overloaded, and you've
got a pretty decent grip on what's happening.
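
The ignore-the-routine, alert-on-the-rest approach can be sketched
roughly like this. This is Python rather than real swatchrc syntax,
and the patterns and the triage name are made-up examples; a real
list gets built by hand for each server.

```python
import re
from collections import Counter

# Hypothetical patterns for routine messages on some imaginary server.
IGNORE_PATTERNS = [
    re.compile(r"sshd\[\d+\]: Accepted publickey for \S+"),
    re.compile(r"CRON\[\d+\]: \(\S+\) CMD "),
]


def triage(lines, ignore_patterns=IGNORE_PATTERNS):
    """Split log lines into per-pattern routine counts and a list of alerts."""
    routine = Counter()
    alerts = []
    for line in lines:
        for pattern in ignore_patterns:
            if pattern.search(line):
                routine[pattern.pattern] += 1  # feeds the daily summary
                break
        else:
            # Nothing on the "normal" list matched, so a human should look.
            alerts.append(line)
    return routine, alerts
```

The counts in routine are the raw material for the daily summaries;
everything in alerts is, by construction, something you haven't
declared normal yet.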

> 4) If you're a new system administrator and you're just starting to
> integrate machines into a central logging infrastructure, where should you
> start?

Pick a decent logging daemon and transport. AFAIK, syslog-ng is
currently about as good as we've got. Build a logging box. Logging
boxes like to have plenty of RAM for buffering, and they like to
have fast disk subsystems. Remember, if you want to make use of the
log data, you can't have the system anywhere near saturated; it's
gotta have enough extra bandwidth for you to grep the logs while it
continues to collect more. Thank goodness syslog-ng at least lets
you log over TCP, avoiding the problem UDP-based syslog has of
losing lots of messages when the system load goes up.

There are different schools of thought on log centralization design.
I personally favour treating log data as generic goo and wedging it
all into a horking big server (praise Cthulhu, disk is so cheap),
then pulling whatever bits I deem interesting out of that; I find
it comforting for forensic analysis. Do make sure your clocks are
nicely synced (ntp for hard cases, clockspeed where you can use
it), and log everything in UTC (née GMT); dealing with timezones
that stagger and lurch about whenever Congress is in session is a
pain.
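
For programs that stamp their own log lines, logging in UTC can be
as small as this sketch, assuming Python's standard logging module
(utc_formatter is a made-up helper name):

```python
import logging
import time


def utc_formatter():
    """Build a formatter whose timestamps are UTC, whatever the local zone is."""
    fmt = logging.Formatter(
        "%(asctime)sZ %(levelname)s %(message)s",
        datefmt="%Y-%m-%dT%H:%M:%S",
    )
    # Formatter uses time.localtime by default; swap in gmtime for UTC.
    fmt.converter = time.gmtime
    return fmt
```

Attach it with handler.setFormatter(utc_formatter()). The trailing
"Z" in the format string is just the usual shorthand for UTC.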

Other folks like to direct different grades of logs to different
places: info to one place, alerts to another, and debugging goo that
never leaves the original servers.
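
A rough sketch of that split-by-grade design, again assuming
Python's standard logging module. LevelRange and route are made-up
names, the three streams stand in for whatever destinations you'd
really use, and CRITICAL stands in for "alert", which Python lacks.

```python
import logging


class LevelRange(logging.Filter):
    """Pass only records whose level falls within [low, high]."""

    def __init__(self, low, high):
        super().__init__()
        self.low, self.high = low, high

    def filter(self, record):
        return self.low <= record.levelno <= self.high


def route(logger, local_stream, info_stream, alert_stream):
    """Send each grade of message to its own destination."""
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.handlers.clear()
    routes = [
        (local_stream, logging.DEBUG, logging.DEBUG),        # stays on the box
        (info_stream, logging.INFO, logging.ERROR),          # routine reporting
        (alert_stream, logging.CRITICAL, logging.CRITICAL),  # wake a human
    ]
    for stream, low, high in routes:
        handler = logging.StreamHandler(stream)
        handler.addFilter(LevelRange(low, high))
        logger.addHandler(handler)
    return logger
```

In real life the info and alert streams would be network handlers
pointed at different collectors; plain streams keep the sketch
self-contained.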

> 5) What sort of situations do >>not<< create log data for default
> configurations of a particular operating system or application?

I'm not sure what you mean by this question.

> It's hard to tell people to look for "weird things" in their log files
> when we've got absolutely no resources -- other than the logs themselves
> -- to provide that help describe what normal things look like.

I dunno, I don't expect enough regularity from one server's log file
to another, from one platform to another, from one point in time to
another, to have much optimism about universal "normal" logfiles.

> Maybe it's because I live in California now, but the idea of a
> "quest for normal" really appeals to me ;-)

Huh. I guess times have changed. 'Twas a time when that was more of
a right-coast kinda goal, and the folks out on the left coast were
chasing individuality:-).

-Bennett