[logs] What "should" be logged? (long)

tbird@precision-guesswork.com

Good morning, Bennett,

I think that some of the definitional difficulties we're having will be
made clearer by my simply jumping into the fray and filling out my
thoughts from last night, so I'm going to talk about issue #1, the sets of
changes that applications and operating systems "ought" to log.

On Tue, 20 Aug 2002, Bennett Todd wrote:

> 2002-08-20-02:42:49 Tina Bird:
> > 1) What sort of state changes "should" applications and operating systems
> > log in the first place?  --> A standard for programmers
>
> Perhaps it's my Unix upbringing, but I think it's best to have
> adjustable logging levels; certainly, "alert", "info", and "debug"
> are reasonable. A program should log an alert (log a descriptive
> message at alert priority) when a human really urgently needs to
> take a look. "info" is good for logging routine behavior, when
> routine actions are something for which it's likely that some sites
> would want to do reporting or stats or whatever. And "debug" should
> dump enough details to help track down when you've mis-configured
> something --- when the gizmo is doing what you told it, rather than
> what you wanted it to do.
>
I'm not arguing with this, but this answer doesn't particularly address
the question I was trying to ask.  This is more a set of directives on how to
prioritize the messages once they've been generated.  But I believe
there's a set of generic-enough administrative and error conditions that
can be defined to provide guidelines for developers who are curious about
what they ought to do.  Once that list exists, developers and
administrators can customize the severities to their own environment.

So what's on the list?  Here's my start for operating systems, with notes
and queries -- bearing in mind that one of the tasks at hand is to
generate these conditions on everyone's favorite systems so we can do
things like answer question #5 (what >doesn't< get logged) in an orderly
fashion:

- System startup: are there multiple run levels?  If so, the system should
record which level is starting in a form that a human can make sense of
- System shutdown: are there multiple modes of shutdown?  Does the system
have any capacity to send "oh my god i'm going down" messages in the case
of an emergency crash or power loss?  Are there distinctions between
normal and abnormal shutdowns that can be differentiated in the logs?
- File system full: including thresholds (default or user defined) -- boy
wouldn't it be nice if the logs "automagically" included the three (or
however many) biggest culprits in terms of file size or space consumed by
a directory or folder in an error message?
- Hardware failures: power supplies, network interfaces, etc.  I am
relatively uneducated about hardware diagnostics, other than Cisco gear...
- Logins: failed and successful; console, remote (what protocol if
remote); anonymous account, unprivileged user account, privileged user
account, including switches to other users (unprivileged, privileged) from
user accounts
- Account creation: failed and successful; adding new user ID, assigning
rights and privileges to new user, adding password to new user
- Account modification: failed and successful; assigning or removing
rights and privileges, resetting password; privileged user or unprivileged
user
- Account removal: failed and successful
- Account disabled: too many failed logins, account expired, etc.
- Password/security information copied: failed and successful
- System configuration change: failed and successful; including access
control, network addressing, audit policy; who made change, what changed,
from system kernel on out to user-level applications
- Operating system patch applied: who applied patch, what system
components changed, source of patch (?)
- Network connections: failed and successful connection attempts;
anonymous service, user-specific service, access to administrative tools
or control connection; DNS zone transfers, etc.
- Audit logs: failed and successful attempts to modify or clear audit logs
- Object access: failed and successful attempts to read files, start or
stop processes, etc. (understanding that most organizations will not need
or want this level of detail)
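
As a rough illustration of how a list like the one above could be handed
to developers, here is a minimal sketch in Python.  Everything in it is an
assumption made for illustration: the category names, the success/failure
split, the default severities, and the severity_for helper are hypothetical
and are not part of any existing standard.

```python
import logging
from enum import Enum


class Category(Enum):
    """Hypothetical event categories taken from the list above."""
    SYSTEM_STARTUP = "system.startup"
    SYSTEM_SHUTDOWN = "system.shutdown"
    FILESYSTEM_FULL = "filesystem.full"
    HARDWARE_FAILURE = "hardware.failure"
    LOGIN = "login"
    ACCOUNT_CREATION = "account.creation"
    ACCOUNT_MODIFICATION = "account.modification"
    ACCOUNT_REMOVAL = "account.removal"
    ACCOUNT_DISABLED = "account.disabled"
    CONFIG_CHANGE = "config.change"
    PATCH_APPLIED = "patch.applied"
    NETWORK_CONNECTION = "network.connection"
    AUDIT_LOG = "audit.log"
    OBJECT_ACCESS = "object.access"


class Outcome(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


# Assumed defaults: successes are routine, failures deserve a second look.
DEFAULT_SEVERITY = {
    Outcome.SUCCESS: logging.INFO,
    Outcome.FAILURE: logging.WARNING,
}


def severity_for(category, outcome, overrides=None):
    """Return the severity for an event, letting a site override the default."""
    overrides = overrides or {}
    return overrides.get((category, outcome), DEFAULT_SEVERITY[outcome])


# A site that treats failed audit-log tampering as critical:
site_policy = {(Category.AUDIT_LOG, Outcome.FAILURE): logging.CRITICAL}
```

The point of the split is that the list of conditions stays fixed while
each site decides, through something like site_policy, how loudly a given
condition should be reported.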

*whew*

I'm sure I've left things out, and I'm sure this can be sorted into a less
intimidating list of message categories.  But in addition to worrying
about format and how to handle the expected flow of data and how to
protect audit data traveling across networks, we need to worry about what
we expect to see.

With regard to Marcus' post, this list represents my
(not-sufficiently-caffeinated) first stab at a set of messages that could
have standardized tokens across a variety of operating system platforms.
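
One way to picture "standardized tokens" is a message built from fixed
key=value pairs, so that the same login failure looks the same no matter
which platform produced it.  The sketch below is only an assumption about
what that might look like; the token names (event, outcome, user, and so
on) and the format_event helper are hypothetical.

```python
import shlex


def format_event(event, outcome, **fields):
    """Build one log line from standardized key=value tokens.

    The event and outcome tokens always come first; the remaining fields
    are sorted by name so that identical events produce identical lines.
    Values containing spaces are quoted so the line can be split again.
    """
    tokens = [f"event={shlex.quote(event)}", f"outcome={shlex.quote(outcome)}"]
    for name in sorted(fields):
        tokens.append(f"{name}={shlex.quote(str(fields[name]))}")
    return " ".join(tokens)


line = format_event("login", "failure", user="root", method="ssh")
# line == "event=login outcome=failure method=ssh user=root"
```

Because the values are shell-quoted, a reader of the log can recover the
tokens with shlex.split(line) even when a value contains spaces.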

In addition to these conditions, specific applications "should" record
explicit errors when they fail to start due to misconfiguration (syslogd,
anyone?), and messages when they receive incorrect or unexpected input
(yes, I know that in order to do this the programmer has to manage to
detect incorrect or unexpected input, and that failing to detect it is
what creates buffer overflows in the first place, but since this is tbird
daydreaming I'm allowed).
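
For the fail-to-start case, a minimal sketch in Python of what "explicit
errors" could mean in practice follows.  The daemon name, the two settings
it checks, and the helper names are all hypothetical; the only point is
that each configuration problem gets its own log line before the program
gives up.

```python
import logging

log = logging.getLogger("exampled")  # hypothetical daemon name

REQUIRED_KEYS = ("listen_port", "log_dir")  # hypothetical settings


def config_problems(config):
    """Return a list of human-readable problems found in the config dict."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in config:
            problems.append(f"missing required setting '{key}'")
    port = config.get("listen_port")
    if port is not None:
        if not isinstance(port, int) or not 1 <= port <= 65535:
            problems.append(
                f"listen_port must be an integer from 1 to 65535, got {port!r}"
            )
    return problems


def start(config):
    """Log every configuration problem explicitly; return True if startup may proceed."""
    problems = config_problems(config)
    for problem in problems:
        log.error("startup aborted, configuration error: %s", problem)
    return not problems
```

A caller would exit with a non-zero status when start() returns False, so
that both the log and the exit code say why the service never came up.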

What I'll do whilst everyone is discussing this list -- and fixing it ;-)
-- is to start collecting samples of these messages from the data I've got
and the machines in my lab.  And getting it on the Web site.

Go to!  Bennett, does this clarify what I was getting at?

tbird

_______________________________________________
LogAnalysis mailing list
LogAnalysisat_private
https://lists.shmoo.com/mailman/listinfo/loganalysis