[logs] Re: What "should" be logged? (long)

betat_private

The light begins to dawn. If I'm not mistaken, the goal here is
to get a grip on automated log analysis; further, the idea is
to find what we might call "canonicalization targets" to which
we could then normalize various and diverse logs. In some fantasy
world we could hope that eventually some platforms would emit these
targets directly, but that's probably out of scope for any
engineering discussion.

At first blush, your list of "operating system" canonical log
entries looks like a great starting point; if it's incomplete in
ways that strike us as cross-platform important, I imagine we'll
note that when we start trying to really canonicalize some log data.

I further suspect that as we run this down, we'll find that on many
platforms the information will have to be gathered from diverse
places, and it'll be incomplete.

Hmm. One question does come to mind at this point: do we want to
define intended purpose(s) for this log data, to give us a scope?
E.g. if we're interested in being able to support performance
monitoring, we'll probably want to define some canonical log record
types for various OS resources, with an expectation that an OS (or a
monitoring daemon) could periodically emit status entries, and could
also use these formats to cry for help when a resource is just about
exhausted. You mentioned filesystem space; other such resources
would include real memory, virtual memory, CPU utilization, network
bandwidth, .... If on the other hand we only care about being able
to set off an SA's pager with something informative, we could
probably confine all this to one record type, OS RESOURCE, with a
free-text message that could be a whinge about disk space, or load
average, or whatever.

To really make interesting use of this, we'll want to write one or
more frameworks, at least some of them as portable as possible, for
doing this canonicalization. To really write code, besides needing
the dictionary of log entry types we're going to try and populate
for each platform, we'll also need an agreed concrete representation
of a log entry. And that'll need to be a format that represents
everything we possibly can in nicely universal, easy to parse and
process formats, and at the same time is extensible.

May I propose the following as a concrete log format:

Each log entry will be a single text line in US ASCII.

Let's use TAI labels for our timestamps, as varying-length hex
strings: if the hex string is 16 characters long, it's TAI64 with
one-second resolution; if it's 24 characters, that's TAI64N with
nanosecond resolution; if it's 32, that's TAI64NA with attosecond
resolution.

After the timestamp, let's have a slot for the originating host,
filled with a domain name or IP address; then comes the token that
defines the facility ("os", "mta", "firewall", ...), then a token
describing the record type within that facility, then the remainder
of the record: zero or more additional tokens, fixed for each record
type, optionally ending with one free-format text field.
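
To make the proposal a little more tangible, here is a rough Python
sketch of a parser for such a line. Several things in it are my own
assumptions and not part of the proposal: fields are separated by a
single space (no separator has been agreed), the FIXED_TOKENS table
and its one "os resource" entry are hypothetical stand-ins for the
dictionary of record types, and the names LogEntry, parse_timestamp
and parse_entry are made up for illustration.

```python
from dataclasses import dataclass

# A TAI64 label of 2**62 marks the start of 1970 TAI.
TAI64_EPOCH = 2 ** 62

# Hypothetical dictionary: (facility, record type) -> number of
# fixed tokens that precede the optional free-text field.
FIXED_TOKENS = {("os", "resource"): 1}


@dataclass
class LogEntry:
    seconds: int       # seconds since the start of 1970 TAI (not UTC)
    nanoseconds: int   # 0 when the label is plain TAI64
    attoseconds: int   # 0 unless the label is TAI64NA
    host: str
    facility: str
    record_type: str
    tokens: list
    text: str


def parse_timestamp(stamp):
    """Decode a 16-, 24- or 32-character hex TAI label."""
    if len(stamp) not in (16, 24, 32):
        raise ValueError("unexpected timestamp length: %d" % len(stamp))
    raw = bytes.fromhex(stamp)
    seconds = int.from_bytes(raw[:8], "big") - TAI64_EPOCH
    # Slicing past the end yields b"", which decodes to 0.
    nanoseconds = int.from_bytes(raw[8:12], "big")
    attoseconds = int.from_bytes(raw[12:16], "big")
    return seconds, nanoseconds, attoseconds


def parse_entry(line):
    """Split one log line into its fields (single-space separator assumed)."""
    # Raises ValueError if any of the four leading fields is missing.
    stamp, host, facility, record_type, *rest = line.rstrip("\n").split(" ", 4)
    seconds, nanoseconds, attoseconds = parse_timestamp(stamp)
    count = FIXED_TOKENS.get((facility, record_type), 0)
    parts = rest[0].split(" ", count) if rest else []
    if len(parts) < count:
        raise ValueError("record is missing fixed tokens")
    text = parts[count] if len(parts) > count else ""
    return LogEntry(seconds, nanoseconds, attoseconds, host, facility,
                    record_type, parts[:count], text)
```

With that table, a line such as

    400000003b9aca00000001f4 host.example.com os resource disk /var is 98% full

would come back with facility "os", record type "resource", one
fixed token ("disk") and the rest as free text. Record types missing
from the table are treated as having no fixed tokens, so everything
after the record type lands in the free-text field; that is one
possible way to stay extensible, not the only one.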

-Bennett