The light begins to dawn. If I'm not mistaken, the goal here is to get a grip on automated log analysis; further, the idea is to find what we might call "canonicalization targets", to which we could then normalize diverse logs. In some dream fantasy world we could hope that eventually some platforms would use these targets directly, but that's probably out of scope for any engineering discussion.

At first blush, your list of "operating system" canonical log entries looks like a great starting point; if it's incomplete in ways that strike us as important across platforms, I imagine we'll note that when we start trying to really canonicalize some log data. I further suspect that as we run this down, we'll find that on many platforms the information will have to be gathered from diverse places, and it'll be incomplete.

Hmm. One question does come to mind at this point: do we want to define intended purpose(s) for this log data, to give us a scope? E.g. if we're interested in being able to support performance monitoring, we'll probably want to define some canonical log record types for various OS resources, with an expectation that an OS (or a monitoring daemon) could periodically emit status entries, and could also use these formats to cry for help when a resource is just about exhausted. You mentioned filesystem space; other such resources would include real memory, virtual memory, CPU utilization, network bandwidth, ... If on the other hand we only care about being able to set off an SA's pager with something informative, we could probably confine all this to one record type, OS RESOURCE, with a free-text message that could be a whinge about disk space, or load average, or whatever.

To really make interesting use of this, we'll want to write one or more frameworks for doing this canonicalization, at least some of which are as portable as possible.
To really write code, besides needing the dictionary of log entry types we're going to try to populate for each platform, we'll also need an agreed concrete representation of a log entry. And that'll need to be a format that represents everything we possibly can in nicely universal, easy to parse and process forms, and at the same time is extensible.

May I propose the following as a concrete log format. Each log entry will be a single text line in US ASCII. Let's use TAI labels for our timestamps, as varying-length hex strings: if the hex string is 16 characters long, it's TAI64 with one-second resolution; if it's 24 characters, that's TAI64N with nanosecond resolution; 32 is TAI64NA with attosecond resolution. Let's have a slot for the originating host, filled with a domain name or IP address; then comes the token that defines the facility ("os", "mta", "firewall", ...), then a token describing the record type within that facility, then the remainder of the record, containing zero or more additional tokens fixed for each record type, optionally ending with one free-format text field.

-Bennett
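
To make the proposed line format a little more concrete, here is a rough Python sketch of a parser for it. Several things in it are assumptions rather than part of the proposal: that fields are separated by runs of whitespace (no separator is specified above), the names LogEntry and parse_entry, and the sample record type "resource". The per-record-type tokens are left unparsed in `rest`, since that dictionary does not exist yet.

```python
import string
from dataclasses import dataclass

# Hex-string length selects the TAI label variant, as proposed above.
KIND_BY_LENGTH = {16: "TAI64", 24: "TAI64N", 32: "TAI64NA"}
HEX_DIGITS = set(string.hexdigits)


@dataclass
class LogEntry:
    tai_kind: str      # "TAI64", "TAI64N" or "TAI64NA"
    seconds: int       # raw 64-bit TAI64 seconds label
    nanoseconds: int   # 0 when the label carries no nanosecond part
    attoseconds: int   # 0 when the label carries no attosecond part
    host: str          # domain name or IP address
    facility: str      # e.g. "os", "mta", "firewall"
    record_type: str   # record type within the facility
    rest: str          # remaining tokens and optional free text, unparsed


def parse_entry(line: str) -> LogEntry:
    """Parse one log line; whitespace-separated fields are an assumption."""
    line = line.strip()
    if not line.isascii():
        raise ValueError("log entries must be US ASCII")

    # Split off the four fixed fields; whatever follows stays in one piece.
    parts = line.split(None, 4)
    if len(parts) < 4:
        raise ValueError("need timestamp, host, facility and record type")
    stamp, host, facility, record_type = parts[:4]
    rest = parts[4] if len(parts) == 5 else ""

    kind = KIND_BY_LENGTH.get(len(stamp))
    if kind is None or not set(stamp) <= HEX_DIGITS:
        raise ValueError("timestamp must be 16, 24 or 32 hex characters")

    # First 16 hex digits: seconds; next 8: nanoseconds; last 8: attoseconds.
    seconds = int(stamp[:16], 16)
    nanoseconds = int(stamp[16:24], 16) if len(stamp) >= 24 else 0
    attoseconds = int(stamp[24:32], 16) if len(stamp) == 32 else 0

    return LogEntry(kind, seconds, nanoseconds, attoseconds,
                    host, facility, record_type, rest)


if __name__ == "__main__":
    # Hypothetical sample line; "resource" is not an agreed record type.
    print(parse_entry(
        "400000002a2b2c2d host.example.com os resource filesystem /var 98% full"
    ))
```

Keeping the tail of the line as a single string means unknown record types can still be carried through a canonicalizer untouched, which seems to fit the extensibility goal.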