On Thu, 22 Aug 2002 16:02:18 -0700, Chris Adams wrote: > On Thursday, August 22, 2002, at 02:02 , Bennett Todd wrote: > >> We need automated creation, distribution and analysis to work > >> across numerous different platforms with products from hundreds of > >> different vendors. > > > > We can express the semantics that we need with a record that's a > > linear list of whitespace-separated tokens on a single text line, > > with some fixed fields aways required, followed by heirarchically > > assigned tokens, first one defining a category ("OS"; "Firewall"; > > "DB"; "Webserver"; ...); then a separate list of tokens for each of > > those categories; the remainder of the record with format defined > > appropriately for each of those; and so forth. > ... > > It's only a huge amount of work for people who have trouble > > splitting text on whitespace. I hope we won't have them working on > > A simple whitespace delimited format is something like tab-delimited > text - raw fields without metadata. What you're describing now is > different: > > - per-record metadata > - support for hierarchal data > - support for adding additional user/vendor-specific fields > > That last one is important - if I use XML, I can add an attribute or tag > anywhere I'd like without causing anything to break - code which isn't > looking for it will simply ignore it. This is much cleaner than having > to tack everything in a large extension jumble at the end where there's > no connection between standard field a and custom field n. But the second one isn't. We need a distinction between data that are required for every message, like timestamps, and data that are specific to the type of message. The ability to nest is unimportant, and that is why XML is overkill. For example, everyone agrees that this is a bad idea: <MESSAGE> <timestamp>0123456789</timestamp> <sourcehost>mail.example.com</sourcehost> ... <ESMTP> <ID>ABCDEFG</ID> ... </ESMTP> </MESSAGE> So instead we have proposals to use XML like this: <MESSAGE timestamp="0123456789" sourcehost="mail.example.com" ...> <ESMTP id="ABCDEFG" ...> Message delivered successfully </ESMTP> </MESSAGE> Notice that in both cases, we have a message, which has its own data and which encapsulates the ESMTP data, but the ESMTP data do not depend upon the message data itself. So concievably, this could work just as well: <LOGDATA internal-identifier="1234" timestamp="0123456789" ...> ... </LOGDATA> <ESMTP internal-identifier="1234" ...> ... </ESMTP> Now, you might rightly point out that this is a dumb scheme which opens up the possibility of malformed collections of tags. But none of the schemes under discussion are designed resist corruption in transit, so this is not a unique disadvantage. And it emphasizes my point that we do not need the nesting capabilities of XML; we're only using its structure for separating classes of data. And furthermore, we'd prefer to avoid "Message delivered successfully" because that's a freeform string, so ideally all the tags would be empty. So let me be productive for a moment and propose an alternate solution: Syslog messages will be single lines of whitespace delimited text. They will be divided into two parts, a universal part which every message will have, and a variable part which will vary depending upon the type of the message. Each part will consist of pairs field="value". field will be an alphanumeric string; it corresponds to the column name in a database. value is a freeform collection of octets. value terminates when it sees a ", unless that " is preceded by a \. \\ produces a literal backslash. Each field="value" pair should have whitespace on either side. As an example, let me take a line from my home computer's /var/log/messages and reformat it. I have deliberately chosen a simple message: Aug 1 01:34:00 klaus ntpd[25529]: time reset -0.352516 s The universal part in this case is the part before the colon. It becomes: timestamp="1234567890" sourcehost="klaus.lemniscate.net" program="ntpd" pid="25529" type="ntp" (Obviously I've not taken the time to convert the date to a value in seconds) The format of the variable part is determined by the type field above; in this case, the variable part will be of type ntp, so a postprocessor would invoke its ntp processing routines, if it has any, or skip the rest of the message. If it does the former, it will see: action="time reset" time="-0.352516" Thus the full message as you would see if you cat'ted the logfile would be: timestamp="1234567890" sourcehost="klaus.lemniscate.net" program="ntpd" pid="25529" type="ntp" action="time reset" time="-0.352516" As an example, I've done this to a few other lines. There are, I expect, far more complex examples than the ones here, and these are very, very rough. I had a little bit of trouble with postfix but mostly because I don't know exactly what it's doing at each step and so don't know the right designation for it. The others were very easy to translate. Aug 5 14:00:03 klaus syslogd: restart timestamp="1234567890" sourcehost="klaus.lemniscate.net" program="syslogd" pid="1234" type="syslog" action="restart" Aug 18 13:05:21 klaus named[18678]: starting BIND 9.2.1 -u named -t /var/named Aug 18 13:05:25 klaus named[18678]: command channel listening on 127.0.0.1#953 Aug 18 13:05:25 klaus named[18678]: command channel listening on 192.168.0.6#953 timestamp="1234567890" sourcehost="klaus.lemniscate.net" program="named" pid="18678" type="named" action="starting" user="named" chroot="/var/named" timestamp="1234567890" sourcehost="klaus.lemniscate.net" program="named" pid="18678" type="named" action="listening" listenip="127.0.0.1" listenport="953" channeltype="rndc" timestamp="1234567890" sourcehost="klaus.lemniscate.net" program="named" pid="18678" type="named" action="listening" listenip="192.168.0.6" listenport="953" channeltype="rndc" Aug 22 18:13:28 klaus postfix/smtpd[2995]: connect from unknown[206.168.146.149] Aug 22 18:13:29 klaus postfix/smtpd[2995]: 62F72261C: client=unknown[206.168.146.149] Aug 22 18:13:31 klaus postfix/cleanup[28377]: 62F72261C: message-id=<200208222229.IAA13509@caligul a.anu.edu.au> Aug 22 18:13:31 klaus postfix/smtpd[2995]: disconnect from unknown[206.168.146.149] Aug 22 18:13:32 klaus postfix/qmgr[343]: 62F72261C: from=<loganalysis-adminat_private>, size =6824, nrcpt=1 (queue active) Aug 22 18:13:41 klaus postfix/local[156]: 62F72261C: to=<krhat_private>, relay=local, delay=1 2, status=sent ("|/usr/local/bin/procmail") timestamp="1234567890" sourcehost="klaus.lemniscate.net" program="postfix" pid="2995" type="postfix" process="smtpd" action="connect" sourcehost="unknown" sourceip="206.168.146.149" timestamp="1234567890" sourcehost="klaus.lemniscate.net" program="postfix" pid="2995" type="postfix" process="smtpd" action="message" sourcehost="unknown" sourceip="206.168.146.149" esmtpid="62F72261C" timestamp="1234567890" sourcehost="klaus.lemniscate.net" program="postfix" pid="28377" type="postfix" process="cleanup" action="message" esmtpid="62F72261C" msgid="<200208222229.IAA13509at_private>" timestamp="1234567890" sourcehost="klaus.lemniscate.net" program="postfix" pid="2995" type="postfix" process="smtpd" action="disconnect" sourcehost="unknown" sourceip="206.168.146.149" timestamp="1234567890" sourcehost="klaus.lemniscate.net" program="postfix" pid="343" type="postfix" process="qmgr" action="delivering" from="<loganalysis-adminat_private>" size="6824" nrcpt="1" queue="active" timestamp="1234567890" sourcehost="klaus.lemniscate.net" program="postfix" pid="156" type="postfix" process="local" action="delivered" esmtpid="62F72261C" TO="<krhat_private>" relay="local" delay="12" status="sent" localdeliveryagent="|/usr/local/bin/procmail" -- Kyle R. Hofmann <krhat_private> _______________________________________________ LogAnalysis mailing list LogAnalysisat_private https://lists.shmoo.com/mailman/listinfo/loganalysis
This archive was generated by hypermail 2b30 : Thu Aug 22 2002 - 23:35:51 PDT