Re: Re[2]: [logs] Logging: World Domination

From: Kyle R. Hofmann (krhat_private)
Date: Thu Aug 22 2002 - 21:39:33 PDT

  • Next message: Chris Adams: "Re: Re[2]: [logs] Logging: World Domination"

    On Thu, 22 Aug 2002 16:02:18 -0700, Chris Adams wrote:
    > On Thursday, August 22, 2002, at 02:02 , Bennett Todd wrote:
    > >> We need automated creation, distribution and analysis to work
    > >> across numerous different platforms with products from hundreds of
    > >> different vendors.
    > >
    > > We can express the semantics that we need with a record that's a
    > > linear list of whitespace-separated tokens on a single text line,
    > > with some fixed fields aways required, followed by heirarchically
    > > assigned tokens, first one defining a category ("OS"; "Firewall";
    > > "DB"; "Webserver"; ...); then a separate list of tokens for each of
    > > those categories; the remainder of the record with format defined
    > > appropriately for each of those; and so forth.
    > ...
    > > It's only a huge amount of work for people who have trouble
    > > splitting text on whitespace. I hope we won't have them working on
    > 
    > A simple whitespace delimited format is something like tab-delimited 
    > text - raw fields without metadata. What you're describing now is 
    > different:
    > 
    > - per-record metadata
    > - support for hierarchal data
    > - support for adding additional user/vendor-specific fields
    > 
    > That last one is important - if I use XML, I can add an attribute or tag 
    > anywhere I'd like without causing anything to break - code which isn't 
    > looking for it will simply ignore it. This is much cleaner than having 
    > to tack everything in a large extension jumble at the end where there's 
    > no connection between standard field a and custom field n.
    
    But the second one isn't.  We need a distinction between data that are
    required for every message, like timestamps, and data that are specific to
    the type of message.  The ability to nest is unimportant, and that is why XML
    is overkill.
    
    For example, everyone agrees that this is a bad idea:
    
    <MESSAGE>
      <timestamp>0123456789</timestamp>
      <sourcehost>mail.example.com</sourcehost>
    
      ...
    
      <ESMTP>
        <ID>ABCDEFG</ID>
        ...
      </ESMTP>
    </MESSAGE>
    
    So instead we have proposals to use XML like this:
    
    <MESSAGE timestamp="0123456789" sourcehost="mail.example.com" ...>
      <ESMTP id="ABCDEFG" ...>
        Message delivered successfully
      </ESMTP>
    </MESSAGE>
    
    Notice that in both cases, we have a message, which has its own data and which
    encapsulates the ESMTP data, but the ESMTP data do not depend upon the message
    data itself.  So concievably, this could work just as well:
    
    <LOGDATA internal-identifier="1234" timestamp="0123456789" ...>
      ...
    </LOGDATA>
    <ESMTP internal-identifier="1234" ...>
      ...
    </ESMTP>
    
    Now, you might rightly point out that this is a dumb scheme which opens up
    the possibility of malformed collections of tags.  But none of the schemes
    under discussion are designed resist corruption in transit, so this is not a
    unique disadvantage.  And it emphasizes my point that we do not need the
    nesting capabilities of XML; we're only using its structure for separating
    classes of data.  And furthermore, we'd prefer to avoid "Message delivered
    successfully" because that's a freeform string, so ideally all the tags
    would be empty.
    
    So let me be productive for a moment and propose an alternate solution: Syslog
    messages will be single lines of whitespace delimited text.  They will be
    divided into two parts, a universal part which every message will have, and
    a variable part which will vary depending upon the type of the message.
    Each part will consist of pairs field="value".  field will be an alphanumeric
    string; it corresponds to the column name in a database.  value is a freeform
    collection of octets.  value terminates when it sees a ", unless that " is
    preceded by a \.  \\ produces a literal backslash.  Each field="value" pair
    should have whitespace on either side.
    
    As an example, let me take a line from my home computer's /var/log/messages
    and reformat it.  I have deliberately chosen a simple message:
    
    Aug  1 01:34:00 klaus ntpd[25529]: time reset -0.352516 s
    
    The universal part in this case is the part before the colon.  It becomes:
    
    timestamp="1234567890" sourcehost="klaus.lemniscate.net" program="ntpd" pid="25529" type="ntp"
    
    (Obviously I've not taken the time to convert the date to a value in seconds)
    The format of the variable part is determined by the type field above; in
    this case, the variable part will be of type ntp, so a postprocessor would
    invoke its ntp processing routines, if it has any, or skip the rest of the
    message.  If it does the former, it will see:
    
    action="time reset" time="-0.352516"
    
    Thus the full message as you would see if you cat'ted the logfile would be:
    
    timestamp="1234567890" sourcehost="klaus.lemniscate.net" program="ntpd" pid="25529" type="ntp" action="time reset" time="-0.352516"
    
    As an example, I've done this to a few other lines.  There are, I expect, far
    more complex examples than the ones here, and these are very, very rough.  I
    had a little bit of trouble with postfix but mostly because I don't know
    exactly what it's doing at each step and so don't know the right designation
    for it.  The others were very easy to translate.
    
    Aug  5 14:00:03 klaus syslogd: restart
    
    timestamp="1234567890" sourcehost="klaus.lemniscate.net" program="syslogd" pid="1234" type="syslog" action="restart"
    
    Aug 18 13:05:21 klaus named[18678]: starting BIND 9.2.1 -u named -t /var/named
    Aug 18 13:05:25 klaus named[18678]: command channel listening on 127.0.0.1#953
    Aug 18 13:05:25 klaus named[18678]: command channel listening on 192.168.0.6#953
    
    timestamp="1234567890" sourcehost="klaus.lemniscate.net" program="named" pid="18678" type="named" action="starting" user="named" chroot="/var/named"
    timestamp="1234567890" sourcehost="klaus.lemniscate.net" program="named" pid="18678" type="named" action="listening" listenip="127.0.0.1" listenport="953" channeltype="rndc"
    timestamp="1234567890" sourcehost="klaus.lemniscate.net" program="named" pid="18678" type="named" action="listening" listenip="192.168.0.6" listenport="953" channeltype="rndc"
    
    Aug 22 18:13:28 klaus postfix/smtpd[2995]: connect from unknown[206.168.146.149]
    Aug 22 18:13:29 klaus postfix/smtpd[2995]: 62F72261C: client=unknown[206.168.146.149]
    Aug 22 18:13:31 klaus postfix/cleanup[28377]: 62F72261C: message-id=<200208222229.IAA13509@caligul
    a.anu.edu.au>
    Aug 22 18:13:31 klaus postfix/smtpd[2995]: disconnect from unknown[206.168.146.149]
    Aug 22 18:13:32 klaus postfix/qmgr[343]: 62F72261C: from=<loganalysis-adminat_private>, size
    =6824, nrcpt=1 (queue active)
    Aug 22 18:13:41 klaus postfix/local[156]: 62F72261C: to=<krhat_private>, relay=local, delay=1
    2, status=sent ("|/usr/local/bin/procmail")
    
    timestamp="1234567890" sourcehost="klaus.lemniscate.net" program="postfix" pid="2995" type="postfix" process="smtpd" action="connect" sourcehost="unknown" sourceip="206.168.146.149"
    timestamp="1234567890" sourcehost="klaus.lemniscate.net" program="postfix" pid="2995" type="postfix" process="smtpd" action="message" sourcehost="unknown" sourceip="206.168.146.149" esmtpid="62F72261C"
    timestamp="1234567890" sourcehost="klaus.lemniscate.net" program="postfix" pid="28377" type="postfix" process="cleanup" action="message" esmtpid="62F72261C" msgid="<200208222229.IAA13509at_private>"
    timestamp="1234567890" sourcehost="klaus.lemniscate.net" program="postfix" pid="2995" type="postfix" process="smtpd" action="disconnect" sourcehost="unknown" sourceip="206.168.146.149"
    timestamp="1234567890" sourcehost="klaus.lemniscate.net" program="postfix" pid="343" type="postfix" process="qmgr" action="delivering" from="<loganalysis-adminat_private>" size="6824" nrcpt="1" queue="active"
    timestamp="1234567890" sourcehost="klaus.lemniscate.net" program="postfix" pid="156" type="postfix" process="local" action="delivered" esmtpid="62F72261C" TO="<krhat_private>" relay="local" delay="12" status="sent" localdeliveryagent="|/usr/local/bin/procmail"
    
    -- 
    Kyle R. Hofmann <krhat_private>
    _______________________________________________
    LogAnalysis mailing list
    LogAnalysisat_private
    https://lists.shmoo.com/mailman/listinfo/loganalysis
    



    This archive was generated by hypermail 2b30 : Thu Aug 22 2002 - 23:35:51 PDT