Re: Re[2]: [logs] Logging: World Domination

From: Chris Adams (cadamsat_private)
Date: Sun Aug 25 2002 - 22:40:54 PDT


    On Friday, August 23, 2002, at 04:29 , Kyle R. Hofmann wrote:
    > I agree, that would be nice.  But it occurred to me last night that
    > we've all really been unclear on one thing, which is where we use our
    > schemes.  Let me go on a long digression while I explain what I think
    > we're trying to get to.
    
    The way I had in mind was a replacement syslogd which would listen on 
    the traditional ports and a new TCP port. This would have some sort of 
    output processor (flat file, database, socket, forwarding to another 
    syslogd, etc.). Traditional syslog is wrapped in the minimal event 
    wrapper ("<event time=... host=... facility=... 
    priority=...>message</event>") before being passed to the output 
    processor; new events are passed in directly. Implementations might also 
    want some sort of output preprocessor to do things like throw out events 
    or subfields the admins don't care about.
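
    A minimal sketch of that wrapping step, in Python purely for
    illustration. The function name wrap_legacy, the ISO timestamp and the
    fallback to PRI 13 for lines without a header are my assumptions, not
    part of any agreed format:

```python
import re
from datetime import datetime, timezone
from xml.sax.saxutils import escape, quoteattr

# Traditional syslog lines start with "<PRI>", where PRI = facility * 8 + severity.
PRI_RE = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)


def wrap_legacy(raw, host, received=None):
    """Wrap one traditional syslog line in a minimal <event> element."""
    received = received or datetime.now(timezone.utc)
    match = PRI_RE.match(raw)
    if match:
        pri, message = int(match.group(1)), match.group(2)
    else:
        # Assumption: a line without a PRI header is treated as user.notice (13).
        pri, message = 13, raw
    # The low three bits (severity) go in the "priority" attribute.
    facility, priority = divmod(pri, 8)
    return "<event time=%s host=%s facility=%s priority=%s>%s</event>" % (
        quoteattr(received.isoformat()),
        quoteattr(host),
        quoteattr(str(facility)),
        quoteattr(str(priority)),
        escape(message),
    )
```

    The output processor would then see the same kind of <event> element
    whether the message arrived on the old port or the new one.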
    
    As far as replacements for syslog(3), I had in mind something like 6 - it 
    delivers a complete formatted message to the daemon. I consider 
    something like 5 (newsyslog() given arbitrary field=values) a special 
    case of this - field=value trivially maps into XML, so something like 5 
    would be a convenience function so you could simply pass it a bunch of 
    field=value pairs in whatever form makes sense for your language 
    (something like your code for C, a simple associative array / hash 
    everywhere that's supported, etc.) and it'd send the appropriate XML.
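
    To show how thin such a convenience function could be, here is a rough
    Python sketch. The name fields_to_event and the <event> root element
    are hypothetical; the only point is that a hash maps mechanically onto
    nested elements:

```python
import xml.etree.ElementTree as ET


def _append(parent, fields):
    """Add one child element per field; nested dicts become nested elements."""
    for name, value in fields.items():
        child = ET.SubElement(parent, name)
        if isinstance(value, dict):
            _append(child, value)
        else:
            child.text = str(value)


def fields_to_event(fields):
    """Serialize a dict of field=value pairs as an <event> XML string."""
    event = ET.Element("event")
    _append(event, fields)
    return ET.tostring(event, encoding="unicode")
```

    Calling fields_to_event({"program": "sshd", "user": "root", "result":
    "denied"}) gives
    <event><program>sshd</program><user>root</user><result>denied</result></event>,
    with any markup characters in the values escaped by the serializer.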
    
    We'd want to do things like this to make it trivial to log the 
    standard error objects from various languages - to really benefit from 
    this we need to make it as easy as possible for a programmer to send 
    verbose log messages since most programmers aren't going to take the 
    time to really use a detailed custom logger. As an example, I'd want a 
    Java class which could take an Exception, repackage the fields which map 
    to something standard and include a serialized copy of the original 
    Exception in a Java-specific tag (uses like that are why I want 
    nesting - it'd be extremely nice to have that kind of detail available 
    for debugging if you want it).
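
    The same idea sketched in Python rather than Java, to keep the example
    short. The element names (error, type, message, python, frame) are
    made up for illustration, and I include stack frames instead of a
    serialized copy of the exception object:

```python
import traceback
import xml.etree.ElementTree as ET


def exception_to_event(exc, program):
    """Repackage an exception: common fields up front, language detail nested."""
    event = ET.Element("event", {"program": program})

    # Fields which map to something any consumer could understand.
    error = ET.SubElement(event, "error")
    ET.SubElement(error, "type").text = type(exc).__name__
    ET.SubElement(error, "message").text = str(exc)

    # Language-specific detail goes in its own nested tag, so consumers
    # which don't care about it can skip the whole subtree.
    detail = ET.SubElement(event, "python")
    for frame in traceback.extract_tb(exc.__traceback__):
        ET.SubElement(detail, "frame", {
            "file": frame.filename,
            "line": str(frame.lineno),
            "function": frame.name,
        })
    return ET.tostring(event, encoding="unicode")
```

    A programmer would call this from an except block and hand the result
    to whatever sends events to the daemon.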
    
    This sort of API is arguably the discussion we should be having now as 
    it has some of the answers to both the "What should we be recording?" 
    and "How can we get programmers to adopt this new system?" questions. The related 
    API for dealing with events once they're received is equally important - 
    I don't see much chance of solving the much harder problem of automating 
    analysis in the near future so I'd like to minimize the amount of 
    scutwork needed to hand-tune analysis code for a given environment.
    
    > Let me pop off the stack and return to what I first said.  So, where
    > exactly do you want to use XML, and where exactly do I want to use
    > field="value"?
    
    I want the central log server to receive XML. Once the XML gets there it 
    can be translated into anything you want for analysis. IMHO, field=value 
    should be a function of your programming environment.
    
    > This is an area where XML seems to have an advantage because of the
    > DTD, but I'm not so sure that it's as much of an advantage as it might
    > look like, because to allow truly flexible logging, you must let the
    > vendor define his own DTD.  This is necessary, in fact, if you're
    > implementing a new protocol or service of some sort.  And that lets the
    > vendor get away with the same vendor-prefixed tags that you'd prefer he
    > not have.
    
    Vendor support will definitely be a huge hurdle. What I had in mind for 
    standard formats was basically some generic events for common services 
    and a well-defined vendor extension system. IDMEF seems to be using XML 
    namespaces - see 
    http://www.silicondefense.com/idwg/draft-ietf-idwg-idmef-xml-06.txt. 
    Basically, they have an AdditionalData element which contains arbitrary 
    elements with their own namespace - here's the example:
    
    <AdditionalData type="xml">
    	<test:test xmlns:test="http://www.ietf.org/test.html"
    	           xmlns="http://www.ietf.org/test.html">
    		<test:a test:attr="...">
    			...
    		</test:a>
    		<test:b>
    			...
    		</test:b>
    		<test:c>
    			...
    		</test:c>
    	</test:test>
    </AdditionalData>
    
    I think we'd do well to copy this model - push strongly for vendors to 
    use the common fields where possible and toss everything else in 
    something like that AdditionalData element. This is a good example of why I think 
    nesting should be a mandatory requirement - I think it's much cleaner 
    than a vendor-prefixed fieldname and it will still be ignored by 
    anything which isn't looking for it.
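
    To make the "ignored by anything which isn't looking for it" part
    concrete, here is a small Python sketch. The event layout, the common
    field names and the vendor namespace URI are all hypothetical; only the
    AdditionalData idea comes from the IDMEF draft:

```python
import xml.etree.ElementTree as ET

KNOWN_FIELDS = ("host", "program", "message")   # hypothetical common fields
VENDOR_NS = "http://vendor.example/log"         # hypothetical vendor namespace

SAMPLE = """<event>
  <host>mail1</host>
  <program>smtpd</program>
  <message>delivery deferred</message>
  <AdditionalData type="xml">
    <v:queue xmlns:v="http://vendor.example/log">
      <v:id>A1B2C3</v:id>
      <v:retries>3</v:retries>
    </v:queue>
  </AdditionalData>
</event>"""


def common_fields(event_xml):
    """A generic consumer: reads the common fields and never touches extensions."""
    root = ET.fromstring(event_xml)
    return {name: root.findtext(name) for name in KNOWN_FIELDS}


def vendor_fields(event_xml, namespace):
    """A vendor-aware consumer: collects leaf elements from one namespace only."""
    extra = ET.fromstring(event_xml).find("AdditionalData")
    if extra is None:
        return {}
    prefix = "{%s}" % namespace
    return {
        element.tag[len(prefix):]: (element.text or "").strip()
        for element in extra.iter()
        if element.tag.startswith(prefix) and len(element) == 0
    }
```

    Here common_fields(SAMPLE) returns only host, program and message,
    while vendor_fields(SAMPLE, VENDOR_NS) returns {"id": "A1B2C3",
    "retries": "3"}. Asking for a namespace the event doesn't use returns
    an empty dict, so unknown extensions cost the generic tools nothing.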
    
    >> The other area where nesting feels more natural is dumping more complex
    >> data - things like RPC calls or SSL negotiation:
    >
    > Yes, and that brings up a really good question: How many log messages
    > are complex enough to make XML useful, and how many are not?  Something
    > like an NTP time reset is too simple to need XML, but an SSL negotiation
    > is complex.  Especially pertinent, I think, is the fact that the SSL
    > negotiation may involve an arbitrarily long chain of certificates, which
    > XML would handle easily, while field="value" would not.
    >
    > I'd be willing to admit that field="value" is the wrong choice if there
    > are a lot of possibly useful log messages that are too complex for it to
    > handle.  Unfortunately, I don't think that's likely,
    
    I agree that there probably will not be many messages which preclude 
    field=value - initially. I see it as a chicken and egg problem - since 
    there's no standard way of doing it, programmers either punt the issue 
    by not logging anything useful (causing sysadmins everywhere to 
    impotently curse them for it) or they roll their own system resulting in 
    all of the usual fun with inadequate custom logging systems.
    
    I think the problem is that while enhanced logging would be popular with 
    sysadmins it usually isn't seen as important enough to justify the 
    development time for a custom implementation. The cost would drop if we 
    did some work to provide the higher-level classes and functions which 
    make it as easy as possible to log more useful information. If it 
    reached the point where we could at least trivially log a language's 
    native error structures I think that alone would be enough to tip the 
    balance in favor of complex log messages.
    
    > I like the idea of being able to use standard databases, and I'm wary
    > of the ability of XML to handle huge amounts of data efficiently,
    > especially for post-processing.
    
    I share this concern - that's one reason why I see XML as the format 
    syslog's processor receives rather than the final storage system. I like 
    using XML as an interchange language to ease the task of crossing 
    vendor / program boundaries - once it reaches your processor it *should* 
    be converted into some highly-optimized internal format since your log 
    analysis system is the only consumer.
    
    I'd probably have my events ending up in MySQL initially so my processor 
    would basically be translating the data I care about into a format 
    appropriate for my schema and discarding the rest. The XML is just used 
    to ensure that I get more granular (and hopefully more verbose) data to 
    simplify that translation process.
    
    In a prototype form this could be as simple as having the new syslogd 
    start a Perl script which would simply read STDIN using XML::Simple and 
    generate a SQL INSERT for each event (with noise filtering and some 
    database optimization, this could easily hit non-prototype status).
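
    A rough equivalent of that prototype, written in Python with
    ElementTree and SQLite standing in for Perl's XML::Simple and MySQL so
    it runs without a server. The table layout, the function names and the
    one-event-per-line input are assumptions for the sketch:

```python
import sqlite3
import xml.etree.ElementTree as ET

COLUMNS = ("time", "host", "facility", "priority")


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) events table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(time TEXT, host TEXT, facility INTEGER, priority INTEGER, message TEXT)"
    )
    return db


def store_events(lines, db):
    """Insert one row per well-formed <event> line; return the number stored."""
    stored = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = ET.fromstring(line)
        except ET.ParseError:
            continue  # noise filtering at its crudest: drop malformed input
        # Keep the attributes the schema cares about and flatten the body
        # text; everything else in the event is discarded.
        row = [event.get(name) for name in COLUMNS]
        row.append("".join(event.itertext()))
        db.execute("INSERT INTO events VALUES (?, ?, ?, ?, ?)", row)
        stored += 1
    db.commit()
    return stored
```

    Run from the daemon, this would be store_events(sys.stdin,
    open_db("events.db")). The placeholders keep message text from being
    interpreted as SQL, which matters when the input is attacker-supplied
    log data.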
    
    >>> And furthermore, we'd prefer to avoid "Message delivered
    >>> successfully" because that's a freeform string, so ideally all the
    >>> tags would be empty.
    >>
    >> Presumably we'd define a DTD which would make any tags which could be
    >> empty on successful transactions optional.
    >
    > Not optional, because then successful transactions would never be
    > recorded, which we might not want.  Better would be to make them empty,
    > e.g., <TAG/> ([XML], 3.1).
    
    I was thinking about optional subtags for things like errors which have 
    no meaningful value when no error was encountered. I agree with your 
    point, however - we'd want to consider carefully which ones should be 
    optional.
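
    Whichever way we decide, a consumer can tell the cases apart cheaply.
    A small Python sketch, with a hypothetical <error> subtag:

```python
import xml.etree.ElementTree as ET


def error_state(event_xml):
    """Return 'absent', 'empty', or the error text for an event's <error> subtag."""
    error = ET.fromstring(event_xml).find("error")
    if error is None:
        return "absent"          # the optional subtag was left out
    text = (error.text or "").strip()
    return text if text else "empty"   # <error/> recorded with no value
```

    So <event><status/></event> is reported as "absent",
    <event><error/></event> as "empty", and
    <event><error>timeout</error></event> as "timeout".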
    
    Chris
    