On Friday, August 23, 2002, at 04:29, Kyle R. Hofmann wrote:

> I agree, that would be nice.  But it occurred to me last night that we've
> all really been unclear on one thing, which is where we use our schemes.
> Let me go on a long digression while I explain what I think we're trying
> to get to.

The way I had in mind was a replacement syslogd which would listen on the traditional ports and a new TCP port. This would have some sort of output processor (flat file, database, socket, forwarding to another syslogd, etc.). Traditional syslog is wrapped in the minimal event wrapper ("<event time=... host=... facility=... priority=...>message</event>") before being passed to the output processor; new events are passed in directly. Implementations might also want some sort of output preprocessor to do things like throw out events or subfields the admins don't care about.

As far as replacements for syslog(3), I had in mind something like 6 - it delivers a complete formatted message to the daemon. I consider something like 5 (newsyslog() given arbitrary field=values) a special case of this - field=value trivially maps into XML, so something like 5 would be a convenience function: you simply pass it a bunch of field=value pairs in whatever form makes sense for your language (something like your code for C, a simple associative array / hash everywhere that's supported, etc.) and it sends the appropriate XML. We'd want to do things like this to make it trivial to log the standard error objects from various languages - to really benefit from this we need to make it as easy as possible for a programmer to send verbose log messages, since most programmers aren't going to take the time to really use a detailed custom logger.
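To make the convenience-function idea concrete, here is a minimal sketch in Python (picked only for illustration; nothing in this thread settles on a language). The function names wrap_legacy and fields_to_event, and the child element names in the usage lines, are hypothetical. Only the <event> wrapper and its time/host/facility/priority attributes come from the description above, and a real version would send the result to the daemon rather than return a string.

```python
from xml.etree import ElementTree as ET


def wrap_legacy(message, time, host, facility, priority):
    """Wrap a traditional syslog message in the minimal <event> wrapper."""
    event = ET.Element("event", {
        "time": time,
        "host": host,
        "facility": facility,
        "priority": priority,
    })
    event.text = message
    return ET.tostring(event, encoding="unicode")


def _append_fields(parent, fields):
    """Add one child element per field; nested dicts become nested elements."""
    for name, value in fields.items():
        child = ET.SubElement(parent, name)
        if isinstance(value, dict):
            _append_fields(child, value)
        else:
            child.text = str(value)


def fields_to_event(fields):
    """Map field=value pairs (a plain dict) onto an <event> element.

    Assumption: field names are valid XML element names; no schema or DTD
    is checked here.
    """
    event = ET.Element("event")
    _append_fields(event, fields)
    return ET.tostring(event, encoding="unicode")


# Hypothetical usage: one legacy message, one structured event.
legacy = wrap_legacy("su: auth failure", "2002-08-23T04:29:00",
                     "loghost", "auth", "warning")
structured = fields_to_event({
    "service": "smtp",
    "status": "delivered",
    "peer": {"host": "mx1", "port": 25},
})
```

The nested dict in the second call is where this goes past flat field=value: the same function handles both cases, so a caller only pays for nesting when it has something nested to log.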
As an example, I'd want a Java class which could take an Exception, repackage the fields which map to something standard, and include a serialized copy of the original Exception in a Java-specific tag (uses like that are why I want nesting - it'd be extremely nice to have that kind of detail available for debugging if you want it).

This sort of API is arguably the discussion we should be having now, as it has some of the answers to both "What should we be recording?" and "How can we get programmers to adopt this new system?". The related API for dealing with events once they're received is equally important - I don't see much chance of solving the much harder problem of automating analysis in the near future, so I'd like to minimize the amount of scutwork needed to hand-tune analysis code for a given environment.

> Let me pop off the stack and return to what I first said.  So, where exactly
> do you want to use XML, and where exactly do I want to use field="value"?

I want the central log server to receive XML. Once the XML gets there it can be translated into anything you want for analysis. IMHO, field=value should be a function of your programming environment.

> This is an area where XML seems to have an advantage because of the DTD, but
> I'm not so sure that it's as much of an advantage as it might look like,
> because to allow truly flexible logging, you must let the vendor define his
> own DTD.  This is necessary, in fact, if you're implementing a new protocol
> or service of some sort.  And that lets the vendor get away with the same
> vendor-prefixed tags that you'd prefer he not have.

Vendor support will definitely be a huge hurdle. What I had in mind for standard formats was basically some generic events for common services and a well-defined vendor extension system. IDMEF seems to be using XML namespaces - see http://www.silicondefense.com/idwg/draft-ietf-idwg-idmef-xml-06.txt.
Basically, they have an AdditionalData element which contains arbitrary elements with their own namespace - here's the example:

  <additionaldata type="xml">
    <test:test xmlns:test="http://www.ietf.org/test.html"
               xmlns="http://www.ietf.org/test.html">
      <test:a test:attr="..."> ... </test:a>
      <test:b> ... </test:b>
      <test:c> ... </test:c>
    </test:test>
  </additionaldata>

I think we'd do well to copy this model - push strongly for vendors to use the common fields where possible and toss everything else in something like that AdditionalData element. This is a good example of why I think nesting should be a mandatory requirement - I think it's much cleaner than a vendor-prefixed fieldname, and it will still be ignored by anything which isn't looking for it.

>> The other area where nesting feels more natural is dumping more complex
>> data - things like RPC calls or SSL negotiation:
>
> Yes, and that brings up a really good question: How many log messages are
> complex enough to make XML useful, and how many are not?  Something like an
> NTP time reset is too simple to need XML, but an SSL negotiation is complex.
> Especially pertinent, I think, is the fact that the SSL negotiation may
> involve an arbitrarily long chain of certificates, which XML would handle
> easily, while field="value" would not.
>
> I'd be willing to admit that field="value" is the wrong choice if there are
> a lot of possibly useful log messages that are too complex for it to handle.
> Unfortunately, I don't think that's likely,

I agree that there probably will not be many messages which preclude field=value - initially. I see it as a chicken and egg problem - since there's no standard way of doing it, programmers either punt the issue by not logging anything useful (causing sysadmins everywhere to impotently curse them for it) or they roll their own system, resulting in all of the usual fun with inadequate custom logging systems.
I think the problem is that while enhanced logging would be popular with sysadmins, it usually isn't seen as important enough to justify the development time for a custom implementation. The cost would drop if we did some work to provide the higher-level classes and functions which make it as easy as possible to log more useful information. If it reached the point where we could at least trivially log a language's native error structures, I think that alone would be enough to tip the balance in favor of complex log messages.

> I like the idea of being able to use standard databases, and I'm wary
> of the ability of XML to handle huge
> amounts of data efficiently, especially for post-processing.

I share this concern - that's one reason why I see XML as the format syslog's processor receives rather than the final storage system. I like using XML as an interchange language to ease the task of crossing vendor / program boundaries - once it reaches your processor it *should* be converted into some highly-optimized internal format, since your log analysis system is the only consumer. I'd probably have my events ending up in MySQL initially, so my processor would basically be translating the data I care about into a format appropriate for my schema and discarding the rest. The XML is just used to ensure that I get more granular (and hopefully more verbose) data to simplify that translation process.

In a prototype form this could be as simple as having the newsyslog start a Perl script which would simply read STDIN using XML::Simple and generate a SQL INSERT for each event (with noise filtering and some database optimization, this could easily hit non-prototype status).

>>> And furthermore, we'd prefer to avoid "Message delivered
>>> successfully" because that's a freeform string, so ideally all the tags
>>> would be empty.
>>
>> Presumably we'd define a DTD which would make any tags which could be
>> empty on successful transactions optional.
>
> Not optional, because then successful transactions would never be recorded,
> which we might not want.  Better would be to make them empty, e.g., <TAG/>
> ([XML], 3.1).

I was thinking about optional subtags for things like errors which have no meaningful value when no error was encountered. I agree with your point, however - we'd want to carefully consider which ones were considered optional.

Chris

_______________________________________________
LogAnalysis mailing list
LogAnalysisat_private
http://lists.shmoo.com/mailman/listinfo/loganalysis