Re: Re[2]: [logs] Logging: World Domination

cadamsat_private

On Friday, August 23, 2002, at 04:29 , Kyle R. Hofmann wrote:
> I agree, that would be nice.  But it occurred to me last night that we've
> all really been unclear on one thing, which is where we use our schemes.
> Let me go on a long digression while I explain what I think we're trying
> to get to.

The way I had in mind was a replacement syslogd which would listen on 
the traditional ports and a new TCP port. This would have some sort of 
output processor (flat file, database, socket, forwarding to another 
syslogd, etc.). Traditional syslog is wrapped in the minimal event 
wrapper ("<event time=... host=... facility=... 
priority=...>message</event>") before being passed to the output 
processor; new events are passed in directly. Implementations might also 
want some sort of output preprocessor to do things like throw out events 
or subfields the admins don't care about.
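
As a rough sketch of the wrapping step (Python is used purely for 
illustration; the name wrap_legacy, the regular expression and the choice 
to emit facility and priority as numbers are all assumptions, not part of 
any existing syslogd):

```python
import re
import xml.etree.ElementTree as ET

# Traditional BSD-style line: "<PRI>Mmm dd hh:mm:ss host message"
LEGACY = re.compile(
    r"^<(\d{1,3})>(\w{3} [ \d]\d \d\d:\d\d:\d\d) (\S+) (.*)$", re.S)

def wrap_legacy(line):
    """Wrap one traditional syslog line in a minimal <event> element."""
    m = LEGACY.match(line)
    if m is None:
        raise ValueError("not a traditional syslog line")
    pri, stamp, host, message = m.groups()
    event = ET.Element("event", {
        "time": stamp,
        "host": host,
        "facility": str(int(pri) // 8),  # PRI = facility * 8 + severity
        "priority": str(int(pri) % 8),
    })
    event.text = message  # ElementTree escapes <, > and & on output
    return ET.tostring(event, encoding="unicode")

print(wrap_legacy("<34>Aug 23 04:29:00 mailhost su: auth failure for <root>"))
```

New-style events would skip this step and go to the output processor 
unchanged.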

As far as replacements for syslog(3), I had in mind something like 6 - it 
delivers a complete formatted message to the daemon. I consider 
something like 5 (newsyslog() given arbitrary field=values) a special 
case of this - field=value trivially maps into XML, so something like 5 
would be a convenience function so you could simply pass it a bunch of 
field=value pairs in whatever form makes sense for your language 
(something like your code for C, a simple associative array / hash 
everywhere that's supported, etc.) and it'd send the appropriate XML.
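
A minimal sketch of such a convenience function, assuming a Python 
caller. The names fields_to_element and newsyslog_xml and the <event> 
root tag are hypothetical, and field names are assumed to be valid XML 
element names:

```python
import xml.etree.ElementTree as ET

def fields_to_element(fields, tag="event"):
    """Map a dict of field=value pairs onto child elements of <tag>."""
    elem = ET.Element(tag)
    for name, value in fields.items():
        if isinstance(value, dict):
            # A nested dict becomes a nested element.
            elem.append(fields_to_element(value, name))
        else:
            ET.SubElement(elem, name).text = str(value)
    return elem

def newsyslog_xml(**fields):
    """Return the XML text that would be handed to the daemon."""
    return ET.tostring(fields_to_element(fields), encoding="unicode")

print(newsyslog_xml(service="ntpd", action="time-reset", offset="-0.25"))
```

The caller only ever sees field=value pairs; the XML is an detail of the 
transport.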

We'd want to do things like this to make it trivial to log the 
standard error objects from various languages - to really benefit from 
this we need to make it as easy as possible for a programmer to send 
verbose log messages since most programmers aren't going to take the 
time to really use a detailed custom logger. As an example, I'd want a 
Java class which could take an Exception, repackage the fields which map 
to something standard and include a serialized copy of the original 
Exception in a Java-specific tag (uses like that are why I want 
nesting - it'd be extremely nice to have that kind of detail available 
for debugging if you want it).
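
The same idea sketched in Python rather than Java, only to keep the 
examples in one language. The element names (severity, message, 
python_exception) are made up for illustration and are not a proposed 
standard:

```python
import traceback
import xml.etree.ElementTree as ET

def exception_to_event(exc):
    """Repackage an exception: common fields plus a language-specific tag."""
    event = ET.Element("event")
    # Fields that map onto something every consumer could understand.
    ET.SubElement(event, "severity").text = "error"
    ET.SubElement(event, "message").text = str(exc)
    # Language-specific detail, nested so other consumers can skip it.
    detail = ET.SubElement(event, "python_exception")
    ET.SubElement(detail, "type").text = type(exc).__name__
    ET.SubElement(detail, "traceback").text = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__))
    return event

try:
    int("not a number")
except ValueError as err:
    print(ET.tostring(exception_to_event(err), encoding="unicode"))
```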

This sort of API is arguably the discussion we should be having now, as 
it answers part of both "What should we be recording?" and "How can we 
get programmers to adopt this new system?". The related API for dealing 
with events once they're received is equally important - 
I don't see much chance of solving the much harder problem of automating 
analysis in the near future so I'd like to minimize the amount of 
scutwork needed to hand-tune analysis code for a given environment.

> Let me pop off the stack and return to what I first said.  So, where exactly
> do you want to use XML, and where exactly do I want to use field="value"?

I want the central log server to receive XML. Once the XML gets there it 
can be translated into anything you want for analysis. IMHO, field=value 
should be a function of your programming environment.

> This is an area where XML seems to have an advantage because of the DTD, but
> I'm not so sure that it's as much of an advantage as it might look like,
> because to allow truly flexible logging, you must let the vendor define his
> own DTD.  This is necessary, in fact, if you're implementing a new protocol
> or service of some sort.  And that lets the vendor get away with the same
> vendor-prefixed tags that you'd prefer he not have.

Vendor support will definitely be a huge hurdle. What I had in mind for 
standard formats was basically some generic events for common services 
and a well-defined vendor extension system. IDMEF seems to be using XML 
namespaces - see 
http://www.silicondefense.com/idwg/draft-ietf-idwg-idmef-xml-06.txt. 
Basically, they have an AdditionalData element which contains arbitrary 
elements with their own namespace - here's the example:

<AdditionalData type="xml">
	<test:test xmlns:test="http://www.ietf.org/test.html"
	           xmlns="http://www.ietf.org/test.html">
		<test:a test:attr="...">
			...
		</test:a>
		<test:b>
			...
		</test:b>
		<test:c>
			...
		</test:c>
	</test:test>
</AdditionalData>

I think we'd do well to copy this model - push strongly for vendors to 
use the common fields where possible and toss everything else in 
something like that AdditionalData element. This is a good example of why I think 
nesting should be a mandatory requirement - I think it's much cleaner 
than a vendor-prefixed fieldname and it will still be ignored by 
anything which isn't looking for it.
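
To illustrate the "ignored by anything which isn't looking for it" 
point, here is a small sketch. The <event> layout, the acme prefix and 
the http://example.com/acme-log namespace are invented for the example; 
only the AdditionalData idea comes from IDMEF:

```python
import xml.etree.ElementTree as ET

EVENT = """\
<event>
  <host>fw1</host>
  <message>connection dropped</message>
  <AdditionalData type="xml">
    <acme:rule xmlns:acme="http://example.com/acme-log">
      <acme:id>42</acme:id>
    </acme:rule>
  </AdditionalData>
</event>"""

ACME = "{http://example.com/acme-log}"  # ElementTree's {namespace}tag form

root = ET.fromstring(EVENT)

# A generic consumer reads the common fields and skips the extension.
common = {c.tag: c.text for c in root if c.tag != "AdditionalData"}
print(common)

# A vendor-aware consumer reaches into the namespace it knows about.
rule_id = root.find("AdditionalData/%srule/%sid" % (ACME, ACME))
print(rule_id.text)
```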

>> The other area where nesting feels more natural is dumping more complex
>> data - things like RPC calls or SSL negotiation:
>
> Yes, and that brings up a really good question: How many log messages are
> complex enough to make XML useful, and how many are not?  Something like an
> NTP time reset is too simple to need XML, but an SSL negotiation is complex.
> Especially pertinent, I think, is the fact that the SSL negotiation may
> involve an arbitrarily long chain of certificates, which XML would handle
> easily, while field="value" would not.
>
> I'd be willing to admit that field="value" is the wrong choice if there are
> a lot of possibly useful log messages that are too complex for it to handle.
> Unfortunately, I don't think that's likely,

I agree that there probably will not be many messages which preclude 
field=value - initially. I see it as a chicken and egg problem - since 
there's no standard way of doing it, programmers either punt the issue 
by not logging anything useful (causing sysadmins everywhere to 
impotently curse them for it) or they roll their own system resulting in 
all of the usual fun with inadequate custom logging systems.

I think the problem is that while enhanced logging would be popular with 
sysadmins it usually isn't seen as important enough to justify the 
development time for a custom implementation. The cost would drop if we 
did some work to provide the higher-level classes and functions which 
make it as easy as possible to log more useful information. If it 
reached the point where we could at least trivially log a language's 
native error structures I think that alone would be enough to tip the 
balance in favor of complex log messages.

> I like the idea of being able to use standard databases, and I'm wary of
> the ability of XML to handle huge amounts of data efficiently, especially
> for post-processing.

I share this concern - that's one reason why I see XML as the format 
syslog's processor receives rather than the final storage system. I like 
using XML as an interchange language to ease the task of crossing 
vendor / program boundaries - once it reaches your processor it *should* 
be converted into some highly-optimized internal format since your log 
analysis system is the only consumer.

I'd probably have my events ending up in MySQL initially so my processor 
would basically be translating the data I care about into a format 
appropriate for my schema and discarding the rest. The XML is just used 
to ensure that I get more granular (and hopefully more verbose) data to 
simplify that translation process.

In prototype form this could be as simple as having the new syslogd 
start a Perl script which would simply read STDIN using XML::Simple and 
generate a SQL INSERT for each event (with noise filtering and some 
database optimization, this could easily hit non-prototype status).
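
A rough sketch of that prototype, with two substitutions named plainly: 
Python's ElementTree stands in for Perl's XML::Simple, and an in-memory 
SQLite database stands in for MySQL. The one-event-per-line framing, the 
events table and its columns are assumptions made for the example:

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

def load_events(stream, db):
    """Insert one row per <event> line read from stream; return the count."""
    db.execute("CREATE TABLE IF NOT EXISTS events "
               "(time TEXT, host TEXT, priority INTEGER, message TEXT)")
    count = 0
    for line in stream:
        line = line.strip()
        if not line:
            continue  # skip blank lines between events
        ev = ET.fromstring(line)
        # Keep only the fields this schema cares about; discard the rest.
        db.execute("INSERT INTO events VALUES (?, ?, ?, ?)",
                   (ev.get("time"), ev.get("host"),
                    int(ev.get("priority", 0)), ev.text))
        count += 1
    db.commit()
    return count

# In the real prototype the stream would be sys.stdin.
sample = io.StringIO(
    '<event time="Aug 23 04:29:00" host="mailhost" facility="4" '
    'priority="2">auth failure</event>\n'
    '<event time="Aug 23 04:30:10" host="ns1" facility="3" '
    'priority="6">zone reloaded</event>\n')
db = sqlite3.connect(":memory:")
print(load_events(sample, db))
```

Parameterized INSERTs are used so that message text never has to be 
quoted by hand.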

>>> And furthermore, we'd prefer to avoid "Message delivered
>>> successfully" because that's a freeform string, so ideally all the tags
>>> would be empty.
>>
>> Presumably we'd define a DTD which would make any tags which could be
>> empty on successful transactions optional.
>
> Not optional, because then successful transactions would never be recorded,
> which we might not want.  Better would be to make them empty, e.g., <TAG/>
> ([XML], 3.1).

I was thinking about optional subtags for things like errors which have 
no meaningful value when no error was encountered. I agree with your 
point however - we'd want to carefully consider which ones were 
considered optional.

Chris

_______________________________________________
LogAnalysis mailing list
LogAnalysisat_private
http://lists.shmoo.com/mailman/listinfo/loganalysis