Re: [logs] Syslog payload format

betat_private

2003-01-13T14:44:11 Ogle Ron (Rennes):
> I know there are some problems with syslog (timestamp and udp),
> but you guys are throwing the baby out with the bath water, and I
> had to say something.  There is nothing in the current syslog that
> prevents me from being precise or ambiguous.  I also understand on
> trying to formalize some higher level constructs, but the price is
> simplicity and ease of use.

I think that's why these are two very separate projects. For those
who are most troubled by the grossest defects of syslog classic,
we're developing SELP, which just fixes the UDP and the broken
timestamp.

For those who are deeply concerned about the lack of good high-level
log analysis, an unavoidable consequence of the lossy nature of
current logging, we're trying to develop a more structured and
informative logging system. The two are being kept separate; new log
payloads can be carried by old or new transports, and vice versa.
What's more, this syslog payload structuring and tagging exercise
could just as happily be carried over the new RFC 3195 and its
successors (syslog-reliable, syslog-secure).

> The lossiness is due to the fact that the developer didn't care
> about giving any more details.

It's aggravated by the lack of a standard specification for the
structured payload. We're undertaking to develop one, concurrently
with an API to make it as easy as possible to emit the structured
data while allowing code to guarantee the correctness of the
structure.

> The point is that you guys are wasting a lot of time if the
> developer doesn't use it or the OS manufacturer doesn't
> incorporate it.

If we can make it good enough, we can incorporate backward-compat
shims to allow old and new data to be easily merged, with the shim
adding as much additional information as can be automatically
appended. We can write translators that read old-style unstructured
log entries for specific applications and convert them to the
best-possible new structure.
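
To make the translator idea concrete, here is a minimal sketch in
Python. Everything in it is an assumption for illustration: the legacy
message format, the tag names (event, user, src_addr, src_port) and the
translate() helper are hypothetical, not part of any agreed lexicon.

```python
import re

# Hypothetical pattern for one application's old-style free-text message.
LEGACY_LOGIN_FAILURE = re.compile(
    r"^Failed password for (?P<user>\S+) "
    r"from (?P<src_addr>\S+) port (?P<src_port>\d+)$"
)


def translate(app, message):
    """Convert an old-style message into the best tagged record we can."""
    # Always keep the raw text so nothing is lost in translation.
    record = {"app": app, "raw": message}
    match = LEGACY_LOGIN_FAILURE.match(message) if app == "sshd" else None
    if match:
        record.update(
            event="login-failure",
            user=match.group("user"),
            src_addr=match.group("src_addr"),
            src_port=int(match.group("src_port")),
        )
    else:
        # The shim can only add what it can deduce automatically.
        record["event"] = "unparsed"
    return record
```

A message the translator doesn't recognise still comes through, just
with less structure, which is the "less capable analysis" case for
older apps.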

We can build log analysis tools that work off the new structure, and
deploy them --- with our new logging system (including
backwards-compat shims) as prerequisite.

When it's deployed, older apps will have less capable analysis than
new ones, simply because there isn't enough data available; if they
want the best possible analysis they'll have to upgrade to the new
logging strategy.

The analysis tools will pull the rest of the world kicking and
screaming along behind. Or this project will be a failure, which is
of course possible. But if we don't try, we'll never know for sure.

> This is the biggest beef.  Tagged or XML doesn't make any difference.

Actually, it does, due to the danger of excess power in the parser.

> You're bloating the data so that a network sniffer can easily read your
> logs.

No, we're providing an extensible framework for structured log data,
enabling application-independent analysis.

> This wastes a lot of resources on the client.

This is unavoidable if you want a better grade of analysis.

> Get the data out of the client as efficiently as possible.

All of it. For all apps.

Unless you're going to require a knowledge base in the server that
grows linearly with the number of apps (rather than with the number
of distinct kinds of apps), you have to encode the structure in the
data, rather than leaving it implicit for the server to deduce.
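
A rough sketch of what that buys you, in Python. The records and tag
names here (event, src_addr) are made up for illustration; the point is
only that the analysis function needs no per-application knowledge once
the structure travels with the data.

```python
from collections import Counter

# Hypothetical tagged records from two unrelated applications.
RECORDS = [
    {"app": "sshd", "event": "login-failure", "user": "root",
     "src_addr": "10.0.0.5"},
    {"app": "ftpd", "event": "login-failure", "user": "anon",
     "src_addr": "10.0.0.5"},
    {"app": "ftpd", "event": "login-success", "user": "ron",
     "src_addr": "10.0.0.9"},
]


def failures_by_source(records):
    """Count login failures per source address, whatever the application."""
    return Counter(
        r["src_addr"]
        for r in records
        if r.get("event") == "login-failure" and "src_addr" in r
    )
```

Adding a third application that emits the same tags requires no change
to failures_by_source(); with implicit structure, the server would need
a new parser for it.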

> With this tagging/xml, you are putting a layer of complexity that
> will kill it.

Could be. Could be that application-independent analyzable
structured logfiles are impossible today. We'll find out. I sorta
hope you're wrong.

But anyway, since we're separating them, we'll at least have SELP as
a small improvement for today:-).

> Every language will have their own tags to describe the same thing.

That's exactly what we're planning on avoiding. If we can't get a
consistent lexicon of tags, where the same concept is always tagged
with the same tag regardless of application, then this will be a
complete failure.

> It's an amazing sight to see that no matter which country a tool
> comes from that does network analysis of an IP packet, it can
> still show you IP addresses, ports, and flags.  Of course, my tool
> of choice always puts it in the language that I'm most comfortable
> with using.

Oh, you're talking about _human_ languages! That's the easiest one
of all. Speak English.

If we should decide that we want to internationalize the lexicon,
that's easy, too. Folks do the equivalent job all the time, and tools
like gettext help in maintaining implementations.

The hard part isn't the language the words are in (or languages, we
can have a set of possible expressions for each tag), it's ensuring
that the tag concepts are distinct. If we _know_ all possible
language expressions for a tag, we can continue to have a portable
analysis system.
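
As a sketch of that last point, in Python: if every language expression
maps to exactly one tag concept, records can be normalised before
analysis. The lexicon contents and the helper names below are
hypothetical, chosen only to illustrate the distinctness check.

```python
# Hypothetical lexicon: one canonical concept, several language expressions.
LEXICON = {
    "src_addr": {"en": "source-address", "fr": "adresse-source"},
    "user": {"en": "user", "fr": "utilisateur"},
}


def build_reverse_index(lexicon):
    """Map every known expression to its concept, rejecting ambiguity."""
    index = {}
    for concept, expressions in lexicon.items():
        for expression in expressions.values():
            if index.get(expression, concept) != concept:
                # Two concepts share an expression: they are not distinct.
                raise ValueError(
                    f"{expression!r} is used for both "
                    f"{index[expression]!r} and {concept!r}"
                )
            index[expression] = concept
    return index


def canonicalize(record, index):
    """Rewrite a record's tags to canonical concepts.

    Raises KeyError for an expression that is not in the lexicon.
    """
    return {index[tag]: value for tag, value in record.items()}


INDEX = build_reverse_index(LEXICON)
```

The same expression may appear in several languages for one concept;
what the check refuses is one expression standing for two concepts.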

-Bennett