Re: [logs] Syslog payload format

wcolburnat_private

I want to chime in with a vote, and an example, of why exposing the
internal data structures might be an acceptable idea.

Let's suppose that I have a program that is going to do logging.  It is a
big program, and to simplify the logging, different parts of the program
are going to log differently.  For simplicity, I want to be able to
create logging instances quickly and easily, so I have a custom logger
that takes two arguments for the "core" information, plus some other
arguments for the data to be logged.  The code could look like this:

static logcontrol_t loginit = {
  LOGCONTROL_INIT,           /* this structure has never been used */
  "Initialization Errors",   /* human readable name for the structure */
  LOGCONTROL_PRIORITY_ERROR, /* default priority */
  LOGCONTROL_CHANNEL_INIT,   /* predefined "initialization" channel */
  LOGCONTROL_SOME_FLAG,      /* set a flag, such as blocking or encryption */
};

logcontrol_t logruntime = {
  LOGCONTROL_INIT,           /* this structure has never been used */
  "Runtime Errors",          /* human readable name for the structure */
  LOGCONTROL_PRIORITY_ERROR, /* default priority */
  "Random Channel",          /* user defined channel */
  LOGCONTROL_NO_FLAGS,       /* no flags are set here */
};

So now, when I call the syslogreplacement() function, I have a choice of
two "styles" of log message to generate.  One, loginit, is only
available in a certain module of the program, but the other is global
and can be called from anywhere.

-----init.c-----
void init(void)
{
  syslogreplacement(loginit,<something to syslog>);
  syslogreplacement(logruntime,<another something to syslog>);
}
-----end-----

-----main.c-----
int main(void)
{
  init();
  syslogreplacement(logruntime,<something else to syslog>);
}
-----end-----

So now, from init.c I can easily syslog in two different ways, but only
one way from main.c.
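
The snippets above never say what logcontrol_t looks like, so here is a
minimal sketch of one possible layout.  Everything in it is an
assumption for illustration: the field names, the numeric values, and
the helper logcontrol_format(), which stands in for the formatting half
of syslogreplacement().  It takes a pointer rather than a copy so that
the "never been used" state can be updated on first use.

```c
#include <stdio.h>

/* Hypothetical definitions; the post does not specify any of these. */
typedef enum { LOGCONTROL_INIT, LOGCONTROL_OPEN } logstate_t;

enum { LOGCONTROL_PRIORITY_ERROR = 3 };   /* assumed: syslog-like numbering */
enum { LOGCONTROL_NO_FLAGS = 0, LOGCONTROL_SOME_FLAG = 1 };

#define LOGCONTROL_CHANNEL_INIT "init"    /* assumed: channels are strings */

typedef struct {
    logstate_t  state;     /* LOGCONTROL_INIT until the first message */
    const char *name;      /* human readable name for the structure */
    int         priority;  /* default priority */
    const char *channel;   /* predefined or user defined channel */
    unsigned    flags;     /* blocking, encryption, ... */
} logcontrol_t;

/*
 * Format one message into buf using the defaults stored in lc.
 * Returns what snprintf() returns: the length the full line needs.
 */
int logcontrol_format(char *buf, size_t len, logcontrol_t *lc, const char *text)
{
    lc->state = LOGCONTROL_OPEN;   /* the structure has now been used */
    return snprintf(buf, len, "<%d> [%s] %s: %s",
                    lc->priority, lc->channel, lc->name, text);
}
```

With logruntime initialized as above, formatting "disk full" would give
the line "<3> [Random Channel] Runtime Errors: disk full".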

The next problem: what if I want to override something?  The only
thing I can think of to override is the priority, so the override can
be defined as a special number and passed as an argument to a slightly
different function.

syslogreplacement2(logruntime,LOGOVERRIDE_CRITICAL, <another something else>);
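
One way the override could be resolved inside syslogreplacement2() is
sketched here.  The sentinel LOGOVERRIDE_NONE, the helper
effective_priority(), and the numeric values (borrowed from syslog's
numbering, where critical is 2 and error is 3) are assumptions for
illustration, not part of any existing API.

```c
/* Hypothetical, cut-down logcontrol_t with only the fields needed here. */
typedef struct {
    const char *name;
    int         priority;   /* default priority for this log instance */
} logcontrol_t;

enum { LOGCONTROL_PRIORITY_CRITICAL = 2, LOGCONTROL_PRIORITY_ERROR = 3 };

/* The "special number": -1 means "no override, use the default". */
enum {
    LOGOVERRIDE_NONE     = -1,
    LOGOVERRIDE_CRITICAL = LOGCONTROL_PRIORITY_CRITICAL
};

/* Pick the priority for one message without touching the structure. */
int effective_priority(const logcontrol_t *lc, int override)
{
    return override == LOGOVERRIDE_NONE ? lc->priority : override;
}
```

Because the structure itself is not modified, the next plain
syslogreplacement() call on the same instance goes back to the default
priority.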

The benefits of this system are that I can create a new "log" instance
on the fly, even as a locally scoped variable in a for() loop.

for (i=0;i<MAXINT;i=nextprime(i))
{
  logcontrol_t l={LOGCONTROL_INIT,
                  "prime for loop",
                  LOGCONTROL_PRIORITY_ERROR,
                  LOGCONTROL_CHANNEL_KERNEL,
                  LOGCONTROL_FLAG_EPHEMERAL}; /* don't cache an fd in me */

  syslogreplacement(l,<something to log>);
}

By having all the "guts" in a place that the programmer can get to them,
he is encouraged to use them.  We should learn from the mistakes of
syslog() and its hardcoded "facility" by letting facilities be
arbitrary strings, and by providing lots of predefined (and REGISTERED,
to prevent collisions) facilities.

One option I didn't complicate things with was the idea that we could
have a "facility" and a "sub-facility":

logcontrol_t log={
  LOGCONTROL_INIT,
  "Yendor Lives!",
  LOGCONTROL_PRIORITY_ERROR,
  LOGCONTROL_CHANNEL_NETHACK,
  "pet movement",               /* a finer granularity of the NETHACK log */
  LOGCONTROL_FLAG_BLOCKING|LOGCONTROL_FLAG_TCP
};

There are lots more policy items that could go into logcontrol_t as
well, such as automatic transmission of the hostname, uid, gid, local
time, mac address, CPUid, current PID, i-ching hexagrams, etc.

A downside, of course, is that if everyone can stick random things into
the "facility" and "sub-facility" fields, we could have a
proliferation of them.  Or is that a downside?  If the logging system
was designed from the beginning with the idea that this will happen,
then maybe it would cope just fine.
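
As a rough sketch of "coping just fine": the library could ship a table
of registered facility names while still accepting any string it has
never heard of.  The table contents and both function names below are
made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical list of predefined, registered facility names. */
static const char *const registered_facilities[] = {
    "init", "kernel", "nethack"
};

/* Return 1 if the facility is on the registered list, else 0. */
int facility_is_registered(const char *facility)
{
    size_t i;
    size_t n = sizeof registered_facilities / sizeof registered_facilities[0];

    for (i = 0; i < n; i++)
        if (strcmp(registered_facilities[i], facility) == 0)
            return 1;
    return 0;
}

/*
 * Build a "facility/sub-facility" label.  Unregistered facilities are
 * accepted exactly like registered ones; sub may be NULL.
 * Returns the label length, or -1 if buf is too small.
 */
int facility_label(char *buf, size_t len, const char *facility, const char *sub)
{
    int n = sub ? snprintf(buf, len, "%s/%s", facility, sub)
                : snprintf(buf, len, "%s", facility);

    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

An analysis tool could use facility_is_registered() to decide which
labels it understands, and pass the rest through untouched.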

OK, so now I have theorized what I think is a pretty cool logging
paradigm.  What about the actual data?  I want data that I can easily
program, and not have to muck with.  Making an object that has to be
created, filled, and then passed is too annoying.  Instead, we have
string keys and string data (kinda like gnudb).

syslogreplacement(log,LOGKEY_USERNAME,"root",LOGKEY_TEXT,"is a dork");

Since keys are just little strings, we can make them up as we go.

syslogreplacement(log,"koo-koo","kachoo");

Now I have free text!  But also a terrible varargs problem: the
function cannot tell where the argument list ends.  The last argument
of the call could be a special token.

syslogreplacement(log,"varargs","sucks",LOGKEY_TERM);
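
Here is a minimal sketch of how the receiving side might walk such an
argument list.  It assumes LOGKEY_TERM is a null pointer and that the
predefined keys are plain strings; log_format_pairs() is a made-up name
standing in for the part of syslogreplacement() that follows the
logcontrol_t argument.

```c
#include <stdarg.h>
#include <stdio.h>

/* Assumptions: predefined keys are strings, the terminator is NULL. */
#define LOGKEY_USERNAME "username"
#define LOGKEY_TEXT     "text"
#define LOGKEY_TERM     ((const char *)0)

/*
 * Read key/value string pairs until LOGKEY_TERM and write them to buf
 * as "key=value key=value".  No quoting is attempted here.
 * Returns the number of pairs written, or -1 if buf is too small.
 */
int log_format_pairs(char *buf, size_t len, const char *key, ...)
{
    va_list ap;
    size_t used = 0;
    int pairs = 0;

    if (len == 0)
        return -1;
    buf[0] = '\0';

    va_start(ap, key);
    while (key != LOGKEY_TERM) {
        const char *value = va_arg(ap, const char *);
        int n;

        if (value == LOGKEY_TERM)      /* a key with no value: stop */
            break;
        n = snprintf(buf + used, len - used, "%s%s=%s",
                     pairs ? " " : "", key, value);
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0';          /* drop the truncated pair */
            pairs = -1;
            break;
        }
        used += (size_t)n;
        pairs++;
        key = va_arg(ap, const char *);
    }
    va_end(ap);
    return pairs;
}
```

The explicit cast in LOGKEY_TERM matters: a bare 0 passed through "..."
is an int, not a pointer.  Forgetting the terminator altogether is
still undefined behaviour, which is the mistake I have to not make.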

I admit, that isn't pretty, but assuming I don't make a mistake I have a
pretty spiffy log system.  Things are passed around as strings, and the
underlying structure doesn't care much what they are.  I can create
channels, and facilities, and datatypes on the fly without worrying
whether or not the implementors planned for them to exist.  As long as
the final logfile manages to preserve the key and data (and protect
against quoting problems) then my data has structure, even though it is
just text.  If the analysis software wants to analyze the logs, it needs
to know what the keys mean, but with a registered and populated default
listing of keys, plus a design paradigm that expects them to be added
freely, it shouldn't be a problem.
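
To make the "protect against quoting problems" part concrete, here is
one possible escaping scheme, sketched under my own assumptions: values
are wrapped in double quotes, and embedded quotes, backslashes, and
newlines are backslash-escaped.  The function name log_quote() and the
escaping rules are illustrative only; any scheme works as long as the
writer and the analysis software agree on it.

```c
#include <stddef.h>

/*
 * Write in to out as a double-quoted string, escaping '"' and '\\'
 * with a backslash and turning a newline into the two characters
 * backslash and 'n'.
 * Returns the number of bytes written (not counting the NUL), or -1
 * if out might be too small.  The size check is conservative: it
 * always reserves room for an escaped character.
 */
int log_quote(char *out, size_t len, const char *in)
{
    size_t o = 0;

    if (len < 3)                 /* two quotes plus the NUL */
        return -1;
    out[o++] = '"';
    for (; *in != '\0'; in++) {
        char esc = 0;

        if (*in == '"' || *in == '\\')
            esc = *in;
        else if (*in == '\n')
            esc = 'n';
        /* worst case: 2 bytes for this char, closing quote, NUL */
        if (o + 4 > len)
            return -1;
        if (esc) {
            out[o++] = '\\';
            out[o++] = esc;
        } else {
            out[o++] = *in;
        }
    }
    out[o++] = '"';
    out[o] = '\0';
    return (int)o;
}
```

With something like this applied to every value before it is written,
a key/data line stays parseable even when the data itself contains
quotes or line breaks.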

On Thu, Dec 19, 2002 at 01:32:20PM -0500, Marcus J. Ranum wrote:
> Darren Reed wrote:
> >initlogging(name,options);
> >logitems[0].type = STRING;
> >logitems[0].value = "marcus login: from";
> >logitems[1].type = HOSTNAME;
> >logitems[1].value = where;
> >addlogmessage(logtype,priority,logitems,2);
> 
> This API has problems - mostly because it's exposing
> the internal data structure to programmers who will
> either get it wrong or mess with it. Thus it'd be
> impossible to change the structure in the future. For
> all that the API I was suggesting was butt-ugly, you
> could replace it completely without changing user-land
> code since it's all done through calls rather than
> direct assignments.
> 
> >Maybe this is good, maybe it's bad, but it gets away from
> >varargs and is hopefully clear about relationship between type and
> >object data.
> 
> Typing log data's a problem I think it's best to ignore.
> Systems aren't going to always have the best information
> and if they can't type it right we need to give them a
> chance to send something else - whatever they have. Which
> means that a lot of this stuff is going to get promoted
> to strings eventually. So you may as well just make it
> official and treat everything as string data since that's
> where it'll wind up. How do you deal with a machine address
> that is variously "amnesiac" 127.0.0.1 "127.0.0.1" and
> "burfle.ranum.com" (not really in DNS) and "www.ranum.com"
> (is in DNS)
> 
> Must keep it simple and stupid or it'll be ASN.1 before
> we know what hit us..
> 
> mjr. 
> ---
> Marcus J. Ranum				http://www.ranum.com
> Computer and Communications Security	mjrat_private
> 
> _______________________________________________
> LogAnalysis mailing list
> LogAnalysisat_private
> http://lists.shmoo.com/mailman/listinfo/loganalysis

--
William Colburn, "Sysprog" <wcolburnat_private>
Computer Center, New Mexico Institute of Mining and Technology
http://www.nmt.edu/tcc/     http://www.nmt.edu/~wcolburn