[logs] Re: regex-less parsing of messages

From: Solomon, Frank (frank@private)
Date: Tue Dec 06 2005 - 05:45:34 PST


Jason, your example certainly struck a chord.  We haven't even begun to
put our mail logs into our central log server because of the technical
challenges that would pose.  And yet, we get asked the same sort of
questions which require a highly trained person to probe through the
heterogeneous mail log files and trace the path of some errant envelope
that may or may not actually exist.  It is not pretty; part of the price
we pay for having to accommodate multiple mail systems, vendors and
standards.

Our standing joke is:  "That's the nice thing about standards, there are
so many to choose from and everyone can have their own."  So, "sendmail"
has its "standard" log format and "Exchange" has its "standard" log
format, and "Novell" has its "standard" log format, etc.  I saw an
article recently describing the new "logging standard" that Microsoft
was about to introduce in their latest OS.  Well that will certainly
clear things up!  I'm sure all their competitors will rush to implement
compatible systems.  Don't get me wrong, I laud Microsoft's attempt to
enforce programmer discipline.

In case you're interested in the MS stuff:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/wes/wes/about_the_windows_event_log.asp

<dreaming>
Certainly, the first challenge in being able to analyze data is getting
it into a common format with a common symbolic representation of the
underlying information.  Since we cannot count on the energy and
discipline of the programmers that write the log-generating programs,
that energy must be invested in and discipline must be enforced by the
log collection mechanism.  It's becoming obvious to me that the blanket
approach of collecting everything on the off chance that some auditor or
forensic specialist in the future might be able to make sense of it is
a waste of resources.  That implies that the requirements for what needs
to be logged could be set at the collecting end and that somehow those
requirements need to be communicated to the source of the messages to
make sure that the required messages exist and are coded appropriately
(which they won't be).
</dreaming>
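
To make the idea of a collector-side common format a bit more concrete,
here is a minimal Python sketch.  Everything in it is an assumption for
illustration: the two input styles ("key=value" pairs and tab-separated
columns) are made-up simplifications, not the real sendmail or Exchange
formats, and the names MailEvent, REQUIRED, SOURCES and normalize are
hypothetical.  It splits lines with plain string operations rather than
regexes, and drops any record that lacks the fields the collector has
declared it needs.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class MailEvent:
    """Common representation used by the (hypothetical) collector."""
    source: str
    msg_id: str
    sender: str
    recipient: str
    status: str


# Requirements set at the collecting end: fields every record must carry.
REQUIRED = ("msg_id", "sender", "recipient", "status")


def parse_keyvalue(line: str) -> Dict[str, str]:
    """Split a made-up 'key=value, key=value' line without regexes."""
    fields = {}
    for part in line.split(", "):
        key, sep, value = part.partition("=")
        if sep:  # ignore fragments that contain no '='
            fields[key.strip()] = value.strip()
    return fields


def parse_tabbed(line: str) -> Dict[str, str]:
    """Split a made-up tab-separated line with a fixed column order."""
    columns = ("msg_id", "sender", "recipient", "status")
    return dict(zip(columns, line.split("\t")))


# Per-source parser plus a map from source field names to common names.
SOURCES = {
    "relay-a": (parse_keyvalue,
                {"id": "msg_id", "from": "sender",
                 "to": "recipient", "stat": "status"}),
    "relay-b": (parse_tabbed, {}),
}


def normalize(source: str, line: str) -> Optional[MailEvent]:
    """Return a MailEvent, or None if a required field is missing."""
    parser, renames = SOURCES[source]
    raw = parser(line)
    common = {renames.get(key, key): value for key, value in raw.items()}
    if any(not common.get(name) for name in REQUIRED):
        return None
    return MailEvent(source=source,
                     **{name: common[name] for name in REQUIRED})
```

Calling normalize("relay-a", "id=abc123, from=yy@example.com,
to=user@example.org, stat=Sent") yields one MailEvent, while a line
missing "to" or "stat" comes back as None, which is where a real
collector would have to decide whether to complain to the source.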

I know, I'm dreaming: there's no choice but to continue to collect tons
of ore and hope to glean an ounce of silver from it every once in a
while.  And besides, those old log CDs make nifty tree ornaments.

John Moehrke mentioned that his organization was making the attempt to
define the standards for the events at the beginning.  To quote:  "We
thus will be sending the experts in log analysis an already manageable
format."  That's a great idea, but it suffers from the same standards
problem I've mentioned:  everybody's likely to have their own (maybe
someday the only industry will be healthcare, but not yet).  And after
looking at the RFC, I can't imagine that good things will come of the
burden this will place on the infrastructure if the logging rate is very
high.  Can you imagine the "sendmail" guys wrapping XML around the mail
logs?  Or, all the mail system vendors agreeing on a common XML schema
for their mail logs?  Yeah, it might happen.

Personally, I'm glad that syslog uses UDP.

Sorry, I've rambled entirely too long, I'll go back to merely listening.

Frank Solomon
University of Kentucky
Lead Systems Programmer, Enterprise Systems
http://www.franksolomon.net
"If you give someone a program, you will frustrate them for a day; if
you teach them how to program, you will frustrate them for a lifetime."
--Anonymous


-----Original Message-----
[mailto:loganalysis-bounces+sysfrank=uky.edu@private] On Behalf
Of Jason Haar
Sent: Monday, December 05, 2005 3:15 PM

. . .snip. . .

Boring, everyday example:  These days (due to the horrors of antispam
systems) internal users routinely ring the helpdesk and ask "Customer YY
sent me an email and I never got it. What happened?". To figure that out
involves converting what you can learn about customer YY into DNS
records and IP addresses, then tracking any related connections as they
hit the edge of our Internet link, where they first meet our RBL checks,
then flow through AV and antispam systems, then through a couple more
internal mail relays before hitting our end mail servers. We have logs
all merged together from all those systems, but frankly, I am still the
only one who can link all those events together. And my attempts at
turning that eyeballing into a program have failed so far. And that's
only one example.
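
As a rough illustration of what "linking all those events together"
could look like once the logs are already in a common form, here is a
small Python sketch.  It rests on a big assumption that real mail logs
often do not satisfy: that each system records both its own id for the
message and the id the next system assigned to it.  The Hop record, its
field names and the trace function are all hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional


class Hop(NamedTuple):
    """One already-normalized log event for a message at one system."""
    time: int               # seconds since an arbitrary epoch
    system: str             # e.g. "rbl", "antivirus", "antispam"
    local_id: str           # id this system gave the message
    next_id: Optional[str]  # id the next system gave it, if relayed on
    outcome: str            # e.g. "passed", "clean", "rejected"


def trace(hops: List[Hop], first_id: str) -> List[Hop]:
    """Follow a message from its first id until the chain ends."""
    by_id: Dict[str, Hop] = {hop.local_id: hop for hop in hops}
    path: List[Hop] = []
    seen = set()  # guards against loops in bad data
    current: Optional[str] = first_id
    while current is not None and current in by_id and current not in seen:
        seen.add(current)
        hop = by_id[current]
        path.append(hop)
        current = hop.next_id
    return path
```

The last Hop in the returned list is where the message stopped, so its
outcome field is the answer to "what happened?".  An empty list means
the first id was never seen at all, which matches the case of an
envelope that may not actually exist.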

. . .
_______________________________________________
LogAnalysis mailing list
LogAnalysis@private
http://lists.shmoo.com/mailman/listinfo/loganalysis


