[logs] Thoughts on log normalization

mikemat_private

Hi Folks,

After reading everyone's posts over the past few days (excellent
content), I've decided to throw together a rudimentary log parser in Perl
(because it's quick and dirty and I know it best) that takes log data from
lots of different sources and spits it out in the tokenized format
described by MJR in past posts.  My intention is to then take this
"normalized" data and feed it into a database, as well as into a monitoring
program that will employ MJR's artificial ignorance.
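Here is a minimal sketch of the artificial-ignorance stage. It assumes the usual description of the idea: throw away every line that matches a pattern you have already decided is uninteresting, and report whatever is left. It's written in Python rather than Perl only for brevity. The patterns in IGNORE_PATTERNS and the sample PIX lines are hypothetical placeholders, not a recommended filter set.

```python
import re
from collections import Counter

# Hypothetical "known boring" patterns: messages already reviewed and judged routine.
IGNORE_PATTERNS = [
    re.compile(r"^%PIX-6-302013: Built outbound TCP connection"),
    re.compile(r"^%PIX-6-302014: Teardown TCP connection"),
    re.compile(r"sshd\[\d+\]: Accepted publickey for \w+"),
]

def artificial_ignorance(lines):
    """Drop every line matching a known-boring pattern; count the leftovers."""
    leftovers = Counter()
    for line in lines:
        if any(p.search(line) for p in IGNORE_PATTERNS):
            continue  # known and uninteresting: ignore it
        leftovers[line] += 1  # unknown: surface it for a human to look at
    return leftovers

sample = [
    "%PIX-6-302013: Built outbound TCP connection 1 for outside:10.0.0.1/80",
    "%PIX-2-106001: Inbound TCP connection denied from 1.2.3.4/1025",
    "%PIX-2-106001: Inbound TCP connection denied from 1.2.3.4/1025",
]
for msg, count in artificial_ignorance(sample).most_common():
    print(count, msg)
```

In practice the ignore list grows over time. On each run you review the leftovers and add a pattern for anything that turns out to be routine, so the report shrinks toward only the genuinely unusual lines.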

The logic that I've come up with so far consists of a routine that does
the following:

1)  tries to guess the type of message based on some obvious content,
such as "%PIX", which indicates a PIX firewall message.  If it can determine
the type of message, parsing the contents is a cinch, since the order of
the tokens is already known (until Cisco changes their format).
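Step 1 can be sketched as a signature table that maps a cheap substring test to a source-specific parser. This sketch assumes the common PIX syslog shape "%PIX-<severity>-<message id>: <text>". The function names and the single-entry table are hypothetical. Real PIX message bodies would also need further parsing keyed on the message ID.

```python
import re

# Assumed PIX shape: %PIX-<severity digit>-<6-digit message id>: <free text>
PIX_RE = re.compile(r"%PIX-(?P<severity>\d)-(?P<msgid>\d{6}): (?P<text>.*)")

def parse_pix(line):
    """Hypothetical PIX parser: returns named tokens, or None if the shape differs."""
    m = PIX_RE.search(line)
    return {"type": "pix", **m.groupdict()} if m else None

# Signature table: (substring marker, parser). First successful parse wins.
SIGNATURES = [
    ("%PIX-", parse_pix),
]

def classify_and_parse(line):
    for marker, parser in SIGNATURES:
        if marker in line:  # cheap test before running the full regex
            tokens = parser(line)
            if tokens is not None:
                return tokens
    return None  # unknown type: hand the line to the fallback (step 2)

print(classify_and_parse(
    "Mar  4 13:01:02 fw1 %PIX-6-302013: Built outbound TCP connection 42"))
```

Checking a plain substring first keeps the common case cheap. Adding a new source then means writing one parser and adding one table row.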

2)  if #1 fails, take a "brute-force" approach (for lack of a better term)
that tries sequences of tokens, e.g., expect the date first, then a
hostname after it, and so on.  I'm not quite sure how this'll work out yet.
Hopefully #1 will cover 95% of the cases.
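One way to read step 2 is as a list of token recognizers tried in order against the front of the line, stopping at the first one that doesn't match. The recognizers here are assumptions for illustration, not a complete grammar: a BSD-syslog-style timestamp, a hostname, and a program[pid]: tag.

```python
import re

# Hypothetical recognizers, tried in this order against the start of the remaining text.
TOKEN_SEQUENCE = [
    ("date", re.compile(r"[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}")),  # e.g. "Mar  4 13:01:02"
    ("host", re.compile(r"[\w.-]+")),
    ("program", re.compile(r"[\w/.-]+(?:\[\d+\])?:")),  # e.g. "sshd[812]:"
]

def brute_force_parse(line):
    """Peel known tokens off the front of the line until one fails to match."""
    tokens = {}
    rest = line
    for name, pattern in TOKEN_SEQUENCE:
        rest = rest.lstrip()
        m = pattern.match(rest)
        if not m:
            break  # sequence broken: keep what was recognized so far
        tokens[name] = m.group(0).rstrip(":")
        rest = rest[m.end():]
    tokens["message"] = rest.strip()  # everything unrecognized lands here
    return tokens

print(brute_force_parse("Mar  4 13:01:02 web1 sshd[812]: Failed password for root"))
```

Whatever the sequence fails to recognize ends up in the "message" field. A line that matches nothing still produces a record rather than being dropped, which keeps it visible to the artificial-ignorance stage.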

I'm looking for people's opinions on this approach: am I totally barking
up the wrong tree, or is this essentially the way to go?  I don't really
care about a debate over Perl vs. C vs. anything else, just the overall
concepts.

This whole log-analysis thing feels like an elephant to me, so I'm just
trying to eat it one spoonful at a time.

Thanks in advance for any input.

-Mike.

==================================================================
Mike Messick           Dona nobis pacem          rm -rf /bin/laden
PGP Key Fingerprint:                       email: mikemat_private 
2048/0x57318496 053B 412B 82FC 3808 E141  CDCD 74AE 01C5 5731 8496

_______________________________________________
LogAnalysis mailing list
LogAnalysisat_private
https://lists.shmoo.com/mailman/listinfo/loganalysis