For anyone who is interested, I've recently made a log policy engine (APE - Anomaly Policy Engine) available. It's a fairly flexible, very robust log parsing agent that has a significant amount of rule hierarchy support and handles data at very high speeds, despite the fact that it's written in Perl. :-)

You can find it at http://www.hackertracker.org/cst/cst.html under the "Intrusion Detection server" link.

Dale

-----Original Message-----
From: Adam Sah [mailto:asahat_private]
Sent: Tuesday, June 04, 2002 6:33 PM
To: loganalysisat_private
Subject: Re: [logs] Re: Generic Log Message Parsing Tool

I don't know if this helps, but the Addamark LMS uses perl5 regular expressions to hack up the log into fields, then hits those fields with arbitrary expressions (SQL+perl). We solve the performance problem by running the parse in parallel across a cluster of PCs-- this also provides linear scaling. In practice, we've never had a problem parsing up somebody's log, including some crazy custom ones.

Anyway, I've included a little writeup on our scheme/format below. If it's helpful, feel free to steal the ideas-- our goal is to be compatible with whatever parsing format(s) become popular, and if they're based on us, that only makes our job easier ;-)

adam

Adam Sah -- CTO, Addamark Technologies -- http://www.addamark.com/

..tear.along.dotted.line................................................

The Addamark parsing script format is as follows:

    ^...your regexp here...$
    name1:type,name2:type,name3:type,name4:type,...
    ...your code here...

The regexp locates the individual fields in a given record; each match (paren-match) is given a "name" and forced into the given datatype as per the name:type line. These parse fields are then made available to the code section, e.g. as variables. In our case, the "code" is a SQL statement, in which you can embed Perl.
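A minimal sketch of the regexp / name:type / code layout described above, written in Python rather than Addamark's own SQL+Perl engine. The function names (`parse_field_spec`, `parse_record`), the two-entry cast table, and the sample record are illustrative assumptions and not part of the Addamark LMS.

```python
import re

# Hypothetical cast table: only the two type names that appear in this
# message (VARCHAR, INT32) are covered; a real engine would know more.
CASTS = {"VARCHAR": str, "INT32": int}


def parse_field_spec(spec):
    """Turn 'name1:type,name2:type,...' into a list of (name, cast) pairs."""
    fields = []
    for item in spec.split(","):
        name, type_name = item.strip().split(":")
        fields.append((name, CASTS[type_name]))
    return fields


def parse_record(regexp, spec, line):
    """Match one record and return {name: typed value}, or None on no match.

    Each paren-match in the regexp is paired, in order, with one name:type
    entry, mirroring the second line of the script format.
    """
    fields = parse_field_spec(spec)
    match = re.match(regexp, line)
    if match is None:
        return None
    groups = match.groups()
    if len(groups) != len(fields):
        raise ValueError("regexp has %d groups but spec names %d fields"
                         % (len(groups), len(fields)))
    return {name: cast(value) for (name, cast), value in zip(fields, groups)}


# Made-up record format for illustration: "<host> <status> <bytes>"
record = parse_record(
    r"^(\S+) (\d+) (\d+)$",
    "Host:VARCHAR,RespCode:INT32,RespSize:INT32",
    "10.0.0.1 200 5120",
)
print(record)
```

The returned dictionary plays the role of the named, typed parse fields that the script format hands to its code section.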
If you don't feel like writing a SQL engine (understandable!), you could jump straight into Perl.

For readability, we've added "...X" to the regexp language, which means "create a match out of anything up to the next X".

For example, here's the Addamark script to parse an Apache weblog:

    ^... ... ... \[...\] "... ... ..." ... ... "..." "..." ...$
    ClientIP:VARCHAR,unused1:VARCHAR,unused2:VARCHAR,tsStr:VARCHAR,
      Method:VARCHAR, Url:VARCHAR, HttpVers:VARCHAR, RespCode:INT32,
      RespSize:INT32, Referrer:VARCHAR, UserAgent:VARCHAR, RespTime:VARCHAR
    SELECT _strptime( tsStr, "%d/%b/%Y:%H:%M:%S %Z") as ts,
           ClientIP,
           _rev_dns(ClientIP) as ClientDNS, -- do a reverse DNS lookup,
                                            -- this too happens in parallel across the cluster
           Method, Url, HttpVers, RespCode, RespSize, Referrer, UserAgent,
           _int32(RespTime) as RespTime     -- another way to parse strings to nums
    FROM stdin;

You could replace the SELECT statement with some arbitrary Perl code, which has some API for defining the output columns.

notes:
 - Multi-line records are relatively rare, so we handle them by pre-processing the log data so that the records are all on one line apiece.
 - Variant records are handled using regexp "union" ("|"). This doubles the number of parse fields, but that's easily re-unified in the code section. Perl5 regexps already handle binary data.

> On Tue, Jun 04, 2002 at 05:36:05PM -0400, Steve wrote:
> > I've been working on understanding the Perl module Parse::RecDescent
> > for just such a thing. I suspect it would be possible to create a
> > stockpile of its "grammars" for many established log formats, and then
> > people would have an easier time modifying it for new formats.
> That's a large part of what I've got so far; the problems
> I'm running into are scalability/performance ones--Parse::RecDescent is
> a beautiful beast, but not a very fast one at all.
> Marcus was doing his version in C, which for performance reasons makes
> a lot of sense, so I was thinking that perhaps someone else might be
> able to pick up that path. Perhaps a good way to start would be to
> ignore the implementation issues for now, and just start building a
> stockpile of grammars as you suggest; it should be relatively easy to
> convert a well-formed grammar to a lex/yacc syntax, yes? (I don't know,
> as my C skills are less than stellar and I've never actually used
> lex/yacc.)
> (I also started a conversation this morning with Damian Conway and
> Mark-Jason Dominus about a faster way to implement a parser in Perl,
> using iteration rather than recursion; it might be a long time before
> that pans out, but if it does, maybe Perl could remain a valid option
> as well.)
>
> -- Sweth.
>
> --
> Sweth Chandramouli      Idiopathic Systems Consulting
> svcat_private      http://www.idiopathic.net/
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: loganalysis-unsubscribeat_private
> For additional commands, e-mail: loganalysis-helpat_private
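
Returning to the "...X" shorthand in Adam's message: one way to picture it is as a rewrite into an ordinary regexp before matching. The sketch below is in Python and assumes that each "..." can be approximated by a lazy capture group; the message does not say how Addamark actually expands it, and the helper name `expand_dots` and the sample log line are made up for illustration.

```python
import re


def expand_dots(pattern):
    """Rewrite each '...' as a lazy capture group.

    Assumption: '...X' ("anything up to the next X") is approximated by
    '(.*?)' followed by X. The real Addamark expansion is not given in the
    message and may differ, e.g. by forbidding X inside the match.
    """
    return pattern.replace("...", "(.*?)")


# The Apache weblog pattern quoted in the message.
APACHE = r'^... ... ... \[...\] "... ... ..." ... ... "..." "..." ...$'

# Field names from the message's name:type line (types omitted here).
NAMES = ["ClientIP", "unused1", "unused2", "tsStr", "Method", "Url",
         "HttpVers", "RespCode", "RespSize", "Referrer", "UserAgent",
         "RespTime"]

# Made-up sample record in that layout.
LINE = ('192.0.2.7 - - [04/Jun/2002:18:33:00 -0700] '
        '"GET /index.html HTTP/1.0" 200 2326 '
        '"http://example.com/" "Mozilla/4.0" 15')

match = re.match(expand_dots(APACHE), LINE)
fields = dict(zip(NAMES, match.groups())) if match else {}
print(fields["tsStr"], fields["Url"], fields["RespCode"])
```

Every value comes back as a string here; applying the name:type casts (INT32 for RespCode and RespSize) would be a separate step.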
This archive was generated by hypermail 2b30 : Wed Jun 05 2002 - 12:09:38 PDT