RE: [logs] Re: Generic Log Message Parsing Tool

From: Dale.Drewat_private
Date: Wed Jun 05 2002 - 12:05:14 PDT

Next message: Marcus J. Ranum: "Re: [logs] Re: Generic Log Message Parsing Tool"

Previous message: Rajkumar S.: "Re: [logs] Re: Generic Log Message Parsing Tool"
Maybe in reply to: Steve: "[logs] Re: Generic Log Message Parsing Tool"
Next in thread: Jon Stearley: "Re: [logs] Re: Generic Log Message Parsing Tool"
Next in thread: Dale.Drewat_private: "RE: [logs] Generic Log Message Parsing Tool"

For anyone who is interested, I've recently made a log policy engine (APE -
Anomaly Policy Engine) available.  It's a fairly flexible, very robust log
parsing agent that has a significant amount of rule hierarchy support and
handles data at very high speeds, despite the fact that it's written in
Perl.  :-)

You can find it at http://www.hackertracker.org/cst/cst.html under the
"Intrusion Detection server" link.

Dale

-----Original Message-----
From: Adam Sah [mailto:asahat_private] 
Sent: Tuesday, June 04, 2002 6:33 PM
To: loganalysisat_private
Subject: Re: [logs] Re: Generic Log Message Parsing Tool 

I don't know if this helps, but the Addamark LMS uses perl5 regular 
   expressions to hack up the log into fields, then hits those fields with 
   arbitrary expressions (SQL+perl).  We solve the performance problem by 
   running the parse in parallel across a cluster of PCs-- this also
   provides linear scaling.  In practice, we've never had a problem parsing 
   up somebody's log, including some crazy custom ones. 
Anyway, I've included a little writeup on our scheme/format below.  If it's 
   helpful, feel free to steal the ideas-- our goal is to be compatible with
   whatever parsing format(s) become popular, and if they're based on us, 
   that only makes our job easier ;-) 
adam 
Adam Sah -- CTO, Addamark Technologies -- http://www.addamark.com/
..tear.along.dotted.line................................................ 
The Addamark parsing script format is as follows: 
   ^...your regexp here...$ 
   name1:type,name2:type,name3:type,name4:type,... 
   ...your code here... 
The regexp locates the individual fields in a given record.  Each match 
   (paren-match) is given a "name" and forced into the given datatype as per
   the name:type line.  These parse fields are then made available to the 
   code section, e.g. as variables.  In our case, the "code" is a SQL 
   statement, in which you can embed Perl.  If you don't feel like writing a
   SQL engine (understandable!), you could jump straight into Perl.  For 
   readability, we've added "...X" to the regexp language, which means 
   "create a match out of anything up to the next X". 
For example, here's the Addamark script to parse an Apache weblog: 
^... ... ... \[...\] "... ... ..." ... ... "..." "..." ...$ 
ClientIP:VARCHAR,unused1:VARCHAR,unused2:VARCHAR,tsStr:VARCHAR, 
   Method:VARCHAR, Url:VARCHAR, HttpVers:VARCHAR, RespCode:INT32, 
   RespSize:INT32, Referrer:VARCHAR, UserAgent:VARCHAR, RespTime:VARCHAR 
SELECT  _strptime( tsStr, "%d/%b/%Y:%H:%M:%S %Z") as ts, 
        ClientIP, 
        _rev_dns(ClientIP) as ClientDNS, -- do a reverse DNS lookup, 
                  -- this too happens in parallel across the cluster 
        Method, 
        Url, 
        HttpVers, 
        RespCode, 
        RespSize, 
        Referrer, 
        UserAgent, 
        _int32(RespTime) as RespTime -- another way to parse strings to nums 
FROM stdin; 
You could replace the SELECT statement with some arbitrary Perl code, which 
   has some API for defining the output columns. 
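
As a rough illustration of how the three parts of such a script fit together, here is a minimal Python sketch.  It is not Addamark's implementation.  It assumes that "..." can be approximated by a lazy capture group, that VARCHAR and INT32 map to Python's str and int, and that the timestamp offset is numeric (so Python's %z is used where the script has %Z).  The function names (expand_shorthand, parse_spec, parse_line, select) and the sample log line are made up for the example, and the reverse DNS lookup is left out.

```python
import re
from datetime import datetime

def expand_shorthand(pattern):
    """Rewrite each '...' as a lazy capture group (an approximation of '...X')."""
    return pattern.replace("...", "(.*?)")

# Assumed mapping from the script's type names to Python converters.
CONVERTERS = {"VARCHAR": str, "INT32": int}

def parse_spec(spec):
    """Split 'name:type, name:type, ...' into (name, converter) pairs."""
    pairs = []
    for item in spec.split(","):
        name, type_name = item.strip().split(":")
        pairs.append((name, CONVERTERS[type_name]))
    return pairs

def parse_line(line, regexp, fields):
    """Return a dict of typed parse fields, or None if the line does not match."""
    m = regexp.match(line)
    if m is None:
        return None
    return {name: conv(raw) for (name, conv), raw in zip(fields, m.groups())}

def select(rec):
    """Stand-in for the SELECT: parse the timestamp, convert RespTime,
    and drop the unused fields."""
    out = {k: v for k, v in rec.items()
           if k != "tsStr" and not k.startswith("unused")}
    # Python spells a numeric offset such as -0700 as %z.
    out["ts"] = datetime.strptime(rec["tsStr"], "%d/%b/%Y:%H:%M:%S %z")
    out["RespTime"] = int(rec["RespTime"])
    return out

# The regexp and name:type lines from the weblog script.
SCRIPT_REGEXP = r'^... ... ... \[...\] "... ... ..." ... ... "..." "..." ...$'
FIELD_SPEC = (
    "ClientIP:VARCHAR,unused1:VARCHAR,unused2:VARCHAR,tsStr:VARCHAR,"
    " Method:VARCHAR, Url:VARCHAR, HttpVers:VARCHAR, RespCode:INT32,"
    " RespSize:INT32, Referrer:VARCHAR, UserAgent:VARCHAR, RespTime:VARCHAR"
)

WEBLOG_RE = re.compile(expand_shorthand(SCRIPT_REGEXP))
FIELDS = parse_spec(FIELD_SPEC)

# Made-up sample record with a trailing response-time field.
sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
          '"http://www.example.com/start.html" '
          '"Mozilla/4.08 [en] (Win98; I ;Nav)" 0')

row = select(parse_line(sample, WEBLOG_RE, FIELDS))
print(row["ts"], row["ClientIP"], row["RespCode"], row["RespSize"])
```

A lazy group can backtrack past the first X when the rest of the line would otherwise fail to match, so it is looser than a strict "up to the next X".  A real weblog may also log "-" for the response size, which the plain int conversion here would reject.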
notes: 
 - Multi-line records are relatively rare, so we handle them by pre-processing 
   the log data so that the records are all on one line apiece.  
 - Variant records are handled using regexp "union" ("|").  This doubles the
   number of parse fields, but that's easily re-unified in the code section.
   Perl5 regexps already handle binary data. 
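
The first two notes can be sketched in Python as follows.  This is only an illustration under stated assumptions: the rule that continuation lines start with whitespace, the login/logout record formats, and the names join_continuations and unify are all hypothetical and do not come from the Addamark product.

```python
import re

def join_continuations(lines):
    """Pre-processing pass: fold lines that start with whitespace into the
    preceding record, so every record ends up on one line."""
    record = None
    for line in lines:
        line = line.rstrip("\n")
        if record is not None and line[:1] in (" ", "\t"):
            record += " " + line.strip()
        else:
            if record is not None:
                yield record
            record = line
    if record is not None:
        yield record

# Two record variants joined with "|".  Each alternative brings its own
# capture groups, so the union has more parse fields than either variant.
VARIANT_RE = re.compile(r"^(?:login user=(\S+) from=(\S+)|logout user=(\S+))$")

def unify(line):
    """Code-section step: merge the per-variant fields back into one schema."""
    m = VARIANT_RE.match(line)
    if m is None:
        return None
    login_user, login_host, logout_user = m.groups()
    if login_user is not None:
        return {"event": "login", "user": login_user, "host": login_host}
    return {"event": "logout", "user": logout_user, "host": None}

raw = ["login user=alice", "   from=10.0.0.1", "logout user=alice"]
records = list(join_continuations(raw))
events = [unify(r) for r in records]
print(events)
```

Groups belonging to the alternative that did not match come back as None, which is what lets the code section tell the variants apart.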

> On Tue, Jun 04, 2002 at 05:36:05PM -0400, Steve wrote: 
> > I've been working on understanding the Perl module Parse::RecDescent 
> > for just such a thing.  I suspect it would be possible to create a 
> > stockpile of its "grammars" for many established log formats, and then 
> > people would have an easier time modifying it for new formats. 
>       That's a large part of what I've got so far; the problems 
> I'm running into are scalability/performance ones--Parse::RecDescent is 
> a beautiful beast, but not a very fast one at all.  Marcus was doing his 
> version in C, which for performance reasons makes a lot of sense, so I 
> was thinking that perhaps someone else might be able to pick up that 
> path.  Perhaps a good way to start would be to ignore the implementation 
> issues for now, and just start building a stockpile of grammars as you 
> suggest; it should be relatively easy to convert a well-formed grammar to 
> a lex/yacc syntax, yes?  (I don't know, as my C skills are less than 
> stellar and I've never actually used lex/yacc.) 
>       (I also started a conversation this morning with Damian 
> Conway and Mark-Jason Dominus about a faster way to implement a parser 
> in Perl, using iteration rather than recursion; it might be a long time 
> before that pans out, but if it does, maybe Perl could remain a valid 
> option as well.) 
> 
>       -- Sweth. 
> 
> -- 
> Sweth Chandramouli      Idiopathic Systems Consulting 
> svcat_private      http://www.idiopathic.net/
> 
> --------------------------------------------------------------------- 
> To unsubscribe, e-mail: loganalysis-unsubscribeat_private 
> For additional commands, e-mail: loganalysis-helpat_private 
> 


This archive was generated by hypermail 2b30 : Wed Jun 05 2002 - 12:09:38 PDT