Re: [logs] perl question relating to log analysis

From: Sweth Chandramouli (loganalysisat_private)
Date: Wed Aug 28 2002 - 20:20:29 PDT

    On Tue, Aug 27, 2002 at 04:56:50PM +0000, Jeff Schaller wrote:
    > This all assumes that you have a good idea of what your data looks
    > like frequency-wise /before you look at it/. I could see this
    > community getting that done by collating a bunch of sanitized
    > logs, coming up with tight REs to match various messages, and then
    > grinding out the various statistics.
    	Assuming that your regexes are orthogonal (i.e. no two
    regexes will ever match the same string), it's also often worth
    building in some self-ordering logic, so that the regexes that match
    most often get tried first.  The perl stubs to do something like
    this are basically:
    
    #!/usr/bin/perl -w
    
    use strict;
    my $reorder_count = 10000;
    
    # The regex_*_action subs are stubs and need to be defined elsewhere;
    # the parens keep "use strict" from rejecting them as barewords.
    my %regexes = (
       1 => {
          "pattern" => qr/regex one/
          ,"action" => sub { regex_one_action() }
          ,"count" => 0
       }
       ,2 => {
          "pattern" => qr/regex two/
          ,"action" => sub { regex_two_action() }
          ,"count" => 0
       }
       ,3 => {
          "pattern" => qr/regex three/
          ,"action" => sub { regex_three_action() }
          ,"count" => 0
       }
    );
    
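    # Hashes are unordered, so the match order lives in a separate array.
    my @order = sort { $a <=> $b } keys %regexes;
    
    LINE: while (my $line = <ARGV>) {
       if (($. % $reorder_count) == 0) {
          # Descending by count: most frequently matched patterns first.
          @order = sort {
             $regexes{$b}->{"count"} <=> $regexes{$a}->{"count"}
          } @order;
       }
       for my $key (@order) {
          if ($line =~ $regexes{$key}->{"pattern"}) {
             # Don't chain these with &&: the action's return value and
             # the post-increment (0 on the first hit) would both
             # short-circuit the chain.
             $regexes{$key}->{"action"}->();
             $regexes{$key}->{"count"}++;
             next LINE;
          }
       }
    }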
    __END__
    
    	The $reorder_count determines how often the regexes are
    re-sorted; for really big data sets, I've also seen it be useful to
    apply some type of growth function to $reorder_count after each
    re-sort (so that the first sort happens after, say, 10k records, the
    next after 20k, the next after 40k, etc.).  If the same script is
    going to be run many times, it might also make sense to have it dump
    the sorted regex list to a file (probably using a Tie, since IIRC
    Data::Dumper horks when dealing with qr-compiled regexes), so that
    the ordering can persist between invocations.
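    
    	A rough sketch of both ideas, as stubs again.  The sub names
    (next_reorder_at, save_counts, load_order) and the "key<TAB>count"
    file format are made up for illustration.  Note that this swaps the
    Tie for a plain text file holding only the keys and counts, which
    sidesteps serializing the compiled regexes and code refs entirely;
    a doubling schedule is just one possible growth function.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given the line number of the last re-sort,
# return the line number of the next one (10k -> 20k -> 40k ...).
sub next_reorder_at {
   my ($last) = @_;
   return $last * 2;
}

# Hypothetical helper: write "key<TAB>count" lines, highest count
# first.  $counts is a hash ref of regex key => match count.
sub save_counts {
   my ($file, $counts) = @_;
   open(my $fh, '>', $file) or die "can't write $file: $!";
   my @keys = sort {
      $counts->{$b} <=> $counts->{$a} || $a cmp $b
   } keys %$counts;
   print $fh "$_\t$counts->{$_}\n" for @keys;
   close($fh) or die "can't close $file: $!";
}

# Hypothetical helper: read the keys back in their saved order.
# Returns an empty list if there is no saved state yet.
sub load_order {
   my ($file) = @_;
   open(my $fh, '<', $file) or return;
   my @order;
   while (my $rec = <$fh>) {
      chomp $rec;
      my ($key) = split /\t/, $rec;
      push @order, $key;
   }
   close($fh);
   return @order;
}
```

    	In the main loop, the modulus test would then become a
    comparison of $. against a $next_reorder variable (another made-up
    name) that gets bumped with next_reorder_at() after each re-sort;
    load_order() would seed the ordering at startup and save_counts()
    would run once the input is exhausted.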
    
    	-- Sweth.
    
    -- 
    Sweth Chandramouli      Idiopathic Systems Consulting
    svcat_private      http://www.idiopathic.net/
    _______________________________________________
    LogAnalysis mailing list
    LogAnalysisat_private
    http://lists.shmoo.com/mailman/listinfo/loganalysis
    



    This archive was generated by hypermail 2b30 : Wed Aug 28 2002 - 21:31:34 PDT