Re: [logs] perl question relating to log analysis

In reply to: Russell Fulton: "Re: [logs] perl question relating to log analysis"

cadamsat_private

On Monday, Aug 26, 2002, at 22:31 US/Pacific, Russell Fulton wrote:
>> Try analysing your data and putting your most common cases first, so
>> they will match sooner and return before the rest are executed.
>
> Given that the optimizer is working over multiple statements or
> expressions I don't think the order is actually material.

I think it will matter: if one expression matches more often than
average in your data, checking it ahead of the average and
below-average cases will avoid unnecessary checks.
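
As a rough illustration of the point, here is a minimal sketch that
counts how many regex tests a first-match-wins loop performs for two
orderings of the same patterns. The sample lines, the patterns and the
count_checks helper are all made up for the example; they are not taken
from real log data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data: 'pop login' is by far the most common case.
my @lines = (('pop login') x 8, 'apop', 'tls handshake');

# Count how many pattern tests a first-match-wins loop performs.
sub count_checks {
    my ($pats, $lines) = @_;
    my $checks = 0;
    for my $line (@$lines) {
        for my $pat (@$pats) {
            $checks++;
            last if $line =~ $pat;    # stop at the first match
        }
    }
    return $checks;
}

# Same three patterns, two different orders.
my @rare_first   = (qr/^tls/, qr/^apop/, qr/^pop/);
my @common_first = (qr/^pop/, qr/^apop/, qr/^tls/);

printf "rare first:   %d checks\n", count_checks(\@rare_first,   \@lines);
printf "common first: %d checks\n", count_checks(\@common_first, \@lines);
```

With this toy input the common-first ordering does 13 tests against 27
for the rare-first ordering; how much that matters on real data depends
on how skewed the hit distribution is and how expensive each pattern is.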

Here's a framework for testing this. Running it to match
qpopper-related entries on 8 million lines of syslog data shows that
moving the 3rd item up in the list would improve my average execution
time.

#!/usr/bin/perl -w

my @pats;
my @pat_hits;

push @pats, qr/^.+ \w+ in.qpopper\[([0-9]+)\]: \(v4\.0\.4\) TLSv1\/SSLv3 handshake with client/o;
push @pats, qr/^.+ \w+ in.qpopper\[([0-9]+)\]: apop/o;
push @pats, qr/^.+ \w+ in.qpopper\[([0-9]+)\]: \(v4\.0\.4\) POP login by user "([^"]+)\" at \(([^)]+)\)/o;

# Exclude startup overhead:
my $start = (times())[0];

while (<>) {
         check_line($_);
}

my $elapsed = (times())[0] - $start;

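print "Hit statistics for " . scalar(@pats) . " rules after processing $. lines in $elapsed seconds (user time):\n";
foreach my $i (sort { ($pat_hits[$b] || 0) <=> ($pat_hits[$a] || 0) } 0 .. $#pats) {
         # Patterns that never matched have no entry yet; count them as 0.
         my $hits = $pat_hits[$i] || 0;
         # Scale the ratio by 100 so it really is a percentage.
         printf "%4d: %10d (%3.2f%%)\n", $i, $hits, $. ? 100 * $hits / $. : 0;
}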

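sub check_line {
         my $line = shift;

         # First match wins: later patterns are not tested for this line.
         for (my $i = 0; $i < @pats; $i++) {
                 if ($line =~ $pats[$i]) {
                         $pat_hits[$i]++;
                         return $i;
                 }
         }
         return;    # no pattern matched
}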
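
If the statistics come out lopsided, the pattern list can be reordered
by hit count before the next run. The following is only a sketch: the
hit counts are made up, the patterns are shortened stand-ins for the
ones above, and reorder_by_hits is a hypothetical helper. It is only
safe when the patterns are mutually exclusive, because with
first-match-wins the order decides which pattern claims a line that
several of them could match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the patterns sorted by descending hit count.
# Patterns with no recorded hits are treated as 0.
sub reorder_by_hits {
    my ($pats, $hits) = @_;
    my @order = sort { ($hits->[$b] || 0) <=> ($hits->[$a] || 0) } 0 .. $#$pats;
    return map { $pats->[$_] } @order;
}

# Made-up example: the third pattern matched most often.
my @pats     = (qr/handshake/, qr/apop/, qr/POP login/);
my @pat_hits = (120, 5, 900);

@pats = reorder_by_hits(\@pats, \@pat_hits);

# Show the new order, most frequently hit pattern first.
print "$_\n" for @pats;
```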

_______________________________________________
LogAnalysis mailing list
LogAnalysisat_private
http://lists.shmoo.com/mailman/listinfo/loganalysis