Re: [logs] perl question relating to log analysis

From: Chris Adams (cadamsat_private)
Date: Wed Aug 28 2002 - 12:34:10 PDT

    On Monday, Aug 26, 2002, at 22:31 US/Pacific, Russell Fulton wrote:
    >> Try analysing your data and putting your most common cases first, so
    >> they will match sooner and return before the rest are executed.
    >
    > Given that the optimizer is working over multiple statements or
    > expressions I don't think the order is actually material.
    
    I think it will matter - if one expression matches more often than the
    others, checking it ahead of the average and below-average cases will
    avoid unnecessary checks, since the loop returns on the first match.
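
    To put rough numbers on that, here is a small sketch. The hit
    fractions are made up for illustration (they are not from my logs),
    expected_tests is a hypothetical helper, and it assumes each line is
    counted against the first pattern it matches only:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Average number of regex tests per line when patterns are tried in order.
# @$frac holds the (hypothetical) fraction of lines that stop at each
# pattern; a line matching nothing is tested against every pattern.
sub expected_tests {
    my ($frac) = @_;
    my ($cost, $matched) = (0, 0);
    for my $i (0 .. $#$frac) {
        $cost    += $frac->[$i] * ($i + 1);   # a hit at position $i costs $i+1 tests
        $matched += $frac->[$i];
    }
    return $cost + (1 - $matched) * @$frac;   # unmatched lines pay for all tests
}

printf "rare first:   %.2f tests/line\n", expected_tests([0.05, 0.15, 0.60]);
printf "common first: %.2f tests/line\n", expected_tests([0.60, 0.15, 0.05]);
```

    This counts tests only. Real patterns differ in cost, so timing on
    real data is still the better guide.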
    
    Here's a framework for testing this - running it to match 
    qpopper-related entries on 8 million lines of syslog data shows that 
    moving the 3rd item up in the list would improve my average execution 
    time.
    
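    #!/usr/bin/perl -w
    use strict;

    # Every rule shares the same syslog prefix; capture group 1 is the PID.
    my $prefix = qr/^.+ \w+ in\.qpopper\[([0-9]+)\]: /;

    # Patterns are tried in array order and the first match wins.
    my @pats = (
        qr/${prefix}\(v4\.0\.4\) TLSv1\/SSLv3 handshake with client/,
        qr/${prefix}apop/,
        qr/${prefix}\(v4\.0\.4\) POP login by user "([^"]+)" at \(([^)]+)\)/,
    );
    my @pat_hits = (0) x @pats;    # one hit counter per pattern

    # Exclude startup overhead:
    my $start = (times())[0];

    while (<>) {
        check_line($_);
    }

    my $elapsed = (times())[0] - $start;
    my $lines   = $. || 0;         # total lines read across all input files

    print "Hit statistics for " . scalar(@pats) . " rules after processing "
        . "$lines lines in $elapsed seconds (user time):\n";
    foreach my $i (sort { $pat_hits[$b] <=> $pat_hits[$a] } 0 .. $#pats) {
        # Scale by 100 so the value printed before the % sign is a percentage.
        my $pct = $lines ? 100 * $pat_hits[$i] / $lines : 0;
        printf "%4d: %10d (%3.2f%%)\n", $i, $pat_hits[$i], $pct;
    }

    # Count and return the index of the first matching pattern;
    # return nothing if no pattern matches.
    sub check_line {
        my $line = shift;

        for my $i (0 .. $#pats) {
            if ($line =~ $pats[$i]) {
                $pat_hits[$i]++;
                return $i;
            }
        }
        return;
    }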
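
    Once you have hit counts, putting the busiest rules first can be
    sketched like this. reorder_by_hits is a hypothetical helper, and the
    patterns and counts below are simplified stand-ins for illustration,
    not measured results:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the patterns reordered so the most frequently hit come first.
# $pats and $hits are parallel array refs, like @pats and @pat_hits above.
sub reorder_by_hits {
    my ($pats, $hits) = @_;
    my @order = sort { ($hits->[$b] || 0) <=> ($hits->[$a] || 0) } 0 .. $#$pats;
    return @{$pats}[@order];
}

# Hypothetical patterns and hit counts, for illustration only.
my @pats     = (qr/handshake/, qr/apop/, qr/POP login/);
my @pat_hits = (120, 30, 900);

my @sorted = reorder_by_hits(\@pats, \@pat_hits);
print "$_\n" for @sorted;
```

    If you reorder the live list this way, reset or reorder the hit
    counters too, so the indexes still line up with the patterns.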
    