On Monday, Aug 26, 2002, at 22:31 US/Pacific, Russell Fulton wrote:
>> Try analysing your data and putting your most common cases first, so
>> they will match sooner and return before the rest are executed.
>
> Given that the optimizer is working over multiple statements or
> expressions I don't think the order is actually material.

I think it will matter: if your data results in one expression matching
more frequently than the others, checking it ahead of the average and
below-average cases avoids unnecessary checks.

Here's a framework for testing this. Running it to match qpopper-related
entries on 8 million lines of syslog data shows that moving the 3rd item
up in the list would improve my average execution time.

#!/usr/bin/perl -w
use strict;

# Patterns are tried in this order; the first one that matches wins.
my @pats;
push @pats, qr/^.+ \w+ in\.qpopper\[([0-9]+)\]: \(v4\.0\.4\) TLSv1\/SSLv3 handshake with client/;
push @pats, qr/^.+ \w+ in\.qpopper\[([0-9]+)\]: apop/;
push @pats, qr/^.+ \w+ in\.qpopper\[([0-9]+)\]: \(v4\.0\.4\) POP login by user "([^"]+)" at \(([^)]+)\)/;

# Start every counter at zero so rules that never hit still print cleanly.
my @pat_hits = (0) x @pats;
my $lines    = 0;

# Exclude startup overhead:
my $start = (times())[0];

while (<>) {
    $lines++;
    check_line($_);
}

my $elapsed = (times())[0] - $start;

print "Hit statistics for " . scalar(@pats) . " rules after processing $lines lines in $elapsed seconds (user time):\n";

# Report rules from most to least frequently hit.
foreach my $i (sort { $pat_hits[$b] <=> $pat_hits[$a] } 0 .. $#pats) {
    my $pct = $lines ? 100 * $pat_hits[$i] / $lines : 0;
    printf "%4d: %10d (%6.2f%%)\n", $i, $pat_hits[$i], $pct;
}

# Return the index of the first matching pattern, or nothing if none match.
sub check_line {
    my ($line) = @_;

    for my $i (0 .. $#pats) {
        if ($line =~ $pats[$i]) {
            $pat_hits[$i]++;
            return $i;
        }
    }
    return;
}
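
To put a rough number on the ordering argument, here is a minimal sketch that estimates the average number of pattern tests per line for a given rule order, using first-match-wins semantics as in check_line. The helper name avg_checks and the hit counts are made up for illustration; they are not the figures from the 8-million-line run.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Average number of pattern tests per input line for a given rule order.
#   $hits  - array ref of hit counts, indexed by rule number
#   $total - total lines processed
#   @order - rule indices in the order they would be tried
sub avg_checks {
    my ($hits, $total, @order) = @_;
    my ($cost, $matched) = (0, 0);

    for my $pos (0 .. $#order) {
        my $n = $hits->[ $order[$pos] ];
        $cost    += $n * ($pos + 1);    # a hit at position $pos costs $pos + 1 tests
        $matched += $n;
    }

    # Lines that match no rule are tested against every rule.
    $cost += ($total - $matched) * scalar(@order);

    return $total ? $cost / $total : 0;
}

# Hypothetical hit counts (assumption, not measured data).
my @hits    = (100, 50, 850);
my $total   = 1000;
my @by_freq = sort { $hits[$b] <=> $hits[$a] } 0 .. $#hits;

printf "original order: %.2f tests/line\n", avg_checks(\@hits, $total, 0 .. $#hits);
printf "by frequency:   %.2f tests/line\n", avg_checks(\@hits, $total, @by_freq);
```

This counts tests rather than time, so it treats every pattern as equally expensive; real patterns differ, which is why timing the actual run is still worth doing. Reordering is also only safe when no line can match more than one rule, or when it does not matter which rule claims it.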