On Tue, Aug 27, 2002 at 04:56:50PM +0000, Jeff Schaller wrote:
> This all assumes that you have a good idea of what your data looks
> like frequency-wise /before you look at it/. I could see this
> community getting that done by collating a bunch of sanitized
> logs, coming up with tight REs to match various messages, and then
> grinding out the various statistics.

Assuming that your regexes are orthogonal (i.e. no two regexes will
ever match the same string), it's also often worth it to build in
some self-ordering logic, so that the patterns that match most often
get tried first. The perl stubs to do something like this are
basically:

    #!/usr/bin/perl -w
    use strict;

    my $reorder_count = 10000;

    # regex_*_action() are placeholders for your own handler subs.
    my %regexes = (
        1 => { "pattern" => qr/regex one/
              ,"action"  => sub { regex_one_action() }
              ,"count"   => 0 }
       ,2 => { "pattern" => qr/regex two/
              ,"action"  => sub { regex_two_action() }
              ,"count"   => 0 }
       ,3 => { "pattern" => qr/regex three/
              ,"action"  => sub { regex_three_action() }
              ,"count"   => 0 }
    );

    # A hash has no dependable order, so the search order lives in
    # an array of keys.
    my @order = sort { $a <=> $b } keys %regexes;

    LINE: while (my $line = <ARGV>) {
        if (($. % $reorder_count) == 0) {
            # Most frequently matched patterns go first.
            @order = sort { $regexes{$b}->{"count"} <=> $regexes{$a}->{"count"} }
                     @order;
        }
        for my $key (@order) {
            if ($line =~ $regexes{$key}->{"pattern"}) {
                $regexes{$key}->{"action"}->();
                $regexes{$key}->{"count"}++;
                next LINE;
            }
        }
    }
    __END__

The $reorder_count can be used to determine how often to re-sort the
regexes; for really big data sets, I've also seen it be useful to
apply some type of growth function to $reorder_count after re-sorting
(so that the first sorting happens after, say, 10k records, and the
next one happens after 20k, and the next after 40k, etc.). If the
same script is going to be run many times, it might also make sense
to have it dump the sorted regex list to a file (probably using a
Tie, since IIRC Data::Dumper horks when dealing with qr-compiled
regexes), so that it can persist between invocations.

-- Sweth.
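For what it's worth, the growth-function idea can be sketched roughly
as follows. This is only an illustration, not a tested part of the
stubs above: make_reorder_schedule is a hypothetical helper name, and
it assumes the "10k, 20k, 40k" example means absolute record numbers
that double after each re-sort.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns a closure
# that reports whether the regex list is due for a re-sort at a given
# record number. After each re-sort the next threshold is multiplied
# by $factor, so re-sorts become rarer as the counts stabilise.
sub make_reorder_schedule {
    my ($first, $factor) = @_;
    my $next = $first;    # record number of the next re-sort
    return sub {
        my ($record_number) = @_;
        return 0 if $record_number < $next;
        $next *= $factor;
        return 1;
    };
}

# Assumed values: first re-sort at 10k records, doubling afterwards.
my $due = make_reorder_schedule(10_000, 2);

# Simulate 100k input records and collect the points where a re-sort
# would be triggered.
my @resort_points = grep { $due->($_) } 1 .. 100_000;
print "@resort_points\n";
```

In the main loop, the test on ($. % $reorder_count) would be replaced
by a call such as $due->($.). With the values above, the simulated run
re-sorts at records 10000, 20000, 40000 and 80000.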
-- 
Sweth Chandramouli
Idiopathic Systems Consulting
svcat_private
http://www.idiopathic.net/
_______________________________________________
LogAnalysis mailing list
LogAnalysisat_private
http://lists.shmoo.com/mailman/listinfo/loganalysis
This archive was generated by hypermail 2b30 : Wed Aug 28 2002 - 21:31:34 PDT