[ Warning: more Perl and/or general optimisation than log analysis ]

Russell,

On 2002-08-26 17:18:42 +1200, Russell Fulton wrote:
> I have recently reimplemented much of the functionality of Psionic's
> Logcheck in a perl script. I have also added functionality to make it
> more useful in a central log server environment (you can specify
> specific checks for different hosts and have reports for different
> hosts mailed to different admins).
>
> We are now testing it in a production environment. When we are happy
> with it and I have written some documentation (what's that ?? ;-) I
> will post the script to the list for others to have a play with.
>
> My immediate concern is that the perl script builds functions that
> apply lots of regular expressions (REs) to each line of log files.
>
> sub check {
>     $_ = shift;
>     study $_; #hopefully speed up matching...
>
>     return 0 if /re1/;
>     return 0 if /re2/;
>     return 1 if /re3/;
>     return 1 if /re4/;
>     return 1 if /re5/;
>     return 2 if /re6/;
>     return 2 if /re7/i;
>     return 3 if /re8/;
>     ...
>     return 4;
> }
>
> The return code tells the program what to do with this record.
>
> Anyone know of any tricks to speed this up? Since this is the innermost
> loop of the process, any gains here should be worthwhile. I know the
> RE optimizer is pretty smart and that it will do some optimization
> over statements, but I have never figured out what the limitations are.

The exact details of the "study" function are in the perlfunc man page:

    The way `study' works is this: a linked list of every character in
    the string to be searched is made, so we know, for example, where
    all the `'k'' characters are. From each search string, the rarest
    character is selected, based on some static frequency tables
    constructed from some C programs and English text. Only those
    places that contain this "rarest" character are examined.

Either way, you should consider measuring with the Benchmark module; see
"perldoc Benchmark" for details.
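A Benchmark comparison can be as small as the sketch below. It assumes hypothetical sample lines and patterns (crond, link up, Accepted password, and so on are placeholders, not Russell's rules) and compares a separate match per pattern against one alternation per return code, after first checking that both variants classify every sample the same way.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample lines and patterns: replace them with a representative
# slice of your own logs and rules, since relative timings depend on the data.
my @lines = (
    'Aug 26 17:18:42 host sshd[123]: Accepted password for bob',
    'Aug 26 17:18:43 host kernel: eth0: link up',
    'Aug 26 17:18:44 host su: BAD SU bob to root on /dev/pts/1',
    'Aug 26 17:18:45 host named[99]: lame server resolving example.com',
    'Aug 26 17:18:46 host crond[7]: (root) CMD (run-parts /etc/cron.hourly)',
);

# One match per statement; the first matching pattern decides the result.
sub check_separate {
    my ($line) = @_;
    return 0 if $line =~ /crond/;
    return 0 if $line =~ /link up/;
    return 1 if $line =~ /Accepted password/;
    return 1 if $line =~ /lame server/;
    return 2 if $line =~ /BAD SU/i;
    return 4;
}

# Same rules, folded into one alternation per return code.
sub check_alternation {
    my ($line) = @_;
    return 0 if $line =~ /crond|link up/;
    return 1 if $line =~ /Accepted password|lame server/;
    return 2 if $line =~ /BAD SU/i;
    return 4;
}

# Both variants must classify every sample identically, or the timings
# compare two different programs.
for my $line (@lines) {
    check_separate($line) == check_alternation($line)
        or die "variants disagree on: $line\n";
}

# Run each variant for at least 1 CPU second and print a rate comparison.
cmpthese(-1, {
    separate    => sub { check_separate($_)    for @lines },
    alternation => sub { check_alternation($_) for @lines },
});
```

cmpthese prints each variant's rate and the percentage difference between them. Folding patterns into an alternation is only safe within a single return code; merging patterns that return different codes would change which rule wins.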
You can then play around with various combinations:

Case 1:
    return 0 if /re1/;
    return 0 if /re2/;
    return 1 if /re3/;
    return 1 if /re4/;
    return 1 if /re5/;

Case 2:
    return 0 if /re1/ || /re2/;
    return 1 if /re3/ || /re4/ || /re5/;

Case 3:
    return 0 if /(re1)|(re2)/;
    return 1 if /(re3)|(re4)|(re5)/;

Case 4:
    if (/re1/) {
        return 0;
    } elsif (/re2/) {
        return 0;
    } elsif (/re3/) {
        return 1;
    } elsif (/re4/) {
        return 1;
    } elsif (/re5/) {
        return 1;
    }

Case 5:
    if (/re1/ || /re2/) {
        return 0;
    } elsif (/re3/ || /re4/ || /re5/) {
        return 1;
    }

Case 6:
    if (/(re1)|(re2)/) {
        return 0;
    } elsif (/(re3)|(re4)|(re5)/) {
        return 1;
    }

Make sure you run each case against sample lines that exercise each of
your patterns (e.g. re1, re4, etc.). I know for a fact that Case 4 is
slower than Case 1, but Case 5 or Case 6 may be faster, so I threw them in.

Try analysing your data and putting your most common cases first, so
they match sooner and return before the remaining expressions are
executed. Only do this where it doesn't change which rule wins for lines
that match more than one pattern.

If any of your expressions are exact matches, /^string$/, then use eq
(on a chomped line, since $ also matches just before a trailing newline):

    return 5 if ($_ eq "string");

If any of your expressions are simple constant substrings, / something/,
then you may wish to try index():

    return 6 if (index($_, " something") != -1);

Good luck!

--
Shane
Carpe Diem

_______________________________________________
LogAnalysis mailing list
LogAnalysisat_private
http://lists.shmoo.com/mailman/listinfo/loganalysis
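Since Russell's script already generates its check() functions, one way to apply the eq and index() advice without disturbing rule order is to choose the cheapest test per rule while generating the code. This is a minimal sketch, not Russell's implementation: build_checker and the [kind, pattern, return code] rule format are hypothetical, and it assumes input lines have already been chomped.

```perl
use strict;
use warnings;

# Hypothetical rule list: [ kind, pattern, return code ] in priority order.
# kind is 'eq' (whole line, already chomped), 'index' (constant substring)
# or 're' (anything needing a real regex).
my @rules = (
    [ eq    => 'kernel: eth0: link up', 0 ],
    [ index => 'Accepted password',     1 ],
    [ re    => 'lame server \S+',       1 ],
    [ re    => '(?i)BAD SU',            2 ],
);

# Generate the source of a checker with one statement per rule, keeping the
# rule order so that the first matching rule still decides the result.
sub build_checker {
    my @spec = @_;
    my @re;    # regexes compiled once; the generated code indexes into this
    my $src = "sub {\n    my \$line = shift;\n";
    for my $rule (@spec) {
        my ($kind, $pat, $code) = @$rule;
        my $lit = quotemeta $pat;    # safe to embed inside "..." below
        if ($kind eq 'eq') {
            $src .= "    return $code if \$line eq \"$lit\";\n";
        } elsif ($kind eq 'index') {
            $src .= "    return $code if index(\$line, \"$lit\") != -1;\n";
        } else {
            push @re, qr/$pat/;
            my $i = $#re;
            $src .= "    return $code if \$line =~ \$re[$i];\n";
        }
    }
    $src .= "    return 4;\n}\n";    # default code for unmatched lines
    # The string eval sees the lexical @re, so the new sub closes over it.
    my $checker = eval $src or die "generated checker failed to compile: $@";
    return $checker;
}

my $check = build_checker(@rules);
```

Printing $src before the eval is a cheap way to inspect what was generated. The eq and index() forms only pay off for rules that really are whole-line or constant-substring matches, so it is worth timing the generated sub against the all-regex version with Benchmark before switching.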
This archive was generated by hypermail 2b30 : Mon Aug 26 2002 - 09:22:51 PDT