More Perl - hit DELETE now if uninterested.

Followup question: what efficiency considerations would apply if one were
to replace:

    return 2 if /msg 2/;
    return 1 if /msg 1/;
    return 3 if /msg 3/;

with:

    %returnvalue = (
        "msg 1" => 1,
        "msg 2" => 2,
        "msg 3" => 3,
    );

    # Assumption: all incidences of "msg 1", "msg 2", "msg 3" can be
    # isolated using a parenthesized regexp

    ###
    # various stuff, then...

    if (/$subsidiary_regexp_before($regexp_to_glean_mgs)$subsidiary_regexp_after/) {
        return $returnvalue{$1};
    }

?

If I remember my CompSci studies, the former is O(n) in the average case
and the worst case, whereas the latter is O(1) across the board; however,
I don't know enough Perl internals to know whether the interpreter can
optimize the former better than the latter.

-g

Glenn Forbes Fleming Larratt
Rice University Network Management
glrattat_private

---------- Forwarded message ----------
Date: Tue, 27 Aug 2002 16:56:50 +0000 (UTC)
From: Jeff Schaller <schallerat_private>
To: Russell Fulton <r.fultonat_private>
Cc: "loganalysisat_private" <loganalysisat_private>
Subject: Re: [logs] perl question relating to log analysis

On 27 Aug 2002, Russell Fulton wrote:

> Those who are not interested in perl please hit DELETE now.

Yet More Perl Ahead

> > Try analysing your data and putting your most common cases first, so
> > they will match sooner and return before the rest are executed.
>
> Given that the optimizer is working over multiple statements or
> expressions I don't think the order is actually material.

I think it would. Imagine you have 3 types of log entries.

    Message 1 occurs 10% of the time
    Message 2 occurs 80% of the time
    Message 3 occurs 10% of the time

and that you order your function as follows:

    return 1 if /msg 1/;
    return 3 if /msg 3/;
    return 2 if /msg 2/;

then perl has to execute two extraneous (theoretically) pattern matches
80% of the time.
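The ordering argument can be checked by counting pattern attempts rather than by timing. The following is only a rough sketch under stated assumptions: the sub names, the attempt counter, and the ten-line synthetic log (one "msg 1", eight "msg 2", one "msg 3", matching the 10/80/10 split) are all made up for illustration, and it counts how many regexes are tried, not how long they take.

```perl
use strict;
use warnings;

# Hypothetical counter: bumped once for every regex that gets tried.
my $attempts = 0;

# Rare messages tested first (the ordering criticized above).
sub classify_rare_first {
    my ($line) = @_;
    $attempts++; return 1 if $line =~ /msg 1/;
    $attempts++; return 3 if $line =~ /msg 3/;
    $attempts++; return 2 if $line =~ /msg 2/;
    return 0;    # no pattern matched
}

# Most common message tested first.
sub classify_common_first {
    my ($line) = @_;
    $attempts++; return 2 if $line =~ /msg 2/;
    $attempts++; return 1 if $line =~ /msg 1/;
    $attempts++; return 3 if $line =~ /msg 3/;
    return 0;    # no pattern matched
}

# Synthetic log, assumed for illustration: 10% / 80% / 10%.
my @log = ("msg 1", ("msg 2") x 8, "msg 3");

# Run one classifier over the whole log and report total regex attempts.
sub count_attempts {
    my ($classifier) = @_;
    $attempts = 0;
    $classifier->($_) for @log;
    return $attempts;
}

my $rare_first   = count_attempts(\&classify_rare_first);
my $common_first = count_attempts(\&classify_common_first);

print "rare first:   $rare_first attempts\n";
print "common first: $common_first attempts\n";
```

On this synthetic log the rare-first ordering tries 27 patterns and the common-first ordering tries 13, i.e. 2.7 versus 1.3 attempts per line. Whether that difference shows up as wall-clock time on real logs depends on the patterns and the perl in use, so it is worth measuring on your own data.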
I think the upshot is to order the tests in a best-guess order of
frequency:

    return 2 if /msg 2/;
    return 1 if /msg 1/;
    return 3 if /msg 3/;

This all assumes that you have a good idea of what your data looks like
frequency-wise /before you look at it/. I could see this community
getting that done by collating a bunch of sanitized logs, coming up with
tight REs to match various messages, and then grinding out the various
statistics.

I would also recommend playing with another "speed variable" -- ordering
your regular expressions according to length. REs with more static text
will be faster to match (or mismatch) than those with variability
(., [a-z], alternation, etc.). E.g.

    if (/seven/)

can fail more quickly against "eight" than can:

    if (/^....$/)

as it can fail on the initial "s" vs "e" as opposed to the character
count difference at the end.

-jeff

--
"Space is big. You just won't believe how vastly, hugely,
mind-bogglingly big it is. I mean, you may think it's a long way down
the road to the drug store, but that's just peanuts to space."
  -- The Hitchhiker's Guide to the Galaxy

_______________________________________________
LogAnalysis mailing list
LogAnalysisat_private
http://lists.shmoo.com/mailman/listinfo/loganalysis
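Returning to the hash-lookup question at the top of the thread, here is a minimal sketch of that approach, assuming the three messages are plain literal strings. The names $msg_re and classify_by_hash are hypothetical, and the surrounding "subsidiary" regexps from the question are left out. Note that the hash lookup itself is cheap, but the regex engine still has to scan the line for the alternation, so the overall cost is not simply O(1). My understanding is that perls from 5.10 onward compile literal alternations into a trie, which should help, but treat that as something to benchmark rather than rely on.

```perl
use strict;
use warnings;

# Table from the question: message text => return value.
my %returnvalue = (
    "msg 1" => 1,
    "msg 2" => 2,
    "msg 3" => 3,
);

# Build one capturing alternation from the hash keys.
# quotemeta escapes anything that would be special inside a regex.
my $alternation = join '|', map { quotemeta($_) } sort keys %returnvalue;
my $msg_re      = qr/($alternation)/;

# One regex match, then one hash lookup on the captured text.
sub classify_by_hash {
    my ($line) = @_;
    return $returnvalue{$1} if $line =~ $msg_re;
    return 0;    # no known message found
}

print classify_by_hash("Aug 27 16:56:50 host daemon: msg 2 seen"), "\n";
```

One design consequence: with a single alternation the ordering-by-frequency question mostly moves inside the regex engine, so the table can be kept in whatever order is easiest to maintain.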
This archive was generated by hypermail 2b30 : Wed Aug 28 2002 - 11:12:57 PDT