Re: [logs] perl question relating to log analysis (fwd)

glrattat_private

More Perl - hit DELETE now if uninterested.

Follow-up question: what efficiency considerations would apply if one
were to replace:

return 2 if /msg 2/;
return 1 if /msg 1/;
return 3 if /msg 3/;

with:

%returnvalue =
(
  "msg 1" => 1,
  "msg 2" => 2,
  "msg 3" => 3,
);
# Assumption: all incidences of "msg 1", "msg 2", "msg 3" can be
# isolated using a parenthesized regexp
###
# various stuff, then...
if (/$subsidiary_regexp_before($regexp_to_glean_msg)$subsidiary_regexp_after/)
{ return $returnvalue{$1}; }

If I remember my CompSci studies correctly, the former is O(n) in the
number of messages, in both the average and the worst case, whereas the
latter is O(1) across the board; however, I don't know enough about Perl
internals to know whether the interpreter can optimize the former better
than the latter.
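A minimal, runnable sketch of both forms, for anyone who wants to measure rather than guess. It assumes the message strings are literal text, so the combined pattern can be built from the hash keys with quotemeta; the sub names classify_chain and classify_hash, the sample log line, and returning undef for unrecognized lines are all hypothetical choices. One relevant internal: since Perl 5.10 the regex engine compiles an alternation of literal strings into a trie, so the single combined match does not simply retry each branch in turn. Timings from the core Benchmark module will vary with the perl build and with real data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Approach 1: a chain of tests, most frequent message first.
sub classify_chain {
    my ($line) = @_;
    return 2 if $line =~ /msg 2/;
    return 1 if $line =~ /msg 1/;
    return 3 if $line =~ /msg 3/;
    return undef;    # hypothetical: no known message on this line
}

# Approach 2: one combined match, then a hash lookup on the captured text.
my %returnvalue = (
    "msg 1" => 1,
    "msg 2" => 2,
    "msg 3" => 3,
);

# Build the alternation once; quotemeta escapes any regex metacharacters.
my $alternation = join '|', map { quotemeta($_) } sort keys %returnvalue;
my $msg_re      = qr/($alternation)/;

sub classify_hash {
    my ($line) = @_;
    return $line =~ $msg_re ? $returnvalue{$1} : undef;
}

# Hypothetical sample line; substitute real log lines when measuring.
my $sample = "Aug 27 16:56:50 host daemon[123]: msg 3 something happened";

# A negative count means "run each sub for at least that many CPU seconds".
cmpthese(-1, {
    chain => sub { classify_chain($sample) },
    hash  => sub { classify_hash($sample) },
});
```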

	-g

				Glenn Forbes Fleming Larratt
				Rice University Network Management
				glrattat_private

---------- Forwarded message ----------
Date: Tue, 27 Aug 2002 16:56:50 +0000 (UTC)
From: Jeff Schaller <schallerat_private>
To: Russell Fulton <r.fultonat_private>
Cc: "loganalysisat_private" <loganalysisat_private>
Subject: Re: [logs] perl question relating to log analysis

On 27 Aug 2002, Russell Fulton wrote:

> Those who are not interested in perl please hit DELETE now.

Yet More Perl Ahead

> > Try analysing your data and putting your most common cases first, so
> > they will match sooner and return before the rest are executed.
>
> Given that the optimizer is working over multiple statements or
> expressions, I don't think the order is actually material.

I think it would matter. Imagine you have three types of log entries:
Message 1 occurs 10% of the time
Message 2 occurs 80% of the time
Message 3 occurs 10% of the time

and that you order your function as follows:

return 1 if /msg 1/;
return 3 if /msg 3/;
return 2 if /msg 2/;

then Perl has to execute two (theoretically) unnecessary pattern
matches 80% of the time. I think the upshot is to order the tests
by best-guess frequency, most common first:

return 2 if /msg 2/;
return 1 if /msg 1/;
return 3 if /msg 3/;
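To put numbers on the argument, here is a small sketch that computes the expected number of pattern matches per line for a given test order. It assumes, hypothetically, that every line carries exactly one of the three messages at the frequencies listed above; lines matching none of them would pay for every test regardless of order. The sub name expected_matches is illustrative only.

```perl
use strict;
use warnings;

# Relative frequencies from the example above.
my %freq = ( 1 => 0.10, 2 => 0.80, 3 => 0.10 );

# Expected number of regex matches per line when the tests run in the
# given order, assuming every line holds exactly one known message.
sub expected_matches {
    my @order = @_;
    my $total = 0;
    for my $i (0 .. $#order) {
        # A line of type $order[$i] is recognized on test number $i + 1.
        $total += $freq{ $order[$i] } * ($i + 1);
    }
    return $total;
}

printf "order 1,3,2: %.1f matches per line\n", expected_matches(1, 3, 2);  # prints "order 1,3,2: 2.7 matches per line"
printf "order 2,1,3: %.1f matches per line\n", expected_matches(2, 1, 3);  # prints "order 2,1,3: 1.3 matches per line"
```

Under those assumptions the frequency-ordered chain averages 1.3 matches per line against 2.7 for the first ordering, roughly half the work.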

This all assumes that you have a good idea of what your data looks
like frequency-wise /before you look at it/. I could see this
community getting that done by collating a bunch of sanitized
logs, coming up with tight REs to match various messages, and then
grinding out the various statistics.
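A rough sketch of that statistics-gathering step: count how often each message pattern fires over a sample of sanitized log lines and derive the test order from the counts. The patterns, the sample lines, and the sub names message_counts and best_order are hypothetical placeholders; it also assumes each line carries at most one message type.

```perl
use strict;
use warnings;

# Hypothetical patterns; substitute the tight REs for your own messages.
my %pattern = (
    1 => qr/msg 1/,
    2 => qr/msg 2/,
    3 => qr/msg 3/,
);

# Count how often each message type fires over a sample of log lines.
sub message_counts {
    my @lines = @_;
    my %count;
    $count{$_} = 0 for keys %pattern;
    LINE: for my $line (@lines) {
        for my $id (keys %pattern) {
            if ($line =~ $pattern{$id}) {
                $count{$id}++;
                next LINE;    # assumes at most one message type per line
            }
        }
    }
    return %count;
}

# Message ids ordered most frequent first (ties broken by id).
sub best_order {
    my %count = message_counts(@_);
    return sort { $count{$b} <=> $count{$a} || $a <=> $b } keys %count;
}

# Hypothetical sanitized sample; in practice read real lines, e.g. my @sample = <>;
my @sample = (
    "host daemon[1]: msg 2 ok",
    "host daemon[1]: msg 2 ok",
    "host daemon[1]: msg 1 warning",
    "host daemon[1]: msg 2 ok",
    "host daemon[1]: msg 3 failure",
    "host daemon[1]: something unrelated",
);

print "test order: ", join(", ", best_order(@sample)), "\n";  # prints "test order: 2, 1, 3"
```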

I would also recommend playing with another "speed variable":
ordering your regular expressions by length, or more precisely by how
much static text they contain. REs with more static text will be faster
to match (or mismatch) than those with variability (".", "[a-z]",
alternation, etc.).
E.g.,

if (/seven/)

can fail more quickly against "eight" than can:

if (/^....$/)

as it can fail on the initial "s" vs. "e" mismatch, rather than only
discovering the character count difference at the end.
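A quick way to check this on a particular perl is the core Benchmark module. This sketch simply times both failing matches against "eight"; the labels and the one-CPU-second run time are arbitrary choices, and no result is shown because the outcome depends on the perl version and its regex optimizer (which, for a pattern containing a fixed string such as "seven", can scan for that string before running the full match).

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $input = "eight";

# Both patterns fail against "eight"; the question is how quickly.
# /seven/ is pure static text; /^....$/ must consume four characters
# before the end-of-string anchor fails on the fifth.
cmpthese(-1, {
    literal  => sub { return $input =~ /seven/ },
    wildcard => sub { return $input =~ /^....$/ },
});
```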

-jeff
-- 
"Space is big.  You just won't believe how vastly, hugely,
 mind-bogglingly big it is.  I mean, you may think it's a
 long way down the road to the drug store, but that's just
 peanuts to space." -- The Hitchhiker's Guide to the Galaxy

_______________________________________________
LogAnalysis mailing list
LogAnalysisat_private
http://lists.shmoo.com/mailman/listinfo/loganalysis
