Re: [logs] perl question relating to log analysis

From: Russell Fulton (r.fultonat_private)
Date: Mon Aug 26 2002 - 22:31:02 PDT


    On Mon, 2002-08-26 at 21:40, Shane Kerr wrote:
    > [ Warning: more Perl and/or general optimisation than log analysis ]
    
    Definitely ;-)  Those who are not interested in Perl, please hit DELETE
    now.
    
    Firstly, thanks to all who responded; your suggestions were most useful.
    Since Shane covered the most options, I'll respond in detail using his
    message as a template.
    
    Some general observations first: study helps a bit, as does putting the
    /o modifier on the end of the REs (this surprised me, since I thought /o
    only affected REs that have variable substitution in them; you learn
    something every day ;-).
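    A minimal sketch of a classifier that uses study and /o in the Case 1
    style quoted below. The patterns and the classify() name are
    hypothetical placeholders, not rules from the original script. Note
    that perlfunc documents study as a no-op from Perl 5.16 onward, so any
    gain from it applies to older perls.

```perl
use strict;
use warnings;

# Hypothetical classifier: 0 = ignore, 1 = report, undef = no rule matched.
# The patterns are placeholders, not real rules.
sub classify {
    my ($line) = @_;
    local $_ = $line;
    study;                                  # hint: $_ is about to be matched repeatedly
    return 0 if /last message repeated/o;   # /o: compile the pattern only once
    return 0 if /CRON\[\d+\]/o;
    return 1 if /authentication failure/o;
    return 1 if /Invalid user/o;
    return;
}

for my $line ('host CRON[1]: run', 'sshd: Invalid user bob', 'kernel: up') {
    my $r = classify($line);
    printf "%-25s => %s\n", $line, defined $r ? $r : 'undef';
}
```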
    
    I did not use the Benchmark module, since I wanted to test this on real
    data. What I did instead was write the script so that it could use
    different functions to generate the matching code, each with a
    different structure, before compiling it.
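    The code-generation approach described above might look roughly like
    this. It is a minimal sketch, assuming the rules are plain pattern
    strings with no unescaped "/" and that each rule list is non-empty;
    build_case1(), build_case2() and compile_classifier() are hypothetical
    names, not the functions from the original script. Each builder emits
    Perl source for one structure, and eval turns it into a code ref that
    can be timed on the same input.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical rule lists (placeholders). Each list must be non-empty and
# no pattern may contain an unescaped "/", since it is pasted into /.../o.
my @ignore = ('last message repeated', 'CRON\[\d+\]');
my @report = ('authentication failure', 'Invalid user');

# Case 1 structure: one "return ... if /re/o;" statement per pattern.
sub build_case1 {
    my ($ign, $rep) = @_;
    my $src = "sub {\n    local \$_ = shift;\n";
    $src .= "    return 0 if /$_/o;\n" for @$ign;
    $src .= "    return 1 if /$_/o;\n" for @$rep;
    return $src . "    return;\n}\n";
}

# Case 2 structure: one statement per outcome, patterns joined with ||.
sub build_case2 {
    my ($ign, $rep) = @_;
    my $src = "sub {\n    local \$_ = shift;\n";
    $src .= '    return 0 if ' . join(' || ', map { "/$_/o" } @$ign) . ";\n";
    $src .= '    return 1 if ' . join(' || ', map { "/$_/o" } @$rep) . ";\n";
    return $src . "    return;\n}\n";
}

# Compile generated source into a code ref, dying on syntax errors.
sub compile_classifier {
    my ($src) = @_;
    my $code = eval $src;
    die "generated code failed to compile: $@" unless ref $code eq 'CODE';
    return $code;
}

# Time each variant over the same lines (a small in-memory sample here,
# standing in for a day's real logs).
my @lines = ('host CRON[42]: run', 'sshd: Invalid user x', 'kernel: up') x 1000;
for my $name ('case1', 'case2') {
    my $builder  = $name eq 'case1' ? \&build_case1 : \&build_case2;
    my $classify = compile_classifier($builder->(\@ignore, \@report));
    my $t0   = time();
    my $hits = grep { my $r = $classify->($_); defined $r && $r == 1 } @lines;
    printf "%s: %d hits in %.3fs\n", $name, $hits, time() - $t0;
}
```

    Generating the source as a string keeps the rule list in one place
    while letting each structure be compiled and timed on identical data.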
    
    Preliminary results from processing a day's logs from about 20 UNIX
    boxes:
    
    All tests used study and /o:
    
    > 
    > Case 1:
    > 
    >      return 0 if /re1/;
    >      return 0 if /re2/;
    >      return 1 if /re3/;
    >      return 1 if /re4/;
    >      return 1 if /re5/;
    
    Fastest.
    
    > 
    > Case 2:
    >  
    >      return 0 if /re1/ || /re2/;
    >      return 1 if /re3/ || /re4/ || /re5/;
    > 
    
    Equal fastest
    
    > Case 3:
    > 
    >      return 0 if /(re1)|(re2)/;
    >      return 1 if /(re3)|(re4)|(re5)/;
    >
    
    About a factor of 5 slower. (I did not use the (), since none of my
    patterns have alternation in them.)
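    For comparison with Case 3, a sketch that builds one alternation per
    outcome from a list of pattern strings, wrapping each pattern in a
    non-capturing (?:...) group so that no capture variables are set. The
    patterns and classify_alt() are hypothetical; whether this narrows the
    gap measured above is something to test on real logs rather than
    assume.

```perl
use strict;
use warnings;

# Hypothetical pattern lists (placeholders).
my @ignore = ('last message repeated', 'CRON\[\d+\]');
my @report = ('authentication failure', 'Invalid user');

# Case 3 structure: one regex per outcome. Each pattern is grouped with
# (?:...) so the alternation does not capture into $1, $2, ...
my $ignore_re = do { my $alt = join '|', map { "(?:$_)" } @ignore; qr/$alt/ };
my $report_re = do { my $alt = join '|', map { "(?:$_)" } @report; qr/$alt/ };

sub classify_alt {
    my ($line) = @_;
    return 0 if $line =~ $ignore_re;
    return 1 if $line =~ $report_re;
    return;
}

print classify_alt('pam: authentication failure; user=x'), "\n";
```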
    
    I did not test the variations of if(){}elsif...  I may do it later if I
    get time.
    
    I think that this shows that Perl's standard RE optimization is pretty
    good and that it is best not to try to outguess it.
    
    > Try analysing your data and putting your most common cases first, so
    > they will match sooner and return before the rest are executed.
    
    Given that the optimizer is working over multiple statements or
    expressions, I don't think the order is actually material.
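    One way to check the ordering question on real data is to count, for
    each rule, how many lines it is the first to match. A minimal sketch
    with hypothetical patterns and function names. Note that reordering is
    only safe within a group of rules that return the same value: a line
    matching both an "ignore" and a "report" pattern gets whichever result
    is tested first.

```perl
use strict;
use warnings;

# Hypothetical rule list, in its current order (placeholders).
my @patterns = ('last message repeated', 'CRON\[\d+\]', 'Invalid user');
my @compiled = map { qr/$_/ } @patterns;

# For each pattern, count how many lines it is the first to match.
sub first_match_counts {
    my (@lines) = @_;
    my %count;
    LINE: for my $line (@lines) {
        for my $i (0 .. $#compiled) {
            if ($line =~ $compiled[$i]) {
                $count{ $patterns[$i] }++;
                next LINE;
            }
        }
    }
    return %count;
}

# Patterns sorted most-frequent first: a candidate order to benchmark.
sub suggested_order {
    my %count = first_match_counts(@_);
    return sort { ($count{$b} // 0) <=> ($count{$a} // 0) } @patterns;
}

print join("\n", suggested_order('x CRON[1]: a', 'x CRON[2]: b')), "\n";
```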
    
    > 
    > If any of your expressions are exact matches, /^string$/, then use eq:
    > 
    >    return 5 if ($_ eq "string");
    > 
    > If any of your expressions are simple constant substrings, / something/,
    > then you may wish to try index():
    > 
    >    return 6 if (index($_, " something") != -1);
    
    Good point, I'll try adding this stuff in later.
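    A sketch of Shane's suggestion, with hypothetical rules and a
    hypothetical classify_fast() name. One caveat when converting:
    /^string$/ also matches "string\n", because $ can match before a
    trailing newline, so lines should be chomped before switching to eq.

```perl
use strict;
use warnings;

# Hypothetical rules: exact matches use eq and constant substrings use
# index(); a regex is kept only where the pattern really needs one.
sub classify_fast {
    my ($line) = @_;
    chomp $line;                                              # eq will not ignore a trailing "\n"
    return 0 if $line eq '-- MARK --';                        # instead of /^-- MARK --$/
    return 0 if index($line, ' last message repeated ') != -1; # instead of / last message repeated /
    return 1 if $line =~ /Invalid user \S+/;                  # genuinely needs a regex
    return;
}

print classify_fast("-- MARK --\n"), "\n";
```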
    
    > 
    > Good luck!
    
    Thanks!
    
    You may all now return to your real program: World Domination! 
    
    -- 
    Russell Fulton, Computer and Network Security Officer
    The University of Auckland,  New Zealand
    
    "It aint necessarily so"  - Gershwin
    
    _______________________________________________
    LogAnalysis mailing list
    LogAnalysisat_private
    http://lists.shmoo.com/mailman/listinfo/loganalysis
    



    This archive was generated by hypermail 2b30 : Tue Aug 27 2002 - 09:43:46 PDT