Re: [logs] perl question relating to log analysis

From: Shane Kerr (shane@time-travellers.org)
Date: Mon Aug 26 2002 - 02:40:57 PDT


    [ Warning: more Perl and/or general optimisation than log analysis ]
    
    Russell,
    
    On 2002-08-26 17:18:42 +1200, Russell Fulton wrote:
    > I have recently reimplemented much of the functionality of Psionic's
    > Logcheck in a perl script.  I have also added functionality to make it
    > more useful in a central log server environment (you can specify
    > specific checks for different hosts and have reports for different
    > hosts mailed to different admins).
    > 
    > We are now testing it in a production environment; when we are happy
    > with it and I have written some documentation (what's that ?? ;-) I
    > will post the script to the list for others to have a play with.
    > 
    > My immediate concern is that the perl script builds functions that
    > apply lots of regular expressions (REs) to each line of log files.
    > 
    > sub check {
    >     $_ = shift;
    >     study $_;   #hopefully speed up matching...
    > 
    >     return 0 if /re1/;
    >     return 0 if /re2/;
    >     return 1 if /re3/;
    >     return 1 if /re4/;
    >     return 1 if /re5/;
    >     return 2 if /re6/;
    >     return 2 if /re7/i;
    >     return 3 if /re8/;
    >     ...
    >     return 4;
    > }
    > 
    > The return code tells the program what to do with this record.
    > 
    > Anyone know of any tricks to speed this up?  Since this is the
    > innermost loop of the process, any gains here should be worthwhile.
    > I know the RE optimizer is pretty smart and that it will do some
    > optimization over statements, but I have never figured out what the
    > limitations are.
    
    The exact details of the "study" function are in the perlfunc man page:
    
        The way `study' works is this: a linked list of every character in
        the string to be searched is made, so we know, for example, where
        all the `'k'' characters are.  From each search string, the rarest
        character is selected, based on some static frequency tables
        constructed from some C programs and English text.  Only those
        places that contain this "rarest" character are examined.
    
    
    Anyway, you should probably consider using the Benchmark module (see
    "perldoc Benchmark" for details) to measure this rather than guess.
    You can then play around with various combinations:
    
    Case 1:
    
         return 0 if /re1/;
         return 0 if /re2/;
         return 1 if /re3/;
         return 1 if /re4/;
         return 1 if /re5/;
    
    Case 2:
     
         return 0 if /re1/ || /re2/;
         return 1 if /re3/ || /re4/ || /re5/;
    
    Case 3:
    
         return 0 if /(re1)|(re2)/;
         return 1 if /(re3)|(re4)|(re5)/;
    
    Case 4:
    
         if (/re1/) {
             return 0;
         } elsif (/re2/) {
             return 0;
         } elsif (/re3/) { 
             return 1;
         } elsif (/re4/) {
             return 1;
         } elsif (/re5/) {
             return 1;
         }
    
    Case 5:
    
         if (/re1/ || /re2/) {
             return 0;
         } elsif (/re3/ || /re4/ || /re5/) {
             return 1;
         }
    
    Case 6:
    
         if (/(re1)|(re2)/) {
             return 0;
         } elsif (/(re3)|(re4)|(re5)/) {
             return 1;
         }
    
    Make sure you benchmark each case against lines that match each of
    your patterns (e.g. re1, re4, etc.), as well as lines that match none
    of them.  I know for a fact that Case 4 is slower than Case 1, but
    Case 5 or Case 6 may be faster, so I threw them in.
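
    As a rough illustration, Cases 1 to 3 could be compared with
    Benchmark's cmpthese() along these lines.  The sample lines and the
    patterns are made up for the example (they stand in for re1..re5 and
    for real log data), so treat this as a sketch rather than a finished
    test harness:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample lines; substitute a slice of real log data.
my @lines = (
    "Aug 26 02:40:57 host sshd[123]: Accepted password for root",
    "Aug 26 02:40:58 host kernel: eth0: link up",
    "Aug 26 02:40:59 host su: FAILED su for root by nobody",
    "Aug 26 02:41:00 host named[77]: lame server resolving example.com",
    "Aug 26 02:41:01 host cron[9]: something that matches no pattern",
);

# Case 1: one statement per pattern (hypothetical patterns).
sub case1 {
    local $_ = shift;
    return 0 if /link up/;
    return 0 if /lame server/;
    return 1 if /Accepted password/;
    return 1 if /FAILED su/;
    return 4;
}

# Case 2: patterns sharing a return code joined with ||.
sub case2 {
    local $_ = shift;
    return 0 if /link up/ || /lame server/;
    return 1 if /Accepted password/ || /FAILED su/;
    return 4;
}

# Case 3: patterns sharing a return code merged into one alternation.
sub case3 {
    local $_ = shift;
    return 0 if /(link up)|(lame server)/;
    return 1 if /(Accepted password)|(FAILED su)/;
    return 4;
}

# Sanity check: every variant must classify every line identically,
# otherwise the timings compare different behaviour.
for my $line (@lines) {
    die "mismatch on: $line"
        unless case1($line) == case2($line)
            && case2($line) == case3($line);
}

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    case1 => sub { for my $l (@lines) { case1($l) } },
    case2 => sub { for my $l (@lines) { case2($l) } },
    case3 => sub { for my $l (@lines) { case3($l) } },
});
```

    The numbers depend heavily on the Perl version, the patterns and the
    mix of lines, so the ranking from a toy data set like this one should
    not be assumed to carry over to real logs.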
    
    Try analysing your data and putting your most common cases first, so
    they will match sooner and return before the rest are executed.
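
    One way to find out which cases are the most common is to count
    matches per pattern over a sample of real log lines.  The sketch
    below assumes a hypothetical rule table (name, pattern, return code)
    and made-up sample lines; neither comes from the original script:

```perl
use strict;
use warnings;

# Hypothetical rule table: [ name, compiled pattern, return code ].
my @rules = (
    [ 'lame_server', qr/lame server/,       0 ],
    [ 'link_up',     qr/link up/,           0 ],
    [ 'accepted',    qr/Accepted password/, 1 ],
);

# Hypothetical sample; in practice read a representative log file.
my @sample = (
    "host kernel: eth0: link up",
    "host kernel: eth1: link up",
    "host kernel: eth2: link up",
    "host sshd[123]: Accepted password for root",
    "host sshd[124]: Accepted password for admin",
    "host named[77]: lame server resolving example.com",
    "host cron[9]: unrelated line",
);

# Count how many sample lines each pattern matches.
my %hits = map { $_->[0] => 0 } @rules;
for my $line (@sample) {
    for my $rule (@rules) {
        $hits{ $rule->[0] }++ if $line =~ $rule->[1];
    }
}

# List the rules with the most frequently matching pattern first.
my @by_freq = sort { $hits{ $b->[0] } <=> $hits{ $a->[0] } } @rules;
printf "%-12s %d\n", $_->[0], $hits{ $_->[0] } for @by_freq;
```

    Because the first matching statement wins, reordering is only safe
    among patterns that share a return code or that can never match the
    same line; otherwise moving a pattern up changes the result as well
    as the speed.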
    
    If any of your expressions are exact matches, /^string$/, then use eq
    (note that /^string$/ also matches "string\n", so chomp the line first):
    
       return 5 if ($_ eq "string");
    
    If any of your expressions are simple constant substrings, / something/,
    then you may wish to try index():
    
       return 6 if (index($_, " something") != -1);
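
    To check that such a replacement does not change behaviour, the two
    forms can be run side by side on a few inputs.  The function names
    check_re and check_fast and the test strings below are hypothetical,
    chosen only to illustrate the idea:

```perl
use strict;
use warnings;

# Regular-expression version of the two tests.
sub check_re {
    my ($line) = @_;
    return 5 if $line =~ /^string$/;
    return 6 if $line =~ / something/;
    return 4;
}

# Same tests using eq and index() instead of the regex engine.
sub check_fast {
    my ($line) = @_;
    chomp $line;    # /^string$/ also matches "string\n"; eq does not
    return 5 if $line eq "string";
    return 6 if index($line, " something") != -1;
    return 4;
}

# Compare both versions on a handful of inputs, including a line that
# still carries its trailing newline.
for my $line ("string", "string\n", "got something here", "nothing") {
    my ($re, $fast) = (check_re($line), check_fast($line));
    die "results differ for input" unless $re == $fast;
    (my $shown = $line) =~ s/\n/\\n/g;
    print "$shown => $fast\n";
}
```

    Whether eq and index() are actually faster than the equivalent
    patterns is worth measuring with Benchmark as well, since Perl
    already optimises simple constant patterns internally.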
    
    Good luck!
    
    -- 
    Shane
    Carpe Diem
    _______________________________________________
    LogAnalysis mailing list
    LogAnalysisat_private
    http://lists.shmoo.com/mailman/listinfo/loganalysis
    


