I asked a perl group expert ...

"To understand recursion, we must first understand recursion."

----- Forwarded by Andy Bach/WIWB/07/USCOURTS on 08/29/02 02:28 PM -----

Quoting Adam Rice wysiwygat_private

I've done a lot of log-munging in Perl, and I must report that for any significant amount of logs, regexps just aren't fast enough. In some cases I've found a solution using index() and rindex() that was adequate. But once you get to that level of optimisation, Perl becomes as ugly as C, and the C solution is generally more flexible (because it doesn't have to be hand-optimised to death to achieve acceptable speed).

If you have to use regexps, it's worth tinkering with them. Often with careful use of character classes, you can save Perl from having to do backtracking. Try to avoid anchoring from the end of the string... it looks like it should be fast, but in my experience it isn't. Anchor to the start of the string where it makes sense, but not if it makes the regexp more complicated. Complex regular expressions are really slow, so try breaking them down into several smaller ones.

On the other hand, for doing ad-hoc queries against server logs, Perl is usually the language of choice.

Cute tip: since the grep variants are way faster than Perl, use them to narrow the field before Perl does the grunt work. Say you want a list of JPEG files larger than 200k, together with how often they were served:

    zgrep -F ".jpg" logfile.gz |
    egrep ' [0-9][0-9][0-9][0-9][0-9][0-9] ' |
    perl -ne 'print "$1\t$2\n" if / "GET (\/[^\s\"]+)[^"]*" \d+ (\d+) / && $2>200*1024' |
    sort | uniq -c

Always test on a subset of your logs first! Where I work, a command like this will take an hour on a full month's logs, and you'll be very annoyed if you wait that long to discover you made a typo.

Tip 3: "top" is good for getting an immediate idea of how efficient your command is. Ideally you want the "gzip" process using 80% or more of the CPU. If it's only pulling 20%, it'll take four times as long.
With a multi-stage pipe like this, you can easily see which stage is the bottleneck.

Adam

--
Adam Rice -- wysiwygat_private -- Blackburn, Lancashire, England
_______________________________________________
LogAnalysis mailing list
LogAnalysisat_private
http://lists.shmoo.com/mailman/listinfo/loganalysis
This archive was generated by hypermail 2b30 : Thu Aug 29 2002 - 13:39:29 PDT