On Thu, Sep 16, 2004 at 12:33:12AM +0200, Mike Blomgren wrote:
> I'm having trouble with 'sort' taking a lot of cpu-time on a Solaris machine,
> and I'm wondering if anyone knows of a replacement for the gnu 'sort'
> command, which is faster and will compile on Solaris and preferably Linux
> too?
>
> I'm using sort in the standard 'cat <file> | awk '{"compute..."}' | sort |
> uniq -c | sort -n -r' type analysis.
You can get rid of the sort | uniq -c | sort -n -r stages by doing the
counting and sorting all at once in Perl:
--- CUT HERE ---
#!/usr/bin/perl -wT
use strict;
my %msg = ();
# Count each distinct input line (replaces "sort | uniq -c").
while (<>) { chomp; $msg{$_}++; }
# Print the highest counts first (replaces "sort -n -r").
for (sort { $msg{$b} <=> $msg{$a} } keys %msg) { print "$msg{$_}\t$_\n"; }
--- CUT HERE ---
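Perl returns hash keys in no particular order, so in the script above lines that share a count come out in an arbitrary order. A minimal sketch of a variant, assuming you want repeatable output, wraps the counting in a subroutine (`count_lines` is a hypothetical name), breaks ties by comparing the lines themselves, and prints a right-aligned count column similar to `uniq -c`:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read lines from $fh, count each distinct line,
# and return [count, line] pairs sorted by descending count, with ties
# broken by the line text so the output order is repeatable.
sub count_lines {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        chomp $line;
        $count{$line}++;
    }
    return map  { [ $count{$_}, $_ ] }
           sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
}

# Run as a filter when executed directly; do nothing when the file
# is only loaded with require/do.
unless (caller) {
    printf "%7d %s\n", @$_ for count_lines(\*STDIN);
}
```

It still reads the input only once and keeps one hash entry per distinct line, so memory use grows with the number of distinct lines, not the total number of lines.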
On my datasets, the awk/sed stage accounts for most of the run time, so
you may want to look at optimizing that part as well.
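One way to attack that stage is to do the field extraction inside the same Perl process, so the data is read and parsed only once. This is a sketch under a loud assumption: it pretends the awk step does nothing more than print one whitespace-separated field (the real "compute..." step is not shown, and `$FIELD` and `count_field` are hypothetical names). Perl's `split ' '` skips leading whitespace and splits on runs of whitespace, much like awk's default field splitting:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical: which whitespace-separated field the awk step printed
# (0-based here, so 0 corresponds to awk's $1).
my $FIELD = 0;

# Count the chosen field of every input line in a single pass,
# standing in for both the awk step and "sort | uniq -c".
sub count_field {
    my ($fh, $field) = @_;
    my %count;
    while (my $line = <$fh>) {
        my @f = split ' ', $line;      # awk-style whitespace splitting
        next unless defined $f[$field]; # skip lines without that field
        $count{ $f[$field] }++;
    }
    # Highest counts first, ties broken by the field value.
    return map  { [ $count{$_}, $_ ] }
           sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
}

unless (caller) {
    printf "%7d %s\n", @$_ for count_field(\*STDIN, $FIELD);
}
```

Unlike awk, which would print an empty line for a record with too few fields, this sketch skips such lines; adjust that if the empty value matters for your counts.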
--
Ed Schmollinger - schmolli@private
_______________________________________________
LogAnalysis mailing list
LogAnalysis@private
http://lists.shmoo.com/mailman/listinfo/loganalysis
This archive was generated by hypermail 2.1.3 : Thu Sep 16 2004 - 09:17:35 PDT