On Thu, Sep 16, 2004 at 12:33:12AM +0200, Mike Blomgren wrote:
> I'm having trouble with 'sort' taking a lot of CPU time on a Solaris
> machine, and I'm wondering if anyone knows of a replacement for the GNU
> 'sort' command, which is faster and will compile on Solaris and
> preferably Linux too?
>
> I'm using sort in the standard 'cat <file> | awk '{"compute..."}' | sort |
> uniq -c | sort -n -r' type analysis.

You can get rid of the multiple sort/uniq passes by doing it all at once:

--- CUT HERE ---
#!/usr/bin/perl -wT
use strict;

my %msg = ();

while (<>) {
    chomp;
    $msg{$_}++;    # count occurrences of each distinct line
}

# Sort by count, descending, to match the 'sort -n -r' at the end
# of the original pipeline.
for (sort { $msg{$b} <=> $msg{$a} } keys %msg) {
    print "$msg{$_}\t$_\n";
}
--- CUT HERE ---

I've found that for my datasets, the awk/sed stage is what constitutes the
bulk of the bottleneck. You may want to look at optimizing that part as well.

-- 
Ed Schmollinger - schmolli@private
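For reference, the count-and-rank pipeline being discussed can be sketched like this (the sample data here is hypothetical; the trailing awk only normalizes uniq's column padding):

```shell
# Classic count-and-rank pipeline: 'sort | uniq -c' counts duplicate
# lines, then 'sort -n -r' ranks them by count, highest first.
printf 'a\nb\na\nc\na\nb\n' \
  | sort | uniq -c | sort -n -r \
  | awk '{print $1, $2}'
```

The two external sorts are exactly what the hash-based script avoids: it keeps one counter per distinct line, so only the (usually much smaller) set of distinct lines ever gets sorted.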
_______________________________________________
LogAnalysis mailing list
LogAnalysis@private
http://lists.shmoo.com/mailman/listinfo/loganalysis
This archive was generated by hypermail 2.1.3 : Thu Sep 16 2004 - 09:17:35 PDT