Re: [logs] Faster unix 'sort' replacement?


schmolli@private

On Thu, Sep 16, 2004 at 12:33:12AM +0200, Mike Blomgren wrote:
> I'm having trouble with 'sort' taking a lot of CPU time on a Solaris machine,
> and I'm wondering if anyone knows of a replacement for the GNU 'sort'
> command, which is faster and will compile on Solaris and preferably Linux
> too?
> 
> I'm using sort in the standard 'cat <file> | awk '{"compute..."}' | sort |
> uniq -c | sort -n -r' type analysis.

You can replace the sort | uniq -c | sort -n -r chain with a single
pass that counts each line in a hash and sorts only the distinct lines:

--- CUT HERE ---
#!/usr/bin/perl -wT

use strict;

my %msg = ();

# Count how many times each distinct input line occurs.
while (<>) { chomp; $msg{$_}++; }

# Print "count<TAB>line", highest count first (like 'sort -n -r').
for (sort { $msg{$b} <=> $msg{$a} } keys %msg) { print "$msg{$_}\t$_\n"; }
--- CUT HERE ---

I've found that for my datasets, the awk/sed stage is where most of
the time goes.  You may want to look at optimizing that part as well.
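
One way to trim that stage, as a rough sketch only: let awk do the
counting itself, so the extraction and the tally happen in a single
process and 'sort' only sees one line per distinct key. The function
name count_keys and the choice of field 1 as the key are assumptions
for illustration; substitute whatever your awk stage actually computes.

```shell
# Sketch: extract a key and count it in one awk pass, then sort only
# the (usually much smaller) list of distinct keys.
# ASSUMPTION: the key is the first whitespace-separated field ($1).
count_keys() {
    awk '{ count[$1]++ }
         END { for (k in count) printf "%d\t%s\n", count[k], k }' "$@" |
        sort -n -r
}

# Usage (access.log is a hypothetical file name):
#   count_keys access.log
```

This also drops the leading 'cat', since awk can read the file
directly.  Whether it helps depends on how many distinct keys you
have: the counts are held in memory, so it pays off when the number of
distinct keys is small relative to the number of input lines.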

-- 
Ed Schmollinger - schmolli@private