Re: [logs] Faster unix 'sort' replacement?

From: Ed Schmollinger (schmolli@private)
Date: Thu Sep 16 2004 - 09:14:32 PDT


On Thu, Sep 16, 2004 at 12:33:12AM +0200, Mike Blomgren wrote:
> I'm having trouble with 'sort' taking a lot of cpu-time on a Solaris machine,
> and I'm wondering if anyone knows of a replacement for the gnu 'sort'
> command, which is faster and will compile on Solaris and preferably Linux
> too?
> 
> I'm using sort in the standard 'cat <file> | awk '{"compute..."}' | sort |
> uniq -c | sort -n -r' type analysis.

You can get rid of the multiple sorts/uniq thing by doing it all at
once:

--- CUT HERE ---
#!/usr/bin/perl -wT

use strict;

my %msg = ();

# count occurrences of each line
while (<>) { chomp; $msg{$_}++; }

# print counts in descending order, like 'uniq -c | sort -n -r'
for (sort { $msg{$b} <=> $msg{$a} } keys %msg) { print "$msg{$_}\t$_\n"; }
--- CUT HERE ---

I've found that for my datasets, the awk/sed stage is what constitutes
the bulk of the bottleneck.  You may want to look at optimizing that
part as well.
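If you'd rather stay in shell than switch to Perl, the same idea works
inside awk itself: since awk is already running over every line, it can
do the counting in an associative array, which drops the first sort and
the uniq entirely and leaves only one small sort over the unique lines.
A minimal sketch (the printf input is just dummy data standing in for
whatever your real awk computation emits):

```shell
# count identical lines in one awk pass, then sort the counts once
printf 'a\nb\na\na\nb\nc\n' |
awk '{ count[$0]++ } END { for (k in count) print count[k] "\t" k }' |
sort -nr
```

This prints "3 a", "2 b", "1 c" on separate lines.  The final sort now
only sees one line per unique message instead of the whole input, which
is usually a much smaller dataset.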

-- 
Ed Schmollinger - schmolli@private



_______________________________________________
LogAnalysis mailing list
LogAnalysis@private
http://lists.shmoo.com/mailman/listinfo/loganalysis



This archive was generated by hypermail 2.1.3 : Thu Sep 16 2004 - 09:17:35 PDT