Re: [logs] Faster unix 'sort' replacement?

In reply to: cadams@private: "Re: [logs] Faster unix 'sort' replacement?"

schmolli@private

On Mon, Sep 20, 2004 at 12:02:39PM -0700, cadams@private wrote:
> On Thu, Sep 16, 2004 at 11:14:32AM -0500, Ed Schmollinger wrote:
> > On Thu, Sep 16, 2004 at 12:33:12AM +0200, Mike Blomgren wrote:
> > > I'm using sort in the standard 'cat <file> | awk '{"compute..."}' | sort |
> > > uniq -c | sort -n -r' type analysis.
> > 
> > You can get rid of the multiple sorts/uniq thing by doing it all at
> > once:
> 
> Or by using GNU sort's -u option, which after getting rid of the
> unnecessary use of cat leaves:
> 
> awk ... | sort -u -n -r 

This doesn't give the right output for me.  It prints only a single
line (the most frequent one?) and leaves out the number of times that
line appeared.

I was under the impression that we want each unique input line listed
once, along with how many times it appeared, sorted by that count.  For
example, a log that looks like:

--- CUT HERE ---
cat
dog
cat
horse
horse
cat
horse
cat
--- CUT HERE ---

Would turn into output that looks like:
--- CUT HERE ---
1 dog
3 horse
4 cat
--- CUT HERE ---
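
For reference, here is a minimal sketch of a pipeline that should
produce that kind of output.  The function name freq_sort is something
I made up for illustration.  Note that uniq -c pads the counts with
leading blanks, so the real output is indented compared to the listing
above.

```shell
# freq_sort is a hypothetical helper name, not anything from the thread.
# It reads lines on stdin and prints "count line", least frequent first.
freq_sort() {
    # sort groups identical lines, uniq -c counts each group,
    # and the final sort -n orders the groups by that count.
    sort | uniq -c | sort -n
}

printf '%s\n' cat dog cat horse horse cat horse cat | freq_sort
```

Adding -r to the last sort puts the most frequent line first, as in
Mike's original command.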

A simple condensing sort (sort -u) would give us:
--- CUT HERE ---
cat
dog
horse
--- CUT HERE ---

And 'sort -u -n -r' (this is GNU sort) just prints:
--- CUT HERE ---
cat
--- CUT HERE ---
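
My best guess at why, as a sketch rather than anything I have checked
in the sort source: with -n, a line that doesn't start with a number is
compared as if it were 0, so all of the animal names compare equal, and
-u keeps only one line out of each set of equal lines.  The snippet
below counts the output lines for the sample log, then runs the same
flags on lines that do start with a number (made-up sample data).

```shell
# All-text input: every line has numeric value 0, so -u collapses them to one.
printf '%s\n' cat dog cat horse horse cat horse cat | sort -u -n -r | wc -l

# Lines with a leading count: -u -n -r orders by the number, but it also
# drops any line whose number matches one already seen ("3 horse" and
# "3 bird" compare equal), and it never adds a count of its own.
printf '%s\n' '3 horse' '4 cat' '1 dog' '3 bird' | sort -u -n -r
```

So as far as I can tell, -u can only throw duplicates away.  The
counting still has to come from uniq -c or something equivalent.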

Am I missing something?

-- 
Ed Schmollinger - schmolli@private