Re: [logs] Log Samples Requested

From: todd glassey (todd.glassey@private)
Date: Fri Mar 12 2004 - 13:23:49 PST


    Why not cut a deal with someone operating a "public" (non-private) system
    who would be willing to donate their logs? There must be someone inside DHS
    who can shake this loose...
    Todd Glassey
    ----- Original Message ----- 
    From: "Jon Stearley" <jrstear@private>
    To: "Marcus J. Ranum" <mjr@private>; "Rainer Gerhards"
    Cc: <loganalysis@private>
    Sent: Friday, March 12, 2004 12:27 PM
    Subject: Re: [logs] Log Samples Requested
    > > Rainer Gerhards wrote:
    > > >Having said this, on to my request: I would appreciate if the list
    > > >members (you!) could send me a few lines of their actual syslog data.
    > >
    > > Rainer - we've been trying to establish a log codex on
    > > for some time. Getting log data is like pulling teeth. :) Please, people
    > > if you have logs you are willing to share, send them to
    > > as well.
    > i've repeatedly pondered [1] an anonymizer strong enough to convince
    > people to donate their logs, but weak enough to enable meaningful
    > analysis on the converted data.  it can't just hash unique words, 'cause
    > word similarity must be preserved.  character-by-character hashing isn't
    > strong enough (right?)...  cryptography is too strong (i would think
    > so...)
    > my current top idea is to hash characters to a unique word: ie "foo fum"
    > becomes "foophlegmphlegm fooblehphar" when f->foo, o->phlegm, u->bleh,
    > m->phar.  the words would of course be generated in per-run pseudorandom
    > fashion.  with the hash, it can be reverse converted (and thus, so could
    > the analysis results for review), but without it:
    >  - is it sufficiently obfuscated that people would feel free to share
    >    their logs?
    >  - would this preserve enough of the original data characteristics to
    >    make analysis meaningful?
    > for my analysis approach, the answer to the latter is "yes" (except that
    > i require that whitespace be preserved, ie s/(\S+)/$1/g in perlspeak).
    > for my logs, my answer to the former is "uhm, i think so..."  ;)
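
[Editor's sketch] The character-to-word substitution proposed above could look roughly like the Python below. This is only an illustration under stated assumptions, not the poster's implementation: the function names (make_codebook, encode, decode) are hypothetical, and the codebook uses fixed-length random words (rather than variable-length ones like "foo" and "phlegm") so that reversing the mapping is a simple fixed-width split. Whitespace is passed through unchanged, as the post requires.

```python
import random
import re
import string


def make_codebook(alphabet, seed=None, word_len=4):
    """Map each character to a unique pseudorandom lowercase word.

    Assumption: fixed-length words, so decoding can split on width.
    A different seed (or None) gives a different per-run codebook.
    """
    rng = random.Random(seed)
    codebook, used = {}, set()
    for ch in sorted(set(alphabet)):
        while True:
            word = "".join(rng.choice(string.ascii_lowercase)
                           for _ in range(word_len))
            if word not in used:
                break
        used.add(word)
        codebook[ch] = word
    return codebook


def encode(line, codebook):
    """Replace every non-whitespace character; keep whitespace as-is."""
    return "".join(ch if ch.isspace() else codebook[ch] for ch in line)


def decode(line, codebook, word_len=4):
    """Invert encode() given the same codebook."""
    reverse = {word: ch for ch, word in codebook.items()}
    out = []
    for chunk in re.split(r"(\s+)", line):
        if chunk == "" or chunk.isspace():
            out.append(chunk)
        else:
            out.extend(reverse[chunk[i:i + word_len]]
                       for i in range(0, len(chunk), word_len))
    return "".join(out)


# Hypothetical sample input; the alphabet is collected from the data itself.
lines = ["foo fum", "fee  fie"]
alphabet = {ch for line in lines for ch in line if not ch.isspace()}
codebook = make_codebook(alphabet)
encoded = [encode(line, codebook) for line in lines]
decoded = [decode(line, codebook) for line in encoded]
```

Since each character always maps to the same word, similar words stay similar after conversion. The flip side is that this is a plain substitution cipher, so character-frequency analysis can likely recover much of the text without the codebook; whether that is "sufficiently obfuscated" remains the open question raised in the post.
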
    > i do (only) time analysis numerically, so the above wouldn't be
    > acceptable for the timestamps (it'd lose the numeric properties i
    > need - how about just converting timestamps to seconds since log
    > start?).  but some people probably treat various msg elements
    > numerically...
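
[Editor's sketch] Converting timestamps to seconds since log start, as suggested above, might be done along these lines. Assumptions, none of which come from the post: each line begins with a classic 15-character syslog stamp ("Mon DD HH:MM:SS"), the year is supplied by the caller because that format omits it, the log does not cross a year boundary, and the function name relative_seconds and the sample lines are hypothetical.

```python
from datetime import datetime


def relative_seconds(lines, year=2004):
    """Replace the leading 'Mon DD HH:MM:SS' stamp on each line with
    whole seconds elapsed since the first line's stamp."""
    start = None
    out = []
    for line in lines:
        stamp, rest = line[:15], line[15:]
        # The year is prepended because the syslog stamp has none.
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        if start is None:
            start = when
        out.append(f"{int((when - start).total_seconds())}{rest}")
    return out


# Hypothetical sample lines, made up for illustration only.
sample = [
    "Mar 12 13:23:49 host sshd[101]: session opened",
    "Mar 12 13:24:04 host sshd[101]: session closed",
]
converted = relative_seconds(sample)
```

Intervals between events are preserved exactly, while the absolute date and time of day are dropped.
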
    > my sense is that someone will have to demonstrate a killer analysis
    > before people will be sufficiently motivated to share their data.  ie,
    > impress me with what you can do with your own data - then i'll let you
    > try my data...
    > my 2c
    > -- 
    > +--------------------------------------------------------------+
    > | Jon Stearley                  (505) 845-7571  (FAX 845-7442) |
    > | Sandia National Laboratories  Scalable Systems Integration   |
    > +--------------------------------------------------------------+
    > [1]
    > _______________________________________________
    > LogAnalysis mailing list
    > LogAnalysis@private
