Re: [logs] Log Samples Requested

From: Jon Stearley (jrstear@private)
Date: Fri Mar 12 2004 - 12:27:30 PST

    > Rainer Gerhards wrote:
    > >Having said this, on to my request: I would appreciate if the list
    > >members (you!) could send me a few lines of their actual syslog data.
    > Rainer - we've been trying to establish a log codex on
    > for some time. Getting log data is like pulling teeth. :) Please, people
    > if you have logs you are willing to share, send them to
    > as well.
    i've repeatedly pondered [1] an anonymizer strong enough to convince
    people to donate their logs, but weak enough to enable meaningful
    analysis on the converted data.  it can't just hash unique words, 'cause
    word similarity must be preserved.  character-by-character hashing isn't
    strong enough (right?)...  cryptography is too strong (i would think).
    my current top idea is to hash characters to a unique word: ie "foo fum"
    becomes "foophlegmphlegm fooblehphar" when f->foo, o->phlegm, u->bleh,
    m->phar.  the words would of course be generated in per-run pseudorandom
    fashion.  with the hash, it can be reverse converted (and thus, so could
    the analysis results for review), but without it:
     - is it sufficiently obfuscated that people would feel free to share
       their logs?  
     - would this preserve enough of the original data characteristics to
       make analysis meaningful? 
    for my analysis approach, the answer to the latter is "yes" (except that
    i require that whitespace be preserved, ie s/(\S+)/$1/g in perlspeak).
    for my logs, my answer to the former is "uhm, i think so..."  ;)
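
A minimal sketch of the character-to-word idea described above, not the poster's actual code. The names `make_mapping`, `anonymize`, `deanonymize` and `WORD_LEN` are hypothetical. It assumes fixed-length replacement words (unlike the variable-length "foo"/"phlegm" example) so that the concatenated output can be decoded unambiguously, and it uses Python's non-cryptographic `random` module. Note that a per-character substitution like this leaves character frequencies intact, which bears on the "is it strong enough?" question.

```python
import random
import re
import string

WORD_LEN = 4  # assumption: fixed-length words keep the output uniquely decodable


def make_mapping(text, seed=None):
    """Give every non-whitespace character in text its own random word.

    seed=None yields a fresh per-run mapping; pass a seed to reproduce one.
    """
    rng = random.Random(seed)
    mapping, used = {}, set()
    for ch in sorted(set(re.findall(r"\S", text))):
        while True:
            word = "".join(rng.choices(string.ascii_lowercase, k=WORD_LEN))
            if word not in used:
                break
        used.add(word)
        mapping[ch] = word
    return mapping


def anonymize(text, mapping):
    """Rewrite each run of non-whitespace; whitespace passes through untouched."""
    return re.sub(r"\S+", lambda m: "".join(mapping[c] for c in m.group()), text)


def deanonymize(text, mapping):
    """Invert anonymize(); only possible for someone holding the mapping."""
    reverse = {word: ch for ch, word in mapping.items()}

    def decode(match):
        token = match.group()
        chunks = (token[i:i + WORD_LEN] for i in range(0, len(token), WORD_LEN))
        return "".join(reverse[chunk] for chunk in chunks)

    return re.sub(r"\S+", decode, text)
```

Because the same character always maps to the same word, tokens that were similar before conversion stay similar afterwards, and the whitespace layout of each line is unchanged.
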
    i do (only) time analysis numerically, so the above wouldn't be
    acceptable for the timestamps (it'd lose the numeric properties i
    need - how about just converting timestamps to seconds since log
    start?).  but some people probably treat various msg elements
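
The timestamp suggestion (seconds since log start) could be sketched roughly as follows. This is an illustration under assumptions, not the poster's code: the function name `relative_seconds` is hypothetical, lines are assumed to begin with a classic 15-character syslog stamp such as "Mar 12 12:27:30", and because that format carries no year, one has to be supplied by the caller.

```python
from datetime import datetime

STAMP_LEN = 15  # assumption: lines start with a stamp like "Mar 12 12:27:30"


def relative_seconds(lines, year=2004):
    """Replace each leading syslog timestamp with seconds since the first line.

    The year is an assumption supplied by the caller, since the stamp has none.
    """
    start = None
    converted = []
    for line in lines:
        stamp = line[:STAMP_LEN]
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        if start is None:
            start = when  # the first line defines time zero
        offset = int((when - start).total_seconds())
        converted.append(f"{offset}{line[STAMP_LEN:]}")
    return converted
```

Intervals between messages survive the conversion exactly, while the absolute date and time of day are no longer visible.
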
    my sense is that someone will have to demonstrate a killer analysis
    before people will be sufficiently motivated to share their data.  ie,
    impress me with what you can do with your own data - then i'll let you
    try my data...
    my 2c
    | Jon Stearley                  (505) 845-7571  (FAX 845-7442) |
    | Sandia National Laboratories  Scalable Systems Integration   |
    LogAnalysis mailing list

    This archive was generated by hypermail 2b30 : Fri Mar 12 2004 - 13:11:28 PST