Re: [logs] Log Samples Requested

From: Raffael Marty (rmarty@private)
Date: Fri Mar 12 2004 - 13:28:45 PST


    > i've repeatedly pondered [1] an anonymizer strong enough to convince
    > people to donate their logs, but weak enough to enable meaningful
    > analysis on the converted data.  it can't just hash unique words, 'cause
    > word similarity must be preserved.  character-by-character hashing isn't
    > strong enough (right?)...  cryptography is too strong (i would think
    > so...)
    I couldn't agree more! There is some research being done on this.
    Unfortunately I don't have any pointers handy. Anyone?
    > becomes "foophlegmphlegm fooblehphar" when f->foo, o->phlegm, u->bleh,
    > m->phar.  the words would of course be generated in per-run pseudorandom
    > fashion.  with the hash, it can be reverse converted (and thus, so could
    > the analysis results for review), but without it:
    >  - is it sufficiently obfuscated that people would feel free to share
    >    their logs?  
    >  - would this preserve enough of the original data characteristics to
    >    make analysis meaningful? 
    No! If you preserve the spaces, nothing is easier than running a
    frequency analysis on your words. Logs carry so much redundancy
    (structure) that reverse-engineering your approach would be very easy.
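    A minimal sketch of the frequency-analysis point, assuming a
    per-character substitution like the one quoted above. The sample log
    lines, the 4-letter code words, and the helper names (make_table,
    obfuscate, word_counts) are all made up for illustration. Because the
    mapping is deterministic and spaces survive, every word keeps its
    count after obfuscation:

```python
import random
import string
from collections import Counter

# Made-up sample lines, for illustration only.
LOGS = [
    "sshd failed password for root",
    "sshd failed password for admin",
    "sshd accepted password for raffy",
    "kernel link up",
]

def make_table(alphabet, seed):
    """Map each character to a unique pseudorandom 4-letter code word."""
    rng = random.Random(seed)
    table, used = {}, set()
    for ch in sorted(alphabet):
        word = None
        while word is None or word in used:
            word = "".join(rng.choice(string.ascii_lowercase) for _ in range(4))
        used.add(word)
        table[ch] = word
    return table

def obfuscate(line, table):
    """Substitute every character, but keep the spaces between words."""
    return " ".join("".join(table[c] for c in word) for word in line.split(" "))

def word_counts(lines):
    """Count whitespace-separated tokens across all lines."""
    return Counter(w for line in lines for w in line.split(" "))

table = make_table(set("".join(LOGS)) - {" "}, seed=2004)
plain = word_counts(LOGS)
hidden = word_counts(obfuscate(line, table) for line in LOGS)

# Each plaintext word and its obfuscated token occur equally often.
for word, n in plain.most_common():
    print(n, hidden[obfuscate(word, table)], word, "->", obfuscate(word, table))
```

    Anyone who can guess a few common log words from their counts and
    positions can then read the character mapping straight off the
    matching tokens.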
    > i do (only) time analysis numerically, so the above wouldn't be
    > acceptable for the timestamps (it'd loose the numeric properties i 
    > need - how about just converting timestamps to seconds since log
    > start?).  but some people probably treat various msg elements
    > numerically...
    You don't need the timestamps. Just replace them with the time format,
    YYYY-MM-DD hh:mm:ss:mmmmm, and if you care about ordering, just keep
    the logs in order! That holds at least as long as you only care about
    parsing the messages. And you could do the same with all the other
    fields. I don't think the point of this approach is to share attack
    data, just log formats. So let's come up with a meta-language or, even
    better, a little tool that converts the logs into some kind of
    meta-language. Nothing fancy, but analogous to the time-format example
    I just gave. That would help a lot!
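    A rough sketch of what such a little tool could look like, under
    stated assumptions: timestamps are ISO-like (syslog's "Mar 12 13:28:45"
    style would need its own rule), and the placeholders <IP> and <NUM>
    and the function name to_meta are invented here, not part of any
    agreed meta-language. Each rule swaps a field's value for a
    description of its format:

```python
import re

# Ordered rules: more specific patterns run before the generic number rule.
RULES = [
    (re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"), "YYYY-MM-DD hh:mm:ss"),
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]

def to_meta(line):
    """Replace field values with format placeholders, keeping the layout."""
    for pattern, placeholder in RULES:
        line = pattern.sub(placeholder, line)
    return line

sample = "2004-03-12 13:28:45 sshd[2211]: Failed password for root from 10.0.0.5 port 4022"
print(to_meta(sample))
```

    Usernames and hostnames pass through untouched in this sketch; they
    would need rules of their own before anything is shared.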
    Raffael Marty, CISSP                          raffael.marty@private
    Senior Security Engineer                    Content Team @ ArcSight Inc.
    1309 South Mary Ave.         Sunnyvale, CA 94087          (408) 328 5562
    DISCLAIMER: Raffy's opinions are not necessarily ArcSight's policy.

    This archive was generated by hypermail 2b30 : Fri Mar 12 2004 - 13:32:16 PST