Why not cut some deal with someone operating a "Public" non-private system
who would be willing to donate the Logs? There must be someone inside of DHS
that can shake this loose...

Todd Glassey

----- Original Message -----
From: "Jon Stearley" <jrstear@private>
To: "Marcus J. Ranum" <mjr@private>; "Rainer Gerhards" <rgerhards@private>
Cc: <loganalysis@private>
Sent: Friday, March 12, 2004 12:27 PM
Subject: Re: [logs] Log Samples Requested

> > Rainer Gerhards wrote:
> > >Having said this, on to my request: I would appreciate if the list
> > >members (you!) could send me a few lines of their actual syslog data.
> >
> > Rainer - we've been trying to establish a log codex on loganalysis.org
> > for some time. Getting log data is like pulling teeth. :) Please, people
> > if you have logs you are willing to share, send them to loganalysis.org
> > as well.
>
> i've repeatedly pondered [1] an anonymizer strong enough to convince
> people to donate their logs, but weak enough to enable meaningful
> analysis on the converted data. it can't just hash unique words, 'cause
> word similarity must be preserved. character-by-character hashing isn't
> strong enough (right?)... cryptography is too strong (i would think
> so...)
>
> my current top idea is to hash characters to a unique word: ie "foo fum"
> becomes "foophlegmphlegm fooblehphar" when f->foo, o->phlegm, u->bleh,
> m->phar. the words would of course be generated in per-run pseudorandom
> fashion. with the hash, it can be reverse converted (and thus, so could
> the analysis results for review), but without it:
> - is it sufficiently obfuscated that people would feel free to share
> their logs?
> - would this preserve enough of the original data characteristics to
> make analysis meaningful?
>
> for my analysis approach, the answer to the latter is "yes" (except that
> i require that whitespace be preserved, ie s/(\S+)/$1/g in perlspeak).
> for my logs, my answer to the former is "uhm, i think so..." ;)
>
> i do (only) time analysis numerically, so the above wouldn't be
> acceptable for the timestamps (it'd lose the numeric properties i
> need - how about just converting timestamps to seconds since log
> start?). but some people probably treat various msg elements
> numerically...
>
> my sense is that someone will have to demonstrate a killer analysis
> before people will be sufficiently motivated to share their data. ie,
> impress me with what you can do with your own data - then i'll let you
> try my data...
>
> my 2c
>
> --
> +--------------------------------------------------------------+
> | Jon Stearley (505) 845-7571 (FAX 845-7442) |
> | Sandia National Laboratories Scalable Systems Integration |
> +--------------------------------------------------------------+
>
> [1]
> http://www.securityfocus.com/archive/116/277024/2002-06-14/2002-06-20/2
>
> _______________________________________________
> LogAnalysis mailing list
> LogAnalysis@private
> http://lists.shmoo.com/mailman/listinfo/loganalysis
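
The character-to-word idea quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Jon's actual tool: the function names (build_codebook, anonymize, deanonymize) are hypothetical, and the code words are fixed-length random strings rather than the variable-length words in the "foo fum" example, purely so that the reverse conversion is unambiguous. Whitespace is passed through unchanged, as the message requires. Note that this amounts to a simple substitution cipher, so it does not by itself answer the question of whether the result is obfuscated enough.

```python
import random
import string


def build_codebook(text, word_len=4, seed=None):
    """Assign each distinct non-whitespace character a unique random code word.

    word_len and seed are illustrative parameters; leaving seed as None gives
    a different (per-run) codebook each time.
    """
    rng = random.Random(seed)
    codebook = {}
    used = set()
    for ch in sorted(set(text)):
        if ch.isspace():
            continue  # whitespace is never encoded
        while True:
            word = "".join(rng.choice(string.ascii_lowercase) for _ in range(word_len))
            if word not in used:
                break
        used.add(word)
        codebook[ch] = word
    return codebook


def anonymize(text, codebook):
    """Replace every non-whitespace character with its code word."""
    return "".join(ch if ch.isspace() else codebook[ch] for ch in text)


def deanonymize(text, codebook, word_len=4):
    """Reverse the conversion; only possible for whoever holds the codebook."""
    reverse = {word: ch for ch, word in codebook.items()}
    out = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            out.append(text[i])
            i += 1
        else:
            out.append(reverse[text[i:i + word_len]])
            i += word_len
    return "".join(out)


# Example run on the string from the message.
line = "foo fum"
book = build_codebook(line)
converted = anonymize(line, book)
print(converted)
print(deanonymize(converted, book))
```

Because every character always maps to the same code word, repeated characters and similar words stay recognizably similar after conversion, which is the property the message asks for; the codebook would have to be built over the whole log before converting it.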
This archive was generated by hypermail 2b30 : Fri Mar 12 2004 - 13:30:01 PST