> Rainer Gerhards wrote:
> >Having said this, on to my request: I would appreciate if the list
> >members (you!) could send me a few lines of their actual syslog data.
>
> Rainer - we've been trying to establish a log codex on loganalysis.org
> for some time. Getting log data is like pulling teeth. :) Please, people
> if you have logs you are willing to share, send them to loganalysis.org
> as well.

i've repeatedly pondered [1] an anonymizer strong enough to convince
people to donate their logs, but weak enough to enable meaningful
analysis on the converted data. it can't just hash unique words,
'cause word similarity must be preserved. character-by-character
hashing isn't strong enough (right?)... cryptography is too strong
(i would think so...)

my current top idea is to hash each character to a unique word: ie
"foo fum" becomes "foophlegmphlegm fooblehphar" when f->foo, o->phlegm,
u->bleh, m->phar. the words would of course be generated in per-run
pseudorandom fashion. with the hash table, the output can be converted
back (and thus, so could the analysis results for review), but without
it:

- is it sufficiently obfuscated that people would feel free to share
  their logs?
- would this preserve enough of the original data characteristics to
  make analysis meaningful?

for my analysis approach, the answer to the latter is "yes" (except that
i require that whitespace be preserved, ie s/(\S+)/$1/g in perlspeak).
for my logs, my answer to the former is "uhm, i think so..." ;)

i do (only) time analysis numerically, so the above wouldn't be
acceptable for the timestamps (it'd lose the numeric properties i need -
how about just converting timestamps to seconds since log start?). but
some people probably treat various msg elements numerically...

my sense is that someone will have to demonstrate a killer analysis
before people will be sufficiently motivated to share their data. ie,
impress me with what you can do with your own data - then i'll let you
try my data...
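A rough sketch of the character-to-word idea above, in Python, might look like the following. Everything named here (make_table, encode, decode, seconds_since_start, WORD_LEN) is made up for illustration and is not an existing tool. One assumption goes beyond the description: the substitute words are given a fixed length, so that the concatenated output can be split back into words unambiguously when decoding. Words of arbitrary length like "foo" and "phlegm" would need some other way to avoid ambiguity.

```python
import random
import re
import string
from datetime import datetime
from typing import Dict, List, Optional

WORD_LEN = 4  # assumption: fixed-length words make decoding unambiguous


def make_table(text: str, seed: Optional[int] = None) -> Dict[str, str]:
    """Map each distinct non-whitespace character to a unique random word."""
    rng = random.Random(seed)  # per-run pseudorandom; seed=None differs each run
    table: Dict[str, str] = {}
    used = set()
    for ch in sorted(set(text)):
        if ch.isspace():
            continue  # whitespace is never substituted
        while True:
            word = "".join(rng.choice(string.ascii_lowercase) for _ in range(WORD_LEN))
            if word not in used:
                break
        used.add(word)
        table[ch] = word
    return table


def encode(text: str, table: Dict[str, str]) -> str:
    """Replace every non-whitespace character by its word; keep whitespace."""
    return "".join(c if c.isspace() else table[c] for c in text)


def decode(text: str, table: Dict[str, str]) -> str:
    """Reverse encode(); only possible for whoever holds the table."""
    reverse = {word: ch for ch, word in table.items()}

    def decode_token(match: "re.Match[str]") -> str:
        tok = match.group(0)
        return "".join(reverse[tok[i:i + WORD_LEN]] for i in range(0, len(tok), WORD_LEN))

    return re.sub(r"\S+", decode_token, text)


def seconds_since_start(stamps: List[datetime]) -> List[int]:
    """Convert timestamps to whole seconds since the first entry."""
    if not stamps:
        return []
    start = stamps[0]
    return [int((t - start).total_seconds()) for t in stamps]
```

With a table built from the log itself, decode(encode(line, table), table) gives back the original line, and token boundaries survive because whitespace passes through untouched. Whether this is obfuscated enough is still the open question: it is a plain substitution, so character frequencies are preserved.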
my 2c

-- 
+--------------------------------------------------------------+
| Jon Stearley                  (505) 845-7571  (FAX 845-7442) |
| Sandia National Laboratories    Scalable Systems Integration |
+--------------------------------------------------------------+

[1] http://www.securityfocus.com/archive/116/277024/2002-06-14/2002-06-20/2

_______________________________________________
LogAnalysis mailing list
LogAnalysis@private
http://lists.shmoo.com/mailman/listinfo/loganalysis