RE: [logs] Log Samples Requested

From: Safier, Adam * (Safier@private)
Date: Fri Mar 12 2004 - 16:22:05 PST

    This could be a fun project for someone who still codes (not me). I
    suspect something like this already exists, but since it has not been
    mentioned on the list, here goes.
    How about a search and replace tool that helps the user replace key
    identifying data?  
    It should prompt for things that might be replaced so you don't have to
    build a list ahead of time. A "keep unchanged" list would prevent prompting
    for some things (Tx, Rx, Port, Error, ..... or even certain columns or hex
    code in a column ... )
    For example, it would recognize:
    - IP addresses and prompt the user to "Change all IP addresses in the first
    column that are from the subnet to 10.10.x.x equivalents and
    scan other columns for replaced addresses and replace them (to retain
    matches)".
    - recognize time stamps and prompt to "change date time stamps
    +/- hhhhhh:mm:ss"  (keeps time stamp relative).
    - Prompt to replace any word or hex combination not in the [user editable]
    "keep unchanged" list with a word of my choosing, and pad or cut it to
    match the original length:
    	Change "MyCompany" to:  [Hit Enter to leave unchanged]
    	I type: llama
    	output: llama0001
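
A minimal sketch of the three behaviours listed above, in Python. Everything here is an assumption made for illustration: the `Anonymizer` class and its method names are hypothetical, timestamps are assumed to look like `2004-03-12 16:22:05`, replacement addresses are handed out sequentially from 10.10.0.1, and the padding is a zero-filled counter (which happens to reproduce the llama0001 example). Interactive prompting is left out; the caller supplies the replacement words.

```python
import ipaddress
import re
from datetime import datetime, timedelta

IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
TS_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")
TS_FMT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout


class Anonymizer:
    def __init__(self, keep=()):
        self.keep = set(keep)  # the "keep unchanged" list
        self.ip_map = {}       # original IP -> 10.10.x.x stand-in
        self.word_map = {}     # original word -> padded replacement

    def map_ip(self, match):
        """Return the same stand-in every time the same address appears."""
        ip = match.group(0)
        try:
            ipaddress.ip_address(ip)
        except ValueError:
            return ip  # looks like an address but is not one
        if ip in self.keep:
            return ip
        if ip not in self.ip_map:
            n = len(self.ip_map) + 1
            if n > 0xFFFF:
                raise ValueError("10.10.x.x space exhausted")
            self.ip_map[ip] = f"10.10.{n // 256}.{n % 256}"
        return self.ip_map[ip]

    def add_word(self, original, replacement):
        """Register a replacement, padded or cut to the original length."""
        if original in self.keep:
            return original
        width = len(original)
        pad = max(width - len(replacement), 0)
        counter = str(len(self.word_map) + 1)
        suffix = counter.zfill(pad)[-pad:] if pad else ""
        self.word_map[original] = replacement[:width] + suffix
        return self.word_map[original]

    def anonymize_line(self, line, shift=timedelta(0)):
        # Shift every timestamp by the same amount so gaps stay intact.
        line = TS_RE.sub(
            lambda m: (datetime.strptime(m.group(0), TS_FMT) + shift)
            .strftime(TS_FMT),
            line,
        )
        # Replace addresses consistently across all columns.
        line = IP_RE.sub(self.map_ip, line)
        # Replace registered words on word boundaries.
        for original, new in self.word_map.items():
            pattern = rf"\b{re.escape(original)}\b"
            line = re.sub(pattern, lambda m, new=new: new, line)
        return line
```

Calling `add_word("MyCompany", "llama")` on a fresh instance registers `llama0001`, and the same source address maps to the same 10.10.x.x address wherever it shows up in later lines.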
    Store a list of mapped values / rules to reduce prompting during future
    runs.
    The program should be available on Windoze in a nice GUI with a browse
    option for filename inputs, on UNIX as a command line tool, and maybe on
    other platforms, like Apple.  Ease of use by novices or experts counts.
    Output goes into a text file for user review.  Instructions on how to
    attach/cut&paste and e-mail to log-submission-loganalysis@private could
    pop up on screen at the end of the run.  Post on freeware sites as a search
    and replace tool and log anonymizer.  Include info about the loganalysis
    list and the need for log samples in the banner or sign-off screen.
    BTW, does log analysis have to be only on syslogs?  How about output from
    applications (Oracle database log, binary logs, ...)?
    Still learning,
    -----Original Message-----
    From: Jon Stearley [mailto:jrstear@private]
    Sent: Friday, March 12, 2004 3:28 PM
    To: Marcus J. Ranum; Rainer Gerhards
    Cc: loganalysis@private
    Subject: Re: [logs] Log Samples Requested
    > Rainer Gerhards wrote:
    > >Having said this, on to my request: I would appreciate if the list
    > >members (you!) could send me a few lines of their actual syslog data.
    > Rainer - we've been trying to establish a log codex on
    > for some time. Getting log data is like pulling teeth. :) Please, people
    > if you have logs you are willing to share, send them to
    > as well.
    i've repeatedly pondered [1] an anonymizer strong enough to convince
    people to donate their logs, but weak enough to enable meaningful
    analysis on the converted data.  it can't just hash unique words, 'cause
    word similarity must be preserved.  character-by-character hashing isn't
    strong enough (right?)...  cryptography is too strong (i would think
    my current top idea is to hash characters to a unique word: ie "foo fum"
    becomes "foophlegmphlegm fooblehphar" when f->foo, o->phlegm, u->bleh,
    m->phar.  the words would of course be generated in per-run pseudorandom
    fashion.  with the hash, it can be reverse converted (and thus, so could
    the analysis results for review), but without it:
     - is it sufficiently obfuscated that people would feel free to share
       their logs?  
     - would this preserve enough of the original data characteristics to
       make analysis meaningful? 
    for my analysis approach, the answer to the latter is "yes" (except that
    i require that whitespace be preserved, ie s/(\S+)/$1/g in perlspeak).
    for my logs, my answer to the former is "uhm, i think so..."  ;)
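
A rough Python sketch of the character-to-word idea described above, under some assumptions that are not in the original message: the function names are hypothetical, and every character gets a random word of the same fixed length (`WORD_LEN`) so that decoding with the table is unambiguous, whereas the "foo"/"phlegm" example uses words of varying length. Whitespace is passed through untouched.

```python
import random
import re
import string

WORD_LEN = 3  # assumed fixed word length; keeps decoding unambiguous


def build_table(text, rng=None):
    """Assign each non-whitespace character a unique pseudorandom word."""
    rng = rng or random.SystemRandom()  # fresh table on every run
    table, used = {}, set()
    for ch in sorted(set(text)):
        if ch.isspace():
            continue
        while True:
            word = "".join(rng.choice(string.ascii_lowercase)
                           for _ in range(WORD_LEN))
            if word not in used:
                break
        used.add(word)
        table[ch] = word
    return table


def encode(text, table):
    """Replace every non-whitespace character; whitespace is preserved."""
    return re.sub(r"\S", lambda m: table[m.group(0)], text)


def decode(encoded, table):
    """Reverse the conversion; only possible with the table in hand."""
    reverse = {word: ch for ch, word in table.items()}

    def decode_token(m):
        tok = m.group(0)
        return "".join(reverse[tok[i:i + WORD_LEN]]
                       for i in range(0, len(tok), WORD_LEN))

    return re.sub(r"\S+", decode_token, encoded)
```

Because the same character always maps to the same word within a run, similar tokens stay similar after conversion, which is the property the plain word hash would lose. Whether a per-character substitution like this is obfuscated enough is exactly the open question raised above; the sketch does not settle it.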
    i do (only) time analysis numerically, so the above wouldn't be
    acceptable for the timestamps (it'd lose the numeric properties i 
    need - how about just converting timestamps to seconds since log
    start?).  but some people probably treat various msg elements
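
The "seconds since log start" conversion could look something like this in Python. It is only a sketch: the function name is hypothetical, and it assumes each line begins with a 19-character timestamp of the form `2004-03-12 15:28:00`, which is not what every syslog emits.

```python
from datetime import datetime

TS_FMT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout
TS_LEN = 19                   # length of a timestamp in that layout


def to_relative_seconds(lines):
    """Replace each leading timestamp with seconds since the first line."""
    start = None
    converted = []
    for line in lines:
        stamp = datetime.strptime(line[:TS_LEN], TS_FMT)
        if start is None:
            start = stamp  # the first line defines time zero
        offset = int((stamp - start).total_seconds())
        converted.append(f"{offset}{line[TS_LEN:]}")
    return converted
```

The absolute date and time of day are dropped, while intervals between messages stay numeric and intact.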
    my sense is that someone will have to demonstrate a killer analysis
    before people will be sufficiently motivated to share their data.  ie,
    impress me with what you can do with your own data - then i'll let you
    try my data...
    my 2c
    | Jon Stearley                  (505) 845-7571  (FAX 845-7442) |
    | Sandia National Laboratories  Scalable Systems Integration   |
    LogAnalysis mailing list

    This archive was generated by hypermail 2b30 : Fri Mar 12 2004 - 18:55:04 PST