[logs] Anonymizing System Logs

From: Adam Oliner (oliner@private)
Date: Mon Jan 22 2007 - 12:47:01 PST


My colleagues and I have obtained access to a large collection of
system logs from five major supercomputers and are currently working to
prepare them for public release. These are raw logs, aggregated in some
cases from many log-generating components (Lustre, netwatch, eventlogs,
syslog...). Cumulatively, they represent more than 775 million
processor-hours.

The primary pieces of data that we are trying to anonymize are
usernames, group names, pathnames, and IP addresses/hostnames, so we are
looking for some input from the log analysis community.

1) Aside from some possibly-excessive pattern matching, can you  
suggest a good way of masking out this data from the unstructured  
message bodies?

2) Assuming that all such data was successfully removed, what other  
security concerns would you have? How might we address them?
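
To make question (1) concrete, the pattern-matching baseline might look
roughly like the Python sketch below. It is only an illustration under
stated assumptions: the regular expressions, the mask_line and pseudonym
helpers, the secret key, and the placeholder format are all hypothetical,
and real logs would need many more patterns tuned per log source.

```python
import hashlib
import hmac
import re

# Hypothetical secret key; it would be generated randomly and never released.
SECRET_KEY = b"replace-with-a-random-secret"

# Illustrative patterns only (assumptions, not an exhaustive set).
PATTERNS = [
    # Dotted-quad IPv4 addresses.
    ("IP", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
    # Absolute Unix-style pathnames not preceded by a word character or dot.
    ("PATH", re.compile(r"(?<![\w.])/(?:[\w.+-]+/)*[\w.+-]+")),
    # A name that directly follows "user=" or "user ".
    ("USER", re.compile(r"(?<=\buser[= ])[A-Za-z_][\w-]*")),
]


def pseudonym(kind, value):
    """Map a sensitive value to a stable, keyed placeholder token."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncated for readability; a shorter digest raises the collision risk.
    return f"<{kind}:{digest[:8]}>"


def mask_line(line):
    """Apply each pattern in order, replacing matches with pseudonyms."""
    for kind, pattern in PATTERNS:
        line = pattern.sub(lambda m, k=kind: pseudonym(k, m.group(0)), line)
    return line


if __name__ == "__main__":
    sample = "session opened for user alice from 10.1.2.3 cwd=/home/alice/run.sh"
    print(mask_line(sample))
```

Using a keyed hash rather than a fixed string such as "XXXX" keeps the
same value mapped to the same token, so events can still be correlated
across log lines. Anything the patterns fail to match passes through
unmasked, which is the weakness the question is asking about.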

We would greatly appreciate your help.

Sincerely,

  - Adam J. Oliner
    oliner@private
    Department of Computer Science
    Stanford University



_______________________________________________
LogAnalysis mailing list
LogAnalysis@private
http://lists.shmoo.com/mailman/listinfo/loganalysis



This archive was generated by hypermail 2.1.3 : Mon Jan 22 2007 - 22:13:14 PST