Re: Future of indexing in Autopsy and Sleuthkit

From: Matt Bergen (MBERGEat_private)
Date: Thu May 22 2003 - 09:29:17 PDT

    If you are feeling ambitious, why not give the option to the user? Take
    the suggestions you receive from this list to determine the default
    behavior of the application, and then give the user the option of
    changing that behavior if desired. In my opinion, one of the largest
    benefits to using open source software is its flexibility.
    
    
    Matt Bergen
    Lead Information Security Officer
    Wyoming Department of Employment
    
    >>> "Simson L. Garfinkel" <simsongat_private> 05/22/03 09:27AM >>>
    Paul,
    
    Here are some issues you may not have considered:
    >
    > Issue 1:
    > I think it is advisable to limit the indexed character range to only
    > alphanumeric characters instead of the current limitation of all
    > printable ASCII characters.
    
    If you limit to printable ASCII characters, there will be problems for
    people outside the US (or people working with data from outside the US).
    You need to be able to handle Roman characters with accents. These are
    normally represented with the high bit set. If the user searches for an e,
    they probably want to match on è and é and possibly other e's as well.
    
    Then you have the issue of Arabic, Hebrew, and 16-bit characters.
    
    At a minimum, I think that you should transparently handle codepages 
    and coerce them into 7-bit ASCII. But ideally you should handle 
    UNICODE, UTF-8, UTF-16, etc. Or do something for Arabic.
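
As a rough illustration of the "search for e, match è and é" idea, here is a minimal Python sketch. It assumes the raw bytes are first decoded with a known codepage (Latin-1 by default here), then folded by Unicode decomposition. The names fold_accents, decode_bytes and matches are hypothetical and are not part of Autopsy or Sleuthkit. Note that this only strips accents from Roman letters; Arabic and Hebrew text passes through unchanged and would still need an index that stores Unicode.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Decompose characters (NFKD), drop combining marks, and lowercase."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def decode_bytes(raw: bytes, codepage: str = "latin-1") -> str:
    """Decode raw bytes using an assumed codepage; bad bytes become U+FFFD."""
    return raw.decode(codepage, errors="replace")


def matches(term: str, raw: bytes, codepage: str = "latin-1") -> bool:
    """True if the folded search term occurs in the folded, decoded data."""
    return fold_accents(term) in fold_accents(decode_bytes(raw, codepage))


if __name__ == "__main__":
    latin1_data = "Résumé du système".encode("latin-1")
    utf16_data = "café".encode("utf-16")
    print(matches("resume", latin1_data))
    print(matches("cafe", utf16_data, "utf-16"))
```

The same folding would have to be applied both when the index is built and when the search term is entered, otherwise the two sides will not agree.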
    >
    > Issue 2:
    > Human readability of the files. A speedup in the indexed searching
    > process and a reduction of the size of the used files can be
    > accomplished by changing the format of the index files. The
    > consequence is that these cannot be read by a human anymore (No more
    > text-format file). The consequences are the following:
    >  - POSITIVE: Speed of searches is increased
    >  - POSITIVE: Size of used files is reduced
    >  - NEGATIVE: Files cannot be checked anymore with the human eye.
    
    I do not think that this is important. The index files should be in 
    binary; create a tool to browse or view them.
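
To make the "binary index plus a viewer" suggestion concrete, here is a small Python sketch. The record layout is purely an assumption for illustration (a little-endian 16-bit term length and 32-bit offset count, then the UTF-8 term, then 64-bit byte offsets); it is not the format Autopsy or Sleuthkit uses, and write_index and dump_index are hypothetical names.

```python
import io
import struct

# Assumed record header: uint16 term length, uint32 offset count (little-endian).
HEADER = struct.Struct("<HI")


def write_index(stream, index):
    """Write {term: [byte offsets]} as packed binary records, sorted by term."""
    for term in sorted(index):
        encoded = term.encode("utf-8")
        offsets = index[term]
        stream.write(HEADER.pack(len(encoded), len(offsets)))
        stream.write(encoded)
        stream.write(struct.pack(f"<{len(offsets)}Q", *offsets))


def dump_index(stream):
    """Read the binary records back and return one readable line per term."""
    lines = []
    while True:
        header = stream.read(HEADER.size)
        if len(header) < HEADER.size:
            break  # end of file
        term_len, count = HEADER.unpack(header)
        term = stream.read(term_len).decode("utf-8")
        offsets = struct.unpack(f"<{count}Q", stream.read(8 * count))
        lines.append(f"{term}: {' '.join(str(o) for o in offsets)}")
    return lines


if __name__ == "__main__":
    buf = io.BytesIO()
    write_index(buf, {"password": [512, 90210], "admin": [42]})
    buf.seek(0)
    for line in dump_index(buf):
        print(line)
```

A dump tool along these lines gives back the ability to check the index by eye without keeping the index itself in text form.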
    
    
    -----------------------------------------------------------------
    This list is provided by the SecurityFocus ARIS analyzer service.
    For more information on this free incident handling, management
    and tracking system please see: http://aris.securityfocus.com 
    
    


