Indexed searching in Autopsy and Sleuthkit (Second release/version)

From: Paul Bakker (bakker@fox-it.com)
Date: Tue Aug 12 2003 - 00:25:40 PDT


     
    -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA1
    
    Hello,
    
    I work at a company doing Forensic IT investigations in the Netherlands called Fox-IT (http://www.fox-it.com). We are working on an all-Linux environment for Forensic research.
    
    As the main Forensic tool we would like to use Autopsy/Sleuthkit. As it is missing some features in comparison to (commercial) Windows products, we've decided to contribute and add some new features to Autopsy and Sleuthkit. We're doing this in cooperation with Brian Carrier.
     
    One of the major missing features is indexed searching. Indexed searching greatly speeds up searches for words during investigations.
    
    In May 2003 we released a first implementation for indexed searching in Autopsy and Sleuthkit. This has resulted in a lot of feedback and feature requests.
    
    This e-mail announces the release of the second version of indexed searching in Autopsy and Sleuthkit.
    The patch can be downloaded from 
    http://www.fox-it.com/files/autopsy-indexing-2.patch.tar.gz
    (MD5 http://www.fox-it.com/files/autopsy-indexing-2.patch.tar.gz.md5)
    (MD5: 9889 52cf dcb3 a318 f3c8 9920 43b8 d6fb)
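
Checking the download against the published MD5 can be sketched as follows. This is only an illustration, not part of the patch: the helper names are hypothetical, and the local file name in the usage note is an assumption about where the archive was saved. The fingerprint is printed above in groups of four hex digits, so it is normalized before comparison.

```python
import hashlib

# Fingerprint as printed in the announcement, in groups of four hex digits.
PUBLISHED_MD5 = "9889 52cf dcb3 a318 f3c8 9920 43b8 d6fb"


def normalize_digest(text: str) -> str:
    """Remove whitespace and lowercase a digest written in groups."""
    return "".join(text.split()).lower()


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, published: str = PUBLISHED_MD5) -> bool:
    """Return True when the file's MD5 equals the published fingerprint."""
    return md5_of_file(path) == normalize_digest(published)
```

Usage, assuming the archive was saved in the current directory: `matches_published("autopsy-indexing-2.patch.tar.gz")`.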
    
    This second version uses a different and better technique for indexing image files, one that leaves room for more advanced options in future releases.
    
    The new version has the following improvements and features:
     * Tools for Indexed searching in sleuthkit.
     * Creation of necessary files integrated into Autopsy interface.
     * Indexed Search field (At the bottom of the "Keyword search" page).
     * Case insensitive searching.
     * Possibility to search for whole words only or parts of words.
     * No strings file necessary. Only the Image file is needed for indexing. The size for a normal combined index is about the same as a strings file for the same image. (This depends on the settings used for indexing).
     * Can be used to index image files of any size. (Indexing results in multiple small indexes).
     * Includes a tool to combine multiple index files of the same image.
     * The Autopsy interface is currently only usable for "small" images, because it combines the index files into a single index file, which takes a long time for very large images (> 20 Gb). A future version will add more flexibility here.
     * Support for different default index-character sets. This release lets you index using:
        - Alphabet [a-z,A-Z]
        - Alphanumeric [a-z,A-Z,0-9]
        - EMail and Alphanumeric [a-z,A-Z,0-9,.,_,-,@]
       The smaller the set, the smaller the index file.
     * Lots of flexibility for the index process. (Specify the maximum memory usage, the minimum and maximum indexword length, and more.)
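
The features listed above can be illustrated with a small in-memory sketch. It is not the index format or the code shipped in the patch; every name in it (`CHARSETS`, `build_index`, `search`, `combine`) is hypothetical, and the default word lengths are arbitrary. It shows the general idea only: extract runs of characters from the chosen character set, store lowercased words with their byte offsets, look up whole words or parts of words, and merge several partial indexes of the same image.

```python
import re
from collections import defaultdict

# Character classes mirroring the three sets named in the announcement.
CHARSETS = {
    "alphabet": "a-zA-Z",
    "alphanumeric": "a-zA-Z0-9",
    "email": "a-zA-Z0-9._@-",
}


def build_index(data: bytes, charset: str = "alphanumeric",
                min_len: int = 3, max_len: int = 16, base: int = 0) -> dict:
    """Map each lowercased word to the byte offsets where it starts.

    `base` is added to every offset so that an index built from a later
    chunk of the image still points into the whole image.
    """
    pattern = re.compile(("[" + CHARSETS[charset] + "]+").encode("ascii"))
    index = defaultdict(list)
    for match in pattern.finditer(data):
        word = match.group().lower()
        if min_len <= len(word) <= max_len:
            index[word.decode("ascii")].append(base + match.start())
    return dict(index)


def search(index: dict, term: str, whole_word: bool = True) -> list:
    """Case-insensitive lookup; with whole_word=False, match inside words too."""
    term = term.lower()
    if whole_word:
        return sorted(index.get(term, []))
    hits = []
    for word, offsets in index.items():
        pos = word.find(term)
        while pos != -1:
            hits.extend(offset + pos for offset in offsets)
            pos = word.find(term, pos + 1)
    return sorted(hits)


def combine(*indexes: dict) -> dict:
    """Merge partial indexes of the same image into a single index."""
    merged = defaultdict(list)
    for index in indexes:
        for word, offsets in index.items():
            merged[word].extend(offsets)
    return {word: sorted(offsets) for word, offsets in merged.items()}


data = b"Mail Paul at bakker@fox-it.com; PAUL replied. Paulus did not."
index = build_index(data)
print(search(index, "paul"))                    # [5, 32]
print(search(index, "paul", whole_word=False))  # [5, 32, 46]
```

In this sketch a smaller character set or a narrower length range yields fewer index entries, which is consistent with the remark above that a smaller set gives a smaller index file. Words outside the length limits are not indexed at all, so a part-of-word search cannot find matches inside them.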
    
    The next version will include:
     * Folding (Mapping diacritic characters to their normal equivalent, allowing for more powerful searches.)
     * Default support for folding of the default ISO-8859-1 character set and perhaps for others too.
     * Better flexibility in the Autopsy interface.
     * Support for index specification files. These files describe exactly which characters should be indexed and how they should be folded, allowing full control over the indexing process.
     * More documentation on the format used in the index file and the process involved.
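
Folding, as planned for the next version, can be sketched with Python's standard `unicodedata` module. This is an assumption about one possible approach, not a description of the planned implementation: the text is decomposed (NFD) and the combining marks are dropped, so that "é" becomes "e". The function names are hypothetical.

```python
import unicodedata


def fold(text: str) -> str:
    """Map accented characters to their base character by dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fold_latin1(raw: bytes) -> str:
    """Decode ISO-8859-1 bytes, then fold the resulting text."""
    return fold(raw.decode("iso-8859-1"))


print(fold("Café"))               # Cafe
print(fold_latin1(b"Andr\xe9"))   # Andre
```

Characters without a canonical decomposition, such as "ø" or "ß", pass through unchanged in this sketch. Folding them would need an explicit mapping table, which is the kind of rule an index specification file could carry.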
    
    It has been tested on a Debian Linux system and on a number of forensic images.
    
    The following statistics have been gathered:
     * Index time. The index time depends on the index character set used, the minimum and maximum indexword size, and the maximum memory that is available. Indexing a 5 Gb image with the Alphanumeric character set and a memory limit of only 200 Mb takes 74 minutes and results in 39 index files with a total size of 3.8 Gb.
     * Combine time. Multiple index files can be combined into a single index file. This decreases the size of the index and increases the search speed. Combining 3.8 Gb of index files into a single 2.4 Gb index file takes about 33 minutes (the strings file for the same image is 2.0 Gb).
     * Search time. The search time depends on the number of results that are returned. The more results, the longer the search, as it has to access the original image file for every hit. The speedup is substantial. Searching a 5 Gb image file for a single word takes:
        - less than 1 second (4935 hits), compared to 111 seconds using regular grepping on the strings file.
        - 66 seconds (366587 hits), compared to 111 seconds using regular grepping on the strings file.
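
The ratios implied by these statistics can be worked out with a few lines of arithmetic. The input figures are the ones quoted above, with sizes taken as written; the derived numbers are simple divisions and are not additional measurements.

```python
# Figures quoted in the statistics above (sizes as written, in "Gb").
IMAGE_GB = 5.0
INDEX_MINUTES = 74
INDEX_FILES = 39
INDEX_TOTAL_GB = 3.8
COMBINED_GB = 2.4
STRINGS_GB = 2.0
GREP_SECONDS = 111
MANY_HITS_SECONDS = 66
MANY_HITS = 366587


def ratios() -> dict:
    """Derive throughput, size, and speedup ratios from the quoted figures."""
    return {
        "index_gb_per_hour": IMAGE_GB / (INDEX_MINUTES / 60),
        "avg_index_file_gb": INDEX_TOTAL_GB / INDEX_FILES,
        "combine_size_reduction": 1 - COMBINED_GB / INDEX_TOTAL_GB,
        "combined_vs_strings": COMBINED_GB / STRINGS_GB,
        "many_hits_speedup": GREP_SECONDS / MANY_HITS_SECONDS,
        "hits_per_second": MANY_HITS / MANY_HITS_SECONDS,
    }


for name, value in ratios().items():
    print(f"{name}: {value:.3f}")
```

For the search with 4935 hits the announcement only states "less than 1 second", so the speedup there is more than 111 times and cannot be computed more precisely from the figures given.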
    
    The available patches are for Autopsy 1.72 and Sleuthkit 1.64. They add the second beta version of indexed searching to Autopsy.
     
    It is still in beta and therefore I would greatly appreciate it if people would test the indexed searching on other machines and images and send their problems, feedback and feature requests to me.
    
    All feedback is appreciated! My goal is to add useful features (like indexed searching) to Autopsy and Sleuthkit. This requires feedback! ;-)
    
    - --
    Paul Bakker
    
    Fox-IT Experts in IT Security!
    Haagweg 137 
    2281 AG RIJSWIJK 
    T 070 336 9999 
    F 070 336 9990 
    I www.fox-it.com 
    E bakker@fox-it.com
    57A6 C5EA 55E4 CC1C A967 B13C F8C0 C0FB 8135 E225
    
    Disclaimer: This email may contain confidential information. If this message is not addressed to you, you may not retain or use the information in it for any purpose. If you have received it in error, please notify the sender and delete this message. We try to screen out viruses but take no responsibility if this email contains a virus. 
    
    -----BEGIN PGP SIGNATURE-----
    Version: PGP 8.0
    
    iQA/AwUBPziWdPjAwPuBNeIlEQIZMwCffRo3mOR3jEiXvi7snbPQkVjygscAoO39
    2kv3/Lq4ddSMliqhX4gbXdPd
    =2cRe
    -----END PGP SIGNATURE-----
    
    
    


