RE: Future of indexing in Autopsy and Sleuthkit

From: Paul Bakker (bakker@fox-it.com)
Date: Fri May 23 2003 - 01:02:52 PDT


    Hi Simson,
    
    Thanks for the response.
    
    > If you limit to printable ASCII characters, there will be problems for
    > people outside the US (or people working with data outside the US). You
    > need to be able to handle roman characters with accents. These are
    > normally represented with high-bits. If the user searches for an e,
    > they probably want to match on è and é and possibly other e's as well.
    > 
    > Then you have the issue of Arabic, Hebrew, and 16-bit characters.
    > 
    > At a minimum, I think that you should transparently handle codepages 
    > and coerce them into 7-bit ASCII. But ideally you should handle 
    > UNICODE, UTF-8, UTF-16, etc. Or do something for Arabic.
    
    OK. The problem with indexed searching is that you have to have a limited
    set of characters to search for; otherwise it's not possible to generate
    an index file. The size of the index file grows very rapidly with the size
    of the character set.
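    To put rough numbers on that growth: suppose, purely as a hypothetical, that
    the index is keyed on every possible k-character prefix (the real index layout
    is not described in this message). The number of possible keys is then the
    alphabet size raised to the power k. The sketch below compares printable ASCII,
    printable Latin-1 and the full Unicode code-point range; the names
    `ALPHABETS` and `key_space` are illustrative only.

```python
# Hypothetical illustration: count the possible keys of an index keyed on
# fixed-length k-character prefixes, for a few candidate alphabets.
ALPHABETS = {
    "printable ASCII": 95,              # 0x20..0x7E
    "printable Latin-1": 95 + 96,       # adds 0xA0..0xFF (accented letters etc.)
    "Unicode code points": 0x110000,    # U+0000..U+10FFFF
}


def key_space(alphabet_size: int, k: int) -> int:
    """Number of distinct k-character prefixes over an alphabet of this size."""
    return alphabet_size ** k


if __name__ == "__main__":
    for name, size in ALPHABETS.items():
        for k in (1, 2, 3):
            print(f"{name:>20} k={k}: {key_space(size, k):,}")
```

    Even at k=2, Unicode allows over a trillion distinct keys, while Latin-1 stays
    in the tens of thousands, which is why adding the accented 8-bit characters is
    feasible where full Unicode is not.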
    
    That said, I will possibly add the accented (diacritic) extended-ASCII
    characters, but Unicode contains far too many characters, so Unicode still
    poses a problem...
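    One possible way to follow Simson's suggestion of coercing codepages into
    7-bit ASCII, sketched here as an assumption and not as anything Autopsy or
    Sleuthkit currently does: decode the raw bytes with a known 8-bit codepage
    (Latin-1 is assumed by default), decompose each character with Unicode NFKD,
    drop the combining accent marks, and keep only what then falls inside
    printable ASCII. If both the indexed data and the search term are folded the
    same way, a search for "e" also matches "è" and "é". The function name
    `fold_to_ascii` is hypothetical.

```python
import unicodedata


def fold_to_ascii(raw: bytes, codepage: str = "latin-1") -> str:
    """Decode raw bytes and fold accented letters to their base ASCII letter.

    Hypothetical helper. Characters with no printable-ASCII base after
    decomposition (e.g. the German sharp s, Arabic or Hebrew letters, or
    bytes the codepage cannot decode) are dropped, so this only narrows the
    alphabet for accented Latin text.
    """
    # Undecodable bytes (possible with e.g. cp1252) become U+FFFD and are
    # later discarded by the printable-ASCII filter.
    text = raw.decode(codepage, errors="replace")
    # NFKD splits a precomposed letter such as "é" into "e" + U+0301
    # (COMBINING ACUTE ACCENT).
    decomposed = unicodedata.normalize("NFKD", text)
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop the accent; its base letter was already kept
        if " " <= ch <= "~":
            out.append(ch)  # keep only printable 7-bit ASCII (0x20..0x7E)
    return "".join(out)
```

    For example, `fold_to_ascii("café".encode("latin-1"))` yields `"cafe"`.
    Letters without an ASCII base are simply discarded, so this keeps the index
    alphabet small for accented Latin text but does not address the Arabic,
    Hebrew or general Unicode case.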
    
    If anyone can suggest a fix/solution I would greatly appreciate that!
    
    I'm still thinking about a better solution.
    
    
    --
    Paul Bakker
    
    Fox-IT Experts in IT Security!
    Haagweg 137 
    2281 AG RIJSWIJK 
    T 070 336 9999 
    F 070 336 9990 
    I www.fox-it.com 
    E bakker@fox-it.com
    57A6 C5EA 55E4 CC1C A967 B13C F8C0 C0FB 8135 E225
    
    Disclaimer: This email may contain confidential information. If this message is not addressed to you, you may not retain or use the information in it for any purpose. If you have received it in error, please notify the sender and delete this message. We try to screen out viruses but take no responsibility if this email contains a virus.
    
    -----------------------------------------------------------------
    This list is provided by the SecurityFocus ARIS analyzer service.
    For more information on this free incident handling, management 
    and tracking system please see: http://aris.securityfocus.com
    



    This archive was generated by hypermail 2b30 : Fri May 23 2003 - 08:40:57 PDT