RE: Future of indexing in Autopsy and Sleuthkit

From: Paul Bakker (bakker@fox-it.com)
Date: Fri May 23 2003 - 01:02:52 PDT

Next message: Paul Bakker: "RE: [sleuthkit-users] Future of indexing in Autopsy and Sleuthkit"

Previous message: Matthew M. Shannon: "Re: [sleuthkit-users] Future of indexing in Autopsy and Sleuthkit"
Maybe in reply to: Paul Bakker: "Future of indexing in Autopsy and Sleuthkit"
Next in thread: Jesse Kornblum: "Re: Future of indexing in Autopsy and Sleuthkit"
Reply: Jesse Kornblum: "Re: Future of indexing in Autopsy and Sleuthkit"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi Simson,

Thanks for the response.

> If you limit to printable ASCII characters, there will be problems for
> people outside the US (or people working with data outside the US). You
> need to be able to handle roman characters with accents. These are
> normally represented with high-bits. If the user searches for an e,
> they probably want to match on è and é and possibly other e's as well.
> 
> Then you have the issue of Arabic, Hebrew, and 16-bit characters.
> 
> At a minimum, I think that you should transparently handle codepages 
> and coerce them into 7-bit ASCII. But ideally you should handle 
> UNICODE, UTF-8, UTF-16, etc. Or do something for Arabic.
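
The minimum suggested above, coercing code pages into 7-bit ASCII so that a search for e also matches è and é, could look roughly like the Python sketch below. It assumes the raw bytes are first decoded with a known code page (the `codepage` argument and the name `fold_to_ascii` are hypothetical; choosing the right code page for raw disk data is a separate problem). It uses the standard `unicodedata` module, and it simply discards characters that have no ASCII base letter, such as Arabic or Hebrew.

```python
import unicodedata

def fold_to_ascii(raw: bytes, codepage: str = "cp1252") -> str:
    """Decode raw bytes with the given code page, then fold accented letters to ASCII.

    Hypothetical helper: the code page must be known or guessed by the caller.
    """
    # Undecodable bytes become U+FFFD, which is dropped in the final step.
    text = raw.decode(codepage, errors="replace")
    # NFKD splits "é" into "e" followed by a combining acute accent (U+0301).
    decomposed = unicodedata.normalize("NFKD", text)
    # Remove the combining marks, keeping only the base letters.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside 7-bit ASCII (Arabic, Hebrew, CJK, ...) is discarded.
    return base.encode("ascii", errors="ignore").decode("ascii")
```

If the same folding is applied both to the indexed text and to the query, `fold_to_ascii(b"caf\xe9")` yields `"cafe"`, so a search for "cafe" also finds "café". The cost is that scripts with no Latin base letters disappear from the index entirely.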

OK. The problem with indexed searching is that the set of searchable
characters has to be limited; otherwise it is not possible to generate an
index file. The number of possible index keys, and with it the size of the
index file, grows very rapidly with the size of the character set.
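
To make that growth concrete, here is a small calculation under a hypothetical design in which the index is keyed on strings of a fixed length n over an alphabet of k characters, giving k^n possible keys. The message does not say how the index is actually structured; the alphabet sizes are standard counts (95 printable ASCII characters, 191 graphic ISO-8859-1 characters, and 65,536 as an upper bound for the Unicode Basic Multilingual Plane), not figures from Autopsy.

```python
def key_space(alphabet_size: int, key_length: int) -> int:
    """Number of distinct keys of exactly key_length characters (k ** n)."""
    return alphabet_size ** key_length

# Alphabet sizes for the hypothetical fixed-length-key index.
ALPHABETS = {
    "printable 7-bit ASCII": 95,
    "ISO-8859-1 graphic": 191,  # 0x20-0x7E plus 0xA0-0xFF
    "Unicode BMP (bound)": 65536,
}

if __name__ == "__main__":
    for name, k in ALPHABETS.items():
        for n in (2, 3, 4):
            print(f"{name:22} n={n}: {key_space(k, n):,}")
```

With n = 3, going from 95 to 65,536 symbols takes the key space from 857,375 to 2^48 (about 2.8 × 10^14) possible keys, which illustrates why a full Unicode alphabet is hard to index this way.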

That said, I may add the accented (diacritic) characters from the 8-bit
extended character sets, but Unicode contains far too many characters, so it
poses a real problem.
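
Adding the accented characters to the indexed set could be sketched as below, assuming the indexer scans raw bytes for runs of indexable characters (similar to `strings`) and that 8-bit data is ISO-8859-1. Other code pages, such as Windows-1252 or ISO-8859-2, put different letters at the same byte values. The names `INDEXABLE` and `extract_strings` are hypothetical. This grows the alphabet from 95 to 157 symbols.

```python
# Hypothetical indexable set: printable 7-bit ASCII (0x20-0x7E) plus the
# ISO-8859-1 accented letters 0xC0-0xFF, excluding the multiplication (0xD7)
# and division (0xF7) signs.
INDEXABLE = set(range(0x20, 0x7F)) | (set(range(0xC0, 0x100)) - {0xD7, 0xF7})

def extract_strings(raw: bytes, min_len: int = 4) -> list:
    """Return runs of indexable bytes at least min_len long, decoded as ISO-8859-1."""
    results = []
    start = None  # offset where the current run of indexable bytes began
    for i, b in enumerate(raw):
        if b in INDEXABLE:
            if start is None:
                start = i
        else:
            # A non-indexable byte ends the current run.
            if start is not None and i - start >= min_len:
                results.append(raw[start:i].decode("iso-8859-1"))
            start = None
    # Flush a run that extends to the end of the buffer.
    if start is not None and len(raw) - start >= min_len:
        results.append(raw[start:].decode("iso-8859-1"))
    return results
```

For example, scanning `b"\x00\x01caf\xe9\x00ab\x00r\xe9sum\xe9"` yields `café` and `résumé`, while the two-byte run `ab` falls below the minimum length.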

If anyone can suggest a solution, I would greatly appreciate it!

I'm still thinking about a better solution.


--
Paul Bakker

Fox-IT Experts in IT Security!
Haagweg 137 
2281 AG RIJSWIJK 
T 070 336 9999 
F 070 336 9990 
I www.fox-it.com 
E bakker@fox-it.com
57A6 C5EA 55E4 CC1C A967 B13C F8C0 C0FB 8135 E225

Disclaimer: This email may contain confidential information. If this message is not addressed to you, you may not retain or use the information in it for any purpose. If you have received it in error, please notify the sender and delete this message. We try to screen out viruses but take no responsibility if this email contains a virus.

-----------------------------------------------------------------
This list is provided by the SecurityFocus ARIS analyzer service.
For more information on this free incident handling, management 
and tracking system please see: http://aris.securityfocus.com

This archive was generated by hypermail 2b30 : Fri May 23 2003 - 08:40:57 PDT