Hi Simson,

Thanks for the response.

> If you limit to printable ASCII characters, there will be problems for
> people outside the US (or people working with data outside the US). You
> need to be able to handle Roman characters with accents. These are
> normally represented with high-bits. If the user searches for an e,
> they probably want to match on è and é and possibly other e's as well.
>
> Then you have the issue of Arabic, Hebrew, and 16-bit characters.
>
> At a minimum, I think that you should transparently handle codepages
> and coerce them into 7-bit ASCII. But ideally you should handle
> Unicode, UTF-8, UTF-16, etc. Or do something for Arabic.

OK. The problem with indexed searching is that you need a limited set of
characters to search for. Otherwise it's not possible to generate an index
file, because the size of the index file grows very quickly with the size
of the character set.

That said, I may add the accented (diacritic) characters from the 8-bit
extended ASCII range, but Unicode contains far too many characters, so
Unicode poses a problem. If anyone can suggest a fix or solution, I would
greatly appreciate it! I'm still thinking about a better solution.

-- 
Paul Bakker
Fox-IT
Experts in IT Security!
Haagweg 137
2281 AG RIJSWIJK
T 070 336 9999
F 070 336 9990
I www.fox-it.com
E bakker@fox-it.com
57A6 C5EA 55E4 CC1C A967 B13C F8C0 C0FB 8135 E225

Disclaimer: This email may contain confidential information. If this
message is not addressed to you, you may not retain or use the information
in it for any purpose. If you have received it in error, please notify the
sender and delete this message. We try to screen out viruses but take no
responsibility if this email contains a virus.

-----------------------------------------------------------------
This list is provided by the SecurityFocus ARIS analyzer service.
For more information on this free incident handling, management
and tracking system please see: http://aris.securityfocus.com
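Simson's suggestion, coercing codepage text into 7-bit ASCII so that a search for "e" also matches "è" and "é", can be sketched roughly as follows in Python. This is an illustration under stated assumptions and not Fox-IT's implementation. The function names are hypothetical, and cp1252 is only an assumed input codepage. The folding step keeps the index alphabet at plain ASCII for Latin-script text. Arabic and Hebrew letters have no ASCII decomposition, so this approach simply drops them, and that is the part of the problem that remains open.

```python
import unicodedata


def decode_codepage(raw: bytes, codepage: str = "cp1252") -> str:
    """Decode 8-bit text from a given codepage (cp1252 is an assumed default).

    Undecodable bytes become U+FFFD instead of raising an error.
    """
    return raw.decode(codepage, errors="replace")


def fold_to_ascii(text: str) -> str:
    """Hypothetical index-key folding: reduce text to 7-bit ASCII.

    NFKD splits an accented letter such as 'é' into 'e' plus a combining
    accent. The combining marks are dropped, and any character that still
    has no ASCII form (e.g. Hebrew or Arabic letters) is discarded.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return no_marks.encode("ascii", "ignore").decode("ascii")


def matches(query: str, text: str) -> bool:
    """Case- and accent-insensitive substring match on folded text."""
    return fold_to_ascii(query).lower() in fold_to_ascii(text).lower()


if __name__ == "__main__":
    latin1_bytes = b"R\xe9sum\xe9"               # 'Résumé' as cp1252 bytes
    text = decode_codepage(latin1_bytes)
    print(fold_to_ascii(text))                   # Resume
    print(matches("resume", text))               # True
    print(matches("e", "\u00e8"))                # True: 'e' matches 'è'
```

Because both the indexed text and the query pass through the same folding function, the index only ever has to cover ASCII. The trade-off is that distinctions such as "resume" versus "résumé" are lost, and scripts without a Latin decomposition cannot be searched at all.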
This archive was generated by hypermail 2b30 : Fri May 23 2003 - 08:40:57 PDT