If you are feeling ambitious, why not give the option to the user? Take the suggestions you receive from this list to determine the default behavior of the application, and then give the user the option of changing that behavior if desired. In my opinion, one of the largest benefits of using open source software is its flexibility.

Matt Bergen
Lead Information Security Officer
Wyoming Department of Employment

>>> "Simson L. Garfinkel" <simsongat_private> 05/22/03 09:27AM >>>
Paul,

Here are some issues you may not have considered:

> Issue 1:
> I think it is advisable to limit the indexed character range to only
> alphanumeric characters instead of the current limitation of all
> printable ASCII characters.

If you limit to printable ASCII characters, there will be problems for people outside the US (or people working with data from outside the US). You need to be able to handle roman characters with accents. These are normally represented with the high bit set. If the user searches for an e, they probably want to match on è and é and possibly other e's as well.

Then you have the issue of Arabic, Hebrew, and 16-bit characters.

At a minimum, I think that you should transparently handle codepages and coerce them into 7-bit ASCII. But ideally you should handle UNICODE, UTF-8, UTF-16, etc. Or do something for Arabic.

> Issue 2:
> Human readability of the files. A speedup in the indexed searching
> process and a reduction of the size of the used files can be
> accomplished by changing the format of the index files. The
> consequence is that these cannot be read by a human anymore (No more
> text-format file). The consequences are the following:
> - POSITIVE: Speed of searches is increased
> - POSITIVE: Size of used files is reduced
> - NEGATIVE: Files cannot be checked anymore with the human eye.

I do not think that this is important. The index files should be in binary; create a tool to browse or view them.
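
The "coerce codepages into 7-bit ASCII" suggestion under Issue 1 could be sketched roughly as follows. This is only an illustration in Python, not code from the tool under discussion: the function names `fold_to_ascii` and `fold_bytes` and the default codepage `latin-1` are assumptions made for the example. It decodes the raw bytes with a caller-supplied codepage, decomposes accented letters, and drops the accent marks so that a search for "e" also matches è and é.

```python
import unicodedata


def fold_to_ascii(text: str) -> str:
    """Return a 7-bit ASCII approximation of text for indexing.

    Accented roman letters are decomposed (NFKD) and their combining
    marks removed, so "é" and "è" both become "e". Characters with no
    ASCII equivalent (Arabic, Hebrew, CJK, ...) are silently dropped,
    which is why this is only a minimum and not a full Unicode solution.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.encode("ascii", errors="ignore").decode("ascii")


def fold_bytes(raw: bytes, codepage: str = "latin-1") -> str:
    """Decode raw bytes with the given codepage (assumed known), then fold."""
    return fold_to_ascii(raw.decode(codepage, errors="replace"))


if __name__ == "__main__":
    # 0xE9 is "é" in Latin-1, i.e. a roman character with the high bit set.
    print(fold_bytes(b"caf\xe9"))
    print(fold_to_ascii("élève"))
```

Folding both the indexed data and the search term with the same function keeps matching consistent. Handling Arabic, Hebrew, or 16-bit characters properly would require indexing Unicode directly instead of discarding those characters.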
-----------------------------------------------------------------
This list is provided by the SecurityFocus ARIS analyzer service.
For more information on this free incident handling, management
and tracking system please see: http://aris.securityfocus.com
This archive was generated by hypermail 2b30 : Thu May 22 2003 - 12:23:22 PDT