Marc Slemko wrote:

> Also note that filtering or encoding things is not as easy as you may
> think. There are far too many very annoying things, including character set
> issues and browser-specific extensions.

It is if you only accept ASCII/ISO-8859-1 (or another defined character set) with some simple markup extensions. The markup extension could be a small, strict subset of HTML, or a completely different one.

I do not understand why everyone claims that sanitizing HTML content is that hard. For most applications where it is needed, the fancy features of HTML simply aren't needed. If you are reading email, it does not matter much if the layout does not match 100% of what the original author intended, as long as the information content is properly presented and you know that you can safely view the content.

For the case of publishing information on a shared web site, strict HTML filtering is also beneficial: it forces all authors to use a common HTML dialect that is guaranteed not to disturb the layout or presentation enforced by the site, and it helps keep the authors on track, providing the information rather than fiddling around too much with layout or presentation details.

If you question the validity of this approach to information processing, visit your nearest large newspaper and study the flow of information there.

You need to take separate views on information and layout. The two are quite separate from each other. Defining a strict syntax for information isn't hard; doing so for HTML layout without pre-defined style sheets is a tricky issue.

--
Henrik Nordstrom
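
As a rough illustration of the "small, strict subset of HTML" idea above, here is a minimal sketch in Python using the standard library's html.parser. The tag whitelist and the names StrictFilter and sanitize are assumptions made for this example and are not from the original post. The sketch assumes the input has already been decoded from a known character set, emits only whitelisted tags with every attribute dropped, and re-escapes all text.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: a small strict subset, chosen for illustration only.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "ul", "ol", "li", "pre", "blockquote"}
VOID_TAGS = {"br"}                  # allowed tags that never take an end tag
DROP_CONTENT = {"script", "style"}  # elements whose text is discarded as well


class StrictFilter(HTMLParser):
    """Re-emit only whitelisted tags, without attributes, and escaped text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # output fragments
        self.open_tags = []  # whitelisted tags currently open
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        # attrs is ignored on purpose: no attribute survives the filter.
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag in VOID_TAGS:
            self.out.append(f"<{tag}>")
        elif tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in self.open_tags:
            # Close anything left open inside this element, then the element.
            while True:
                inner = self.open_tags.pop()
                self.out.append(f"</{inner}>")
                if inner == tag:
                    break

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=True))

    def result(self):
        # Close whatever the author left open so the site layout is not disturbed.
        closing = [f"</{t}>" for t in reversed(self.open_tags)]
        return "".join(self.out + closing)


def sanitize(markup):
    """Return markup reduced to the whitelisted subset."""
    parser = StrictFilter()
    parser.feed(markup)
    parser.close()
    return parser.result()


if __name__ == "__main__":
    print(sanitize('<p onclick="x()">Hi <b>there</b><script>alert(1)</script></p>'))
```

The design choice is to rebuild the output from scratch instead of trying to remove the dangerous parts: comments, unknown tags, and all attributes are simply never written out, so a browser-specific extension that the filter does not know about is dropped by default.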
This archive was generated by hypermail 2b30 : Fri Apr 13 2001 - 15:33:41 PDT