I can't actually tell whether this is a joke or not. In either case, it's funny :-)

Crispin

SPAM/PORN FILTER wrote:

>This note has been flagged as
>    Likely PORN
>    Possibly SPAM
>
>The following test(s) were positive
>    the word(s)
>        'penis'
>    followed by the phrase(s)
>        'move', 'back and forth across', and 'forcing'
>
>    Canadian Grammar/Phrasing/Spelling
>
>The results of the test(s) show that this is
>    77% likely PORN (23/30)
>    81% likely SPAM (27/33)
>
>On Tue, 2003-06-03 at 13:39, Crispin Cowan wrote:
>
>>Shaun Savage wrote:
>>
>>>It looks at raw text. The tokens are found using a fixed set of
>>>delimiters. The reason for this is the Mozilla spam filter uses the
>>>HTML tags to help determine spam; a lot of spam uses 'color' fonts.
>>>Also, one of the delimiters is '<' '>', so it can't determine what is
>>>an HTML tag.
>>
>>Thanks!
>>
>>Unfortunate that it is only looking at raw text. There is valuable info
>>in the formatted text, precisely because of this hack of splitting words
>>with HTML comments, so that word-recognizing filters like Bayes won't
>>recognize "pe<!-- interruption -->nis" as "penis". The spammer can move
>>the interruption back and forth across the word, put arbitrarily clean
>>text (e.g. from project Gutenberg) in the "interruption", forcing 10X
>>training time on the Bayesian filter.
>>
>>Crispin

-- 
Crispin Cowan, Ph.D.           http://immunix.com/~crispin/
Chief Scientist, Immunix       http://immunix.com
                               http://www.immunix.com/shop/
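A minimal Python sketch of the tokenization problem discussed in this thread. It assumes a tokenizer that splits raw text on a fixed delimiter set including '<' and '>'; the rest of that set, and the names `tokenize_raw` and `tokenize_stripped`, are hypothetical and not taken from any filter mentioned here. It shows how "pe<!-- interruption -->nis" falls apart into harmless fragments, and how removing HTML comments before tokenizing rejoins the word.

```python
import re

# Hypothetical delimiter set: the thread only says it is fixed and includes
# '<' and '>'; whitespace and the punctuation below are assumptions.
DELIMITERS = re.compile(r"[\s<>.,;:!?()\-]+")

# Non-greedy match so each HTML comment is removed separately.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def tokenize_raw(text: str) -> list[str]:
    """Split raw text on the fixed delimiter set (no HTML awareness)."""
    return [tok.lower() for tok in DELIMITERS.split(text) if tok]


def tokenize_stripped(text: str) -> list[str]:
    """Drop HTML comments first, then tokenize, so split words are rejoined."""
    return tokenize_raw(HTML_COMMENT.sub("", text))


if __name__ == "__main__":
    msg = "pe<!-- interruption -->nis"
    print(tokenize_raw(msg))       # fragments: 'pe', 'interruption', 'nis'
    print(tokenize_stripped(msg))  # the rejoined word
```

Stripping comments with a regex is only one possible countermeasure and assumes well-formed `<!-- ... -->` comments; it also discards any clean filler text the spammer placed inside the interruption, which addresses the training-cost concern above.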
This archive was generated by hypermail 2b30 : Wed Jun 04 2003 - 02:17:23 PDT