On 11-Aug-98 Vitiello, Eric (BHS) wrote:
>> [From an anti-mail-exploit-procmail-filter-perl-script (see
>> http://www.wolfenet.com/~jhardin/procmail-security.html):]
>>
>> > s/<BODY\s+(([^">]+("(\\.|[^"])*")?)*)ONLOAD/<BODY $1 DEFANGED-ONLOAD/gi;
>>
>> This pattern will catch lines like
>>   <body onload="badthings()">
>> converted to
>>   <BODY DEFANGED-ONLOAD="badthings()">
>> but not
>>   <body onload="badthings()" onload="badthings()">
>> which is converted to
>>   <BODY onload="badthings()" DEFANGED-ONLOAD="badthings()">
>> So one onload=... will stay and act.
>>
>> Also, things like < body ... > won't be caught. I don't know whether
>> those leading spaces are proper HTML, but even if they are not, one
>> should not assume that every piece of bad HTML will be rejected.
>
> The following can fix all of that:
>
> s/<\s+BODY\s+((([^">]+("(\\.|[^"])*")?)*)ONLOAD)*?\s+/<BODY $1 DEFANGED-ONLOAD/gi;

Actually, I believe the RE that you are looking for is this:

s/<\s*BODY\s+((([^">]+("(\\.|[^"])*")?)*)ONLOAD)*?\s*/<BODY $1 DEFANGED-ONLOAD/gi;

The \s+ will only match one or more whitespace characters, meaning that

<body onload="badthings()" onload="badthings()">

would not be caught, because there are no spaces between < and body,
but \s* will match zero or more whitespace characters. This will catch

<body onload="badthings()" onload="badthings()">

and

< body onload="badthings()" onload="badthings()" >

--Alec--
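A note on the last pattern above: the lazy group (...ONLOAD)*? is followed by \s*, and both are allowed to match nothing, so the substitution can succeed right after <BODY and its trailing whitespace without consuming any ONLOAD at all. One alternative, sketched below under stated assumptions, is to work in two passes: first isolate each BODY tag, then rename every ONLOAD that sits outside a double-quoted value. This is not the method from the thread or from the procmail filter itself. The names defang_attrs and defang_body_onload are hypothetical, and the sketch assumes attribute values are either unquoted or double-quoted (single quotes and backslash escapes are not handled).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): rename every ONLOAD attribute
# name in a tag's attribute text, leaving double-quoted values untouched.
sub defang_attrs {
    my ($attrs) = @_;
    # Alternative 1 copies a quoted string through unchanged.
    # Alternative 2 matches a bare ONLOAD word; the lookbehind skips
    # names that were already renamed to DEFANGED-ONLOAD.
    $attrs =~ s{("[^"]*")|(?<![\w-])ONLOAD\b}
               { defined $1 ? $1 : 'DEFANGED-ONLOAD' }gei;
    return $attrs;
}

# Hypothetical helper: apply defang_attrs to every <BODY ...> tag,
# tolerating whitespace between '<' and BODY.
sub defang_body_onload {
    my ($html) = @_;
    # The attribute part steps over quoted strings as units, so a '>'
    # inside a quoted value does not end the tag early.
    $html =~ s{(<\s*BODY\b)((?:[^">]|"[^"]*")*)>}{
        my ($open, $attrs) = ($1, $2);   # copy before the inner match runs
        $open . defang_attrs($attrs) . '>';
    }gei;
    return $html;
}

# Usage on the two samples discussed in the thread.
print defang_body_onload('<body onload="badthings()" onload="badthings()">'), "\n";
print defang_body_onload('< body onload="badthings()" onload="badthings()" >'), "\n";
```

Unlike the one-line substitutions, this keeps the original spelling of the tag name and renames each ONLOAD in place, so running it twice leaves already defanged tags alone.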
This archive was generated by hypermail 2b30 : Fri Apr 13 2001 - 14:12:07 PDT