Tutorial: Need help making this regex for my whitelist



Question:

I am using a rich HTML editor and I want to make a whitelist of the markup that should be allowed in.

I heard that you should use a whitelist instead of a blacklist, since it is easier than trying to make a blacklist that covers everything.

I have even seen some examples where people could hide a script tag inside a CSS style attribute.

So this is a sample of what the editor generates:

<span _moz_dirty="" style="font-weight: bold;">  aaaaaaaaaaaa  <br _moz_dirty=""/>  ffffffffffff  <br _moz_dirty=""/>  <span _moz_dirty="" style="text-decoration: underline;">  fffffffff  <br _moz_dirty=""/>  </span>  <span _moz_dirty="" style="text-decoration: line-through;">  aaaaaaaaaa  <br _moz_dirty=""/>  <sub _moz_dirty="">  </sub>  <sup _moz_dirty="">ggg</sup>  <sub _moz_dirty="">  </sub>  </span>  </span>  <ol _moz_dirty="">  <li _moz_dirty="">1333</li>  <li _moz_dirty="">ff</li>  </ol>  <ul _moz_dirty="">  <li _moz_dirty="">ggg</li>  <li _moz_dirty="">ff</li>  </ul>  <div _moz_dirty="" style="margin-left: 40px;">  ffffff  <br _moz_dirty=""/>  </div>  fff  <br _moz_dirty=""/>  <br _moz_dirty=""/>  <a _moz_dirty="" href="http://">ffff</a>  <br _moz_dirty="" type="_moz"/>  <span _moz_dirty="" style="font-weight: bold;">  <span _moz_dirty="" style="text-decoration: underline;"/>  </span>  

So I guess my whitelist would allow these tags, with only the right style values:

<span>  style - font-weight: bold, text-decoration: underline, margin-left, margin-right  <br />  <a>  <ol>  <ul>  <li>  

So I am trying to make a regex that I can pop into my C# code to check for only these tags.

So I tried to start with the style stuff

style="[^font\-style|weight]+\s*:\s*[bold|italic]+\s*;\s*"  

but it does not work. I tried to change things around from the sample I gave you but nothing shows up.


Solution:1

You are using square brackets, which create a character class (a set of single characters); you should instead use parentheses to indicate alternatives, i.e.

font-(style|weight)  

The + is redundant (you don't want one or more, right?).
I think your regex should be something like

Regex regex = new Regex(@"font-(style|weight)\s*:\s*(bold|italic)\s*;\s*");  

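Another thing: outside square brackets '^' indicates the beginning of the line/string, and as the first character inside square brackets it negates the character class, so you should remove it.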
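To make the difference concrete, here is a small sketch that runs both the pattern from the question and a corrected one against the first tag of the editor sample. The class name StyleRegexDemo and the surrounding style="..." wrapper in the corrected pattern are assumptions for illustration, not something taken from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StyleRegexDemo
{
    // Pattern from the question: [...] is a character class and the
    // leading ^ negates it, so it means "one or more characters that are
    // NOT any of f, o, n, t, -, s, y, l, e, |, w, i, g, h".
    public static readonly Regex CharClass =
        new Regex(@"style=""[^font\-style|weight]+\s*:\s*[bold|italic]+\s*;\s*""");

    // Corrected sketch: (...) groups whole-word alternatives.
    public static readonly Regex Alternation =
        new Regex(@"style=""font-(style|weight)\s*:\s*(bold|italic)\s*;\s*""");

    public static void Main()
    {
        // First tag of the editor output shown in the question.
        string sample = "<span _moz_dirty=\"\" style=\"font-weight: bold;\">";

        Console.WriteLine(CharClass.IsMatch(sample));   // False
        Console.WriteLine(Alternation.IsMatch(sample)); // True
    }
}
```

The first pattern fails because the character right after style=" is an f, which the negated class excludes.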


Solution:2

Similar to this question: How do I filter all HTML tags except a certain whitelist?
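As a rough sketch of the tag-whitelist idea (not the approach from the linked question, which is not reproduced here), the following checks that every tag name in a string is on the list from the question. The names TagWhitelist and OnlyAllowedTags are hypothetical. It only looks at tag names, not attributes or style values, and a regex is a fragile way to read HTML, so treat it as a first check rather than a sanitizer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagWhitelist
{
    // Tags listed in the question; compared case-insensitively.
    private static readonly HashSet<string> Allowed =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "span", "br", "a", "ol", "ul", "li" };

    // Captures the name of every opening or closing tag.
    private static readonly Regex TagName =
        new Regex(@"</?\s*([A-Za-z][A-Za-z0-9]*)");

    // Returns true when every tag name found is in the whitelist.
    public static bool OnlyAllowedTags(string html)
    {
        foreach (Match m in TagName.Matches(html))
        {
            if (!Allowed.Contains(m.Groups[1].Value))
            {
                return false;
            }
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(OnlyAllowedTags("<span>abc<br/></span>"));     // True
        Console.WriteLine(OnlyAllowedTags("<script>alert(1)</script>")); // False
    }
}
```

Note that the editor sample in the question also contains sub, sup and div, so those would need to be added to the set if they should be accepted.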


Solution:3

Have you escaped your backslashes?
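For context, in a regular C# string literal a regex backslash has to be doubled, while a verbatim string (prefixed with @) takes it as written. The sketch below, with a hypothetical class name EscapeDemo, shows that both spellings produce the same pattern text.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Regular literal: each regex backslash is written twice.
    public const string Escaped = "font-weight\\s*:\\s*bold";

    // Verbatim literal: backslashes are taken as-is.
    public const string Verbatim = @"font-weight\s*:\s*bold";

    public static void Main()
    {
        Console.WriteLine(Escaped == Verbatim);                              // True
        Console.WriteLine(Regex.IsMatch("font-weight : bold", Verbatim));    // True
    }
}
```

Writing "\s" in a regular literal without doubling the backslash does not compile, because \s is not a recognised C# escape sequence.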

