Tutorial: Libs for HTML sanitizing



Question:

I'm looking for an HTML sanitizer that I can call through an API to sanitise strings I receive from my web app. Are there any useful, easy-to-use libraries available? Does anyone know of one or two?

I don't need anything big; it just has to find unclosed tags and close them.
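
To make the requirement concrete, here is a minimal sketch of "find unclosed tags and close them" using only the Java standard library. The class name `TagCloser` and the method `closeTags` are made up for illustration and do not belong to any library mentioned in this thread. It assumes simple, reasonably well-formed markup: it does not understand comments, CDATA, or script contents, and it does nothing to protect against XSS, so for untrusted input one of the libraries suggested in the solutions is the better choice.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of any library in this thread. */
final class TagCloser {
    // Elements that never take a closing tag.
    private static final Set<String> VOID_ELEMENTS = Set.of(
        "area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr");

    // Group 1 is "/" for a closing tag, group 2 is the element name,
    // group 3 is whatever follows the name (attributes, optional "/").
    private static final Pattern TAG =
        Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)([^<>]*)>");

    static String closeTags(String html) {
        Deque<String> open = new ArrayDeque<>(); // stack of open elements
        StringBuilder out = new StringBuilder();
        Matcher m = TAG.matcher(html);
        int last = 0;
        while (m.find()) {
            out.append(html, last, m.start()); // copy text between tags
            last = m.end();
            String name = m.group(2).toLowerCase(Locale.ROOT);
            boolean closing = !m.group(1).isEmpty();
            boolean selfClosing = m.group(3).endsWith("/");
            if (!closing) {
                out.append(m.group());
                if (!selfClosing && !VOID_ELEMENTS.contains(name)) {
                    open.push(name);
                }
            } else if (open.contains(name)) {
                // Close inner elements that were left open, then the match itself.
                String top;
                do {
                    top = open.pop();
                    out.append("</").append(top).append('>');
                } while (!top.equals(name));
            }
            // A closing tag with no matching opener is silently dropped.
        }
        out.append(html, last, html.length());
        // Close everything still open at the end of the input.
        while (!open.isEmpty()) {
            out.append("</").append(open.pop()).append('>');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints <p>Hello <b>world</b></p>
        System.out.println(closeTags("<p>Hello <b>world"));
    }
}
```

The stack is the whole idea: every opening tag is pushed, every closing tag pops back to its match, and whatever remains on the stack at the end gets closed in reverse order. Real HTML parsers apply far more rules than this, which is why the answers below point to existing libraries.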


Solution:1

JTidy may help you.


Solution:2

https://github.com/OWASP/java-html-sanitizer is now marked ready for production use.

A fast and easy to configure HTML Sanitizer written in Java which lets you include HTML authored by third-parties in your web application while protecting against XSS.

You can use prepackaged policies:

    PolicyFactory policy = Sanitizers.FORMATTING.and(Sanitizers.LINKS);

or the tests show how you can configure your own easily:

    PolicyFactory policy = new HtmlPolicyBuilder()
        .allowElements("a")
        .allowUrlProtocols("https")
        .allowAttributes("href").onElements("a")
        .requireRelNofollowOnLinks()
        .toFactory();

or write custom policies to do things like changing h1s to divs with a certain class:

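    PolicyFactory policy = new HtmlPolicyBuilder()
        .allowElements("h1", "p")
        .allowElements(
            new ElementPolicy() {
              @Override
              public String apply(String elementName, List<String> attrs) {
                // attrs is a flat list of alternating names and values.
                attrs.add("class");
                attrs.add("header-" + elementName);
                // The returned name replaces the original element name.
                return "div";
              }
            }, "h1")
        .toFactory();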


Solution:3

Apart from JTidy, you can also take a look at:
NekoHTML
TagSoup
Getting text in HTML document


Solution:4

The HTML parser jsoup also supports sanitisation by policy: http://jsoup.org/cookbook/cleaning-html/whitelist-sanitizer


Solution:5

http://roberto.open-lab.com/2009/11/05/a-java-html-sanitizer-also-against-xss/

