Tutorial: C# Regular Expression to Replace a Custom HTML Tag



Question:

My application collects HTML content provided by internal users, which is used to dynamically build articles on the company website.

I want to implement a feature whereby users can surround a word or phrase in the HTML content with a special tag, <search>....</search>. When the content is saved in the database, the application will convert <search>WORD/PHRASE</search> to, say, www.google.com/?q=WORD/PHRASE after encoding the word or phrase.

I think regular expressions can be used to achieve this, but I need some guidance on how to go about it, since there could be more than one <search>....</search> tag in the HTML content.

Any help with examples is appreciated.


Solution:1

Something like this should work:

string data = @"some text <search>search term 1</search> some more text <search>another search term</search>";
Console.WriteLine(Regex.Replace(data, @"(?:<search>)(.*?)(?:</search>)", @"<a href=""http://www.google.com/?q=$1"">$1</a>"));
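The replacement string above copies the captured text into the URL unchanged, while the question asks for the word or phrase to be encoded. A minimal sketch of one way to do that is to pass a MatchEvaluator instead of a replacement string. The class and method names here are hypothetical, and the choice to HTML-decode the term before URL-escaping it is an assumption about how the stored content is written:

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class SearchTagConverter
{
    // Lazy (.*?) so each <search>...</search> pair is matched on its own.
    // Singleline lets a phrase span line breaks; IgnoreCase also accepts <SEARCH>.
    private static readonly Regex SearchTag = new Regex(
        @"<search>(.*?)</search>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Hypothetical helper: turns every <search> tag into a Google link.
    public static string ReplaceSearchTags(string html)
    {
        return SearchTag.Replace(html, match =>
        {
            string term = match.Groups[1].Value;
            // Assumption: the term may contain HTML entities such as &amp;,
            // so decode it before building the query string.
            string query = Uri.EscapeDataString(WebUtility.HtmlDecode(term));
            return "<a href=\"http://www.google.com/?q=" + query + "\">" + term + "</a>";
        });
    }

    public static void Main()
    {
        string data = "some text <search>search term 1</search> some more text <search>C#</search>";
        Console.WriteLine(ReplaceSearchTags(data));
    }
}
```

Because the evaluator runs once per match, any number of <search> tags in the content is handled in a single call. This sketch still assumes the tags are not nested and carry no attributes.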


Solution:2

You should consider using an HTML DOM to parse the contents rather than regular expressions. Regexes meant to parse HTML are notorious for being both complicated and prone to unexpected bugs.


Solution:3

You might try

Regex.Replace(strMyHtmlInputString, @"<search>(.+?)</search>", "www.google.com/?q=$1")

The question mark after the + makes the quantifier lazy: the group matches as few characters as possible, so each <search> pairs with the nearest </search> rather than the last one in the input.
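To see why the lazy quantifier matters when the content holds more than one tag, here is a small sketch comparing the first capture of a greedy and a lazy pattern on the same input. The class and helper names are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyVersusGreedy
{
    // Hypothetical helper: returns the text captured by group 1 of the first match.
    public static string FirstCapture(string input, string pattern)
    {
        return Regex.Match(input, pattern).Groups[1].Value;
    }

    public static void Main()
    {
        string data = "<search>one</search> and <search>two</search>";

        // Greedy: .+ runs to the LAST </search>, swallowing both tags.
        Console.WriteLine(FirstCapture(data, "<search>(.+)</search>"));
        // prints: one</search> and <search>two

        // Lazy: .+? stops at the FIRST </search>.
        Console.WriteLine(FirstCapture(data, "<search>(.+?)</search>"));
        // prints: one
    }
}
```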


Solution:4

Regular expressions are bad at handling XML/HTML data. You're better off using a real HTML or XML reading API. Regular expressions run into problems when you're dealing with HTML that has nested tags within it, for example.

If you're getting tag-soup HTML, which you most likely are, you won't be able to use .NET's native XmlDocument class without a lot of stress. You should look into the HtmlAgilityPack, which has an API modeled on XmlDocument's, but includes some HTML-specific features such as cleaning up HTML to be well-formed.

This example uses the XmlDocument class, but using the HtmlAgilityPack's HtmlDocument should be very similar (only using an HtmlDocument instead of an XmlDocument). This should replace the first <search> tag with a link to Google.

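XmlDocument doc = new XmlDocument();
doc.LoadXml(xml);
XmlNode searchTag = doc.SelectSingleNode("//search");
XmlElement linkTag = doc.CreateElement("a");
linkTag.InnerXml = searchTag.InnerXml;
// SetAttribute creates the href attribute; Attributes["href"] is null on a new element
linkTag.SetAttribute("href", "http://google.com/?q=" + Uri.EscapeDataString(linkTag.InnerText));
// ReplaceChild takes the new node first, then the node being replaced
searchTag.ParentNode.ReplaceChild(linkTag, searchTag);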

Disclaimer: I have not tested this example code above, but it should work.
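Since the question mentions that there may be several <search> tags, the same DOM idea can be extended with SelectNodes. The following is only a sketch under a few assumptions: the input is well-formed XML (tag-soup HTML would need something like the HtmlAgilityPack instead), the tags are not nested, and the class and method names are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class SearchTagDom
{
    // Hypothetical helper: replaces every <search> element with an <a> link.
    public static string ReplaceSearchTags(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.PreserveWhitespace = true; // keep the author's spacing intact
        doc.LoadXml(xml);

        // Copy the matches first: the list returned by SelectNodes is only
        // reliable while the document is unchanged, and we are about to edit it.
        List<XmlNode> searchTags = new List<XmlNode>();
        foreach (XmlNode node in doc.SelectNodes("//search"))
        {
            searchTags.Add(node);
        }

        foreach (XmlNode searchTag in searchTags)
        {
            XmlElement linkTag = doc.CreateElement("a");
            linkTag.InnerXml = searchTag.InnerXml;
            linkTag.SetAttribute(
                "href",
                "http://google.com/?q=" + Uri.EscapeDataString(searchTag.InnerText));
            // New node first, then the node it replaces.
            searchTag.ParentNode.ReplaceChild(linkTag, searchTag);
        }

        return doc.OuterXml;
    }

    public static void Main()
    {
        string xml = "<p>Try <search>regular expressions</search> or <search>C#</search>.</p>";
        Console.WriteLine(ReplaceSearchTags(xml));
    }
}
```

Using InnerText for the query means any markup inside the tag (for example <b>) is dropped from the URL but kept in the visible link text.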


Solution:5

This should be pretty easy with non-greedy (lazy) matching, assuming you can't nest search tags.

Replacing on <search>(.*?)</search> is going to be key.

