Tutorial: Using Regex in Yahoo Pipes to “clean” RSS feeds


I need some help creating a Yahoo Pipe that strips certain elements from an RSS feed. To clarify: I would use the regex code in Yahoo Pipes. I presume the regex syntax is universal?

I've broken the question up into some sub-questions:

  1. What would be the regex for removing/stripping a specific HTML tag (one that has its own class)?

  2. How can I strip links from linked images but keep image markup?

  3. How can I add sequential classes to all links found in a feed item? If there are 5 links in a single feed item, they would be given classes: link001, link002, link003, link004, link005...

Due to new-account limitations, the code examples can be found here: Using Regex in Yahoo pipes

Regex is not exactly my forte... so any help would be greatly appreciated! Thanks a lot!


Regular expression syntax certainly isn't universal. See my regex flavor comparison. Unfortunately the Yahoo Pipes docs don't say what regex flavor they use. The examples look like Perl-style regexes, so that's what I'll use.

To remove a specific HTML tag (e.g. span) with a specific class attribute (e.g. someclass), search for:


and replace with:


The above regex will fail if the span tag you're trying to remove contains a nested span tag.
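As a rough illustration of this kind of search-and-replace, here is a minimal Python sketch. The pattern and the `strip_span` helper are my own assumptions for demonstration, not necessarily the exact expression intended above, and Yahoo Pipes may use a different replacement syntax (e.g. `$1` instead of `\1`). The last call shows the nested-span failure.

```python
import re

# Assumed pattern (hypothetical): a <span> whose class attribute is exactly
# "someclass", then the shortest run of content up to the next </span>.
SPAN_RE = re.compile(
    r'<span\b[^>]*\sclass="someclass"[^>]*>(.*?)</span>',
    re.IGNORECASE | re.DOTALL,
)

def strip_span(html, keep_content=True):
    """Remove the matching span tags; optionally drop their content too."""
    replacement = r"\1" if keep_content else ""
    return SPAN_RE.sub(replacement, html)

# Simple case: the tags are removed and the text is kept.
print(strip_span('<p>a <span class="someclass">Content</span> b</p>'))

# Nested case: the lazy match stops at the FIRST </span>, so the inner
# opening tag and the outer closing tag are left behind.
print(strip_span('<span class="someclass">x<span>y</span>z</span>'))
```

Passing `keep_content=False` deletes the span together with its content, which may be what you want if the tag wraps an advertisement or similar.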

To delete any a tag that has an img tag as the first thing in its content, search for:


and replace with:


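A hedged Python sketch of the same idea follows. The pattern and the `unlink_images` name are assumptions of mine, and the replacement keeps everything between the `a` tags, so the image markup survives while the link is dropped. Text-only links are left alone because the pattern requires an `img` tag right after the opening `a` tag.

```python
import re

# Assumed pattern (hypothetical): an opening <a ...> tag, optional whitespace,
# then an <img ...> tag as the first thing in the link's content.
LINKED_IMG_RE = re.compile(
    r'<a\b[^>]*>\s*(<img\b[^>]*>.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)

def unlink_images(html):
    """Drop the <a> wrapper around linked images, keeping the content."""
    return LINKED_IMG_RE.sub(r"\1", html)

sample = '<a href="y.html">text</a> <a href="x.html"><img src="x.png"></a>'
print(unlink_images(sample))
```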
The third item in your question cannot be done with regular expressions alone. You'll need a facility to increment the number in the replacement, and I don't know if Yahoo Pipes supports something like that. The matching part doesn't really need a regex: simply search for the literal text <a and replace it with <a class="link001", incrementing the number for each link found.
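Outside Yahoo Pipes, a language with callback replacements can do the incrementing. The following is a minimal Python sketch. The `number_links` helper is hypothetical, and it assumes the links do not already carry a class attribute. The counter restarts on every call, so each feed item would be passed in separately.

```python
import re
from itertools import count

def number_links(html):
    """Add class="link001", "link002", ... to each <a> tag in one feed item."""
    counter = count(1)  # restarts at 1 for every call (one call per item)

    def add_class(match):
        # Called once per match; formats the running number with 3 digits.
        return '<a class="link{:03d}"'.format(next(counter))

    # Match "<a" only when followed by whitespace or ">", so tags such as
    # <abbr> or <article> are not touched.
    return re.sub(r'<a(?=[\s>])', add_class, html, flags=re.IGNORECASE)

print(number_links('<a href="1">x</a> <a href="2">y</a>'))
```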

Of course, all the caveats about manipulating HTML/XML with regular expressions apply. The regexes work on the examples you gave, but they may not work as intended on every possible piece of HTML.
