Tutorial :Regex for tree structures?


Are there regular expression equivalents for searching and modifying tree structures? Concise mini-languages (like perl regex) are what I am looking for.

Here is an example that might clarify what I am looking for.

<root>    <node name="1">      subtrees ....    </node>    <node name="2">      <node name="2.1">       data      </node>      other subtrees...    </node>  </root>  

An operation that would be possible on the above tree is "move subtree at node 2.1 into the subtree at node 1." The result of the operation might look something like..

<root>    <node name="1">      subtrees ....      <node name="2.1">       data      </node>    </node>    <node name="2">      other subtrees...    </node>  </root>  

Search and replace operations like find all nodes with atleast 2 children, find all nodes whose data starts with "a" and replace it with "b" if the subtrees have atleast 2 other siblings, etc. should be supported.

For strings, where the only dimension is across the length of the string, we can do many of above operations (or their 1D equivalents) using regular expressions. I wonder if there are equivalents for trees. (instead of a single regex, you might need to write a set of transformation rules, but that is ok).

I would like to know if there is some simple mini language (not regex per.se, but something that is as accessible as regex via libraries, etc..). to perform these operations? Preferably, as a python library.


TSurgeon and Tregex from Stanford is capable of doing that. You can download the library from http://nlp.stanford.edu/software/tregex.shtml


I don't know a general purpose langugae that can do that, but it seems to me that you are looking for something like XPath.


There is TXL for pattern based tree rewriting.

Tree rewriting with patterns is also done with parser toolkits such as ANTLR

Code generation with bottom up tree rewriting, google BURS or BURG.


Navigating through a binary search tree requires state (in which node I am?) and comparisons (is that value less or greater than that?), things that cannot be done by a finite state automaton.

Sure, you can search for the node with a given value, but how then you could, for example, remove a node that isn't a leaf if you don't know its parent?

And even if you know the parent via the information supplied by the node, how do you determine the minimum of the left subtree, remove it and place it in the node?

I think you're asking too much to FSA.


This article gives some tasty hints about recursive Perl regular expressions, but honestly it's rare to see a tree structure approached this way.

More typically, one would write a state machine style parser, that might use regexes to parse each particular node in the tree.

Expat is probably a good example to look at.


Pattern Matching, provided by languages such as Scala, F#, Erlang and Haskell (I'm sure there's more) is designed to succinctly manipulate data structures like trees, esp when used with recursion.

here is a very high level view of what pattren matching can do in Scala. The examples shown really don't do pattern matching justice.

Wikipedia has a couple of references to pattern matching, too. Here and here.


I'm somewhat surprised that XSLT hasn't come up as an answer. Granted, I don't think it's a particularly elegant language, and most existent solutions tend to favour procedural approaches rather than pattern matching, and it's gotten a mighty bad rep from being blindly applied just because it's XML being applied to XML -- but otherwise it fits the bill. Pity its canonical representation is so verbose, though...

Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Next Post »