Tutorial: Expressing recursion in LINQ



Question:

I am writing a LINQ provider to a hierarchical data source. I find it easiest to design my API by writing examples showing how I want to use it, and then coding to support those use cases.

One thing I am having trouble with is an easy/reusable/elegant way to express "deep query" or recursion in a LINQ statement. In other words, what is the best way to distinguish between:

from item in immediate-descendants-of-current-node where ... select item  

versus:

from item in all-descendants-of-current-node where ... select item  

(Edit: please note that neither of the examples above necessarily reflects the structure of the query I want. I am interested in any good way to express recursion/depth.)

Please note I am not asking how to implement such a provider, or how to write my IQueryable or IEnumerable in such a way that allows recursion. I am asking from the standpoint of a person writing the LINQ query and utilizing my provider - what is an intuitive way for them to express whether they want to recurse or not?

The data structure resembles a typical file system: a folder can contain a collection of subfolders, and a folder can also contain a collection of items. So myFolder.Folders represents all the folders that are immediate children of myFolder, and myFolder.Items contains all the items immediately within myFolder. Here's a basic example of a site hierarchy, much like a filesystem with folders and pages:

    (F)Products
        (F)Light Trucks
            (F)Z150
                (I)Pictures
                (I)Specs
                (I)Reviews
            (F)Z250
                (I)Pictures
                (I)Specs
                (I)Reviews
            (F)Z350
                (I)Pictures
                (I)Specs
                (I)Reviews
            (I)Splash Page
        (F)Heavy Trucks
        (F)Consumer Vehicles
        (I)Overview

If I write:

from item in lightTrucks.Items where item.Title == "Pictures" select item  

What is the most intuitive way to express an intent that the query get all items underneath Light Trucks, or only the immediate ones? The least-intrusive, lowest-friction way to distinguish between the two intents?

My #1 goal is to be able to turn this LINQ provider over to other developers who have an average understanding of LINQ and allow them to write both recursive and list queries without giving them a tutorial on writing recursive lambdas. Given a usage that looks good, I can code the provider against that.

Additional clarification: (I am really sucking at communicating this!) - This LINQ provider is to an external system; it is not simply walking an object graph, nor, in this specific case, does a recursive expression actually translate into any kind of true recursive activity under the hood. I just need a way to distinguish between a "deep" query and a "shallow" one.

So, what do you think is the best way to express it? Or is there a standard way of expressing it that I've missed out on?


Solution:1

LINQ-to-XML handles this fine: there is an XElement.Elements()/.Nodes() operation to get immediate children, and XElement.Descendants()/.DescendantNodes() operations to get all descendants. Would you consider that as an example?

To summarize LINQ-to-XML's behavior: the navigation functions each correspond to an axis type in XPath (http://www.w3schools.com/xpath/xpath_axes.asp). If the navigation function selects elements, the axis name is used. If the navigation function selects nodes, the axis name is used with Nodes appended.

For instance, the functions Descendants() and DescendantNodes() correspond to XPath's descendant axis, returning sequences of XElement or XNode respectively.

The exception is, not surprisingly, the most used case: the child axis. In XPath, this is the axis used if no axis is specified. For this, the LINQ-to-XML navigation functions are not Children() and ChildrenNodes() but rather Elements() and Nodes().

XElement is a subtype of XNode. XNodes include things like XML tags, but also comments, CDATA or text. XElements refer specifically to tags, so they have a tag name and support the navigation functions.

Now, it's not as easy to chain navigations in LINQ-to-XML as it is in XPath. The problem is that navigation functions return collection objects, while the navigation functions are applied to non-collections. Consider the XPath expression which selects a table tag as an immediate child and then any descendant table data tag. I think this would look like "./child::table/descendant::td" or "./table/descendant::td".

Using IEnumerable<T>.SelectMany() allows one to call the navigation functions on a collection. The equivalent to the above looks something like .Elements("table").SelectMany(t => t.Descendants("td"))
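As a rough illustration of the shallow/deep split described above, here is a minimal sketch against LINQ-to-XML. The sample markup, the XmlAxesDemo class and the ChildTableCells helper are invented for this example and are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlAxesDemo
{
    // Hypothetical helper: text of every <td> below a <table> that is a
    // direct child of root (roughly "./table/descendant::td" in XPath).
    public static IEnumerable<string> ChildTableCells(XElement root)
    {
        return root.Elements("table")
                   .SelectMany(t => t.Descendants("td"))
                   .Select(td => td.Value);
    }

    public static void Main()
    {
        // Sample document made up for this illustration.
        XElement root = XElement.Parse(
            "<body>" +
            "<table><tr><td>a</td><td>b</td></tr></table>" +
            "<div><table><tr><td>c</td></tr></table></div>" +
            "</body>");

        // Shallow: only <table> elements that are direct children of root.
        Console.WriteLine(root.Elements("table").Count());      // 1

        // Deep: <table> elements at any depth below root.
        Console.WriteLine(root.Descendants("table").Count());   // 2

        // Child table first, then any descendant cell.
        Console.WriteLine(string.Join(",", ChildTableCells(root)));  // a,b
    }
}
```

As far as I know, System.Xml.Linq.Extensions also defines Elements() and Descendants() on sequences of elements, so .Elements("table").Descendants("td") should produce the same cells without an explicit SelectMany.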


Solution:2

Well, the first thing to note is that actually, lambda expressions can be recursive. No, honestly! It isn't easy to do, and certainly isn't easy to read - heck, most LINQ providers (except LINQ-to-Objects, which is much simpler) will have a coughing fit just looking at it... but it is possible. See here for the full, gory details (warning - brain-ache is likely).

However!! That probably won't help much... for a practical approach, I'd look at the way XElement etc does it... note you can remove some of the recursion using a Queue<T> or Stack<T>:

    using System;
    using System.Collections.Generic;

    static class Program {
        static void Main() {
            Node a = new Node("a"), b = new Node("b") { Children = {a}},
                c = new Node("c") { Children = {b}};
            foreach (Node node in c.Descendents()) {
                Console.WriteLine(node.Name);
            }
        }
    }

    class Node { // very simplified; no sanity checking etc
        public string Name { get; private set; }
        public List<Node> Children { get; private set; }
        public Node(string name) {
            Name = name;
            Children = new List<Node>();
        }
    }
    static class NodeExtensions {
        public static IEnumerable<Node> Descendents(this Node node) {
            if (node == null) throw new ArgumentNullException("node");
            if(node.Children.Count > 0) {
                foreach (Node child in node.Children) {
                    yield return child;
                    foreach (Node desc in Descendents(child)) {
                        yield return desc;
                    }
                }
            }
        }
    }

An alternative would be to write something like SelectDeep (to mimic SelectMany for single levels):

    public static class EnumerableExtensions
    {
        public static IEnumerable<T> SelectDeep<T>(
            this IEnumerable<T> source, Func<T, IEnumerable<T>> selector)
        {
            foreach (T item in source)
            {
                yield return item;
                foreach (T subItem in SelectDeep(selector(item),selector))
                {
                    yield return subItem;
                }
            }
        }
    }
    public static class NodeExtensions
    {
        public static IEnumerable<Node> Descendents(this Node node)
        {
            if (node == null) throw new ArgumentNullException("node");
            return node.Children.SelectDeep(n => n.Children);
        }
    }

Again, I haven't optimised this to avoid recursion, but it could be done easily enough.
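For what it's worth, here is one possible way the recursion could be replaced with an explicit Stack<T>. This is only a sketch: the DescendantsIterative name and the IterativeNodeExtensions class are made up here, and Node is repeated from the earlier example so that the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Node
{
    public string Name { get; private set; }
    public List<Node> Children { get; private set; }
    public Node(string name)
    {
        Name = name;
        Children = new List<Node>();
    }
}

public static class IterativeNodeExtensions
{
    // Depth-first, pre-order walk that keeps pending nodes on an explicit
    // stack instead of nesting iterators recursively.
    public static IEnumerable<Node> DescendantsIterative(this Node node)
    {
        if (node == null) throw new ArgumentNullException("node");
        var pending = new Stack<Node>();
        // Push children in reverse so that the first child is popped first.
        for (int i = node.Children.Count - 1; i >= 0; i--)
            pending.Push(node.Children[i]);
        while (pending.Count > 0)
        {
            Node current = pending.Pop();
            yield return current;
            for (int i = current.Children.Count - 1; i >= 0; i--)
                pending.Push(current.Children[i]);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        Node a = new Node("a"), b = new Node("b") { Children = { a } },
             c = new Node("c") { Children = { b } };
        // Same order as the recursive version: each node before its subtree.
        Console.WriteLine(string.Join(",", c.DescendantsIterative().Select(n => n.Name)));  // b,a
    }
}
```

Swapping the Stack<T> for a Queue<T> (and enqueueing children in their natural order) would give a breadth-first walk instead.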


Solution:3

I'd go with implementing it in such a way as to have control over how deeply I want to query as well.

Something like Descendants() would retrieve descendants through all levels, while Descendants(0) would retrieve immediate children, Descendants(1) would get children and grandchildren, and so on...
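A minimal in-memory sketch of that idea might look like the following. The Node type, the DepthExtensions class and the extraLevels parameter name are assumptions made for illustration; a real provider would translate the depth argument into whatever its backing system understands.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical node type, shaped like the Node class used elsewhere in this thread.
public class Node
{
    public string Name { get; private set; }
    public List<Node> Children { get; private set; }
    public Node(string name)
    {
        Name = name;
        Children = new List<Node>();
    }
}

public static class DepthExtensions
{
    // No argument: descend through every level.
    public static IEnumerable<Node> Descendants(this Node node)
    {
        return node.Descendants(int.MaxValue);
    }

    // extraLevels = 0 yields immediate children only, 1 adds grandchildren, and so on.
    public static IEnumerable<Node> Descendants(this Node node, int extraLevels)
    {
        if (node == null) throw new ArgumentNullException("node");
        if (extraLevels < 0) throw new ArgumentOutOfRangeException("extraLevels");
        return Walk(node, extraLevels);
    }

    private static IEnumerable<Node> Walk(Node node, int remaining)
    {
        foreach (Node child in node.Children)
        {
            yield return child;
            if (remaining == 0) continue;   // depth budget used up
            foreach (Node deeper in Walk(child, remaining - 1))
                yield return deeper;
        }
    }
}

public static class Program
{
    static string Names(IEnumerable<Node> nodes)
    {
        return string.Join(",", nodes.Select(n => n.Name));
    }

    public static void Main()
    {
        // A simple chain: root -> a -> b -> c
        Node c = new Node("c");
        Node b = new Node("b") { Children = { c } };
        Node a = new Node("a") { Children = { b } };
        Node root = new Node("root") { Children = { a } };

        Console.WriteLine(Names(root.Descendants(0)));  // a
        Console.WriteLine(Names(root.Descendants(1)));  // a,b
        Console.WriteLine(Names(root.Descendants()));   // a,b,c
    }
}
```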


Solution:4

I would just implement two functions to clearly differentiate between the two options (Children vs. FullDescendants), or an overload GetChildren(bool returnDescendants). Each can return an IEnumerable, so it would just be a matter of which function they pass into their LINQ statement.


Solution:5

You might want to implement an (extension) method like FlattenRecursively for your type.

from item in list.FlattenRecursively() where ... select item  


Solution:6

Rex, you've certainly opened an interesting discussion, but you seem to have eliminated all possibilities - that is, you seem to reject both (1) having the consumer write recursive logic, and (2) having your node class expose relationships of greater than one degree.

Or, perhaps you haven't entirely ruled out (2). I can think of one more approach which is nearly as expressive as the GetDescendents method (or property), but might not be quite so 'ponderous' (depending on the shape of your tree)...

from item in AllItems where item.Parent == currentNode select item  

and

from item in AllItems where item.Ancestors.Contains(currentNode) select item  
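To make the shape of those two queries concrete, here is a small in-memory sketch. The Item class, its Title, Parent and Ancestors members and the sample data are assumptions for illustration only; an actual provider would translate the predicates rather than walk objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type with a parent link.
public class Item
{
    public string Title { get; private set; }
    public Item Parent { get; private set; }

    public Item(string title, Item parent)
    {
        Title = title;
        Parent = parent;
    }

    // Walks up the Parent chain, nearest ancestor first.
    public IEnumerable<Item> Ancestors
    {
        get
        {
            for (Item p = Parent; p != null; p = p.Parent)
                yield return p;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var products = new Item("Products", null);
        var lightTrucks = new Item("Light Trucks", products);
        var z150 = new Item("Z150", lightTrucks);
        var pictures = new Item("Pictures", z150);
        var splash = new Item("Splash Page", lightTrucks);
        var allItems = new List<Item> { products, lightTrucks, z150, pictures, splash };

        // Shallow: direct children of lightTrucks only.
        var shallow = from item in allItems
                      where item.Parent == lightTrucks
                      select item.Title;

        // Deep: anything that has lightTrucks somewhere above it.
        var deep = from item in allItems
                   where item.Ancestors.Contains(lightTrucks)
                   select item.Title;

        Console.WriteLine(string.Join(",", shallow));  // Z150,Splash Page
        Console.WriteLine(string.Join(",", deep));     // Z150,Pictures,Splash Page
    }
}
```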


Solution:7

I'd have to agree with Frank. Have a look at how LINQ-to-XML handles these scenarios.

In fact, I'd emulate the LINQ-to-XML implementation entirely, but adapt it for any data type. Why reinvent the wheel, right?


Solution:8

Are you okay with doing the heavy lifting in your object? (it's not even that heavy)

    using System;
    using System.Collections;
    using System.Collections.Generic;
    using System.Linq;

    namespace LinqRecursion
    {
        class Program
        {
            static void Main(string[] args)
            {
                Person mom = new Person() { Name = "Karen" };
                Person me = new Person(mom) { Name = "Matt" };
                Person youngerBrother = new Person(mom) { Name = "Robbie" };
                Person olderBrother = new Person(mom) { Name = "Kevin" };
                Person nephew1 = new Person(olderBrother) { Name = "Seth" };
                Person nephew2 = new Person(olderBrother) { Name = "Bradon" };
                Person olderSister = new Person(mom) { Name = "Michelle" };

                Console.WriteLine("\tAll");
                //        All
                //Karen 0
                //Matt 1
                //Robbie 2
                //Kevin 3
                //Seth 4
                //Bradon 5
                //Michelle 6
                foreach (var item in mom)
                    Console.WriteLine(item);

                Console.WriteLine("\r\n\tOdds");
                //        Odds
                //Matt 1
                //Kevin 3
                //Bradon 5
                var odds = mom.Where(p => p.ID % 2 == 1);
                foreach (var item in odds)
                    Console.WriteLine(item);

                Console.WriteLine("\r\n\tEvens");
                //        Evens
                //Karen 0
                //Robbie 2
                //Seth 4
                //Michelle 6
                var evens = mom.Where(p => p.ID % 2 == 0);
                foreach (var item in evens)
                    Console.WriteLine(item);

                Console.ReadLine();
            }
        }

        public class Person : IEnumerable<Person>
        {
            private static int _idRoot;

            public Person() {
                _id = _idRoot++;
            }

            public Person(Person parent) : this()
            {
                Parent = parent;
                parent.Children.Add(this);
            }

            private int _id;
            public int ID { get { return _id; } }
            public string Name { get; set; }

            public Person Parent { get; private set; }

            private List<Person> _children;
            public List<Person> Children
            {
                get
                {
                    if (_children == null)
                        _children = new List<Person>();
                    return _children;
                }
            }

            public override string ToString()
            {
                return Name + " " + _id.ToString();
            }

            #region IEnumerable<Person> Members

            // Yields this person first, then each child's whole subtree (depth-first).
            public IEnumerator<Person> GetEnumerator()
            {
                yield return this;
                foreach (var child in this.Children)
                    foreach (var item in child)
                        yield return item;
            }

            #endregion

            #region IEnumerable Members

            // The source listing is cut off after the generic GetEnumerator.
            // Everything from here to the end is a reconstruction: the
            // non-generic method that IEnumerable<Person> also requires,
            // plus the closing braces.
            IEnumerator IEnumerable.GetEnumerator()
            {
                return GetEnumerator();
            }

            #endregion
        }
    }