
Question:
With the following data
string[] data = { "a", "a", "b" };
I'd very much like to find duplicates and get this result:
a
I tried the following code
var a = data.Distinct().ToList(); var b = a.Except(a).ToList();
obviously this didn't work, I can see what is happening above but I'm not sure how to fix it.
Solution:1
When runtime is no problem, you could use
var duplicates = data.Where(s => data.Count(t => t == s) > 1).Distinct().ToList();
Good old O(n^n) =)
Edit: Now for a better solution. =) If you define a new extension method like
static class Extensions { public static IEnumerable<T> Duplicates<T>(this IEnumerable<T> input) { HashSet<T> hash = new HashSet<T>(); foreach (T item in input) { if (!hash.Contains(item)) { hash.Add(item); } else { yield return item; } } } }
you can use
var duplicates = data.Duplicates().Distinct().ToArray();
Solution:2
Use the group by stuff, the performance of these methods are reasonably good. Only concern is big memory overhead if you are working with large data sets.
from g in (from x in data group x by x) where g.Count() > 1 select g.Key;
--OR if you prefer extension methods
data.GroupBy(x => x) .Where(x => x.Count() > 1) .Select(x => x.Key)
Where Count() == 1
that's your distinct items and where Count() > 1
that's one or more duplicate items.
Since LINQ is kind of lazy, if you don't want to reevaluate your computation you can do this:
var g = (from x in data group x by x).ToList(); // grouping result // duplicates from x in g where x.Count() > 1 select x.Key; // distinct from x in g where x.Count() == 1 select x.Key;
When creating the grouping a set of sets will be created. Assuming that it's a set with O(1)
insertion the running time of the group by approach is O(n)
. The incurred cost for each operation is somewhat high, but it should equate to near linear performance.
Solution:3
Sort the data, iterate through it and remember the last item. When the current item is the same as the last, its a duplicate. This can be easily implemented either iteratively or using a lambda expression in O(n*log(n)) time.
Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
EmoticonEmoticon