Tutorial :how to get distinct records in datatable?



Question:

I am using C# + VS2008 + .Net + ASP.Net + IIS 7.0 + ADO.Net + SQL Server 2008. I have a ADO.Net datatable object, and I want to filter out duplicate/similar records (in my specific rule to judge whether records are duplicate/similar -- if record/row has the same value for a string column, I will treat them as duplicate/similar records), and only keep one of such duplicate/similar records.

The output needs to be a datatable, may output the same datatable object if filter operation could be operated on the same datatable object.

What is the most efficient solution?


Solution:1

Are you using .NET 3.5? If you cast your data rows, you can use LINQ to Objects:

var distinctRows = table.Rows.Cast<DataRow>().Distinct(new E());    ...    public class E : IEqualityComparer<DataRow>  {      bool IEqualityComparer<DataRow>.Equals(DataRow x, DataRow y)      {          return x["colA"] == y["colA"];      }        int IEqualityComparer<DataRow>.GetHashCode(DataRow obj)      {          return obj["colA"].GetHashCode();      }  }  

Or an even simpler way, since you're basing it on a single column's values:

var distinct = from r in table.Rows.Cast<DataRow>()                 group r by (string)r["colA"] into g                 select g.First();  

If you need to make a new DataTable out of these distinct rows, you can do this:

var t2 = new DataTable();  t2.Columns.AddRange(table.Columns.Cast<DataColumn>().ToArray());  foreach(var r in distinct)  {      t2.Rows.Add(r);  }  

Or if it would be more handy to work with business objects, you can do an easy conversion:

var persons = (from r in distinct                 select new PersonInfo                 {                     EmpId = (string)r["colA"],                     FirstName = (string)r["colB"],                     LastName = (string)r["colC"],                 }).ToList();    ...    public class PersonInfo  {      public string EmpId {get;set;}      public string FirstName {get;set;}      public string LastName {get;set;}  }  

Update

Everything you can do in LINQ to Objects can also be done without it: it just takes more code. For example:

var table = new DataTable();  var rowSet = new HashSet<DataRow>(new E());  var newTable = new DataTable();  foreach(DataColumn column in table.Columns)  {      newTable.Columns.Add(column);  }  foreach(DataRow row in table.Rows)  {      if(!rowSet.Contains(row))      {          rowSet.Add(row);          newTable.Rows.Add(row);      }  }  

You could also use a similar strategy to simply remove duplicate rows from the original table instead of creating a new table.


Solution:2

You can do a select into with a group by clause, so not duplicates are created. Then drop the old table and rename the table into which you selected to the original table name.


Solution:3

I would do this in the database layer:

SELECT Distinct...  FROM MyTable  

Or if you need aggregates:

SELECT SUM(Field1), ID FROM MyTable  GROUP BY ID  

Put the SELECT statement in a stored procedure. Then in .net make a connection to the database, call the stored procedure, execute .ExecuteNonQuery(). Return the rows in a datatable and return the datatable back to your UI.


Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Previous
Next Post »