
Question:
I am using C# + VS2008 + .Net + ASP.Net + IIS 7.0 + ADO.Net + SQL Server 2008. I have a ADO.Net datatable object, and I want to filter out duplicate/similar records (in my specific rule to judge whether records are duplicate/similar -- if record/row has the same value for a string column, I will treat them as duplicate/similar records), and only keep one of such duplicate/similar records.
The output needs to be a datatable, may output the same datatable object if filter operation could be operated on the same datatable object.
What is the most efficient solution?
Solution:1
Are you using .NET 3.5? If you cast your data rows, you can use LINQ to Objects:
var distinctRows = table.Rows.Cast<DataRow>().Distinct(new E()); ... public class E : IEqualityComparer<DataRow> { bool IEqualityComparer<DataRow>.Equals(DataRow x, DataRow y) { return x["colA"] == y["colA"]; } int IEqualityComparer<DataRow>.GetHashCode(DataRow obj) { return obj["colA"].GetHashCode(); } }
Or an even simpler way, since you're basing it on a single column's values:
var distinct = from r in table.Rows.Cast<DataRow>() group r by (string)r["colA"] into g select g.First();
If you need to make a new DataTable out of these distinct rows, you can do this:
var t2 = new DataTable(); t2.Columns.AddRange(table.Columns.Cast<DataColumn>().ToArray()); foreach(var r in distinct) { t2.Rows.Add(r); }
Or if it would be more handy to work with business objects, you can do an easy conversion:
var persons = (from r in distinct select new PersonInfo { EmpId = (string)r["colA"], FirstName = (string)r["colB"], LastName = (string)r["colC"], }).ToList(); ... public class PersonInfo { public string EmpId {get;set;} public string FirstName {get;set;} public string LastName {get;set;} }
Update
Everything you can do in LINQ to Objects can also be done without it: it just takes more code. For example:
var table = new DataTable(); var rowSet = new HashSet<DataRow>(new E()); var newTable = new DataTable(); foreach(DataColumn column in table.Columns) { newTable.Columns.Add(column); } foreach(DataRow row in table.Rows) { if(!rowSet.Contains(row)) { rowSet.Add(row); newTable.Rows.Add(row); } }
You could also use a similar strategy to simply remove duplicate rows from the original table instead of creating a new table.
Solution:2
You can do a select into
with a group by
clause, so not duplicates are created. Then drop the old table and rename the table into which you selected to the original table name.
Solution:3
I would do this in the database layer:
SELECT Distinct... FROM MyTable
Or if you need aggregates:
SELECT SUM(Field1), ID FROM MyTable GROUP BY ID
Put the SELECT statement in a stored procedure. Then in .net make a connection to the database, call the stored procedure, execute .ExecuteNonQuery(). Return the rows in a datatable and return the datatable back to your UI.
Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
EmoticonEmoticon