Tutorial :How would you compare large amounts of data from a database?


For example. 2 Datasources, the 1 database you are allowed to meddle with, the other you are not. The 2 datasources contains Person records and you need to perform the following functions.

  1. Find matching persons from both data sources
  2. Run business rules on these matched persons (this might be to check whether the person has received a salary increase and so forth
  3. Mark the orphan persons and persist the person's to a database.

The reason for the question is performance and ensuring that an optimal solutions is used. Looping through each and every matched person might be slow but doing it in code is definitely maintainable.

Note: There might be 40k Persons to run business rules on.


I'm working on an application that performs very similar actions. The only thing mine doesn't do is manipulate the information (it just reads the data).

What I did was to import the data from 1 source into a DataSet, and then, when it comes time to compare with the database, I just select from the database table where the unique identifier is the same as a value from my dataset (you could use LIKE in SQL as an alternative), and if anything is found, it returns values from the database.

So lets say I search for an ID Number (which should be unique to everybody). If the database has the ID Number I searched for in the table I searched in, then I'll get information about that person displayed on my form for me. If not, I can display an error message saying that no matches could be found.

Hope this helps.


If I were doing this in a SQL Server database, I would use SSIS to process the data and perform the change actions required or mark the people as potential matches for someone to confirm before changes are made. One reason why I woudl do this is that in differnt systems the names are likely not to match and doing a compare using fuzzy logic might be a better, more accurate way to find the mathces than using plain old SQL. For instance the same person might be John Jones in one system and Johnny Jones in the other.

It also helps with systems that are trying to match up to persst a table that stores the unique record identifier from each system, so that as changes are made, they can still remain matched. o when Mary Smith marries Johnny Jones and is changed to Mary Jones in one system, her information can still be matche to Mary Smith in the other.

Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Next Post »