Tutorial: LINQ to SQL vs. Serialization



Question:

Say I have a few tables in an MS SQL database, each with about 5-10 attributes. There are some simple associations between the tables, but each table has 500,000 to 1,000,000 rows.

There is an algorithm that runs on that data (all of it), so before running the algorithm, I have to retrieve all the data from the database. The algorithm does not change the data, only reads it, so I just need to retrieve the data.

I am using LINQ to SQL, and retrieving all the data takes about two minutes. What I want to know is whether serializing the data to a file and then deserializing it (when needed) would load it faster.

The data is about 200 MB, and I don't mind saving it to disk. So, would loading the objects be faster by deserializing them from the file or by using the LINQ to SQL DataContext?
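For reference, the kind of round trip being asked about could be sketched like this. `Customer` here is a hypothetical entity class standing in for the generated LINQ to SQL classes (which are not marked `[Serializable]` by default, so a real cache would need its own serializable types or a different serializer):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class Customer          // hypothetical entity; stands in for a LINQ to SQL class
{
    public int Id;
    public string Name;
}

public static class DataCache
{
    // Write the whole object graph to disk in one pass.
    public static void Save(List<Customer> data, string path)
    {
        using (var stream = File.Create(path))
        {
            new BinaryFormatter().Serialize(stream, data);
        }
    }

    // Read it back; this is the load you would time against the DataContext.
    public static List<Customer> Load(string path)
    {
        using (var stream = File.OpenRead(path))
        {
            return (List<Customer>)new BinaryFormatter().Deserialize(stream);
        }
    }
}
```

Timing `DataCache.Load` against the two-minute DataContext query would answer the question directly for your data.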

Any experiences with this?


Solution:1

I would argue that LINQ to SQL may not be the best choice for this kind of application. When you are talking about that many objects, you incur considerable overhead creating object instances (your persistent classes).

I would choose a solution where a stored procedure retrieves only the necessary data via ADO.NET, the application stores it in memory (memory is cheap nowadays; 200 MB should not be a problem), and the analysis algorithm runs on the in-memory data.
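That approach could be sketched as follows. The stored procedure name `dbo.GetAnalysisData` and the two-column `Row` shape are placeholders for whatever your schema actually contains:

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public class Row { public int Id; public decimal Value; }   // hypothetical record shape

public static class BulkLoader
{
    public static List<Row> LoadAll(string connectionString)
    {
        var rows = new List<Row>();
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("dbo.GetAnalysisData", connection)) // hypothetical proc
        {
            command.CommandType = CommandType.StoredProcedure;
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Plain field assignment: no change tracking, no identity map,
                    // none of the per-object bookkeeping LINQ to SQL does.
                    rows.Add(new Row
                    {
                        Id = reader.GetInt32(0),
                        Value = reader.GetDecimal(1)
                    });
                }
            }
        }
        return rows;   // everything in memory; run the algorithm over this list
    }
}
```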

I don't think you should store the data in a file. In the end, your database is also simply one or more files that are read by the database engine. So you either

  • let the database engine read your data and you analyze it, or
  • let the database engine read your data, you write it to a file, you read the file (reading the same data again, but now doing it yourself), and you analyze the data

The latter option involves a lot of overhead without any advantages as far as I can see.

EDIT: If your data changes very infrequently, you may consider preprocessing your data before analyzing and caching the preprocessed data somewhere (in the database or on the file system). This only makes sense if your preprocessed data can be analyzed (a lot) faster than the raw data. Maybe some preprocessing can be done in the database itself.


Solution:2

You should try using ADO.NET directly, without the LINQ to SQL layer on top of it, i.e. using a SqlDataReader to read the data.

If you work sequentially with the data, you can get the records from the reader when you need them without having to read them all into memory first.
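A minimal sketch of that streaming style, assuming a hypothetical `dbo.Items` table and an algorithm that can consume one record at a time:

```csharp
using System.Data.SqlClient;

public static class SequentialScan
{
    // Stream rows forward-only; only the current record is held in managed memory,
    // so the 200 MB never has to be materialized as objects at all.
    public static double Process(string connectionString)
    {
        double accumulator = 0;
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("SELECT Id, Value FROM dbo.Items", connection)) // hypothetical table
        {
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    accumulator += reader.GetDouble(1);   // feed the algorithm record by record
                }
            }
        }
        return accumulator;
    }
}
```

This only works if the algorithm is genuinely sequential; an algorithm that needs random access to all rows would still have to buffer them, as in Solution 1.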


Solution:3

If you have a process that operates on most of the data in a database, then that sounds like a job for a stored procedure. It won't be object-oriented, but it will be a lot faster and less brittle.


Solution:4

Since you are doing this in C# and your database is MS SQL Server (since you use LINQ to SQL), could you not run your code in a managed stored procedure? That would allow you to keep your current code largely as it is, but loading the data would be much faster because the code would run inside SQL Server.
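The skeleton of such a SQL CLR procedure looks roughly like this; the class name, query, and `dbo.Items` table are hypothetical, and the assembly would still need to be deployed into SQL Server with CREATE ASSEMBLY / CREATE PROCEDURE:

```csharp
using System.Data.SqlClient;
using Microsoft.SqlServer.Server;

public class AnalysisProcedures
{
    // Runs inside SQL Server as a CLR stored procedure. The special
    // "context connection" reads the data in-process, with no network hop.
    [SqlProcedure]
    public static void RunAnalysis()
    {
        using (var connection = new SqlConnection("context connection=true"))
        using (var command = new SqlCommand("SELECT Id, Value FROM dbo.Items", connection)) // hypothetical table
        {
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    // run the algorithm here, close to the data
                }
            }
        }
    }
}
```

Note that SQL CLR code runs under the server's resource and permission constraints, so a long-running or memory-hungry algorithm may be better kept in the application.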

