Tutorial :How many rows of data is too many rows of data?



Question:

Is there some hard and fast rule about how big is too big for a SQL table?

We are storing SCORM tracking data in a name/value pair format and there could be anywhere from 4-12 rows per user per course, down the road is this going to be a bad thing since there are hundreds of courses and thousands of users?


Solution:1

I personally have had tables in production with 50 million rows, and this is small compared with I have heard. You might need to optimize your structure with partitioning but until you test your system in your environment you shouldn't waste time doing that. What you described is pretty small IMHO

I should add I was using SQL Server 2000 & 2005, each DBMS has its own sizing limitations.


Solution:2

The magic number is billions. Until you get to billions of rows of data, you're not talking about very much data at all.

Do the math.

4-12 rows per user per course,... hundreds of courses and thousands of users?

400,000 to 1,200,000 rows. Let's assume 1000 bytes per row.

That's 400Mb to 1.2Gb of data. You can buy 100Gb drives for $299 at the Apple store. You can easily spend more than $299 of billable time sweating over details that don't much matter any more.

Until you get to 1Tb of data (1,000 Gb), you're not talking about much data at all.


Solution:3

100 (courses) * 1000 (users) * 10 (records) is only a million. That's the low end, but a decent database ought to handle it okay.

What sounds iffy are Name/Value pairs. That will limit your ability to correctly index things, which will be critical to good performance.


Solution:4

No hard and fast rule, but there is a hard and fast way to get a number.

Write a program to populate your table with dummy data roughly approximating the expected form of the actual data (e.g. similar regularity, characters, patterns, etc.) Run performance tests against it using actual queries with the dummy data, gradually increasing the number of rows in the table, perhaps by steps of 1000 or 10000 rows.

At the cusp of when the query performance (e.g. queries completed per second) becomes unacceptable, you'll have your "too big" number of rows.


Solution:5

I once worked on a web form system with over 300 million rows in their name/value pair table. Many of the forms had over 300 rows per form submission. Performance wasn't too bad actually, but it was a total PITA to query from! My sql writing ability definitely improved over the life of this gig.

But IMHO, if you have any say get rid of it in favor of a standard normalized table.


Solution:6

Not really. It all depends on your business needs, and you'll have to buy the product that supports your estimated row count.


Solution:7

No, there isn't really any hard rule about how many rows you can have in a table, it depends a lot on how much data there is in the rows, and how well the data can be indexed.

A quick estimate on the figures that you stated gives something like tens of millions of rows. That's certainly not too much, but it's enough that it could be a problem if you aren't a bit careful.

Perhaps the table could be normalized? Does the same names occur a lot, so that you could put the names in a separate table and use the id in the table?


Solution:8

I don't think there is really a limit here, but drive space. BUT PLEASE add good indexes while its small, becuase when the table is huge indexes will take a lot longer to add. Plus if you have bad indexes queries will slow down as it gorws and people will complain when there is really nothing wrong, but a crappy to no index.


Solution:9

I've worked on databases where we tried to create tables with 2B rows of data - that doesn't work, we got to 500M and re-designed. One of the biggest gotchas of working with such large table was the time taken to do deletions - I often see the approach where old records are archived and then deleted from the main table. If the table is big enough that deletion will run for many hours as the indexes are rebuilt.

Not sure where the cut off is but gut feel indicates a table > 10M rows is probably too big. Our approach was to partition data by date, so we ended up with a table for a week of data, and another summary table for months, and another summary for years - very common in DataWarehousing. BTW this was on SQL 7.0, interested to know if the DB's are better at this type of stuff yet?


Solution:10

Your question prompts more questions than answers.

  • what database engine are you using? Its hard to fashion you a good answer without this.
  • what is the table structure? Depending on your datatype, how your table will layout on disk will depend on this.
  • why not store one record per user/course? As you are storing SCORM data, I assume this means you are storing standard SCORM data like completion, success, attempts, totaltime, etc. There's no need to create multiple rows for this.

I've built a few databases storing SCORM data, and I've never had to go with a tag/value system like you suggest.

One thing you want to remember though is its not the # of rows in the table, its the SIZE (in bytes) of the table. Simply:

table size = row size (avg) * number of rows

The question to ask is, "how big a table is too big"?


Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Previous
Next Post »