Tutorial :Clustered index insert performance on not-so-unique column?



Question:

In your experience, at about how many records does insert performance become intolerable when using a clustered index on not-typically-unique columns?

A good example I can think of would be the comments table for stack overflow. Do you think that insert performance would be acceptable if the comments table had a clustered index on their foreign key to the answers or questions table? I'm assuming that this would result in the fastest read performance for the way that comments are typically queried.

I often read that the clustered index should be reserved for uniquely-valued columns, but what if the table in question was most often queried by this index?


Solution:1

You should always try to keep your clustered indexes unique. For tables that have a ton of inserts something like a int identity is a good choice cause the page being inserted on will often be in memory reducing disk access.

If you do not make your clustered index unique, SQL server will do it for you, cause it still needs to be able to somehow find particular rows. Maintaining the uniquifier will cost something.

So what if you want the clustered index on a comments table to be the post id. This can be useful cause looking up all the comments for a post becomes really fast, all the information is in the same area on disk.

No problems, make the index unique by adding more columns to it: Eg:

create unique clustered index pk_comment(post_id, comment_id)    

But... having this index means that your index is no longer monotonically increasing which may effect insert performance. It could also affect the amount of page splitting.

So, my recommendation would be to keep it simple and just chuck a primary key on comment_id, then profile the application add in covered indexes as required. Only if the underlying way the data is layed out on disk becomes a problem, you should consider complicating stuff.


Solution:2

It Depends on:

  • The size of the rows
  • On Fill Factor (i.e. space left in the index)
  • The number of non-clustered indexes on the table
  • How often the index is reorganised (note: not so important when the clustered index is on a montonically increasing key)

You should benchmark for your particular situation.


Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Previous
Next Post »