Tutorial :Most Efficient (Fast) T-SQL DELETE For Many Rows?



Question:

Our server application receives information about rows to add to the database at a rate of 1000-2000 rows per second, all day long. There are two mutually-exclusive columns in the table that uniquely identify a row: one is a numeric identifier called 'tag' and the other is a 50character string called 'longTag'. A row can have either a tag or a longTag; not both.

Each row that comes in off the socket may or may not already exist in the table. If it exists, that row must be updated with the new information. If it doesn't exist, it must be added. We are using SQL 2005 and in a few cases even SQL 2000, so we cannot use the new MERGE keyword.

The way I am doing this now is to build a giant DELETE statement that looks like this:

DELETE from MyRecords  WHERE tag = 1  OR tag = 2  OR longTag = 'LongTag1'  OR tag = 555  

...where each incoming row has its own 'OR tag = n' or 'OR longTag = 'x'' clause.

Then I perform an XML Bulk Load using ISQLXMLBulkLoad to load all the new records at once.

The giant DELETE statement sometimes times out, taking 30 seconds or longer. I'm not sure why.

As records come in off the socket they must either be inserted or they must replace existing rows. Is the way I'm doing it the best way to do it?

EDIT: The ratio of new rows to replacement rows is going to be very heavily slanted toward new rows. In data I have seen coming from production, there will typically be 100-1000 new rows for each correction.

EDIT 2: Both the inserts and the deletes must be handled as a single transaction. If either the insert or the delete fails, they must both be rolled back, leaving the table in the same state it was in before the inserts & deletes began.

EDIT 3: Regarding NULL tags. I need to first briefly describe the system a little more. This is a database for a trading system. MyTable is a trades table containing two kind of trades: so-called "day trades" and so-called "opening positions." Day trades are simply trades -- if you were an options trader and you did a trade, that trade would be a day trade in this system. Opening positions are basically a summary of your portfolio up until today. Both opening positions and day trades are stored in the same table. Day trades have tags (either longTags or numeric tags), and opening positions do not. There can be duplicate rows for opening positions -- that is fine & normal. But there cannot be duplicate rows for day trades. If a day trade comes in with the same tag as some record already in the database, then the data in the table is replaced with the new data.

So there are 4 possibilities for the values in tag & longTag:

1) tag is non-zero & longTag is empty: this is a day trade with a numeric identifier. 2) tag is zero and longTag has a non-empty character value. This is a day trade with an alphanumeric identifier. 3) tag is zero and longTag is empty: this is an opening position. 4) tag is non-zero and longTag has a non-empty character value. This is prevented from every happening by our server software, but if it were to happen the longTag would be ignored and it would be treated the same as case #1. Again, this does not happen.


Solution:1

An OR (or an in) almost works as if each OR operand is a different query. That is, it turns into a table scan, and for each row, the database has to test each OR operand as a predicate, until it finds a match or runs out of operands.

The only reason to package this up is to make it one logical unit of work. You could also wrap a bunch of deletes in a transaction, and only commit when all finish successfully.

Quassnoi makes an interesting suggestion -- to use a table --, but since he then uses INs and ORs, it comes out the same.

But try this.

Create a new table that mirrors your real table. Call it u_real_table. Index it on tag and longTag.

Put all your incoming data into u_real_table.

Now, when you're ready to do your bulk thing, instead join the mirror table o the real table on tag. From the real table, delete all the tag'd rows in the u_real_table:

delete real_table from real_table a      join u_real_table b on (a.tag = b.tag);  insert into real_table select *      from u_real_table where tag is not null;  

See what we did here? Since we're joining only on tag, there's a greater chance the tag index can be used.

First we deleted everything new, then we inserted the new replacements. We could also do an update here. Which is faster depends on your table structure and its indices.

We didn't have to write a script to do it, we just had to have inserted the records in u_real_table.

Now we do the same thing for longTags:

delete real_table from real_table a      join u_real_table b on (a.longTag = b.longTag);  insert into real_table select *      from u_real_table where longTag is not null;  

Finally, we clear out u_real_table:

delete from u_real_table;  

Obviously, we wrap the whole each delete/insert pair in a transaction, so that the delete only becomes real when the subsequent insert succeeds, and then we wrap the whole thing in another transaction. Because it is a logical unit of work.

This method reduces your manual work, reduces the possibility of a manual error, and has some chance of speeding up the deletes.

Note that this relies on missing tags and longTags correctly being null, not zero or the empty string.


Solution:2

I think splitting the giant DELETE statement into 2 DELETE may help.

1 DELETE to deal with tag and a separate DELETE to deal with longTag. This would help SQL server to choose to use indexes efficiently.

Of course you can still fire the 2 DELETE statements in 1 DB round-trip.

Hope this helps


Solution:3

Check out this video which showcases how to do a 'nibbling' delete. The process works well and can definitely reduce the locking/collision problems you're seeing:

http://www.sqlservervideos.com/video/nibbling-deletes


Solution:4

Something like this could streamline the process (you would simply INSERT the rows, no matter if they already exist - no need for an up-front DELETE statement):

CREATE TRIGGER dbo.TR_MyTable_Merge      ON  dbo.MyTable      INSTEAD OF INSERT  AS   BEGIN    SET NOCOUNT ON;      BEGIN TRANSACTION      DELETE MyTable     FROM   MyTable t INNER JOIN inserted i ON t.tag = i.tag       DELETE MyTable     FROM   MyTable t INNER JOIN inserted i ON t.longTag = i.longTag      INSERT MyTable     SELECT * FROM inserted      COMMIT TRANSACTION      SET NOCOUNT OFF;  END  

EDIT: Previously Combined DELETE statement broken up into two separate statements, to enable optimal index use.

Not using DELETE at all, but rather UPDATEing the affected/duplicate rows in place will be easier on the indexes.


Solution:5

Maybe:

DELETE FROM MyRecords  WHERE  tag IN (1, 2, 555) -- build a list  OR longTag IN ('LongTag1')  

I suspect indexes would help your deletes but drastically slow your inserts, so I wouldn't play with that too much. But then my intuition isn't exactly perfect yet, you might be able to tune FillFactor or other items to get around that issue, and the one thing I do know for sure it that you really want to profile both anyway.

Another option is to load new inserts into a temp table (named something like InputQueue), and then join the temp table on MyRecords to handle filtering updates. This would also make it easy to do the update in two steps: you could delete Tags and longTags as separate operations and that might turn out to be much more efficient.


Solution:6

Using ORs may cause a table scan - can you break it up into four statements? Wrapping each in a transaction may also speed things up.

DELETE from MyRecords  WHERE tag = 1    DELETE from MyRecords  WHERE tag = 2    DELETE from MyRecords  WHERE tag = 555    DELETE from MyRecords  WHERE longTag = 'LongTag1'  


Solution:7

Seems that your table is not indexed on (tag) and (longTag)

Build two indexes: one on (tag), one on (longTag)

If you are planning to delete a really big number of records, then declare two table variables, fill them with values and delete like this:

DECLARE @tag TABLE (id INT);  DECLARE @longTag TABLE (id VARCHAR(50));    INSERT  INTO  @tag  VALUES (`tag1`)    INSERT  INTO  @tag  VALUES (`tag2`)    /* ... */    INSERT INTO @longTag  VALUES ('LongTag1')    /* ... */      DELETE  FROM    MyRecords r  WHERE   r.tag IN (SELECT * FROM @tag)          OR r.longTag IN (SELECT * FROM @longTag)  

You may also try to perform a two-pass DELETE:

DELETE  FROM    MyRecords r  WHERE   r.tag IN (SELECT * FROM @tag)    DELETE  FROM    MyRecords r  WHERE   r.longTag IN (SELECT * FROM @longTag)  

and see what statements runs longer, to see if there's an issue with the indexes.


Solution:8

Indexing:

Consider using an indexed persisted computed column for longTag which stores a checksum of longTag. Instead of indexing 'LongTag1', you index a 4-byte int value (86939596).

Then your look-ups are [hopefully*] faster, and you just have to include the longTag value in the query/delete. Your code would be slightly more complex, but the indexing is likely to be much more efficient.

* Requires testing


Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Previous
Next Post »