Tutorial :using a transaction to avoid a race



Question:

I'm writing a daemon to monitor creation of new objects, which adds rows to a database table when it detects new things. We'll call the objects widgets. The code flow is approximately this:

1: every so often:  2:   find newest N widgets (from external source)  3:   foreach widget  4:     if( widget not yet in database )  5:       add rows for widget  

The last two lines present a race condition, since if two instances of this daemon are running at the same time, they may both create a row for widget X if the timing lines up.

The most obvious solution would be to use a unique constraint on the widget identifier column, but that isn't possible due to the database layout (it's actually allowed to have more than one row for a widget, but the daemon shouldn't ever do this automatically).

My next thought would be to use a transaction, since this is what they're intended for. In the ADO.NET world, I believe I would want an isolation level of Serializable, but I'm not positive. Can someone point me in the right direction?

Update: I did some experimentation, and the Serialized transaction doesn't appear to resolve the issue, or at least not very well. The interesting case is described below, and assumes that only one table is involved. Note that I'm not positive about the lock details, but I think I have it right:

Thread A: Executes line 4, acquiring a read lock on the table  Thread B: Executes line 4, acquiring a read lock on the table  Thread A: Tries to execute line 5, which requires upgrading to a write lock     (this requires waiting until Thread B unlocks the table)  Thread B: Tries to execute line 5, again requiring a lock upgrade     (this requires waiting until Thread A unlocks)  

This leaves us in a classic deadlock condition. Other code paths are possible, but if threads A and B don't interleave, there's no synchronization issue anyway. The end result is that a SqlException is thrown on one of the threads, after SQL detects the deadlock and terminates one of the statements. I can catch this exception and detect the particular error code, but that doesn't feel very clean.

Another route I may take is to create a second table that tracks widgets seen by the daemon, where I can use a unique constraint. This still requires catching and detecting certain error codes (in this case, integrity constraint violations), so I'm still interested in a better solution, if somebody can think of one.


Solution:1

Generally, you should always use transactions if you have more then one process or thread using the database at the same time.

Isolation level "serializable" should actually work. It does not allow that data read from one transaction is changed by another. But it locks a lot and should not generally be used, because it slows down the application and there is a higher risk of dead locks.

Alternatives:

  • you only can lock the whole table to make sure that nobody writes into it while you are checking if something is (not) there. The problem is, only one can write to the table at the same time, which means that this slows downs everything a lot. (You could search for the data, if it is not there, lock the table and search again before inserting. This is a common pattern.)
  • Honestly, you should think about the fact that two transactions try to insert the same row at the same time. This is probably where you should start solving it. Only one daemon should be responsible for the same data.
    • make every daemon handling its own data
    • make every daemon calling a service instead of the database. There you can put things together before inserting it.

By the way, you need a unique identifier for the data to clearly identify it anyway. How can you search for it in the database if you don't have a unique identifier?


Solution:2

How do you check 'if (widget not yet in database)'. If this is written in sql in form of 'Select', you can use 'Select for update' to allow only one daemon instance to be able to do that at a time. By using transaction for last two line, and use 'select for update' to lock, you will avoid race.

I am sure there is an equivalent in ado.net.


Solution:3

If you've got SQL Server 2008 (or another DBMS that supports it), consider using a MERGE statement. That can be written to do an insert only if the row does not exist.


Solution:4

(it's actually allowed to have more than one row for a widget, but the daemon shouldn't ever do this automatically)

Then it's not the database that's the problem. it's the fact that you've got multiple daemons that don't know the've detected the same object. Seems to me that's where you need to focus your attention. From the point of view of the database, they're both submitting perfectly legitimate rows.

What happens if you change:

1: every so often:  2:   find newest N widgets (from external source)  3:   foreach widget  4:     if( widget not yet in database )  5:       add rows for widget  

to

1: every so often:  2:   while there are unhandled widgets  2:       find first unhandled widget  4:       if( widget not yet in database )  5:           add one row for widget  

Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Previous
Next Post »