Tuesday, January 8, 2008

Lock object sharing with hashes

In web applications we frequently need to serialize access to resources in concurrent applications, since ASP.NET is inherently concurrent. A typical scenario is that we have several loosely connected objects that all apply to the same user, and we need to ensure single-threaded access during a read-read or read-modify-write cycle to get a consistent view or update.
A user is usually represented via some sort of string identifier, perhaps an e-mail address. In C# what we what to do is something like:
lock (user) {     // Do something that needs single-thread access }
The problem is what are we to use as a lock object? C# can use any object as a lock, but which one to pick? We must ensure that multiple threads will always get the right object instance, regardless of when in the application life-time the need arises, so in effect these objects must live for the life of the application. This can lead to massive memory consumption, assume a system with one million users - after a while we'll have to keep one million objects around probably in hash table indexed by the e-mail string. That can mean some serious memory problems.

One approach would be to clean up the table when no-one is using a specific lock object, but this is complicated and fraught with it's own threading problems.

After a few false starts, I came up with the following scheme which has now been tested in the wild and been found quite effective as a trade-off between memory and lock contention.

In actual fact, there are usually a rather limited number of possible concurrent actions, limited basically by the number of threads that are active. This number is typically 100 per processor in ASP.NET, and in most applications even with many users the number of actual concurrent requests at any given time is even fewer. So, assuming a 100 concurrent threads, and assuming that they will only acquire one user lock (our example here) at a time, we really only need at most 100 lock objects - not a million. But how to do this?

The algorithm I've come up with is probably not new, but I've not seen it before in the literature nor when actively searching on the web, so it's at least a bit unusual. Here's how it works:

1. Allocate an array with a fixed number of elements, perhaps twice the number of estimated concurrent accesses.
2. Fill the array with objects to be used as locks.
3. To acquire a lock for a given user, generate an index into the lock object array by taking a hash of the user identifier typically with the GetHashCode() method and then take that module the number of lock objects. This is the index into the lock table, use the indexed object and lock.

At best, you'll get a free lock and acquire the lock.

At second best, another thread is actually holding the lock for the same user, and your thread is put on hold as it should be.

At worst, another thread is actually holding the lock but for a different user that happens to use the same lock when calculating the index via the hash modulo the lock array size. By having good hash algorithms and an appropriate number of locks in relation to the number of concurrent accesses, this should be a very infrequent occurrence. But even if it happens, nothing bad happens except that your thread will have to wait a little longer than was absolutely necessary.

This simple algorithm will require a fixed number of locks in relation to the level of concurrency instead of the number of potential objects that require locking, and at a very low cost. Sample code follows:
public class LockObjects {     private object[] _lockObjects;     public LockObjects(int numberOfLockObjects)     {         _lockObjects = GetInitialLockObjects(numberOfLockObjects);     }     public LockObjects()         : this(20)     {     }     private object[] GetInitialLockObjects(int numberOfLockObjects)     {         object[] lockObjects = new object[numberOfLockObjects];         for (int i = 0; i < lockObjects.Length; ++i)         {             lockObjects[i] = new object();         }         return lockObjects;     }     public virtual object GetLockObject(params string[] keys)     {         int lockHash = 0;         foreach (string key in keys)         {             lockHash += key.ToLowerInvariant().GetHashCode();         }         lockHash = Math.Abs(lockHash) % _lockObjects.Length;         return _lockObjects[lockHash];     } }