Monday, May 10, 2021

ConcurrentDictionary.GetOrAdd() may not work as you think!

It's concurrent - not lazy

We had a problem with random crashes in a customers web. It was not catastrophic, it would get going again but still it was there in the logs and we want every page visit to work.

A bit of investigation with an IL-code decompiler followed, which by the way is absolutely the best thing since sliced bread! I found code equivalent to the following in a third party vendors product:
private readonly ConcurrentDictionary<string, Type> _dictionary
    = new ConcurrentDictionary<string, Type>();

private readonly object _lock = new object();

public Type GetType(string typename)
{
    return _dictionary.GetOrAdd(typename,
        (t) =>
        {
            lock (_lock)
            {
                return DynamicallyGenerateType(t);
            }
        });
}
The thing here is that DynamicallyGenerateType() can only be called once per typename, since what it does is emit code into an assembly, and if you do that twice you get a System.ArgumentException: Duplicate type name within an assembly .

No-one wants that, so the author thought that it would be cool to use a ConcurrentDictionary<TKey, TValue> since the GetOrAdd() method guarantees that it will get an existing value from the dictionary, or add a new one using the provided value factory and then return the value.

It looks good, reasonable, and works almost all of the time. Key word here is: almost.

What the concurrent dictionary does is it uses efficient and light-weight locking to ensure that the dictionary can be concurrently accessed and updated in a consistent and thread safe manner.

It does not guarantee a single one-time lazy call to the value factory used to add a value if it's not in the dictionary.

Sometimes, under heavy initial load, the value factory passed as the second argument to GetOrAdd() will be called twice (or more). What the concurrent dictionary guarantees is that the value for the provided key will only be set once, but the value factory may be called multiple times with the result thrown away for all calls except the race-winning one!

This is clearly documented but it's easy to miss. This is not a case of the implementation not being thread safe, as is stated in some places. It's very thread safe. But the value factory may indeed be called multiple times!

To fix it, add a Lazy-layer on top, because Lazy<T> by default does guarantee that it's value factory is only called once!
private readonly ConcurrentDictionary<string, Lazy<Type>> _dictionary
    = new ConcurrentDictionary<string, Lazy<Type>>();

private readonly object _lock = new object();

public Type AddType(string typename)
{
    return _dictionary.GetOrAdd(typename,
        (t) =>
        {
            lock (_lock)
            {
                return DynamicallyGenerateTypeLazy(t);
            }
        }).Value;
}

Now, although we may instantiate more than one Lazy<Type> instance, and then throw it away if we loose the GetOrAdd-race, that's a minor problem and it works as it should.

Please note that this is only true as long as you use the default LazyThreadSafetyMode.ExecutionAndPublication mode.

The additional lock may look confusing, but it was in the original code and makes sense in this context, because while the concurrent dictionary and lazy layer guarantees that only one call per value of  'typename' is made to DynamicallyGenerateTypeLazy(), it does not guarantee that multiple threads do not call it it concurrently with different type names and this may wreak havoc with the shared assembly that the code is generated to.

No comments:

Post a Comment