michael.gr: Coherence: The Assertable Lock

Abstract

A Software Design Pattern for concurrent systems which makes race conditions something that can be asserted against and thus deterministically eliminated rather than stochastically reduced or minimized. (Subject, of course, to the thoroughness of the assertions.)

Image by reginasphotos from pixabay.com

A description of the problem

Every Software Engineer who has dealt with concurrency knows that it is hard. The bane of concurrency is race conditions: when a thread accesses data without taking into account the fact that the data is shared with other concurrently running threads which may alter that data at any unforeseeable moment in time.

(Useful pre-reading: About these papers)

There exist two kinds of race conditions that I can think of, let's call them physical and logical. (I just made up these terms, perhaps they have already been studied and given other names, but I am unaware of that.)

Physical Race Conditions happen due to the way the underlying hardware works. One example is trying to read a variable consisting of who machine words, thus requiring two successive read operations which are not atomic, while another thread is simultaneously writing to that variable, resulting in garbage being read. Another example is two threads simultaneously performing increment operations on the same memory location, where a memory increment is implemented by the CPU as a non-atomic sequence of read-increment-write operations, resulting in some of the increments being lost.
Logical Race Conditions happen when application logic fails to account for concurrency. For example, checking whether a collection contains a value, and if not, adding the value to the collection: when two threads try to do this, it will sometimes happen that they will both find that the collection does not contain the value in question, and will both add it, resulting in a duplicate. Depending on whether the implementation of the collection allows duplicates or not, this will result either in soft malfunction, (a duplicate where it was not intended,) or in hard failure due to the collection throwing a "duplicate element" exception.

Note that logical race conditions can occur even if we have taken all necessary precautions (i.e. locking) to avoid physical race conditions. Incidentally, this is the reason why many of the so-called "concurrent" collections like the "concurrent map" are of very limited use: sure, they guarantee that they will not crash and burn, but they do not guarantee correct results.

Race conditions exhibit a disastrous combination of unfortunate qualities:

Non-deterministic: you cannot reproduce them at will, they just appear to happen at random, so you can almost never use the debugger to troubleshoot them.
Sensitive to troubleshooting instrumentation: not only they never manifest while single-stepping through code, but if you introduce extra code to detect them, they may seemingly disappear, because they are highly dependent on timing. The moment you remove the instrumentation however, they may start manifesting again.
Elusive: their effects are usually observed not at the moment that they occur but after the fact, so it is difficult to tell what happened and why it happened.
Confusing: sometimes, the malfunction that they cause seems at first impossible to happen, requiring extensive troubleshooting before the realization sinks in that it must be due to a race condition.
Multi-faced: in many cases the effects of a race condition differ on each manifestation, so you are never sure whether you are chasing one issue or several issues at the same time.
Untestable: there is no unit test that can catch race conditions or guarantee their absence.
Treacherous: a race condition which happens on average once every million seconds of usage may take months before it manifests in your development environment, and yet once there are a million customers using your software, there will be one customer encountering it roughly every second.
Catastrophic: program state corruption tends to result in complete failure of the software.

Solutions that try to avoid the problem

Since concurrency with locks is so hard, a number of mechanisms have been invented that try to implement concurrency without locking.

Immutability (functional programming): if all program state is immutable, then there is no possibility of one thread modifying some state while another thread is trying to read it, because there is no state that can be modified. Therefore, no locking is necessary.

Disadvantages:

Functional programming and immutability are not ubiquitous, and it is yet to be seen whether they will ever become ubiquitous.
Many implementations of Functional Programming are not purely functional, they mix mutability with immutability, so the problem remains.
Functional programming is only common in high-level systems running on garbage-collecting virtual machines. It is rare in mid-level systems and virtually absent in low-level systems.
Many of the data structures that give the illusion of mutability while their inner workings are immutable tend to be computationally more expensive than their straightforward mutable counterparts.

Message-passing: threads never share any data, instead they only work on data that they exclusively own, and they exchange data by means of immutable messages passed through message queues. Essentially, in the entire system there is only one little piece of code which employs locking, and that is the concurrent message queue implementation. The idea is that we should be able to get at least that small part right.

Disadvantages:

Performance suffers as the amount of data exchanged among threads increases.
Performance also suffers since data can never be manipulated in-place, it must be placed in a message, the message must be posted into a queue, a thread context switch must usually occur for the receiving thread to process the message, and then the reverse path must be followed for the original thread to receive the result. (When manipulating data in-place, a thread context switch will only occur when attempting to obtain a lock while another thread already holds that lock, which may be a rare occurrence.)
Nowadays in order to avoid the tedious creation of countless message classes you are more likely to just post a lambda into the message queue, but then you have a lambda which is declared in one thread but executed in another thread, so you still have to be extremely careful with what that lambda is allowed to touch.

Other: exotic mechanisms such as the single-writer principle of the Rust programming language.

Disadvantages:

They tend to require compiler support. (So, a mechanism that can be implemented in any language would still be of value.)

So, avoiding race conditions when practicing concurrency by means of locking is an existing problem in need of a solution.

A deeper look at the problem

At the heart of the race condition problem lies the "to lock or not to lock" conundrum. The choice of what to do lies in a continuum between two absurd extremes:

Ultra-fine grain locking: Always lock every single little piece of mutable state when accessing it, and only while accessing it.
Ultra-coarse grain locking: Place a global lock on the entirety of your mutable state on program start and release it on program exit.

Obviously, neither of these extreme approaches would work.

Ultra-fine grain locking would result in an unreasonable amount of bloat in all code that we write, it would suffer performance-wise, and although it would eliminate physical race conditions, it would do nothing for logical race conditions.
Ultra-coarse grain locking would not work either because increasing the lifetime of a lock also increases the chances that other threads will be blocked, with the absurd extreme of the lock lifetime being equal to program runtime resulting in all threads becoming permanently blocked and no actual sharing of any mutable state ever taking place.

So, the answer to the "to lock or not to lock" conundrum always lies somewhere in-between:

Always lock for as long as necessary, but try not to lock longer than necessary.

This leads to the number one cause of race conditions:

Trying to lock for as long as necessary but to avoid locking longer than necessary means that there will always be code which is accessing mutable state without first acquiring a fine grain lock, and instead is assuming that a coarser grain lock has already been acquired by some other code higher up the call-tree. (Remember, computer science trees are upside-down.) This assumption leaves open the possibility of human error, as the programmer who wrote the code higher up the call-tree may have forgotten to lock, thus putting all code below it at risk of race conditions.

This situation is so widespread that it may be hard to realize its full extent: every single time we invoke a standard runtime library mutable collection class (which is one of the most frequent things we do) we are engaging in this assumption: the collection is not concurrency aware, so it is not placing any locks, but it is manipulating mutable state, so under conditions of concurrency it will fail unless a lock is in place. Essentially, the collection class is doing its job while praying that someone up the call tree has remembered to acquire the necessary lock.

The grain of locks affects two things: performance and correctness.

Choosing the grain of the locks in the most performant way is more of an art than a science, requiring a master of the craft to do it right, and that is okay: experts will always be useful. If no expert is available, performance might end up being suboptimal, but the software will still run, so strictly speaking the expert is not necessary.
Choosing the grain of the locks in such a way that the program remains correct is also more of an art than a science as things stand today, so it also requires a master of the craft to do it right; however, if we want to be thinking of our profession as a science rather than an art, we cannot have software that tends to crash and burn unless a master of the craft has written it. Therefore, we need a mechanism for detecting and protecting ourselves against the human error which is practically inevitable when an apprentice rather than a master touches the code, or even when the master touches the code while having a bad day.

Restating the problem

A very important first step in solving a problem often is to restate it using terminology that is more conducive to solving it. The term "Race Condition" is somewhat cumbersome because it refers to an unfortunate event which may or may not happen, depending on non-deterministic circumstances. The original 1954 paper by David Huffman, titled "The synthesis of sequential switching circuits", which contains the first known mention of the term, regards race conditions as something which may exist when a certain instability is detected, so even the original sense referred to events that may potentially occur.

However, if we care about software correctness, then we do not want to be leaving anything to chance, so the fact that the unfortunate event may happen is irrelevant: if circumstances can arise at all which would potentially allow a race condition to occur, then for all practical purposes it must be assumed that the race condition will occur. Therefore, the race conditions themselves should be of no interest to us; what should be of interest is modes of operation that allow race conditions to occur. We will call them Race Modes.

A Race Mode is an erroneous mode of operation in which a race condition can potentially occur.

A piece of software either enters race modes, or it does not. If it does enter race modes, then race conditions may occur, and as already established, it should be presumed that they will occur. If the software never enters any race modes, then no race conditions can occur.

So, the problem has been restated from "avoiding race conditions" to "avoiding race modes". The difference may be subtle, but it is important enough to make.

The solution

In restating the problem as described above we have set ourselves a new goal: how to assert against race modes. If race modes can be asserted against, then the concurrency problem stops being subject to chance and becomes quite deterministic instead: if our software runs and no assertion failures occur, then it never enters a race mode, and therefore no race conditions are possible. (Note that the assertions are not meant to catch race conditions; the assertions are meant to catch race modes.)

The mechanism that I have come up with for asserting against race modes is called Coherence and in its simplest form it can be thought of as an abstraction of an Assertable Lock.

There are two things you can do with coherence: enter it, and assert it. (By entering coherence we mean executing a piece of code while in coherence, so once that piece of code is done executing, coherence will be exited.) So, coherence gives us the ability to:

Take measures at certain places in our code to prevent entering a race mode.
Ensure in all other places in our code that the necessary measures have been taken to guarantee we are not in a race mode.

Thus, coherence saves us from doing ultra-fine grain locking and from making assumptions about locking: by turning locks into something assertable, we do not have to acquire a lock every single time we touch mutable state, but we can assert that a lock has been acquired by code higher up the call tree. Since assertions can compile to nothing on the release build, this is a zero-runtime-cost solution.

The name Coherence was chosen as opposed to Assertable Lock because Coherence is meant to be a high level abstraction. The use of an abstraction is necessary for two reasons:

Coherence is meant to be asserted ubiquitously by any code that accesses mutable state, even by general purpose code such as the standard collection classes. However, general purpose code tends to be (and should remain) agnostic of concurrency, so it should not be burdened with such a low-level and concurrency-specific concept as locking.

Depending on the concurrency characteristics of the execution environment, there can be different implementations of coherence, some of which do not even involve actual locking, so using the term 'Lock' would be inaccurate and misleading.

Examples of possible coherence implementations depending on the concurrency characteristics of the execution environment:

In a strictly single-threaded environment, a dummy implementation is needed which never places any locks and never fails a coherence check.
In a share-nothing environment, a simple implementation will suffice which never places any locks and only fails a coherence check if the currently executing thread is not the thread that owns the mutable state, i.e. the thread in which the mutable state was created.
In a thread-pooled, share-nothing environment, a somewhat more elaborate implementation is needed which takes into account the fact that the thread which owns the mutable state may not necessarily be the thread that created the mutable state, since threads are picked from a pool.
In a multi-threaded environment with a small amount of shared state, a singular locking implementation will suffice which enters coherence by obtaining a lock on the totality of the shared state and fails the coherence check when that lock has not been obtained. This represents a coarse grain lock, so it might result in sub-optimal performance, but it has the advantage of being simple and avoiding deadlocks.
In a multi-threaded environment with a large amount of shared state and high thread contention over it, necessitating finer grain locking for good enough performance, a plural coherence implementation can be used which allows placing independent locks on independent subsets of the shared state, and fails a coherence check when the lock corresponding to a particular subset of state has not been obtained. Care must be exercised to always enter and assert the correct instance of coherence for each subset of state, and to avoid deadlocks in doing so.
Regardless of the concurrency characteristics of the execution environment, when the lifetime of a certain piece of mutable state is confined within a single call tree, a simple coherence implementation will again suffice which does not place a lock and simply asserts that the current thread is the thread in which the state was created. (To guard against the mutable state somehow escaping the scope of the call tree in which it was meant to be confined.)

Note that Coherence only allows asserting its state and purposefully disallows testing its state. In other words, you cannot have an "if coherence is entered then..." construct. This is done so as to prevent misuse and to allow for high performance coherence implementations that, on the release build, may not have explicit knowledge of whether coherence has been entered or not.

Note that unlike most existing locking mechanisms, which explicitly allow a thread to obtain a lock multiple times, coherence explicitly disallows re-entrance. I have chosen to do it this way because my approach to Software Engineering is "leave nothing to chance", so if you are unsure whether you have already obtained a lock on something, and you would like the locking mechanism to be forgiving in case you try to lock twice, then you must be doing something wrong. It is my firm belief that when a piece of framework is in a position of alerting you that you are doing something wrong, it should be alerting you that you are doing something wrong. Of course it may be that I am wrong here, and unbeknownst to me there exist legitimate reasons for having to allow coherence reentrancy; this remains to be seen.

Further Research

As mentioned earlier, in multi-threaded environments with a large amount of shared state and high thread contention over it, performance concerns often necessitate dividing the state into subsets and having an individual lock for each subset, so that different subsets can be locked independently of each other.

Unfortunately, when we do this, we run the danger of entering deadlocks, and as it stands, the plural coherence implementation, which is suitable for these scenarios, does not address the issue of deadlocks.

Some research is necessary to determine whether the plural coherence implementation could do any of the following:

Detect a deadlock once it happens and provide diagnostic information.
Detect a deadlock once it happens and somehow take corrective measure.
Detect the possibility of deadlocks and alert the programmer by means of hard error.
Be structured in such a way as to make deadlocks impossible.

2020-12-21

Coherence: The Assertable Lock