Abstract
A Software Design Pattern for concurrent systems which makes race conditions something that can be asserted against, and thus deterministically eliminated rather than stochastically minimized. (Subject, of course, to the thoroughness of the assertions.)
A description of the problem
Every Software Engineer who has dealt with concurrency knows that it is hard. The bane of concurrency is race conditions: when a thread accesses data without taking into account the fact that the data is shared with other concurrently running threads which may alter that data at any unforeseeable moment in time.
(Useful pre-reading: About these papers)
There exist two kinds of race conditions that I can think of; let's call them physical and logical. (I just made up these terms; perhaps they have already been studied and given other names, but I am unaware of that.)
- Physical Race Conditions happen due to the way the underlying hardware works. One example is trying to read a variable consisting of two machine words, thus requiring two successive read operations which are not atomic, while another thread is simultaneously writing to that variable, resulting in garbage being read. Another example is two threads simultaneously performing increment operations on the same memory location, where a memory increment is implemented by the CPU as a non-atomic read-increment-write sequence, resulting in some of the increments being lost.
- Logical Race Conditions happen when application logic fails to account for concurrency. For example, checking whether a collection contains a value, and if not, adding the value to the collection: when two threads try to do this, it will sometimes happen that they will both find that the collection does not contain the value in question, and will both add it, resulting in a duplicate. Depending on whether the implementation of the collection allows duplicates or not, this will result either in soft malfunction (a duplicate where none was intended) or in hard failure due to the collection throwing a "duplicate element" exception.
Note that logical race conditions can occur even if we have taken all necessary precautions (i.e. locking) to avoid physical race conditions. Incidentally, this is the reason why many of the so-called "concurrent" collections, such as the "concurrent map", are of very limited use: sure, they guarantee that they will not crash and burn, but they do not guarantee correct results.
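The check-then-act scenario above can be sketched in a few lines. The following is a minimal illustration (in Python; the names are invented for this example, not taken from any library) of why the check and the act must be covered by a single lock: a "concurrent" collection would protect each individual call, but not the combination of the two.

```python
import threading

lock = threading.Lock()
items = []  # a plain list standing in for the collection; it permits duplicates

def add_if_absent(value):
    # Holding the lock across BOTH the check and the act closes the window
    # in which another thread could observe the value as absent and add it too.
    with lock:
        if value not in items:
            items.append(value)

threads = [threading.Thread(target=add_if_absent, args=(42,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(items)  # [42] -- exactly one copy, regardless of thread scheduling
```

Remove the `with lock:` and the code becomes a textbook logical race: it will usually work, and occasionally produce a duplicate, even if the underlying collection is itself "concurrent".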
Race conditions exhibit a disastrous combination of unfortunate qualities:
- Non-deterministic: you cannot reproduce them at will; they appear to happen at random, so you can almost never use the debugger to troubleshoot them.
- Sensitive to troubleshooting instrumentation: not only do they never manifest while single-stepping through code, but if you introduce extra code to detect them, they may seemingly disappear, because they are highly dependent on timing. The moment you remove the instrumentation, however, they may start manifesting again.
- Elusive: their effects are usually observed not at the moment that they occur but after the fact, so it is difficult to tell what happened and why it happened.
- Confusing: sometimes, the malfunction that they cause seems at first impossible to happen, requiring extensive troubleshooting before the realization sinks in that it must be due to a race condition.
- Multi-faced: in many cases the effects of a race condition differ on each manifestation, so you are never sure whether you are chasing one issue or several issues at the same time.
- Untestable: there is no unit test that can catch race conditions or guarantee their absence.
- Treacherous: a race condition which happens on average once every million seconds of usage may take months before it manifests in your development environment, and yet once there are a million customers using your software, there will be one customer encountering it roughly every second.
- Catastrophic: program state corruption tends to result in complete failure of the software.
Solutions that try to avoid the problem
Since concurrency with locks is so hard, a number of mechanisms have been invented that try to implement concurrency without locking.
- Immutability (functional programming): if all program state is immutable, then there is no possibility of one thread modifying some state while another thread is trying to read it, because there is no state that can be modified. Therefore, no locking is necessary.
  Disadvantages:
  - Functional programming and immutability are not ubiquitous, and it remains to be seen whether they will ever become ubiquitous.
  - Many implementations of Functional Programming are not purely functional; they mix mutability with immutability, so the problem remains.
  - Functional programming is only common in high-level systems running on garbage-collecting virtual machines. It is rare in mid-level systems and virtually absent in low-level systems.
  - Many of the data structures that give the illusion of mutability while their inner workings are immutable tend to be computationally more expensive than their straightforward mutable counterparts.
- Message-passing: threads never share any data; instead, they work only on data that they exclusively own, and they exchange data by means of immutable messages passed through message queues. Essentially, in the entire system there is only one little piece of code which employs locking, and that is the concurrent message queue implementation. The idea is that we should be able to get at least that small part right.
  Disadvantages:
  - Performance suffers as the amount of data exchanged among threads increases.
  - Performance also suffers because data can never be manipulated in place: it must be placed in a message, the message must be posted into a queue, a thread context switch must usually occur for the receiving thread to process the message, and then the reverse path must be followed for the original thread to receive the result. (When manipulating data in place, a thread context switch will only occur when attempting to obtain a lock while another thread already holds that lock, which may be a rare occurrence.)
  - Nowadays, in order to avoid the tedious creation of countless message classes, you are more likely to just post a lambda into the message queue; but then you have a lambda which is declared in one thread and executed in another, so you still have to be extremely careful about what that lambda is allowed to touch.
- Other: exotic mechanisms such as the single-writer principle of the Rust programming language.
  Disadvantages:
  - They tend to require compiler support. (So, a mechanism that can be implemented in any language would still be of value.)
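The message-passing approach can be sketched as follows. This is a minimal illustration (in Python; the `Worker` class and its names are invented for this example): one worker thread exclusively owns the mutable state, and other threads never touch it directly. They post closures into a queue for the worker to execute, so the queue is the only component that needs to be thread-safe.

```python
import queue
import threading

class Worker:
    def __init__(self):
        self._inbox = queue.Queue()          # the one and only thread-safe component
        self._state = {"counter": 0}         # owned exclusively by the worker thread
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def _run(self):
        while True:
            message = self._inbox.get()
            if message is None:              # sentinel: shut down
                return
            message(self._state)             # executed on the owning thread only

    def post(self, message):
        self._inbox.put(message)

    def stop(self):
        self._inbox.put(None)
        self._thread.join()

worker = Worker()
for _ in range(1000):
    # Note the hazard mentioned above: this lambda is declared on one thread
    # but executed on another, so it must only touch the state passed to it.
    worker.post(lambda state: state.update(counter=state["counter"] + 1))
worker.stop()
print(worker._state["counter"])  # 1000 -- no lost increments
```

All mutations are serialized through the queue, so no increment is ever lost; the cost, as noted above, is a queue operation (and usually a context switch) per mutation.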
A deeper look at the problem
Locking is largely a question of granularity. Consider the two extremes:
- Ultra-fine grain locking: always lock every single little piece of mutable state when accessing it, and only while accessing it.
- Ultra-coarse grain locking: place a global lock on the entirety of your mutable state on program start and release it on program exit.
- Ultra-fine grain locking would result in an unreasonable amount of bloat in all code that we write, it would suffer performance-wise, and although it would eliminate physical race conditions, it would do nothing for logical race conditions.
- Ultra-coarse grain locking would not work either because increasing the lifetime of a lock also increases the chances that other threads will be blocked, with the absurd extreme of the lock lifetime being equal to program runtime resulting in all threads becoming permanently blocked and no actual sharing of any mutable state ever taking place.
So, the rule of thumb is: always lock for as long as necessary, but try not to lock any longer than necessary. This leads to the number one cause of race conditions:
Trying to lock for as long as necessary but to avoid locking longer than necessary means that there will always be code which is accessing mutable state without first acquiring a fine grain lock, and instead is assuming that a coarser grain lock has already been acquired by some other code higher up the call-tree. (Remember, computer science trees are upside-down.) This assumption leaves open the possibility of human error, as the programmer who wrote the code higher up the call-tree may have forgotten to lock, thus putting all code below it at risk of race conditions.
- Choosing the grain of the locks in the most performant way is more of an art than a science, requiring a master of the craft to do it right, and that is okay: experts will always be useful. If no expert is available, performance might end up being suboptimal, but the software will still run, so strictly speaking the expert is not necessary.
- Choosing the grain of the locks in such a way that the program remains correct is also more of an art than a science as things stand today, so it also requires a master of the craft to do it right; however, if we want to be thinking of our profession as a science rather than an art, we cannot have software that tends to crash and burn unless a master of the craft has written it. Therefore, we need a mechanism for detecting and protecting ourselves against the human error which is practically inevitable when an apprentice rather than a master touches the code, or even when the master touches the code while having a bad day.
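The "assumed lock higher up the call-tree" hazard can be made concrete with a small sketch (in Python; the function and variable names are invented for this example). The helper below is documented as requiring the caller to hold the lock, but nothing enforces that: a plain lock cannot even be queried for "is it held by the current thread", so the helper has no way to assert its own precondition, and the assumption lives only in a comment.

```python
import threading

accounts_lock = threading.Lock()
accounts = {"alice": 100, "bob": 0}

def transfer_unlocked(src, dst, amount):
    """PRECONDITION: the caller must already hold accounts_lock."""
    accounts[src] -= amount
    accounts[dst] += amount   # two non-atomic mutations; a race here corrupts state

def transfer(src, dst, amount):
    with accounts_lock:                        # the coarser-grain lock, taken higher up
        transfer_unlocked(src, dst, amount)

transfer("alice", "bob", 30)
print(accounts)  # {'alice': 70, 'bob': 30}

# A buggy caller could just as easily invoke transfer_unlocked() directly,
# without the lock -- and nothing in the code would catch the mistake.
```

Turning that comment-only precondition into something that can actually be asserted is precisely what the remainder of this paper addresses.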
Restating the problem
Code which accesses mutable state invariably assumes that the necessary lock has already been acquired somewhere higher up the call-tree; the problem is that this assumption is implicit and unverifiable, so nothing catches the human error of a forgotten lock. What we need is a way of turning this implicit assumption into an explicit, assertable precondition.
The solution
The solution is a pattern which I call Coherence, and it consists of two complementary practices:
- Take measures at certain places in our code to prevent entering a race condition.
- Assert, at all other places in our code, that the necessary measures have been taken, guaranteeing that we are not in a race condition.
The name Coherence was chosen as opposed to Assertable Lock because Coherence is meant to be a high level abstraction. The use of an abstraction is necessary for two reasons:
- Coherence is meant to be asserted ubiquitously by any code that accesses mutable state, even by general purpose code such as the standard collection classes. However, general purpose code tends to be (and should remain) agnostic of concurrency, so it should not be burdened with such a low-level and concurrency-specific concept as locking.
- Depending on the concurrency characteristics of the execution environment, there can be different implementations of coherence, some of which do not even involve actual locking, so using the term 'Lock' would be inaccurate and misleading.
- In a strictly single-threaded environment, a dummy implementation is needed which never places any locks and never fails a coherence check.
- In a share-nothing environment, a simple implementation will suffice which never places any locks and only fails a coherence check if the currently executing thread is not the thread that owns the mutable state, i.e. the thread in which the mutable state was created.
- In a thread-pooled, share-nothing environment, a somewhat more elaborate implementation is needed which takes into account the fact that the thread which owns the mutable state may not necessarily be the thread that created the mutable state, since threads are picked from a pool.
- In a multi-threaded environment with a small amount of shared state, a singular locking implementation will suffice which enters coherence by obtaining a lock on the totality of the shared state and fails the coherence check when that lock has not been obtained. This represents a coarse grain lock, so it might result in sub-optimal performance, but it has the advantage of being simple and avoiding deadlocks.
- In a multi-threaded environment with a large amount of shared state and high thread contention over it, necessitating finer grain locking for good enough performance, a plural coherence implementation can be used which allows placing independent locks on independent subsets of the shared state, and fails a coherence check when the lock corresponding to a particular subset of state has not been obtained. Care must be exercised to always enter and assert the correct instance of coherence for each subset of state, and to avoid deadlocks in doing so.
- Regardless of the concurrency characteristics of the execution environment, when the lifetime of a certain piece of mutable state is confined within a single call tree, a simple coherence implementation will again suffice which does not place a lock and simply asserts that the current thread is the thread in which the state was created. (To guard against the mutable state somehow escaping the scope of the call tree in which it was meant to be confined.)
Note that, unlike most existing locking mechanisms, which explicitly allow a thread to obtain a lock multiple times, coherence explicitly disallows re-entrance. I have chosen to do it this way because my approach to Software Engineering is "leave nothing to chance": if you are unsure whether you have already obtained a lock on something, and you would like the locking mechanism to be forgiving in case you try to lock twice, then you must be doing something wrong. It is my firm belief that when a piece of framework is in a position to alert you that you are doing something wrong, it should alert you that you are doing something wrong. Of course, it may be that I am wrong here, and unbeknownst to me there exist legitimate reasons for allowing coherence re-entrance; this remains to be seen.
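A few of the implementation variants described above can be sketched as follows (in Python; all class and method names here are invented for illustration, not part of any library). General-purpose code calls only `check()`; the choice of coherence implementation decides what that check means. The locking variant also demonstrates the deliberate rejection of re-entrance.

```python
import threading

class CoherenceError(AssertionError):
    pass

class ThreadCoherence:
    """Share-nothing variant: no lock; check() fails off the owning thread."""
    def __init__(self):
        self._owner = threading.current_thread()

    def check(self):
        if threading.current_thread() is not self._owner:
            raise CoherenceError("state accessed from a non-owning thread")

class LockingCoherence:
    """Locking variant: check() fails unless the current thread has entered."""
    def __init__(self):
        self._lock = threading.Lock()
        self._holder = None

    def __enter__(self):
        if self._holder is threading.current_thread():
            raise CoherenceError("coherence is not re-entrant")
        self._lock.acquire()
        self._holder = threading.current_thread()
        return self

    def __exit__(self, *exc):
        self._holder = None
        self._lock.release()

    def check(self):
        if self._holder is not threading.current_thread():
            raise CoherenceError("coherence has not been entered")

class CoherentList:
    """A coherence-aware collection: agnostic of locking, it merely asserts."""
    def __init__(self, coherence):
        self._coherence = coherence
        self._items = []

    def add(self, item):
        self._coherence.check()   # fails deterministically NOW, not randomly later
        self._items.append(item)

coherence = LockingCoherence()
safe = CoherentList(coherence)
with coherence:
    safe.add(1)        # fine: coherence has been entered
try:
    safe.add(2)        # outside coherence: a deterministic, assertable failure
except CoherenceError as e:
    print("caught:", e)
```

The crucial property is that a forgotten lock no longer manifests as rare, random state corruption: the very first access without coherence fails immediately, on the offending call, with a clear error.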
Further Research
A coherence implementation could conceivably also deal with deadlocks, in one or more of the following ways:
- Detect a deadlock once it happens and provide diagnostic information.
- Detect a deadlock once it happens and somehow take corrective measures.
- Detect the possibility of a deadlock and alert the programmer by means of a hard error.
- Be structured in such a way as to make deadlocks impossible.
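For the last of these, one classic structural technique (sketched here in Python; it is not part of the pattern as described above, and the `RankedLock` name is invented) is global lock ordering: every lock is assigned a rank, and a thread may only acquire locks in strictly increasing rank order. Any would-be deadlock then fails fast as a deterministic hard error on the thread that violated the ordering.

```python
import threading

_held = threading.local()   # per-thread stack of ranks currently held

class RankedLock:
    def __init__(self, rank):
        self.rank = rank
        self._lock = threading.Lock()

    def __enter__(self):
        held = getattr(_held, "ranks", [])
        # Refuse to acquire out of order: this is what makes deadlock impossible.
        if held and held[-1] >= self.rank:
            raise AssertionError(
                f"lock ordering violation: holding rank {held[-1]}, "
                f"acquiring rank {self.rank}")
        self._lock.acquire()
        _held.ranks = held + [self.rank]
        return self

    def __exit__(self, *exc):
        _held.ranks = _held.ranks[:-1]
        self._lock.release()

a = RankedLock(1)
b = RankedLock(2)

with a:
    with b:
        pass              # fine: ranks strictly increase
try:
    with b:
        with a:           # violation: would permit an a/b vs. b/a deadlock
            pass
except AssertionError as e:
    print("caught:", e)
```

The same trade-off discussed throughout applies: the ordering check converts a stochastic, hard-to-reproduce hang into a deterministic assertion failure at the exact line that broke the rule.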