michael.gr: Object Lifetime Awareness

The Thinker (French: Le Penseur) by Auguste Rodin (From Wikipedia)

Abstract

Garbage collectors have given us a false sense of security with respect to what happens to an object once we stop thinking about it. The assumption is that it will be magically taken care of, but this does not always go as hoped, resulting in memory leaks and bugs due to failure to perform necessary cleanup. Tools for troubleshooting such problems are scarce, and not particularly helpful, so finding and fixing such problems is notoriously difficult.

A methodology is presented, which differs from current widespread practices, for maintaining awareness of, and exercising full deterministic control over, the lifetime of certain objects in a garbage-collected environment. We issue hard errors in the event of misuse, and accurate diagnostic messages in the event of omissions, thus improving the robustness of software and lessening the troubleshooting burden.

(Useful pre-reading: About these papers)

Definition

An object can be said to have a concept of lifetime if at some point it must perform some cleanup actions, after which it must never be accessed again.

A first look at the Problem

One of the original promises of garbage collectors was that we should not have to worry about the lifetime of objects; however, there exist various known situations where objects do, by their very nature, have an inherent notion of lifetime, so we do have to worry about it; for example:

Objects that model real-world processes with an inherent concept of lifetime, such as:

A user's visit to a web site, represented by a web session which at some moment expires.
The printing of a document, represented by a print job which at some moment completes.

Objects implementing application behaviors with a clearly defined end, such as:

A dialog window which at some moment gets dismissed and ceases to exist.

Additionally, there exist certain programmatic constructs which require a notion of lifetime; for example:

An event observer which must at some point unregister from the event source that it had previously registered with.
A database transaction which must at some point end, either by committing it or rolling it back.
Generally, any situation where:

We must remember to undo something which was previously done.
Some initialization must be balanced by some corresponding cleanup.

Furthermore, any object which contains an object that has a notion of lifetime needs to have a notion of its own lifetime, so as to be in a position of controlling the lifetime of the contained object. Thus, there tends to be a need for objects with a notion of lifetime to form a containment hierarchy whose root is the main application object.

Unfortunately, in garbage-collected environments, object lifetime is not given as much attention as it deserves. Software architectures tend to underestimate its importance, give it only a partial treatment, and invariably do it in ad-hoc ways, without any clearly defined methodology or aiding infrastructure. All too often, an object with an inherent notion of lifetime is built without explicit knowledge of it; instead, its lifetime is treated only implicitly. Thus, the software design has no knowledge of, and no control over, the lifetime of that object, and relies on the garbage collector to magically take care of it.

Once we leave an object up to the garbage collector to take care of, we completely relinquish control over what happens next: there are no guarantees as to when the object will be collected, or even as to whether it will in fact be collected; there will be nothing to inform us of the outcome, and we have no way of influencing the outcome. Thus, when object-lifetime related trouble happens, it is by its nature very difficult to troubleshoot, diagnose, and fix; nonetheless, most programmers try to avoid dealing with object lifetime if they can, and each time problems pop up, they try to fix them on an as-needed basis.

The following kinds of trouble are common:

Direct failure to perform necessary cleanup: the false sense of security offered by the garbage collector sometimes makes programmers forget that it only reclaims unused memory; it does not do any other cleanup for us, such as unregistering observers from event sources. This usually needs to be done manually, and it requires that the observer must have a notion of lifetime. An event source could in theory be asserting that every single observer did eventually remember to deregister; however, such a technique would require not only observers to have a notion of lifetime, but also the event source itself. Thus, there is no widespread use of such a technique, because there is no widespread use of object lifetime awareness in the first place.
Memory leaks: in an ideal world, the magic of the garbage collector would always be strong and true, but in practice it is not, due to subtle human error such as inadvertently keeping around a reference to an object, thus preventing it from being garbage collected. Lack of object lifetime awareness only exacerbates this problem.
Troubleshooting in the dark: an object with a notion of lifetime can either be alive or dead. If the lifetime of the object is explicit, we can always inspect that state with the debugger. If not, then we never know whether the object that we are looking at is meant to be alive or not.
Inability to detect misuse: a very common mistake is continuing to access an object even after its lifetime is over. When the object has no explicit knowledge of its own lifetime, it cannot assert against such mistakes.
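The first two kinds of trouble can be illustrated with a minimal Java sketch. All names in it (`TemperatureSensor`, `Display`, `LeakDemo`) are hypothetical and chosen only for illustration: an observer without a notion of lifetime registers with an event source and never unregisters, so the event source keeps it reachable and keeps invoking it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical event source that holds strong references to its observers. */
final class TemperatureSensor {
    private final List<Runnable> observers = new ArrayList<>();

    void register(Runnable observer) { observers.add(observer); }

    void unregister(Runnable observer) { observers.remove(observer); }

    int observerCount() { return observers.size(); }

    void fire() {
        // Iterate over a copy so that observers may unregister while being notified.
        for (Runnable observer : new ArrayList<>(observers)) {
            observer.run();
        }
    }
}

/** Hypothetical observer with no notion of lifetime: nothing ever unregisters it. */
final class Display {
    int updates = 0;

    Display(TemperatureSensor sensor) {
        sensor.register(this::onChange);
    }

    private void onChange() { updates++; }
}

final class LeakDemo {
    public static void main(String[] args) {
        TemperatureSensor sensor = new TemperatureSensor();
        new Display(sensor); // we keep no reference: we have "stopped thinking about" it
        sensor.fire();       // yet the sensor still reaches the display and still invokes it
        System.out.println("observers still registered: " + sensor.observerCount()); // 1
    }
}
```

As long as the sensor lives, the display can neither be collected nor stop receiving notifications, and nothing in the program reports this.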

Existing mechanisms

Garbage collectors and their associated execution environments do provide some machinery which is related to the topic of object lifetime management, namely finalization, disposal, "automatic" disposal, and weak references, but as we shall see this machinery alone is woefully inadequate.

Finalization

In Java we have the `Object.finalize()` method, and in C# we have "destructors", which are actually not destructors at all; they are finalizers too. [1] The garbage collector will invoke the finalizer of an object right before reclaiming the memory occupied by it, so that the object can in theory perform some cleanup at that moment. Unfortunately, this mechanism is notoriously unreliable:

An object will not be finalized unless it first becomes eligible for collection, and in order for that to happen, it must first become unreachable. However, the object may remain reachable due to subtle mistakes such as unknowingly keeping a reference to it in some list, thus resulting in objects which never get finalized.
When an object does become eligible for collection, the moment that it will actually be collected largely depends upon the whim of the garbage-collector, which is non-deterministic, both according to the documentation and as observed by experimentation. There are no guarantees as to when an object will be collected, or even as to whether it will ever be collected, despite being eligible.

If the garbage collector works in aggressive mode, (common in servers,) the object will be collected sooner rather than later, but how soon depends on variables that we practically have no control over, such as at what rate existing objects are becoming eligible for collection, how many objects are pending to be finalized, etc. So:

Even though the object may be finalized within milliseconds, there are no guarantees as to how many milliseconds, and also please note that "milliseconds" is still a far cry from "now".
Even if the object gets finalized as quickly as possible, this is still going to happen in a separate thread, so finalization will always be desynchronized from the set of instructions which rendered the object eligible for finalization. If these instructions are followed by another set of instructions that in any way rely on finalization having already taken place, the second set of instructions will almost always fail.

If the garbage collector works in non-aggressive mode, (common in desktop and console applications,) then:

The object might not be collected unless the virtual machine starts running out of memory, which is at an entirely unknown and usually distant time in the future.
The object may still not be collected at all if our software completes before exhausting its available memory. If we are only dealing with unmanaged resources that belong to the current process, they will be automatically reclaimed upon process termination, so all will be good, but if we are dealing with resources that are external to our process, (e.g. controlling a peripheral,) these resources will not be released.

The garbage collector orchestrates collection and finalization according to memory availability concerns, but not according to other concerns which it has no knowledge of; consequently, if we are acquiring instances of a certain scarce resource at a high rate, the garbage collector will not hasten collection and finalization in response to the scarcity of that resource, because it has no knowledge of that scarcity. However, if that resource only gets recycled when collection and finalization occurs, and if we do not happen to be allocating and freeing memory fast enough to trigger frequent enough collection and finalization, then we will be consuming the resource faster than it is being recycled, so we will run out of it, despite everything seemingly being done right.
Finalization is unordered and does not respect containment hierarchy, which means that when the finalizer is invoked on an object, a random subset of the objects contained in this object may have already been finalized. This is a completely chaotic situation which makes it impossible to get anything non-trivial done within a finalizer.
The chaotic and non-deterministic conditions under which the finalizer executes make it virtually impossible to test any code that you put in a finalizer, so virtually all finalizers are written on a best-effort basis: if it seems to work as written, it will hopefully keep working, fingers crossed.
Finalization is documented as having a high performance cost, so the standing advice is that it is best to minimize its use, or avoid it altogether if possible.

Therefore, relying on finalization does not give us control over anything, on the contrary it takes control away from us. The official literature of both Microsoft DotNet and the Java Virtual Machine recommends using finalization for the purpose of releasing unmanaged resources, which is not just unhelpful but actually wrongful advice to give. The entire software industry has been blindly following this advice without first questioning its correctness, which has resulted in lots of buggy software out there.

[1] The language feature that C# calls "destructor" is a misnomer; it is not a destructor, it is a finalizer, and the choice of the tilde syntax to denote finalizers in C# as if they were C++ destructors has caused nothing but confusion. Microsoft has been reluctantly acknowledging this and quietly correcting the terminology in its documentation.

Disposal

In Java we have the `Closeable` interface (along with its more general parent, `AutoCloseable`), and in C# we have the `IDisposable` interface. The benefit of using these interfaces is that they allow explicit (i.e. deterministic and synchronous) triggering of cleanup, instead of relying on finalization to trigger it.

A C#-only note: In the C# world there is an understanding that `IDisposable` may also be used for performing regular cleanup at the end of an object's lifetime; however, its primary reason for existence is still regarded as being the release of unmanaged resources, so people are trying to use it for both purposes. At the same time, the releasing of unmanaged resources is still regarded as something that must always be attempted during finalization, so people are trying to write disposal methods which must work both when explicitly invoked during normal program flow, and when invoked by the finalizer. Needless to say, the complexity of this task is daunting, and the result is incredible amounts of confusion. The `Dispose(bool)` pattern has been invented to help manage the chaos, but the result is still preposterously complicated, it suffers from boilerplate code on every single object that implements the pattern, it is largely untestable, and what is most disappointing is that absolutely no thinking seems to have gone in the direction of avoiding all this chaos in the first place.

Overall, the problem with disposal is that it is very easy to accidentally omit, and when it is omitted there is usually nothing to tell us that we did something wrong, other than performance degradation and inexplicable malfunction. As a matter of fact, the designers of both Java and C# anticipated the inevitability of such omissions, so they invented finalization as a fallback mechanism which is hoped to save the day despite the omissions. However, since finalization is notoriously unreliable, it is not a solution either; it is more like implementing an insurance policy by purchasing lottery tickets.

Unfortunately, the availability of finalization, and its deceitful promise of making everything right by magic, has steered programmers to regard disposal as largely optional, while in reality it is essential. All that disposal needs in order to be actually useful is a mechanism that will warn us when we forget to perform it, instead of a mechanism that will try to magically fix our omissions.

"Automatic" disposal

Both Java and C# provide special "automatic" disposal constructs, namely the `try-with-resources` statement of Java, and the `using` keyword of C#, both of which implicitly invoke the disposal method even if an exception is thrown. However:

The only thing that is automatic about these constructs is that if you remember to use them, then they will save you from having to write some code that disposes an object; unfortunately,

You may very easily forget to use them.
You may very easily forget that your object requires disposal and therefore be unaware of the fact that you should have used them.

These constructs can only be used in the simplistic scenario where the lifetime of an object is fully contained within the scope of a single method; unfortunately, in all but the most trivial situations what we actually have is objects which are contained within other objects and live for a prolonged time, so the method that creates them is different from the method that destroys them. In all these cases, the automatic disposal constructs are of no use whatsoever, and the programmer must remember to do everything right.
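The scope limitation can be seen in a minimal Java sketch; `Connection`, `Session`, and `ScopeDemo` are hypothetical names introduced only for illustration. The first case fits inside a single method, so `try-with-resources` applies; in the second, the connection is owned by a longer-lived object, so its disposal depends entirely on a line that the programmer must remember to write.

```java
import java.io.Closeable;

/** Hypothetical resource that merely records whether it has been closed. */
final class Connection implements Closeable {
    private boolean closed = false;

    boolean isClosed() { return closed; }

    @Override public void close() { closed = true; }
}

/** Hypothetical long-lived owner: the connection is created here but closed elsewhere. */
final class Session implements Closeable {
    private final Connection connection = new Connection();

    Connection connection() { return connection; }

    @Override public void close() {
        connection.close(); // nothing in the language reminds us to write this line
    }
}

final class ScopeDemo {
    /** Lifetime contained in one method: try-with-resources closes the connection. */
    static boolean closedAfterTryWithResources() {
        Connection connection = new Connection();
        try (Connection scoped = connection) {
            // ... use the connection within this single method ...
        }
        return connection.isClosed();
    }

    public static void main(String[] args) {
        System.out.println(closedAfterTryWithResources());    // true
        Session session = new Session();                       // outlives any single method
        System.out.println(session.connection().isClosed());  // false
        session.close();                                       // must be remembered by someone
        System.out.println(session.connection().isClosed());  // true
    }
}
```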

Weak References

A weak reference is an object which receives special treatment by the execution environment, to achieve something which is not normally possible. It contains a reference to a target object, which is disregarded by the garbage collector when determining whether the target object is accessible. Therefore, if there are no other references to the target object, then the target object is allowed to be garbage-collected, at which point the reference inside the weak reference object is replaced with null.

Weak references do not actually help us manage the lifetime of objects, but they have been suggested as a mechanism that can help us design things so that there is no need to manage the lifetime of objects. The idea is that we can implement the observer pattern using weak references, so that observers do not need to remember to unregister themselves from the event source; instead, they can simply be allowed to become garbage-collected, and the event source will subsequently forget them.

This approach suffers from a number of drawbacks:

Weak references might save us from having to worry about the lifetime of event observers, but they do nothing for a wide range of other situations that require cleanup at the end of an object's lifetime.
Weak observers run the danger of being prematurely garbage-collected. When this happens, it is very difficult to troubleshoot, and the fix tends to require tricks and hacks.
Weak references are a bit too low level, a bit too esoteric, and a bit like magic, so suggesting their widespread use by the average programmer is a bit of a tough proposition.
The use of weak references represents a step backwards from the stated goal of gaining more control over the inner workings of our software.
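For reference, a weak-observer event source might look roughly like the following Java sketch, in which `WeakEventSource` and `WeakObserverDemo` are hypothetical names. It also shows the premature-collection hazard: the second observer is referenced only weakly, so whether it is still there when the event fires is up to the garbage collector.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical event source which holds its observers only weakly. */
final class WeakEventSource {
    private final List<WeakReference<Runnable>> observers = new ArrayList<>();

    void register(Runnable observer) {
        observers.add(new WeakReference<>(observer));
    }

    /** Notifies the surviving observers and returns how many were notified. */
    int fire() {
        int notified = 0;
        for (Iterator<WeakReference<Runnable>> i = observers.iterator(); i.hasNext();) {
            Runnable observer = i.next().get();
            if (observer == null) {
                i.remove(); // the observer has been garbage-collected: forget it
            } else {
                observer.run();
                notified++;
            }
        }
        return notified;
    }
}

final class WeakObserverDemo {
    public static void main(String[] args) {
        WeakEventSource source = new WeakEventSource();
        int[] count = {0};
        Runnable kept = () -> count[0]++;  // strongly referenced by this local variable
        source.register(kept);
        source.register(() -> count[0]++); // referenced by nobody else: may vanish at any time
        System.out.println("notified: " + source.fire()); // 2 or 1, depending on the collector
        kept.run(); // using `kept` here keeps it strongly reachable until after fire()
    }
}
```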

A deeper look at the problem

In a language like C++, which has proper destructors, the lifetime of an object is well defined, and the compiler does all the work necessary to guarantee that this lifetime will end at the exact right moment, as long as we are using either local storage or smart pointers. However, in garbage-collected languages we have none of that; the lifetime of objects is not well defined by the language, so there is virtually nothing that the compiler can do for us. (As we have already shown, the `try-with-resources` statement of Java and the `using` keyword of C# are of very limited usefulness.)

In order to implement necessary cleanup at the end of an object's lifetime in garbage-collected languages, programmers either rely on finalization, or explicitly invoke objects to let them know that their lifetime is over. As we have already shown, finalization is asynchronous and non-deterministic, so it is unsuitable for basing any essential function of our software upon it, which means that explicit object lifetime termination is the only viable option.

Unfortunately, explicit object lifetime termination suffers from its own range of problems:

There is usually nothing in the code to give us a hint that we should place a call to end the lifetime of an object.
When we forget to end the lifetime of an object, there is never any immediate error to tell us that we forgot.
The problems that subsequently occur tend to be subtle, so we often do not notice them until a considerable time after the fact. For example, forgetting to unregister an observer from an event source turns the observing object into a memory leak, and causes the observing method to keep being needlessly invoked by the event source, to perform actions that in the best case only waste clock cycles without any value or effect, and in the non-best case cause malfunction.
When the malfunction does get noticed, it often seems inexplicable and does not tend to point to the source of the problem.
Even when we discover that a certain malfunction is due to the lifetime of an object not having been ended, it is usually difficult to tell at which point in the code it should have been ended. Often, in order to know this, we first need to know where in the code the object was allocated, but this information is not normally available.
Writing tests to catch omissions in object lifetime control is not only hard and tedious, but it also requires testing against the implementation rather than testing against the interface, which violates recommended best practices. (To test whether object A properly ends the lifetime of object B, we have to mock B and ensure that its lifetime termination method is invoked by A, but if we do this then we are by definition testing against the implementation of A.)

The Solution

Whereas it is generally true that "if you do everything right there will be no problems", this is a very bad rule to live by, because it completely disregards another very important rule which says "there will be mistakes". Reliance on everything being done right tends to result in brittle software designs, because some things will inevitably go wrong. We are definitely not advocating designs that are tolerant of mistakes; however, a software design must at the very least offer means of detecting mistakes and responding to them with hard error, diagnostic messages, or both; a design which relies on mistakes not being made, and at the same time is incapable of detecting the mistakes that will nonetheless be made, is doomed to run into trouble.

Object lifetime awareness is a design pattern for writing robust software. It begins by acknowledging that in garbage-collected languages there tends to be widespread uncertainty with respect to the lifetime of objects, which results in bugs that are very difficult to troubleshoot and fix. While the garbage collector would ideally be taking care of a lot of things, by its nature it cannot take care of everything, and in practice it often does not even take care of things that it is expected to, due to subtle human error.

The impetus behind object lifetime awareness is that we have had enough of this uncertainty, so we are taking matters into our own hands by establishing definitive knowledge of the lifetime of our objects and taking full control over it. When an object has an inherent notion of lifetime, this notion must always be made explicit, and handled in a certain structured and recognizable manner, so that when mistakes occur, we receive hard errors and diagnostic messages, allowing us to fix problems without troubleshooting in the dark.

Specifically, every lifetime-aware object must do the following:

Encapsulate an "alive" state which:

Starts as "true".
At some moment transitions to "false".
Is not exposed.
Can be asserted.
Can be inspected with a debugger.

On debug runs, the lifetime-aware object must respond with hard error to any attempt to invoke any of its public instance methods once its lifetime is over.
On debug runs, the lifetime-aware object must discover any omission to end its lifetime, and generate a diagnostic message if so. (More on how to achieve this later.)

Please note that the definition of a "debug run" varies depending on which language you are using:

In C# it is a run of the debug build.
In Java it is a run with assertions enabled.

Please also note that automated test runs are usually debug runs.
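A minimal hand-coded Java sketch of the first two requirements might look as follows; `PrintJob` and `PrintJobDemo` are hypothetical names, and detection of omissions is deliberately left out because it is discussed later. Since a debug run in Java is a run with assertions enabled, the checks are plain `assert` statements.

```java
import java.io.Closeable;

/** Hypothetical lifetime-aware object with a hand-coded alive state. */
final class PrintJob implements Closeable {
    private boolean alive = true; // starts as true, is not exposed, is visible in a debugger
    private int pagesPrinted = 0;

    void printPage() {
        assert alive : "PrintJob used after end of lifetime"; // hard error on debug runs
        pagesPrinted++;
    }

    int pagesPrinted() {
        assert alive : "PrintJob used after end of lifetime";
        return pagesPrinted;
    }

    @Override public void close() {
        assert alive : "PrintJob closed more than once";
        // ... cleanup actions would go here ...
        alive = false; // the one and only transition to false
    }
}

final class PrintJobDemo {
    public static void main(String[] args) {
        PrintJob job = new PrintJob();
        job.printPage();
        job.close();
        try {
            job.printPage(); // misuse: the lifetime is over
            System.out.println("not a debug run: misuse went undetected");
        } catch (AssertionError e) {
            System.out.println("debug run: " + e.getMessage());
        }
    }
}
```

Running the demo with `java -ea` takes the second branch; without `-ea` the assertions are skipped and it takes the first.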

Luckily we do not have to add lifetime awareness to all objects, we only need to add it to objects that belong to one or more of the following categories:

Objects that by their nature have a concept of lifetime, such as timers, windows, files, network connections, notification suppressors, etc.
Objects that once initialized, are known to have some cleanup to do eventually.
Objects with which other objects may register in some way. (To ensure that each object that registers does not forget to unregister.)
Objects that contain (own) other lifetime-aware objects.

In each system the objects that can benefit from lifetime awareness tend to be relatively few, while the majority of objects can continue being blissfully unaware of their lifetime, letting the garbage collector handle it.

In certain rare cases, a lifetime-aware object may control its own lifetime; however, far more often, the lifetime of an object is meant to be controlled by other objects. In these cases, the lifetime-aware object should implement the disposal interface of the language, primarily in order to document the fact that it is lifetime-aware, and secondarily so that the "automatic" disposal mechanism of the language can be used when the opportunity arises.

In Java, that would be an object implementing the `Closeable` interface, thus allowing us to sometimes make use of the `try-with-resources` statement.
In C#, that would be an object implementing the `IDisposable` interface, thus allowing us to sometimes make use of the `using` keyword.

Please note that the use of these interfaces here has nothing to do with releasing unmanaged resources; the goal is object lifetime awareness, while the releasing of unmanaged resources is at best a side note and largely a red herring in this discussion. It is true that the original intention of these interfaces was to allow releasing unmanaged resources, but there is absolutely nothing, either in the interfaces themselves, or in the language specifications, or in the respective compilers, or in the respective runtime environments, which says that this has to be the only purpose of these interfaces, or the only way they should be used, or the only way they can be used. So, here we are using them for something else. Please completely disregard the issue of unmanaged resources for now; we will address it later.

By making objects aware of their own lifetime, we achieve the following:

Any discrepancy between an object's expected alive state and its actual alive state (i.e. whether we think it should be alive vs. whether it actually is alive) can be asserted against and therefore be swiftly and infallibly detected without any need for white-box testing.
The alive state of an object can be explicitly and deterministically controlled without ever having to rely on finalization to do it for us.
All necessary cleanup can be done when the alive state transitions to false, thus ensuring that each initialization action is always balanced by its corresponding cleanup action. This includes ending the lifetime of any contained (owned) objects, unregistering the object from whatever it had previously registered with, etc.
At the end of the object's lifetime we can take whatever extra measures are within our power to take in order to ensure that the lifetime of other objects is being correctly managed. For example, we can assert that any objects which had previously registered with this object have by now unregistered themselves.
More broadly, we construct our software to be in complete control over its inner workings, instead of leaving things to chance.
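The points about balanced cleanup and about checking on other objects at end of lifetime can be sketched in Java as follows. `TickSource`, `TickCounter`, and `BalanceDemo` are hypothetical names, and the alive state is hand-coded here purely to keep the example short.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical lifetime-aware event source. */
final class TickSource implements Closeable {
    private boolean alive = true;
    private final List<Runnable> observers = new ArrayList<>();

    void register(Runnable observer) {
        assert alive;
        observers.add(observer);
    }

    void unregister(Runnable observer) {
        assert alive;
        boolean removed = observers.remove(observer);
        assert removed : "observer was not registered";
    }

    @Override public void close() {
        assert alive;
        // Extra measure: by now every observer must have unregistered itself.
        assert observers.isEmpty() : observers.size() + " observer(s) forgot to unregister";
        alive = false;
    }
}

/** Hypothetical lifetime-aware observer: registers when constructed, unregisters when closed. */
final class TickCounter implements Closeable {
    private boolean alive = true;
    private final TickSource source;
    // Stored in a field because each evaluation of `this::onTick` may yield a different object.
    private final Runnable handler = this::onTick;

    TickCounter(TickSource source) {
        this.source = source;
        source.register(handler);
    }

    private void onTick() { assert alive; }

    @Override public void close() {
        assert alive;
        source.unregister(handler); // balances the registration done in the constructor
        alive = false;
    }
}

final class BalanceDemo {
    public static void main(String[] args) {
        TickSource source = new TickSource();
        TickCounter counter = new TickCounter(source);
        counter.close(); // omit this line and, on a debug run, source.close() fails
        source.close();
        System.out.println("orderly shutdown");
    }
}
```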

Detecting omissions

The main thing which makes object lifetime awareness a viable proposition is the promise of useful diagnostic messages in response to omissions to explicitly end the lifetime of objects. Without such diagnostic messages, object lifetime awareness would not be much different from existing practices.

Interestingly enough, (or perversely enough, depending on how you would like to see it,) the mechanism that we leverage in order to detect such omissions is the garbage collector itself. The idea is that an object can check during finalization whether it is still alive or not: if it discovers that it is being finalized while still alive, then this means that the programmer forgot to explicitly end the lifetime of the object at an earlier moment.

It is very important to note that once we detect that an object is still alive during finalization, we specifically refrain from repeating the widespread mistake of trying to correct the problem: we most certainly do not attempt to end the lifetime of the object at that moment; instead, we only generate a diagnostic message, alerting the programmer that they forgot to end the lifetime of the object at an earlier time. This is important because the checks performed during finalization are meant to be of a strictly diagnostic nature, (a quality assurance mechanism if you wish,) so they are only meant to be performed on debug runs, so our software better be working correctly without them on release runs.

One might protest that an object which has accidentally become a memory leak will never be finalized, so it will never discover that its lifetime was not ended. Luckily, this can be taken care of with a bit of infrastructural support and a bit of discipline: During application shutdown we ensure that our system undergoes an orderly and thorough cleanup phase, where all remaining lifetime-aware objects are terminated. Typically, this simply means ending the lifetime of the main application object, and this should cascade throughout the entire containment hierarchy, ending the lifetime of all objects. Once this cleanup phase is complete, and if this is a debug run, we force a full garbage collection, and we wait for it to complete before exiting the application. In doing so, we ensure that all finalizers are invoked, and this includes the finalizers of any objects that were inadvertently memory-leaked. Thus, any omission to end the lifetime of an object is detectable in the worst case during application shutdown. For this to work optimally, some extra discipline is necessary, for example avoiding directly or indirectly anchoring lifetime-aware objects in static storage.

In actual practice most omissions to control the lifetime of objects happen without the objects necessarily also becoming memory-leaked, so the objects do get garbage-collected, so the omissions are detected at various moments during runtime when garbage collection occurs. For this reason, it is beneficial on debug runs to introduce regular forced garbage collection, thus detecting omissions as soon as possible after they happen. The right moment to force garbage collection tends to be:

On web servers, immediately after servicing each client request.
On desktop applications, immediately after each application logic idle event.
(An application logic idle event is similar to the graphical user interface idle event, except that it happens less frequently, i.e. not after every single event from the input system such as a mouse move, but only after the application logic has actually had some work to do.)
On data processing systems with a main loop, at the end of each iteration of the main loop.

It is worth stressing that forced garbage collection only needs to be employed as a diagnostic tool, and only on debug runs. On release runs there is never a need to force garbage collection because all object lifetime control issues are presumed to have already been addressed.
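A small Java helper for forcing collection only on debug runs might look like the sketch below. `DebugGc` and `RequestLoopDemo` are hypothetical names. Note that in Java `System.gc()` is a request which the virtual machine makes a best effort to honor, so this sketch only approximates the forced, awaited collection described above.

```java
/** Hypothetical helper for requesting garbage collection on debug runs only. */
final class DebugGc {
    private DebugGc() { }

    /** Returns true if assertions are enabled, i.e. if this is a debug run in the Java sense. */
    static boolean isDebugRun() {
        boolean enabled = false;
        assert enabled = true; // intentional side effect: executes only when assertions are on
        return enabled;
    }

    /** Requests a collection on debug runs; returns whether a request was made. */
    static boolean collectIfDebugRun() {
        if (!isDebugRun()) {
            return false; // release run: never force garbage collection
        }
        System.gc(); // a request, not a guarantee
        return true;
    }
}

final class RequestLoopDemo {
    public static void main(String[] args) {
        for (int request = 1; request <= 3; request++) {
            // ... service the request ...
            DebugGc.collectIfDebugRun(); // surface omissions soon after they happen
        }
        // ... orderly shutdown: end the lifetime of the main application object ...
        DebugGc.collectIfDebugRun(); // last chance to surface omissions before exiting
    }
}
```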

Forced garbage collection can also be used as a diagnostic tool during automated software testing; however, if our tests are fine-grained, (as is usually the case with unit tests,) it is advisable to refrain from forcing garbage collection after each test, because a full run of the garbage collector tends to be expensive, so its frequent use may multiply the total run time of a test suite by a very large factor. The ideal is to perform just one forced garbage collection at the end of all tests, and if any object lifetime control failures are detected, then and only then do another run of all tests with forced garbage collection enabled after each test, to detect precisely in which tests the failures occur.

In order to force garbage collection at will during testing, one needs a testing framework which supports this, and I am not aware of any, but if you do not make use of any exotic features of your existing testing framework, it is easy to write your own and take control yourself.

Addendum: Lifeguards

For an object to be aware of its own lifetime and to issue diagnostic messages when its lifetime is not properly controlled, a certain amount of functionality is needed, and we do not want to be coding this functionality by hand in each class that we write, so we will be delegating as much of the work as possible to a separate class. An appropriate name for such a class would be `ObjectLifetimeGuard`, but this is a mouthful, so we will simply abbreviate it to `LifeGuard`.

The lifeguard exposes only two methods:

A lifetime-assertion method which is invoked to assert that the lifeguard is still alive.
An end-of-lifetime method which is invoked to let the lifeguard know that its lifetime is over.

With the introduction of the lifeguard, each lifetime-aware class only needs to do the following:

Obtain during construction, and fully encapsulate, an instance of `LifeGuard`.
Perform an assertion at the beginning of each public instance method, (by definition only on debug runs, since it is an assertion,) which simply delegates to the lifetime-assertion method of the lifeguard.
Implement the object disposal interface of the language at hand, performing whatever cleanup actions are necessary, and then delegating to the end-of-lifetime method of the lifeguard.

The lifeguard does the following:

On debug runs, it encapsulates an `alive` state which starts as `true`.
It implements the lifetime-assertion method as follows:

On debug runs, it returns `true` if the object is alive, and throws an exception if not.
On release runs, it always throws an exception, because it is only meant to be invoked from within assertions, and assertions are not meant to execute on release runs.

It implements the end-of-lifetime method as follows:

On debug runs, it first asserts that the object is currently alive, and then transitions the alive state to false.
On release runs, it does nothing.

On debug runs it defines a finalizer which checks whether the object is still alive during finalization, and generates a diagnostic message if so.

Notes:

A lifeguard is obtained by invoking a factory method instead of using the `new` keyword, because this method will return something different depending on whether this is a debug run or a release run. The factory can come in the form of a `static` method for simplicity, or it can come in some other form if necessary.
The interface of the lifeguard has been designed in such a way that its alive state can be asserted without being exposed. This has the effect of:
- Preventing misuse
- Allowing for a high performance implementation for release runs which does not even contain that state.
- Still allowing the alive state to be inspected with a debugger on debug runs.
Lifetime-aware objects that have a need for some similar state which is queryable must implement it separately. The fact that on debug runs this state will be mirroring the `alive` state of the lifeguard is irrelevant.
In certain environments which support asynchronous method invocations it might be impossible to guarantee that no method is ever invoked past end of lifetime; these are exceptions to the rule, which need special handling by means of `if` statements instead of assertions. Since the lifeguard only allows asserting the alive state without exposing it, such objects will need to implement their own `alive` state in parallel to the lifeguard.
As a rule, triggering a hard error is preferable to generating diagnostic messages; however, an omission to end the lifetime of an object can only be detected during finalization, and by that time it is already too late for any fail-fast measures, so what we have here is an exception to the rule: in this particular case, it is okay if we just generate a diagnostic message. If needed, extra measures can be taken to alert the programmer to not forget to look at the diagnostic messages.
The diagnostic message generated in the event of an omission to end the lifetime of an object is meant to include a stack trace, complete with source filenames and line numbers, showing precisely where in the source code the object was allocated, to help us easily locate and fix the problem.

Unfortunately, this stack trace needs to be collected by the lifeguard during construction, just in case it will need to be displayed during finalization, but in many environments collecting a stack trace is unreasonably expensive, so if each lifeguard instantiation were to involve collecting a stack trace, this would run the danger of slowing down our debug runs to the point of making them unusable. (Obtaining a stack trace with source filenames and line numbers a few dozen times per second incurs a noticeable penalty on the JVM, while under DotNet the penalty is catastrophically more severe.)

For this reason, a special procedure is necessary: by default, stack traces are not collected, so a lifeguard which detects an omission to end the lifetime of an object reports only enough information to help us identify the class of the containing object. Once we know the class, we can go to the source code and flip a flag which enables stack trace collection for lifeguards of that specific class only, so that we can then re-run and obtain a message which includes a stack trace. Once we have solved the problem, we put the flag back to its default value to avoid the performance hit.

Both in C# and in Java there is an established tradition which says that methods involved in the closing or disposing of things should be forgiving, in the sense that multiple invocations should be permitted with no penalty. In my opinion this practice is ill-conceived, so instead I prescribe an end-of-lifetime method which asserts that it is never invoked twice. This is in line with the overall theme of object lifetime awareness, which is to gain greater control over the inner workings of our software. I am perfectly aware of the fact that this is parting ways with a tradition cherished by the entire industry; it is perfectly fine to part ways with traditions when you know better, especially since another term for tradition is "capricious progress-stopper".
The lifeguard is designed in such a way that on release runs it contains no state and performs no action; therefore, it need not be instantiated once per lifetime-aware object; instead, it can be a singleton, and all lifetime-aware objects can receive the same reference to its one and only dummy instance. Thus, the performance cost of using the lifeguard on release runs is near zero.
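To illustrate the note about asynchronous invocations, here is a minimal sketch of an object which keeps its own alive state and checks it with an `if` statement rather than an assertion. The `ProgressListener` class is hypothetical, and the sketch assumes that callbacks are delivered on the same thread that disposes the object (for example via a message loop), so no synchronization is shown.

```csharp
using System;

// Hypothetical object receiving callbacks which may arrive late.
public sealed class ProgressListener : IDisposable
{
    // The object's own alive state, kept in parallel to (and separate from)
    // any lifeguard, because the lifeguard does not expose its state.
    private bool alive = true;

    public int UpdatesReceived { get; private set; }

    // May be invoked past end of lifetime by a callback that was already
    // in flight, so the state is checked with `if` instead of being asserted.
    public void OnProgress()
    {
        if( !alive )
            return; // late callback; ignore it
        UpdatesReceived++;
    }

    // Ending the lifetime twice is still treated as an error.
    public void Dispose()
    {
        if( !alive )
            throw new InvalidOperationException( "lifetime has already ended" );
        alive = false;
    }
}
```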

An implementation of lifeguard in C# is as follows:

    #nullable enable

    using Sys = System;
    using Collections = System.Collections.Generic;
    using System.Linq;
    using SysDiag = System.Diagnostics;
    using SysComp = System.Runtime.CompilerServices;

    public abstract class LifeGuard : Sys.IDisposable
    {
        // Factory method: returns the shared do-nothing instance on release runs,
        // and a new state-carrying instance on debug runs.
        public static LifeGuard Create( bool collectStackTrace = false, //
                [SysComp.CallerFilePath] string? callerFilePath = null, //
                [SysComp.CallerLineNumber] int callerLineNumber = 0 )
        {
            if( !DebugMode ) // DebugMode is defined separately; see the text after this listing
                return ProductionLifeGuard.Instance;
            Assert( callerFilePath != null );
            if( collectStackTrace )
                return new VerboseDebugLifeGuard( 1 );
            return new TerseDebugLifeGuard( callerFilePath!, callerLineNumber );
        }

        // Assumption: a minimal stand-in for the assertion helper used in this
        // listing, which is not shown in the text.
        private static void Assert( bool condition )
        {
            if( !condition )
                throw new Sys.Exception( "Assertion failed" );
        }

        public abstract void Dispose();

        public abstract bool IsAliveAssertion();

        private sealed class ProductionLifeGuard : LifeGuard
        {
            public static readonly ProductionLifeGuard Instance = new ProductionLifeGuard();

            private ProductionLifeGuard() { } //nothing to do

            public override void Dispose() { } //nothing to do

            public override bool IsAliveAssertion()
                => throw new Sys.Exception(); //never invoke on a release build
        }

        private class DebugLifeGuard : LifeGuard
        {
            private bool alive = true;
            private readonly string message;

            protected DebugLifeGuard( string message )
            {
                this.message = message;
            }

            public sealed override void Dispose()
            {
                Assert( alive ); //the lifetime must never be ended twice
                alive = false;
                Sys.GC.SuppressFinalize( this ); //properly disposed: no diagnostic needed
            }

            public sealed override bool IsAliveAssertion()
            {
                Assert( alive );
                return true;
            }

            protected static string GetSourceInfo( string? filename, int lineNumber )
                => $"{filename}({lineNumber})";

            //Runs only if Dispose() was never invoked.
            ~DebugLifeGuard()
            {
                SysDiag.Debug.WriteLine( "Object still alive!" );
                SysDiag.Debug.WriteLine( message );
            }

            public override string ToString() => alive ? "" : "END-OF-LIFE";
        }

        private sealed class TerseDebugLifeGuard : DebugLifeGuard
        {
            public TerseDebugLifeGuard( string callerFilePath, int callerLineNumber )
                    : base( $" {GetSourceInfo( callerFilePath, callerLineNumber )}" )
            { }
        }

        private sealed class VerboseDebugLifeGuard : DebugLifeGuard
        {
            public VerboseDebugLifeGuard( int framesToSkip )
                    : base( buildMessage( framesToSkip + 1 ) )
            { }

            private static string buildMessage( int framesToSkip )
                => string.Join( "\r\n", getStackFrames( framesToSkip + 1 ) //
                        .Select( getSourceInfoFromStackFrame ) );

            private static SysDiag.StackFrame[] getStackFrames( int framesToSkip )
            {
                var stackTrace = new SysDiag.StackTrace( framesToSkip + 1, true );
                SysDiag.StackFrame[] frames = stackTrace.GetFrames()!;
                //the first remaining frame is expected to belong to the lifetime-aware object
                Sys.Type? type = frames[0].GetMethod()?.DeclaringType;
                Assert( typeof(Sys.IDisposable).IsAssignableFrom( type ) );
                return frames.Where( f => f.GetFileName() != null ) //
                        .ToArray();
            }

            private static string getSourceInfoFromStackFrame( SysDiag.StackFrame frame )
            {
                string sourceInfo = GetSourceInfo( frame.GetFileName(), frame.GetFileLineNumber() );
                var method = frame.GetMethod();
                return $" {sourceInfo}: {method?.DeclaringType}.{method?.Name}()";
            }
        }
    }

Note that in theory, `private readonly string message` may have already been finalized by the time the destructor attempts to use it. In reality, I have never encountered this happening. If it becomes a problem, a simple `string.Intern()` could be used to permanently anchor these strings in memory, and that is okay despite the fact that it essentially introduces a memory leak, because it is only applicable to debug runs.

`DebugMode` is defined as follows:

    public static bool DebugMode
    {
        get
        {
    #if DEBUG
            return true;
    #else
            return false;
    #endif
        }
    }

This allows us to minimize the use of `#if DEBUG`, which is ugly and cumbersome, and often results in code rot in the `#else` part, which is only discoverable when trying to compile the release build.

Addendum: Ad-hoc alive states

Object lifetime awareness comes with a piece of advice:

Avoid ad-hoc alive states, implement them as separate lifetime-aware objects instead.

What this means is that a class should refrain from exposing a pair of methods for entering and exiting some special state of that class, and instead it should expose only one method which creates a new lifetime-aware object to represent that special state, and to exit the state when its lifetime is ended. Then, if the class has any methods which may only be invoked while in that special state, these methods must be moved into the special state object, so that they are not even available unless the special state has been entered.

By following this advice we split the interface of our object into smaller interfaces that are simpler and more intuitive, we clearly document what is going on by making use of the lifetime-awareness pattern, and we take advantage of the error-checking and diagnostic facilities of the lifetime-awareness mechanism.

An example of an interface which could have benefited from this advice is the JDBC API. This interface exposes a multitude of methods for dealing with a relational database, and among them it exposes a pair of methods for beginning and ending a transaction. A better way of structuring that interface would have been to expose a single method for creating a new transaction object, which in turn ends the transaction when disposed. Then, all the data manipulation methods would be moved into that object, so that it is impossible to manipulate data unless a transaction is active.
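The restructuring described above might be sketched as follows. Everything here is hypothetical and greatly simplified: `Store` is a toy in-memory key-value store invented for illustration, not a real database API, and a plain `bool` stands in for the lifeguard in order to keep the example self-contained. For simplicity, ending the lifetime of the transaction commits it; a real design would also need a way to roll back.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory store, for illustration only.
public sealed class Store
{
    private readonly Dictionary<string, string> committed = new Dictionary<string, string>();
    private bool transactionActive;

    public int Count => committed.Count;

    // The only way to enter the "in transaction" state is to obtain a
    // transaction object; there is no separate method for leaving that state.
    public Transaction BeginTransaction()
    {
        if( transactionActive )
            throw new InvalidOperationException( "a transaction is already active" );
        transactionActive = true;
        return new Transaction( this );
    }

    // The special state, modeled as a separate lifetime-aware object.
    public sealed class Transaction : IDisposable
    {
        private readonly Store store;
        private readonly Dictionary<string, string> pending = new Dictionary<string, string>();
        private bool alive = true; // stand-in for a lifeguard

        internal Transaction( Store store )
        {
            this.store = store;
        }

        // Data manipulation lives here, so it is unavailable without a transaction.
        public void Put( string key, string value )
        {
            if( !alive )
                throw new InvalidOperationException( "the transaction has ended" );
            pending[key] = value;
        }

        // Ending the lifetime of the transaction object exits the special state.
        public void Dispose()
        {
            if( !alive )
                throw new InvalidOperationException( "the transaction has already ended" );
            alive = false;
            foreach( KeyValuePair<string, string> entry in pending )
                store.committed[entry.Key] = entry.Value;
            store.transactionActive = false;
        }
    }
}
```

With this shape, a `using` statement around `store.BeginTransaction()` delimits the transaction, and there is no way to call `Put()` on the store itself.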

Addendum: Unmanaged Resources

As we have shown, by leveraging hard errors and diagnostic messages on debug runs and test runs, the object lifetime awareness pattern guarantees cleanup at the end of an object's lifetime.

Conveniently enough, this cleanup can, and should, include the releasing of unmanaged resources.

This in turn means that we never need to involve finalization for this task, not even as a fallback mechanism: unmanaged resources can be released infallibly, deterministically, and synchronously, i.e. always right now, as opposed to at some unknown moment later in time, if at all. This also means that on release runs we do not need finalization at all.

In essence, the releasing of unmanaged resources loses the special status that it has enjoyed so far, and becomes regular cleanup just as any other kind of cleanup. Our software sees to it that all necessary cleanup is always performed, without leaving anything to chance, and without any distinctions between really important cleanup and not-so-important cleanup.

C#-only note: This also means that there is no more need for that `Dispose(bool)` nonsense, either.
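As an illustration, a class that owns an unmanaged resource might then look roughly like the sketch below. `NativeBuffer` is a hypothetical class; it uses only the `Marshal` methods `AllocHGlobal`, `FreeHGlobal`, `WriteByte` and `ReadByte`. Note that it has no finalizer and no `Dispose(bool)`; in the full pattern, a lifeguard would be what reports a forgotten `Dispose()` on debug runs, and it is omitted here only to keep the example short.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical owner of a block of unmanaged memory.
public sealed class NativeBuffer : IDisposable
{
    private IntPtr pointer;
    private readonly int size;

    public NativeBuffer( int size )
    {
        this.size = size;
        pointer = Marshal.AllocHGlobal( size );
    }

    public void WriteByte( int offset, byte value )
    {
        checkAccess( offset );
        Marshal.WriteByte( pointer, offset, value );
    }

    public byte ReadByte( int offset )
    {
        checkAccess( offset );
        return Marshal.ReadByte( pointer, offset );
    }

    // Plain, synchronous cleanup: the unmanaged memory is released right now.
    public void Dispose()
    {
        if( pointer == IntPtr.Zero )
            throw new InvalidOperationException( "lifetime has already ended" );
        Marshal.FreeHGlobal( pointer );
        pointer = IntPtr.Zero;
    }

    private void checkAccess( int offset )
    {
        if( pointer == IntPtr.Zero )
            throw new InvalidOperationException( "used past end of lifetime" );
        if( offset < 0 || offset >= size )
            throw new ArgumentOutOfRangeException( nameof(offset) );
    }
}
```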

Further research and recommendations

Lifetime-aware objects may benefit from a lifetime control service being propagated throughout the containment hierarchy so that they can register and unregister from it, thus:

Eliminating the need for a static lifeguard factory;
Allowing us to at any given moment traverse the entire graph of lifetime-aware objects to see who is still alive;
Making it impossible to inadvertently construct a lifetime-aware object without having explicit knowledge of the fact that it is lifetime-aware, since the lifetime control service must be passed to its constructor.

Object lifetime awareness has the theoretical potential of completely eliminating all finalization overhead. Unfortunately, as things stand today, this potential cannot be realized, because existing runtime environments still offer essential classes that make unconditional use of finalization; e.g. classes that represent files, sockets, etc. These environments could benefit from new implementations of such essential classes that make use of the object lifetime awareness pattern so as to also avoid finalization. (While at it, please also note that these same classes could really benefit from not being needlessly multithreading-aware; when we have a use for multithreading awareness, we can add it ourselves, thank you.)

Additionally, if it could be definitively established that finalization is to be used only for the purpose of generating diagnostic messages, then the entire machinery implementing finalization in runtime environments could be greatly simplified from the monster of complexity that it is today. Consider, for example, that garbage collectors are currently built to handle such preposterous situations as "object resurrection", which is what may happen if a finalizer decides to anchor an object in memory, thus taking an object which had previously become eligible for collection and making it not eligible anymore. If finalization could be made trivial, then object resurrection could become impossible, or it could result in a hard error rather than having to be handled.

Also see my previous post michael.gr - Mandatory disposal vs. the "Dispose-disposing" abomination

2021-01-03
