michael.gr: On UML (oh, do not get me started)

The UML logo, by Object Management Group®, Inc. from uml.org; Public Domain.

This post is intended as support material for another post of mine; see michael.gr - The Deployable Design Document.

The Unified Modeling Language (UML) (Wikipedia) was intended to be a standard notation for expressing software designs, and to replace the multitude of ad-hoc notations that software architects have been using on various media such as whiteboards, paper, and general-purpose box-and-arrow diagram-drawing software. The idea was that by following a standard notation which prescribes a specific way of expressing each concept, every diagram would be readily and unambiguously understood by everyone.

It has failed miserably.

(Useful pre-reading: About these papers)

UML is probably very close to the top of the list of things that everyone mentions but nobody uses, and there are a number of good reasons for this:

It is incredibly comprehensive, to the point where its sheer size acts as a very strong deterrent to most people attempting to learn it. There are more than a dozen different types of diagrams for different purposes (14 in UML 2.5), each with its own complete set of meticulously detailed notation and rules. UML actually begins to make sense once you realize that it has mostly been an effort to catalogue every imaginable type of diagram used in software development, and to standardize the notation used in each, while most of these diagram types are actually irrelevant, or very seldom relevant, to our daily job. However, even if you pick a single diagram type that you happen to have some use for, and decide to learn just that one, the notation is still so comprehensive that the task is daunting.
Most of UML is so rarely useful that it is not worth the learning effort. In the extremely rare event that a software development team is to have a meeting in which they could benefit from having an Interaction Overview Diagram to point at, it will be a lot easier to use some ad-hoc but intuitive notation to get the point across than to schedule the meeting only after every single one of the attendees has completed a UML course to brush up on the intricacies of the UML Interaction Overview Diagram.
The type of UML diagram that has received the most attention in the software engineering profession is the UML Class Diagram (see Wikipedia), which deals with representing a class, the structure of a class, and its relationships with other classes. Unfortunately:

The UML Class Diagram insists on prescribing a very specific type of notation for everything about a class, and this notation is not always intuitive, thus posing the same obstacles to understanding as posed by program code written in arcane syntax and convoluted structure: in both cases, it is all jargon. This might not be an issue for those who have already gone through the trouble of learning the jargon, but the uninitiated are bound to question the usefulness of the entire exercise.
The UML Class Diagram prescribes its notation in excruciatingly meticulous detail, so there is no information hiding, and no abstraction: the amount of information contained in a UML Class Diagram is roughly the same as the amount of information contained in a C or C++ header file, or in a Java interface, so there is virtually nothing to be gained by looking at one vs. looking at the other, which in turn raises the question of why we should be doing double bookkeeping.
The UML Class Diagram is much too low-level and too finely detailed to be pertinent to software systems design, where the unit of interest is the system component, corresponding to an entire module, rather than to individual classes within a module. It is also becoming even less pertinent as classes are becoming less important in programming due to the modern shift towards functional rather than object-oriented programming.
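To illustrate the double bookkeeping point, here is a minimal sketch in Python, standing in for the header file or Java interface mentioned above. The `Shape` interface and the `class_box` helper are hypothetical names invented for this example, and the output only approximates UML class-box notation. The point is that everything the box says can be derived mechanically from the declaration, so the box tells us nothing that the declaration does not.

```python
import inspect
from abc import ABC, abstractmethod


class Shape(ABC):
    """Hypothetical interface, standing in for a Java interface or a C++ header."""

    @abstractmethod
    def area(self) -> float: ...

    @abstractmethod
    def scale(self, factor: float) -> "Shape": ...


def type_name(annotation):
    """Return a printable name for a type annotation (a class or a string)."""
    return annotation if isinstance(annotation, str) else annotation.__name__


def class_box(cls):
    """Build the rows of a UML-style class box using only the declaration."""
    rows = [f"<<interface>> {cls.__name__}"]
    for name, func in inspect.getmembers(cls, inspect.isfunction):
        if name.startswith("_"):
            continue  # skip private and dunder members
        sig = inspect.signature(func)
        params = ", ".join(
            f"{p.name}: {type_name(p.annotation)}"
            for p in sig.parameters.values()
            if p.name != "self"
        )
        rows.append(f"+ {name}({params}): {type_name(sig.return_annotation)}")
    return rows


print("\n".join(class_box(Shape)))
```

Every row that this prints is a restatement of a line that already exists in the source code, in a different notation.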

UML is mostly used as documentation, meaning that its role tends to be indicative or suggestive, and usually non-enforceable and non-materializable. This means that mistakes made in the use of the meticulously detailed notation generally go undetected, or might be detected by colleagues, but not by automated validation tools, because for most types of UML diagrams, there exist no such tools.
UML is trying to solve problems which do not exist: when a human needs to communicate something to a machine, this has to be done in a perfectly unambiguous fashion, which makes special notation necessary, i.e. jargon. However, when there is no machine involved, and a human simply needs to communicate something to other humans, what matters most is to get the point across, so jargon is actually undesirable, despite the unambiguousness that it would bring. That is okay, because humans thrive on ambiguity. In other words, UML is an attempt to apply a rigid engineering discipline to a form of communication which is fine as it is: free and fluid. (One of the "Three Amigos" who created UML had a military background; coincidence? Maybe.)
In an attempt to make UML more pertinent to the software development process, some UML tools offer automatic code generation. Unfortunately, automatic code generation is almost always a bad idea, because each time the design changes, code generation must be re-applied, and this invariably results in the following bad things happening to code that has already been hand-written by programmers:

Hand-written code is overwritten with auto-generated code and thus forever lost, or
Hand-written code does not compile anymore due to dependencies on automatically generated definitions which have now changed, or, more often,
Both of the above.

The idea that you can apply automatic code generation once and never repeat it stems from the "all design up-front" doctrine, which may have been strong back in the 1990s when the foundations of UML were laid down, but the doctrine died soon thereafter, and it has been dead for decades now.
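The regeneration problem described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not the behaviour of any particular UML tool: `generate_stub`, the `Invoice` class, and the dictionary that plays the role of the design model are all hypothetical, and since Python has no compile step, the "does not compile anymore" case shows up here as a runtime `AttributeError`.

```python
def generate_stub(class_name, methods):
    """Render a class skeleton from a (hypothetical) design model."""
    lines = [f"class {class_name}:"]
    for name in methods:
        lines.append(f"    def {name}(self):")
        lines.append("        raise NotImplementedError")
    return "\n".join(lines) + "\n"


# First pass: the design says Invoice has a method called total.
design = {"Invoice": ["total"]}
generated = generate_stub("Invoice", design["Invoice"])

# A programmer fills in the method body by hand.
hand_edited = generated.replace("raise NotImplementedError", "return sum(self.lines)")

# Elsewhere, hand-written client code depends on the generated definition.
client_code = "Invoice().total()"

# The design changes: total is renamed to grand_total, so we regenerate.
design["Invoice"] = ["grand_total"]
regenerated = generate_stub("Invoice", design["Invoice"])

# Bad thing 1: the hand-written body is gone from the regenerated source.
hand_code_lost = "sum(self.lines)" not in regenerated

# Bad thing 2: the hand-written client no longer works against it.
namespace = {}
exec(regenerated, namespace)
try:
    eval(client_code, namespace)
    client_broken = False
except AttributeError:
    client_broken = True

print(hand_code_lost, client_broken)
```

Both flags come out true after a single rename, which is the "both of the above" case.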
From the plethora of diagram types offered by UML, the only one that could perhaps be useful in our daily jobs is the UML Component Diagram (see Wikipedia), but there exist no tools that I am aware of that are capable of either guiding the composition of such a diagram from existing software components, or materializing such a diagram into a running system. Furthermore, if any such tools were to be introduced, they would be unlikely to be well received, because by now people have developed a distaste for UML and anything associated with it.
UML literature follows a lofty standardspeak writing style which is incomprehensible. I tried looking up the term "collaboration" and here is what I found in IBM literature:

In UML diagrams, a collaboration is a type of structured classifier in which roles and attributes co-operate to define the internal structure of a classifier.

There are two problems with this:

The definition depends on other definitions. This happens everywhere in UML. So, in order to understand a certain term you first have to understand other terms, and quite often the definitions are circular, so in order to understand anything you have to have superpowers.

This kind of looks like a recursive definition. They may be implying that there is something hierarchical in the nature of the concept, but they are not saying it. Definitions are written with the goal of being correct, not with the goal of being understood.

Okay, let's look at the next sentence:

You use a collaboration when you want to define only the roles and connections that are required to accomplish a specific goal of the collaboration.

Surprise! Recursion again. Sorry, but now it makes absolutely no sense. And that's how it goes with UML.

To summarize:

UML is insufferably baroque.

It should have never been, and it should cease to be.

It should be let go into the good night.

Mandatory Grumpy Cat Meme. UML: I hate it.

2022-08-16
