2012-01-03

Pernicious Local Variable Initialization

Introduction

Pernicious Local Variable Initialization is the cargo cult programming (W) practice of pre-initializing a local variable with some default value, even though that value will be overwritten further down. Another term I have for it is Shotgun Initialization, because it is applied indiscriminately, and on a "just in case" basis, without a valid intent or purpose. Most importantly, it is usually done without knowledge of the dangers involved in doing so.

(Useful pre-reading: About these papers)

What it is

Again and again I see programmers writing code like the following:

    string a = ""; 
    if( some_condition() )
        a = something();
    else
        a = other_thing();
    make_use_of( a );

In this example we have a variable which is initialized with a default value, and then, in every execution path further down, it gets overwritten with another value before it is ever read. You may see string variables initialized to the empty string, as in the example, or integers set to zero, pointers set to null, and so on.

A surprisingly large number of programmers are under the impression that a plain local variable declaration like "string a;" is incomplete. They have trained themselves to see such declarations as missing something important, without which bad things are going to happen. As a result, they believe that when a local is declared it must always be pre-initialized with some value, even when a meaningful value is not yet available.

The belief is so popular that it enjoys alleged "best practice" status, even "common knowledge" status, despite being dead wrong.
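
For the record, the plain declaration is perfectly complete. Here is a minimal sketch of the earlier example without the superfluous initializer (the function names are, as before, placeholders):

    string a;
    if( some_condition() )
        a = something();
    else
        a = other_thing();
    make_use_of( a );

Every execution path still assigns the variable exactly once before it is read; the compiler's flow analysis can verify this, and the variable remains effectively immutable.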

Why people do it

The practice of indiscriminately pre-initializing all variables was not always wrong. It started back in the dark ages of the first C compilers, when it was kind of necessary. Compilers back then had a combination of unfortunate characteristics:

  1. They required all local variables within a function to be declared up-front.
  2. They were not smart enough to detect an attempt to read an uninitialized variable.

Back in those days, accidental reading of uninitialized variables was a very common mistake, leading to many a monstrous bug. (See michael.gr - The Mother of All Bugs.) After having to troubleshoot and fix a few bugs of this kind, every new programmer would quickly learn to always pre-initialize every single local variable without asking why.

The practice of blindly pre-initializing everything continued well into the 1990s, even though by that time C and C++ compilers were fully capable of issuing warnings about uninitialized variables. The practice continued because programmers refused to believe that they could be outsmarted by a compiler, so they were either not enabling, or deliberately disabling, the associated warnings. Apparently, programmers would rather make undetected mistakes than have a piece of software embarrass them by pointing out their mistakes to them. (And the trend continues strong today, judging by the universal trendiness of untyped programming languages.)

After decades of blindly pre-initializing everything, the practice became a cargo cult habit, so programmers keep doing it today, even in modern languages like Java and C#, without really knowing why they are doing it, or asking themselves whether there are any downsides to this practice.

And as it turns out, there are.

What is wrong with it

First of all, there is a violation of the Principle of Least Astonishment (W). When I see that a variable is initialized to a certain value, I am tempted to assume, based on the type of the variable and the initial value, that it has a certain role to play in the algorithm which follows. For example, seeing an integer variable being initialized with zero prepares me to see this variable being used as a counter, or as an accumulating sum; when that is what I expect, it is rather disappointing to look further down only to discover that none of that happens, and the variable is overwritten with something entirely different before it is ever used.
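
A contrived sketch of the kind of false signal I mean (all the names are made up):

    int total = 0;                      // reads like the start of a count or a running sum...
    do_various_things();
    total = grand_total_of( items );    // ...but no: it is simply overwritten before its first use
    report( total );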

However, that's just a minor annoyance.

It gets worse than that.

When a variable receives a value only once, it is effectively immutable. When it receives a value twice, it becomes mutable. If you have any self-esteem whatsoever, you are using a modern Integrated Development Environment (IDE) which colors mutable variables differently from immutable ones. Thus, your IDE will be coloring that variable as mutable, despite the fact that it was never meant to be mutable. This is a misleading signal, and coping with it causes cognitive overhead.

But wait, it gets worse.

Some data types do not have default values that you can pre-initialize a variable with, so the desire to always pre-initialize variables might lead you to misuse the type system.  For example, some languages (e.g. C#) support explicit nullability of reference types. This means that you cannot pre-initialize a reference variable with null if that reference variable happens to be non-nullable. If, in your desire to pre-initialize everything, you decide to turn a non-nullable reference into a nullable reference, then you have just done a disservice to yourself, and to anyone else who will ever look at that code, by making the code considerably more complicated than it needed to be. The same applies to enums: people often add special "invalid" or "unknown" values to their enums, to accommodate their craving for pre-initialization. Such counterfeit values also constitute an act of self-sabotage by adding needless complexity to everything.
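
To make this concrete, here is a small sketch in C# with nullable reference types enabled; all the names are made up. The pre-initialization habit forces a needlessly nullable type:

    string? name = null;            // 'name' is now nullable; every later use of it invites
    if( is_known( person ) )        // "possible null reference" analysis and null checks
        name = person.Name;
    else
        name = "(anonymous)";

whereas the plain declaration keeps the declared type honest:

    string name;
    if( is_known( person ) )
        name = person.Name;
    else
        name = "(anonymous)";

And the counterfeit enum values look something like this:

    enum Color { Unknown, Red, Green, Blue }    // "Unknown" means nothing in the problem domain;
    Color color = Color.Unknown;                // it exists only to have something to pre-initialize with.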

And oh look, it gets worse still.

Modern compilers of most mainstream programming languages do extensive data flow analysis and are fully capable of pointing out any situation where a variable is used without first having been initialized. Thus, accidental use of uninitialized variables is not supposed to be a problem today.

  • If you say "but I do not see any such warnings" then you are trying to write code without first having figured out how to enable all warnings that your compiler can issue. Do not do that. I do not care if you are writing software that will save the world, stop whatever it is that you are doing, figure out how to enable all warnings, enable them, and only then continue coding.
  • If you say "but my compiler does not support issuing such warnings" then you are using the wrong compiler. Stop using that compiler, and start using a different one.
  • If you say "but there is no such compiler for the language I use" then throw away everything and start from scratch with a different language. I do not care what it takes; in the 3rd millennium you cannot be programming without all warnings enabled.

Once you have warnings about uninitialized variables, the superfluous initialization of a variable becomes bad practice, because it circumvents the very checks that the compiler can do for you, and opens up the possibility of error:

If you begin by pre-initializing a variable with a value which is by definition meaningless, (since a meaningful value is not yet known at that time, otherwise you would have just used that meaningful value and you would be done,) then as far as the compiler can tell, the variable has been initialized. The compiler does not know that the initial value is meaningless. Thus, if you forget further down to assign a meaningful value to that variable, the compiler will not be able to warn you. So, you have deliberately sent yourself back in time, to the dark ages of the first C compilers, where warnings for uninitialized variables had not been invented yet. Essentially, you have achieved the exact opposite of what you were trying to accomplish.
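
To make the failure mode concrete, here is a hypothetical variation of the earlier example in which the programmer forgot one of the branches:

    string a = "";                  // the decoy initializer
    if( some_condition() )
        a = something();
    // whoops, the 'else' branch was forgotten!
    make_use_of( a );               // compiles silently; receives "" whenever the condition is false

    string b;                       // no decoy initializer
    if( some_condition() )
        b = something();
    // the same branch forgotten
    make_use_of( b );               // C# refuses to compile this: error CS0165, use of unassigned local variable 'b'

With the decoy initializer the mistake sails through; without it, the compiler catches it on the spot. (In C# this particular check is not even a warning but a hard compile error.)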

Luckily, modern compilers are not only capable of issuing a warning if you attempt to use an uninitialized variable; they are also capable of issuing a warning when you unnecessarily initialize a variable. Alas, programmers who keep making these mistakes tend to have both of those warnings disabled.
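
In the C# world, for instance, the Roslyn analyzers ship a rule for exactly this (IDE0059); pointed at the original example, it flags the decoy initializer with a message along the lines of "Unnecessary assignment of a value to 'a'":

    string a = "";                  // IDE0059: unnecessary assignment of a value to 'a'
    if( some_condition() )
        a = something();
    else
        a = other_thing();
    make_use_of( a );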

Conclusion

Make sure you have all warnings enabled, and never initialize any variable before you have a meaningful value to assign to it.



(This post has evolved from an original answer of mine on CodeReview.StackExchange.com.)

1 comment:

  1. Your point would be 200% right if only compilers always worked right. For example, when you have an if statement checking AAA Is Nothing, VS warns you that AAA might be Nothing during this call, so be careful... I guess I have to thank Microsoft for that, but the point stands. :)
