2012-01-03

Pernicious Local Variable Initialization

Again and again I see programmers doing the following:
    string variable = string.Empty;
    if (some.condition)
        variable = some.value;
    else
        variable = some.other.value;
You may see it with strings being set to String.Empty, as in the example, or with integers being set to zero, pointers being set to null, and so on. A very large number of programmers believe that when declaring a local variable you should always pre-initialize it with some value. The belief is so popular that it practically enjoys "common knowledge" status and is considered "best practice", despite being dead wrong. I call it The Practice of Pernicious Local Variable Initialization. Here is why.
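
For illustration, here is roughly what the same habit looks like with other types. This is only a sketch; the method and parameter names are invented for the sake of the example, and in both cases the initial value is overwritten before it is ever read:
    static int PickTotal(bool hasDiscount, int discountedTotal, int fullTotal)
    {
        int total = 0;              // pernicious: the 0 is never read
        if (hasDiscount)
            total = discountedTotal;
        else
            total = fullTotal;
        return total;
    }

    static object PickHandler(bool useCustom, object customHandler, object defaultHandler)
    {
        object handler = null;      // pernicious: the null is never read
        if (useCustom)
            handler = customHandler;
        else
            handler = defaultHandler;
        return handler;
    }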

The practice was not always bad. It started back in the dark ages of the first C compilers, when it was kind of a necessity. Compilers back then had a combination of unfortunate characteristics:

  1. They required all local variables in a function to be declared up-front.
  2. They did not require each variable to be initialized when declared. (A consequence of #1.)
  3. They were not smart enough to issue a warning if you tried to read a variable before writing it.
Back in those days, accidental use of uninitialized variables was a very common mistake, leading to many a monstrous bug. (See michael.gr - The Mother of All Bugs.) After having to troubleshoot and fix a few bugs of this kind in the first few days on the job, a new programmer would quickly learn to always pre-initialize every single local variable without asking why.

The practice of blindly pre-initializing variables continued well into the first half of the 1990s, even though by that time C and C++ compilers were capable of issuing warnings about uninitialized variables, because programmers were in the habit of either not enabling, or deliberately disabling, those warnings. Apparently, programmers would rather not have a piece of software suggest that it is smarter than they are. (Besides, the pleasure of making undetected mistakes has always been popular, and remains popular, judging by the universal trendiness of untyped scripting languages.)

After decades of blindly pre-initializing everything, the practice became a cargo-cult habit: programmers keep doing it today without really knowing why, and without asking themselves whether it has any downsides.

And as it turns out, there are.

First of all, it violates the principle of least surprise. When I see a variable being initialized to a certain value, I have to assume that this value has a certain role to play in the algorithm which follows. For example, seeing an integer being set to zero makes me expect a loop that uses that integer as an accumulating sum; seeing a string variable being initialized to the empty string makes me expect a loop that keeps appending parts to it. So, when that is what I expect, it is rather disappointing to look further down only to discover that none of it happens, and the value of the variable is simply overwritten with something entirely different.
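
To make the expectation concrete, here is a sketch (with an invented method name) of the kind of code that an initializer like zero normally foreshadows, where the initial value genuinely participates in the computation:
    static int TotalLength(string[] parts)
    {
        int total = 0;                  // here the 0 is a genuine starting value
        foreach (string part in parts)
            total += part.Length;
        return total;
    }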

However, that's just a minor annoyance. 

It gets far worse than that.

By pre-initializing a variable with a value which is by definition meaningless (if a meaningful value were already known at that point, you would simply assign it and be done), you are circumventing the safety checks of your compiler and opening up the possibility of error: if you forget to assign a meaningful value to the variable further down, the compiler will not warn you, because as far as it can tell, the variable has already received an initial value. The compiler has no way of knowing that the initial value is meaningless.
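
Here is a minimal sketch of the difference, again with invented names; note that in C# the use of an unassigned local variable is in fact a compile-time error (CS0165) rather than a mere warning:
    static string Describe(bool isKnown, string knownName)
    {
        string label;                   // no pre-initialization
        if (isKnown)
            label = knownName;
        // oops: the else branch was forgotten
        return label;                   // error CS0165: Use of unassigned local variable 'label'
    }
Pre-initialize label to string.Empty instead, and the same forgotten branch compiles without a word of complaint.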

Luckily, modern compilers are not only capable of warning you when you attempt to use an uninitialized variable, but also of warning you when you engage in Pernicious Local Variable Initialization. (They are just being polite and calling it "unnecessary" or "superfluous" rather than "pernicious".) Alas, programmers who keep making this mistake tend to have that warning disabled, too.

1 comment:

  1. Your point would be 200% right if only compilers always worked right. For example, when you have an if statement checking AAA Is Nothing, VS warns you that AAA might be Nothing during this call, so be careful... I guess I have to thank Microsoft for that, but the point stands. :)
