2018-04-11

GitHub project: mikenakis-rumination

NOTE: 
This project has been retired. The GitHub link does not even work anymore.
This page only serves historical documentation purposes.


Making plain old Java objects aware of their own mutations.



The mikenakis-rumination logo.
Based on original from free-illustrations.gatag.net
Used under CC BY License.

2018-04-09

GitHub project: mikenakis-testana

A command-line utility for running only those tests that actually need to run.


The mikenakis-testana logo, profile of a crash test dummy by Mike Nakis
Used under CC BY License.

2018-04-08

GitHub project: mikenakis-classdump

A command-line utility for dumping the contents of class files.



The mikenakis-classdump logo.
Based on an image found on the interwebz.


2018-04-07

GitHub project: mikenakis-bytecode

A lightweight framework for manipulating JVM bytecode.



The mikenakis-bytecode logo, an old-fashioned coffee grinder.
By Mike Nakis, based on original work by Gregory Sujkowski from the Noun Project.
Used under CC BY License.

2018-04-05

My Very Own™ Coding Style

Foreword


In my career I have experimented a lot with coding styles, mostly on pet projects at home, but also in workplaces where each developer was free to code in whatever way they pleased, or in workplaces where I was the only developer.

My experimentation has been in the direction of achieving maximum objective clarity and readability, disregarding convention, custom, precedent, and the shock factor: the fact that a particular style element might be alien to others plays very little role in my evaluation of the objective merits of the element.

The counter-argument (the argument in favor of following convention) says that whatever benefits a coding style might offer cannot possibly outweigh the benefit of presenting others with a familiar coding style. This is of course true, and that's why it makes sense for an organization to choose a traditional coding style. However, I am not a company; I am an individual, and my own projects are mine. Furthermore, my counter-counter-argument is that I firmly believe tradition to be a synonym for "progress stopper".

So, over the years I have tried many things, once even radically changing my coding style in the middle of a project. (Modern IDEs make it very easy to do so.) Some of the things I tried I later abandoned; others I permanently adopted.

My coding style today is therefore the result of all this experimentation. If it looks strange to you, keep in mind that every single aspect of it was deliberately chosen by someone who was not always coding like that, and who one day decided to start coding like that in the firm belief that this way is objectively better.

In moving on with each of these changes over the years, I had to overcome my own subjective distaste for the unfamiliar, for the benefit of what I considered to be objectively better. So, if you decide to judge my coding style, please first ask yourself to what extent you are willing to overcome the same.

My Very Own™ Coding Style


I use this coding style for languages that belong to the C syntax family, for example C, C++, Java, and C#. These are languages with curly braces, a reduced set of keywords, and a moderate amount of parentheses. I hardly ever program in any other language, but when I do, I apply whatever parts of this coding style are applicable.

  • Tabs vs. Spaces: Tabs
    • I use tabs for indentation, because this allows different developers to view the code with the amount of indentation that they are accustomed to, without having to reformat the code. 
      • Spaces should never be used for indentation.
      • Tabs should never be used for anything other than indentation.
  • Tabular formatting: No
    • Tabular formatting refers to inserting spaces within statements in consecutive lines of code to align parts of the statements into columns across those lines of code.  So, for example, in statements that are of the form `variable-type variable-name = initializer-expression;` spaces would be inserted after the variable-types to align all the variable-names in a column, and more spaces would be inserted after the variable-names to align all the equals-signs in a column. 
    • I used to be a big fan of this; however:
      • Generics make this less appealing: most type names might be short, but a single generic type name might be very long, resulting in lots of seemingly unnecessary whitespace.
      • A change in one line of code may require re-aligning many lines around it, and diff tools are not smart enough to account for this, so the likelihood of merge conflicts skyrockets.
    • Thus, at some point I decided to drop tabular formatting.
  • Spaces:
    • Before or after unary operators: Never
    • Around binary operators: Always
    • Around ternary operators: Always
    • Before a comma: Never
    • After a comma: Always
    • Before opening parenthesis of function argument list: Never
    • Before opening parenthesis of flow-control keyword: Never
    • Inside parenthesized expressions: Never
    • Around parameter lists: Always
      • This means that a function call must look like this: `foo( a, b );`. Note that there is a space after `(` and a space before `)`.
      • This applies not only to function calls, but also to function declarations and to keywords that accept parameters.
      • Calls to parameterless functions can still be written like this: `foo();`, because the rule is carefully worded to call for spaces around parameter lists, not spaces inside parentheses. When you are invoking a parameterless function there is no parameter list, and therefore there are no spaces.
      • Note that although parameter lists require spaces, parenthesized expressions require no spaces, and therein lies the advantage of this pair of rules: it suddenly becomes clear which parenthesis belongs to a function call, and which parenthesis belongs to an expression. For example, passing an expression as a parameter to a function looks like this: `foo( a, (b + c) );`
      • Note that certain C# constructs like `typeof()` and `nameof()` are expressions, not functions; therefore, their arguments must not be padded with spaces.
  • Right Margin Column: 120
    • In May of 2020 Linus Torvalds declared that the number of characters per line in the Code Style of the Linux Kernel was to be increased from 80 characters to 100 characters. That's laughable. We have had widescreen monitors since the beginning of the century. We can easily do 120 characters. I sometimes do 160 characters.
  • Hard right margin: No
    • The right margin is not meant to be a hard limit: if a line needs to be longer, make it longer. It is fine to push uninteresting stuff off the screen horizontally in order to fit more interesting stuff inside the screen vertically.
  • New line after attributes (C#) / annotations (Java): Never
    • This may push the function definition quite far to the right, and that's fine.
    • If a function has lots and lots of attributes/annotations, it might look very ugly, but that's okay, because it happens very rarely, and when it does happen, maybe that is exactly how it should be: beautiful things should look beautiful, and ugly things should look ugly.
  • Empty lines before a block-style comment: One
    • If a block-style comment appears in code, there must always be a blank line before it.
    • This way, we clearly indicate that the comment refers to the code that follows it.
  • Empty lines within functions: Zero
    • Quite often programmers like to use blank lines to visually separate pieces of code that are conceptually different. The problem is, the blank line gives no hint about the concepts involved, so it is entirely useless to anyone but the person who inserted it. If it is worth leaving a blank line, then it is worth adding a block-style comment explaining why, in which case a blank line before the comment is also necessary due to a previous rule. Better yet, move the conceptually different code into a different function, and give that function a descriptive name, so that you need neither comment nor blank line.
  • Empty lines before a function: One
    • The rule which requires a blank line before a block comment covers all the cases where a function is preceded by a block comment that describes the function. However, quite often functions have descriptive names, rendering explanatory comments unnecessary. For these cases, we mandate separating functions with a blank line.
  • Empty lines between fields: Zero
    • If you really need a blank line between two fields, you must insert a block comment.
  • Empty lines anywhere else: Zero
    • Some people have the habit of leaving one or more blank lines in various odd places according to some ad-hoc rules that exist only in their head. The problem is, it is impossible to teach such rules to an automatic reformatting tool. Therefore, there shall be no such rules. There should never be any spurious blank lines anywhere.
  • Curly braces: Allman 
    • See http://en.wikipedia.org/wiki/Indent_style#Allman_style
    • Each opening brace and each closing brace is on a separate line, the braces are at the same indentation level as the controlling statement, and the code in the block is one indentation level deeper. 
      • Luckily, this is the curly brace style of C#. 
      • Unluckily, this is not the curly brace style of Java. 
        • I do not care; this is my coding style even when I code in Java. The Egyptian curly brace style which is so popular in the Java world is absolutely retarded.
  • Braces on single statement blocks: Never (unless the language requires them)
    • The "always" choice seems to be very popular; that's retarded. 
    • The "sometimes" choice also seems to be popular, but I strive for consistency.  
    • Note that some languages have keywords that require curly braces even if the controlled block consists of a single statement, for example try-catch in C++, and try-catch-finally in Java and C#. I greatly resent this.
  • Nesting: Always consistent*
    • Some people like writing quick one liners, for example `if( x ) return 0;` all in one line. That's unacceptable. 
    • Some people refrain from nesting the `case` labels in a `switch` statement, or if they do, then they refrain from nesting the code under the `case` labels. That's unacceptable. 
    • In C#, people quite often refrain from nesting the classes within their namespaces. That's unacceptable. 
    • In C#, people quite often do not nest cascaded `using` statements. That's unacceptable. 
    • The only case where I sometimes violate this rule, and I have not yet completely decided on how to go about it, is with single statement functions in Java, which I sometimes code in one line, not because I believe this is correct, but because I am expressing a wish that Java would offer a functional style of declaring functions the way C# does.
  • Type identifier casing: SentenceCase 
    • Even if the type is private.
  • Constant identifier casing: SentenceCase
    • Even if the constant is private.
  • Private member identifier casing: camelCase
    • A very popular choice is prefixing the identifier with `_` or `m_`; that's unacceptable.
  • Public member identifier casing:
    • C#: SentenceCase 
    • Java: camelCase
      • The camelCase choice of Java is retarded, but it would be too heretical even for me to go against it, mainly because there exist tools that use reflection to guess which methods are getters and setters, and everything goes haywire if the capitalization is not what these tools expect.
  • Static fields: Same as other fields
    • Note: this explicitly means that static fields must not be named differently from other fields. Some people do weird things such as prefixing static fields with `s_`. That's not only mighty ugly, but also entirely unnecessary, because any half-decent IDE will color-code static fields for you.
  • Acronyms: SentenceCase
    • In other words, never use "GUID"; always use "Guid". The acronym becomes a word, so that it can be added to the spell-checker. 
    • Speaking of spell-checkers:
      • The spell-checker must always be on.
      • Every commit must pass inspection by the spell-checker.
      • The spell-checker wordlist must be committed like any other file.
      • The spell-checker wordlist must pass code review like anything else.
    • That's how the quality of the codebase can be protected despite contributions from people with poor command of the English language.
  • Explicit `this`: Never
    • Unless a field is receiving its value from a method or constructor parameter, in which case the parameter must have the exact same name as the field, and consequently `this` is necessary in order to refer to the field.
  • Use of `var`: Rarely
    • Only for non-trivial types, and only when the type is obvious.
    • Of course, you might ask, when is the type obvious? The answer is simple: the type is obvious only when the name of the type is present on the right side of the assignment.
  • Naming of files and classes: One class per file, exact same name
    • In Java this is standard, but there is one exception:
      • Java makes it impossible to access constructor parameters from field initializers. The solution to this is to pass the constructor parameter to the superclass, so that it can be stored in a protected member, so that it can be accessed by the field initializers of descendants. Quite often, we invent superclasses for no reason other than to be able to do just that. In these cases, it is okay (preferable even) if the superclass is package-private, and declared in the same file as the descendant.
    • In C# one class per file with exact same name is not standard, so it is worth stating. Again, there are a few exceptions: 
      • It is okay to declare all the classes that make up a small class hierarchy in a single file, as long as the file is named after the base class of the hierarchy.
      • It is also okay to declare trivial types like enums and delegates in the same file as the class that they conceptually belong to.
  • Namespace imports (C# only): Inside namespace declarations
    • Most people import their namespaces outside of their namespace declarations. This style guide mandates the opposite: namespaces must be imported inside namespace declarations. In other words, first we open our namespace, then we declare our imports, then we declare our class.
    • This is in accordance with the Principle of Smallest Scope, i.e. any given thing must have the smallest scope that it can possibly have.
  • Namespace Aliases (C# only)
    • For namespaces defined in the solution: Never
      • If you have defined a namespace in your solution, then you should never need to alias it. If it conflicts with a namespace defined outside your solution, then you should alias the external namespace.
    • For namespaces defined outside of the solution: Almost always
      • I have the habit of aliasing all external namespaces so as to make it evident exactly where each type is coming from. So, for example, I never do `using System.Text;` and then reference `Encoding.UTF8`; I always do `using SysText = System.Text;` and then reference `SysText.Encoding.UTF8`. I make an exception for the namespaces `System` and `System.Collections.Generic`.
  • Non-ASCII characters: Via Unicode Escape Sequences
    • That's because every once in a while some tool will accidentally garble non-ASCII characters, and a) that's the kind of error you will usually have no tests for, while b) even if there is a test, the non-ASCII character in the test might also be garbled, causing the test to pass when it should fail.
  • Miscellaneous
    • If something can be private, it must be private.
    • If something can be final/readonly, it must be final/readonly.
    • If something can be final/sealed, it must be final/sealed.
    • If something can be of a less-derived type, it must be of a less-derived type.
      • Unless you want to document something important; for example, you may want to use a `List` instead of a `Collection` to indicate that order matters.
    • If a string literal can be replaced with `nameof`, it must be replaced with `nameof`.
    • If a pair of parentheses can be omitted, it must be omitted.
      • Unless operator precedence is unclear and requires clarification. 
      • Note that this means that the expression after the `return` keyword must never be parenthesized.
    • Overriding methods must not have documentation comments. The documentation comment of an override is the documentation comment of the method it overrides.
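
As an illustration, here is a minimal Java sketch of how several of the rules above might look together: tabs for indentation, Allman braces, spaces around parameter lists but not inside parenthesized expressions, no braces around single-statement blocks, nested `case` labels, and `this` used only where a constructor parameter has the same name as the field. It also shows the Java exception described under file naming. That exception is a package-private superclass, declared in the same file as its descendant, that exists only so that a constructor parameter can be read by the descendant's field initializers. All names (`GreeterBase`, `Greeter`, `describe`, `pageCount`) are hypothetical. Both classes are left package-private so that the snippet compiles regardless of the file name.

```java
// Package-private superclass declared in the same file as its descendant. It exists only to store
// a constructor parameter in a protected field: the superclass constructor runs before the
// descendant's field initializers, so those initializers can already read the field.
abstract class GreeterBase
{
	protected final String name;

	GreeterBase( String name )
	{
		this.name = name; // `this` is needed only because the parameter has the same name as the field
	}
}

final class Greeter extends GreeterBase
{
	// A field initializer cannot see the constructor parameter, but it can see the inherited field.
	private final String greeting = "Hello, " + name + "!";

	Greeter( String name )
	{
		super( name );
	}

	String describe( int itemCount )
	{
		if( itemCount < 0 )
			throw new IllegalArgumentException( "negative item count: " + itemCount );
		switch( itemCount )
		{
			case 0:
				return greeting + " Nothing to report.";
			case 1:
				return greeting + " One item.";
			default:
				return greeting + " " + itemCount + " items.";
		}
	}

	int pageCount( int itemCount, int pageSize )
	{
		// Spaces around the parameter list of the call, none inside the parenthesized expression.
		return Math.max( 1, (itemCount + pageSize - 1) / pageSize );
	}
}
```

For example, `new Greeter( "world" ).describe( 0 )` yields `"Hello, world! Nothing to report."`, because `greeting` was initialized after `super( name )` had already stored the name.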

2018-04-04

GitHub project: mikenakis-agentclaire

NOTE: 
This project has been retired. The GitHub link does not even work anymore.
This page only serves historical documentation purposes.


A Java Agent to end all Java Agents.


The mikenakis-agentclaire logo
based on a piece of clip art found on the interwebz.

2018-04-03

Open Source but No License

I have posted some small projects of mine on GitHub, mainly so that prospective employers can appreciate my skills. I am not quite ready to truly open source them, so I published them under "No License".  This means that I remain the exclusive copyright holder of these creative works, and nobody else can use, copy, distribute, or modify them in any way, shape or form. More information here: choosealicense.com - "No License" (https://choosealicense.com/no-permission/).

Pretty much the only thing one can legally do with these creative works is view their source code and admire it.

GitHub says that one can also make a copy of my projects (called a "fork" in GitHub parlance), but I am not sure what one would gain from doing that, because you cannot legally do anything with the forked code other than view it and admire it. Even more information here: Open Source SE - GitHub's “forking right” and “All rights reserved” projects (https://opensource.stackexchange.com/q/1154/10201).

(Okay, if you compile any of my projects and run it once or twice in order to check it out, I promise I will turn a blind eye.)

If you want to do anything more with any of these projects, please contact me.

2018-04-02

On JUnit's random order of test method execution

This is a rant about JUnit, or more precisely, a rant about JUnit's inability to execute test methods in natural method order. 

Definition: Natural method order is the order in which methods appear in the source file.

What is the problem?


Up until and including Java 6, when enumerating the methods of a Java class, the JVM would yield them in natural order. However, when Java 7 came out, Oracle changed something in the internals of the JVM, and this operation started yielding methods in random order.

Apparently, JUnit was executing methods in the order in which the JVM was yielding them, so as a result of upgrading to Java 7, everybody's tests started running in random order. This caused considerable ruffling of feathers all over the world.

Now, the creators of the Java language are presumably running unit tests just like everyone else, so they probably noticed that their own tests started running in random order before releasing Java 7 to the world, but apparently they did not care.

Luckily, the methods are still stored in natural order in the class file; they only get garbled as they are being loaded by the class loader, so you can still discover the natural method order if you are willing to get just a little bit messy with bytecode.
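
Here is roughly what "getting messy with bytecode" involves. This is a minimal sketch, not a production-quality parser. It walks the class file layout described in chapter 4 of the Java Virtual Machine Specification (constant pool, interfaces, fields, methods) and returns method names in the order in which they are stored. It relies on the observation that javac writes methods in source order. The JVM specification does not promise this. Compiler-generated methods such as constructors (`<init>`), the static initializer (`<clinit>`) and lambda bodies also show up and would need to be filtered out. The class name `ClassFileMethodOrder` is made up for this example.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists method names in the order in which they are stored in a class file.
final class ClassFileMethodOrder
{
	static List<String> naturalMethodOrder( Class<?> javaClass )
	{
		String resourceName = "/" + javaClass.getName().replace( '.', '/' ) + ".class";
		InputStream inputStream = javaClass.getResourceAsStream( resourceName );
		if( inputStream == null )
			throw new IllegalStateException( "class file not found: " + resourceName );
		return naturalMethodOrder( inputStream );
	}

	static List<String> naturalMethodOrder( InputStream inputStream )
	{
		try( DataInputStream in = new DataInputStream( inputStream ) )
		{
			if( in.readInt() != 0xCAFEBABE )
				throw new IllegalArgumentException( "not a class file" );
			skip( in, 4 ); // minor and major version
			int constantPoolCount = in.readUnsignedShort();
			String[] utf8Constants = new String[constantPoolCount];
			for( int i = 1; i < constantPoolCount; i++ )
			{
				int tag = in.readUnsignedByte();
				switch( tag )
				{
					case 1: // Utf8: a u2 length followed by modified UTF-8, exactly what readUTF() expects
						utf8Constants[i] = in.readUTF();
						break;
					case 7: case 8: case 16: case 19: case 20: // Class, String, MethodType, Module, Package
						skip( in, 2 );
						break;
					case 15: // MethodHandle
						skip( in, 3 );
						break;
					case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18: // 4-byte entries
						skip( in, 4 );
						break;
					case 5: case 6: // Long, Double: 8 bytes, occupying two constant pool slots
						skip( in, 8 );
						i++;
						break;
					default:
						throw new IllegalArgumentException( "unknown constant pool tag: " + tag );
				}
			}
			skip( in, 6 ); // access flags, this class, super class
			skip( in, 2 * in.readUnsignedShort() ); // interface indices
			readMemberNames( in, utf8Constants ); // fields: read past them, their names are not needed
			return readMemberNames( in, utf8Constants ); // methods, in the order in which they are stored
		}
		catch( IOException exception )
		{
			throw new UncheckedIOException( exception );
		}
	}

	// Fields and methods share the same layout: access flags, name index, descriptor index, attributes.
	private static List<String> readMemberNames( DataInputStream in, String[] utf8Constants ) throws IOException
	{
		int memberCount = in.readUnsignedShort();
		List<String> names = new ArrayList<>( memberCount );
		for( int i = 0; i < memberCount; i++ )
		{
			skip( in, 2 ); // access flags
			names.add( utf8Constants[in.readUnsignedShort()] );
			skip( in, 2 ); // descriptor index
			int attributeCount = in.readUnsignedShort();
			for( int j = 0; j < attributeCount; j++ )
			{
				skip( in, 2 ); // attribute name index
				skip( in, in.readInt() ); // u4 attribute length, then the attribute body
			}
		}
		return names;
	}

	private static void skip( DataInputStream in, int byteCount ) throws IOException
	{
		in.readFully( new byte[byteCount] );
	}
}
```

A test runner could call `ClassFileMethodOrder.naturalMethodOrder( SomeTest.class )`, where `SomeTest` stands for any compiled test class on the class path, and then execute the test methods in the returned order.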

However, that's too much work, and it is especially frustrating since the class loader is in a much better position to correct this problem, but it doesn't. (The class loader probably messes up the method order because it stores the methods in a HashMap, which yields its contents in hash order, which is essentially random. So, fixing the problem would probably have been as simple as using a LinkedHashMap instead of a HashMap.)
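
The difference between the two collections mentioned in that parenthetical guess can be shown in isolation. The sketch below uses a made-up class name, `InsertionOrderDemo`, and the method names are just sample keys. It says nothing about how the JVM really stores methods internally, which happens in native code. It only shows that `LinkedHashMap` iterates in insertion order, whereas `HashMap` makes no promise about iteration order at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class InsertionOrderDemo
{
	// Puts the names into the map in the given order and returns the order in which the map yields them back.
	static List<String> roundTrip( Map<String, Integer> map, List<String> names )
	{
		for( int i = 0; i < names.size(); i++ )
			map.put( names.get( i ), i );
		return new ArrayList<>( map.keySet() );
	}

	public static void main( String[] args )
	{
		List<String> names = Arrays.asList( "testInsertion", "testRemoval", "testIteration", "testClear" );
		// Always identical to the insertion order.
		System.out.println( roundTrip( new LinkedHashMap<>(), names ) );
		// Determined by hash codes and table size; may or may not match the insertion order.
		System.out.println( roundTrip( new HashMap<>(), names ) );
	}
}
```

The first line printed always matches the insertion order. The second line depends on the keys' hash codes and is not guaranteed to.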

People asked the creators of JUnit to provide a solution, but nothing was done for a long time, allegedly because if You Do Unit Testing Properly™, you should not need to run your tests in any particular order, since there should be no dependencies among them. So, the creators of JUnit are under the incredibly short-sighted impression that if you want your tests to run in a particular order, it must be because you have tests that depend on other tests.

When the creators of JUnit finally did something to address the issue (it did not take them long, only, oh, until JUnit 4.11, well over a year after Java 7 came out), their solution was completely half-baked. The default mode of operation became some weird, ill-defined, so-called "deterministic" order, which is neither alphabetic nor natural, but which according to them is the same from test run to test run. (And is completely useless.) Meanwhile, a special annotation could coerce JUnit to run test methods either in alphabetic order (which is nearly useless) or in whatever order the JVM yields them (which is the original problem).

So, apparently, the creators of JUnit were willing to do anything except the right thing, and even though JUnit 5 is said to have been re-written from scratch, the exact same problem persists.

Why is this a problem?


Well, let me tell you why running tests in natural method order is important:

We tend to test fundamental features of our software before we test features that depend upon them, so if a fundamental feature fails, we want that to be the very first error that will be reported. (Note: it is the features under test that depend upon each other, not the tests themselves!)

The test of a feature that relies upon a more fundamental feature whose test has already failed might as well be skipped, because it can be expected to fail. If it does run, then reporting its failure before the failure of the more fundamental feature is an act of sabotage against the developer: it sends us looking for problems in places where there are no problems to be found, and it makes it more difficult to locate the real problem, which is usually pointed out by the test that failed first in the source file.

To give an example, if I am writing a test for my awesome collection class, I will presumably first write a test for the insertion function, and further down I will write a test for the removal function. If the insertion test fails, the removal test does not even need to run, but if it does run, it is completely counter-productive to be shown the results of the removal test before I am shown the results of the insertion test. If the insertion test fails, it is game over. As they say in the far west, there is no point beating a dead horse. How hard is this to understand?
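
As a hypothetical illustration of the behavior argued for above, a few lines of plain Java suffice to run tests in registration order and stop at the first failure. This is not JUnit, nor the API of any real framework, and the name `OrderedTestRunner` is made up. It assumes that tests are registered in the same order as they appear in the source, with tests of more fundamental features first.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical mini test runner: runs tests in the order in which they were registered
// and stops at the first failure, so the first failure reported is the most fundamental one.
final class OrderedTestRunner
{
	private final Map<String, Runnable> tests = new LinkedHashMap<>(); // preserves registration order

	void add( String name, Runnable test )
	{
		tests.put( name, test );
	}

	// Returns one line per test that ran; tests after the first failure do not run at all.
	List<String> run()
	{
		List<String> report = new ArrayList<>();
		for( Map.Entry<String, Runnable> entry : tests.entrySet() )
		{
			try
			{
				entry.getValue().run();
				report.add( entry.getKey() + " passed" );
			}
			catch( AssertionError | RuntimeException failure )
			{
				report.add( entry.getKey() + " FAILED: " + failure.getMessage() );
				break; // assumption: later tests exercise features built upon the one that just failed
			}
		}
		return report;
	}
}
```

With the collection example above, registering `testInsertion` before `testRemoval` means that a broken insertion is the one and only failure reported, and the removal test is not run at all.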

Another very simple, very straightforward, and very important reason for wanting the test methods to be executed in natural order is that seeing the test method names listed in any other order is a brainfuck.