Mike Nakis on Code Craftsmanship

I was once asked what are my favorite means of ensuring the quality of the code that I write. Off the top of my head I could give a few answers, but it occurred to me afterwards that I could of course have said a lot more. I will try to make a list here. I could probably write a book on this.

  • Assert everything.  When I look at code, I don't ask myself "should I assert that?" Instead, I ask myself "is there anything that I forgot to assert?" The idea is to assert everything that could possibly be asserted, leave nothing assertable unasserted. Assertions take care of white-box testing your code, so that software testing can then be confined to the realm of strictly black-box testing, as it should. Assertions cost nothing, so you can go wild with them: go ahead and assert that your array is sorted before performing binary search on it; verify that your binary search worked correctly by comparing its result against the result of a linear search for the same item; yes, the time complexity of these assertions is far greater than the time complexity of the operation they guard; that's fine, because remember, assertions cost nothing, since they are only enabled on development builds. For more information about why assertions are awesome, read this: https://blog.michael.gr/2014/09/assertions-and-testing.html
  • Do black-box testing, not white-box testing. Heed the advice that says test against the interface, not the implementation. Unit Testing is testing against the implementation, so despite the entire software industry's addiction to it, it is bad practice and should be avoided. Do Incremental Integration Testing instead, which only tests interfaces. With code that is choke-full of assertions, it works really well. Incidentally, this means that mocking, despite being an admirably nifty trick, should never be used: if you are using mocks in your tests, then you are doing white-box testing, so you are doing it wrong. For more information about why white box testing is bad, read this: https://blog.michael.gr/2021/12/white-box-vs-black-box-testing.html
  • Minimize state, maximize immutability. Design so that as much code as possible is dealing with data that is immutable. Re-examine every single class which contains mutable members, and many chances are you will find that it could be replaced with an immutable class.  Even if not, you might discover that many of its members could be immutable.
    Eschew frameworks, technologies, and techniques that prevent or hinder immutability. If you are using auto-wiring, use constructor injection only, so that you can store in final/readonly members.
    Note, however, that immutability is not important in function-local variables. There is absolutely nothing wrong with function-local mutation if it serves the slightest purpose. Which brings us to the next point:
  • Do overwrite function parameters. There exists a widespread cargo-cult programming practice of never overwriting a parameter within a function, and some languages (e.g. Scala) even prohibit it, which is deplorable. Go ahead and do this, (if your language allows it,) when the original parameter value should not be used in the remainder of the function. In doing so you are minimizing the number of variables that are in scope and preventing accidental use of the original value.
    The origins of the practice of never overwriting function parameters are quite arcane: some old versions of Fortran would pass everything by reference, including constants. So, if someone invoked your function with the constant 3 for parameter x, and then within that function you assigned 5 to parameter x, then from that moment on the constant 3 would actually have the value of 5 in the entire program. As a result, old computer scientists decreed that function parameters should never be reassigned, and they kept passing this advice to newer generations without ever bothering to rethink it.
  • Avoid b-to-a style conversions, use a-from-b style instead.  When I see `A = AfromB( B )` I can immediately tell that it looks correct, since A is on the side of A and B is on the side of B. However, when I see `B = AtoB( A )` I have to stare at it for a little while longer before I can tell whether it is correct or not. Of course, this is a trivial example: in real-world situations, the identifiers, as well as the call chain, could be much longer. This is related to Joel Spolky's notion that correct code should look correct, while wrong code should look wrong, and it is especially important since the entire industry has traditionally been doing it in precisely the wrong way with b-to-a style conversions.
  • Avoid unnecessary braces. Doing so keeps the code more compact, making more statements fit within the screen. The cargo-cult programming convention of enclosing even single-statement blocks within curly braces allegedly avoids bugs caused by trying to add a second statement to the block while forgetting to introduce curly braces. This has actually happened to me once, and the programmer who introduced the bug in my code did not even apologize, because he considered it my fault for not having provided the curly braces for him to insert his second statement in. The fact of the matter is that any halfway decent IDE will flag such a mistake with a warning, so this is not a problem today. Of course, in order to have that warning enabled, you must be keeping a consistent indentation style everywhere, right?
  • Minimize flow control statements, especially the `if` statement. If there is any opportunity to structure a piece of code so as to save an `if` statement, the opportunity should be pursued tenaciously.
  • Put the complexity in the design, not in the code. If the code does not look so simple that even an idiot can understand it, then the code is too complex. When this happens, it usually means that shortcuts have been taken in the design, which have to be compensated for with overly complex code. Make the design as elaborate as necessary so that the code can be as simple as possible. Overly complex code is usually the result of violations of the Single Responsibility Principle. Which brings us to the next point:
  • Adhere to the Single Responsibility Principle like your life depends on it. Often, what you think of as a single responsibility can in fact be further sub-divided into a number of more fundamental responsibilities. 
    • Almost all of the code that we write performs, or can be thought of as performing, some kind of transformation, involving a certain number of participants. For example:
      • At the lowest level, an assignment operation transforms each bit of the destination variable into the value dictated by the corresponding bit of the source variable.  Obviously it involves two participants: the source and the destination.
      • At the highest level, a shopping web site transforms relational data and user input into pixels on the user's browser window and purchase orders in the logistics department. In this simplified view we have four participants, realistically we have many more.
    • Most transformations are of the simplest kind, involving only two participants, transforming one into the other. That's great, that's a single responsibility: convert A to B.
    • Many transformations involve three participants, and they tend to be appreciably complex.
      • In some cases they can be simplified into successive operations, one to go from A to B and another to go from B to C, meaning that there were in fact two different responsibilities which were identified and realized as separate steps. 
      • However, quite often they cannot be simplified, as for example when we are converting A to B by consulting C. That's a single responsibility which cannot be further broken down.
    • All to often, people manage to involve four or more participants in a single transformation. These tend to be grotesquely complex, and they invariably constitute violations of the single responsibility principle. It goes without saying that they must be avoided at all costs.
      Luckily, operations that involve more than 3 participants can always be refactored into multiple successive transformations of no more than 3 participants each, introducing intermediate participant types if necessary. (I have never heard of this being suggested by anyone before, so this could perhaps be The Mike Nakis Theorem of Simplification.)
  • Refactor at the slightest indication that refactoring is due; do not allow technical debt to accumulate. Avoid the situation of being too busy mopping the floor to turn off the faucet.  Allow a percentage of sprints to explicitly handle nothing but technical debt elimination. Do not try to spread the task of refactoring over feature development sprints, because a) doing so will not make the refactoring effort magically disappear, b) you will not do a good enough job at it, and c) the time estimation of the features will suffer.
  • Strive for abstraction and generalization. The urge to abstract and generalize is often mistaken as having reusability as its sole aim, and so it is often met with the YAGNI objection: "You Ain't Gonna Need It". The objection is useful to keep in mind so as to avoid over-engineering, but at the same time it must not be followed blindly, because abstraction and generalization have inherent benefits, regardless of the promise of reusability. Every problem of a certain complexity and above, no matter how application-specific it might seem to be, can benefit from being divided into a specialized, application-specific part, and an abstract, general-purpose part. Strive to look for such divisions and realize them in the design.
    • The general purpose part will be easier to understand because it will be implementing an abstraction. 
    • The application code will be easier to understand because it will be free from incidental complexity
In other words, if you can choose between the following two:
  • adding 5 lines of application code, vs.
  • adding 2 lines of application code and 10 lines of framework code
then opt for the latter, even if these 10 lines of framework code are unlikely to ever be reused.
  • Use domain-specific interfaces. Encapsulate third party libraries behind interfaces of your own devise, tailored to your specific application domain. Strive to make it so that any third-party library can be swapped with another product without you having to rewrite application logic. Conventional wisdom says the opposite: we have all heard arguments like:
    • "the best code is the code you don't write" (makes me want to invest in the business of not writing software) 
    • "a third-party library will be better documented than your stuff" (presumably because documentation is a skill your developers have not mastered) 
    • "if you run into trouble with a library, you can ask for help on stackoverflow, whereas with something you have developed in-house, you are stuck" (presumably because your developers know nothing of it, despite working with it every day.) 
The truth with application development is that the more you isolate the application logic from peripheral technologies, the more resilient your application logic becomes to the ever changing technological landscape, a considerable part of which is nothing but ephemeral fashions, the use of which is dictated not by actual technological merit, but by C.V. Driven Development instead. 
(See https://martinjeeblog.com/2015/03/11/cv-driven-development-cdd/)
  • Strive for what is simple, not for what looks easy. The simple often coincides with the easy, but sometimes the two are at odds with each other. Eschew languages and frameworks that provide the illusion of easiness at the expense of simplicity. The fact that a particular framework makes "hello, world!" an easy one-liner probably means that the ten-thousand-liner that you are aiming for will be unnecessarily complicated and hard to write.
    Watch this: https://www.infoq.com/presentations/Simple-Made-Easy
  • Avoid binding by name like the plague. Avoid as much as possible mechanisms whose modus operandi is binding by name: use them only for interfacing with external entities, never for communication between your own modules.  REST enthusiasts can cry me a river.
  • Always use strong (static) typing. Avoid any kind of weak typing (euphemistically called dynamic typing, duck typing, etc) and avoid languages and frameworks that require it or even just sympathize with it. Yes, this includes pretty much all scripting languages. Scripting language enthusiasts can cry me a river. (And yes, this includes Typescript too, because it sympathizes with Javascript.) 
  • Strive for debuggability. For example, do not overdo it with the so-called "fluent" style of invocations, because they are not particularly debuggable.  Do not hurry to adopt this or that cool new programming language before you have made sure that debugger support for it is complete and working properly. 
  • Resist the "idiomatic" craze. Contrary to popular belief, doing things in whatever way is considered idiomatic for the programming language at hand is never an end in and of itself; Avoid the use of idiomatic ways of doing things unless you are convinced they are superior.
  • Strive for testability.  Design interfaces that expose all functionality that makes sense to expose, not only functionality that is known to be needed by the code that will invoke them. For example, the application may only need an interface to expose a `register()` and `unregister()` pair of methods, but `isRegistered()` also makes sense to expose, and it will incidentally facilitate black-box testing.
  • Enable all warnings that can possibly be enabled. The fact that a certain warning may, on rare occasions, be issued on legitimate code, is no reason to disable the warning: the warning should be enabled, and selectively dealt with on a case by case basis.
    One way of dealing with a warning on legitimate code is by suppressing it. Warning suppression should always be as localized as possible, meaning that it should be done on the individual line where the warning is issued, instead of the entire function or the entire file.
    Another, better way of dealing with warnings is by using programmatic constructs that avoid  them. For example, suppose you have the following scenario: you are calling a function with multiple overloads for different parameter types, but you need to be explicit about the precise overload that you are selecting, so you are casting your parameter to the intended type despite the fact that your parameter is already of that type. If you do that, you should be receiving a "redundant cast" warning. You might be tempted to suppress the warning, but a better approach is to add a set of non-overloaded functions with different function names for different parameter types, thus making explicitness-by-casting unnecessary.
    Some warnings, like "unused identifier", occur on legitimate code too often for selective suppression to be practical. For those warnings, consider using an IDE that supports a "weak warning" level, which is highlighted inconspicuously, so it can be easily filtered out by your eyes, but the visual clue is there in case it points to something unexpected.
    And of course some silly warnings occur on legitimate code all the time, so it goes without saying that they need to be disabled, but in my experience they are far fewer than the average programmer thinks they are.
  • Strive for readability. Code is generally write-once, read many. We tend to read our code several times as we write it, and then many more times throughout its lifetime as we tweak it, as we write nearby code, as we browse through code to understand how things work, as we perform troubleshooting, etc. Therefore, choices that make code easier to read are preferable even if they make code harder to write. This means that languages that achieve great terseness of code are not really delivering something of value, (I am looking at you, Scala,) because verbosity of code is not one of the major problems that our profession is faced with; unreadable code is. This also means that certain languages whose grotesquely arcane syntax has earned them the "write-only language" designation are not to be touched with a 10 ft. pole. Perl enthusiasts can cry me a river.
  • Use an IDE with a spell checker. Avoid acronyms and abbreviations, and anything that fails to pass the spell check. Add the spell checking dictionary of the IDE to source control and review any commits to it just as you review any other code.
    Modern IDEs have formidable auto-completion features, so using long identifiers does not mean that you have to type more. But even if it did, typing is not one of the major problems that our profession is faced with; unreadable code is.
    Note that when I say "avoid acronyms" I do not mean replace "GUID" with "GloballyUniqueIdentifier"; however, "GUID" should be replaced with "Guid", so that the spell checker can treat it as a word and spell-check it.
  • Pay attention to naming. Strive for good identifier names and for a variety of names that reflect the variety of the concepts. A Thesaurus is an indispensable programming tool. Spend the necessary time to find the right word to name something, and dare to use names that you may have never heard anyone using before. (For example, if you need a name for a Factory of Factories, why not call it Industry?)
At the same time, be weary of bad identifier names: Any piece of code written by a programmer whose English language skills are poor should be reviewed by a programmer whose English language skills are good.

Note: in the following discussion when we speak of a noun or an adjective or a verb we actually mean a sequence of various parts of speech that effectively constitute a noun or an adjective or a verb. For example, "reticulated spline" is a noun (spline), "reticulated before dive" is an adjective (reticulated), and "dive for moog" is a verb (dive).

    • Classes: The name of a class must always be a noun and it must always be in singular form. If you need to signify plurality, do not use plural! Instead, append a plurality-signifying term which is in turn a singular noun. For example, if you have a class that implements a group of entities, do not call it 'Entities', call it 'EntityGroup' instead. (Duh!)
    • Interfaces: The name of an interface must always be either an adjective, or a noun in singular form.
    • Variables: The name of a variable which is not a collection must always be in singular form, and it must always be a noun, unless it is of boolean type, in which case it may signify a condition, such as isEmpty. 
    • Collections: The name of a variable which is a collection must always be a noun and it must always be in plural form. (This is the only case where plural is allowed.)
    • Pure functions: The name of a pure function must always be a noun unless it returns boolean, in which case it may signify condition, such as hasChildren(). The name may be in plural form if a collection is returned.
    • Side-effect functions: The name of a function that performs an operation (has side effects) must always be a verb. (And of course not a condition-signifying verb.) If the function returns a result indicating success or failure, the name must begin with 'try', which must then be followed by the actual verb, for example 'tryAdd()'.
Never begin a function name with the prefix 'check'. Doing so is a typical example of a developer choosing identifiers according to concepts in their head, without the slightest concern as to how these identifiers will be understood by others. A function that checks something and then does absolutely nothing about it would be absolutely useless, so presumably, this function takes some kind of action based on the result of all that checking that it does. However, the word 'check' tells me absolutely nothing as to what that action is.
    • Does it return boolean? Then the prefix should have been 'is'.
    • Does it throw? Then the prefix should have been 'assert'.
    • Does it issue a notification? Then the prefix should have been 'notifyIf'.
    • Does it fix things? Then the prefix should have been 'ensure'.
    • etc.
  • Avoid defensive programming; do offensive programming instead. This means, never allow any slack or leeway, never fail silently, always ensure that everything is exactly as you expect it to be. Tolerances should not be kept down to a minimum; they should be kept at absolute zero. Fail fast, fail hard, fail eagerly and enthusiastically. For example:
    • Avoid things like a `Map.put()` method which either adds or replaces, and instead design for an `add()` method which asserts that the item being added does not already exist, and  a `replace()` method which asserts that the item being replaced does in fact already exist. If an add-or-replace operation is useful, (and it very rarely is,) give it a name that clearly indicates the weirdness in what it does: call it `addOrReplace()`. (Duh!) 
    • Avoid things like a `close()` method which may be invoked more than once with no penalty: assert that your `close()` methods are invoked exactly once.
Read this: http://trevorjim.com/postels-law-is-not-for-you
  • Use inheritance when it is clearly the right choice. The advice that composition should be favored over inheritance was very good advice during the nineties, because back then people were overdoing it with inheritance: the general practice was to not even consider composition unless all attempts to get things to work with inheritance failed. That practice was bad, and the fact that the predominant language at that time (C++) supported not just inheritance but actually multiple inheritance made things even worse. So the advice against that practice was very much needed back then. However, the advice is still being religiously followed to this day, as if inheritance had always been a bad thing. This is leading to unnecessarily convoluted designs and much weeping and gnashing of teeth. Even the original advice suggested favoring one over the other, it did not prescribe the complete abolition of the other. So, today it is about time we reword the advice to read know when to use inheritance and when to use composition.  Also heed the advice by Josh Bloch to "Design and document for inheritance or else prohibit it".
  • Favor early exits over deep nesting. This means liberal use of the `break` and `continue` keywords, as well as `return` statements in the middle of a method. The code ends up being a lot simpler this way. Yes, this directly contradicts the ancient "one return statement per function" dogma.  It is nice to contradict ancient dogma.
  • Avoid static mutable state like anthrax. Yes, this also includes stateful singletons. The fact that it only makes logical sense to have a single instance of a certain object in your world is no reason to design that object, and your world, so that only one instance of them can ever be. You see, I guarantee to you that the need will arise in the future, unbeknownst to you today, to multiply instantiate your world, along with that object inside it. As a matter of fact, it is quite likely that you will have to do that anyway during testing.
  • Put the tools of the trade into use. Armies of very good developers have worked hard to build these tools, don't you dare make their efforts go in vain. 
    • Use an IDE. Programmers who think that they are better off with their favorite text editor and their favorite assortment of command line tools should be admitted to rehabilitation.  (That having been said, programmers who are so attached to their IDE that they program by dragging and dropping code snippets around should consider that some desktop publishing job would perhaps suit them better.)
    • Use your IDE for building and for running tests. The continuous build pipeline should be your second line of defense, not your primary means of building and testing. Your IDE will always be a lot faster, and it has a built-in debugger. 
    • Use the debugger of your IDE as your first choice for troubleshooting anything, not as the last resort after all other options have been exhausted. This means that you should be using the debugger not only when there is trouble, but always, by default, so that it is ready when trouble occurs. This in turn means that you should never hit the "run" key on your IDE; hit the "debug" key instead. Always the "debug" key. Only the "debug" key. You are a programmer; act like it.
    • Do not optimize anything unless:
      • You know beyond any doubt that there is in fact a performance problem, and 
      • The profiling tool has shown precisely where the problem is. 
    • Do not even think that you are done with testing unless the code coverage tool gives you sufficient reason to believe so. 
    • Have your IDE perform code analysis on commit, and incorporate even more code analysis in the continuous build.
  • Design with reliability as a foundation, not as an afterthought.  For example, sharing data in a multi-threaded environment by means of traditional locking techniques ("synchronization") is both error-prone and untestable, because you cannot test for race conditions. "Error prone" and "Untestable" is a deadly combination; therefore, this way of sharing data should be abandoned. Instead, design for a lock-free, share-nothing approach that works by passing immutable messages, thus eliminating the very possibility of race conditions.
  • Design with security as a foundation, not as an afterthought. Security is not something that you can add on top of an insecure foundation, because there exist no automated tests that can detect security hazards and no amount of carefulness on behalf of programmers that is careful enough. So, what is necessary is architectural choices that eliminate entire classes of security hazards. (Do not worry, there will always be other classes of security hazards to have to worry about.) If a certain architectural choice is prone to vulnerabilities, do not make that choice. An example of a vulnerability-prone architectural choice which should be avoided like COVID19 is putting application code on the web browser. Full-stack developers can cry me a river.
  • Keep the logs clean. Do not vex your colleagues, and do not make your own life harder, with torrential info-level or debug-level spam in the logs. Keep the info-level messages down to an absolute minimum, and once debugging is done, completely remove all the debug-level log statements. Regularly use the "blame" feature of the version control system to remind developers of logging statements that they should remove. Never use the log for capturing metrics or any other kind of structured information; use some separate, specialized instrumentation facility for that.
  • Take maxims with a grain of salt. When someone says "no function should ever accept more than 4 parameters" or "no class should ever be longer than 250 lines" they are usually talking nonsense. A function should accept as many parameters as necessary to do its job, and if that is 15 parameters, so be it. A class should be as long as necessary to do its job, and if that is 2000 lines, so be it. Breaking things down to smaller units should be done because there is some actual merit in doing so, not because some prophecy said so.
  • Private static methods are fine. Really. An instance method has the entire object state at its disposal to read and manipulate, and this state may be altered by any other instance method. A static method on the other hand is obviously not in a position to read nor alter any of the object's state, and it is not able to invoke any instance methods that would do that. By its nature, a static method has to rely exclusively on parameters, which are all clearly visible at each call site. Thus, a static method is a magnificently less complex beast than an instance method. What this means is that private static methods are not the slightest bit evil as some folks are under the impression that they are, and we should have more of them. Personally, when I have a class that has both complex logic and mutable state I tend to move the complex logic into private static methods, reducing the instance methods to doing nothing but invoking private static methods, passing instance fields to them as necessary.
  • Do not fix it unless there is a test for it. So far I have not tried test-driven development, so I do not have an opinion about it yet, but what I have tried, and I have found to be immensely useful, is test-driven maintenance. So, if a bug is discovered, which obviously passed whatever automated tests you already had in place, do not hurry to figure out what the cause is and fix it. First, write a test that tests against the bug and fails, being completely agnostic of any theory that you might already have in your mind as to what is causing the bug. Then, fix the bug and watch the test pass. (And hopefully you have enough tests in place for every other part of your software system so as to have reasonable guarantees that in fixing this bug you did not break anything else.)
  • Use the type system to the fullest. Try to avoid using general purpose data types; try to use data types that are specific to the job instead. A classic example of this is the suggestion to always use a `Duration` data type instead of an `int` number of milliseconds, but it goes further than that. So, no, your height is not of type `double`, it is of type `Length`; your married status is not a boolean, it is an instance of `MarriedStatus`; and so on. Aficionados of untyped programming languages can cry me a river.
  • Avoid death by ten thousand little methods. Again and again I see codebases with multitudes of tiny methods, each containing just one or two lines of trivial code, aiming to ensure that not a single line of code is duplicated anywhere. The downside of this is that it increases the complexity of the calling tree and therefore the amount of mental effort required to make sense out of it. A new function is worth introducing if it has a well-defined, meaningful role to play. Difficulty in coming up with a name for a function, or having many functions with names that differ only slightly and fail to readily convey the difference between them, are both good indicators that these functions have no role to play other than to avoid code duplication. Of course there is merit in reducing code duplication, but not when the code in question is trivial. And when you see the possibility to deduplicate non-trivial code, then the well-defined, meaningful role of the function as well as the appropriate name for it tend to be immediately obvious.
  • Make the best out of break-on-exception. Set up your development tooling, and use whatever runtime mechanisms are necessary, so that the debugger always stops at any statement that throws an unexpected exception. 
    • Many programmers have the bad habit of doing all their troubleshooting by examining logs and post-mortem stack traces and theorizing as to what went wrong, instead of having the debugger break on exception and actually seeing what went wrong. This is extremely counter-productive. 
    • Unfortunately, exceptions are a somewhat complex topic, programming languages and their runtimes behave in complex ways when exceptions are thrown, and debuggers have complex mechanisms for dealing with them, none of which helps. As if that was not enough, it is not always easy to tell what constitutes an unexpected exception and what does not.
    • Thus, there exist several obstacles to accomplishing proper, usable, break-on-exception:
      • Our code throws and catches expected exceptions all the time, or uses external libraries that do that internally all the time; clearly, we do not want the debugger to stop on those.
      • One might think that the solution to this problem should be to configure the debugger to ignore caught exceptions and only stop on uncaught exceptions; unfortunately, that will not work either, because quite often we have exceptions that are technically caught but we nonetheless want the debugger to stop at:
        • An external library invokes our code, and our code throws an unexpected exception, which goes uncaught as far as our code is concerned, but it is caught by the external library. Thus, this exception is a caught exception, and yet we do want the debugger to stop at the statement that throws it.
        • In Java, the `try-finally` clause is implemented in a capricious way which catches exceptions and rethrows them, which means that any exception thrown from within this clause is a caught exception. A debugger configured to stop on uncaught exceptions will break at the closing curly brace at the end of the `finally` block, because that's where the caught exception is internally rethrown, which is completely useless. The same problem is encountered with other constructs of Java which are internally implemented using `try-finally`, such as the the `synchronized` clause and the `try-with-resources` clause.
      • Any exception which is under normal circumstances unexpected and unhandled may, when running tests, momentarily become expected and handled as a test specifically tests for it; we do not want the debugger to stop on that exception in that case.
    • Here is a stackoverflow question and answer which simplifies things a lot: https://stackoverflow.com/q/71115356/773113
  • Make the best out of the log. You should at all times be able to click on a log entry and be taken to the source line that generated that log entry, and you should also at all times be able to click on any line of a logged exception stack trace and be taken to the source that corresponds to that stack frame. I am appalled by how many programming environments do not offer this as the default mode of operation under all circumstances.
    • In the Microsoft Visual Studio world, for a line to be clickable it must start with a source pathname, followed by an opening parenthesis, followed by a line number, followed by a closing parenthesis and a colon. It can optionally be prefixed with whitespace. Luckily, both C++ and C# support efficient means of obtaining source file name and line number information: In C++ it is the `__FILE__` and `__LINE__` built-in pre-processor macros, while in C# it is the `CallerFilePath` and `CallerLineNumber` attributes. Unfortunately, the pathnames generated by these mechanisms are absolute, meaning that they start from the drive letter and include the kitchen sink, so you might want to programmatically convert them to pathnames relative to the solution folder before logging them. Visual studio also recognizes those. (This behavior is undocumented.)
    • In the JetBrains "Intellij Idea" world, for a line to be clickable it needs to contain an identifier, followed by an opening parenthesis, followed by a source filename-plus-extension, (but no path,) followed by a colon, followed by a line number, followed by a closing parenthesis. The identifier is meant to be a package name, but Idea does not interpret it in any way, so it can be anything. Due to a bug in Idea, if the word "at" appears in the log line, and if it is in any place other than immediately before the package name, then this mechanism breaks. (Note that this is all entirely undocumented.) Also note that this mechanism suffers from ambiguity in the case of multiple source files with the same filename. An alternative mechanism is to include a "file://" URI in the log entry, but then you would have to figure out the path from the package name, which is doable, but not easy. Unfortunately, Java does not provide any efficient means of obtaining source file name and line number information, so one has to internally throw an exception and obtain stack trace information from it. Fortunately, throwing an exception in the java world is not anywhere near as expensive as in the Microsoft world. Unfortunately, it is still expensive. You can see this performance penalty as one more reason to keep logging to a minimum.
  • Write code as if it will be reviewed by someone, even if it never will. Always try to take one more look at the code from a completely agnostic point of view, supposing that you know nothing about what it does, why it does it, how it does it. Does the code still make sense? Is everything obvious? If not, refactor it until it is as plain as daylight. If comments are necessary to explain what is going on, can the code be refactored so that the comments become unnecessary? Which brings us to the next point.
  • Avoid writing code comments. Never add a comment in the code unless it is really, really necessary.  (Note that this applies to code comments, not to public interface comments, which are of course always nice to have.)  
    • Code comments that simply state the obvious are unwarranted causes of alert, and if you repeat them enough they will force the reader to start treating your comments as noise, and may thus cause the reader to miss that rare comment which was actually important to note.
    • The purpose of a code comment should be to alert the reader that something special is happening here, which is not obvious, and cannot be explained by any means other than by English-language prose. This should only be necessary in exceptional situations, while the norm should be that the code is always so simple, and so self-explanatory, that no comments are necessary.
    • If you find yourself adding a code comment, first ask yourself whether there is any way to restructure the code so that the comment is unnecessary. There are many ways to do this, the simplest one being to extract the code into a separate function that has a self-explanatory name. Obviously, this is not possible when there is a lot of explaining to do, which in turn means that comments worth writing tend to be entire sentences explaining strange and complicated situations.
    • If a comment can be coded as an assertion statement, that's all the better. Comments saying "x must be greater than y" are absolutely retarded. Assert the darn thing, and spare us from the comment.
  • Stick with UTC everywhere. Use UTC and only UTC for any purpose that involves storing, retrieving, communicating, converting, calculating, and doing really anything whatsoever with time, except for the following two cases, and only the following two cases:
    • Converting a UTC time variable to a string to be shown to the user.
    • Parsing a string that was entered by the user into a UTC time variable.
  • Keep system design separate from implementation. Code should not be making assumptions about the design, so that the design is free to change with a minimal amount of changes to the code. For example, the multi-threading regime under which a system operates (whether the system  uses discrete threads, or a thread-pool, or no multi-threading at all) is a design concern. As such, multi-threading should be taken care of by the relatively small body of code which wires up (realizes) the system, and all implementation code should be capable of operating regardless of the multi-threading regime. Incidentally, this facilitates tests that run  under a single-threaded regime, to facilitate debugging.

No comments:

Post a Comment