2014-07-18

Benchmarking Java 8 lambdas

Now that Java 8 is out, I was toying with the idea of a new assertion mechanism which uses lambdas. The idea is to have a central assertion method that works as follows: if assertions are enabled, a supplied method gets invoked to evaluate the assertion expression, and if it returns false, then another supplied method gets invoked to throw an exception. If assertions are not enabled, the assertion method returns without invoking either supplied method. This would provide more control over whether assertions are enabled or not for individual pieces of code, as well as over the type of exception thrown if the assertion fails. It would also have the nice-to-have side effect of making 100% code coverage achievable, albeit only apparently so.
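To make the idea concrete before the full benchmark code below, here is a minimal sketch of such a mechanism; the class and method names are my own, not part of any existing library, and the per-call enablement flag is just one way the "control over individual pieces of code" could be expressed:

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

public class AssertSketch
{
    //a flag that could be per-feature or per-module, rather than a single JVM-wide switch
    static boolean listChecksEnabled = true;

    static void assertTrue( boolean enabled, BooleanSupplier check, Supplier<RuntimeException> thrower )
    {
        if( !enabled )
            return;                 //assertion disabled: cost is just the flag test
        if( check.getAsBoolean() )
            return;                 //assertion holds
        throw thrower.get();        //assertion failed: the caller chooses the exception type
    }

    public static void main( String[] args )
    {
        assertTrue( listChecksEnabled, () -> 1 + 1 == 2, () -> new IllegalStateException( "math is broken" ) );
        System.out.println( "passing assertion: ok" );

        try
        {
            assertTrue( listChecksEnabled, () -> false, () -> new IllegalArgumentException( "custom type" ) );
        }
        catch( IllegalArgumentException e )
        {
            System.out.println( "failing assertion threw: " + e.getMessage() );
        }

        listChecksEnabled = false;
        assertTrue( listChecksEnabled, () -> false, () -> new IllegalArgumentException( "never thrown" ) );
        System.out.println( "disabled assertion: ok" );
    }
}
```

Note that because the check is a lambda, the assertion expression is not evaluated at all when the flag is off, which is the same laziness guarantee that the built-in 'assert' statement provides.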

Naturally, I wondered whether the performance of such a construct would be comparable to that of the existing constructs, namely the 'assert expression' construct and the 'if( checking && expression ) throw ...' construct. I was not hoping for equal performance, not even ballpark equal; just within the same order of magnitude.

Well, the result of the benchmark blew my mind.

Congratulations to the guys that made Java 8, because it turns out that all three constructs take roughly the same amount of time to execute!

Here is my code:

package saganaki;

public class TestProgram
{
    public static void main( String[] arguments )
    {
        Benchmark benchmark = new Benchmark( 100 );
        for( int i = 0;  i < 3;  i++ )
        {
            run( benchmark, true );
            run( benchmark, false );
            System.out.println();
        }
    }

    interface Checker
    {
        boolean check();
    }

    private static boolean assertionsEnabled = true;

    private static void run( Benchmark benchmark, boolean enableAssertions )
    {
        //note: setClassAssertionStatus() takes effect only for classes that have not yet been initialized, so it does not change the status of the assert statement in this already-loaded class
        TestProgram.class.getClassLoader().setClassAssertionStatus( TestProgram.class.getName(), enableAssertions );
        assertionsEnabled = enableAssertions;
        String prefix = "assertions " + (enableAssertions? " enabled" : "disabled");
        benchmark.runAndPrint( prefix + ": if-statement", new Runnable()
        {
            @Override
            public void run()
            {
                if( assertionsEnabled && System.out == null )
                    throw new IllegalArgumentException();
            }
        } );
        benchmark.runAndPrint( prefix + ": assert      ", new Runnable()
        {
            @Override
            public void run()
            {
                assert System.out != null;
            }
        } );
        benchmark.runAndPrint( prefix + ": assertTrue()", new Runnable()
        {
            @Override
            public void run()
            {
                assertTrue( () -> System.out != null, () -> { throw new IllegalArgumentException(); } );
            }
        } );
    }

    static void assertTrue( Checker checker, Runnable thrower )
    {
        if( !assertionsEnabled )
            return;
        if( checker.check() )
            return;
        thrower.run();
    }
} 

And here is the output:
assertions  enabled: if-statement 28237.0 iterations per millisecond
assertions  enabled: assert       29037.9 iterations per millisecond
assertions  enabled: assertTrue() 24593.2 iterations per millisecond
assertions disabled: if-statement 25118.5 iterations per millisecond
assertions disabled: assert       25912.2 iterations per millisecond
assertions disabled: assertTrue() 24825.6 iterations per millisecond

assertions  enabled: if-statement 25835.9 iterations per millisecond
assertions  enabled: assert       25127.6 iterations per millisecond
assertions  enabled: assertTrue() 25572.9 iterations per millisecond
assertions disabled: if-statement 25469.6 iterations per millisecond
assertions disabled: assert       25448.3 iterations per millisecond
assertions disabled: assertTrue() 25415.7 iterations per millisecond

assertions  enabled: if-statement 25838.6 iterations per millisecond
assertions  enabled: assert       25158.9 iterations per millisecond
assertions  enabled: assertTrue() 25541.7 iterations per millisecond
assertions disabled: if-statement 25373.6 iterations per millisecond
assertions disabled: assert       25402.5 iterations per millisecond
assertions disabled: assertTrue() 25370.9 iterations per millisecond

The first run shows quite different results from the next two runs, presumably because the JIT compiler was still warming up, so it is best disregarded.

For the Benchmark class, see my previous post: michael.gr: Benchmarking code written in Java or C# (or any GCed, JITted, VM-based language)

Benchmarking code written in Java or C# (or any GCed, JITted, VM-based language)

Sometimes we need to measure the time it takes for various pieces of code to execute in order to determine whether a certain construct takes significantly less time to execute than another. It sounds like a pretty simple task, but anyone who has ever attempted to do it knows that simplistic approaches are highly inaccurate, and achieving any accuracy at all is not trivial.

Back in the days of C and MS-DOS things were pretty straightforward: you would read the value of the system clock, run your code, read the value of the clock again, subtract the two, and that was how much time it took to run your code. The rather coarse resolution of the system clock would skew things a bit, so one trick you would at the very least employ was to loop waiting for the value of the system clock to change, then start running your code, and stop running at another transition of the value of the system clock. Another popular hack was to run benchmarks with interrupts disabled. Yes, back in those days the entire machine was yours, so you could actually do such a thing.

Nowadays, things are far more complicated. For one thing, the machine is never entirely yours, so you cannot disable interrupts. Other threads will preempt your thread, and there is nothing you can do about it; you just have to accept some inaccuracy from it. Luckily, with modern multi-core CPUs this is not as much of an issue as it used to be, but in modern VM-based languages like Java and C# we have additional and far more severe inaccuracies introduced by garbage collection and JIT compilation. Fortunately, their impact can be reduced.

In order to avoid inaccuracies due to JIT compilation, we always perform one run of the code under measurement before the measurements begin. This gives the JIT compiler a chance to do its job, so it will not get in the way later, during the actual benchmark.

In order to avoid inaccuracies due to garbage collection, we always perform one full garbage collection before starting the benchmark, and we try to keep the benchmark short, so as to reduce the chances of another garbage collection happening before it completes. The garbage collection APIs of most VMs tend to be somewhat snobbish, and they do not really guarantee that a full garbage collection will actually take place when requested, so we need an additional trick: we allocate an object keeping only a weak reference to it, then we keep calling the VM to garbage collect and run finalizers until that object disappears. This still does not guarantee that a full garbage collection has taken place, but it gives us the closest we can have to a guarantee by using only conventional means.

So, here is the class that I use for benchmarking, employing all of the above tricks:

package saganaki;

import java.lang.ref.WeakReference;

/**
 * Measures the time it takes to run a piece of code.
 *
 * @author Michael Belivanakis (michael.gr)
 */
public class Benchmark
{
    private static final long NANOSECONDS_PER_MILLISECOND = 1_000_000L;
    private final long durationInMilliseconds;

    /**
     * Initializes a new instance of {@link Benchmark}.
     *
     * @param durationInMilliseconds for how long to run the benchmark.
     */
    public Benchmark( long durationInMilliseconds )
    {
        this.durationInMilliseconds = durationInMilliseconds;
    }

    /**
     * Runs the benchmark, printing the results to {@link System#out}
     *
     * @param prefix   text to print before the results.
     * @param runnable the code to benchmark.
     */
    public void runAndPrint( String prefix, Runnable runnable )
    {
        double iterationsPerMillisecond = run( runnable );
        iterationsPerMillisecond = roundToSignificantFigures( iterationsPerMillisecond, 6 );
        System.out.println( prefix + " " + iterationsPerMillisecond + " iterations per millisecond" );
    }

    /**
     * Runs the benchmark
     *
     * @param runnable the code to benchmark.
     *
     * @return number of iterations per millisecond.
     */
    public double run( Runnable runnable )
    {
        //run the benchmarked code once, so that it gets JITted
        runnable.run();

        //perform a full garbage collection to bring the VM to an as clean as possible state
        runGarbageCollection();

        //wait for a system clock transition
        long currentNanos = System.nanoTime();
        long startNanos = currentNanos;
        while( currentNanos == startNanos )
            currentNanos = System.nanoTime();
        startNanos = currentNanos;

        //run the benchmarked code for the given number of milliseconds
        long endNanos = startNanos + (durationInMilliseconds * NANOSECONDS_PER_MILLISECOND);
        long iterations;
        for( iterations = 0; currentNanos < endNanos; iterations++ )
        {
            runnable.run();
            currentNanos = System.nanoTime();
        }

        //calculate and return number of iterations per millisecond.
        return iterations / ((double)(currentNanos - startNanos) / NANOSECONDS_PER_MILLISECOND);
    }

    /**
     * Runs a full garbage collection.
     *
     * See Stack Overflow: Forcing Garbage Collection in Java?
     */
    private static void runGarbageCollection()
    {
        WeakReference<Object> ref = new WeakReference<>( new Object() );
        for(; ; )
        {
            System.gc();
            Runtime.getRuntime().runFinalization();
            if( ref.get() == null )
                break;
            Thread.yield();
        }
    }

    /**
     * Rounds a number to a given number of significant digits.
     *
     * See Stack Overflow: rounding to an arbitrary number of significant digits
     *
     * @param number the number to round
     * @param digits the number of significant digits to round to.
     *
     * @return the number rounded to the given number of significant digits.
     */
    private static double roundToSignificantFigures( double number, int digits )
    {
        if( number == 0 )
            return 0;
        @SuppressWarnings( "NonReproducibleMathCall" )
        final double d = Math.ceil( Math.log10( number < 0 ? -number : number ) );
        @SuppressWarnings( "NumericCastThatLosesPrecision" )
        final int power = digits - (int)d;
        @SuppressWarnings( "NonReproducibleMathCall" )
        final double magnitude = Math.pow( 10, power );
        final long shifted = Math.round( number * magnitude );
        return shifted / magnitude;
    }
}
For an application of the above class, see my next post: michael.gr: Benchmarking Java 8 lambdas

2014-07-14

What do you need a debugger for?

In my many years of experience in programming I have noticed that there are some programmers who refuse to use a debugger, or try to use the debugger as little as possible, as in, only when they run out of alternative options. They tend to rely solely on the diagnostic log to troubleshoot problems in their code, so their code tends to spew thousands of lines of log entries per second, and they keep trying to divine the causes of exceptions by just looking at post-mortem stack traces.

Quite often these people do not understand what usefulness others find in debuggers. I once asked the lead developer at a certain shop (Powernet, Athens, Greece, circa 2000) to enable debugging for me on their development web server so that I could run my debugger against the web site I was developing there, and she asked me "what do you need a debugger for?" Luckily, she proceeded to fulfil my request after a couple of long seconds of me staring blankly at her.

Listen folks, if you want to be called a "programmer" and if you want to be worth the cost of the keyboard you are pounding on, the debugger needs to be your absolute first tool of choice at the slightest need for troubleshooting, not your last tool of choice, not even your second tool of choice. Companies that develop IDEs go through huge pains to provide us with nice sleek and powerful debuggers so that we can do our job better, don't you dare let their efforts go to waste.

A call stack trace in the diagnostic log of your program will tell you which function was called by which function, and that's all.  This is enough in many simple cases, but when things get just slightly complicated, (and they usually do,) it is not enough.  Lacking any additional information, you end up theorizing about what might have happened instead of looking and seeing what has happened.

In addition to which function was called by which function, the debugger will also show you the values of the parameters to each call, and the values of the local variables within each call. For any variable which is an object, the debugger will show you the contents of that object, so in essence you have access to the state of the entire machine at the moment that the exception was thrown. When you have all that information at your disposal, then you can say that you are solving a problem. Anything less than that, and what you are actually doing is monkeying with the problem.

Similarly, in my career I have noticed lots of people who, when they want to perform a test run of the program that they are developing, always hit the "Run" button of their IDE instead of the "Debug" button.  Listen folks, if you want to be called a "programmer" then the button you should be pressing is the "Debug" button. You should forget that the "Run" button exists. "Run" is for users.  Programmers use "Debug".  Always "Debug".  Only "Debug".

Starting your program in debug mode does not mean that you are necessarily going to be doing any debugging; it just means that if some issue pops up during the test run, then you will be able to debug your program on the spot, rather than having to rely on the diagnostic log, or having to re-run the program in debug mode hoping that the issue will be reproducible.  Modern development environments even support program modification while debugging, (they call it "edit-and-continue" in the Microsoft world, "hot swap" in the Java world,) so if a small problem pops up you might even be able to fix it on the fly and continue running.

If you don't want to take my word for it, you could take a hint from the key bindings of your IDE.  In Visual Studio the description of the F5 key is "Run the application" while the description of Ctrl+F5 is "Run the code without invoking the debugger (Start without Debugging)".  As you can see, "Run the application" in Microsoft parlance means "Debug the application", and the key combination for debugging is the simple one, while the key combination for running without debugging is the more complicated one. Similarly, in Eclipse, F11 is "Debug", Ctrl+F11 is "Run". In PyDev, the same. Obviously, the creators of these IDEs expect you to be running your program with debugging far more often than without.  For me, it is 99.99% of the time with debugging, 0.01% without.

Some people complain that application start-up is slower with debugging than without; I have not noticed such a thing, but what I have noticed is that sometimes one might be making use of some exotic feature of the debugger without realizing it, (for example, having left a so-called "function breakpoint" or "memory changed breakpoint" active,) and that is what slows things down. Just make sure you do not have any fancy debugger features enabled unless you actually need them, and running your program with debugging should be about as fast as running it without debugging.


2014-06-04

Pronouncing the name of your web server

A memo to developers all over the world whose native language is not English:
[Photo: sign of the Apache Web Server]
Folks, just so that you know, the world famous Apache Software Foundation which lends its name to its world famous Apache Web Server is not pronounced uh-pach;  it is pronounced uh-pach-ee.  The final letter is not a silent "e", it is a loudly and clearly pronounced "e".

There exist two words in English which are spelled "Apache";  one is of French origin, and according to dictionary.com it means "a Parisian gangster, rowdy, or ruffian".  This one does end in a silent "e", but it is not the one that the Apache Software Foundation was named after.  The other word is of Mexican-Spanish origin, it means "a member of an Athabaskan people of the southwestern U.S.", it ends in a definitely non-silent "e", and it is the word you are looking for.

Head over to dictionary.com to check out these two words and click on the little speaker icons to hear their pronunciation: http://dictionary.reference.com/browse/apache

Also, in the Wikipedia article about the Apache Software Foundation (http://en.wikipedia.org/wiki/Apache_Software_Foundation) we read:
The name 'Apache' was chosen from respect for the Native American Apache Nation, well known for their superior skills in warfare strategy and their inexhaustible endurance. It also makes a pun on "a patchy web server"—a server made from a series of patches—but this was not its origin.
And as a side note to fellow USAians: The same applies to the world famous Porsche brand of cars: the final "e" is not silent.  Please quit saying porsh; it is por-sheh.  See: http://youtu.be/4OuPY-1snyw

2014-05-14

Picture of Earth from Orbit in Cosmos S01E07

Nowadays the interwebz abounds with beautiful images of our Earth from orbit. Lately I have picked up the habit of trying to figure out what part of our world is visible when I see such an image. It is usually quite a puzzle, since the scale of the picture is not always obvious, parts of it are always obscured by clouds, the North can really be anywhere, and worst of all, countries are not painted with different colours! (Duh!) I am usually successful in this, but today I had a real tough one.

A couple of seconds into Cosmos: S01E07, there is a picture of Earth from orbit. Click on the picture below and see if you can identify the visible land before reading further down.

[Picture of Earth from orbit, from Cosmos S01E07]
You might think that it is really obvious, but then try to verify your hypothesis by comparing the picture above against Google Earth, and whoops, you will see that you were wrong.

So, what's going on?

2014-05-12

By using this site, you agree to the use of cookies.

This site uses cookies to help deliver services. By using this site, you agree to the use of cookies. [Learn more] [Got it]
The EU legislators responsible for these messages should be removed from office and prohibited from ever holding any job other than milking goats.

I mean, really, how about this:
This site uses the Helvetica font to help deliver services. By using this site, you agree to the use of Helvetica. [Learn more] [Got it]
Or this:
This site uses TCP/IP to help deliver services. By using this site, you agree to the use of TCP/IP. [Learn more] [Got it]

And the list goes on...

2014-04-23

Stackoverflow.com question deleted within 2 minutes.

This question was sighted on stackoverflow.com on Thursday, April 30, 2013.  It was deleted within 2 minutes of being posted, but not before I managed to take a screenshot of the summary.

It is funny when you can tell what's wrong with the code by just looking at the summary!