2014-07-18

Benchmarking code written in Java or C# (or any GCed, JITted, VM-based language)

Sometimes we need to measure the time it takes for various pieces of code to execute in order to determine whether a certain construct takes significantly less time to execute than another. It sounds like a pretty simple task, but anyone who has ever attempted to do it knows that simplistic approaches are highly inaccurate, and achieving any accuracy at all is not trivial.

Back in the days of C and MS-DOS things were pretty straightforward: you would read the value of the system clock, run your code, read the value of the clock again, subtract the two, and that was how much time it took to run your code. The rather coarse resolution of the system clock would skew things a bit, so one trick you would at the very least employ was to loop waiting for the value of the system clock to change, then start running your code, and stop running at another transition of the value of the system clock. Another popular hack was to run benchmarks with interrupts disabled. Yes, back in those days the entire machine was yours, so you could actually do such a thing.

Nowadays, things are far more complicated. For one thing, the entire machine tends to never be yours, so you cannot disable interrupts. Other threads will pre-empt your thread, and there is nothing you can do about it, you just have to accept some inaccuracy from it. Luckily, with modern multi-core CPUs this is not so much an issue as it used to be, but in modern VM-based languages like Java and C# we have additional and far more severe inaccuracies introduced by the garbage collection and the jitting. Luckily, their impact can be reduced.

In order to avoid inaccuracies due to jitting, we always perform one run of the code under measurement before the measurements begin. This gives the JIT compiler a chance to do its job, so it will not be getting in the way later, during the actual benchmark.



In order to avoid inaccuracies due to garbage collection, we always perform one full garbage collection before starting the benchmark, and we try to keep the benchmark short, so as to reduce the chances of another garbage collection happening before it completes. The garbage collection APIs of most VMs tend to be somewhat snobbish, and they do not really guarantee that a full garbage collection will actually take place when requested, so we need an additional trick: we allocate an object keeping only a weak reference to it, then we keep calling the VM to garbage collect and run finalizers until that object disappears. This still does not guarantee that a full garbage collection has taken place, but it gives us the closest we can have to a guarantee by using only conventional means.

So, here is the class that I use for benchmarking, employing all of the above tricks:

package saganaki;

import java.lang.ref.WeakReference;

/**
 * Measures the time it takes to run a piece of code.
 *
 * @author Michael Belivanakis (michael.gr)
 */
public class Benchmark
{
    private static final long NANOSECONDS_PER_MILLISECOND = 1000_000L;
    private final long durationInMilliseconds;

    /**
     * Initializes a new instance of {@link Benchmark}.
     *
     * @param durationInMilliseconds for how long to run the benchmark.
     */
    public Benchmark( long durationInMilliseconds )
    {
        this.durationInMilliseconds = durationInMilliseconds;
    }

    /**
     * Runs the benchmark, printing the results to {@link System#out}
     *
     * @param prefix   text to print before the results.
     * @param runnable the code to benchmark.
     */
    public void runAndPrint( String prefix, Runnable runnable )
    {
        double iterationsPerMillisecond = run( runnable );
        iterationsPerMillisecond = roundToSignificantFigures( iterationsPerMillisecond, 6 );
        System.out.println( prefix + " " + iterationsPerMillisecond + " iterations per millisecond" );
    }

    /**
     * Runs the benchmark
     *
     * @param runnable the code to benchmark.
     *
     * @return number of iterations per millisecond.
     */
    public double run( Runnable runnable )
    {
        //run the benchmarked code once, so that it gets JITted
        runnable.run();

        //perform a full garbage collection to bring the VM to an as clean as possible state
        runGarbageCollection();

        //wait for a system clock transition
        long currentNanos = System.nanoTime();
        long startNanos = currentNanos;
        while( currentNanos == startNanos )
            currentNanos = System.nanoTime();
        startNanos = currentNanos;

        //run the benchmarked code for the given number of milliseconds
        long endNanos = startNanos + (durationInMilliseconds * NANOSECONDS_PER_MILLISECOND);
        long iterations;
        for( iterations = 0; currentNanos < endNanos; iterations++ )
        {
            runnable.run();
            currentNanos = System.nanoTime();
        }

        //calculate and return number of iterations per millisecond.
        return iterations / ((double)(currentNanos - startNanos) / NANOSECONDS_PER_MILLISECOND);
    }

    /**
     * Runs a full garbage collection.
     *
     * See Stack Overflow: Forcing Garbage Collection in Java?
     */
    private static void runGarbageCollection()
    {
        WeakReference<Object> ref = new WeakReference<>( new Object() );
        for(; ; )
        {
            System.gc();
            Runtime.getRuntime().runFinalization();
            if( ref.get() == null )
                break;
            Thread.yield();
        }
    }

    /**
     * Rounds a number to a given number of significant digits.
     *
     * See Stack Overflow: rounding to an arbitrary number of significant digits
     *
     * @param number the number to round
     * @param digits the number of significant digits to round to.
     *
     * @return the number rounded to the given number of significant digits.
     */
    private static double roundToSignificantFigures( double number, int digits )
    {
        if( number == 0 )
            return 0;
        @SuppressWarnings( "NonReproducibleMathCall" )
        final double d = Math.ceil( Math.log10( number < 0 ? -number : number ) );
        @SuppressWarnings( "NumericCastThatLosesPrecision" )
        final int power = digits - (int)d;
        @SuppressWarnings( "NonReproducibleMathCall" )
        final double magnitude = Math.pow( 10, power );
        final long shifted = Math.round( number * magnitude );
        return shifted / magnitude;
    }
}
For an application of the above class, see my next post: michael.gr: Benchmarking Java 8 lambdas

No comments:

Post a Comment