Is F# faster than C# - Muhammad Rizwan

Description

I had a bet with a co-worker that C# would outperform F# for a simple counting exercise. This turned out to be quite a surprise for many reasons. In this blog post I will try to explain why.

Problem

Sum the numbers from 1 to n using brute force (can't use the (1 + N)*(N/2) equation). Do this in both C# and F# and compare the speed. Since F# needs to do this recursively I thought this should be a no contest win for C#.

Using F#

We initially wrote the code without using tail recursion which immediately ended with a stack overflow. So we went back and wrote it doing it the functional way, with tail recursion. Here is what we came up with.

<span 
class="codeKeyword">module</span> FSharpLib

 <span class="codeKeyword">let</span> sumNum fromNum toNum =
  <span class="codeKeyword">let</span> <span 
class="codeKeyword">rec</span> sumNumTail fromNum toNum result =
   <span class="codeKeyword">if</span> fromNum &gt; 
toNum then 
    result 
   <span class="codeKeyword">else</span> 
    sumNumTail (fromNum+1L) toNum (result+fromNum)
  sumNumTail fromNum toNum 0L

Using C# - Attempt 1

You'll see later why I say attempt 1. But, this is the simplest C# code I could think of.

<span 
class="codeKeyword">public</span> <span 
class="codeKeyword">static</span> <span 
class="codeKeyword">long</span> CSharpForLoop(<span 
class="codeKeyword">long</span> fromNumber, <span 
class="codeKeyword">long</span> toNumber) {
 <span class="codeKeyword">long</span> result = 0;
 <span class="codeKeyword">for</span> (<span 
class="codeKeyword">long</span> i = fromNumber; i &lt;= 
toNumber; i++) {
  result += i;
 }
 <span class="codeKeyword">return</span> result;
}

The Results - Attempt 1

This is the results of running the above code using a fromNumber of 1 and a toNumber of 10000000 and running it 10 times alternating between C# and F# to filter out start-up times, etc.

F#	0.8606902s	11% faster
C#	0.9556198s

I ran it like 20 times because I couldn't believe it. I looked at the C# code for any possible optimizations I could add. But, being so simple there wasn't a lot I could do.

Reflector is an amazing tool

I turned to my good friend Reflector to see what was going on. Here is what the F# code looked like when viewing it as C# code.

<span 
class="codeKeyword">public</span> <span 
class="codeKeyword">static</span> <span 
class="codeKeyword">long</span> sumNum(<span 
class="codeKeyword">long</span> fromNum, <span 
class="codeKeyword">long</span> toNum)
{
    FastFunc&lt;<span class="codeKeyword">long</span>, 
FastFunc&lt;<span class="codeKeyword">long</span>, 
FastFunc&lt;<span class="codeKeyword">long</span>, 
<span 
class="codeKeyword">long</span>&gt;&gt;&gt; 
sumNumTail = <span class="codeKeyword">new</span> clo@7();
    <span class="codeKeyword">return</span> 
FastFunc&lt;<span class="codeKeyword">long</span>, 
<span 
class="codeKeyword">long</span>&gt;.InvokeFast3&lt;<span
 class="codeKeyword">long</span>, <span 
class="codeKeyword">long</span>&gt;(sumNumTail, fromNum, 
toNum, 0L);
}

and clo@7.Invoke looks like

<span 
class="codeKeyword">public</span> <span 
class="codeKeyword">override</span> <span 
class="codeKeyword">long</span> Invoke(<span 
class="codeKeyword">long</span> fromNum, <span 
class="codeKeyword">long</span> toNum, <span 
class="codeKeyword">long</span> result)
{
 <span class="codeKeyword">while</span> (fromNum &lt;= 
toNum)
 {
  result += fromNum;
  toNum = toNum;
  fromNum += 1L;
 }
 <span class="codeKeyword">return</span> result;
}

They used a while loop instead of a for loop. OK, I can do that. But, what happened to tail, I thought that tail is why the CLR is so good at functional language implementations.

C# - Attempt 2

<span 
class="codeKeyword">public</span> <span 
class="codeKeyword">static</span> <span 
class="codeKeyword">long</span> CSharpWhileLoop(<span 
class="codeKeyword">long</span> fromNumber, <span 
class="codeKeyword">long</span> toNumber) {
 <span class="codeKeyword">long</span> result = 0;
 <span class="codeKeyword">while</span> (fromNumber 
&lt;= toNumber)
 {
  result += fromNumber;
  <span class="codeComment">// I left out the toNum = toNum 
because it didn't do anything.</span>
  fromNumber += 1L;
 }
 <span class="codeKeyword">return</span> result;
}

and the results (I've got you now F#)...

F#	0.8606902s
C# For Loop	0.9556198s
C# While Loop	0.9105592s

Oh wait... DAMN! What could F# be doing to make it almost 10% faster than C#.

To the IL code Batman

This is the IL for the F# code.

 L_0000: nop              <span class="codeComment">// while loop</span>
 L_0001: nop 
 L_0002: ldarg.1          <span class="codeComment">// fromNum &lt;= toNum</span>
 L_0003: ldarg.2 
 L_0004: ble.s L_0009     
 L_0006: nop              <span class="codeComment">// return result</span>
 L_0007: ldarg.3 
 L_0008: ret              
 L_0009: nop              <span class="codeComment">// result += fromNum</span>
 L_000a: ldarg.1 
 L_000b: ldc.i8 1
 L_0014: add              
 L_0015: ldarg.2          <span class="codeComment">// fromNum += 1L</span>
 L_0016: ldarg.3 
 L_0017: ldarg.1 
 L_0018: add              
 L_0019: starg.s result   <span class="codeComment">// loop</span>
 L_001b: starg.s toNum
 L_001d: starg.s fromNum
 L_001f: br.s L_0000

This is the IL for the C# while loop.

 L_0000: nop                   <span class="codeComment">// long result = 0L</span>
 L_0001: ldc.i4.0
 L_0002: conv.i8 
 L_0003: stloc.0               
 L_0004: br.s L_0012           <span class="codeComment">// while loop</span>
 L_0006: nop                   <span class="codeComment">// result += fromNumber</span>
 L_0007: ldloc.0 
 L_0008: ldarg.0 
 L_0009: add                   
 L_000a: stloc.0               <span class="codeComment">// fromNumber += 1L</span>
 L_000b: ldarg.0 
 L_000c: ldc.i4.1 
 L_000d: conv.i8 
 L_000e: add                   
 L_000f: starg.s fromNumber    
 L_0011: nop 
 L_0012: ldarg.0               <span class="codeComment">// fromNumber &lt;= toNumber</span>
 L_0013: ldarg.1 
 L_0014: cgt 
 L_0016: ldc.i4.0 
 L_0017: ceq 
 L_0019: stloc.2 
 L_001a: ldloc.2 
 L_001b: brtrue.s L_0006       <span class="codeComment">// loop</span>
 L_001d: ldloc.0 
 L_001e: stloc.1 
 L_001f: br.s L_0021
 L_0021: ldloc.1 
 L_0022: ret

What just happened.

Wow, the C# compiler is quite a bit more verbose. First of all, C# makes the while loop into a do/while loop. Secondly, in the loop condition C# creates a temporary variable to store the comparison and then checks for 'true' ("brtrue.s"). If you look at the F# IL you'll see the use of the "ble.s" instructions (branch if less or equal) instead. Finally we see that C# uses a few "conv.i8" instructions (convert to 8 byte/64-bit int) which I can only guess adds to it's defeat.

Conclusion

So is F# just a more performant language or is C# just compiling to more robust IL code which as F# matures will end up suffering the same fate? I personally think (hope actually) the later is true. I'm not an expert in compilers but I would have to assume the C# compiler team is using every trick in the book to make highly performant IL code without sacrificing robustness and I can only assume that F# will get to a point where it needs to make the same concessions to handle some obscure cases.