The Zero Cost Abstraction of Iteration

I’ve been hooked on C++ for about 10 years now. Since CMPUT101, I’ve been attracted to the syntax of the language, the simplicity of use, and later the driving ideals behind the language. C++’s author Bjarne Stroustrup, stands alone when it comes to preaching language ideals and pushing the language forward all while respecting these ideals. This is no easy task, especially when the language is designed by a committee. I’ve always been convinced one of the key reasons C++ is such a large success, and has remained so timeless; is the fundamental guiding principles behind the language. The principle that I believe reigns supreme is the ‘Zero Cost Abstraction’.

What is the ‘Zero Cost Abstraction’? Also called a ‘Zero Overhead Abstraction’. Well — it’s an abstraction, that costs nothing. Not like monetary currency; or a favour. Instead resource or runtime cost. Bjarne’s definition is “an abstraction that imposes no space or time overhead in excess of  what would be imposed by careful hand coding of a particular example of the abstraction.” So in other words, an abstraction that doesn’t use any more resources than if you carefully crafted the assembly yourself.

Well then, what is an ‘abstraction’? Well an abstraction is a concept that you can work with, that ignores the details of operation. A good metaphor for this is a steering wheel on a car. You turn the wheel left and right, to turn left and right, but you’re not concerned with the details of actually moving the wheels of the car on the road. In fact, you don’t want to be concerned with those details at the level of driving.  You want to be more focused on the bigger task at hand — driving the car.

So abstractions in computing allow us to work with concepts, craft algorithms and structure code, so that we don’t have to worry about the details. Programming languages themselves are abstractions, they allow us to work with logical operations, classes, English words, and common symbols to create instructions that cause computer hardware to do things. Languages like C and C++, are considered ‘close to the metal’, this means that they only thinly abstract the assembly that is run on the processor, only thinly abstract the management of memory (either directly to the MMU or through the OS), and only thinly abstract the other pieces of hardware. Languages like C# and JAVA (shudder), are higher level languages, meaning they have many layers between the code and the actual instructions that execute on the CPU. These higher level abstractions give way for more flexibility, but often have a cost. In the case of JAVA, the flexibility of portability, at the cost of run-time and space overhead. Garbage collection adds the flexibility of not having to worry about memory management, at the cost of run-time, and space overhead. Not to mention the cost to developers, as the become brainless in the areas of memory lifetime, and ownership rules.

[Coming Soon – a post on Virtual Machines]

So what does any of this have to do with the ‘Zero Overhead Abstraction’ principle? Well, obviously I have a story to go along with it. So some time ago I was working on my library [ASPeKT] and I was soliciting the help of the internet for advice on how to work better with the IL I was crafting. The comments I got, were, well, unexpected to say the least. The majority of the criticism I received was my use of direct for loops for iteration. As opposed to the cooler C# foreach loop or even better LINQ-to-everything!! Now, I can take critiquing of my code, after all that’s how you learn. So I pondered this for quite some time. I reviewed some of the pull requests which had transformed some of my boring for-loops, into beautifully crafted LINQ queries. I wasn’t buying it. I just couldn’t. It hid some of the intent of what I was trying to do, and call me old school, but I found it harder to read.

Enter foreach, this one I liked. C++11 introduced new semantics for it and I liked the clarity that it gave. When you’re iterating a collection using an indexer, the first thing you do is get the item at the index. The foreach, just abstracts this and your boundary checking in a nice syntax package. Color me smitten. Now, I didn’t actually refactor much in my library to use foreach. Honestly, I like it but I wrote the code already and it worked. However I started using foreach, pretty much everywhere I could, just like I did in C++ now. I know that C# probably had this before, and I obviously adopted it backwards, but I do a lot of shit backwards. So sue me. 

Fast forward a couple of months, I’m reading “Writing High-Performance .NET Code” by Ben Watson. He’s got a little section on loops for iteration. The section I think was on comparing C++ to C# for performance. When we’re talking speed of execution, less instructions is more. The less instructions you have to run, the faster your program will be. The fastest program you can write, is one that does nothing.

There was a snip on comparing a loop in C# to a loop in C++, and the generated ASM instructions. He was making the statement that C# doesn’t ALWAYS have much overhead over a language like C++. He showed how old skool for loop iteration, was virtually the same instructions for C++ and C#. But the foreach, it was something else. I was floored, I had never really thought about it, but at that moment I did. What does a foreach in C# actually do?

His example was something like this

[C++]
int sum2()
{
    int range[4] = { 1, 2 , 3 , 4};
    int sum=0;
    for(auto i=range; i<range+4; ++i)
    {
        sum+=*i;
    }
    return sum;
}

[ASM]
//-O2 Disabled so it wouldn't optimize the loop out
sum2(): # @sum2()
  push rbp
  mov rbp, rsp
  lea rax, [rbp - 16]
  mov rcx, qword ptr [.L_ZZ4sumvE5range]
  mov qword ptr [rbp - 16], rcx
  mov rcx, qword ptr [.L_ZZ4sumvE5range+8]
  mov qword ptr [rbp - 8], rcx
  mov dword ptr [rbp - 20], 0
  mov qword ptr [rbp - 32], rax
.LBB1_1: # =>This Inner Loop Header: Depth=1
  lea rax, [rbp - 16]
  mov rcx, qword ptr [rbp - 32]
  add rax, 16
  cmp rcx, rax
  jae .LBB1_4
  mov rax, qword ptr [rbp - 32]
  mov ecx, dword ptr [rax]
  add ecx, dword ptr [rbp - 20]
  mov dword ptr [rbp - 20], ecx
  mov rax, qword ptr [rbp - 32]
  add rax, 4
  mov qword ptr [rbp - 32], rax
  jmp .LBB1_1
.LBB1_4:
  mov eax, dword ptr [rbp - 20]
  pop rbp
  ret
.L_ZZ3sumvE5range:
  .long 1 # 0x1
  .long 2 # 0x2
  .long 3 # 0x3
  .long 4 # 0x4
[C#]
static class Program
{
     static int sum()
    {
          int[] range = new int[]{ 1, 2 , 3 , 4};
          int sum=0;
          for(var i=0; i<4; ++i)
          {
                 sum+=range[i];
           }
          return sum;
   }

}

[JIT ASM]
Program.sum()
    L0000: push ebp
    L0001: mov ebp, esp
    L0003: push edi
    L0004: push esi
    L0005: mov ecx, 0x717168e2
    L000a: mov edx, 0x4
    L000f: call 0x58f3200
    L0014: lea edi, [eax+0x8]
    L0017: mov esi, 0x251c084c
    L001c: movq xmm0, [esi]
    L0020: movq [edi], xmm0
    L0024: movq xmm0, [esi+0x8]
    L0029: movq [edi+0x8], xmm0
    L002e: xor esi, esi
    L0030: xor edx, edx
    L0032: mov ecx, [eax+0x4]
    L0035: cmp edx, ecx
    L0037: jae L0049
    L0039: add esi, [eax+edx*4+0x8]
    L003d: inc edx
    L003e: cmp edx, 0x4
    L0041: jl L0035
    L0043: mov eax, esi
    L0045: pop esi
    L0046: pop edi
    L0047: pop ebp
    L0048: ret
    L0049: call 0x73a52b10
    L004e: int3

As you can see 25 instructions in C++ vs.  30 instructions for C#. So your standard for loop in C++ and C# are running essentially the same instructions.

The foreach was a different story — he wouldn’t even show the instructions. He only showed the generated IL, and let me tell you, it was a lot.

In my own research, the ASM generator that I used, generated 150+ instructions, which didn’t seem like a lot. However, I did notice is that it didn’t generate ASM for the actually calls to MoveNext(), it just used the syntax ‘call’ MoveNext(). Which likely has a lot more instructions under the hood.

Let’s compare that to the pure C++, range foreach.

[C++]
int sum()
{
    int range[4] = { 1, 2 , 3 , 4};
    int sum=0;
    for(auto i : range)
    {
        sum+=i;
    }
    return sum;
}

[ASM]
sum(): # @sum()
  push rbp
  mov rbp, rsp
  mov rax, qword ptr [.L_ZZ3sumvE5range]
  mov qword ptr [rbp - 16], rax
  mov rax, qword ptr [.L_ZZ3sumvE5range+8]
  mov qword ptr [rbp - 8], rax
  mov dword ptr [rbp - 20], 0
  lea rax, [rbp - 16]
  mov qword ptr [rbp - 32], rax
  mov rax, qword ptr [rbp - 32]
  mov qword ptr [rbp - 40], rax
  mov rax, qword ptr [rbp - 32]
  add rax, 16
  mov qword ptr [rbp - 48], rax
.LBB0_1: # =>This Inner Loop Header: Depth=1
  mov rax, qword ptr [rbp - 40]
  cmp rax, qword ptr [rbp - 48]
  je .LBB0_4
  mov rax, qword ptr [rbp - 40]
  mov ecx, dword ptr [rax]
  mov dword ptr [rbp - 52], ecx
  mov ecx, dword ptr [rbp - 52]
  add ecx, dword ptr [rbp - 20]
  mov dword ptr [rbp - 20], ecx
  mov rax, qword ptr [rbp - 40]
  add rax, 4
  mov qword ptr [rbp - 40], rax
  jmp .LBB0_1
.LBB0_4:
  mov eax, dword ptr [rbp - 20]
  pop rbp
  ret

Hmmm…. 30 instructions. So as you can see, this is an illustration of a ‘Zero Cost Abstraction’. The code has been shrunk by quite a few characters. It becomes easier to read and reads with more intent, yet it doesn’t carry any overhead over the original loop.

So, what’s the development cost of writing your own foreach enumerable collection in C#. Implementing the IEnumerable concept.

Well — first you need to implement the IEnumerable interface, which returns an IEnumerator. The IEnumerator class is really the meat and potatoes. It contains your iteration state and a reference to your collection. You need to implement the MoveNext() call to increment the enumerator, which returns true or false based on whether it incremented. Then you need to implement Reset(), which resets the enumerator. You need to implement the ‘Current’ field to return the current object.

Vs.

What’s the development cost of making something enumerable in C++. Implementing the iterator concept.

You need to implement the iterator concept. Iterator concepts are described here, the minimum you need is the ForwardIterator. Which you need to be able to read and increment with multiple passes. The iterator concept models the pointer, so you need to implement operator++() and operator*(), and operator==(). Like the IEnumerator the iterator is the brains of iterator (fancy that), and contains some reference to the collection as well as iteration state.

In order for your collection to work with C++ foreach, you need to implement a begin() and end() method, which return an iterator.

begin() => returns an iterator to the first object
end() => returns an iterator of one past the last object

It’s as easy as that.

The moral of this story, just because an abstraction makes the code cleaner, doesn’t mean you’re not paying for it, someway. Of course, the flip-side is, if there is careful thought put into the abstraction — you can eat your cake and have it too.

“Small leaks sink big ships” – Benjamin Franklin

Happy Coding!

PL

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s