Template<T> vs. Generic<T>

The other day I was discussing the differences between C++ templates and C# generics. In my opinion, C++ templates reigns supreme. Not because I’m one of those guys with a “My compiler compiles your compiler.” shirt. But because in general abstractions in C++ tend to ‘cost’ lest than the same abstractions in C#. For instance, the foreach exists in both languages, but the iterator concept is far lighter weight then the IEnumerable concept. Both of these concepts make use of generics to be able to support iterating a collection of “things”.  Having one single ‘concept’, that works across a multitude of different containers, allowing for traversal. The iterator concept is a bit more in-depth, in that it allows for more than forward traversal unlike IEnumerable.

The discussion came down to “why?”, why I thought that the template system in C++ was better than that of C# generics. Why is right. At the time, my statement was rather off the cuff, and somewhat tongue-in-cheek. Granted I had my reasons, but not backed by research. Only backed by my overview knowledge of the operations of the two systems. So now that I’ve had time to mull it over, and solidify my arguments. I’ll explain my point-of-view, grounded in my initial thoughts.

Let me ask you something? When was the last time someone wrote Tetris “in” C# generics? No, I don’t mean “with” C# generics, I mean “in” C# generics.

Spoiler: the answer is never.

Why? Let’s explore reason numero uno, which admittedly wasn’t the first reason to my head at the time. The C++ template system is Turing complete. That’s right, it’s Turing complete. Wait what does that even mean? Well, the definition for Turing complete is along the lines of  “A system of data-manipulation, is said to be Turing complete, when it can simulate a Turing machine.” Thank you Wikipedia for that ever so enlightening statement. Basically, what this means is that the meta-language built into the C++ template system, is actually a full blown programming language. One where if you’re brave enough, or crazy enough, you can even implement Tetris. Remember, that the C++ template engine runs at compile time, so you guessed it — it’s compile time Tetris.

You don’t need to use the template system only for silly things like Tetris. You can also use it for actually useful things as well. Things like specific implementations of your generic algorithms based on traits about a type. Or how about even finding traits about a type?  You can use the template system, to choose different implementations based on the type you’re using, or rather traits about the group of types you’re using. As well as do computations at compile time.

template <int N>
struct sum
{
    static const int value = N + sum<N-1>::value;
};

template<>
struct sum<0>
{
    static const int value = 0;
};

int main()
{
    std::cout << sum<5>::value << "!\n";
}

Figure 1 – A simple example of a summation recursive template meta-program

All of this for “free” at compile time. When I say free, I mean run-time free. The C++ template engine does its work at compile time, and will make heavy use of the type system. Don’t get me wrong, it’s not all roses. If you make heavy heavy heavy use of the template system, you can have really long compile times and even bloat your binaries. I’ve seen code generation times upwards of 30 minutes. The code actually only took about 5 or so minutes to compile, but the code generation was ~30 minutes. Imagine writing all those classes, TGFCPPT (Thank God for CPP Templates).  So at this point, C++ templates have delivered a knock-out punch to C# generics. KO! Generics down for the count. However, I am still going to proceed with examining my initial reasoning, because I like to “beat dead horses” so to speak. I think it can be said that generics are like the template systems little brother. They look similar, but one is about half as useful and twice as fat.

So the second reason, and the first that came to my head (not surprisingly) was overhead. The C++ template system doesn’t have run-time costs, because it’s a compile time operation. Whereas the generic system must have some run-time overhead, just due to its nature.  I wouldn’t say I was stabbing in the dark, but I definitely didn’t have research to back up my claim. I had read about the generics system in a few blogs, I understood how it worked, sort of. Anyways, the claim was contended. Fair enough, I didn’t have the research in my back pocket to prove it. For most, it would’ve ended there. However, on my never ending quest to remove misinformation and search down the truth, I couldn’t let it go. So really, what is the cost of the C# generics vs. the C++ template engine?

First, lets look at how the C++ template engine actually works. To be honest, I couldn’t find all that much real information on how C++ templates work. So I will have to share my understanding, which admittedly is naive, but get’s the point across . When you have some code like the following.

template <typename T>
void print(T value)
{
    std::cout << value;
}

print(5);
print("Hello World");

The compiler will for all intents and purposes, generate two functions for you.

void print(int value)
{
    std::cout << value;
}

void print(const char *value)
{
   std::cout << value;
}

Now, there’s little value in this example. But as you can see, the compiler is basically saving you the keyboard time. The compiler will only generate those functions that are needed, so it doesn’t just go out and create functions for every type. Only the ones that need to be created. Given this happens compile time, by the time you’re running those template definitions are a thing of the past, and you’re running the code that was generated by the compiler. These functions each have their own unique addresses, and their own ability to be optimized.

Now, compare this to how C# generics work. Consider List<T>, you could have instantiation for List<int>, List<string>, even List<Object>. How does this work?

Well, for a minute we can brainstorm a couple of different ways this could be achieved.

  • We could do something similar to the C++ template engine, and for every unique instantiation we could generate IL. Meaning that List<int>, List<string>, and List<object> would each get their own set of instructions.  This could be referred to as code specialization.

Let’s be honest with ourselves, this isn’t a good approach here. It would lead to HUGE binaries, and would be hugely inefficient. Given that C# objects are references, the code would be duplicated for a large portion of instantiations.

  •  We could ignore the type of the target. Effectively erasing it from our implementation. I don’t care if it looks like a duck, or quacks like a duck, it’s a duck. Then we only need to maintain ONE version of the code, and at run-time, we just ignore the type so the List<int>, List<string>, and List<Object> all use the same instructions. They’re just ignorant to the type.

This has issues too, because since we’ve erased the type, we’ve lost that information about the objects. Then we need to do extra work, to make sure that List<Object> != List<string>. As a note, this is the approach JAVA takes to its generics. The benefit to this approach, is that we only have one master set of IL for the implementation. There’s not duplication, like the approach above. It also means, that you need to treat POD types, that are stack based, as heap values.

  • We could use one shared implementation like JAVA, but at the same time use run-time techniques to manage our types. So this way we’re respecting the types of the objects. So List<int>, List<string>, and List<Object> all would use the same code, except there would be some overhead to store the information about the types.

This last approach is the approach that C# takes for their generics. Now, this isn’t exactly 100% true, because I believe that C# generics use a mixture of code specialization, as well as code sharing. Because when we talk about “objects” vs. “primitives”, in C# one is heap allocated and referenced, and the other lives on the stack. Therefore, if you treated something like an int the same as a string, you would need to do what is called “boxing” and allocate that int on the heap. Which is mega overhead and not cheap. So there is a combination of code specialization and code sharing going on.

One of the goals of the design of generics, was that it should be as efficient as code that is hand specialized. Using these two techniques, the implementation is equally as efficient as hand specialized code, and much much more efficient that type erasure. This is due to the boxing thing I mentioned earlier. That’s a lofty goal, and I’ll say an admirable one, taking from the spirit of C++. However! There are definitely other costs that are paid, the cost of memory overhead for storing the vtables, and other RTTI data. Making generics, not exactly free. Clever techniques for loading this information saves execution time, but we still have to pay for the memory used. Contrary to popular belief, memory isn’t always free. Although, at the C# level, it can pretty much be assumed to be. Like C++ code generation, C# generics can also basically explode when being evaluated. In one blog that I read, the author mentions that a single instance of List<T> will explode out into 28 different types, where 5 unique List types could cost upwards of 70K in the working set.

It’s not to say that C# generics don’t have some neat features. Specifically type constraints, and the ability to constrain types to interfaces. This is something that can be done with C++ templates (slightly easier post C++11), but the reality is that it’s not for the faint of heart. The second upside of this, is that the compiler errors you get when you violate a type constraint on a generic, they’re readable by a human. One of the downsides of C++ template usage and definitely of template meta-programming, is that the compile errors are pretty much non-coherent for anyone who isn’t a compiler writer or demigod.

In conclusion, I still think that C++ templates are far superior to C# generics. That said, they both have their benefits and costs. It is like anything, you need to understand the cost of something before you use it, and you should always pick the right tool for the job.

“Price is what you pay. Value is what you get.” — Warren Buffet

As always — keep calm and code on.

Happy Coding!

 

References:

[1] On generics and (some of) the associated overheads 

[2] Design and Implementation of Generics for the .NET Common Language Runtime

[3] Turing Completeness – Wikipedia

 

One thought on “Template<T> vs. Generic<T>

  1. IIRC, C# tries to specialise for interfaces and share from there, at least for constrained generics. This means that at least part of the work will be done at compile time, although not as much as in C++. I can’t remember if the same applies when there isn’t a constraint, though.

    As for template errors and readability, we can at least solve some of the issues with static_assert, although that still doesn’t fix most of ’em. xD

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s