Don’t try this at work – Vector Pt. 3

Welcome back, everyone, for, at long last, the final post in this vector series. It’s been a long haul, and I appreciate everyone who has come along for the journey.

In light of the happenings recently in the United States of America, I’d just like everyone to take a moment of silence for the victims of prejudice, hate, and discrimination. Humans are social creatures, and we all need to realize we are here not in spite of each other, but because of each other. It is our duty, no matter your role in society, your job, your socioeconomic standing, to treat all other individuals with the same standard of decency, respect, and humility. We all face challenges, some more than others, and as humans we must empathize with our fellow human, regardless of ethnicity, gender, or sexual preference. It is our unique ability as humans to be self-aware; we should use this power for good, and not evil. I challenge you to pause for a moment and imagine what it would be like to live in someone else’s shoes, someone who has to deal daily with prejudice and segregation, if only for a moment. I have been blessed with many opportunities in my life, and am thankful every day for that. No matter our position in life, it is never our right to deny others a chance at the opportunities we have had. In fact, it is our responsibility to help enable those opportunities for others. We mustn’t allow this to continue. After the protests have dispersed, and the smoke has settled, we must remember this, and as a society we must change. For if nothing changes, nothing changes. #blacklivesmatter

Thank you.

The last time we left off, we had a pretty much working vector implementation. Then we wanted to enable it to be used with standard algorithms. It was a bit of a trick though, because it already worked with the algorithms, using a direct pointer into the memory. Instead, we wrote an iterator class that was a bit safer, and allowed us a bit more control over the raw pointer. Then we learned a bit of funny business with SFINAE, to enable specific functions only when the arguments are iterators. It was a pleasant stride that we took, and we learned a few things along the way. Taking a look at insert allowed us to meander down the valley of memory allocation and reorganization. Then we were able to stop and smell the SFINAE roses, though they often have an off scent. At the start of this journey, I’ll be honest — I wasn’t expecting it would be this much work. It is crazy to think about the amount of brain power that went into crafting our humble little standard vector that we use daily. It puts in perspective the amount of work that goes into the STL alone. Although, it was the premise of this series to illustrate why it’s in our best interest to use the fantastic set of tools that ship with our compiler.

This adventure however, is coming to a conclusion. When I started, I had almost jokingly said I wanted to pit the vector we would create, against the MSVC standard vector. At that time of writing, I didn’t really know what I was going to do, (similarly with “use it with the algorithms”). It isn’t to say I didn’t have a direction, I just didn’t know what I was going to do. So I had to decide what exactly “pitting my vector” against Microsoft’s meant. Luckily, since it’s my blog, it’s my arena. Though if I didn’t do it justice, I knew that I would get down-voted into oblivion. LIKE THAT TIME I THOUGHT VIM COULDN’T NAVIGATE TO HEADER FILES. So, I had to actually put some brain power into this, other than just making some asinine competition up, and declaring that I win!

We really want to measure the amount of work the vector is doing. Often in computing, the amount of work is measured by the amount of processing time. This is why we say an operation or algorithm is fast or slow; it’s the amount of time it takes to complete. It follows that the less a particular operation does, the faster it will be. The fastest program is no program at all. However, it’s also the most useless program ever. The art of optimization is to do less work while still having the same outcome. There are a few things that affect the actual timing of operations. These range from processor speed, to program load (i.e. how many programs are running simultaneously), to memory consumption, etc. These, though, are all environmental. Meaning that if you tested program A in the same environment as program B, they should be affected equally. The biggest effect on your operation’s overall speed will be its algorithmic complexity. You might’ve heard of this; sometimes it’s called Big-O notation. You’ll hear people say things like an array lookup is O(1), that is, a constant time operation. Or that a binary search is Log(N), or that a linear search runs in linear time. Worst of all, an algorithm is exponential. Now, these “time” measurements aren’t actual wall clock time. Though, they will affect the amount of ticks your algorithm takes to complete. They’re a relative time scale, based on the amount of work the algorithm has to do. If an algorithm is linear, it means that as the amount of work the algorithm must do grows, the time will grow linearly in relation to the size of its input. When something is a constant time operation, it means that it has to do the same work, regardless of the size of input.

Let’s explain this in terms of our vector. Let’s say we want to find an item in our vector. In the first case we want to find the 10th item. Okay, so regardless of the ordering of our vector, if we want the 10th item, we have to do the same constant work. We need to get the start of the vector, increment it by 10, and we have the item. Okay, okay. What about a vector of size < 10? So we have to check the size first. Regardless of our vector’s size, though, it’s the same work for 1, 10, 100, 1 000, 10 000, or 1 000 000 000 items. This means it’s a constant time operation. You would see this in the wall clock time. If the operation took 10 ms for a vector of 10 items, it would take roughly the same 10 ms regardless of the size.
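That constant work can be sketched in a few lines; `get_tenth` here is a hypothetical helper for illustration, not part of the htk vector:

```cpp
#include <cstddef>
#include <vector>

// Fetch the 10th element: check the size, offset the base pointer,
// dereference. The work is identical whether the vector holds 10
// items or a billion.
template <typename T>
const T *get_tenth(const std::vector<T> &v)
{
    if (v.size() < 10)   // guard the small-vector case first
        return nullptr;
    return v.data() + 9; // base pointer + fixed offset: O(1)
}
```

Whether the vector holds ten elements or a billion, the function does the same size check, pointer offset, and dereference.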

What if we wanted to find an item that matched some predicate? Assume our vector is a vector<int> of random numbers, and we want to find, say, the value 10. What do we have to do? Well, we have to start at the beginning and iterate the vector until we find the value 10. Now, all things being completely random, you may find the element right at the beginning, somewhere in the middle, or maybe it’s the very last element. Given one full iteration of the vector, you’ll be able to either a) find the element or b) know it doesn’t exist. This is a linear algorithm. The wall clock time will grow roughly linearly as the size increases. Meaning that it should take roughly 10 times longer to find an element in a vector of 100 vs. a vector of 10, with a linear search. In practice however, if you measured this once, you might see something strange. Your vector of 10 might take more time than the vector of 100. Why is that? Well, it could simply be due to a roll of the dice. All things being random, you found the item at the end of the 10 item vector, and it was the first element in the 100 element vector. That’s why we usually talk about the average case complexity. With a linear search, the best case is the case where you found the element at the very start, and the worst case is at the end. The average case will land somewhere in the middle. So if you were to measure searching a 10 element vector vs. a 100 element vector, say, 1000 times, you would see that on average the larger vector took longer, and if you were to graph it you would see it grow linearly.
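Averaging over many runs like that is straightforward to sketch; `average_ns` is a made-up helper for illustration, not the benchmarking tool used for the measurements later in this post:

```cpp
#include <chrono>
#include <cstddef>

// Average the wall clock time of an operation over many runs, so a
// single lucky or unlucky placement doesn't dominate the measurement.
template <typename F>
double average_ns(F &&fn, std::size_t runs)
{
    using clock = std::chrono::steady_clock;
    std::chrono::nanoseconds total{0};
    for (std::size_t i = 0; i < runs; ++i)
    {
        const auto start = clock::now();
        fn(); // the operation under test, e.g. a search
        total += std::chrono::duration_cast<std::chrono::nanoseconds>(clock::now() - start);
    }
    return static_cast<double>(total.count()) / static_cast<double>(runs);
}
```

Run a search lambda through it with a large run count, and the noise from any single lucky (or unlucky) placement washes out.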

Next up, the dreaded quadratic (and worse) growth. These are the algorithms we want to avoid as developers, because they’re the type of algorithm that “works on my machine”, but grinds to a halt at scale. Because we test with 100 elements on our machine, and scale is 100 000 elements. These algorithms are insidious, and often come in the form of nested loops. To continue our example, suppose you wanted to search for a range of items within your vector. If we call the elements we search for the needle, and the elements we’re searching the haystack, we could approach this problem as follows. Start by iterating the haystack until we find an element that matches the first element in the needle. Then continue iterating both collections until we match the entire needle collection OR we do not match, and we start the process over again. Now, the reason these types of algorithms are insidious is that they tend to be adequately fast, until you start getting into larger collections. So it is helpful to be able to spot these types of algorithms before it’s too late. Each time you’ve got nested loops where each loop has an unbounded end, you’ve got something whose work grows as the product of the input sizes.
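A minimal sketch of that needle-and-haystack search, written with the tell-tale nested loops (`naive_search` is a hypothetical name; std::search does this job for real):

```cpp
#include <cstddef>
#include <vector>

// Search the haystack for the needle as a contiguous run. The outer
// loop walks candidate start positions, the inner loop checks the run;
// up to N * M comparisons in the worst case.
bool naive_search(const std::vector<int> &haystack,
                  const std::vector<int> &needle)
{
    if (needle.empty())
        return true;
    if (haystack.size() < needle.size())
        return false;
    for (std::size_t i = 0; i + needle.size() <= haystack.size(); ++i)
    {
        std::size_t j = 0;
        while (j < needle.size() && haystack[i + j] == needle[j])
            ++j;
        if (j == needle.size())
            return true; // matched the entire needle
    }
    return false;
}
```

For a haystack of N elements and a needle of M, that is up to N × M comparisons, which is exactly the kind of growth that feels fine at 100 elements and hurts at 100 000.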

Linear operations are often good enough, meaning that on average, at scale, they’ll complete in an adequate amount of time. As long as this meets our timing requirements, and the user doesn’t go gray waiting, a linear algorithm will suffice. What about the times when they won’t? How can we do better? Well — when you’re dealing with completely random data, you can’t do better than linear, because the data is random. For instance, to search for an item in a random collection, you’re basically required to iterate the entire collection, without any shortcuts. How else can you be sure of your result? This means that in order to do better than linear, we have to add some structure to our data, so that we can make reasonable guarantees. This allows us to take shortcuts, and do less work.

Let’s go back to the search we were doing earlier. We want to find an element with the value of 10, in a vector<int>. Now, when we did this before, we operated on a random set of data. There was no order or structure to our data, so we were forced to inspect each element. Let’s instead assume the vector is sorted. Given this order, we can make reasonable assumptions that allow us to take shortcuts and do less work when we’re searching. Of course the linear search still works, but why would we do that when the data is sorted? Assuming an ascending sort, we could search in the following manner. Start at the middle element; is it our value? If not, is it less than, or greater than the value we’re searching for? If it’s less than the value we’re searching for, and the list is in ascending order, then we know with certainty that the item resides in the top half of the list. This means in one operation we can discard half the list of items. We just discarded half the work. Now, if you take the same approach to the top half of the list, you’ve just discarded half of half. Keep doing this and you’ll find your item, or you’ll run out of items to search. As you can see, having the data in sorted order allows us to rapidly find the item by means of divide and conquer. Each iteration of our search effectively removes half of the work. This is what is called logarithmic time complexity, because each step divides the remaining set in two. These types of operations are often referred to as binary operations, because at any given point you either go left or right, depending on the value. Now, a sorted order allows this type of approach on a list of items, but you can also structure your data in such a way, i.e. a binary tree, that makes these types of binary operations very easy.

Now, you’re probably asking “well, what about the time it takes to sort the vector?” Well, that does have a cost, and that’s where the balance of computing comes in. Maybe adding items to your vector happens rarely, but you’re constantly searching it. So it makes sense to sort the collection, pay for that at initialization, and then use a faster algorithm for searching. Given your data set, you might instead want to maintain a binary structure, where you pay a Log(N) cost for insertion, as well as for searching. It’s important to illustrate the point that structure and order are what allow us to craft smarter algorithms that do less work.
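One way to pay that cost up front is to keep the vector sorted on every insertion; a small sketch (the `sorted_insert` helper is an assumption for illustration, not part of the htk vector):

```cpp
#include <algorithm>
#include <vector>

// Keep the vector sorted on every insertion: std::lower_bound finds
// the first position whose value is not less than the new one in
// O(log N) comparisons, then insert shifts the tail to make room.
void sorted_insert(std::vector<int> &v, int value)
{
    const auto pos = std::lower_bound(v.begin(), v.end(), value);
    v.insert(pos, value);
}
```

std::lower_bound finds the insertion point in Log(N) comparisons; the insert itself still shifts elements, so this buys cheap searches at the price of costlier insertions.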

I’ve digressed. What I really wanted to cover was measuring our vector, against MSVC vector.

I thought a while (maybe 14 minutes) about what I wanted to measure. I felt that the only real operation a vector did was grow. Meaning that pretty much anything else was effectively a constant time operation. If you want to get an element, or assign to a pre-existing slot, that is a constant time operation. So where are the operations that will take some time? Well, I know that there is work to be done when the vector grows, due to the fact we have to allocate more space, copy into that new space, then free the old space. So the operations we really want to time are adding an item (emplace) and inserting items (insert). When we insert, we want to test inserting at the beginning, middle, and end, because we know from when we implemented it that those operations all have different paths. You can read about this in the previous post.

I’ll be honest — I had planned to dedicate a significant portion of this post to benchmarking, and the methods behind it. Though at this point, with my algorithmic complexity digression, it would make this post a novel to get into that. If that’s of interest to you, let me know and I’ll try to write a follow up post about it.

Originally, my plan was to look into using a third party benchmarking tool. Something like Google Benchmark, or nonius. However, I couldn’t help my own curiosity. To be completely honest, I think one should not re-invent the wheel at work. But at home, for fun, for posterity (yes — if I ever have children, I want to show them my library of useless recreations), or for education, I’m a firm believer we should make attempts at it. So, for better or for worse, instead of using an off-the-shelf solution, I opted to write my own lightweight benchmarking tool. You can find it here.

What I chose to measure is the following:

  1. Emplace 10, 100, 1 000, 10 000, 100 000, 1 000 000 int values.
  2. Emplace 10, 100, 1 000, 10 000, 100 000, 1 000 000 struct w/ move ctor.
  3. Insert 1 000 into 1 000, 10 000, 100 000 int.
  4. Insert 1 000 into 1 000, 10 000, 100 000 struct w/ move ctor.

Once we do that, we can then analyze the results. This would be the very first step on the road of optimization. I won’t digress down that rabbit hole.

Here are the graphed results of the different emplace measurements. On the x-axis is the number of iterations the test was run; on the y-axis is the time in nanoseconds. The blue line represents the MSVC standard vector, and the orange line is the htk vector. Well — no surprise here, htk wins. No need to look any closer at those results.

Just kidding. Put your pitchforks away. I don’t actually think that the htk vector is more performant than the MSVC vector. There are perfectly logical explanations for why it is coming in faster than the STL vector. One plausible answer would be that the numbers are just way off; I’ve never claimed to be a mathematician or statistician. The code could be wrong; there’s a possibility that I’m timing more than I’d like to be timing. However, given that the code is identical for both cases, it should be consistent. I will say, however, that there is large variance in the data.

(all times in nanoseconds)

name                   | iterations | total     | mean     | median | min  | max     | s-dev
vector<int> emplace 10 | 10         | 61200     | 5160     | 2500   | 1700 | 27300   | 7515.344
vector<int> emplace 10 | 100        | 268200    | 2162     | 2100   | 1500 | 6000    | 530.2405
vector<int> emplace 10 | 1000       | 2993100   | 2391.5   | 2200   | 1600 | 20900   | 1051.978
vector<int> emplace 10 | 10000      | 24818400  | 2022.05  | 1900   | 1000 | 1900700 | 19016.71
vector<int> emplace 10 | 100000     | 121941400 | 997.8218 | 800    | 600  | 375300  | 2574.748

With such a large standard deviation, it means our data is pretty widely spread, and using Bryce’s check to determine whether the values are normally distributed, hardly any of the tests looked normal. That being said, I ran the measurements multiple times and saw similar results, so let’s go with it being somewhat correct.

We can start to reason about why the htk vector might be faster than the standard vector. Well — truth be told, there was one test where I knew htk would have the upper hand: the emplace 10 items test. If you recall, when we implemented our vector, the initialization of the vector allocated 10 slots. This means that we started with enough room for 10 elements. The standard vector had to allocate this initial memory, so it was pretty safe to say we’d win there.

In two of the cases, emplace 1000 and 10000, we didn’t actually win. If you look at the 100000 iteration trials for both tests, the timing was almost identical. So likely, at best, we’re equal for two of the cases, which makes sense. To illustrate this further, we’ll take the 100000 iteration trials for each test (10, 100, 1000, 10000, and 100000 items) and graph them.

As we can see, the two are almost identical. Since our trial size is growing by a factor of 10 per test, it makes sense to plot the time on a logarithmic scale.

x: items emplaced; y: nanoseconds

This graph shows pretty clearly, almost exactly, what we expected. In the test with 10 elements, the htk vector is superior because it isn’t doing any allocations. It makes sense that the 100 element test would be slightly faster, because we saved the time of that first allocation of 10. After that, it stops mattering. The two are pretty much neck and neck. Another reason I suspect the htk vector might have a slight upper hand, for the move_only object, is that it’s much less safe. The htk vector probably doesn’t even offer a basic exception guarantee, whereas the stl vector offers a strong guarantee, whereby if an exception is thrown while the collection is growing, it will effectively roll back to the state before trying to grow. With the htk vector, well, it doesn’t do that. It at least tries to clean up for a basic guarantee, but I didn’t really pay attention to ensuring it, so there could be cases where it does not offer that.

x: items emplaced; y: nanoseconds

Okay — so now for the embarrassment I think we were all expecting.

Overall — I’d say this is a blowout; the std vector is about 5x faster in all cases than our htk vector. So then we ask ourselves, why? Well, the answer lies back with what we did for emplace, and what we didn’t do for insert. Well, kind of. With insert, when we need to grow and the original items are trivial, we can just use memmove to very quickly move the contiguous data from one area of memory to another. Keeping in mind, this is trivial data, which is what allows us to just copy the bytes. This definitely optimized our growth code for trivial types. However, when we implemented insert we used this code:

move_items(data_.first, data_.first + offset, next.first, copy_type{});
auto dest = first + offset;
while (start != fin)
    allocator_.construct(*(dest++), *(start++));
move_items(data_.first + offset, data_.last, dest, copy_type{});

You can see that we use our tag dispatch optimized function to move the items within our vector around, but when we’re inserting our new elements, we just construct them one by one, instead of just moving them (read: memmove) into the newly allocated space. Now, you might be asking “how much slower could that actually be?” Well, in this case, it turns out to be roughly 5x slower. So we can change the code to something along the lines of…

move_items(data_.first, data_.first + offset, next.first, copy_type{});
auto dest = first + offset;
move_items(start.ptr(), fin.ptr(), dest, copy_type{});
move_items(data_.first + offset, data_.last, dest, copy_type{});

If you recall, when we wrote move_items, we took advantage of tag dispatching, to determine which function to use. The specific function that will get chosen will be the function that uses memmove.

void move_items(pointer first, pointer last, pointer &dest, memcpy_items_tag)
{
     memmove(dest, first, (last - first) * sizeof(T));
}

After changing the code, I re-ran the insert tests, and the results (unsurprisingly) looked like this.

You can see now they’re both really fast, given that they’re within 200 ns of each other. I’d say that they’re pretty much identical. The only justification I can make for the slight difference is, again, the safety factor. We really don’t care about any exception guarantees, which for fun is okay, but don’t try it at work.

Okay — I’ll address it. I know it’s burning for some of you. That code I wrote above, doesn’t actually work. I mean — it works, for the test I had written, but not much else.

move_items(start.ptr(), fin.ptr(), dest, copy_type{});

This line here has this weird .ptr() call on both the iterators. I can explain. Basically, move_items is akin to the MSVC _Umove function. However, the MSVC functions work on iterators, and ours just so happens to work on pointers, from the legacy code we wrote in the first article. What does that mean? Well, it means that we directly pass the input of our move_items call into memmove. This was perfect at that time, because we had full access to the pointers. Meaning our iterator was a pointer. But now, now we’re working with iterators that aren’t pointers. Just so we’re on the same page: a pointer is an iterator, but an iterator is not a pointer. There is a level of indirection there that allows iterators to work the way they do over pointers, but it means we can’t feed the iterator directly into the memmove function without the compiler yelling. But you just said that MSVC uses iterators in their function…. They do, but first they go through the process of undressing their iterators. No — wait, that’s not right. Unwrapping their iterators. It’s basically the process of removing the iterator facade so that we can, in this case, see the underlying pointer. But why didn’t you do that, Peter? Well — if I’m honest, as much as I love rabbit holes, and going down them to come out the other side, I’m probably almost as tired of working on this vector as you readers are of reading about it. It’s also another exercise in traits and meta programming that I didn’t want to spend the time on. My bad. 🙂
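For what it’s worth, the idea itself fits in a few lines. Here’s a sketch where `wrapped_iterator` and its `ptr()` member stand in for the htk iterator; this mirrors the shape of the problem, not MSVC’s actual unwrapping machinery:

```cpp
#include <cstddef>

// A toy stand-in for the htk iterator: a facade over a raw pointer,
// exposing it through a ptr() member.
template <typename T>
struct wrapped_iterator
{
    T *p;
    T *ptr() const { return p; }
};

// Raw pointers are already "unwrapped".
template <typename T>
T *unwrap(T *p) { return p; }

// Iterators get their facade peeled off to expose the pointer.
template <typename T>
T *unwrap(wrapped_iterator<T> it) { return it.ptr(); }
```

With an overload set like this, move_items could call unwrap() on whatever it’s handed and always end up with a raw pointer suitable for memmove.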

I had a better use of that time. I’m going to try to keep the promise that I made in the beginning, and in the middle, and that now, as we approach the end, I still have not kept. I have not really talked at all about implementing OR timing an algorithm.

So — the other day, I was listening, or watching, I think listening, to a CppCast podcast, and Sean Parent was the guest. He was talking about how at Adobe one of the technical questions is that you have to implement a binary search, and then prove it. I thought, challenge accepted. Not that I’ll be applying at Adobe any time soon, but it’s a worthwhile exercise. Especially when it’s not for all the marbles, like a job at Adobe.

template <typename IteratorT, typename ValueT>
bool binary_search(IteratorT begin, IteratorT end, const ValueT &v)
{
    if (begin == end) // guard the empty range before dereferencing
        return false;
    const auto mid = (end - begin) / 2;
    if (mid == 0)
        return *begin == v || *(end - 1) == v;
    const auto midval = *(begin + mid);
    if (midval == v)
        return true;
    if (v < midval)
        return htk::binary_search(begin, begin + mid, v);
    else
        return htk::binary_search(begin + mid, end, v);
}

Okay — I did it. It’s a recursive binary search function. I know what you’re thinking, eww, recursion. That’s just how I think about it. So the next part is to prove it by induction. Dun. Dun. Dun. Now, Sean mentioned that he had an individual provide 30 pages of documentation for the binary search. I’ll attempt to do one better…. 31 pages! Just kidding. I can barely muster 5000 words into a coherent blog post, let alone a 31 page proof.

Here it goes. Given that we have an ascending sorted collection: our base case is a range the recursion has narrowed down to where the midpoint (end-begin)/2 equals 0, i.e. a single item. When this is the case, the value we’re looking for is either at one of the ends of that range (which, for a single item, are the same element), or it does not exist. In the case where the range is larger, our midpoint is not equal to 0, and there are three cases. If the value at our midpoint is equal to our value, it exists in our collection. If it is not equal, and the search value is less than the value at our midpoint, then it can only exist in the bottom half of the collection. If the search value is greater than the midpoint value, then it can only exist in the top half of the collection.

Well, it’s not much, and it’s hardly a mathematical proof, but it’s my best try. I think it illustrates the point of binary search as divide and conquer.

To close off this post, my plan was to illustrate the magic of binary_search over the plain old linear search. However, like always, color me surprised. I went off and used the chronograph library to time the different search functions. I wanted to test std::find, std::binary_search, and htk::binary_search (recursive) to see how they fare across a vector of integers. So the test was as follows.

  1. Generate a random vector of size 10, 100, 1000, 10000, with values between 1-100
  2. Generate a random search value between 1 – 100
  3. Test 10, 100, 1000, 10000, 100000 times, each search function

As simple as this seems, it wasn’t as straightforward as I would have thought. My first hurdle was that my “random” generator function wasn’t really random. The initial code looked like this.

template <typename T, typename VectorT>
VectorT random_numeric_vector(size_t size, T min = std::numeric_limits<T>::min(), T max = std::numeric_limits<T>::max())
{
    VectorT v;
    std::default_random_engine generator;
    std::uniform_int_distribution<T> distribution(min, max);

    for (size_t i = 0; i < size; ++i)
    {
        v.emplace_back(distribution(generator));
    }
    return v;
}

size_t next_random(size_t min, size_t max)
{
    std::default_random_engine generator;
    std::uniform_int_distribution<size_t> distribution(min, max);
    return distribution(generator);
}

A simple test case looked like this.

benchmark(s, "std::binary_search vector<int>(10)", [](session_run &r) {
    auto v = random_numeric_vector<int, std::vector<int>>(10, 1, 100);
    std::sort(v.begin(), v.end());
    auto value = next_random(1, 100);
    measure(r, [&v, &value]() {
        std::binary_search(v.begin(), v.end(), value);
    });
});

The results I was seeing across the two searches were very, very strange. The linear search sat somewhere in the realm of 100 ns, for every collection size. The binary search was slower, somewhere in the range of ~150 ns, and slightly higher for the larger collections, though not much. So I thought, what the heck? I chalked it up to something to do with optimization. So, I chased down some code to tell the optimizer to eff off, and I ran it again. Same result. Over and over. In hindsight, I probably should’ve checked the documentation, or been smarter. It took putting a debugger on it to look and see. Every time I would generate a new “random” vector, it had the same random values. As well, each time I would generate a “new” random search value, it just so happened to be equal to the first element in the collection. Thus, my pseudo random number generator, using a new default_random_engine per call, was giving me the exact same set of random values. Which just so happened to make each random search value not so random. In fact, it was the best case for a linear search, making it return immediately in every iteration, and for every vector size.

After all that, I fixed the issue by hoisting and reusing the default_random_engine, so I was getting a fresh set of random values each time. So then, back to timing it. After I ensured the values were random, and the test was definitely executing the code I wanted, I had hoped to see the results we talked about earlier. You know, where the linear search time grows in step with the collection size, and where the binary search time grows but then levels off as the collection gets larger and larger. A logarithmic graph.
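The fix can be sketched like this; the `engine()` and `random_int_vector` names are mine for illustration, not the exact code from the test harness. The point is that the engine is created once and reused, so every call continues the same random sequence instead of restarting an identically seeded one:

```cpp
#include <cstddef>
#include <random>
#include <vector>

// One engine, created once and reused, so successive calls draw
// successive values from the same random sequence.
std::default_random_engine &engine()
{
    static std::default_random_engine generator{std::random_device{}()};
    return generator;
}

// Same shape as the original helper, but without a fresh (and thus
// identically seeded) engine per call.
std::vector<int> random_int_vector(std::size_t size, int min, int max)
{
    std::uniform_int_distribution<int> distribution(min, max);
    std::vector<int> v;
    v.reserve(size);
    for (std::size_t i = 0; i < size; ++i)
        v.emplace_back(distribution(engine()));
    return v;
}
```
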

Then this happened. What. The. Fox. Said? The one I least expected, std::find, is fastest, almost in all cases. They’re basically equivalent up until a million items, where I would’ve expected the binary_search to prevail. The linear search performing like that is very strange. Not only does it outperform both the binary searches, it also doesn’t require the input to be sorted, and sorting the 1 000 000 int vector took almost 33 ms on average. That is quite a bit of overhead to pay, just to be outperformed. What gives? This isn’t what my CompSci professor told me; they couldn’t have been lying, could they? Obviously, it makes logical sense that a linear search, whereby you could potentially have to iterate all elements, would take more time than one that continually cuts the search in half. So, my professor wasn’t lying, and this result is just very counterintuitive. We can make a bet that std::find isn’t doing what we think. It just can’t be. Let’s prove that.

We are assuming it’s iterating the collection and checking each element, something like this.

template <typename IteratorT, typename ValueT>
bool linear_search(IteratorT begin, IteratorT end, const ValueT &v)
{
    for (; begin != end; ++begin)
    {
        if (*begin == v)
            return true;
    }
    return false;
}

Update 06.16.2020: After posting this story to reddit, it was pointed out by /u/staletic that std::find is not optimized with memchr. And indeed, after I stepped through with the debugger, it was not doing this. I’m wondering if it’s just the sheer act of unwrapping the iterators to raw pointers that is making it that much faster. I’m going to run a test and see.

I ran the results of that test, and voila!

(mean times in nanoseconds)

Size               | 10      | 100     | 1000    | 10000    | 100000   | 1000000
std::find          | 249.341 | 195.602 | 205.812 | 171.916  | 266.472  | 513.4
std::binary_search | 96.202  | 129.227 | 163.582 | 206.146  | 409.685  | 1321.187
htk::binary_search | 108.538 | 143.235 | 161.051 | 196.494  | 504.58   | 1114.834
htk::linear_search | 85.808  | 126.395 | 421.989 | 3577.376 | 34773.36 | 391704.3

So, the first graph looks a lot more like we would’ve assumed: the linear search time grows in step with the collection as it grows by a factor of 10. It also gets trounced by even the recursive binary_search, which makes the cost of sorting the collection totally worth it. That said, if you’re able to avoid the sort cost, by, say, maintaining sort order on insert, you have a very quick search on your hands.

So what gives with std::find? Well, the truth is in the details. It looks like for types that aren’t trivial, it does a typical linear search, in the style we implemented. But when it comes to trivial types, it uses a different mechanism: a call to memchr, a C API call that will find the pointer to a matching value in a set of data. Though I’m not sure how it’s implemented, I can tell you it’s very quick. Meaning that if you have a collection of trivial types, it looks to be in your best interest to favor std::find for searching the collection, and not worry about having to sort the collection.

Alright — I think I kept my promise. That was long winded, but a surprisingly interesting journey. We scratched the surface of micro-benchmarking by understanding how to apply it to the work a vector does. As well, we took a stab at implementing (and proving) a binary_search algorithm. Then we got to see it in action, and how it performed (or should I say, got outperformed) against std::find and std::binary_search. Then we saw how a true linear search really compares to a binary search when they’re playing the same game. Overall, this was a very interesting walk down vector trail. I hope you enjoyed the read, and I look forward to seeing you next time!

Until then, stay healthy, and Happy Coding!

PL

A sudden bold and unexpected question doth many times surprise a man and lay him open.

Francis Bacon

References
https://www.youtube.com/watch?v=zWxSZcpeS8Q&feature=youtu.be
https://www.youtube.com/watch?v=nXaxk27zwlk&feature=youtu.be&t=2441
https://www.bfilipek.com/2016/01/micro-benchmarking-libraries-for-c.html
https://github.com/google/benchmark/blob/c078337494086f9372a46b4ed31a3ae7b3f1a6a2/include/benchmark/benchmark.h
https://github.com/rmartinho/nonius

Don’t try this at work – Vector Pt. 1

Welcome back! I hope you’re enjoying quarantine. I know for me, not being able to get out and climb has definitely had an impact on my mental health. Oh, and my waistline. That being said, I’ve had more time to get into the code, which is a piece of my own little paradise.

As I mentioned in my last write-up, I’m trying to bring these posts back to basics, and give people more of what they’re interested in reading. Though it’s very likely that people found those posts more interesting because I had more fun writing them. Reverse engineering the C++ unit test framework brought me immense joy, because I was really excited by it, and excited to write about it. With that in mind, I spent a good long while trying to figure out what I should post next. I haven’t had the itch to deconstruct anything lately. So I decided to rack my brain and remember what the original intent of my blog was: to demystify code, to remove the “magic” behind it. I’ve always been curious about the way things work. As developers, we shouldn’t be afraid of the magic behind libraries, we should embrace it. We should strive to understand, at least peripherally, how the code we use works. I once made a comment on Reddit, to an untrained junior developer looking for tips on how to further their career. The tip was that they should take their work code home, and try to recreate it. For me, recreating code I didn’t understand was always the best tool to further my understanding. You start small, and you build; as you run into problems, you start to understand why the author did the things they did. You can start to form opinions about the decisions they made, as you understand the problems they faced. Needless to say, this comment was down-voted into oblivion, and someone even suggested that I was instructing OP to steal their employer’s code. (Spoiler: I wasn’t.) Reddit not agreeing with me didn’t change my mind. I still think that reverse engineering something is the best way to understand it. If you’ve got access to the code on a laptop, I’m sure your boss wouldn’t mind if you took the initiative to understand company code better. Just don’t go selling your company’s code.

So, that’s sort of what inspired this series, Don’t try this at work. In my personal experience, I’ve noticed a commonly occurring problem: people experimenting at work. I’m sure we can all agree that writing code is as much an art as it is a science. The problem is that oftentimes, if people don’t have an outlet outside of work, they’ll use the company code base as their canvas. This can be problematic for smaller companies, which don’t always have the capacity and resources of a larger company to catch these things. These “works of art” can sometimes make it into production code, thus needing to be maintained for the lifetime of the application. This type of exploratory code is often coined “toy” code; it’s code you definitely shouldn’t be using at work. Hence, Don’t try this at work. In this series, I want to demystify one of the most commonly used classes in the most common library used by almost all C++ developers: the Standard Template Library.

In this, the first of the Don’t try this at work series, I’m going to be looking at something every C++ developer, regardless of experience, uses (or at least knows of). Just because we all use it, doesn’t mean we understand it. So without further ado, let’s dive into the vector!

You can find all the code I write for this series here, if you have any questions or comments feel free to shoot me an email.

The STL vector is probably the most ubiquitous container used in C++. If you need a generic dynamically sized container, the vector is the first tool you should grab from the STL’s bag of tricks. It’s a thin wrapper over contiguous memory, providing immense usability with little overhead over a raw array. It can easily be used with STL algorithms, and fits most cases where a dynamically sized container is required. It features constant-time random access, amortized constant-time appending and removal at the end, and linear-time insertions and deletions elsewhere. In short, if you don’t use vectors, you should.

Warning: Because a vector is a generic container, we’re going to be doing template programming. So if you’re faint of heart, stop reading now.

Where do we start? Well, the vector is essentially a wrapper over an array that will expand when the array runs out of space. So, something simple probably looks like this.

template <typename T>
class vector
{
    T* data_;
    size_t size_;
    size_t cap_;
public:
    vector()
        :data_(new T[10]),size_(0),cap_(10)
    {
    }

    void push_back(const T& value)
    {
        if (size_ == cap_)
        {
            grow();
        }
        data_[size_++] = value;
    }

private:
    void grow()
    {
        T* tmp = new T[cap_ * 2];
        for (size_t i = 0; i < size_; ++i)
            tmp[i] = data_[i];
        cap_ = cap_ * 2;
        T* old = data_;
        data_ = tmp;
        delete[] old; // array delete; plain delete here would be undefined behavior
    }
};

Pretty simple right? The basics of a vector are that we have an array, the count of objects (size), and the capacity. When we push a new object, we check to see if we have space; if not, we grow, and then push that item on the end. When we do have to grow, we allocate double the capacity, copy all the items over, and then delete the old array. Neat huh? The problem is that this is toy code. It’s inefficient, it requires that objects be default constructible, and it isn’t at all safe. It gets the point across, but it’s not really rocket science, or STL science for that matter. This code is a Fisher-Price vector. It’s light-years away from being a real vector. This is your 3-year-old’s vector. We want a real vector. We want a vector that looks and feels like the vector you use from your compiler writer.

Where do we even start? Well, the first thing would be to maybe look at a real vector implementation. Cooking is also an art form, and when you’re learning how to cook something new, you generally use a recipe. So, let’s start with a recipe.

I’m a Microsoft guy, through and through. So naturally, my first thought was to go look at the files in VS 2019. So I cracked open Visual Studio, opened a new C++ file, typed #include <vector>, right-clicked “Go to Document <vector>”, and voila. Sorry vim folks, I’m not sure if you can do that, BUT BASICALLY I RIGHT CLICKED AND OPENED THE VECTOR FILE. Have you ever looked at MSVC’s implementation of anything in the STL? Well, be thankful that “Just my code” works when you’re debugging. Here’s a snippet. (I didn’t write this code; it’s directly from MSVC’s vector.)

Note: If you don’t use Visual Studio to code and debug. I truly feel sorry for you.

// vector standard header

// Copyright (c) Microsoft Corporation.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
// ...
template <class _Ty, class _Alloc = allocator<_Ty>>
class vector { // varying size array of values
private:
    template <class>
    friend class _Vb_val;
    friend _Tidy_guard<vector>;

    using _Alty        = _Rebind_alloc_t<_Alloc, _Ty>;
    using _Alty_traits = allocator_traits<_Alty>;

public:
    static_assert(!_ENFORCE_MATCHING_ALLOCATORS || is_same_v<_Ty, typename _Alloc::value_type>,
        _MISMATCHED_ALLOCATOR_MESSAGE("vector<T, Allocator>", "T"));

    using value_type      = _Ty;
    using allocator_type  = _Alloc;
    using pointer         = typename _Alty_traits::pointer;
    using const_pointer   = typename _Alty_traits::const_pointer;
    using reference       = _Ty&;
    using const_reference = const _Ty&;
    using size_type       = typename _Alty_traits::size_type;
    using difference_type = typename _Alty_traits::difference_type;

private:
    using _Scary_val = _Vector_val<conditional_t<_Is_simple_alloc_v<_Alty>, _Simple_types<_Ty>,
        _Vec_iter_types<_Ty, size_type, difference_type, pointer, const_pointer, _Ty&, const _Ty&>>>;

public:
    using iterator               = _Vector_iterator<_Scary_val>;
    using const_iterator         = _Vector_const_iterator<_Scary_val>;
    using reverse_iterator       = _STD reverse_iterator<iterator>;
    using const_reverse_iterator = _STD reverse_iterator<const_iterator>;

    vector() noexcept(is_nothrow_default_constructible_v<_Alty>) : _Mypair(_Zero_then_variadic_args_t()) {
        _Mypair._Myval2._Alloc_proxy(_GET_PROXY_ALLOCATOR(_Alty, _Getal()));
    }

// ..

Mmhmmm. Yup. Feast your eyes. If the code I wrote was Fisher-Price, this is DeWalt. It’s safe, efficient, and it works. The reason we never question much about things in the STL is that they almost always work the way we expect them to. The authors of these libraries deserve praise. So if you’re an STL developer, you have my sincerest thank you. The problem is that this code is so dense that I, for one, am not smart enough to understand what’s going on at a glance.

Shout out to @StephanTLavavej, and all others on the Visual C++ team for making our lives easier.

What do you think a _Scary_val is? I actually have no idea.

Okay — so that recipe might as well be in Cuneiform, or French. Neither of which I’m particularly talented at reading. We need a recipe we can understand. Thankfully, I did some digging, and I was able to find something. Located in the depths of the internet, here, I found Alexander Stepanov’s implementation of the STL. If you don’t know who Alex is, he’s the father of the STL. The man is a living legend, and to him we owe a massive thanks. Anyways, his implementation of vector is clean, simple, and readable. It’s something we can easily consume. The only problem is that it’s from 1994. Do you know what this means? It means it was written when mullets were popular. I would know, I had one. It also means that it was written before any of the cool modern C++ existed. It was written before C++ was even standardized!! Talk about using Grandma’s recipe. It’s fine, we’ll make do; things haven’t changed that much, have they? Until I find a better resource, this is the best I’ve got. (Edit: I actually found another good resource a bit into my experiment, the EASTL.) I’m going to use a hybrid between Stepanov’s vector, the cppreference documentation, and whatever I can figure out from MSVC. Then I’ll sprinkle in a little bit of my knowledge, and bingo, we’ll have a real working vector.

Let’s start by defining what we’re working towards. First, we want a standard vector. This means we want it to fulfill the standard’s requirements for what a vector is. It needs to work with standard algorithms, which we’ll eventually test. Lastly, we’re going to compare it, performance-wise, to MSVC’s implementation.

From cppreference.com, a vector is a sequence container that encapsulates dynamically sized arrays. Its declaration looks like this.

template <typename T, typename Allocator = std::allocator<T>>
class vector;

So, the template declaration is a bit different from the toy code we looked at earlier, and from Stepanov’s vector. It has this weird thing called an Allocator. So you might be asking, “what’s an allocator, and why is it a template argument?” To address the first question: an allocator encapsulates your allocation mechanism. It hides away how the memory is getting allocated, behind a consistent interface. To answer the second: it’s a template argument so that you can supply the type of allocator your vector will use. A huge part of what makes the STL so nice is that it’s very, very flexible. It works with all your custom hand-rolled classes, and it allows you to control how allocations happen for standard containers. It does this through the Allocator template argument. Why would anyone use a custom allocator? A few reasons come to mind: you’ve got memory constraints, or maybe you have a highly optimized use for your vector. Maybe you noticed that Stepanov’s vector implementation doesn’t have the allocator template parameter either. However, if you read deeper, he uses a static member allocator that’s defined at the top section of the code.
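As a concrete illustration (my own toy example, not code from the post), here is a minimal custom allocator that counts allocations, plugged straight into std::vector:

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Toy custom allocator: counts calls to allocate so you can watch how a
// container uses memory. The name and counter are illustrative only.
template <typename T>
struct counting_allocator
{
    using value_type = T;
    static inline std::size_t allocations = 0;  // C++17 inline variable

    counting_allocator() = default;
    template <typename U>
    counting_allocator(const counting_allocator<U>&) {}

    T* allocate(std::size_t n)
    {
        ++allocations;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }

    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};

// all instances are interchangeable, so they compare equal
template <typename T, typename U>
bool operator==(const counting_allocator<T>&, const counting_allocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const counting_allocator<T>&, const counting_allocator<U>&) { return false; }
```

Dropping it into the vector’s second template parameter is all it takes; the container itself doesn’t change.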

If you recall the toy code, we used “new T[]” directly to allocate our array. This is problematic for two reasons: it forces the user of our toy vector to use the runtime’s default allocation mechanism, and it calls the default constructor on every item in the array. So we force our library user to have a default constructor on all their objects and to use the default allocation heap. No bueno. The Allocator interface splits the responsibilities of allocation and construction into two separate calls. This way we can control when the memory gets allocated for our objects, and when our objects get constructed. The member functions of an allocator are:

template <typename T>
struct allocator
{
    allocator()=default;
    ~allocator()=default;

    // return the address of an object
    T* address(T&);

    // allocates sizeof(T) * count bytes of raw storage
    T* allocate(size_t count);

    // deallocates the storage pointed to by p, which must have been obtained by a call to allocate
    void deallocate(T *p, size_t count);
    
    // construct - constructs an object in place at p (deprecated C++17, removed C++20)
    template <typename ...Args>
    void construct(T *p, Args &&...args);
    
    // destroy - destructs the object pointed to by p (deprecated C++17, removed C++20)
    void destroy(T *p);

};

Note: I’ve chosen to use the deprecated functions construct / destroy. They’ve been replaced by allocator_traits, and honestly, I just didn’t want to go that far down the rabbit hole for this first post. So my vector takes advantage of them. For now.
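To see why the split matters, here’s a small sketch (mine, not the post’s code) that allocates raw storage for four strings but constructs only one of them; new std::string[4] would have default-constructed all four:

```cpp
#include <memory>
#include <new>
#include <string>

// Phase 1: raw, uninitialized storage. Phase 2: construct one object in
// place with a non-default constructor. Tear down happens in reverse:
// destroy the object, then release the storage.
std::string construct_one()
{
    void* raw = ::operator new(4 * sizeof(std::string));
    std::string* s = ::new (raw) std::string("hello");
    std::string result = *s;
    std::destroy_at(s);     // explicit destructor call (C++17)
    ::operator delete(raw);
    return result;
}
```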

So, given that we can’t have a vector without an allocator, we should probably start our journey there.

For brevity, I’m not going to write out the entire implementation, but I will write out one of the important functions, allocate. Let’s take a look at Stepanov’s implementation.

/*
 * Copyright (c) 1994
 * Hewlett-Packard Company
 */
template <class T>
inline T* allocate(ptrdiff_t size, T*) {
    set_new_handler(0);
    T* tmp = (T*)(::operator new((size_t)(size * sizeof(T))));
    if (tmp == 0) {
	cerr << "out of memory" << endl; 
	exit(1);
    }
    return tmp;
}

It’s pretty simple: we use ::operator new to allocate space for size elements of our type. If that fails (returns 0), we write to the standard error and exit the application. We what? We write to the standard error, and exit the application. Hmmm. I would’ve expected this to use exceptions, but my best guess is that in 1994, exceptions were not yet widely supported. Also, at the time, if you ran out of memory, what was a program to do, other than print to standard error and crap itself? Well, in this day and age, that isn’t appropriate. We’ve got something up our sleeves known as exceptions. Exceptions are a fundamental piece of the STL, and they’re perfect for this very thing. The only problem is that in this little experiment we’re doing, we have to implement everything that isn’t bare-metal C++ (mostly). So, we’ve gotta implement standard exceptions.

Thankfully, exceptions aren’t that complicated. They’re basically a wrapper over a constant character array, or in other words, an immutable string.

class exception
{
    const char * const message_;
public:
    exception()
        :message_(nullptr)
    {
    }

    exception(const char* message)
        :message_(static_cast<const char*>(::operator new(strnlen(message, 255) + 1)))
    {
        if (message_ == nullptr)
            exit(-1);
        // copy at most 255 characters; the +1 above leaves room for the null terminator
        strncpy_s(const_cast<char*>(message_), strnlen(message, 255) + 1, message, 255);
    }

    exception(const exception& e)
        :exception(e.message_)
    {
    }

    virtual ~exception()
    {
        ::operator delete(const_cast<char*>(message_));
    }

    virtual const char* what() const
    {
        return message_;
    }

};

class bad_alloc : public exception
{
public:
    bad_alloc()
        :exception("bad_alloc")
    {
    }
    bad_alloc(const char *message)
        :exception(message)
    {
    }
};

So, now that we have an exception to throw, we can use it in our allocator and have the allocation look something like this.

pointer allocate(size_t count)
{
    // note: plain ::operator new throws on failure rather than returning
    // nullptr; the nothrow overload is what makes this check meaningful
    T* temp = static_cast<T*>(::operator new(count * sizeof(T), std::nothrow));
    if (temp == nullptr)
    {
        throw bad_alloc();
    }
    return temp;
}

Once we have our allocation function, the rest of the allocator is pretty self-explanatory. Mine came out looking something like this.

template <typename T>
struct allocator
{
    using pointer = T*;
    using const_pointer = const T*;
    using reference = T&;
    using const_reference = const T&;
    using size_type = size_t;

    allocator() = default;
    ~allocator() = default;

    // address
    pointer address(reference x) const
    {
        return &x;
    }

    const_pointer address(const_reference x) const
    {
        return &x;
    }

    pointer allocate(size_t count)
    {
        // plain ::operator new throws on failure; the nothrow overload
        // returns nullptr, which makes this check meaningful
        T* temp = static_cast<T*>(::operator new(count * sizeof(T), std::nothrow));
        if (temp == nullptr)
        {
            throw bad_alloc();
        }
        return temp;
    }

    void deallocate(pointer p, size_type n)
    {
        ::operator delete(p, n * sizeof(T));
    }

    size_type max_size() const
    {
        return max(size_type(1), size_type(UINT_MAX / sizeof(T)));
    }

    template <typename ...Args>
    void construct(reference dest, Args&&... args)
    {
        ::new (static_cast<void*>(&dest)) T(htk::forward<Args>(args)...);
    }

    void destroy(pointer p)
    {
        p->~T();
    }
};

Basically, the calls are thin shims over the default C++ mechanisms for allocation (operator new) and deallocation (operator delete). Construction uses placement new to construct the object at a given destination, and the destroy call explicitly invokes the destructor.

If you’re still following along: before we could even start writing our vector, we needed to lay some foundations. I searched and searched, but I couldn’t find instructions on how to implement an STL. What I mean is, where do you even start? Obviously classes in the STL depend on other classes in the STL, but where are the leaves? I wish I could say I found them. So my best bet is to just pick a common class, and implement it. Thus far, we have the basic exception, we have a bad_alloc exception, and we have the default allocator. All so we can start on our vector.

In order to keep this post from becoming my first novel, we’re just going to focus on implementing one method on the vector. We’ll leave the default construction for now, and look at the emplace_back function. This is a function that forwards its arguments into direct construction at the back of the vector. It’s a really nice starting point. It seems rather straightforward; I’ll soon find out it isn’t.

So the signature of emplace_back is something like this. It takes a variadic set of arguments, and returns a reference to the object it created. The decltype(auto) specifies that the compiler is to deduce the return type based on what we’ve returned.

template <typename ...Args>
decltype(auto) emplace_back(Args&&...args)
{
    ensure_space_at_least(1);
    allocator_.construct(*(data_.last++), htk::forward<Args>(args)...);        
    return back();
}

This code, though, is pretty straightforward. We first ensure we have enough space for one element. Then we use our allocator to construct the element in the last position, and increment the pointer.

You might be asking about this strange htk::forward<Args>(args). Well, I debated going down that rabbit hole, but decided against it. What std::forward (or htk::forward) does is pass arguments along exactly as they were passed into the function. This means that if we’re passing r-value references into emplace_back, we’re forwarding those as-is to the allocator’s construct call.
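A tiny illustration of what forwarding preserves (using std::forward in place of htk::forward; the probe type is my own invention):

```cpp
#include <string>
#include <utility>

// probe records which constructor ran, so we can observe that forwarding
// keeps lvalues as lvalues (copy) and rvalues as rvalues (move).
struct probe
{
    const char* how;
    probe(const std::string&) : how("copied") {}
    probe(std::string&&) : how("moved") {}
};

template <typename... Args>
probe make_probe(Args&&... args)
{
    // without std::forward, args would always arrive at probe as lvalues
    return probe(std::forward<Args>(args)...);
}
```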

What does ensure_space_at_least(1) do? Our toy vector had some code to grow and copy our existing objects; that’s effectively what this is doing. We ensure we have space for at least one more element. If we do, we can just create the new item right at the end of that memory we allocated earlier. This illustrates separating allocation from construction: we allocate space which hasn’t been initialized, then when someone pushes into our vector, we just initialize that memory!

void ensure_space_at_least(size_t count)
{
    if (initialize(count))
        return;
    // we need to see if we have enough space.
    if (freespace() < count)
    {
        const auto cap = calculate_growth(count);
        grow(cap);
    }
    // we good.
}

The first line just initializes the vector. Since this vector starts out with no memory allocated, the very first call will allocate some memory and return. If we already have enough memory, we don’t do anything. Where it gets interesting is when we don’t have enough memory available. Our growth algorithm is just going to keep doubling the capacity until it can fit the count; that’s what calculate_growth(count) does. It returns the next capacity that’ll fit our count. Then we need to grow our memory.
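A doubling calculate_growth could be sketched like this (my guess at the shape of it; the starting capacity of 8 is an arbitrary assumption):

```cpp
#include <cstddef>

// Keep doubling the capacity until the requested number of extra
// elements fits alongside the current size.
std::size_t calculate_growth(std::size_t capacity, std::size_t size, std::size_t count)
{
    std::size_t cap = capacity == 0 ? 8 : capacity;  // assumed initial capacity
    while (cap - size < count)
        cap *= 2;
    return cap;
}
```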

In order to grow the memory, we need to do a couple things. First, we need to allocate a contiguous block for our new capacity. Then we need to transfer all existing items into the new memory block. Then we need to free our old memory.

void grow(size_type cap)
{
    pointer new_vec = allocator_.allocate(cap);
           
    move_items(data_.first, data_.last, new_vec);

    auto old = data_;
    auto curs = size();
    data_ = data{ new_vec, new_vec + curs, new_vec + cap };

    // destroy the old stuff
    destroy(old.first, old.last);
    allocator_.deallocate(old.first, old.capacity());
}

void move_items(pointer first, pointer last, pointer dest)
{
    while (first != last)
        allocator_.construct(*(dest++), *(first++));
}

The first thing we do is allocate a block of our new capacity. Then we call move_items with pointers to the first and last elements, and a pointer to the new destination. In move_items, we just iterate from first to last, constructing each old item into the destination and incrementing it. Then we assign our new pointers to our internal data member, destroy all the old items, and deallocate the old memory. With that, we have a pretty simple growth algorithm that copies from our old vector into the new space, and frees up the old items.

My initial thought was to type “we’ve got a problem here“, like we only have one. There’s a boatload more than one problem in this code. This, folks, is why being a library developer is hard. At this point, we’re still only writing the basic algorithm. We haven’t taken into account safety, or optimization. Just thank your god that you don’t have to be a library developer. (Unless you want to be, you masochist.)

The first problem we’re going to address is the fact that we only support copying items. This means that every time our vector expands, we’re going to copy all the items over. For large types this can be costly, and as of C++11 we don’t need to copy the items. We want to take advantage of move semantics. Move semantics are a pivotal part of smart pointers, and of optimizations for things like the vector, because they effectively allow you to move the guts from one object to another. This is primarily useful when you have a temporary, or an item you know will be going out of scope, and you do not want to copy it. It’s done by defining a “move constructor” (see the Rule of 5).

Think about an object like a vector, which has an internal pointer to some array of objects. In C++98, when we passed that object by value (a copy), to be correct we would need to allocate enough space and copy all the items over into the new vector. This meant we would have two copies of our vector, and if you were passing it into a function, the vector you copied would then be destroyed. This is very wasteful. The same thing would happen when you returned by value. It’s obvious today that we need a way to say “I’m only temporary”, and when it’s a temporary, we can commandeer that temporary’s pointer as our own, set its pointer to nullptr, and let it go out of scope. This way we’ve moved the object. Before we had move semantics, we had a lot of different workarounds, like reference out-parameters and copy-on-modify semantics. With move, those workarounds aren’t necessary.
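The “steal the pointer” idea in miniature (a toy buffer type of my own, not from the post):

```cpp
#include <cstddef>
#include <utility>

// A move constructor commandeers the source's allocation and leaves the
// source empty, so only one owner ever deletes the array.
struct buffer
{
    int* data;
    std::size_t size;

    explicit buffer(std::size_t n) : data(new int[n]()), size(n) {}
    buffer(buffer&& other) noexcept : data(other.data), size(other.size)
    {
        other.data = nullptr;   // the moved-from object gives up ownership
        other.size = 0;
    }
    buffer(const buffer&) = delete; // move-only, like a resource holder
    ~buffer() { delete[] data; }
};
```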

Obviously, we want to take advantage of move semantics when we copy our vector. That way, if the class we’re holding is optimized for move, we will use that and not copy it. As well, we can hold on to classes that have only move semantics, and not copy. Things like resource holders. So how do we do that?

In order to take advantage of move semantics, we need to use std::move (or in our case htk::move), which will convert an object reference into an r-value reference.

void grow(size_type cap)
{
    pointer new_vec = allocator_.allocate(cap);
    
   if(has_move_semantics())
   {
         move_items(data_.first, data_.last, new_vec);
   }    
   else
   {
         copy_items(data_.first, data_.last, new_vec);
   }

    auto old = data_;
    auto curs = size();
    data_ = data{ new_vec, new_vec + curs, new_vec + cap };

    // destroy the old stuff
    destroy(old.first, old.last);
    allocator_.deallocate(old.first, old.capacity());
}

void move_items(pointer first, pointer last, pointer dest)
{
    while (first != last)
        allocator_.construct(*(dest++), htk::move(*(first++)));
}

void copy_items(pointer first, pointer last, pointer dest)
{
    while (first != last)
        allocator_.construct(*(dest++), *(first++));
}

It’s that simple! Pfft. Library implementation is easy…. Just kidding. It’s probably not that easy. Even if this way did work, we would be doing the check at runtime! But whether or not a type can be moved is based on the class definition (does the class implement a move constructor?), so we can do this check at compile time. This means that when the vector code is generated for a contained type, i.e. <T>, we know whether we want to move or copy it.

If you’ve ever programmed in a language that doesn’t have function overloading, you’ll know how important it is. It lets you specify a function with the same name but different arguments, and the compiler chooses the best function to call based on the types you’ve provided. This is great, because we can write one algorithm, and based on the type, the compiler can select the best function to call. When we’re talking template code, the compiler will only instantiate the functions that are actually called. This may seem like a digression, but it’s not, and let me show you why.

template <typename T>
class adder 
{
    T value_;
    public:
    adder(const T &value)
    :value_(value)
    {
    }
    void add(const T& value)
    {
        add_internal(value);
    }
    const T& get() const
    {
        return value_;
    }
    private:
    void add_internal(const std::string &s)
    {
        value_ += " + " + s;
    }
    void add_internal(int val)
    {
        value_ += val;
    }
};

This code isn’t really useful at all, but it illustrates the point that, based on the type of our template argument, we can call different functions. I made a compiler explorer example here. You can see in the compiler explorer that when we use a std::string, we call the appropriate add_internal overload. Can you see where this is going? We want to call different functions based on traits of the class we’re containing. So, back to our grow code.

void grow(size_type cap)
{
    pointer new_vec = allocator_.allocate(cap);
    
    move_items(data_.first, data_.last, new_vec, /* something here */);
   
    auto old = data_;
    auto curs = size();
    data_ = data{ new_vec, new_vec + curs, new_vec + cap };

    // destroy the old stuff
    destroy(old.first, old.last);
    allocator_.deallocate(old.first, old.capacity());
}

void move_items(pointer first, pointer last, pointer dest, /* argument when movable */)
{
    while (first != last)
        allocator_.construct(*(dest++), htk::move(*(first++)));
}

void move_items(pointer first, pointer last, pointer dest, /* argument when copyable */)
{
    while (first != last)
        allocator_.construct(*(dest++), *(first++));
}

We want to take advantage of overloading and the template system to get the compiler to pick the right call based on our type. This is a technique called “tag dispatching“: we want the compiler to pick the right function, based on the template type we’re using. Or, at least that’s what we’re trying to do. 🙂

I’ve been alluding to “traits about a type” without outright calling it what it is. To a library author, type traits are a bit like paint to a painter, or gasoline to an internal combustion engine (or arsonist?). Type traits (type_traits) are part of the STL, and they allow you to get compile-time answers to questions about the types that the template code is working on. In our case, we want to ask the question “is this class moveable?” and, based on the answer, have the compiler call the right move_items overload.

In the STL, these “questions” are generally answered by small helper structs called type_traits. In general, they take a type argument (or a few), and give back a compile time constant boolean value. You can ask questions like:

std::is_reference<T>;
std::is_pointer<T>;
std::is_member_pointer<T>;

Honestly, there are a lot; if you can think to ask it about a class, it’s probably a type_trait. Type traits are a build-up of tiny little structs that enable huge possibilities. The root of most type traits is this thing called a bool_constant. It’s just a small struct that holds a boolean as a compile-time constant. With this, we can define two types (obvs): std::true_type and std::false_type. These classes are aliases of bool_constant<true> and bool_constant<false> respectively. So let’s go back to the code.

void grow(size_type cap)
{
    pointer new_vec = allocator_.allocate(cap);
    
    move_items(data_.first, data_.last, new_vec, /* something here */);
   
    auto old = data_;
    auto curs = size();
    data_ = data{ new_vec, new_vec + curs, new_vec + cap };

    // destroy the old stuff
    destroy(old.first, old.last);
    allocator_.deallocate(old.first, old.capacity());
}

void move_items(pointer first, pointer last, pointer dest, htk::true_type /* is moveable */)
{
    while (first != last)
        allocator_.construct(*(dest++), htk::move(*(first++)));
}

void move_items(pointer first, pointer last, pointer dest, htk::false_type /* is not moveable */)
{
    while (first != last)
        allocator_.construct(*(dest++), *(first++));
}

You can start to see how we can get the compiler to choose which function to call. If we want to move the object, we use htk::true_type as the tag that we want to move, and htk::false_type when we want to copy. The last piece is asking the question! Thankfully, in the STL’s bag of tricks there’s a helpful trait called std::is_move_constructible<T>.
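The machinery is small enough to sketch in full (my guess at how an htk version might look; the names mirror the STL’s):

```cpp
// A compile-time boolean constant, and the two aliases built from it.
// Overloads are then selected by the tag's *type*, not a runtime bool.
template <bool B>
struct bool_constant
{
    static constexpr bool value = B;
};

using true_type = bool_constant<true>;
using false_type = bool_constant<false>;

// Tag dispatch in miniature: which overload runs is decided at compile time.
constexpr bool picked_move(true_type)  { return true; }
constexpr bool picked_move(false_type) { return false; }
```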

Since this blog is about demystification, this is the last piece (for now) of wool we need to lift off our eyes. How do we implement is_move_constructible<T>? I’ll be totally honest with you: I had zero idea. My first thought was that I needed to somehow have the compiler test-call the move constructor with the arguments and see if that could compile, and if not, fall back to false_type; this is something called SFINAE. (I’m definitely not an expert, and I have little confidence I could explain it without a significant amount of research.) So after some thinking, I decided I needed to phone a friend. I pulled up VS again and dove into type_traits.

// STRUCT TEMPLATE is_move_constructible
template <class _Ty>
struct is_move_constructible : bool_constant<__is_constructible(_Ty, _Ty)> {
    // determine whether _Ty can be direct-initialized from an rvalue _Ty
};

That’s it. No crazy template magic. Just 3 little lines. Wemp. Womp.

But wait! What the heck is that __is_constructible(_Ty, _Ty)? Yeah. That had me too. I actually spent longer than I’d like to admit trying to figure that *expletive deleted* out. It wasn’t until I found the EASTL that it dawned on me. Well, there’s a line that gives it away.

#if EASTL_COMPILER_INTRINSIC_TYPE_TRAITS_AVAILABLE && (defined(_MSC_VER) || (defined(EA_COMPILER_CLANG) && EA_COMPILER_HAS_FEATURE(is_constructible)))

I’ll let you see their code when the compiler doesn’t do it, for yourself here. It’s really quite fascinating and very impressive. Thank goodness for Microsoft and the compiler writers, because they made that job easy. We basically have a compiler intrinsic function, that will answer that question for us! Let’s go back to our code, so we can finish this off.

void grow(size_type cap)
{
    pointer new_vec = allocator_.allocate(cap);
    
    move_items(data_.first, data_.last, new_vec, htk::is_move_constructible<T>{});
   
    auto old = data_;
    auto curs = size();
    data_ = data{ new_vec, new_vec + curs, new_vec + cap };

    // destroy the old stuff
    destroy(old.first, old.last);
    allocator_.deallocate(old.first, old.capacity());
}

void move_items(pointer first, pointer last, pointer dest, htk::true_type /* is moveable */)
{
    while (first != last)
        allocator_.construct(dest++, htk::move(*(first++)));
}

void move_items(pointer first, pointer last, pointer dest, htk::false_type /* is not moveable */)
{
    while (first != last)
        allocator_.construct(dest++, *(first++));
}

With that we’ve got our vector growing, and choosing the correct function for copying the old items, based on the type specified to the vector. Cool!

It’s been a journey. We waded (deeper than we may have wanted) into the world of the STL. We uncovered things like allocators, exceptions, and type_traits. We also looked at how we can leverage tag dispatching to ensure we’re doing the right thing based on the type at hand. We saw that library development, past a toy, is hard. We didn’t even get into exception safety, which admittedly complicates things past something that’s understandable without writing a novel. I can only hope to dig into that one day; until then we can let it remain magic. For now, we have the start of a vector, and some other useful tools in our just-heavier-than-a-toy STL.

I hope you enjoyed this close to the vector series. Next up, we’ll pit our vector against Microsoft’s, in a battle of the… well, battle of the vectors, to see just how poorly this vector performs against the professionals.

Until then, stay healthy, and happy coding!

“Life is really simple, but we insist on making it complicated.”
― Confucius

References
https://en.cppreference.com/w/cpp/container/vector
http://stepanovpapers.com/butler.hpl.hp/stl/stl/VECTOR.H
https://github.com/microsoft/STL/blob/master/stl/inc/vector
https://github.com/electronicarts/EASTL
https://foonathan.net/2015/11/overload-resolution-3/


The one where we reverse engineered Microsoft’s C++ Unit Test Framework (Part 3) – Exceptions

Again, if you haven’t already done so, I suggest reading through Part 1 and Part 2 of the series. I’ll do a quick TL;DR recap, but it may not do it justice.

What feels like ages ago, I started on a journey to implement a clone of Microsoft’s C++ Unit Test Framework Test Adapter, but fix some issues I had with it. These included a better reporting mechanism for standard C++ exceptions, and better error messages when libraries were failing to load (typically due to missing dependencies). Part 1 of the series explained Microsoft’s technique for exposing a dynamic set of classes and functions for the test framework to call. This is typically solved with reflection, but given that there is no standard mechanism for reflection in C++, some neat tricks with the binary’s sections were played. Part 2 explores the second step after discovery: execution. How can we take the metadata information and actually have it run some code? It’s essentially a plugin system, in a plugin system. Part 2 left off where we were able to actually execute our test methods, but when an Assertion would fail, the entire framework would collapse on itself.

I lift my head from my hands, wipe the tears from my eyes, and get real with the situation. I’ve started down this path of re-implementation, why stop now?

Among the package that is shipped with Microsoft’s C++ Unit Test Framework, there is a file called Assert.h. This is the header that you’re to include, in order to make assertions in your tests. Asserting is a critical part of Unit Testing. The three portions of a unit test are essentially:

  1. Setup
  2. Act
  3. Assert

There are fancy acronyms describing this (AAA); I prefer mine (SAA). In a general sense, in your test you will do these three things. You set up the test to have your code under test in the state it needs to be. You run the code you want to test. Then you Assert that what was supposed to happen happened, and potentially Assert that other things that weren’t supposed to happen, didn’t (though this becomes a slippery slope into a rabbit hole). That being said, asserting is a very important part of unit testing, arguably the most important part. So we’ve hit a bit of a conundrum, because negative assertions are the part that tells us when our code isn’t working correctly. Anyone who is experienced with TDD will tell you there is more value in the red lights than the green ones. That means it’s a problem when our framework crashes and burns on a failed Assert.

So, to follow suit with re-implementation. I decided that I would just do away with Microsoft’s Assert library, in favour of my own. Obviously this is the best idea possible. Don’t get me wrong, it’s not like I didn’t try to figure out what was happening with their Assert library. The problem is there is less “public” information in it. Meaning that unlike the CppUnitTest.h file where a lot of the work was in the header because it was template code, most of the assertion code, lived in a compiled binary. The only code in the Assert.h file was template code for comparing generic types. It means I had no real way to figure out what they were doing. All I knew is that whatever they were doing, was crashing my application, and it worked for theirs. So I’ll make one that works with my framework. Now, you might be thinking.

Of what value is re-implementing the Microsoft C++ Unit Test Framework? Is there any actual value in re-implementing part of their library?

The answer is probably not, but if you’re curious like me, you like to figure out how things work. The way I do this, is I look at something and implement it myself. If I can do that, I typically run into the problems that the original author ran into, and then I can understand why they solved the problem the way they did. I’ve approached software development this way for my entire professional career, and I’d like to think it has paid dividends.

In all honesty, how hard could writing an assertion library be? Like, basically you only check a few different things. Are these two things equal? Are these two things not equal? Is this thing null? Is this thing not null? If the test passes, don’t do anything. If the test fails, stop executing and barf out some kind of message. If we ponder this for a moment, can we think of something we could use to halt execution and spit out some form of a message? I know! I’ve written this code a lot.

if ( !some_condition )
    throw condition_violated_exception("Violated Condition");

Well, exceptions look like a good place to start for our assertion library. So that’s where we’re going to start. The plan is essentially to have a bunch of Assert calls, that when they fail will throw an exception. Easy right? The code could look something like this.

// MyAssert.h

#include <AssertViolatedException.h>


namespace MyAssert
{
    template <typename T>
    static void AreEqual(const T& expected, const T& actual)
    {
        if( expected != actual )
           throw AssertViolatedException(fmt::format("{0} != {1}", expected, actual));
    }
};

By no means is this complete, it’s really just to illustrate the point that when we have two objects that aren’t equal we generate an exception with an appropriate message.

God it feels good to be a gangster.

Now we can go about our mission; we’ll just use MyAssert.h, and completely disregard Microsoft’s version, Assert.h. Given that we’ve implemented it so that any escaped standard C++ exception ends up in the framework’s handler, I can guarantee that the assertions will end up there. Right? Given that I didn’t show a snip of AssertViolatedException.h, you can assume that it’s derived from std::exception. If you’re interested in how I actually implemented it, you can find the file here. It has a little bit more complexity, for capturing line information, but largely it’s the same idea.

I’m sure this is EXACTLY how Microsoft would’ve done it.

After we’ve implemented this, we can use it in the same way that you would if you were to use the Assert library included in Microsoft’s framework.

#include <MyAssert.h>
#include "calculator.h"

TEST_CLASS(TestCalculator)
{
public:
    TEST_METHOD(TestAdd)
    {
          calculator c;
          int val = c.add(2, 2);
          MyAssert::AreEqual(4, val);
    }
};

This is great, and it works, for the most part. It works for this case. Can you see where it falls down? If we recall, we’re using a standard C++ exception for our assertion. Unfortunately, the compiler doesn’t care whether the exception originates from our MyAssert class or any other place in the application. This means that any handler prepared to handle a std::exception will catch our assertion. Consider this code.

#include <functional>
class calculator
{
   std::function<int(int, int)> on_add_;
public:
     template <typename AddFnT>
     void on_add(const AddFnT &on_add)
     {
           on_add_ = on_add;
     }

     int add(int a, int b)
     {
          try 
          {
                return on_add_(a, b);
          }
          catch(const std::exception &e)
          {
                 return 4;
          }
     }
};
#include <MyAssert.h>
#include "calculator.h"

TEST_CLASS(TestCalculator)
{
public:
    TEST_METHOD(TestAdd)
    {
        calculator c;
        c.on_add([](int a, int b)
        {
               MyAssert::AreEqual(2, a);
               MyAssert::AreEqual(2, b);
               return a + b;
        });
        int val = c.add(2, 22); // see the error here? Tyop.
        MyAssert::AreEqual(4, val);
    }
};

This isn’t by any means good code, nor does it really make sense. Everyone knows developers are notorious for cutting corners to save time; someone decided 4 was a good error code, and someone made a typo in the test. The unfortunate thing about this is that it passes. The light is green, but the code is wrong. Now, you’re saying “no one would ever do this in their right mind.” Consider the case where you have a layer between your business logic and your database. You call the business logic function, it does some work, and it passes the values to the store function. A way to test that you’re populating the DB with the right values is to abstract the database layer and intercept at that level. You also likely want some error handling there, in case an exception throws from the database layer. So there you go; at this point our Assert library falls down, hard.

It’s been a long read, and you may feel like you’re getting ripped off at this point, because I haven’t really explained much. Realistically, we actually just learned a whole lot. So I encourage you to keep reading. Microsoft has an Assert library that you can use to make assertions about your program. The generally accepted form of “assertions” within an application is an exception. Microsoft can’t use standard exceptions in their framework, because it could interact with the application under test. I just proved that, by trying it. So what the hell did they do? Well the application crashes, that means something is happening.

Most modern-day programmers are familiar with exceptions; most people just see them as the de facto standard for error handling. (I really want to get into talking about return values vs. exceptions for error handling, but that will take me down a rabbit hole.) To keep it short, exceptions in various languages allow for your normal program flow to be separated (for the most part) from your exceptional program flow. Wikipedia defines exceptions as “anomalous or exceptional conditions requiring special processing”. If you’re familiar with exceptions and handling them, you’ve probably seen try/catch before. If you’re familiar with modern C++, you probably at least know of this concept. What you might not know is that the C++ standard only defines what an exception is, not how it is implemented. This means that the behaviour of exceptions and handling them in C++ is standardized, but the way that compiler vendors implement them is free game. Another thing people might not know is that languages like C, and older languages, don’t have a concept of exceptions. An exception can come from a custom piece of code, like the one above where we throw the exception, OR from somewhere deeper down; maybe it’s a hardware exception like divide by zero, or an out-of-memory exception. The end result in C++ is the same, hence why it’s called a “standard” exception. The algorithm is pretty simple: find an appropriate handler, unwind the stack up to that handler, then call the handler. You ever wonder how, though?

Well, low-level errors like divide by zero will come from the hardware, generally in the form of an interrupt. So how do we go from a hardware-level interrupt to our C++ runtime? On Windows, this is called Structured Exception Handling (SEH). It is Windows’ concept of exceptions, within the OS itself. There’s a good explanation of this in the book Windows Internals, Part 1 by Mark Russinovich. At a high level, the kernel will trap the exception; if it can’t deal with it itself, it passes it on to user code. The user code can then either A) deal with the exception and report back stating that, or B) tell the kernel to continue the search. Because this is at the OS level, it is not a language-specific thing. So runtimes built on Windows will use this to implement exceptions within the language. This means that MSVC uses SEH to implement C++ exceptions within the C++ runtime. Essentially, the runtime generates a Structured Exception Handler for each frame, and within this the runtime can search for the appropriate handler in the C++ code, unwind the stack and call the destructors of the objects, then resume execution in the handler. Obviously, these generated Structured Exceptions are well known to the C++ runtime, so it knows how to appropriately deal with the exception.

What if Microsoft was using a Structured Exception for their assertion? The behaviour lines up with that hypothesis, in that something is generated on a failed assertion that crashes the application. In SEH, if there isn’t an appropriate handler found, the application will be terminated. How can we prove that? Well, it turns out it was easy, though it’s not recommended. Microsoft recommends if you’re using exceptions in C++ you use standard exceptions, but there is a Windows API that you can use in your code to do SEH.

#include "Assert.h"
TEST_METHOD(AssertFail)
{
   __try
   {
       Assert::AreEqual(0,1);
   }
   __except(EXCEPTION_EXECUTE_HANDLER)
   {
       Logger::WriteLine(L"Gotcha!");
   }
}

After we compile and run this code, it’s pretty obvious what’s going on. When the Assert::AreEqual fails, we land smack dab in the handler. So I guess that mystery is solved. We just need to figure out how and where to do the exception handling. Now, the __try/__except/__finally keywords are built into C++ on Windows, and allow us to basically install a frame-based exception handler. They work very similarly to the way you would expect a try/catch to work. After some research I decided this wasn’t exactly what I wanted; I wanted to be able to catch an exception regardless of stack frame. I stumbled upon Vectored Exception Handling. This is an extension to SEH that allows you to install a handler that gets called regardless of where the exception occurs, so you can globally register a handler for the whole process.

The solution then is rather straight-forward. We just need to register an Exception Handler, when the exception throws we can catch it, record the error and continue on our way.  If you read the code in the repository, I had to go through a bunch of layers of indirection to actually get the message to pass to the Test Window Framework. That’s because the architecture of the application has a .NET component, a .NET/CLI component and a pure native component. So for sake of simplicity the way that I went about handling the exception was like this, but not exactly this.

// assumed to be a static member; a plain member function pointer
// can't be passed to the registration API
LONG WINAPI TestModule::OnException(_EXCEPTION_POINTERS *exceptionInfo)
{
    // if the code is the MS Unit Test exception
    if (exceptionInfo->ExceptionRecord->ExceptionCode == 0xe3530001)
    {
        NotifyFailure(reinterpret_cast<const wchar_t*>(
            exceptionInfo->ExceptionRecord->ExceptionInformation[0]));
        return EXCEPTION_CONTINUE_EXECUTION;
    }
    else
        return EXCEPTION_CONTINUE_SEARCH;
}

auto h = ::AddVectoredExceptionHandler(1, &TestModule::OnException);

This is where I had to do a bit of guess work. If you recall, this handler will get called for all exceptions, but we only want to do something when it’s an Assert exception. So I had to make the assumption that Assert threw an exception with the code 0xe3530001. Then I did a bit of sleuthing in the memory, to see that a pointer to the message was stored in the first index of the ExceptionRecord’s ExceptionInformation. With that I could grab the message and fail appropriately. That being said, I’m not sure if this solution lines up 100% with Microsoft’s functionality.

To summarize this long journey: I set out to set some things right with the behaviour of Microsoft’s C++ Unit Test Framework. It started out as something fun to investigate, and it turned out to be a great learning experience. Out of all the projects that I’ve worked on, I’ve probably learned the most about ingenuity from this one. There are a lot of neat tricks, cool uses of obscure APIs, and really just overall an interesting view of how Microsoft engineers their tools. You might be wondering to yourself if it was actually worth it. For me, it was about learning; it was about facing the challenges and working to understand them. It wasn’t ever really about replacing what ships with Visual Studio. So yes, it was worth it. Though I would love it if Microsoft could fix the issues… If I can do it, they most certainly can.

Recapping the issues that I ended up solving:

  1. Report a better error on standard exceptions [check]
  2. Report a better error for binaries that fail to load
  3. Support Test Names with Spaces [check]

As you can see, I only solved 2 of the 3 things I set out to solve! The last one is kind of a cop out, because I sort of just lucked into it. When I was messing around with the class attributes, I enhanced my framework to understand a special attribute that can change the test name, rather than just getting it from the method name. So you could specify a test name with a space.

Reporting a better error when binaries fail to load is really hard. What makes it hard is that there isn’t (that I can find) a good API to report the missing dependency. This is something you need to hand roll. Now, there are tools that do it, specifically Dependency Walker. But my understanding is that I would need to roll my own dependency-walking algorithm. That unfortunately will be a story for another day.

I really hope you enjoyed reading this series. I had quite a bit of fun working on this project, and a lot of fun writing about it.

“What we call the beginning is often the end. And to make an end is to make a beginning. The end is where we start from.” — T.S. Eliot

Happy Coding!

PL

 

References:

MSDN on Structured Exception Handling

https://www.codeproject.com/Articles/2126/How-a-C-compiler-implements-exception-handling

The one where we reverse engineered Microsoft’s C++ Unit Test Framework (Part 2)

If you haven’t already read Part 1 of this series, then I suggest giving it a skim. If not, I’ll give a quick TL;DR.

A few years ago, I was frustrated with some of the idiosyncrasies of Microsoft’s C++ Unit Test Framework. I set out on a mission to develop a custom test adapter, to solve my problems. It would essentially replace the stock adapter that lives in Visual Studio’s Test Explorer window. There would be no difference in writing tests; the only differences would be in the execution of the tests. Things like capturing standard C++ exceptions and displaying the text, and also understanding why binaries were failing to load. It was a bit of a mission. Part 1 of this series dives into the mechanisms that Microsoft has developed to expose metadata about the tests: a semi-reflection of sorts for C++, allowing inspection of the binary, without loading it, to discover executable tests. They did this using a set of macros, storing information about the test functions in special sections of the binary. That exercise in discovery was very fun and exciting for me. But it didn’t stop there; we still needed to figure out execution of these tests.

“I have no idea what I’m doing.”  I sat with my head in my hands. I was trying to reason out why the test executor was crashing the engine when test asserts failed. 

Writing a test adapter that plugs into Visual Studio is a relatively simple task. You need a managed (.NET/CLR) library, that publicly exposes two interfaces.

public interface ITestDiscoverer
{
     void DiscoverTests(IEnumerable<string> sources, IDiscoveryContext discoveryContext, IMessageLogger logger, ITestCaseDiscoverySink discoverySink);
}

public interface ITestExecutor
{
    void Cancel();
    void RunTests(IEnumerable<string> sources, IRunContext runContext, IFrameworkHandle frameworkHandle);
    void RunTests(IEnumerable<TestCase> tests, IRunContext runContext, IFrameworkHandle frameworkHandle);
}

The first interface is used to populate the Test Explorer with the test information. If you’re unfamiliar with the Test Window in Visual Studio, you should get familiar with it; it’s your friend, and can be a great teacher if used correctly. You can display it by selecting Test -> Windows -> Test Explorer. It’s come a long way since Visual Studio 2012, and I’m sure Microsoft will keep enhancing it in future versions. There’s a reason Microsoft is investing in this unit test technology: unit testing is an important part of software development. With a few C# attributes, sprinkled with some reflection, you could easily craft your own managed test executor. You describe what types of files your discoverer discovers, and you implement this function. It will get called after a successful build (I’m not sure how live testing affects this), telling your discoverer the list of files to discover tests in. Your discoverer should then load the files, look for tests, then send the Test Cases to the Discovery Sink, which will have those tests displayed in the Test Explorer window. From the last post, you can see how we could implement the ITestDiscoverer interface in C++/CLI and then use the library we created to walk the binary searching for test cases. So I won’t go into detail on that.

The next actual hurdle, is with execution of the tests, this is done with the ITestExecutor interface. Again, I will leave it up to your imagination, or you can look at my project here to see how this gets tied into the managed world of Visual Studio. I will be describing how we dive into the actual native execution of these tests.

If we step behind the curtains and think about what execution of a test really is, it’s just a fancy way of executing a function (or method if you prefer) on a class. There is some ‘execution engine’, the process in which this will live and occur; that process will load your library (or executable for that matter), instantiate one of your test classes, then execute the method, or ‘Test Case’, on your class. This is a bit of an oversimplification, but for all intents and purposes, that is the ‘magic’ behind the Test Explorer window. Now, if you’re doing shared libraries, or DLLs on Windows in C++, there are two ways to call exported functions. The first method is to use project referencing (or include the header, and add the .lib file path) and allow the compiler and linker to do the work. Another approach is to use ::LoadLibrary and dynamically load the binary yourself. The downside to using the compiler and linker is that you have to know the classes and libraries at compile time, but it takes care of all the details of loading the library and linking for you. The benefit to using ::LoadLibrary is that you’re not tied to binaries at compile time. You could use this, along with some interfaces, to create a plugin system. The downside is that there are only implicit contracts between the libraries you’re loading and the application; the compiler cannot enforce that the plugin is implemented correctly.

Our test discoverer and executor is, in essence, a plugin which loads plugins. Each of the test libraries is in fact a plugin exposing an interface that we want to call on. So we need to use a method where we dynamically load the libraries at run-time. When you’re doing dynamic loading of DLLs, it isn’t enough to simply load the DLL into your process. You have to know what and where you want to call. With C++, using classes, this concept gets harder to see. So, as I love to do, we will simplify and go back to an age where these things were more straight-forward. The age of C.

Let’s imagine we were going to write some C application, that would load different plugins to print some text. This is a relatively simple problem, and it will illustrate the point. The design is simple, we could have two components. We would have our executable “Print Application” and a set of “Printer Driver” plugins or DLLs.

// Printer Application
#include <Windows.h>
#include <stdio.h>
#include <string.h>

// a function pointer type, describing the signature of our plugin print function
typedef int (*PrinterFunction)(char*, int);

// our print function
int print(char *driverPath, char *textToPrint, int length)
{
    // We want to load our driver
    HMODULE library = LoadLibraryA(driverPath);
    if (library == NULL)
        return 0; // we failed to load, can't do anything.

    // We want to call our driver's print function
    PrinterFunction printFunction = (PrinterFunction)GetProcAddress(library, "printText");
    if (printFunction == NULL)
        return 0; // no function in the DLL, can't print.

    // finally print.
    return printFunction(textToPrint, length);
}

int main(int argc, char **argv)
{
    int printed = print(argv[1], argv[2], (int)strlen(argv[2]));
    printf("You printed %d bytes to %s", printed, argv[1]);
}

 

// Old Brother Printer Driver
#include "OldBrotherPrinter.h"

// The __declspec(dllexport) here tells the compiler to expose the function
__declspec(dllexport) int printText(char *textToPrint, int length)
{
    PRINTER p = ConnectToOldBrother("COM1");

    return SendBytesToPrinter(p, textToPrint, length);
}

If you were to compile and run this, you would get an .exe and a .dll. One is the application itself and the other is our plugin printer library. When we run our application, we can give it the path to our OldBrotherPrinter.dll and some text, and it should print our text.

There are two very important things to note here. The first is the function pointer that we’ve declared. This means that we know the signature of the function that we want to call. It’s a function that takes a character pointer, and an int as arguments, then returns an int. The second part is that we know the name, “printText”. Now, if the library doesn’t expose a function called “printText” we can’t get the address to it. If it’s not the same signature, we’re going to have a bad time calling it. There are some implicit contracts between the caller and the library implementer. The call to ::LoadLibrary, will load the binary into our memory space. The ::GetProcAddress call, will find the address to that function in our memory space, so that we can make a call on it. We need to cast the return to our known signature, so that we can call it with our arguments. The take-away from this exercise, is that we need to know the name of the function, and the signature, to be able to load and call it on a loaded library.

The reason that I needed to explain this using C is because it is less complex than C++. As we know, in C++ there are things like classes and, more importantly, function overloading. In plain C, you could see the function was exported under the name “printText”. This is because in C, you can only have ONE function named “printText”. In C++, we have the concept of function overloading, allowing us to do something like this.

// Printer Functions.
#include <string>

int printToText(int someInteger);
int printToText(const std::string &someText);
int printToText(char someCharacter);

If you’re asking “well, how can that be? They’re all named the same; how can you differentiate them?”, that’s the right question. This is done by something called ‘name decoration’. The function names really look more like this, from MSVC 19 in Visual Studio 2019.

// Decorated function names
?printToText@@YAHH@Z
?printToText@@YAHABV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@@Z
?printToText@@YAHD@Z

Now, when you mix in classes, the decoration gets a little bit more complicated. Code like this.

// Printer Class
#include <string>
class PrinterClass
{
    const std::string myData_;
public:
    int __declspec(dllexport) printToText()
    {
        return 0;
    }
};

Will result in a decorated function named something like this.

?printToText@PrinterClass@@QAEHXZ

If you somehow could know the decorated name, then you could load that function by its decorated name. Alright, at this point you’re probably thinking. Are we ever going to get back to talking about C++ Unit Test Framework? Realistically, that’s what you came here to read. However, this is really important background. I hope you can see the foreshadowing. If not, I’ll lay it out on the table.

In order to dynamically load a library, and make a call into it. We need to know three things.

  1. The name of the library
  2. The name of the function we want to call
  3. The signature of the function we want to call

So, knowing we need those things, I hope you’re asking yourself: which ones do we have? Which ones don’t we have, and how do we get them? Well, I can tell you: we have 1 and we have 3. What we are missing is the all-important name of the function we want to call. The framework will supply us the name of the binary, and we know the function signature is ‘void (void)’, so we just need the name of the function we want to call.

Huh. How the heck are we going to get that? A user can name the function whatever they want to name it. We also have the added pain that the functions live in classes, which the user can also name. Stumped yet? Yeah — me too. When I’m stumped, I go back to the drawing board. In this case, let’s go back to reviewing that “CppUnitTest.h” file. Do you recall back to when we looked at the TEST_METHOD macro? If not, it looks like this.

#define TEST_METHOD(methodName)\
static const EXPORT_METHOD ::Microsoft::VisualStudio::CppUnitTestFramework::MemberMethodInfo* CALLING_CONVENTION CATNAME(__GetTestMethodInfo_, methodName)()\
{\
    __GetTestClassInfo();\
    __GetTestVersion();\
    ALLOCATE_TESTDATA_SECTION_METHOD\
    static const ::Microsoft::VisualStudio::CppUnitTestFramework::MethodMetadata s_Metadata = {L"TestMethodInfo", L#methodName, reinterpret_cast<const unsigned char*>(__FUNCTION__), reinterpret_cast<const unsigned char*>(__FUNCDNAME__), __WFILE__, __LINE__};\
\
    static ::Microsoft::VisualStudio::CppUnitTestFramework::MemberMethodInfo s_Info = {::Microsoft::VisualStudio::CppUnitTestFramework::MemberMethodInfo::TestMethod, NULL, &s_Metadata};\
    s_Info.method.pVoidMethod = static_cast<::Microsoft::VisualStudio::CppUnitTestFramework::TestClassImpl::__voidFunc>(&methodName);\
    return &s_Info;\
}\
void methodName()

You can see that one of the macros being used is __FUNCTION__, and another is __FUNCDNAME__. Well, we know __FUNCTION__ will give us the undecorated name of the function; maybe __FUNCDNAME__ would be a decorated one? Thank you, Microsoft documentation!

‘__FUNCDNAME__: Defined as a string literal that contains the decorated name of the enclosing function. The macro is defined only within a function. The __FUNCDNAME__ macro is not expanded if you use the /EP or /P compiler option.

This example uses the __FUNCDNAME__, __FUNCSIG__, and __FUNCTION__ macros to display function information.’

Well color me stoked, we just made the next tiny step. A decorated function name! But this macro is weird. Do you remember what the memory looked like?

0x07462D94  54 00 65 00 73 00 74 00 4d 00 65 00 74 00 68 00 6f 00 64 00 49 00 6e 00 66 00 6f 00 00 00 00 00 00 00 00 00 44 00  T.e.s.t.M.e.t.h.o.d.I.n.f.o.........D.
0x07462DBA  75 00 6d 00 6d 00 79 00 41 00 73 00 73 00 65 00 72 00 74 00 00 00 00 00 00 00 00 00 00 00 43 50 50 55 6e 69 74 54  u.m.m.y.A.s.s.e.r.t...........CPPUnitT
0x07462DE0  65 73 74 49 6e 76 65 73 74 69 67 61 74 6f 72 54 65 73 74 3a 3a 6e 65 73 74 65 64 3a 3a 44 75 6d 6d 79 43 6c 61 73  estInvestigatorTest::nested::DummyClas
0x07462E06  73 3a 3a 5f 5f 47 65 74 54 65 73 74 4d 65 74 68 6f 64 49 6e 66 6f 5f 44 75 6d 6d 79 41 73 73 65 72 74 00 00 00 00  s::__GetTestMethodInfo_DummyAssert....
0x07462E2C  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 3f 5f 5f 47 65 74 54 65 73 74 4d 65 74 68 6f 64 49 6e  ....................?__GetTestMethodIn
0x07462E52  66 6f 5f 44 75 6d 6d 79 41 73 73 65 72 74 40 44 75 6d 6d 79 43 6c 61 73 73 40 6e 65 73 74 65 64 40 43 50 50 55 6e  fo_DummyAssert@DummyClass@nested@CPPUn
0x07462E78  69 74 54 65 73 74 49 6e 76 65 73 74 69 67 61 74 6f 72 54 65 73 74 40 40 53 47 50 42 55 4d 65 6d 62 65 72 4d 65 74  itTestInvestigatorTest@@SGPBUMemberMet
0x07462E9E  68 6f 64 49 6e 66 6f 40 43 70 70 55 6e 69 74 54 65 73 74 46 72 61 6d 65 77 6f 72 6b 40 56 69 73 75 61 6c 53 74 75  hodInfo@CppUnitTestFramework@VisualStu
0x07462EC4  64 69 6f 40 4d 69 63 72 6f 73 6f 66 74 40 40 58 5a 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  dio@Microsoft@@XZ

Hmm, the decorated name is.

?__GetTestMethodInfo_DummyAssert@DummyClass@nested@CPPUnitTestInvestigatorTest@@SGPBUMemberMethodInfo@CppUnitTestFramework@VisualStudio@Microsoft@@XZ

It looks like the function name is __GetTestMethodInfo_DummyAssert(). That’s not the name of our function. Our function was called DummyAssert. Color me confused. Looking back at the macro, now we can see it actually just generates a metadata function, and then starts our function’s definition. So it was never really capturing our method name at all; it was capturing metadata about our function. Shoot! How do we call it now?

Ahhhhh! Time to breathe. Time to rack our brains. Time to dig deep.

Well — what exactly is this MethodMetadata for? They wouldn’t put it in for no reason. So it’s gotta be useful. If we look closely, and kind of expand the macros, removing some non-essentials, that function boils down to this.

static const ::Microsoft::VisualStudio::CppUnitTestFramework::MemberMethodInfo* __GetTestMethodInfo_DummyAssert()
{
    // removed lines above for simplicity.
    static ::Microsoft::VisualStudio::CppUnitTestFramework::MemberMethodInfo s_Info = {::Microsoft::VisualStudio::CppUnitTestFramework::MemberMethodInfo::TestMethod, NULL, &s_Metadata};
    s_Info.method.pVoidMethod = static_cast<::Microsoft::VisualStudio::CppUnitTestFramework::TestClassImpl::__voidFunc>(&DummyAssert);
    return &s_Info;
}

We can see that they statically allocate a MemberMethodInfo object, set a member on it called pVoidMethod, and then return the address of it. The __GetTestMethodInfo_DummyAssert function returns a const MemberMethodInfo*. Now we’re getting somewhere: this function captures a pointer to the actual test method. So the algorithm we want is something along the lines of.

1. Use our tool set to scan for the MethodMetadata
2. Load the library into memory
3. Load the __GetTestMethodInfo_X function, by its decorated name in the metadata
4. Call this function, to return us the MemberMethodInfo
5. Make our call on the method.pVoidMethod function pointer

Could it be so simple? Unfortunately not. If you recall from our simple example, we used free functions. What I mean by free functions is functions that aren’t associated with any data. I’m sorry, what? This has nothing to do with the problem at hand.

Yay! Another history lesson. If we compare procedural programming vs. object oriented programming, we can look at procedural programming as a set of defined functions, where we enter into one, it calls another, and another, so on and so forth. There is procedure to its logic. Whereas with object oriented programming, we have this concept of a class or object that gets instantiated and has a set of methods that operate on it; those may act on other objects, so on and so forth. Thus, it can be harder to follow, with object A calling object B, etc. However, the two principles aren’t all that different if you look at them from a different perspective. You can actually model object oriented programming in a procedural language. You do this by creating some data model, and a set of functions that work against it. Consider this C-like code.

struct Person
{
   char *name;
   int age;
};

void construct(Person *p, char *name, int age)
{
     p->name = malloc(strlen(name) + 1); /* +1 for the null terminator */
     strcpy(p->name, name);
     p->age = age;
}

void printName(Person *p)
{
    printf("My name is %s", p->name);
}

void destruct(Person *p)
{
    free(p->name);
}

As you can see, this looks a lot like a simple C++ class. You have a constructor, a destructor, and a simple printName function. Notice that each function operates on a Person*. I didn’t invent this pattern, or discover it. In fact, this was the beginnings of C++. Of course, C++ has come a long way, but still, at its core, a class or object is a set of functions that operate on a chunk of data. Member functions in C++ take a pointer to that data, the instance, as an implicit first argument. When I said that we had only worked on free functions, I meant our library example only worked against functions that were not being called on object instances. The void functions in our test methods act against the test class instance. Therefore, we can’t just “call” the function outright, or bad things will happen; demons could fly out of our noses. We don’t want that. It has to work with an instance of our data, our class. So that means we need to actually create an instance of our class first.

So, in order to do that we need to know what our class is. This is really getting complicated. Let’s look at that file again, to see if we can glean some details.

// This is a part of the VSCppUnit C++ Unit Testing Framework.
// Copyright (C) Microsoft Corporation
// All rights reserved.

///////////////////////////////////////////////////////////////////////////////////////////
// Macro to define your test class. 
// Note that you can only define your test class at namespace scope,
// otherwise the compiler will raise an error.
#define TEST_CLASS(className) \
ONLY_USED_AT_NAMESPACE_SCOPE class className : public ::Microsoft::VisualStudio::CppUnitTestFramework::TestClass<className>

...

#pragma pack(push, 8)
    struct TestClassInfo
    {
        TestClassImpl::__newFunc pNewMethod;
        TestClassImpl::__deleteFunc pDeleteMethod;

        const ClassMetadata *metadata;
    };
#pragma pack(pop)

...

template <typename T>
class TestClass : public TestClassImpl
{
    typedef T ThisClass;

public:
    static TestClassImpl* CALLING_CONVENTION __New()
    {
        CrtHandlersSetter setter;
        return new T();
    }

    static void CALLING_CONVENTION __Delete(TestClassImpl *p)
    {
        CrtHandlersSetter setter;
        delete p;
    }


    // assume method matches this pointer
    virtual void __Invoke(__voidFunc method)
    {
        typedef void (ThisClass::*voidFunc2)();
        voidFunc2 method2 = static_cast<voidFunc2>(method);

        CrtHandlersSetter setter;
        (static_cast<ThisClass *>(this)->*method2)();
    }

    static EXPORT_METHOD const ::Microsoft::VisualStudio::CppUnitTestFramework::TestClassInfo* CALLING_CONVENTION __GetTestClassInfo()
    {
        ALLOCATE_TESTDATA_SECTION_CLASS
        static const ::Microsoft::VisualStudio::CppUnitTestFramework::ClassMetadata s_Metadata = {L"TestClassInfo", reinterpret_cast<const unsigned char*>(__FUNCTION__), reinterpret_cast<const unsigned char*>(__FUNCDNAME__)};

        static const ::Microsoft::VisualStudio::CppUnitTestFramework::TestClassInfo s_Info = {&__New, &__Delete, &s_Metadata};
        return &s_Info;
    }

    static EXPORT_METHOD const ::Microsoft::VisualStudio::CppUnitTestFramework::TestDataVersion* CALLING_CONVENTION __GetTestVersion() 
    {
        ALLOCATE_TESTDATA_SECTION_VERSION
        static ::Microsoft::VisualStudio::CppUnitTestFramework::TestDataVersion s_version = { __CPPUNITTEST_VERSION__ };

        return &s_version;
    }
};

Here, we see the same pattern. This method, __GetTestClassInfo(), has ClassMetadata, which holds the decorated name of the __GetTestClassInfo() method itself. We can load that method and call it. From there, the TestClassInfo object has pointers to a __newFunc and a __deleteFunc. This was the key to unlocking our success! We can see the finish line now. The macro TEST_CLASS ensures that you derive from the template class TestClass<T>. It uses CRTP to be type aware of our class, and defines two static functions: one called __New(), which returns a TestClassImpl* to a new instance of T (our type), and another called __Delete(TestClassImpl*), which deletes it. It also defines a function called __Invoke(__voidFunc), which invokes a void method against ‘this’. TestClassImpl is defined as.

// This is a part of the VSCppUnit C++ Unit Testing Framework.
// Copyright (C) Microsoft Corporation
// All rights reserved.

class TestClassImpl
{
public:
    TestClassImpl() {}
#ifdef FEATURE_CORESYSTEM
    virtual ~TestClassImpl() {}
#else
    virtual ~TestClassImpl() noexcept(false) {}
#endif

    typedef TestClassImpl* (CALLING_CONVENTION *__newFunc)();
    typedef void (CALLING_CONVENTION *__deleteFunc)(TestClassImpl *);

    typedef void (TestClassImpl::*__voidFunc)();

    virtual void __Invoke(__voidFunc method) = 0;

protected:
    struct CrtHandlersSetter
    {
    typedef void (__cdecl *INVALID_PARAMETER_HANDLER)(const wchar_t* pExpression, const wchar_t* pFunction, const wchar_t* pFile, 
    unsigned int line, uintptr_t pReserved);

        CrtHandlersSetter()
        {
            if(IsDebuggerAttached())
            {
                debuggerAttached = true;
                return;
            }
            
            debuggerAttached = false;
            // Suppress the assert failure dialog.
            oldReportMode = _CrtSetReportMode(_CRT_ASSERT, _CRTDBG_MODE_FILE);
            oldReportFile = _CrtSetReportFile(_CRT_ASSERT, _CRTDBG_FILE_STDERR);
            // Set the handler
            oldInvalidParameterHandler = _set_invalid_parameter_handler(reinterpret_cast<INVALID_PARAMETER_HANDLER>(InvalidParameterHandler));
        }
        
        ~CrtHandlersSetter()
        {
            if(debuggerAttached)
            {
                return;
            }
            
            _CrtSetReportMode(_CRT_ASSERT, oldReportMode);
            _CrtSetReportFile(_CRT_ASSERT, oldReportFile);
            _set_invalid_parameter_handler(oldInvalidParameterHandler);
        }

private:
        // Check if a debugger is attached.
        __declspec(dllexport) static bool __stdcall IsDebuggerAttached();
       // The handler for invalid parameters
        __declspec(dllexport) static void __cdecl InvalidParameterHandler(const unsigned short* pExpression, const unsigned short* pFunction, const unsigned short* pFile, 
    unsigned int line, uintptr_t pReserved);
    
private:
        _invalid_parameter_handler oldInvalidParameterHandler;
        int oldReportMode;
        _HFILE oldReportFile;
        bool debuggerAttached;
     };
};

So __Invoke(__voidFunc) is a pure virtual function that will allow us, using the miracle of polymorphism, to make the call into our class from just a pointer to TestClassImpl.

You can see that we have everything we need.

1. Load the binary metadata, and find the test method we want
2. Determine which class it exists in
3. Load the ClassMetadata, to get the decorated name of the __GetTestClassInfo() function
4. Call __GetTestClassInfo() to retrieve the class info
5. Use the pointer to __New(), to create an instance of the class
6. Use the decorated __GetTestMethodInfo_X() name, to call it and retrieve the method info
7. Use the method info, to __Invoke(__voidFunc), on the instance of TestClassImpl we created earlier.
8. Success.

So, I went and did just that. I made the test executor do the steps above. It invoked the method, the test ran. I was over the moon. It was time to fix my first issue. I hurriedly wrapped the __Invoke(__voidFunc) call in a try/catch, like this.

// MsCppUnitTestAdapter.cpp:210 
void VsTestAdapterExecutionContext::ExecuteMethod(System::String ^methodName, TestResult ^r)
{
    ResultRecorder cb(r); 
    static_cast<ResultReporterExceptionHandler*>(handler_)->Reset(&cb);
    try
    {           
        auto className = context_->Info().GetClassNameByMethodName(MarshalString(methodName));
        // look up the test class by its name; if it hasn't been loaded into the execution context yet, create it.
        TestClass_ *tc = nullptr;
        if (!stdx::find_if(*classes_, tc, FindByClassName(className)))
            tc = context_->CreateClass(className); // if you didn't call load, you won't get the class initialize.

        tc->Reset(); // this will reset the class i.e. create a new instance
        tc->InvokeMethodSetup();
        cb.OnStart();
        tc->InvokeMethod(MarshalString(methodName));
        cb.OnComplete();
        tc->InvokeMethodCleanup();
    }
    catch (const std::exception &e)
    {
        cb.OnError(System::String::Format("Uncaught C++ exception. {0}", gcnew System::String(e.what())));
    }
    catch (...)
    {
        cb.OnError("Unknown C++ Exception");
    }
    static_cast<ResultReporterExceptionHandler*>(handler_)->Reset();
    
}

I didn’t go into any detail about how I ended up getting the class name from the function name. The simple answer is that I parse the class name out of the function name. I also didn’t go into detail about setup/teardown of the classes and modules. The above snip does some of that, and also does some reporting about the tests. I’ll admit it’s a bit messy, but it works. You can see that I catch std::exceptions and print the error. Now, if a C++ std::exception escapes the test method, my framework will catch it and print the error.

By this point, I was over the moon. I had done something harder than I thought I could, and I really pushed my understanding of how these things worked. I had run some tests, and I was getting green lights in the Test Explorer window. I had let some std::exceptions escape, and saw the tests fail correctly. I made sure the exception information was displayed in the test window. Time to try some negative assertion tests. I set up a test that had a bad assertion, something like.

TEST_METHOD(willAssert)
{
    Assert::AreEqual(3, 4, L"They're not equal");
}

Each time I ran ‘willAssert’, the test would stay semi-opaque, as if I hadn’t run it at all. When I watched the Task Manager, the test execution engine process would disappear when I ran the test. Oh no. 

I put my head into my hands. I have no idea what I’m doing.

I hope that Part 2 of this series was as entertaining as the first part. I actually really loved putting the execution part of this code together. It was such a puzzle. Stay tuned for the next piece of the puzzle, where we explore Structured Exception Handling.

“Magic lies in challenging what seems impossible” — Carol Moseley Braun

Happy Coding!

References

Predefined Macro Definitions
The one where we reverse engineered Microsoft’s C++ Unit Test Framework (Part 1)

Have you ever really been interested in Microsoft’s C++ Unit Test Framework? I mean really interested? Like where you’d go to great lengths to figure out how it works. Well, I have… The story goes back a few years, maybe 3 or so.

At this point in my career I was deep into C++ development, and I had fallen in love with unit testing. I had fallen in love with Visual Studio’s Test Window, and the ease with which it gave me quick feedback about my development. My de facto standard was the built-in C++ Unit Test Framework. It was accessible: I didn’t need to install anything (past VS), and I didn’t need to download anything special. The project templates came built in; it was the easiest path to unit testing my work. I loved it. However, as with many things in life, there are always the ‘little things’ you have to learn to love.

My main gripe was that if I wrote a function and a std::exception escaped, for whatever reason, the test would fail with the message “An unexpected C++ exception occurred”. Thanks Microsoft, for this useless information… I put up with it. I would use tactics like wrapping my calls in try/catch; I even wrote a new header that extended the TEST_METHOD macro to add a function-level try/catch. It wasn’t enough for me. I could not believe that this wasn’t built in. For instance, if an Exception escapes in the C# test framework, you get the data about the Exception. This is a novel idea, so why doesn’t it work in C++? My second major stumbling block was that if you didn’t have the right dependencies in the right directory, you would get an error on all your tests, something along the lines of “Failed to setup execution context.” Also a very, very helpful error. This was the straw that broke the camel’s back. The number of times I had run into this, the number of times junior developers scrambled to figure it out. It was too much. Something had to be done. Rather than divorce myself from Microsoft’s framework and use something like boost::test, like so many had said I should do, I decided to do the sane thing, and just write my own. Not write my own, like re-invent the wheel. Just write my own test executor. I already had all the tests written; I didn’t want to redo that work in some new framework. I wanted to just build my own engine to run the tests I already had. My thought was that if someone at Microsoft could build it, so could I. They’re humans too — I think. Armed with only naive curiosity, my trusty Visual Studio, and the internet, I set out to do just that.

Where do I even start? Let’s just start at discovering the tests. How can we discover what tests are available in the binary? Well, if you’re familiar with the C# Unit Test Framework, defining test classes and methods is done with attributes, similar to the C++ macros. My thought is that the C# test discoverer must use reflection to look for the attributes, discovering the test classes and methods. I don’t know this for sure, but I would bet that is the case. Cool. Well, apart from some third party libraries, there’s no built-in reflection in C++. So that can’t be the case for the C++ tests, can it? Maybe they load the assembly and have it tell the discoverer what tests are available? That’s what I would do if I engineered this.

Stop for a minute. Let’s reflect.

I said that my second problem with the framework, was that when you tried to run the tests and the dependencies couldn’t be loaded, you would get the error “Failed to load execution context”. Now — let’s think about this. If you’re able to see all the tests, yet the assembly can’t be loaded due to missing dependencies. How are we able to see what tests are in the binary? Magic! That’s how. Just kidding — I don’t believe in magic. It means that they’re not “loading” the library, which means that information about the tests, lives somewhere in metadata in the binary… Reflection… Could it be???

Well, the magic was right there in front of us the whole time, if you’re using the framework. The magic lies in the ‘CppUnitTest.h’ header file. It took me a few beers, and a few hours, to figure out just exactly WTF they were doing in there. It was essentially like trying to decipher Cuneiform.

If you’re unfamiliar, a typical TEST_CLASS and TEST_METHOD(s) looks like this.

#include "CppUnitTest.h"

TEST_CLASS(DummyClass)
{
    TEST_METHOD(DummyAssert)
    {
        /// My Code Under Test Here
    }
};

If you build and discover this, you’ll end up with a test class named DummyClass and a test in your Test Window, that says DummyAssert. So the magic lives in that TEST_METHOD macro. We will ignore the TEST_CLASS for now. Let’s look at TEST_METHOD. This is the macro, pulled directly from ‘CppUnitTest.h’

///////////////////////////////////////////////////////////////////////////////////////////
//Macro for creating test methods.
#define TEST_METHOD(methodName)\
    static const EXPORT_METHOD ::Microsoft::VisualStudio::CppUnitTestFramework::MemberMethodInfo* CALLING_CONVENTION CATNAME(__GetTestMethodInfo_, methodName)()\
    {\
        __GetTestClassInfo();\
        __GetTestVersion();\
        ALLOCATE_TESTDATA_SECTION_METHOD\
        static const ::Microsoft::VisualStudio::CppUnitTestFramework::MethodMetadata s_Metadata = {L"TestMethodInfo", L#methodName, reinterpret_cast<const unsigned char*>(__FUNCTION__), reinterpret_cast<const unsigned char*>(__FUNCDNAME__), __WFILE__, __LINE__};\
\
        static ::Microsoft::VisualStudio::CppUnitTestFramework::MemberMethodInfo s_Info = {::Microsoft::VisualStudio::CppUnitTestFramework::MemberMethodInfo::TestMethod, NULL, &s_Metadata};\
        s_Info.method.pVoidMethod = static_cast<::Microsoft::VisualStudio::CppUnitTestFramework::TestClassImpl::__voidFunc>(&methodName);\
        return &s_Info;\
    }\
    void methodName()

Okay — so humour me and let’s ignore the __GetTestClassInfo(); and __GetTestVersion(); calls and look to the line ALLOCATE_TESTDATA_SECTION_METHOD, which if we scan a little higher in the file is found here.

///////////////////////////////////////////////////////////////////////////////////////////
//Macros for creating sections in the binary file.
#pragma section("testvers$", read, shared)
#pragma section("testdata$_A_class", read, shared)
#pragma section("testdata$_B_method", read, shared)
#pragma section("testdata$_C_attribute", read, shared)

#define ALLOCATE_TESTDATA_SECTION_VERSION __declspec(allocate("testvers$"))
#define ALLOCATE_TESTDATA_SECTION_CLASS __declspec(allocate("testdata$_A_class"))
#define ALLOCATE_TESTDATA_SECTION_METHOD __declspec(allocate("testdata$_B_method"))
#define ALLOCATE_TESTDATA_SECTION_ATTRIBUTE __declspec(allocate("testdata$_C_attribute"))

But what does it all mean, Basil? Well, without diving into too much history, we need to at least know about Windows’ binary formats. If you didn’t already know, every form of executable “binary” on the Windows platform is in the format of a Portable Executable (an extension of the COFF format). This is what allows the operating system to load and run executables, dynamic libraries, etc. It’s a well defined format; see the Wiki link above if you don’t believe me. A PE file looks like this.

[Figure: Portable Executable 32-bit structure]

I’m not going to explain everything in this image, only the relevant information. If you look down just past the DOS STUB, on the very right (your right), you’ll see a 2-byte number called #NumberOfSections; this tells us the count of sections in the binary. That’s something we care about. I know this, because I know they’ve made sections where the data lives. I know this, because of the

#pragma section("testdata$_B_method", read, shared)

and the

#define ALLOCATE_TESTDATA_SECTION_METHOD __declspec(allocate("testdata$_B_method"))

Then, if you look at the bottom, you’ll see the ‘Section Table’. It means that past the COFF Header and the Optional Header (whose size the COFF Header gives us), there live N entries in the Section Table. In there, we will find our “testdata$_B_method” section, and in there, we will find SOMETHING! Are you bored yet? Because when I got this far, you couldn’t pull me away. I was like a 13 year old watching my first R rated movie. What did they store in there? What was it used for? The only thing I could do was dive a little deeper. My best bet was that these MethodMetadata were stored in that section.

ALLOCATE_TESTDATA_SECTION_METHOD\
static const ::Microsoft::VisualStudio::CppUnitTestFramework::MethodMetadata s_Metadata = {L"TestMethodInfo", L#methodName, reinterpret_cast<const unsigned char*>(__FUNCTION__), reinterpret_cast<const unsigned char*>(__FUNCDNAME__), __WFILE__, __LINE__};

It would be a block of data containing a bunch of strings: the first a wide character string, “TestMethodInfo”; the next a wide character string of the ‘methodName’ defined in the macro; the next a character string of __FUNCTION__; next the string of __FUNCDNAME__; a wide character string of the filename, __WFILE__; and lastly the __LINE__. (If you’re interested in a list of Predefined Macros, there you go.)

This was my assumption, but I couldn’t know for sure unless I saw it with my own two eyes. But how do I do that? Well, there are a few third-party tools that will dump the PE (I’ll let you figure out what to search…), but I needed to write my own tool anyways, so I just jumped in feet first. A few quick Bing searches (just kidding, I used Google), and I found out I needed to open the binary as a flat file, and then map that file into memory. From there, I could get a pointer to the start of the file and use some macros, structures, and functions in Windows.h to move about this file. The “pseudo” algorithm is as follows.

1) Open the binary as a flat file
2) Map the binary into memory
3) Obtain a view of the map (a pointer to the map)
4) Navigate the PE to understand file type, and number of sections
5) Iterate through each section definition until we find the correct section
6) Using the mapping offset, along with the section definition, find our data

There we go, simple as that. Let’s try it. The memory when we do that, and we point to the section table, looks like this.

0x07440210  2e 74 65 78 74 62 73 73 00 00 01 00 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a0 00 00 e0 2e 74 65 78 74 00 00 00 82 15 02 00 00 10 01 00 00 16 02 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 20 00 00 60 2e 72  .textbss............................ ..à.text............................... ..`.r
0x07440262  64 61 74 61 00 00 c3 bd 00 00 00 30 03 00 00 be 00 00 00 1a 02 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 40 2e 64 61 74 61 00 00 00 cc 0e 00 00 00 f0 03 00 00 0c 00 00 00 d8 02 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 c0 2e 69 64 61  data..Ã....0......................@..@.data...Ì....ð.......Ø..............@..À.ida
0x074402B4  74 61 00 00 86 29 00 00 00 00 04 00 00 2a 00 00 00 e4 02 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 40 2e 6d 73 76 63 6a 6d 63 79 01 00 00 00 30 04 00 00 02 00 00 00 0e 03 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 c0 74 65 73 74 64 61  ta...).......*...ä..............@..@.msvcjmcy....0......................@..Àtestda
0x07440306  74 61 49 06 00 00 00 40 04 00 00 08 00 00 00 10 03 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 50 74 65 73 74 76 65 72 73 09 01 00 00 00 50 04 00 00 02 00 00 00 18 03 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 50 2e 30 30 63 66 67 00 00  taI....@......................@..Ptestvers.....P......................@..P.00cfg..
0x07440358  04 01 00 00 00 60 04 00 00 02 00 00 00 1a 03 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 40 2e 72 73 72 63 00 00 00 3c 04 00 00 00 70 04 00 00 06 00 00 00 1c 03 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 40 2e 72 65 6c 6f 63 00 00 69 1a  .....`......................@..@.rsrc...<....p......................@..@.reloc..i.
0x074403AA  00 00 00 80 04 00 00 1c 00 00 00 22 03 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 42 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ...€......."..............@..B....................................................
0x074403FC  00 00 00 00 cc cc cc cc cc e9 86 36 01 00 e9 41 78 01 00 e9 9c 33 00 00 e9 b7 93 01 00 e9 b3 73 01 00 e9 7d b4 00 00 e9 68 a8 00 00 e9 13 62 00 00 e9 7e 76 00 00 e9 09 6f 01 00 e9 a4 59 00 00 e9 8f e4 00 00 e9 aa 9d 01 00 e9 98 73 01 00 e9 ce ab

So what gives??? I don’t see any section called “testdata$_B_method”. I can, however, see a ‘testdata’ section. My research turned up nothing, but this anecdotal evidence leads me to believe the ‘$’ is some kind of delimiter on the section name. (In fact, the linker merges sections that share the name before the ‘$’, and uses the suffix after it to order their contents within the merged section.) So we assume the “testdata” section will contain our test method metadata. The problem is, other things sit in this section too: class, method, and attribute metadata. So, if it’s all lined up in a single section, how do we decipher what is what? Meaning, if we’re just trying to use pointers to walk around, how will we ever know what type we’re pointing to?

Did anything strike you as odd about the MethodMetadata structure? Maybe, if I show you the structure definitions of all the metadata objects, you might see something.

struct ClassMetadata
{
    const wchar_t *tag;
    const unsigned char *helpMethodName;
    const unsigned char *helpMethodDecoratedName;
};

struct MethodMetadata
{
    const wchar_t *tag;
    const wchar_t *methodName;
    const unsigned char *helpMethodName;
    const unsigned char *helpMethodDecoratedName;
    const wchar_t *sourceFile;
    int lineNo;
};

struct ModuleAttributeMetadata
{
    enum AttributeType { MODULE_ATTRIBUTE };
    const wchar_t *tag;
    const wchar_t *attributeName;
    const wchar_t *attributeValue;
    AttributeType type;
};

struct ClassAttributeMetadata
{
    enum AttributeType { CLASS_ATTRIBUTE };
    const wchar_t *tag;
    const wchar_t *attributeName;
    const void *attributeValue;
    AttributeType type;
};

struct MethodAttributeMetadata
{
    enum AttributeType { METHOD_ATTRIBUTE };
    const wchar_t *tag;
    const wchar_t *attributeName;
    const void *attributeValue;
    AttributeType type;
};

Huh. If you look carefully, the first actual member of these structures is a wchar_t* called tag. Then if we go to our use of it.

static const ::Microsoft::VisualStudio::CppUnitTestFramework::MethodMetadata s_Metadata = {L"TestMethodInfo", L#methodName, reinterpret_cast<const unsigned char*>(__FUNCTION__), reinterpret_cast<const unsigned char*>(__FUNCDNAME__), __WFILE__, __LINE__};

You might notice that there’s an L”TestMethodInfo” set as the tag. So one could deduce, dear Watson, that that is how we can decipher our different metadata components: by their tag! Let’s readjust our rudders, and fly! If we get our ‘testdata’ section, then move to the area with the data, we should see a bunch of nicely spaced wide strings in memory, right? Wrong!

0x07471120  94 43 03 10 b8 43 03 10 d8 43 03 10 40 44 03 10 f8 44 03 10 67 00 00 00 00 00 00 00 94 43 03 10 6c 46 03 10 98 46  ”C..¸C..ØC..@D..øD..g.......”C..lF..˜F
0x07471146  03 10 08 47 03 10 f8 44 03 10 78 00 00 00 00 00 00 00 94 43 03 10 cc 47 03 10 08 48 03 10 80 48 03 10 f8 44 03 10  ...G..øD..x.......”C..ÌG...H..€H..øD..
0x0747116C  7e 00 00 00 00 00 00 00 94 43 03 10 28 4b 03 10 68 4b 03 10 e0 4b 03 10 f8 44 03 10 8b 00 00 00 00 00 00 00 94 43  ~.......”C..(K..hK..àK..øD..........”C
0x07471192  03 10 c8 4c 03 10 08 4d 03 10 80 4d 03 10 f8 44 03 10 95 00 00 00 00 00 00 00 94 43 03 10 4c 4e 03 10 78 4e 03 10  ..ÈL...M..€M..øD..........”C..LN..xN..
0x074711B8  e8 4e 03 10 f8 44 03 10 9d 00 00 00 00 00 00 00 94 43 03 10 dc 4f 03 10 20 50 03 10 a0 50 03 10 f8 44 03 10 a7 00  èN..øD..........”C..ÜO.. P.. P..øD..§.
0x074711DE  00 00 00 00 00 00 94 43 03 10 70 51 03 10 98 51 03 10 08 52 03 10 f8 44 03 10 af 00 00 00 00 00 00 00 94 43 03 10  ......”C..pQ..˜Q...R..øD..¯.......”C..
0x07471204  78 53 03 10 b0 53 03 10 28 54 03 10 f8 44 03 10 bb 00 00 00 00 00 00 00 94 43 03 10 0c 55 03 10 48 55 03 10 c0 55  xS..°S..(T..øD..».......”C...U..HU..ÀU
0x0747122A  03 10 f8 44 03 10 c3 00 00 00 00 00 00 00 94 43 03 10 dc 56 03 10 08 57 03 10 78 57 03 10 f8 44 03 10 cc 00 00 00  ..øD..Ã.......”C..ÜV...W..xW..øD..Ì...
0x07471250  00 00 00 00 94 43 03 10 6c 58 03 10 b8 58 03 10 38 59 03 10 f8 44 03 10 d2 00 00 00 00 00 00 00 94 43 03 10 30 5a  ....”C..lX..¸X..8Y..øD..Ò.......”C..0Z
0x07471276  03 10 60 5a 03 10 d0 5a 03 10 f8 44 03 10 de 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ..`Z..ÐZ..øD..Þ
0x0747129C  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

What the hell? I was 100% expecting there to be a string sitting pretty right there in my section. Back to the drawing board I guess… Let’s recap: the first member of that struct is a wchar_t *. Huh. The first member of the struct is a wchar_t *. What does that even mean? It means it is a ‘pointer to a wchar_t’. A ‘pointer’ to a wchar_t! Oh right! A pointer! I remember from school that a pointer is just an address! So what we wrote off as garbage, where we were expecting some text to be sitting pretty, isn’t garbage at all. It is a POINTER! That means sitting in that location is an address! An address to where, though? It should be an address to my string, but where on Earth (quite literally) would that address be? It has to be somewhere constant, right?

If we study the sections of a PE file, there’s a section called ‘.rdata’. Microsoft defines this section as “Read-only initialized data”. Thinking back for a moment, these are magic strings (heaven forbid), a.k.a. ‘static strings’, and static string literals are read-only. If I were to hazard a guess, they probably live somewhere in that section, because the compiler has to put magic strings somewhere… So that garbage number is probably a pointer to a string somewhere in that ‘.rdata’ section. So, if we take that address and adjust it for where the section data lies, we can find the “TestMethodInfo” string.
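That “adjust it for where the section data lies” step is just address arithmetic. A sketch, with hypothetical names: the value stored in the image is a virtual address based on the module’s preferred image base, so we subtract the image base to get an RVA, then rebase that RVA from the section’s in-memory start onto its file offset.

```cpp
#include <cstdint>

// Minimal stand-in for the two IMAGE_SECTION_HEADER fields we need.
struct SectionInfo
{
    uint32_t virtualAddress;    // RVA where the section starts once loaded
    uint32_t pointerToRawData;  // file offset where the section's bytes start
};

// Translate a pointer value found in the file (a VA) into a file offset,
// assuming 'section' is the section whose range contains the resulting RVA.
uint32_t vaToFileOffset(uint32_t va, uint32_t imageBase, const SectionInfo& section)
{
    const uint32_t rva = va - imageBase;                             // VA -> RVA
    return rva - section.virtualAddress + section.pointerToRawData;  // RVA -> file offset
}
```

For instance, the little-endian bytes 94 43 03 10 at the top of the dump decode to the VA 0x10034394; with the classic 32-bit DLL preferred image base of 0x10000000, that’s RVA 0x34394, which falls inside ‘.rdata’.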

0x07462D94  54 00 65 00 73 00 74 00 4d 00 65 00 74 00 68 00 6f 00 64 00 49 00 6e 00 66 00 6f 00 00 00 00 00 00 00 00 00 44 00  T.e.s.t.M.e.t.h.o.d.I.n.f.o.........D.
0x07462DBA  75 00 6d 00 6d 00 79 00 41 00 73 00 73 00 65 00 72 00 74 00 00 00 00 00 00 00 00 00 00 00 43 50 50 55 6e 69 74 54  u.m.m.y.A.s.s.e.r.t...........CPPUnitT
0x07462DE0  65 73 74 49 6e 76 65 73 74 69 67 61 74 6f 72 54 65 73 74 3a 3a 6e 65 73 74 65 64 3a 3a 44 75 6d 6d 79 43 6c 61 73  estInvestigatorTest::nested::DummyClas
0x07462E06  73 3a 3a 5f 5f 47 65 74 54 65 73 74 4d 65 74 68 6f 64 49 6e 66 6f 5f 44 75 6d 6d 79 41 73 73 65 72 74 00 00 00 00  s::__GetTestMethodInfo_DummyAssert....
0x07462E2C  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 3f 5f 5f 47 65 74 54 65 73 74 4d 65 74 68 6f 64 49 6e  ....................?__GetTestMethodIn
0x07462E52  66 6f 5f 44 75 6d 6d 79 41 73 73 65 72 74 40 44 75 6d 6d 79 43 6c 61 73 73 40 6e 65 73 74 65 64 40 43 50 50 55 6e  fo_DummyAssert@DummyClass@nested@CPPUn
0x07462E78  69 74 54 65 73 74 49 6e 76 65 73 74 69 67 61 74 6f 72 54 65 73 74 40 40 53 47 50 42 55 4d 65 6d 62 65 72 4d 65 74  itTestInvestigatorTest@@SGPBUMemberMet
0x07462E9E  68 6f 64 49 6e 66 6f 40 43 70 70 55 6e 69 74 54 65 73 74 46 72 61 6d 65 77 6f 72 6b 40 56 69 73 75 61 6c 53 74 75  hodInfo@CppUnitTestFramework@VisualStu
0x07462EC4  64 69 6f 40 4d 69 63 72 6f 73 6f 66 74 40 40 58 5a 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  dio@Microsoft@@XZ

Voilà! At last, we found it. We found a ‘TestMethodInfo’! (If you’re wondering why it reads as T.e.s.t.M.e.t.h.o.d.I.n.f.o…, it’s because it’s a wide character string, so each character is 2 bytes. Unicode, amirite?)
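If you want to pull those two-byte characters back out of a raw buffer yourself, a small sketch (the helper is my own; I’m using char16_t rather than wchar_t, since wchar_t is 2 bytes on Windows but 4 on most other platforms):

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Decode a null-terminated UTF-16LE string from a raw byte buffer, the way
// the "T.e.s.t..." bytes sit in the .rdata dump: low byte first, high byte second.
std::u16string readUtf16(const uint8_t* bytes)
{
    std::u16string out;
    for (std::size_t i = 0; ; i += 2)
    {
        const char16_t ch = static_cast<char16_t>(bytes[i] | (bytes[i + 1] << 8));
        if (ch == 0)
            break;
        out.push_back(ch);
    }
    return out;
}
```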

To recap: we’ve loaded the DLL into memory, mapped it, and walked to the section. We took the tag pointer, adjusted the address to find it in the ‘.rdata’ section, and now we know we’re looking at a MethodMetadata structure. So we can take the original pointer and cast it to a MethodMetadata object.

struct MethodMetadata
{
    const wchar_t *tag;
    const wchar_t *methodName;
    const unsigned char *helpMethodName;
    const unsigned char *helpMethodDecoratedName;
    const wchar_t *sourceFile;
    int lineNo;
};

Then, for each of the other members, which are pointers to strings in the .rdata section, we can adjust and capture them the same way we did the tag. In fact, the compiler did us a favour and laid them out so you can see them in the memory dump above. Next, we just advance our section pointer by the size of a MethodMetadata, and we can get the next test name!!! (This is mostly true; I’ve glossed over some details of the other things that can live in the section.)
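A rough sketch of that walk, under two loud assumptions: the pointers have already been translated into something we can dereference (as above), and the stretch we’re walking holds only MethodMetadata-sized records — which, as noted, glosses over the other record types in the section. The struct mirrors MethodMetadata from earlier; forEachTestMethod is my own name.

```cpp
#include <cstddef>
#include <cwchar>

struct MethodMetadata
{
    const wchar_t *tag;
    const wchar_t *methodName;
    const unsigned char *helpMethodName;
    const unsigned char *helpMethodDecoratedName;
    const wchar_t *sourceFile;
    int lineNo;
};

// Walk [begin, end) in MethodMetadata-sized strides, calling fn for every
// record whose (already translated) tag reads L"TestMethodInfo".
template <typename Fn>
void forEachTestMethod(const unsigned char* begin, const unsigned char* end, Fn fn)
{
    for (const unsigned char* p = begin; p + sizeof(MethodMetadata) <= end;
         p += sizeof(MethodMetadata))
    {
        const MethodMetadata* meta = reinterpret_cast<const MethodMetadata*>(p);
        if (meta->tag && std::wcscmp(meta->tag, L"TestMethodInfo") == 0)
            fn(*meta);
    }
}
```

A real walker has to dispatch on the tag to pick the right struct (and stride) for each record, but this is the shape of it.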

I really hope you’re thinking “Damn! That was so much fun!” Because I definitely was! You can see now, that the steps to tie this into a test adapter aren’t too far away. I won’t get into that until a later post, but as for this post, we’ve uncovered how to discover what tests lie in a Microsoft C++ Unit Test Framework created test DLL. Wasn’t that just a blast digging into that?

I hope you will join me for the next part, where we figure out how to actually load and execute the tests from this metadata.

If you’re interested in seeing this code, my cobbled-together project lives here. I apologize in advance for the code. I had a really bad habit of putting as many classes as I could into a single file, out of laziness. I don’t do that anymore. PeUtils.h/.cpp and CppUnitTestInvestigator.h/.cpp, in the CppUnitTestInvestigator project, contain the code for loading the DLL metadata.

“It’s still magic, even if you know how it’s done” — Terry Pratchett

Happy Coding!

 

** As a disclaimer, I don’t own any of the code snips above, they are directly pulled from the VsCppUnit C++ Unit Testing Framework, which is Copyright (C) Microsoft Corporation **

 

References:
PE Format
Section Specifications
DUMPBIN