Don’t try this at work – Vector Pt. 3

Welcome back everyone, for at long last, the last in this vector series. It’s been a long haul, and I appreciate everyone who has come along for the journey.

In light of the happenings recently in the United States of America, I’d just like everyone to take a moment of silence for those victims of prejudice, hate, and discrimination. Humans are social creatures, and we all need to realize we are here not in spite of each other, we are here because of each other. It is our duty, no matter your role in society, your job, your socioeconomic standing, to treat all other individuals with the same standard of decency, respect, and humility. We all face challenges, some more than others, and as humans we must empathize with our fellow human, regardless of ethnicity, gender, or sexual preference. It is our unique ability as humans to be self-aware; we should use this power for good, and not evil. I challenge you to pause for a moment and imagine what it would be like to live in someone else’s shoes, someone who has to deal daily with prejudice and segregation; if only for a moment. I have been blessed with many opportunities in my life, and am thankful every day for that. No matter our position in life, it is never our right to deny others a chance at the opportunities we have had. In fact, it is our responsibility to help enable those opportunities for others. We mustn’t allow this to continue. After the protests have dispersed, and the smoke has settled, we must remember this, and as a society we must change. For if nothing changes, nothing changes. #blacklivesmatter

Thank you.

The last time we left off, we had a pretty much working vector implementation. Then we wanted to enable it to be used with standard algorithms. It was a bit of a trick though, because it already worked with the algorithms, using a direct pointer into the memory. Instead, we wrote an iterator class that was a bit safer, and allowed us a bit more control over the raw pointer. Then we learned a bit of funny business with SFINAE, to enable specific functions only when the arguments are iterators. It was a pleasant stride that we took, and we learned a few things along the way. Taking a look at insert allowed us to meander down the valley of memory allocation and reorganization. Then we were able to stop and smell the SFINAE roses, though they often have an off scent. At the start of this journey, I’ll be honest — I wasn’t expecting it would be this much work. It is crazy to think about the amount of brain power that went into crafting our humble little standard vector, that we use daily. It puts in perspective the amount of work that goes into the STL alone. Although, it was the premise of this series to illustrate why it’s in our best interest to use the fantastic set of tools that ship with our compiler.

This adventure however, is coming to a conclusion. When I started, I had almost jokingly said I wanted to pit the vector we would create, against the MSVC standard vector. At that time of writing, I didn’t really know what I was going to do, (similarly with “use it with the algorithms”). It isn’t to say I didn’t have a direction, I just didn’t know what I was going to do. So I had to decide what exactly “pitting my vector” against Microsoft’s meant. Luckily, since it’s my blog, it’s my arena. Though if I didn’t do it justice, I knew that I would get down-voted into oblivion. LIKE THAT TIME I THOUGHT VIM COULDN’T NAVIGATE TO HEADER FILES. So, I had to actually put some brain power into this, other than just making some asinine competition up, and declaring that I win!

We really want to measure the amount of work the vector is doing. Often in computing, the amount of work is measured by the amount of processing time. This is why we say an operation or algorithm is fast or slow: it’s the amount of time it takes to complete. It follows that the less a particular operation does, the faster it will be. The fastest program is no program at all. However, it’s also the most useless program ever. The art of optimization is to do less work while still having the same outcome. There are a few things that affect the actual timing of operations. These range from processor speed, program load (i.e. how many programs are running simultaneously), memory consumption, etc. These though, are all environmental. Meaning that if you tested program A in the same environment as program B, they should be affected the same way. The biggest effect on your operation’s overall speed will be its algorithmic complexity. You might’ve heard of this; it’s usually written in Big-O notation. You’ll hear people say things like an array lookup is O(1), that is, a constant time operation; that a binary search is Log(N); that a linear search runs in linear time; or, worst of all, that an algorithm is exponential. Now, these “time” measurements aren’t actual wall clock time. Though, they will affect the amount of ticks your algorithm takes to complete. They’re a relative time scale, based on the amount of work the algorithm has to do. If an algorithm is linear, it means that as the amount of work the algorithm must do grows, the time will grow linearly in relation to the size of its input. When something is a constant time operation, it means that it has to do the same work, regardless of the size of input.

Let’s explain this in terms of our vector. Let’s say we want to find an item in our vector. In the first case, we want to find the 10th item. Okay, so regardless of the ordering of our vector, if we want the 10th item, we have to do the same constant work. We need to get the start of the vector, increment it by 10, and we have the item. Okay, okay. What about a vector of size < 10? So we have to check the size first. Regardless though of our vector’s size, it’s the same work for 1, 10, 100, 1 000, 10 000, 1 000 000 000 items. This means it’s a constant time operation. You would see this in the wall clock time. If the operation took 10 ms for a vector of 10 items, it would take roughly the same 10 ms regardless of the size.
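In code, that constant-time lookup is nothing more than a size check plus pointer arithmetic. A minimal sketch, using std::vector and a made-up helper name (this is not part of the htk vector):

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Constant-time lookup: one size check, one pointer addition, one
// dereference -- the same work whether the vector holds 10 items or
// 10 million. (nth_item is a hypothetical helper, purely illustrative.)
std::optional<int> nth_item(const std::vector<int>& v, std::size_t n)
{
    if (n >= v.size())          // the size check is itself constant time
        return std::nullopt;
    return *(v.data() + n);     // start + offset, regardless of size
}
```

No loop anywhere, which is exactly why the size of the vector never shows up in the timing.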

What if we wanted to find an item that matched some predicate? Assume our vector is a vector<int> of random numbers, and we want to find, say, the value 10. What do we have to do? Well, we have to start at the beginning and iterate the vector until we find the value 10. Now, all things being completely random, you may find the element right at the beginning, somewhere in the middle, or maybe it’s the very last element. Given one full iteration of the vector, you’ll be able to either a) find the element or b) know it doesn’t exist. This is a linear algorithm. The wall clock time will grow roughly linearly as the size increases. Meaning that it should take roughly 10 times longer to find an element in a vector of 100 vs. a vector of 10, with a linear search. In practice however, if you measured this once, you might see something strange. Your vector of 10 might take more time than the vector of 100. Why is that? Well, it could simply be due to a roll of the dice. All things being random, you found the item at the end of the 10 item vector, and it was the first element in the 100 element vector. That’s why we usually talk about these time complexities in terms of the average (expected) case. With a linear search, the best case is the one where you find the element at the very start, and the worst case is at the end. The average case will land somewhere in the middle. So if you were to measure searching a 10 element vector vs. a 100 element vector, say 1000 times, you would see that on average the larger vector took longer, and if you were to graph it you would see it grow linearly.
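One way to see that average case without a stopwatch is to count comparisons instead of nanoseconds. A toy sketch (the helper name is my own, purely for illustration):

```cpp
#include <cstddef>
#include <vector>

// Linear search that reports how many comparisons it made. Averaged over
// many random inputs, the count grows linearly with the vector's size --
// the same trend the wall clock shows, minus the measurement noise.
std::size_t comparisons_to_find(const std::vector<int>& v, int target)
{
    std::size_t count = 0;
    for (int x : v)
    {
        ++count;
        if (x == target)
            break;              // best case: found immediately
    }
    return count;               // equals v.size() when target is absent
}
```

Run it over many random vectors and average the counts, and the best/average/worst cases fall out of the numbers directly.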

Next up, the dreaded super-linear growth. These are the algorithms we want to avoid as developers, because they’re the type of algorithm that “works on my machine”, but grinds to a halt at scale. Because we test with 100 elements on our machine, and scale is 100 000 elements. These algorithms are insidious, and often come in the form of nested loops. To continue our example, suppose you wanted to search for a range of items within your vector. If we call the elements we search for the needle, and the elements we’re searching the haystack, we could approach this problem as follows. Start by iterating the haystack, until we find an element that matches the first element in the needle. Then continue iterating both collections until we match the entire needle collection OR we do not match, and we start the process over again. Now, the reason these types of algorithms are insidious is because they tend to be adequately fast, until you start getting into larger collections. So it is helpful to be able to spot these types of algorithms, before it’s too late. Each time you’ve got a nested loop where each loop has an unbounded end, you’ve got something that will grow quadratically (strictly speaking that’s polynomial rather than exponential growth, but it will ruin your day at scale all the same).
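The needle-and-haystack approach described above might be sketched like so (my own reconstruction; std::search does this job for real):

```cpp
#include <cstddef>
#include <vector>

// Naive needle-in-haystack search: for every starting position in the
// haystack, try to match the whole needle. Two nested loops, so the work
// grows roughly as haystack size times needle size -- fine at 100
// elements, painful at 100 000.
bool contains_range(const std::vector<int>& haystack,
                    const std::vector<int>& needle)
{
    if (needle.empty())
        return true;
    for (std::size_t i = 0; i + needle.size() <= haystack.size(); ++i)
    {
        std::size_t j = 0;
        while (j < needle.size() && haystack[i + j] == needle[j])
            ++j;
        if (j == needle.size())
            return true;        // matched the whole needle
    }
    return false;               // restarted from every position, no luck
}
```

The outer loop runs once per haystack position and the inner loop once per needle element, which is exactly the nested, unbounded shape to watch for.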

Linear operations are oftentimes good enough, meaning that on average, at scale, they’ll complete in an adequate amount of time. As long as this meets our timing requirements, and the user doesn’t go gray waiting, a linear algorithm will suffice. What about the times when it won’t? How can we do better? Well — when you’re dealing with completely random data, you can’t do better than linear, because the data is random. For instance, to search for an item in a random collection, you’re basically required to iterate the entire collection; there are no shortcuts. How else can you be sure of your result? This means that in order to do better than linear, we have to add some structure to our data, so that we can make reasonable guarantees. This allows us to take shortcuts, and do less work.

Let’s go back to the search we were doing earlier. We want to find an element with the value of 10, in a vector<int>. Now, when we did this before, we operated on a random set of data. There was no order or structure to our data, so we were forced to inspect each element. Let’s instead assume the vector is sorted. Given this order, we can make reasonable assumptions that allow us to take shortcuts and do less work when we’re searching. Of course the linear search still works, but why would we do that when the data is sorted? Assuming an ascending sort, we could search in the following manner. Start at the middle element: is it our value? If not, is it less than, or greater than the value we’re searching for? If it’s less than the value we’re searching for, and the list is in ascending order, then we know with certainty that the item resides in the top half of the list. This means in one operation we can discard half the list of items. We just discarded half the work. Now, if you take the same approach to the top half of the list, you’ve just discarded half-of-half. Keep doing this and you’ll find your item, or you’ll run out of items to search. As you can see, having the data in sorted order allows us to rapidly find the item, by means of divide and conquer. Each iteration of our search effectively removes half of the work. This is what is called logarithmic time complexity, because each step divides the remaining set in 2. These types of operations are often referred to as binary operations, because at any given point you either go left or right, depending on the value. Now, a sorted order allows this type of approach on a flat list of items, but you can also structure your data in such a way (e.g. a binary tree) that makes these types of binary operations very easy.
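That halving strategy reads naturally as a loop; we’ll write a recursive version later in the post, so consider this the loop-style cousin (a sketch of mine over a sorted vector<int>, not htk code):

```cpp
#include <cstddef>
#include <vector>

// Halve the search space each step: O(log N) comparisons on sorted data.
bool sorted_contains(const std::vector<int>& v, int value)
{
    std::size_t lo = 0, hi = v.size();      // half-open range [lo, hi)
    while (lo < hi)
    {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (v[mid] == value)
            return true;
        if (v[mid] < value)
            lo = mid + 1;       // discard the bottom half
        else
            hi = mid;           // discard the top half
    }
    return false;               // ran out of items to search
}
```

Each pass through the loop shrinks [lo, hi) by half, which is where the Log(N) comes from.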

Now, you’re probably asking “well, what about the time it takes to sort the vector?” That does have a cost, and that’s where the balance of computing comes in. Maybe adding items to your vector happens rarely, but you’re constantly searching it. Then it makes sense to sort the collection, paying for that at initialization, and use a faster algorithm for searching. Given your data set, you might instead want to maintain a binary structure, where you pay a Log(N) cost for insertion as well as for searching. The point worth illustrating is that structure and order are what allow us to craft smarter algorithms that do less work.
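To make the trade concrete, here’s a sketch of the “sort once at initialization, search many times” idea, leaning on the standard library (the class name is my own invention):

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Pay the O(N log N) sort once at construction; every lookup after that
// is O(log N). A worthwhile trade when lookups vastly outnumber inserts.
class sorted_lookup
{
public:
    explicit sorted_lookup(std::vector<int> data) : data_(std::move(data))
    {
        std::sort(data_.begin(), data_.end());      // one-time cost
    }

    bool contains(int value) const
    {
        return std::binary_search(data_.begin(), data_.end(), value);
    }

private:
    std::vector<int> data_;
};
```

If insertions were frequent too, a std::set (a balanced binary tree) would spread that cost out as Log(N) per insert instead.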

I’ve digressed. What I really wanted to cover was measuring our vector, against MSVC vector.

I thought a while (maybe 14 minutes) about what I wanted to measure. I felt that the only real operation that a vector did was grow. Meaning that pretty much anything else was effectively a constant time operation. If you want to get an element, or assign to a pre-existing slot, that’s constant time. So where are the operations that will take some time? Well, I know that there is work to be done when the vector grows: we have to allocate more space, copy into that new space, then free the old space. So the operations we really want to time are adding an item (emplace) and inserting items (insert). When we insert, we want to test inserting at the beginning, middle, and end, because we know from when we implemented it that those operations all take different paths. You can read about this in the previous post.
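The shape of such a measurement is simple enough; a sketch of timing a single emplace run (illustrative only, not the actual benchmark code):

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Time one growth-heavy operation: emplace `count` ints into a fresh
// vector, returning elapsed nanoseconds. A real benchmark repeats this
// many times and looks at the distribution rather than a single sample.
long long time_emplace(std::size_t count)
{
    const auto start = std::chrono::steady_clock::now();
    std::vector<int> v;
    for (std::size_t i = 0; i < count; ++i)
        v.emplace_back(static_cast<int>(i));
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
        .count();
}
```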

I’ll be honest — I had planned to dedicate a significant portion of this post to benchmarking, and the methods behind it. Though at this point, with my algorithmic complexity digression, getting into that would make this post a novel. If that’s of interest to you, let me know and I’ll try to write a follow-up post about it.

Originally, my plan was to look into using a third party benchmarking tool. Something like Google Benchmark, or nonius. However, I couldn’t help my own curiosity. To be completely honest, I think one should not re-invent the wheel at work. But at home, for fun, for posterity (yes — if I ever have children, I want to show them my library of useless recreations.), or for education, I’m a firm believer we should make attempts at it. So, for better or for worse, instead of using an off-the-shelf solution, I opted to write my own lightweight benchmarking tool. You can find it here.
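The heart of any such tool is tiny: run a callable many times and record a sample per run. A toy sketch of that core (not the actual tool linked above):

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// The core of a micro-benchmark: run a callable `iterations` times and
// record one nanosecond sample per run. Everything downstream (mean,
// median, deviation) is just arithmetic over these samples.
template <typename Fn>
std::vector<long long> collect_samples(std::size_t iterations, Fn&& fn)
{
    std::vector<long long> samples;
    samples.reserve(iterations);
    for (std::size_t i = 0; i < iterations; ++i)
    {
        const auto start = std::chrono::steady_clock::now();
        fn();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
                .count());
    }
    return samples;
}
```

The hard parts of real benchmarking (warm-up, stopping the optimizer from deleting your work, clock resolution) live on top of this loop, which is exactly why off-the-shelf tools exist.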

What I chose to measure is the following:

  1. Emplace 10, 100, 1 000, 10 000, 100 000, 1 000 000 int values.
  2. Emplace 10, 100, 1 000, 10 000, 100 000, 1 000 000 struct w/ move ctor.
  3. Insert 1 000 into 1 000, 10 000, 100 000 int.
  4. Insert 1 000 into 1 000, 10 000, 100 000 struct w/ move ctor.

Once we do that, we can then analyze the results. This would be the very first step on the road of optimization. I won’t digress down that rabbit hole.

Here are the graphed results of the different emplace measurements. On the x-axis is the number of iterations the test was run; on the y-axis is the time in nanoseconds. The blue line represents the MSVC standard vector, and the orange line is the htk vector. Well — no surprise here, htk wins. No need to look any closer at those results.

Just kidding. Put your pitchforks away. I don’t actually think that the htk vector is more performant than the MSVC vector. There are perfectly logical explanations for why it is coming in faster than the STL vector. One plausible answer would be that the numbers are just way off; I’ve never claimed to be a mathematician or statistician. The code could be wrong; there’s a possibility that I’m timing more than I’d like to be timing. However, given that the code is identical for both cases, it should at least be consistent. I will say, however, that there is a large variance in the data.

| name | iterations | total | mean | median | min | max | s-dev |
| --- | --- | --- | --- | --- | --- | --- | --- |
| vector<int> emplace 10 | 10 | 61200 | 5160 | 2500 | 1700 | 27300 | 7515.344 |
| vector<int> emplace 10 | 100 | 268200 | 2162 | 2100 | 1500 | 6000 | 530.2405 |
| vector<int> emplace 10 | 1000 | 2993100 | 2391.5 | 2200 | 1600 | 20900 | 1051.978 |
| vector<int> emplace 10 | 10000 | 24818400 | 2022.05 | 1900 | 1000 | 1900700 | 19016.71 |
| vector<int> emplace 10 | 100000 | 121941400 | 997.8218 | 800 | 600 | 375300 | 2574.748 |

(all times in nanoseconds)

With such a large standard deviation, our data is pretty widely spread, and using Bryce’s check to determine whether the values are normally distributed, hardly any of the tests appeared normal. That being said, I ran the measurements multiple times and saw similar results, so let’s go with it being somewhat correct.
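For reference, the summary statistics in that table are nothing exotic. A sketch of computing them the straightforward way (note this median simply takes the upper-middle element for even sizes):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct summary { double mean, median, stddev; };

// Straightforward summary statistics: mean, median, and sample standard
// deviation. Takes `samples` by value since it sorts them for the median.
summary summarize(std::vector<double> samples)
{
    const double n = static_cast<double>(samples.size());

    double total = 0.0;
    for (double s : samples) total += s;
    const double mean = total / n;

    double sq = 0.0;
    for (double s : samples) sq += (s - mean) * (s - mean);
    const double stddev = std::sqrt(sq / (n - 1));   // sample deviation

    std::sort(samples.begin(), samples.end());
    const double median = samples[samples.size() / 2];

    return {mean, median, stddev};
}
```

A mean well above the median, like the table shows, is itself a hint that a few huge samples (the allocations) are skewing the distribution.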

So we can start to reason about why the htk vector might be faster than the standard vector. Well — truth be told, there was one test where I knew htk would have the upper hand: the emplace 10 items test. If you recall when we implemented our vector, the initialization of the vector allocated 10 slots. This means that we started with enough room for 10 elements. The standard vector had to allocate this initial memory, so it was pretty safe to say we’d win there.

In two of the cases emplace 1000 and 10000, we didn’t actually win. If you look at the 100000 iteration trials for both tests, the timing was almost identical. So likely, at best for two of the cases we’re equal, which makes sense. To illustrate this further we’ll take the 100000 iteration trials, for each test 10 items, 100, 1000, 10000, 100000 items and graph that.

As we can see, the two are almost identical. Since our trial size is growing by a factor of 10 per test, it makes sense to plot the time on a logarithmic scale.

(x-axis: items emplaced; y-axis: nanoseconds)

This graph shows pretty clearly, almost exactly, what we expected. In the test with 10 elements, the htk vector is superior because it isn’t doing any allocations. It makes sense that the 100 element test would be slightly faster too, because we saved the time of that first allocation of 10. After that, it stops mattering. The two are pretty much neck and neck. Another reason I suspect the htk vector might have a slight upper hand, for the move_only object, is that it’s much less safe. The htk vector probably doesn’t even offer a basic exception guarantee, whereas the STL vector offers a strong guarantee, whereby if an exception is thrown while the collection is growing, it will effectively roll back to the state before trying to grow. With the htk vector, well, it doesn’t do that. It at least tries to clean up for a basic guarantee, but I didn’t really pay attention to ensuring it, so there could be cases where it does not offer even that.

(x-axis: items emplaced; y-axis: nanoseconds)
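As an aside on that strong guarantee: the trick is to build the new buffer completely before touching the old one, so a throw mid-copy leaves the vector exactly as it was. A simplified sketch of the idea (the real std::vector works on uninitialized storage and uses move_if_noexcept; this toy copies into a default-constructed buffer):

```cpp
#include <cstddef>
#include <memory>
#include <utility>

// The shape of a strong-guarantee grow: fully build the new buffer before
// touching the old one. If anything throws mid-copy, `buf` and `cap` are
// untouched, so the container is exactly as it was before the call.
template <typename T>
void grow_strong(std::unique_ptr<T[]>& buf, std::size_t& cap, std::size_t size)
{
    const std::size_t new_cap = cap ? cap * 2 : 10;
    std::unique_ptr<T[]> next(new T[new_cap]);  // may throw: old buf intact
    for (std::size_t i = 0; i < size; ++i)
        next[i] = buf[i];     // copy, not move, so a throw can't corrupt buf
    buf = std::move(next);    // commit point: nothing past here can throw
    cap = new_cap;
}
```

That extra copying (instead of moving) is part of the price the standard vector pays for safety, which is one plausible source of the small gap in the graphs.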

Okay — so now for the embarrassment I think we were all expecting.

Overall — I’d say this is a blowout; the std vector is about 5x faster in all cases than our htk vector. So then we ask ourselves, why? Well, the answer lies back with what we did for emplace, and what we didn’t do for insert. Well, kind of. With insert, when we need to grow, the original items, if they’re trivial, can be moved with memmove, very quickly shifting the contiguous data from one area of memory to another. Keeping in mind, this is trivial data, which is what allows us to just copy the bytes. This definitely optimized our growth code for trivial types. However, when we implemented insert we used this code:

move_items(data_.first, data_.first + offset, next.first, copy_type{});
auto dest = first + offset;
while (start != fin)
    allocator_.construct(*(dest++), *(start++));
move_items(data_.first + offset, data_.last, dest, copy_type{});

You can see that we use our tag-dispatch optimized function to move the existing items within our vector around, but when we’re inserting our new elements, we just construct them one by one, instead of moving them (read: memmove) into the newly allocated space. Now, you might be asking “how much slower could that actually be?” Well, in this case, it turns out to be roughly 5x slower. So we can change the code to something along the lines of…

move_items(data_.first, data_.first + offset, next.first, copy_type{});
auto dest = first + offset;
move_items(start.ptr(), fin.ptr(), dest, copy_type{});
move_items(data_.first + offset, data_.last, dest, copy_type{});

If you recall, when we wrote move_items, we took advantage of tag dispatching, to determine which function to use. The specific function that will get chosen will be the function that uses memmove.

void move_items(pointer first, pointer last, pointer &dest, memcpy_items_tag)
{
     memmove(dest, first, (last - first) * sizeof(T));
}
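For completeness, the other leg of the dispatch, the one chosen for non-trivial types, would look something like this (my reconstruction as a free function; the tag name general_items_tag is an assumption, not the htk source):

```cpp
#include <new>
#include <utility>

struct general_items_tag {};    // assumed tag name for non-trivial types

// Reconstruction of the non-trivial leg of move_items: move-construct
// each element into the destination one at a time, since non-trivial
// types can't be shuffled around as raw bytes the way memmove does.
template <typename T>
void move_items(T* first, T* last, T*& dest, general_items_tag)
{
    for (; first != last; ++first, ++dest)
        ::new (static_cast<void*>(dest)) T(std::move(*first));
}
```

Same signature, same effect on `dest`, but an element-by-element loop instead of one bulk byte copy: that contrast is the whole point of the tag dispatch.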

After changing the code, I re-ran the insert tests, and the results (unsurprisingly) looked like this.

You can see now, they’re both really fast, given that they’re within 200ns of each other. I’d say that they’re pretty much identical. The only justification I can make for the slight difference is, again, the safety factor. We really don’t care about any exception guarantees, which for fun is okay, but don’t try it at work.

Okay — I’ll address it. I know it’s burning for some of you. That code I wrote above, doesn’t actually work. I mean — it works, for the test I had written, but not much else.

move_items(start.ptr(), fin.ptr(), dest, copy_type{});

This line here has this weird .ptr() call on both the iterators. I can explain. Our move_items is akin to the MSVC _Umove function. However, the MSVC functions work on iterators, and ours just so happens to work on pointers, from the legacy code we wrote in the first article. What does that mean? Well, it means that we directly pass the input of our move_items call into memmove. This was perfect at the time, because we had full access to the pointers. Meaning our iterator was a pointer. But now, now we’re working with iterators that aren’t pointers. Just so we’re on the same page: a pointer is an iterator, but an iterator is not a pointer. There is a level of indirection there that allows iterators to work the way they do over pointers, but it means we can’t feed the iterator directly into the memmove function without the compiler yelling. But you just said that MSVC uses iterators in their function…. They do, but first they go through the process of undressing their iterators. No — wait, that’s not right. Unwrapping their iterators. It’s basically the process of removing the iterator facade so that we can, in this case, see the underlying pointer. But why didn’t you do that, Peter? Well — if I’m honest, as much as I love rabbit holes, and going down them to come out the other side, I’m probably almost as tired of working on this vector as you readers are of reading about it. It’s also another exercise in traits and meta programming that I didn’t want to spend the time on. My bad. 🙂
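For the curious, the core of unwrapping is a small overload set: pointers pass through untouched, and anything exposing a raw pointer gets asked for it. A sketch (the .ptr() hook mirrors the call in the snippet above; MSVC’s real machinery, _Unwrapped and friends, is considerably more involved):

```cpp
// Pointers are already "unwrapped": pass them straight through.
template <typename T>
T* unwrap(T* p) { return p; }

// Anything with a .ptr() hook gets asked for its raw pointer. SFINAE on
// decltype(it.ptr()) removes this overload for types without the hook.
template <typename It>
auto unwrap(It it) -> decltype(it.ptr()) { return it.ptr(); }

// Toy wrapped iterator, just for demonstration.
struct toy_iterator
{
    int* p;
    int* ptr() const { return p; }
};
```

With something like this in place, move_items could call unwrap() on its arguments and keep feeding raw pointers to memmove.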

I had a better use of that time. I’m going to try to keep the promise that I made in the beginning, and in the middle, and now, as we approach the end, have still not kept: I have not really talked at all about implementing OR timing an algorithm.

So — the other day, I was listening, or watching, I think listening, to a CppCast podcast, and Sean Parent was the guest. He was talking about how at Adobe one of the technical questions is that you have to implement a binary search, and then prove it. I thought, challenge accepted. Not that I’ll be applying at Adobe any time soon, but it’s a worthwhile exercise. Especially when it’s not for all the marbles, like a job at Adobe.

template <typename IteratorT, typename ValueT>
bool binary_search(IteratorT begin, IteratorT end, const ValueT &v)
{
    if (begin == end)   // empty range: nothing to find
        return false;
    const auto mid = (end - begin) / 2;
    if (mid == 0)       // one element left: it either matches or doesn't
        return *begin == v || *(end - 1) == v;
    const auto midval = *(begin + mid);
    if (midval == v)
        return true;
    if (v < midval)
        return htk::binary_search(begin, begin + mid, v);
    else
        return htk::binary_search(begin + mid, end, v);
}

Okay — I did it. It’s a recursive binary search function. I know what you’re thinking, eww, recursion. That’s just how I think about it. So the next part is to prove it by induction. Dun. Dun. Dun. Now, Sean mentioned that he had an individual provide 30 pages of documentation for the binary search. I’ll attempt to do one better…. 31 pages! Just kidding. I can barely muster 5000 words into a coherent blog post, let alone a 31 page proof.

Here it goes. Given an ascending sorted range: our base case is a range where the midpoint (end-begin)/2 is equal to 0, which (setting aside the empty range, where there is nothing to find) means a single remaining element; when this is the case, the value we’re looking for is either that element, or does not exist. When the range is larger than one element, the midpoint is not 0, and there are three cases. If the value at our midpoint is equal to our value, it exists in our collection. If it is not equal, and the search value is less than the value at our midpoint, then it can only exist in the bottom half of the range. If it is greater than the midpoint value, then it can only exist in the top half. Either way we recurse on a strictly smaller range, so by induction we eventually hit the base case, and the answer is correct at every step.

Well, it’s not much, and it’s hardly a mathematical proof, but it’s my best try. I think it illustrates the point of binary search as divide and conquer.

To close off this post, my plan was to illustrate the magic of binary_search over the plain old linear search. However, like always, color me surprised. I went off and used the chronograph library to time the different search functions. I wanted to test std::find, std::binary_search, and htk::binary_search (recursive) to see how they fare across a vector of integers. The test was as follows.

  1. Generate a random vector of size 10, 100, 1000, 10000, with values between 1-100
  2. Generate a random search value between 1 – 100
  3. Test 10, 100, 1000, 10000, 100000 times, each search function

As simple as this seems, it wasn’t as straightforward as I would have thought. My first hurdle was that my “random” generator function, wasn’t really random. The initial code looked like this.

template <typename T, typename VectorT>
VectorT random_numeric_vector(size_t size, T min = std::numeric_limits<T>::min(), T max = std::numeric_limits<T>::max())
{
    VectorT v;
    std::default_random_engine generator;
    std::uniform_int_distribution<T> distribution(min, max);

    for (size_t i = 0; i < size; ++i)
    {
        v.emplace_back(distribution(generator));
    }
    return v;
}

size_t next_random(size_t min, size_t max)
{
    std::default_random_engine generator;
    std::uniform_int_distribution<size_t> distribution(min, max);
    return distribution(generator);
}

A simple test case looked like this.

benchmark(s, "std::binary_search vector<int>(10)", [](session_run &r) {
    auto v = random_numeric_vector<int, std::vector<int>>(10, 1, 100);
    std::sort(v.begin(), v.end());
    auto value = next_random(1, 100);
    measure(r, [&v, &value]() {
        std::binary_search(v.begin(), v.end(), value);
    });
});

The results I was seeing across the two searches were very, very strange. The linear search sat somewhere in the realm of 100ns, for every collection size. The binary search was slower, somewhere in the range of ~150ns, and slightly higher for the larger collections, though not by much. So I thought, what the heck? I chalked it up to something to do with optimization. So, I chased down some code to tell the optimizer to eff off, and I ran it again. Same result. Over and over. In hindsight, I probably should’ve checked the documentation, or been smarter. It took putting a debugger on it to look and see. Every time I would generate a new “random” vector, it held the same random values. As well, each time I would generate a “new” random search value, it just so happened to be equal to the first element in the collection. Thus, my pseudo random number generator, constructing a new default_random_engine per call, was giving me the exact same set of random values. Which just so happened to make each random search value not so random. In fact, it was the best case for a linear search, making it return immediately, in every iteration, for every vector size.

After all that, I fixed the issue by hoisting and reusing the default_random_engine, so I was getting a genuinely different set of random values. So then, back to timing it. After I ensured the values were random, and the test was definitely executing the code I wanted, I had hoped to see the results we talked about earlier. You know, where the linear search time grows in step with the collection size, and where the binary search time grows at first, then levels off as the collection gets larger and larger. A logarithmic graph.
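For reference, the repaired helper might look like this (the static local is one way to do the hoisting; treat it as my sketch rather than the exact fix in the benchmark code):

```cpp
#include <cstddef>
#include <random>

// One engine, seeded once and reused across calls, so successive calls
// actually advance the sequence instead of replaying it from the start.
std::size_t next_random(std::size_t min, std::size_t max)
{
    static std::default_random_engine generator{std::random_device{}()};
    std::uniform_int_distribution<std::size_t> distribution(min, max);
    return distribution(generator);
}
```

The distribution object is cheap to rebuild per call; it’s the engine whose state has to survive between calls.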

Then this happened. What. The. Fox. Said? The one I least expected, std::find, is fastest, almost in all cases. They’re basically equivalent up until a million items, where I would’ve expected the binary_search to prevail. The linear search performing like that is very strange. Not only does it outperform both the binary searches (loop style & recursive), it also doesn’t require the input to be sorted, and sorting the 1 000 000 int vector took almost 33 ms on average. That is quite a bit of overhead to pay, just to be outperformed. What gives? This isn’t what my CompSci professor told me; they couldn’t have been lying, could they? Obviously, it makes logical sense that a linear search, whereby you could potentially have to inspect all elements, would take more time than one where you continually cut the search space in half. So, my professor wasn’t lying, and this result is just very counterintuitive. We can make a bet that std::find isn’t doing what we think. It just can’t be. Let’s prove that.

We are assuming it’s iterating the collection and checking each element, something like this.

template <typename IteratorT, typename ValueT>
bool linear_search(IteratorT begin, IteratorT end, const ValueT &v)
{
    for (; begin != end; ++begin)
    {
        if (*begin == v)
            return true;
    }
    return false;
}

Update 06.16.2020: After posting this story to reddit, it was pointed out by /u/staletic that std::find is not optimized by memchr here. In fact, after I stepped through with the debugger, it was, in fact, not doing this. I’m wondering if it’s just the sheer act of unwrapping the iterators to raw pointers that is making it that much faster. I’m going to run a test and see.

I ran the results of that test, and voila!

| Size | 10 | 100 | 1000 | 10000 | 100000 | 1000000 |
| --- | --- | --- | --- | --- | --- | --- |
| std::find | 249.341 | 195.602 | 205.812 | 171.916 | 266.472 | 513.4 |
| std::binary_search | 96.202 | 129.227 | 163.582 | 206.146 | 409.685 | 1321.187 |
| htk::binary_search | 108.538 | 143.235 | 161.051 | 196.494 | 504.58 | 1114.834 |
| htk::linear_search | 85.808 | 126.395 | 421.989 | 3577.376 | 34773.36 | 391704.3 |

(times in nanoseconds)

So, the first graph looks a lot more like we would’ve assumed: the linear search’s time grows in lockstep with the collection, a roughly tenfold jump per step, since each test is ten times larger. It also gets trounced by even the recursive binary_search, which makes the cost of sorting the collection totally worth it. That said, if you’re able to avoid paying the sort cost, by, say, maintaining sorted order on insert, you have a very quick search on your hands.

So what gives with std::find? Well, the truth is in the details. It looks like for types that aren’t trivial, it does a typical linear search, in the style we implemented. But when it comes to suitable trivial types, it can use a different mechanism: a call to memchr, a C API that finds the pointer to a matching byte value in a block of memory. (Per the update above, that path wasn’t actually taken in this int test, so some of the gap likely comes down to how well the unwrapped pointer loop optimizes.) Though I’m not sure exactly how it’s implemented, I can tell you it’s very quick. Meaning that if you have a collection of trivial types, it looks to be in your best interest to favor std::find for searching the collection, and not worry about having to sort it.
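The shape of that memchr trick, for byte-sized elements, looks something like this (a sketch of the idea, not the library’s actual code):

```cpp
#include <cstddef>
#include <cstring>

// The memchr idea: hand the whole contiguous byte range to the C runtime
// in one call instead of looping element by element. This only applies
// when an element is a single byte and comparison is plain equality.
const unsigned char* find_byte(const unsigned char* first,
                               const unsigned char* last,
                               unsigned char value)
{
    const void* hit =
        std::memchr(first, value, static_cast<std::size_t>(last - first));
    return hit ? static_cast<const unsigned char*>(hit) : last;
}
```

Returning `last` on a miss mirrors the std::find convention of returning the end iterator when nothing matches.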

Alright — I think I kept my promise. That was long winded, but a surprisingly interesting journey. We scratched the surface of micro-benchmarking, by understanding how to apply it to the work a vector does. As well, we took a stab at implementing (and proving) a binary_search algorithm. Then we got to see it in action, and how it performed (or should I say, got outperformed) against std::find and std::binary_search, and then we saw how a true linear search really compares to a binary search when they’re playing the same game. Overall, this was a very interesting walk down vector trail. I hope you enjoyed the read, and I look forward to seeing you next time!

Until then, stay healthy, and Happy Coding!

PL

A sudden bold and unexpected question doth many times surprise a man and lay him open.

Francis Bacon

References
https://www.youtube.com/watch?v=zWxSZcpeS8Q&feature=youtu.be
https://www.youtube.com/watch?v=nXaxk27zwlk&feature=youtu.be&t=2441
https://www.bfilipek.com/2016/01/micro-benchmarking-libraries-for-c.html
https://github.com/google/benchmark/blob/c078337494086f9372a46b4ed31a3ae7b3f1a6a2/include/benchmark/benchmark.h
https://github.com/rmartinho/nonius

Don’t try this at work – Vector Pt. 2

How’s y’all summer bodies lookin’? Mine’s lookin’ like I got a nice personality. — Random Reddit Post

I hope all is well with each and every one of you. This has definitely been a stressful, and mentally trying time. Shortly after my last post, the monitor on my laptop failed, and I had to replace it! I felt like a mechanic who used to work on old cars, and had to do work on a Tesla. I would never claim to be a brain surgeon, but….. Needless to say, my laptop screen is working now. I may be going a bit batty, as you might tell from this post. I just can’t wait for the day I get to hit the climbing wall again, and boulder me some V0s. Anyways — enough of the pleasantries, let’s get to the code.

So last we left our vector, we were able to emplace into it. We used a simple growth algorithm which expanded the capacity of the vector, when we were pushing a new value and there wasn’t enough space. We used “tag dispatching” to ensure the compiler chooses the correct constructor based on the type of the object. To do this we built the tools that allow us to look at the traits of the types, and we even learned about some neat compiler intrinsics! After I posted the article, I had a lot of great feedback, and some really in-depth discussions about an interesting take on implementing the vector. I also learned that a fair few Redditors actually use Vim for C++ development! The message was definitely received that my humor could be misinterpreted. I even contemplated for a moment, just a moment, about developing this second part of the series with Vim. Then I got stuck trying to amend a git commit, and I remembered that I’d be dead before I’d be able to figure out how to do that. It turns out you can’t teach this old Visual Studio dog new tricks.

Up next in the series was to make our vector work with the STL algorithms. What I really should’ve written was some of the STL algorithms, because it’s a huge list, and I’m not that ambitious. So I’m going to make sure it works with some algorithms, and if I have time, look at implementing a few myself.

To start, what do I mean by the STL algorithms? Well, I mean the set of algorithms defined in the header <algorithm>. If you haven’t heard of it, or don’t know it well, I strongly suggest you meander on over to cppreference.com (after you’re done reading, of course) and take a look at this useful set of functions.

The algorithms library defines functions for a variety of purposes (e.g. searching, sorting, counting, manipulating) that operate on ranges of elements. Note that a range is defined as [first, last) where last refers to the element past the last element to inspect or modify.

cppreference.com

As the quote says, these algorithms include a host of different operations you’d want to perform on a collection, from searching, to sorting, to set partitioning. If there’s an operation you want to perform on a vector or range, it more than likely already exists. So use the real thing at work, and leave the tool building to the professionals! However, that doesn’t stop us from having a little “weekend fun”, and looking at how some of these are implemented! I’ve digressed though; we’ll pick this up a little later on. I’m not one for linear plot-lines, I’m more of a Guy Ritchie fan myself.
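For a taste of what’s in there, shown on std::vector (these functions work on any container that exposes iterators; the little helper names are just for illustration):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sorting: return a sorted copy of the input.
std::vector<int> sorted_copy(std::vector<int> v)
{
    std::sort(v.begin(), v.end());
    return v;
}

// Counting: how many times x appears in v.
long count_of(const std::vector<int>& v, int x)
{
    return std::count(v.begin(), v.end(), x);
}

// Searching: whether x appears in v at all.
bool contains(const std::vector<int>& v, int x)
{
    return std::find(v.begin(), v.end(), x) != v.end();
}
```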

Back to our vector — the poor thing thinks we’ve abandoned it. Last we left, it was a relatively useless container that only grew from the back. Which is all well and good, if you want to grow linearly at the back. It’s pretty useless however if you’re looking to insert anything into the middle, or trying to merge two vectors by inserting right into left. So we have to do a little work on our vector before it’s a useful container.

What do we need to do to insert? What does it mean to insert? Well obviously, when we wrote the emplace_back function we were effectively doing a kind of insert. We were just inserting a single element, at the back of our vector. Which is pretty straightforward: if you have enough space, construct the element in place. If not, make more space, move/copy the elements to the new space, then construct the element in place. However, this is only one of the three cases of insert. When we’re inserting elements into our vector, we have three options. First, at the very front. Next, in the middle. Lastly, at the very back of the vector, which we’ve already covered.

So, with a little foreshadowing, we’ll move to the next item in the list: inserting in the middle. If we look at what made inserting at the end so easy, it was that there was never anything in our way. Sure, if we ran out of space we had to reallocate a new chunk of memory, and move things over. But nothing was ever in our way, meaning the targeted seat where we wanted to put our item never had someone sitting in it. Because we want to insert into the middle, we obviously have items in our way. Let’s use a musical chairs analogy while we attempt to figure out how to insert into our vector.

We’ll start out with our empty, 10 slot vector. It has enough space for 10 people. You can see the three pointers we manage, begin, end, and the capacity pointer. Then along comes someone to sit down.

Our begin pointer still points to the same spot, and our end is incremented. This is the placement at the back, that we worked on in the last post. Now, along comes another 5 people.

We still have capacity, so they sit at the end with no issue.

Then a group of three comes to the party; they insist on sitting starting at the 3rd slot (2nd index). What do we do?

We have to shift everyone down, starting at the blue pointer, to the red pointer (the next end). Now, the chairs starting at the current end pointer, up to the red pointer, are all fresh. Meaning no one has sat in them, so we don’t have to worry about them. We can just copy them forward as we normally would: if they’re movable, we’ll move construct them; otherwise they’re copy constructed into their new chairs. The reason we need to do this is that those slots are uninitialized, meaning the memory has been allocated, but we haven’t constructed anything there yet. So to figure this out, we calculated the distance between where we need to go (our new end) and our old end. That told us that three slots are free slots that need to be initialized.

So, now we’ve got 3 empty slots next to where the new guys want to sit (our insertion point). Then we have the guy still sitting at our blue pointer. Why is he special? Well, when we moved those last guys, it was easy; we could copy them forward into the uninitialized memory. But the guy in blue is different. He’s different because he has to sit in a chair where someone else has been sitting. Social Distancing Alert!!! This means we have to do something different. In real life we would need to sanitize the chair. Thankfully, in our world, we just need to call the assignment operator, since the memory is already initialized. For the given distance between our old end pointer and our insertion point, we need to copy all the items backwards. This is so we can move them without overrunning our objects.

That leaves three open previously initialized slots, that we can just assign our new elements to.
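The whole dance can be sketched with standard helpers on a plain buffer. This is a hedged illustration using ints, where plain assignment into fresh slots is fine; the real vector has to placement-construct there. The name insert_shift and its parameters are illustrative, not the htk code:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>

// Two-phase shift described above, on a raw int buffer with spare
// capacity. Inserts `count` values at index `pos`.
void insert_shift(int* buf, std::size_t& size, std::size_t pos,
                  const int* first, std::size_t count)
{
    int* old_end = buf + size;
    int* new_end = old_end + count;
    std::size_t tail  = size - pos;              // elements after the insertion point
    std::size_t fresh = std::min(count, tail);   // how many land in never-used slots

    // Step 1: the last `fresh` elements go forward into uninitialized
    // slots; uninitialized_move would placement-construct for real Ts.
    std::uninitialized_move(old_end - fresh, old_end, new_end - fresh);

    // Step 2: the remaining overlap is moved backwards within already
    // initialized memory, back to front so nothing is clobbered early.
    std::move_backward(buf + pos, old_end - fresh, new_end - fresh);

    // Step 3: assign the new values into the hole (for non-trivial Ts,
    // any part past the old end would need construction, not assignment).
    std::copy(first, first + count, buf + pos);
    size += count;
}
```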

After I made this infographic, I realized it would’ve been super helpful to illustrate the copy backwards on more than one element. The most important thing to note is that we need to break the moving into two distinct steps. The first is to calculate and forward-create the new elements; the amount will be the count. Then we figure out the overlap from the insertion point to the old end, and copy that backwards. There are a few other cases depending on where the insertion point is, but I just wanted to illustrate this one. It’s really just a warm-up for understanding the next point, and a big topic when working with the STL algorithms. Up until this point I’ve just been calling these arrows pointers. Which, up until now, is what I’ve been using… However, I’ve been using them to step through different areas of memory, which isn’t exactly what a pointer is supposed to be used for in C++, though technically it can be. In C++ we have an object specifically designed for stepping through collections.

Bingo! You guessed it, it’s called an iterator. Like a pointer, an iterator is used to point to objects, but specifically within containers. They allow us to, unsurprisingly, iterate through containers. For something like a vector, a pointer and an iterator can be analogous. Since a vector is contiguous memory, incrementing and decrementing the pointer will move you along the vector. Whereas for a container like a tree, this isn’t the case. An iterator encapsulates the traversal logic, so if you’re traversing a non-contiguous container (i.e. linked-list, map, tree, etc.), you don’t have to worry about how to traverse to the next element. You simply increment the iterator, and it’s handled. The other thing the abstraction can give us is safety. It can do things like bounds checking, and container validation. There are also different kinds of iterators: there are Input & Output Iterators, Forward Iterators, Bi-Directional Iterators, and Random Access Iterators. An Input Iterator is a single-use iterator that can only be used to read; each position can be read once, and then the iterator is incremented. An Output Iterator is a single-use iterator as well, but used for, you guessed it, output. This lets you write into a collection. A Forward Iterator lets you traverse from start to finish, but allows you to read more than once at each position. A Bi-Directional Iterator lets you move back and forth between the bounds, incrementing a single step in each direction. Last but definitely not least, the Random Access Iterator. It lets you basically jump around; you can increment it by any value and it will be able to move to that position. This is the iterator type that is analogous to a pointer. All of that being said, if you look at the Stepanov vector we used as a recipe to follow (adding our own spices), it used a raw T pointer as its “iterator”.
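For reference, the standard models these categories as empty “tag” types, with inheritance encoding “is at least a”; a minimal htk version might mirror <iterator> like so (output_iterator_tag stands alone, since output iterators aren’t a refinement of input iterators):

```cpp
#include <type_traits>

// Iterator category tags, mirroring the hierarchy in <iterator>.
struct input_iterator_tag {};
struct output_iterator_tag {};
struct forward_iterator_tag : input_iterator_tag {};
struct bidirectional_iterator_tag : forward_iterator_tag {};
struct random_access_iterator_tag : bidirectional_iterator_tag {};

// Algorithms can dispatch on the strongest category available; random
// access implies everything an input iterator can do.
static_assert(std::is_base_of_v<input_iterator_tag, random_access_iterator_tag>);
```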

In order to add substance, and have more than 4 minutes’ worth of reading in this post, we’re not going to settle for that lame old pointer iterator. We’re going to implement our own vector Random Access Iterator! At the time of writing this, I have not yet actually implemented it. Though, here’s my initial thought: it won’t be hard. It basically just needs to mimic a pointer… How hard can that be?

Whew…. Back, what a wild ride that was. We haven’t looked at any code yet; So let’s start by taking a look at the iterator. Like our vector, our goal is to write a simple iterator.

/*
        We're going to implement a simple iterator for our simple vector,
        which uses a simple allocator, and implements all public members
        of the vector on cppreference.

        Goal: It needs to function, and be just past a toy. It needs to
        work with the std::algorithms.
*/
template <typename VectorT>
class vector_iterator
{
public:
    using iterator_category = random_access_iterator_tag;
    using container = const VectorT;
    using difference_type = typename container::difference_type;
    using value_type = typename container::value_type;
    using pointer = typename container::pointer;
    using reference = typename container::reference;

// ...
};

So when I looked at the MSFT STL vector & iterator, I’ll be quite honest — there’s a lot going on in there. A lot of which I didn’t understand. Some of which I felt would over-complicate what I was trying to achieve. To a certain extent, it’s a rabbit hole I wasn’t willing to Alice my way through. Most of it likely stems from reuse across other parts of the STL. As well, I think part of it, may be due to SCARY iterators.

Thanks to the kind redditor that pointed me in the direction of SCARY iterators; my understanding is slightly deeper than shallow on the topic. Basically, it has to do with minimizing the iterator’s dependencies on its template parameters. If you look at the template argument of the iterator I’ve written, it takes a VectorT; thus when we write vector_iterator<VectorT>, the template argument for the iterator is actually htk::vector<int, default_allocator<int>>, which makes the iterator type vector_iterator<vector<int, default_allocator<int>>>. Prior to C++11, this would cause a compilation error if you tried to assign a non-default-allocated vector’s iterator to a default one. Like so,

std::vector<int, non_default_allocator> v;
std::vector<int>::iterator i = v.begin();

But apparently, with the SCARY iterator changes that MSFT made in early 2012, this will in fact compile just fine. Let’s not kid ourselves though: in the toy we’re building, it won’t compile. It’s a concession I’m willing to make for us.

Ploop …

That was us coming out of the rabbit hole, and back to our own iterator implementation. As I mentioned earlier, I think (and if I didn’t, then I’m mentioning it now), this iterator is a thin shim that needs to effectively behave like a pointer. What can we do with a pointer? We can increment and decrement it by some offset, or pre/post increment/decrement it. We can dereference it, or we can follow it (operator->). So let’s go forth and begin.

template <typename VectorT>
class vector_iterator
{
public:
    using iterator_category = random_access_iterator_tag;
    using container = const VectorT;
    using difference_type = typename container::difference_type;
    using value_type = typename container::value_type;
    using pointer = typename container::pointer;
    using reference = typename container::reference;



private:
   pointer ptr_;
   container *vec_;
};

Without sounding like I’m beating a horse that isn’t alive, with a stick: it’s pretty simple. Because we’re just building a thin shim over a pointer, we need a pointer. Technically, that’s all we need, but given that we’re paying for this shim, we might as well use it. We can get a bit of safety out of this thing by hanging onto a pointer to our original container. This will allow us to do things like bounds checking, and validating that we’re using iterators from the same container.

There’s really nothing surprising about this implementation so I’m only going to show a couple operators.

template <typename VectorT>
class vector_iterator
{
public:
    using iterator_category = random_access_iterator_tag;
    using container = const VectorT;
    using difference_type = typename container::difference_type;
    using value_type = typename container::value_type;
    using pointer = typename container::pointer;
    using reference = typename container::reference;


public:
    vector_iterator()
        :ptr_(nullptr), vec_(nullptr) 
    {
    }

    vector_iterator(pointer p, container *vec)
        :ptr_(p), vec_(vec)
    {
    }

    const reference operator*() const
    {
        assert_(ptr_ != nullptr, "empty iterator");
        assert_(ptr_ >= vec_->data_.first && ptr_ < vec_->data_.last, "out of bounds");
        return *ptr_;
    }

     reference operator*()
     {
        return const_cast<reference>(const_cast<const vector_iterator*>(this)->operator*());
     }

     pointer operator->() const
     {
        assert_(ptr_ != nullptr, "empty iterator");
        assert_(ptr_ >= vec_->data_.first && ptr_ < vec_->data_.last, "out of bounds");
        return ptr_;
     }

     vector_iterator& operator++()
     {
        assert_(ptr_ != nullptr, "empty iterator");
        assert_(ptr_ < vec_->data_.last, "cannot increment past end");
        ++ptr_;
        return *this;
     }

     vector_iterator operator++(int)
     {
         vector_iterator tmp{ *this };
         ++(*this);
         return tmp;
     }

private:
   pointer ptr_;
   container *vec_;
};


Some patterns I noticed in the MS STL, which are good practice, are to implement certain operators in terms of their sibling operator. For instance operator++() and operator++(int), the prefix and postfix increment operators respectively. One can implement postfix in terms of prefix by simply creating a temporary copy, prefix incrementing *this, and returning the temporary. This removes the necessity to do bounds checking in two places. Let’s use the iterator. We simply flip the usings (typedefs) to use vector_iterator and const vector_iterator appropriately, and voila, we can use it.
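The same trick generalizes to the other random-access operators. Here’s a standalone sketch on a bare pointer wrapper (int_iter is a made-up type, not the htk iterator), writing operator+ in terms of operator+= so any checking lives in one place:

```cpp
#include <cassert>
#include <cstddef>

// A minimal pointer wrapper demonstrating "implement in terms of the
// sibling operator".
struct int_iter
{
    int* ptr_;

    int_iter& operator+=(std::ptrdiff_t n)
    {
        // in the real iterator, bounds checks would live here, and only here
        ptr_ += n;
        return *this;
    }

    int_iter operator+(std::ptrdiff_t n) const
    {
        int_iter tmp{ *this };
        tmp += n;   // reuse += rather than duplicating its logic
        return tmp;
    }

    int& operator*() const { return *ptr_; }
};
```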

So earlier we went through the algorithm for insertion of an element into our vector. The signature on this function is as follows

iterator insert(const_iterator where, size_type count, const T &value)

As well, there is a range insert, which has a different signature, taking an iterator pair to denote a range.

template <typename It>
iterator insert(const_iterator where, It start, It fin)

This insert overload takes the position, and an iterator pair for the range you’d like to insert. One would use it like this.

htk::vector<int> v;
v.emplace_back(1);
v.emplace_back(2);
v.emplace_back(3);
v.emplace_back(7);
v.emplace_back(8);
v.emplace_back(9);

htk::vector<int> v2;
v2.emplace_back(4);
v2.emplace_back(5);
v2.emplace_back(6);

v.insert(v.begin()+3, v2.begin(), v2.end());

Then we compile it, and it compiles fine. We get what we’d expect: { 1, 2, 3, 4, 5, 6, 7, 8, 9 }. Next we try something a little more complicated. Well, it’s not complicated; it’s just that we’re mixing our two insert calls.

 htk::vector<int> v;
 v.insert(v.end(), 4, 1);

 htk::vector<int> v2;
 v2.emplace_back(1);
 v2.emplace_back(2);
 v2.emplace_back(3);
 v.insert(v.end(), v2.begin(), v2.end());

We compile this… Wham! Illegal indirection. Wait — what does Illegal Indirection mean? From MSDN

Indirection operator ( * ) is applied to a non-pointer value.

Compiler Error C2100


Wait — how can that be? I’ve obviously passed iterators, see v.insert(v.end(), v2.begin(), v2.end()). If you look closely, both inserts take the same number of arguments: one takes two deduced template arguments, and the other takes a size_type and a T. But in our case T is int, and the value we’ve provided for the size, 4, just so happens to be an integer literal, which is an int. So our arguments match both signatures, and the template overload is chosen, because it’s a closer match (no conversion needed). When the template overload is chosen, at some point we end up trying to dereference an “iterator” which has been assigned the value 4. So what do we do?
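Here’s a minimal standalone repro of the pitfall, with a hypothetical demo type (not the htk vector); the return values just mark which overload got chosen:

```cpp
#include <cassert>
#include <cstddef>

struct demo
{
    // the (count, value) overload: calling with (4, 1) needs an
    // int -> size_t conversion for the first argument
    int insert(std::size_t, const int&) { return 0; }

    // the range overload: with (4, 1) it deduces It = int, an exact match,
    // so it beats the non-template overload above
    template <typename It>
    int insert(It, It) { return 1; }
};
```

Passing a genuine size_t flips the result: deduction then sees size_t vs int for It, fails, and the non-template overload wins.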

If you recall from the last post, I mentioned in passing this thing called SFINAE. Then I promptly took a right turn to avoid it, and we used a similar, but tastier, concept called tag dispatching. Well, I must’ve taken 3 more right turns, because here we are again… The reality of this situation is that we want the compiler to “test” whether or not this is the right function, and have that “test” not be an error. If it fails the test, the compiler will just carry on looking for an appropriate match. With simple glasses on, this is what SFINAE is. It stands for Substitution Failure Is Not An Error, meaning that when a template substitution fails, it doesn’t generate a compiler error. This can be used to decorate functions so that, depending on the types’ traits at compile time, different functions get called.

The canonical example, from Wikipedia:

struct Test {
  typedef int foo;
};

template <typename T>
void f(typename T::foo) {}  // Definition #1

template <typename T>
void f(T) {}  // Definition #2

int main() {
  f<Test>(10);  // Call #1.
  f<int>(10);   // Call #2. Without error (even though there is no int::foo)
                // thanks to SFINAE.
}

Let’s look at an example I cooked up.

#include <string>
#include <iostream>
#include <type_traits>

template <int value>
constexpr bool is_three()
{
    return value == 3;
}

template <int value>
constexpr bool is_three_v = is_three<value>();

template <int value>
std::enable_if_t<!is_three_v<value>> foo()
{
    std::cout << value << std::endl;
}

template <int value>
std::enable_if_t<is_three_v<value>> foo()
{
    std::cout << "three" << std::endl;
}

int main()
{
    foo<1>();
    foo<2>();
    foo<3>();
    foo<4>();
}

and just like that you get!

1
2
three
4

The idea is that you plan your substitution failures, so that one function will resolve as a better match; in this case, only one function will resolve at all at any given time. Often, to help with SFINAE, library programmers will use meta-programming tools like std::enable_if. Under the hood, it looks something like this.

template <bool Eval, typename T = void>
struct enable_if {};

template <typename T>
struct enable_if<true, T> 
{
    using type = T;
};

template <bool Eval, typename T = void>
using enable_if_t = typename enable_if<Eval, T>::type;

If you look at the Wikipedia example, it uses the failure of T::foo as a failure, to fallback on the other function. With enable_if, if the evaluation is true, there is a type defined, if not, nothing. This allows you to put an enable_if where you would regularly use a type, and if it the evaluation is false, the substitution failure will continue. If we pop back to what we were doing.

template <typename It, typename = htk::enable_if_t<htk::is_iterator_v<It>>>
iterator insert(const_iterator where, It start, It fin);

In our case, we only want to enable this insert call if the type we’re passing is an iterator. We need to implement is_iterator_v. How do we go about doing that? We need some way to check whether the type has something that indicates it’s an iterator. What makes an iterator unique? Looking back at the iterator, it defines some specific… characteristics. Yeah — characteristics. One of those characteristics is the iterator_category, which tells us what kind of iterator it is. Any iterator should have this, so we can implement our is_iterator_v with that in mind.

template <typename T, typename = void>
struct has_iterator_cat : false_type {};

template <typename T>
struct has_iterator_cat<T, void_t<typename T::iterator_category>> : true_type {};

template <typename T>
constexpr bool has_iterator_cat_v = has_iterator_cat<T>::value;

template <typename T>
constexpr bool is_iterator_v = has_iterator_cat_v<T>;

You may notice the void_t on the specialization. I’m not 100% on my understanding of why it is needed, but it goes as follows: when the compiler tries to instantiate has_iterator_cat<T>, it can either instantiate the specialization, or fall back to the default. With the void_t, it will either be able to evaluate the expression, thus completing the specialization, or it will fail and fall back to the default. Depending on which gets instantiated, we land on true_type or false_type, giving us the appropriate value for our type. Now we can apply this gently to our insert function, and our code will compile.
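For the curious, void_t itself is tiny. Here’s a self-contained version of the whole detection idiom you can compile and poke at (using std::true_type/false_type in place of the htk ones):

```cpp
#include <type_traits>

// void_t maps any well-formed pack of types to void; an ill-formed type
// in the pack makes the alias ill-formed, which triggers the SFINAE
// fallback below. C++17 ships this as std::void_t.
template <typename...>
using void_t = void;

template <typename T, typename = void>
struct has_iterator_cat : std::false_type {};

// Only viable when T::iterator_category names a type.
template <typename T>
struct has_iterator_cat<T, void_t<typename T::iterator_category>> : std::true_type {};
```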

I said earlier that pointers act as iterators, and can denote ranges, so we should be able to do something like this.

htk::vector<int> v;
v.emplace_back(1);
v.emplace_back(2);
v.emplace_back(3);

int arr[3] = { 4, 5, 6 };
int *b = arr;
int *e = b+3;
v.insert(v.end(), b, e);

Then we hit compile, and get a compile error: “cannot convert int* to size_type”. Obviously, we did something wrong. Pair programming problem. 🙂 What did we do wrong though? Let’s take a look at the rule we defined. We said that if T::iterator_category existed, we would enable the function. When we’re passing b, what is its type? int *. Oh – kay. int * obviously doesn’t have an iterator_category, so the SFINAE evaluation fails and the compiler uses the other insert function. Let’s fix that; we’ll just go to the definition of int *, and add iterator_category like we did to our own iterator. Uhh — what? You mean we can’t just go change the definition of int *?

Did you pick up on my subtle foreshadowing earlier in the post? These properties of iterators, I called them characteristics. I was specifically trying to avoid the word traits, so that I wouldn’t give away everything at once, and so we could illustrate why we need iterator_traits. Earlier, when we wrote our type_traits library, it was a collection of classes to distinguish properties of types at compile time. Because we can’t directly modify int *, we can create something that describes the iterator traits of a pointer.

Side note: this is one of the most powerful parts of the STL. The ability to be extended without modification, i.e. the Open-Closed Principle. The library is very open for extension, meaning you can easily make your classes work with it, and closed for modification, meaning you don’t need to open the thing up and change its guts to enable that extension.

We need to create a small set of helper classes that will describe these traits of our iterators, which we can then use to check whether an object is an iterator. We need iterator_traits!

template <typename, typename = void>
struct iterator_traits 
{
};

// uses SFINAE to overload if iterator has the traits. 
// 
template <typename Iterator>
struct iterator_traits<Iterator,
    void_t<
        typename Iterator::iterator_category,
        typename Iterator::difference_type,
        typename Iterator::value_type,
        typename Iterator::pointer,
        typename Iterator::reference>>
{
    using iterator_category = typename Iterator::iterator_category;
    using difference_type = typename Iterator::difference_type;
    using value_type = typename Iterator::value_type;
    using pointer = typename Iterator::pointer;
    using reference = typename Iterator::reference;
};

template <typename T>
struct iterator_traits<T*>
{
    using iterator_category = random_access_iterator_tag;
    using difference_type = htk::ptrdiff_t;
    using value_type = T;
    using pointer = T*;
    using reference = T&;
};

template <typename T>
struct iterator_traits<const T*>
{
    using iterator_category = random_access_iterator_tag;
    using difference_type = htk::ptrdiff_t;
    using value_type = T;
    using pointer = const T*;
    using reference = const T&;
};

We start out by defining our base case; this has no traits whatsoever. So if you instantiate iterator_traits<Foo>, you get no members, thus no traits. Which makes sense, because Foo is not an iterator. Second, we have this massive specialization, using that void_t trick to check that the type has all the correct iterator members. If it does, it defines the trait typedefs, which expose the iterator types themselves. Next, we have the pointer specializations, which give us our traits for pointers. Which is exactly what we needed for this scenario. Then we just have to change our has_iterator_cat to use iterator_traits.

template <typename T>
struct has_iterator_cat<T, void_t<typename iterator_traits<T>::iterator_category>> : true_type {};

After this change, our compiler stops complaining, and our insert function now works with pointers! What a ride! So as not to inundate the reader, I’m going to start wrapping this post up. Let’s take a look back and see how far we came. I wanted to explore getting this vector to work with some STL algorithms. To do this, we had to look at building an iterator for our vector, because the STL algorithms work on iterators. We really didn’t need to, since our original pointer “iterator” would’ve sufficed. Meaning, our vector already worked with the STL algorithms.

To summarize our adventure: we learned how the insert function works, by creating space within the middle of our vector. It does this by moving the trailing items into the uninitialized space, and then move-assigning backwards any items that overlap already used space. Then we took a look at creating an iterator, and lastly implementing our range-based insert, so that it works with pointers and iterators. We learned about SFINAE, and how we can use it to choose which function the compiler should use. We also learned, hopefully, that implementing a real-deal STL is a lot of work. If you think about it, we’ve implemented one container (partially).

The code is available at my Github, and I’ll leave it as an exercise to the reader to implement the reverse_iterator, and test out the vector with some STL algorithms.

In the next post, I’m going to pit my little vector against the MSFT STL vector. It’ll be very interesting to see how it fares. As well, who knows, maybe we’ll take a crack at implementing some algorithms, since I didn’t get a chance to this time.

Until then, Stay Healthy and Happy Coding!

PL

Some beautiful paths can’t be discovered without getting lost.

Erol Ozan

References
Dining room, chair icon
https://devblogs.microsoft.com/cppblog/what-are-scary-iterators/
https://riptutorial.com/cplusplus/example/3778/void-t

Don’t try this at work – Vector Pt. 1

Welcome back! I hope you’re enjoying quarantine. I know for me, not being able to get out climb has definitely had an impact on my mental health. Oh, and my waistline. That being said, I’ve had more time to get into the code, which is a piece of my own little paradise.

As I mentioned in my last write-up, I’m trying to bring these posts back to basics, and give people more of what they’re interested in reading. Though, it’s very likely that people find those posts more interesting because I had more fun writing them. Reverse engineering the C++ unit test framework brought me immense joy, because I was really excited by it, and to write about it. With that in mind, I spent a good long while trying to figure out what I should post next. I haven’t had the itch to deconstruct anything lately. I decided to rack my brain, and remember what the original intent of my blog was: to demystify code, to remove the “magic” behind it. I’ve always been curious about the way things worked. As developers, we shouldn’t be afraid of the magic behind libraries, we should embrace it. We should strive to understand at least peripherally how the code we use works. I once made a comment on Reddit, to an untrained junior developer looking for tips on how to further their career. The tip was that they should take their work code home, and try to recreate it. For me, recreating code I didn’t understand was always the best tool to further my understanding. You start small, and you build; as you run into problems, you start to understand why the author did the things they did. You can start to form opinions about the decisions they made, as you understand the problems they faced. Needless to say, this comment was down-voted into oblivion, and someone even suggested that I was instructing OP to steal their employer’s code. (Spoiler: I wasn’t.) Reddit not agreeing with me didn’t change my mind. I still think that reverse engineering something is the best way to understand it. If you’ve got access to the code on a laptop, I’m sure your boss wouldn’t mind if you took the initiative to understand company code better. Just don’t go selling your company’s code.

So, that’s sort of what inspired this series, Don’t try this at work. In my personal experience, I’ve noticed a commonly occurring problem, people experimenting at work. I’m sure we can all agree that writing code, is as much an art, as it is a science. The problem is, that often times if people don’t have an outlet outside of work, they’ll use the company code base as their canvas. This can be problematic for smaller companies, who don’t always have the capacity and resources of a larger company to catch these things. These “works of art” can sometimes make it into production code, thus needing to be maintained for the lifetime of the application. This type of exploratory code is often coined “toy” code, it’s code you definitely shouldn’t be using at work. Hence, Don’t try this at work. In the Don’t try this at work series, I want to demystify the one of the most commonly used classes, in the most common library, used by almost all C++ developers. The Standard Template Library.

In this, the first of the Don’t try this at work series, I’m going to be looking at something every C++ developer, regardless of experience, uses (or at least knows of). Just because we all use it, doesn’t mean we understand it. So without further ado, let’s dive into the vector!

You can find all the code I write for this series here, if you have any questions or comments feel free to shoot me an email.

The STL vector is probably the most ubiquitous container used in C++. If you need a generic dynamically sized container, the vector is the first tool you should grab from the STL’s bag of tricks. It’s a thin wrapper over contiguous memory, providing immense usability with little overhead over a raw array. It can easily be used with STL algorithms, and fits most cases where a dynamically sized container is required. It features constant-time random access, amortized constant-time insertion and removal at the end, and linear-time insertions and deletions elsewhere. In short, if you don’t use vectors, you should.
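As a quick refresher, the day-to-day usage that earns it that reputation looks something like this (a trivial sketch; the function name is mine):

```cpp
#include <algorithm>
#include <vector>

// push, random access, and algorithm support in a few lines
int vector_refresher()
{
    std::vector<int> values;
    for (int i = 5; i > 0; --i)
        values.push_back(i);                   // amortized constant-time append
    std::sort(values.begin(), values.end());   // works with standard algorithms
    return values[0] + values[4];              // constant-time random access
}
```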

Warning: Because a vector is a generic container, we’re going to be doing template programming. So if you’re faint of heart, stop reading now.

Where do we start? Well, the vector is essentially a wrapper over an array, that will expand when the array runs out of space. So, something simple probably looks like this.

#include <cstddef> // size_t

template <typename T>
class vector
{
    T* data_;
    size_t size_;
    size_t cap_;
public:
    vector()
        :data_(new T[10]), size_(0), cap_(10)
    {
    }

    void push_back(const T& value)
    {
        if (size_ == cap_)
        {
            grow();
        }
        data_[size_++] = value;
    }

private:
    void grow()
    {
        T* tmp = new T[cap_ * 2];
        for (size_t i = 0; i < size_; ++i)
            tmp[i] = data_[i];
        cap_ = cap_ * 2;
        auto data = data_;
        data_ = tmp;
        delete[] data; // array delete, to match new T[]
    }
};

Pretty simple, right? The basics of a vector are that we have an array, the count of objects (size), and the capacity. When we push a new object, we check to see if we have space; if not, we grow, and then push that item on the end. When we do have to grow, we allocate double the capacity, copy all the items over, and then delete the old array. Neat, huh? The problem is that this is toy code. It’s inefficient, it requires that objects be default constructible, and it isn’t at all safe. It gets the point across, but it’s not really rocket science, or STL science for that matter. This code is a Fisher-Price vector. It’s light-years away from being a real vector. This is your 3-year-old’s vector. We want a real vector. We want a vector that looks and feels like the vector you get from your compiler writer.

Where do we even start? Well, the first thing would be to maybe look at a real vector implementation. Cooking is also an art form, and when you’re learning how to cook something new, you generally use a recipe. So, let’s start with a recipe.

I’m a Microsoft guy, through and through. So naturally, my first thought was to go look at the files in VS 2019. I cracked open Visual Studio, opened a new C++ file, typed #include <vector>, right-clicked “Go to Document <vector>”, and voila. Sorry vim folks, I’m not sure if you can do that, BUT BASICALLY I RIGHT CLICKED AND OPENED THE VECTOR FILE. Have you ever looked at MSVC’s implementation of anything in the STL? Well, be thankful when “Just My Code” works when you’re debugging. Here’s a snippet. (I didn’t write this code; it’s taken directly from MSVC’s vector.)

Note: If you don’t use Visual Studio to code and debug, I truly feel sorry for you.

// vector standard header

// Copyright (c) Microsoft Corporation.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
// ...
template <class _Ty, class _Alloc = allocator<_Ty>>
class vector { // varying size array of values
private:
    template <class>
    friend class _Vb_val;
    friend _Tidy_guard<vector>;

    using _Alty        = _Rebind_alloc_t<_Alloc, _Ty>;
    using _Alty_traits = allocator_traits<_Alty>;

public:
    static_assert(!_ENFORCE_MATCHING_ALLOCATORS || is_same_v<_Ty, typename _Alloc::value_type>,
        _MISMATCHED_ALLOCATOR_MESSAGE("vector<T, Allocator>", "T"));

    using value_type      = _Ty;
    using allocator_type  = _Alloc;
    using pointer         = typename _Alty_traits::pointer;
    using const_pointer   = typename _Alty_traits::const_pointer;
    using reference       = _Ty&;
    using const_reference = const _Ty&;
    using size_type       = typename _Alty_traits::size_type;
    using difference_type = typename _Alty_traits::difference_type;

private:
    using _Scary_val = _Vector_val<conditional_t<_Is_simple_alloc_v<_Alty>, _Simple_types<_Ty>,
        _Vec_iter_types<_Ty, size_type, difference_type, pointer, const_pointer, _Ty&, const _Ty&>>>;

public:
    using iterator               = _Vector_iterator<_Scary_val>;
    using const_iterator         = _Vector_const_iterator<_Scary_val>;
    using reverse_iterator       = _STD reverse_iterator<iterator>;
    using const_reverse_iterator = _STD reverse_iterator<const_iterator>;

    vector() noexcept(is_nothrow_default_constructible_v<_Alty>) : _Mypair(_Zero_then_variadic_args_t()) {
        _Mypair._Myval2._Alloc_proxy(_GET_PROXY_ALLOCATOR(_Alty, _Getal()));
    }

// ..

Mmhmmm. Yup. Feast your eyes. If the code I wrote was Fisher-Price, this is DeWalt. It’s safe, it’s efficient, and it works. The reason we never question much about things in the STL is that they almost always work the way we expect them to. The authors of these libraries deserve praise. So if you’re an STL developer, you have my sincerest thank you. The problem is that this code is dense, and I for one am not smart enough to understand what’s going on at a glance.

Shout out to @StephanTLavavej, and all others on the Visual C++ team for making our lives easier.

What do you think a _Scary_val is? I actually have no idea.

Okay, so that recipe might as well be in Cuneiform, or French. Neither of which I’m particularly talented at reading. We need a recipe we can understand. Thankfully, I did some digging, and I was able to find something. Located in the depths of the internet, here, I found Alexander Stepanov’s implementation of the STL. If you don’t know who Alex is, he’s the father of the STL. The man is a living legend, and to him we owe a massive thanks. Anyways, his implementation of vector is clean, simple, and readable. It’s something we can easily consume. The only problem is that it’s from 1994. Do you know what that means? It means it was written when mullets were popular. I would know, I had one. It also means it was written before any of the cool modern C++ existed. It was written before C++ was even standardized!! Talk about using Grandma’s recipe. It’s fine, we’ll make do; things haven’t changed that much, have they? Until I find a better resource, this is the best I’ve got. (Edit: I actually found another good resource a bit into my experiment, the EASTL.) I’m going to use a hybrid of Stepanov’s vector, the cppreference documentation, and whatever I can figure out from MSVC. Then I’ll sprinkle in a little bit of my knowledge, and bingo, we’ll have a real working vector.

Let’s start by defining what we’re working towards. First, we want a standard vector. This means we want it to fulfill the standard for what a vector is. It needs to work with standard algorithms, which we’ll eventually test. Lastly, we’re going to compare it, performance-wise, to MSVC’s implementation.

From cppreference.com, a vector is a sequence container that encapsulates dynamically sized arrays. Its declaration looks like this.

template <typename T, typename Allocator = std::allocator<T>>
class vector;

So, the template declaration is a bit different from the toy code we looked at earlier, and from Stepanov’s vector. It has this weird thing called an Allocator. You might be asking, “what’s an allocator, and why is it a template argument?” To address the first question: an allocator encapsulates your allocation mechanism. It hides away how the memory gets allocated, behind a consistent interface. To answer the second question: it’s a template argument so that you can supply what type of allocator your vector will use. A huge part of what makes the STL so nice is that it’s very, very flexible. It works with all your custom hand-rolled classes, and it allows you to control how allocations happen for standard containers. It does this through the Allocator template argument. Why would anyone use a custom allocator? A few reasons come to mind: maybe you’ve got memory constraints, or maybe you have a highly optimized use for your vector. For those cases, you might want a custom allocator. Maybe you noticed that Stepanov’s vector implementation doesn’t have the allocator template argument either. However, if you read deeper, he uses a static member allocator that’s defined at the top section of the code.

If you recall the toy code, we used “new T[]” directly to allocate our array. This is problematic for two reasons: it forces the user of our toy vector to use the runtime’s default allocation mechanism, and it calls the default constructor on every item in the array. So we force our library user to have a default constructor on all their objects, and to use the default allocation heap. No bueno. The Allocator interface splits the responsibilities of allocation and construction into two separate calls. This way we can control when the memory gets allocated for our objects, and when our objects get constructed. The member functions of an allocator are:

template <typename T>
struct allocator
{
    allocator() = default;
    ~allocator() = default;

    // returns the address of an object
    T* address(T&);

    // allocates sizeof(T) * count bytes of raw storage
    T* allocate(size_t count);

    // deallocates the storage pointed to by p; p must have been obtained from allocate
    void deallocate(T* p, size_t count);

    // constructs an object in place at p (deprecated C++17, removed C++20)
    template <typename ...Args>
    void construct(T* p, Args&&... args);

    // destroys the object pointed to by p (deprecated C++17, removed C++20)
    void destroy(T* p);
};

Note: I’ve chosen to use the deprecated functions construct / destroy. They’ve been replaced by allocator_traits, and honestly, I just didn’t want to go that far down the rabbit hole for this first post. So my vector takes advantage of them. For now.

So given that we can’t have a vector without an allocator, we should probably start our journey there.

For brevity, I’m not going to write out the entire implementation, but I will write out one of the important functions: allocate. Let’s take a look at Stepanov’s implementation.

/*
 * Copyright (c) 1994
 * Hewlett-Packard Company
 */
template <class T>
inline T* allocate(ptrdiff_t size, T*) {
    set_new_handler(0);
    T* tmp = (T*)(::operator new((size_t)(size * sizeof(T))));
    if (tmp == 0) {
	cerr << "out of memory" << endl; 
	exit(1);
    }
    return tmp;
}

It’s pretty simple: we use ::operator new to allocate storage for size elements, multiplied by the size of our type. If the allocation fails (returns 0), we write to the standard error and exit the application. We what? We write to the standard error, and exit the application. Hmmm. I would’ve expected this to use exceptions, but my best guess is that in 1994, exceptions were not yet widely accepted. Also, at the time, if you ran out of memory, what was a program to do? Other than print to standard error and crap out. Well, in this day and age, that isn’t appropriate. We’ve got something up our sleeves known as exceptions. Exceptions are a fundamental piece of the STL, and they’re perfect for this very thing. The only problem is that, in this little experiment we’re doing, we have to implement everything that isn’t bare-metal C++ (mostly). So, we’ve gotta implement standard exceptions.

Thankfully, exceptions aren’t that complicated. They’re basically a wrapper over a constant character array, or in other words, an immutable string.

#include <cstdlib> // exit
#include <cstring> // strnlen, strncpy_s

class exception
{
    const char* const message_;
public:
    exception()
        :message_(nullptr)
    {
    }

    exception(const char* message)
        :message_(static_cast<const char*>(::operator new(strnlen(message, 255) + 1)))
    {
        if (message_ == nullptr)
            exit(-1);
        // copy at most 255 characters, plus room for the null terminator
        strncpy_s(const_cast<char*>(message_), strnlen(message, 255) + 1, message, 255);
    }

    exception(const exception& e)
        :exception(e.message_)
    {
    }

    // virtual, since derived exceptions may be deleted through this base
    virtual ~exception()
    {
        ::operator delete(const_cast<void*>(static_cast<const void*>(message_)));
    }

    virtual const char* what() const
    {
        return message_;
    }

};

class bad_alloc : public exception
{
public:
    bad_alloc()
        :exception("bad_alloc")
    {
    }
    bad_alloc(const char *message)
        :exception(message)
    {
    }
};

So, now that we have an exception to throw, we can use it in our allocator and have the allocation look something like this.

pointer allocate(size_t count)
{
    // the plain ::operator new throws std::bad_alloc itself on failure;
    // the nothrow overload returns nullptr, so we can throw our own bad_alloc
    T* temp = static_cast<T*>(::operator new(count * sizeof(T), std::nothrow));
    if (temp == nullptr)
    {
        throw bad_alloc();
    }
    return temp;
}

Once we have our allocation function, the rest of the allocator is pretty self-explanatory. Mine came out looking something like this.

template <typename T>
struct allocator
{
    using pointer = T*;
    using const_pointer = const T*;
    using reference = T&;
    using const_reference = const T&;
    using size_type = size_t;

    allocator() = default;
    ~allocator() = default;

    // address
    pointer address(reference x) const
    {
        return &x;
    }

    const_pointer address(const_reference x) const
    {
        return &x;
    }

    pointer allocate(size_t count)
    {
        // the nothrow overload returns nullptr on failure, so we can
        // throw our own bad_alloc instead of the runtime's
        T* temp = static_cast<T*>(::operator new(count * sizeof(T), std::nothrow));
        if (temp == nullptr)
        {
            throw bad_alloc();
        }
        return temp;
    }

    void deallocate(pointer p, size_type n)
    {
        ::operator delete(p, n * sizeof(T));
    }

    size_type max_size() const
    {
        return max(size_type(1), size_type(UINT_MAX / sizeof(T)));
    }

    template <typename ...Args>
    void construct(reference dest, Args&&... args)
    {
        ::new (static_cast<void*>(&dest)) T(htk::forward<Args>(args)...);
    }

    void destroy(pointer p)
    {
        p->~T();
    }
};

Basically, the calls are thin shims into the default C++ mechanism for allocation (new) and deallocation (delete). Then the construction uses placement new to construct the object in a given destination, and the destroy call explicitly calls the destructor.
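To see those shims in action, here is the full lifecycle spelled out with the raw mechanisms they wrap. The helper names (construct_in, destroy_it) are mine, purely for illustration of the allocate / construct / destroy / deallocate split:

```cpp
#include <new>
#include <string>

// helpers mirroring the allocator shims described above
template <typename T, typename... Args>
T* construct_in(void* where, Args&&... args)
{
    return ::new (where) T(static_cast<Args&&>(args)...); // placement new
}

template <typename T>
void destroy_it(T* p)
{
    p->~T(); // explicit destructor call; the storage itself is untouched
}

std::string lifecycle_demo()
{
    void* raw = ::operator new(sizeof(std::string));        // 1. allocate raw storage
    std::string* s = construct_in<std::string>(raw, "hi");  // 2. construct in place
    std::string result = *s;
    destroy_it(s);                                          // 3. destroy the object
    ::operator delete(raw);                                 // 4. deallocate the storage
    return result;
}
```

Note that no constructor runs in step 1 and no destructor runs in step 4; that independence is exactly what the toy vector's `new T[]` couldn't give us.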

If you’re still following along: before we even started writing our vector, we needed to create some foundations. I searched and searched, but I couldn’t find instructions on how to implement an STL. What I mean by that is, where do you even start? Obviously classes in the STL depend on other classes in the STL, but where are the leaves? I wish I could say I found them. So my best bet is to just pick a common class, and implement. Thus far, we have the basic exception, we have a bad_alloc exception, and we have the default allocator. All so we can start on our vector.

In order to keep this post from becoming my first novel, we’re just going to focus on implementing one method on the vector. We’ll leave the default construction for now, and look at the emplace_back function. This is a function that forwards its arguments into direct construction at the back of the vector. It’s a really nice starting point. It seems rather straightforward. I’d soon find out it isn’t.

So the signature of emplace_back looks something like this. It takes a variadic set of arguments, and returns a reference to the object it created. The decltype(auto) return type tells the compiler to deduce the return type, value category included, from what we return.

template <typename ...Args>
decltype(auto) emplace_back(Args&&...args)
{
    ensure_space_at_least(1);
    allocator_.construct(*(data_.last++), htk::forward<Args>(args)...);        
    return back();
}

This code, though, is pretty straightforward. We first ensure we have enough space for one more element. Then we use our allocator to construct the element in the last position, and increment the pointer.

You might be asking about this strange htk::forward<Args>(args)... Well, I debated going down that rabbit hole, but decided against it. What std::forward (or htk::forward) does is pass arguments along exactly as they were passed into the function. This means that if we’re passing r-value references into emplace_back, we’re forwarding those as-is to the allocator construct call.
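For the curious, here is a minimal sketch of what a forward like htk's might boil down to. The htk names and the remove_reference helper are my assumptions; the real STL version also has an rvalue overload and extra safety checks:

```cpp
#include <string>

namespace htk
{
    // strip the reference off a type so we can re-apply the one we want
    template <typename T> struct remove_reference      { using type = T; };
    template <typename T> struct remove_reference<T&>  { using type = T; };
    template <typename T> struct remove_reference<T&&> { using type = T; };

    // if T deduced as U&, reference collapsing makes T&& an lvalue reference;
    // if T deduced as U, T&& is an rvalue reference -- the cast restores
    // whatever value category the caller originally passed
    template <typename T>
    constexpr T&& forward(typename remove_reference<T>::type& t) noexcept
    {
        return static_cast<T&&>(t);
    }
}

// a probe that reports how its argument arrived
std::string category(const std::string&) { return "lvalue"; }
std::string category(std::string&&)      { return "rvalue"; }

template <typename T>
std::string pass_through(T&& value)
{
    // without forward, the named parameter `value` is always an lvalue here
    return category(htk::forward<T>(value));
}
```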

What does ensure_space_at_least(1) do? Our toy vector had some code to grow and copy our existing objects; that’s effectively what this is doing. We’re going to ensure we have space for at least one element. If we do, we can just create the new item right at the end of the memory we allocated earlier. This illustrates separating allocation from construction: we allocate space which hasn’t been initialized, then when someone pushes into our vector, we just initialize that memory!

void ensure_space_at_least(size_t count)
{
    if (initialize(count))
        return;
    // we need to see if we have enough space.
    if (freespace() < count)
    {
        const auto cap = calculate_growth(count);
        grow(cap);
    }
    // we good.
}

The first line just initializes the vector, since this vector starts out with no memory allocated. The very first call will allocate some memory and return. If we already have enough memory allocated, we don’t do anything. Where it gets really interesting is when we don’t have enough memory available. Our growth algorithm is just going to keep doubling the capacity until we can fit the count; that’s what calculate_growth(count) does. It returns us the next capacity that’ll fit our count. Then we need to grow our memory.
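The post doesn't show calculate_growth itself, so here is a hedged sketch of that doubling policy, written as a free function for illustration. The signature and the starting capacity of 8 are my assumptions; the member version would read size() and capacity() instead of taking them as parameters:

```cpp
#include <cstddef>

// keep doubling the capacity until `count` more elements fit
size_t calculate_growth(size_t size, size_t capacity, size_t count)
{
    size_t cap = capacity == 0 ? 8 : capacity; // assumed initial capacity
    while (cap - size < count)
        cap *= 2;
    return cap;
}
```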

In order to grow the memory, we need to do a couple things. First, we need to allocate a contiguous block for our new capacity. Then we need to transfer all existing items into the new memory block. Then we need to free our old memory.

void grow(size_type cap)
{
    pointer new_vec = allocator_.allocate(cap);

    move_items(data_.first, data_.last, new_vec);

    auto old = data_;
    auto curs = size();
    data_ = data{ new_vec, new_vec + curs, new_vec + cap };

    // destroy the old stuff
    destroy(old.first, old.last);
    allocator_.deallocate(old.first, old.capacity());
}

void move_items(pointer first, pointer last, pointer dest)
{
    while (first != last)
        allocator_.construct(*(dest++), *(first++));
}

The first thing we do is allocate space for our new capacity. Then we call move_items, which takes the first and last elements, and a pointer to the new destination. In move_items, we just iterate from first to last, constructing each old item into the destination as we increment. Then we assign our new vector pointer to our internal data pointer, destroy all the old items, and deallocate the old memory. With that, we have a pretty simple growth algorithm that copies from our old vector into the new space, and frees up the old items.

My initial thought was to type “we’ve got a problem here“, like we only have one. There’s a boatload more than one problem in this code. This, folks, is why being a library developer is hard. At this point, we’re still only writing the basic algorithm. We haven’t taken into account safety, or optimization. Just thank your god that you don’t have to be a library developer. (Unless you want to be, you masochist.)

The first problem we’re going to address is the fact that we only support copying items. This means that every time our vector expands, we’re going to copy all the items over. For large types this can be costly, and as of C++11 we don’t need to copy the items. We want to take advantage of move semantics. Move semantics are a pivotal part of smart pointers, and of optimizations for things like the vector, because they effectively allow you to move the guts from one object to another. This is primarily useful when you have a temporary, or an item you know will be going out of scope, and you do not want to copy it. It’s done by defining a “move constructor” (see the Rule of 5).

Think about an object like a vector, which has an internal pointer to some array of objects. In C++98, when we passed that object by value (a copy), to be correct we would need to allocate enough space and copy all the items over into the new vector. This means we would then have two copies of our vector. If you were passing this into a function, the vector you copied would then be destroyed. This is very wasteful. The same thing happened when you returned by value. It’s obvious today that we need a way to say “I’m only temporary”, and when it’s a temporary, we can commandeer that temporary’s pointer as our own, set its pointer to nullptr, and let it go out of scope. This way we’ve moved the object. Before we had move semantics, we had a lot of different techniques, like reference out-parameters, and copy-on-modify semantics. With move, those workarounds aren’t necessary.
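That pointer-commandeering can be shown in a few lines. Here is a toy resource holder with a move constructor (illustrative only; a real class would implement the full Rule of 5):

```cpp
#include <cstddef>

// a toy resource holder illustrating "commandeering the temporary's pointer"
struct buffer
{
    int* data;
    size_t size;

    explicit buffer(size_t n) : data(new int[n]{}), size(n) {}

    // move constructor: steal the guts, leave the source empty
    buffer(buffer&& other) noexcept : data(other.data), size(other.size)
    {
        other.data = nullptr;
        other.size = 0;
    }

    buffer(const buffer&) = delete; // move-only, like many resource holders

    ~buffer() { delete[] data; }
};
```

No allocation and no element copies happen during the move; only two pointers and a size change hands.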

Obviously, we want to take advantage of move semantics when we grow our vector. That way, if the class we’re holding is optimized for move, we’ll use that instead of copying. As a bonus, we can then hold classes that are move-only, and not copyable. Things like resource holders. So how do we do that?

In order to take advantage of move semantics, we need to use std::move (or in our case htk::move), which will convert an object reference into an r-value reference.
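If you're wondering what that conversion looks like, a move like htk's could be as small as this sketch (the htk names and remove_reference helper are my assumptions). Note that nothing actually moves here; it's just a cast:

```cpp
#include <type_traits>

namespace htk
{
    template <typename T> struct remove_reference      { using type = T; };
    template <typename T> struct remove_reference<T&>  { using type = T; };
    template <typename T> struct remove_reference<T&&> { using type = T; };

    // unconditionally cast the argument to an rvalue reference, so that
    // move constructors / move assignment become viable overloads
    template <typename T>
    constexpr typename remove_reference<T>::type&& move(T&& t) noexcept
    {
        return static_cast<typename remove_reference<T>::type&&>(t);
    }
}
```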

void grow(size_type cap)
{
    pointer new_vec = allocator_.allocate(cap);

    if (has_move_semantics())
    {
        move_items(data_.first, data_.last, new_vec);
    }
    else
    {
        copy_items(data_.first, data_.last, new_vec);
    }

    auto old = data_;
    auto curs = size();
    data_ = data{ new_vec, new_vec + curs, new_vec + cap };

    // destroy the old stuff
    destroy(old.first, old.last);
    allocator_.deallocate(old.first, old.capacity());
}

void move_items(pointer first, pointer last, pointer dest)
{
    while (first != last)
        allocator_.construct(*(dest++), htk::move(*(first++)));
}

void copy_items(pointer first, pointer last, pointer dest)
{
    while (first != last)
        allocator_.construct(*(dest++), *(first++));
}

It’s that simple! Pfft. Library implementation is easy…. Just kidding. It’s probably not that easy. Even if this approach did work, we would be doing the check at runtime! Whether or not a type can be moved is based on the class definition (does the class implement a move constructor?), so we can do this check at compile time. This means that when the vector code is generated for the contained type, i.e. <T>, we know whether we want to move or copy it.

If you’ve ever programmed in a language that doesn’t have function overloading, you’ll know how important it is. It lets you specify a function with the same name but different arguments, and then the compiler can choose the best function to call based on the type you’ve provided it. This is great, because then we can write one algorithm, and based on the type, the compiler can select the best function to call. When we’re talking template code, the compiler will only instantiate the required function. This may seem like a digression, but it’s not, and let me show you why.

template <typename T>
class adder
{
    T value_;
public:
    adder(const T& value)
        :value_(value)
    {
    }
    void add(const T& value)
    {
        add_internal(value);
    }
    const T& get() const
    {
        return value_;
    }
private:
    void add_internal(const std::string& s)
    {
        value_ += " + " + s;
    }
    void add_internal(int val)
    {
        value_ += val;
    }
};

This code isn’t really useful at all, but it illustrates the point that, based on the type of our template, we can call different functions. I made a compiler explorer here. You can see in the compiler explorer that when we use a std::string, we call the appropriate add_internal. Can you see where this is going? We want to call different functions based on traits of the class we’re containing. So back to our grow code.

void grow(size_type cap)
{
    pointer new_vec = allocator_.allocate(cap);

    move_items(data_.first, data_.last, new_vec, /* something here */);

    auto old = data_;
    auto curs = size();
    data_ = data{ new_vec, new_vec + curs, new_vec + cap };

    // destroy the old stuff
    destroy(old.first, old.last);
    allocator_.deallocate(old.first, old.capacity());
}

void move_items(pointer first, pointer last, pointer dest, /* argument when movable */)
{
    while (first != last)
        allocator_.construct(*(dest++), htk::move(*(first++)));
}

void move_items(pointer first, pointer last, pointer dest, /* argument when copyable */)
{
    while (first != last)
        allocator_.construct(*(dest++), *(first++));
}

We want to take advantage of overloading and the template system, so the compiler calls the right function based on our type. This is a technique called “tag dispatching“: we get the compiler to pick the right overload based on the template type we’re using. Or at least, that’s what we’re trying to do. 🙂

I’ve been alluding to “traits about a type” without outright calling it what it is. To a library author, I can only assume that type_traits are a bit like paint to a painter, or gasoline to an internal combustion engine (or arsonist?). Type traits (type_traits) are part of the STL, and they allow you to get compile-time answers to questions about the types that the template code is working on. In our case, we want to ask the question “is this class moveable?”, and based on the answer, we’ll have the compiler call the right move_items overload.

In the STL, these “questions” are generally answered by small helper structs called type_traits. In general, they take a type argument (or a few), and give back a compile time constant boolean value. You can ask questions like:

std::is_reference<T>;
std::is_pointer<T>;
std::is_member_pointer<T>;

Honestly, there are a lot; if you can think to ask it about a class, it’s probably a type_trait. Type traits are a build-up of little tiny structs that enable huge possibilities. The root of most type traits is this thing called a bool_constant. It’s just a small struct that has a boolean as a compile-time constant. With this, we can define two types (obvs): std::true_type and std::false_type. These are aliases of bool_constant<true> and bool_constant<false> respectively. So let’s go back to the code.
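Here is a hedged sketch of that foundation, with one tiny question built on top (the htk-style names and the is_pointer example are mine; the real STL versions also derive from integral_constant and carry more machinery):

```cpp
namespace htk
{
    // the root of most type traits: a compile-time boolean wrapped in a type
    template <bool B>
    struct bool_constant
    {
        static constexpr bool value = B;
    };

    using true_type  = bool_constant<true>;
    using false_type = bool_constant<false>;

    // a tiny question built on top: "is this type a pointer?"
    // the primary template answers no; the partial specialization answers yes
    template <typename T> struct is_pointer     : false_type {};
    template <typename T> struct is_pointer<T*> : true_type  {};
}
```

Because every trait derives from true_type or false_type, an instance of the trait can itself serve as the tag in tag dispatching.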

void grow(size_type cap)
{
    pointer new_vec = allocator_.allocate(cap);

    move_items(data_.first, data_.last, new_vec, /* something here */);

    auto old = data_;
    auto curs = size();
    data_ = data{ new_vec, new_vec + curs, new_vec + cap };

    // destroy the old stuff
    destroy(old.first, old.last);
    allocator_.deallocate(old.first, old.capacity());
}

void move_items(pointer first, pointer last, pointer dest, htk::true_type /* is moveable */)
{
    while (first != last)
        allocator_.construct(*(dest++), htk::move(*(first++)));
}

void move_items(pointer first, pointer last, pointer dest, htk::false_type /* is not moveable */)
{
    while (first != last)
        allocator_.construct(*(dest++), *(first++));
}

You can start to see how we can get the compiler to choose which function to call. If we want to move the object, we use htk::true_type as the tag that we want to move, and htk::false_type when we want to copy. The last piece is asking the question! Thankfully, in the STL’s bag of tricks there’s a helpful trait called std::is_move_constructible<T>.

Since this blog is about demystification, this is the last piece of wool (for now) we need to lift off our eyes. How do we implement is_move_constructible<T>? I’ll be totally honest with you: I had zero idea. My first thought was that I needed to somehow have the compiler test-call the move constructor with the arguments and see if that could compile, and if not, fall back to false_type. This is something called SFINAE. (I’m definitely not an expert, and I have little confidence I could explain it without a significant amount of research.) So after some thinking, I decided I needed to phone a friend. I pulled up VS again and dove into type_traits.

// STRUCT TEMPLATE is_move_constructible
template <class _Ty>
struct is_move_constructible : bool_constant<__is_constructible(_Ty, _Ty)> {
    // determine whether _Ty can be direct-initialized from an rvalue _Ty
};

That’s it. No crazy template magic. Just 3 little lines. Wemp. Womp.

But wait! What the heck is that __is_constructible(_Ty, _Ty)? Yeah. That had me too. I actually spent longer than I’d like to admit trying to figure that *expletive deleted* out. It wasn’t until I found the EASTL that it dawned on me. Well, there’s a line that gives it away.

#if EASTL_COMPILER_INTRINSIC_TYPE_TRAITS_AVAILABLE && (defined(_MSC_VER) || (defined(EA_COMPILER_CLANG) && EA_COMPILER_HAS_FEATURE(is_constructible)))

I’ll let you see their code when the compiler doesn’t do it, for yourself here. It’s really quite fascinating and very impressive. Thank goodness for Microsoft and the compiler writers, because they made that job easy. We basically have a compiler intrinsic function, that will answer that question for us! Let’s go back to our code, so we can finish this off.
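Before jumping back, here is what an htk version leaning on that intrinsic could look like. This is a sketch under the assumption that the __is_constructible intrinsic is available (MSVC, Clang, and modern GCC all provide it); on older compilers you'd need the SFINAE fallback EASTL shows:

```cpp
namespace htk
{
    template <bool B> struct bool_constant { static constexpr bool value = B; };
    using true_type  = bool_constant<true>;
    using false_type = bool_constant<false>;

    // "can T be direct-initialized from an rvalue T?" -- answered by the compiler
    template <typename T>
    struct is_move_constructible : bool_constant<__is_constructible(T, T)> {};
}

// hypothetical probe types, just for demonstration
struct movable {};
struct pinned
{
    pinned() = default;
    pinned(pinned&&) = delete; // explicitly not move-constructible
};
```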

void grow(size_type cap)
{
    pointer new_vec = allocator_.allocate(cap);

    // pass an instance of the trait (true_type or false_type), not ::value --
    // overload resolution on the tag type picks the right move_items
    move_items(data_.first, data_.last, new_vec, htk::is_move_constructible<T>{});

    auto old = data_;
    auto curs = size();
    data_ = data{ new_vec, new_vec + curs, new_vec + cap };

    // destroy the old stuff
    destroy(old.first, old.last);
    allocator_.deallocate(old.first, old.capacity());
}

void move_items(pointer first, pointer last, pointer dest, htk::true_type /* is moveable */)
{
    while (first != last)
        allocator_.construct(*(dest++), htk::move(*(first++)));
}

void move_items(pointer first, pointer last, pointer dest, htk::false_type /* is not moveable */)
{
    while (first != last)
        allocator_.construct(*(dest++), *(first++));
}

With that we’ve got our vector growing, and choosing the correct function for copying the old items, based on the type specified to the vector. Cool!

It’s been a journey; we waded (deeper than we may have wanted) into the world of the STL. We uncovered things like allocators, exceptions, and type_traits. We also looked at how we can leverage tag dispatching to ensure we’re doing the right thing based on the type at hand. We saw that library development, past a toy, is hard. We didn’t even get into exception safety, which admittedly complicates things past something that’s understandable without writing a novel. I can only hope to be able to dig into that one day; until then, we can let it remain magic. For now, we have the start of a vector, and some other useful tools in our just-heavier-than-a-toy STL.

I hope you enjoyed the opener in this vector series. Next up, we’ll be making our vector work with the STL algorithms. Then lastly, we’ll pit our vector against Microsoft’s, in a battle of the…. Well battle of the vectors. To see just how poorly this vector performs against the professionals.

Until then, stay healthy, and happy coding!

“Life is really simple, but we insist on making it complicated.”
― Confucius

References
https://en.cppreference.com/w/cpp/container/vector
http://stepanovpapers.com/butler.hpl.hp/stl/stl/VECTOR.H
https://github.com/microsoft/STL/blob/master/stl/inc/vector
https://github.com/electronicarts/EASTL
https://foonathan.net/2015/11/overload-resolution-3/


They say Hindsight is 20/20

As we enter into a new year, I can’t help but reflect on the last. What I did, what I didn’t do, with a focus on things I’d like to accomplish in the coming 12 months. I’ll spare you the details of my self-reflection. Though, I would like to share the method that I use for my approach. I am aware that this is a tech blog, and some might feel that personal and self growth have no place here. Well, to them I say “thanks for the view”! 🙂 This is Unparalleled Adventure after all, and what would an adventure be without some reflection? For those of us in the industry, unless you want to stagnate, it’s important to set goals, reflect, and take steps to get better. Self-reflection and employee growth aren’t always on a company’s radar, which is interesting, because capital growth and ROI often are. To me, it seems there should be a positive correlation between employee growth and the growth of a company. However, I digress. If it’s not part of your company’s yearly mandate to reflect and set goals, you should mandate it for yourself! Oftentimes developers want to become super-star 10x developers overnight. This isn’t realistic, and can lead to disappointment in oneself. The reality is that it takes time, and lots of work. Becoming a great developer is a lot like optimization: you can’t optimize what you don’t measure. So in order to progress, you need to reflect and take stock of where you are, set some goals to progress, and most of all have a vision for where you want to go. For some of us, this can be quite an eye-opening experience. If you’ve never reflected and looked objectively at where you are, you might surprise yourself, one way or the other. If you’re struggling with how to reflect and set goals, it can be helpful to understand someone else’s approach.

Over the years, I’ve adapted a model that I feel has worked (for the most part) for me. Like anything in life, it doesn’t need to be followed to a tee, nor should it be written in stone. Instead it should grow and adapt, as you grow and adapt. You should follow what works, and disregard what doesn’t. My method may serve as a jumping-off point, or give you enough perspective to turn and run in a different direction. I’ve built the model I use from what has worked for me in the past, removed what hasn’t, and built on foundations learned in school, as well as tactics and practices I’ve picked up from books, conferences, podcasts, and some creative ideas I crafted myself.

If you’ve ever read the book Tools of Titans by Tim Ferriss, you’d have seen the breakdown and categorization of Healthy, Wealthy, and Wise. Healthy being what directly affects the health of one’s body. Wealthy being what directly affects the health of one’s bank account. And Wise being what directly affects the health of one’s mind. For a few years, I crafted goals like I’d been trained to in school: Short-term, Mid-term, and Long-term goals. I want to exercise more in the next 30 days. I want to learn to program in the next 6 months. I want to have my vehicle paid off in the next year. Regardless of the quality of these goals, I always struggled with a way to categorize and relate them. What do exercise, programming, and having my vehicle paid off have to do with each other? This problem was only magnified by my ability to be overly self-critical. I felt that they didn’t make sense, and in turn that didn’t make me want to work towards them. After reading Tools of Titans, I decided to categorize my goals into the three categories. Healthy – goals that pertain to my physical health. Wealthy – goals that pertain to my financial health. Wise – goals that pertain to my mental health. I still kept the Short, Mid, and Long term lengths, and decided on one of each goal per category. This gave me a total of nine goals, which wasn’t bad. I made them SMART goals based on their time frame, and I went from there.

The problem was that, within categories, the goals didn’t relate. The Short-term goal didn’t lead into the Mid-term goal, which in turn didn’t do anything to help me get towards the Long-term goal. The second pitfall was that my short-term goal would get completed, and I would have nothing “Short Term” to work on while I worked on the longer-term goals. This year, I’ve decided a good structure is to have a “summary” for each category, and 3 SMART goals per quarter that work towards that summary. This leaves me with 3 months of goals to focus and work on, and in the next quarter I can make an adjustment. I’m hopeful this breakdown will give me more to work with, and allow me to work a little closer to my goals.

So, that brings us to the goals for my blog. In the two years I’ve had this blog, I’ve posted a whopping 14 posts. I’m proud of that, but it’s not the amount of content I would’ve liked to have posted. Though I never set a measurable goal, I would’ve liked to have closer to 50 posts. This year I dug in and set one: my goal for 2020 is 12 posts. It’s only 1 per month, so it’s seemingly doable. I’m hoping that 12 posts will raise readership and subscribers, and in 2021 I’ll be able to put more fingers to keys and share more of my learning.

In 2019, I had a total of 2,041 views, with 1,442 unique visitors. The post with the most views was The one where we reverse engineered Microsoft’s C++ Unit Test Framework (Part 1); it had a total of 714 views! The series as a whole did quite well, totalling around 1,500 views. My posts that talked about architecture, design, and advice I wish I would’ve listened to each had a whopping 1 view. (Probably was me reading it. :P) It’s obvious what people are interested in, and to be honest, it’s what I’m most interested in too. The code, digging into the details of the code, and doing obscure things.

I look forward to the next series of posts!

Until then, Happy Coding!

PL

Life can only be understood backwards; but it must be lived forwards. — Soren Kierkegaard


A Single Responsibility

Over the years I quite often find myself pondering software development patterns and practices. It’s the kind of thing I think about in my downtime, when I’m driving to and from work, or sitting at home… pondering. I always seem to come back to one question: is the common interpretation of the pattern, that is, how the majority of the world sees and implements it, what the author had intended? Is the inventor okay with it? It’s not like I’m going to solve this mystery today. One day I might have the opportunity to meet one of these greats and ask them: is the outcome of their idea what they intended? I oftentimes find myself reading code, or working on past code, where the implementation doesn’t line up with my current interpretation of the pattern / practice. Though, from talking about the code, reading comments, or blind guessing, it can be clear that the intent was to implement towards a given pattern. In my last post, I talked about Separation of Concerns, which in my opinion can be closely related to, and often confused with, the “Single Responsibility Principle”.

The intent of my last post, on Separation of Concerns, was to show that it in itself is not a pattern. It’s merely a practice that can be applied to thought at any level. This can apply to your application at a macro level (application architecture), all the way down to the micro level (lines of code). Where I see things start to get fuzzy for people is in the span between the application level and the class level. The Single Responsibility Principle is an embodiment of Separation of Concerns, but this embodiment is at a specific level. Unlike Separation of Concerns, the Single Responsibility Principle applies only at the class level. I mean, that’s in the definition.

Classes should have a single responsibility and thus only a single reason to change.

But, like everything in software, it’s open to interpretation. Patterns, Practices, and Principles fall victim to the subjective nature of application development. The real world of development rarely sees foo and bar outside of whiteboard sessions. This means that you have to deal with real-world objects, and real-world problems. Then, as developers, we have to translate the canonical and sometimes naive FooBar examples into our real-world problems. Sometimes, more often than not, especially with less experienced developers, this leads to incorrect or harmful application of these principles.

Sometimes strict adherence to an interpretation of SRP and “Separation of Concerns” can be deleterious to an application. The unfortunate nature of this is that it’s a problem that doesn’t manifest until much later, when it’s too late. Now, I’m not trying to sit on my high horse and say I don’t misapply these things. In fact, it would be foolish to think that anyone is perfect when it comes to this. I would be willing to bet that Martin Fowler himself doesn’t get it right 100% of the time. The difference is that with experience, you’re able to spot the blunder before it’s too late. Before you’ve gone too far, and you’re on the cliff being faced with a re-write. In the real world, this oftentimes ends in a manager wishing he would’ve reviewed the code a little earlier, or a little more often. Hopefully, this post will help to clarify and add some litmus tests to the application of this principle.

First off, Separation of Concerns isn’t SRP. If you think that, just forget it. Right now.

Separation of Concerns is the organization of thoughts, the ability to separate components that are not overlapping, or mutually exclusive. The Single Responsibility Principle is an application of this practice: grouping things that have the same concern, or reason for change, into a single class. So it has a real-world level of application. It’s at the class level; that’s where you apply it… And this is where the problem stems.

Say you have an application: you’ve got a Web Service that deals with incoming client web requests, and you’ve got an Auxiliary Service that is moving data to and from a database, and servicing long-running system requests. This is an example of Separation of Concerns. This is not an example of the Single Responsibility Principle. It’s too macro; we’re talking at our application level, not at the class level. The problem that will stem from this form of thinking is macro-level class development. God classes. Sure — you have a class that represents your service that “fulfills web service requests”. That’s his single responsibility… But is it? Is it really? If we imagine this mock class, he would have to:

  1. Receive a message from the Web Server Service
  2. Parse said message
  3. Understand what to do
  4. Fulfill the request
  5. Build the response

Now, that’s definitely more than one responsibility! But from a macro level, it’s easy to see your class as having a single responsibility. You’re Separating your Concerns, remember?

In this case, it would probably make sense to have 1 class for building and parsing messages, 1 class that can switch on those messages to dispatch what to do, and 1 class for each of the actions to fulfill the requests. Woah. Woah. Woah. Wait just a minute. You just said 1 class for building and parsing… Isn’t that violating the SRP? Well, the answer, as so many are in Software Development, is ‘it depends’. That statement was intentional. It was meant to bring light to the fact that the definition says “a single reason to change”. When you’re dealing with protocols, building and parsing can often be symmetrical. Therefore, if the protocol changes, that could be our single reason for change of this class. So it could be said to have a single responsibility of dealing with the protocol.
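As a minimal sketch of that idea (all names here are hypothetical, not from any real codebase), a codec class can own both directions of the protocol, because the wire format is its single reason to change:

```cpp
#include <string>

// A toy request, and the codec that owns the (assumed) "verb|payload" wire format.
struct Request
{
    std::string verb;
    std::string payload;
};

class ProtocolCodec
{
public:
    // Building: serialize a request into the wire format.
    std::string Build(const Request &req) const
    {
        return req.verb + "|" + req.payload;
    }

    // Parsing: the symmetrical inverse, governed by the same format.
    Request Parse(const std::string &wire) const
    {
        const auto sep = wire.find('|');
        return Request{ wire.substr(0, sep), wire.substr(sep + 1) };
    }
};
```

If the delimiter or framing ever changes, Build and Parse change together; that symmetry is what justifies keeping them in one class.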

As you can see, where you focus the light of Single Responsibility will really play a factor into the organization and structure of your code. Just wait until you start shining that light too close.

When you start focusing the light at a micro level into your code, you’ll actually start to factor out responsibility.

Imagine you have a system that is used to dispatch SMS-style messages, but it’s old school, so it takes time. You’ve got a client-facing API called a MessageBroker, and a background service called a MessageDispatcher. Clients of your API deal directly with the MessageBroker; they give the MessageBroker a message in the format of

class Message
{
public:
   enum RecipientTypes { Specified, Random };

public:
   Address sender_;
   Address recipient_;
   RecipientTypes type_;
   String message_; // stored by value; a const reference member risks dangling
};

The intent is that we give the MessageBroker the message, and he’ll do something with it for later pick-up by the MessageDispatcher. The MessageDispatcher will ensure delivery of the message to the recipient. Now, the API of the Message class is such that if you set the type to Specified and set the address, the message will arrive at your intended target. However! If you set the type to Random, it should go to a random target. This randomness isn’t really random; it could be based on a location heuristic.

You might think it’s best to define an interface for the message broker, and make it look something like this.

class IMessageBroker
{
public:
    virtual ~IMessageBroker() = default;
    virtual void Send(const Message &msg) = 0;
};

Then you might implement the IMessageBroker, something like this.

class MessageBroker : public IMessageBroker
{
public:
    void Send(const Message &msg)
    {
        // Validate the fields are appropriate for the database
        ValidateFields(msg);
        // put it in the db
        Database.Store(msg);
    }
};

There you have it! Our SRP MessageBroker! He has a single responsibility. He accepts a message and stores it in the database. That’s it, right? Well, you might be asking, “What about sending to a Random recipient?” Yah — that’s not the MessageBroker’s responsibility… His responsibility is to “Accept” a message, and store it in the Database. It’s someone else’s responsibility to determine the recipient. /s

I hope even without the /s, you saw the sarcasm in that. But this is the reality of imposing SRP at a micro level. You’ve just shirked the responsibility of this class. He’s not anything more than a glorified secretary for the database. Worse yet, the real responsibility is now imposed somewhere it doesn’t belong.

Let’s say you require the client of your API to specify it. Well — you’ve just opened up a can of worms… Who’s to say they’ll use the same heuristic? They might have your carrier running all over Hell’s half acre for the random recipient. Okay — force them to use a Utility class to generate the random recipient. What you’re saying now is that they have to remember to use your utility class to set the recipient. Or else… That’s not a very good design.

It makes sense to put the responsibility where it belongs. If calculating a random recipient is part of how this system behaves, it only makes sense to put that logic into the MessageBroker.

class MessageBroker : public IMessageBroker
{
public:
    void Send(const Message &msg)
    {
        Message outgoing = msg; // work on a copy; the caller's message stays untouched
        if (outgoing.type_ == Message::Random)
            outgoing.recipient_ = GenerateRandomRecipient(outgoing.sender_);

        // Validate the fields are appropriate for the database
        ValidateFields(outgoing);
        // put it in the db
        Database.Store(outgoing);
    }
};

What’s curious about this is that you can start to see you never needed the IMessageBroker interface in the first place. You see, the point of an interface is to define a boundary of responsibility, then have any implementation of that interface provide that contract. Now, in our case, what’s our contract? Well, we get the Message somewhere it can be dispatched. If it’s specified as a Random type, then we have to know to generate a random recipient, and get that to be dispatched. Would you get this out of the contract the IMessageBroker defines? I wouldn’t. So, for this design, it doesn’t make sense to have an interface. It’s a specific implementation, for a specific Message class. They’re very tightly coupled to each other. The behavioural expectation of your Message client is very much dependent, implicitly, on the implementation behind that interface. As you can see, his responsibility really came to light once we took that step backwards and looked at it for what it really was. (If you’re curious how I would do it differently, shoot me an e-mail!)

In summary, when you’re trying to apply the Single Responsibility Principle, it’s really important to focus your view at the right level. Take an objective look, and ask: what is this thing really doing? What would be reasons for me to make edits to this class? If you are honest, and you start to see that your class has many jobs, you can start to refactor some of those into their own classes, and compose that larger class of the smaller classes. Nothing says that Composition cannot occur with SRP. It just means your larger class’ job becomes orchestration. There’s nothing wrong with that. The trap you don’t want to fall into is shirking responsibility. You don’t want to start refactoring your code where you’re pulling out responsibilities and putting them on other classes. This will lead to a host of skeleton classes that are merely pass-through objects. The refactor in that case is to look for the responsibility you’ve pushed on your clients. Ask yourself: should the client need to know that? If I was using this, and didn’t write it, would I know to do this? Those are the types of questions that will start to drive responsibilities back to where they belong. That said, it’s a balance, and finding it is tough; it’s just a matter of practice.
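To sketch that composition (again, all class names here are hypothetical, chosen to avoid colliding with the broker example above), the larger class keeps the responsibility but delegates the pieces, so its job becomes orchestration rather than pass-through:

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Note { std::string to; std::string body; bool random = false; };

// Small class: owns the recipient-selection heuristic.
class RecipientPicker
{
public:
    std::string Pick(const Note &n) const
    {
        return n.random ? std::string("nearest-carrier") : n.to;
    }
};

// Small class: owns persistence.
class NoteStore
{
public:
    void Store(const Note &n) { stored_.push_back(n); }
    std::size_t Count() const { return stored_.size(); }
private:
    std::vector<Note> stored_;
};

// Larger class: composed of the two above; its job is orchestration.
class NoteBroker
{
public:
    void Send(Note note)
    {
        note.to = picker_.Pick(note); // resolve the recipient here, not in the client
        store_.Store(note);           // then persist for later dispatch
    }
    const NoteStore &Stored() const { return store_; }
private:
    RecipientPicker picker_;
    NoteStore store_;
};
```

The client still calls one Send; nothing about the heuristic or storage leaks out, yet neither small class has more than one reason to change.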

I hope you enjoyed the post. As always — Happy Coding!

PL

“The price of greatness is responsibility” — Winston Churchill