Leak Free, by Design

C++ is a brilliant language, one that has had my heart for many years. Though with the rise in web applications, more developers are finding themselves working in full stack web applications. Which almost never utilize C++. This can be seen in the 2020 StackOverflow developer survey, where 69.7% of professional developers are utilizing JavaScript and 62.4% HTML / CSS, compared to C++ measly 20.5%. This saddens me, because even though I too dabble in those languages, I would hazard a guess that a large majority of those professional developers have never even seen C++, let alone worked with it. It’s the unfortunate side of demand side economics. Unfortunately, I suspect that the closest a vast majority of JavaScript developers have ever gotten to a memory leak is a post on /r/ProgrammerHumor. And the closest thing to a crash is a big red blob in the console. However, there’s good reason why it’s such an accessible language, and why people can throw together applications so rapidly. JavaScript is a scripting language, and it was thrown together in ’95 to be exactly that, and it is still roughly that. Even with 2.5 decades to evolve, it’s still a web scripting language. You don’t have to understand much past basic logic to put together an application, and with the wealth of libraries you don’t need much to really make something cool. Now, contrast this with a language like C++, where without a bit of grit and determination, it’s hard to get off the ground. Even once you do, you’re working with a black console fronted green text. Gone are the days where you could whip together a cool console app that asked your friends questions, and spit out some silly answers, or even let them play a game. If it’s not Fortnite, it’s not cool, and if it’s not on my phone I don’t want to see it. It’s no wonder why those StackOverflow numbers are the way they are. The sad thing is that many people don’t understand that languages like JavaScript are built on the shoulders of giants, and as flashy as JavaScript web apps are, you don’t build operating systems, real time mission critical systems, and real console or PC games, in languages like JavaScript. As much as they want it to, it just can’t happen. The problem is, that people are afraid of languages like C++. It’s scary to offer your 19 year old intern the keys to the Ferrari. If he doesn’t understand what he’s getting himself into, it’s relatively easy to drive that thing into a wall. It’s much easier to let him ride the JavaScript moped around the block. This is the problem that modern C++ developers need to address. In my opinion, with the recency of the modernization of C++, and the Microsoft open-source movement, I feel that there has been a resurgence in C++. But we need to fight for this, we need to educate, and we need to push the features of the language that make it safe, by design.

About eight weeks ago, I was contacted by a gentlemen by the name of Artem Razin. He was contacting me, because in the relatively small C++ community, he was looking to get word out about his product DeLeaker. He wanted me to write a review about the product. Truth be told. I’m not writing this blog to give product reviews, so I originally declined. I just want to work on my writing skills, and share some of my knowledge. However, I also want to be an active member of the community, and here someone was reaching out to test their product, and help them share it with the world. That being said, I downloaded and tested out the application. Needless to say, the application does what it says. It integrates into Visual Studio, and it detects leaks of all kinds, and it does this with little to no impact on your application. The problem is, I don’t have a real C++ application to try it on. Most of my hobby C++, is with libraries, and other fun toys, so I can’t say how it runs in the real world. But my curiosity struck me more than anything for how this thing worked. So, Artem and I went back and forth, as I questioned him about how the application did what it did, and how it got its start. I started putting together how I could craft a blog post about it. Since my blog is about understanding, education, and uncovering mysteries. What could I write? Ahhh ha. I had it, my post would be about lifting the covers on an application like DeLeaker. How would we go about detecting leaks in applications? When I started my research, it was pretty straightforward, I just had to figure out how to hook an application, and detect memory allocations. On Windows, it’s relatively straight-forward. Though, as I worked on it, I thought to myself about JavaScript, and other languages where this wasn’t necessary. I imagined a world, where we didn’t have to worry about this in C++. A world where leaks in C++, were a thing of the past. Then I realized that, that is the theme of this post. Creating leak free applications, by design.

What’s wrong with this code?

void foo()
{
     faucet *moen = new faucet();
     moen->run();
}

I’ll give you a hint, it’s a memory leak, a rather obvious one of course. The problem is that leaks don’t always look like this. They can hide in plain sight.

void foo()
{
    faucet *moen = new faucet();
    moen->run();
    delete moen;
}

No leak, right? Wrong. How can that be, we’ve got a paired delete for our new.? The problem is that since exceptions are the default mechanism for handling errors in C++, we have to be wary of them. This means that the run() method, could potentially throw an exception. If it does, and it’s caught somewhere higher up, we’ll never call the delete, and we’ll have the same outcome as the method above. A little leak.

If you’re a veteran, you already know this, and you know the answer for this. Use a smart pointer. For those of you who aren’t veterans, and you don’t know. A smart pointer is the encapsulation of a very powerful pattern in C++, called RAII. Resource Allocation Is Initialization. What this means, is that all allocated resources should be attained in initialization, and subsequently let go in destruction. In this case, heap-allocated in construction, and delete in destruction. This way, the code becomes memory safe, just by using the “smart pointer”. Smart pointers, are used to represent ownership, and ownership is a very powerful concept in C++.

#include <memory>
void foo()
{
    auto moen = std::make_unique<faucet>();
    moen->run();
}

You’ll notice that the code is much similar to our first try, except now it won’t leak. I’ve included the <memory> header, that’s where you’ll find these tools. You’ll also notice the removal of the * and type, for the keyword auto. Well, for our JavaScript and C# readers, that’s similar to the var keyword. The reason I’ve used it, is that the type is no longer a faucet, nor is it a pointer. When we use make_unique, we’re actually creating a wrapper object called a unique_ptr which is handling the allocation and deallocation of our faucet type, through the miracle of templates, and it acts just like a pointer to a faucet. The type of moen however, is now std::unique_ptr<faucet>.

Here we are at 1200 words, and the morale of the story is this use smart pointers, and you’ll never ever ever have a leak again in your life. Sorry Artem. 🙂 If only life was that simple. The problem with real code, and not contrived simplified blog code, is that it’s never that cut and dry. The reason why JavaScript is easier, is because you don’t have to think about design. You can just code a tangled rats-nest of JavaScript, and it can still be a functional working application. Needless to say you can shoehorn your C++ application into the category of “functional”, and “working”, with enough trial and error as well. Obviously the answer to Leak Free code, isn’t just “use smart pointers”.

faucet * make_faucet()
{
    faucet *moen = new faucet();
    moen->run();
    return moen;
}

In this example, we want to start our faucet, and pass it back to the caller. Unfortunately, unless the caller knows they need to call delete, we could have a leak. As well, we’ve also opened ourselves up for the potential of an exception causing the leak. Use smart pointers.

#include <memory>
std::unique_ptr<faucet> make_faucet()
{
    auto moen = std::make_unique<faucet>();
    moen->run();
}

Great. This works the same. Smart pointers are heating up!

#include <memory>
std::unique_ptr<faucet> make_faucet()
{
    auto moen = std::make_unique<faucet>();
    moen->run();
}

void turn_on_hot(std::unique_ptr<faucet> f)
{
    f->turn(dir::right, 0.75);
}

int main()
{
    auto f = make_faucet();
    turn_on_hot(f);
}

Oops! error C2280: 'std::unique_ptr<faucet,std::default_delete<_Ty>>::unique_ptr(const std::unique_ptr<_Ty,std::default_delete<_Ty>> &)': attempting to reference a deleted function. If you’ve programmed with modern C++ for any length of time using smart pointers, you know what this is. This is definitely scary to a beginner. Have you ever programmed in JavaScript, and gotten a compilation error that told you, you were referencing a deleted function? Oh wait. Nevermind. Regardless, for a beginner this error is difficult to reason about. The simple answer, is that you can’t copy a unique_ptr, the author has deleted this function. For good reason, if you understand it, but if you’re coming from a world where you just pass everything to everything, and you don’t even know what this is. It’s scary af, as the kids say.

The problem is that smart pointers alone, are not the key to writing safe, leak free applications. Just because I have a garage full of tools, doesn’t make me a mechanic. So where does this leave us? One of the most powerful features of C++, that many other languages don’t have, is well defined object lifetime. In C++, we can know exactly when objects come to life, and exactly when they’re destroyed. In my opinion, this is one of the most powerful language features, especially when you’re trying to write safe, leak-free code. In order to do this it means we must both understand and design with this in mind.

What is Object Lifetime when it comes to C++? According to cppreference.com lifetime definition, “A runtime property: for any object or references, there is a point of execution of a program when it’s lifetime begins, and there is a moment when it ends.” For an object, it’s life begins when the storage for it is obtained, and any initialization has been completed. The objects life ends, when the destructor call starts, or the storage has been released. Now, as an exercise to the reader to understand the other caveats, and lifetime of references. Understanding this, we can start to plan our programs so that the lifetime of our objects is a well-known principle of our application. This becomes important, because it will impact the type of smart pointers you use, and how you go about passing those pointers around. The choices you make for lifetime, will also impact how you model ownership in your application, but we’ll get to that.

Let’s start with our lifetime model. How long does an object live in our application? The simplest approach is often the best. The simplest approach here, is to break an object lifetime up into either short-lived or long-lived. In “Writing High Performance .NET Code”, author Ben Watson suggests to beat the garbage collector, you should make the objects either really long lived i.e. the lifetime of the application, or really short lived i.e. function scope. This evades the most costly part of garbage collection where the object is relocated. I feel that the same mentality holds true in C++, not for the cost of the garbage collection, but for the mental cost it incurs to reason about whether an object should still be there or not. If we know an object has a long lived lifetime, then we know it should be safe to pass this object around by reference without worry of it dangling. However, if we know it’s a short-lived object, then we have to think twice before we pass references to it. This model also applies relatively to objects that contain children. So long as a parent object is the sole owner of the child, then it is safe to assume the parent will outlive the child. It makes code like this, valid.

class parent;
class child
{
public:
    child(parent &p) : parent_(p), life_(10) {}
    void feed();
private:
    int life_;
    parent &parent_;
};

class parent
{
public:
    parent(int life) : life_(life) {}
    void give_birth()
    {
          children_.emplace_back(*this);
    }
    void feed_children()
    {
         for(auto &child : children_)
             child.feed();
    }
    void subtract_life(int val)
    {
        life_-=val;
    }
private:
    int life_;
    std::vector<child> children_;
};

void child::feed() 
{
   life_ += 10;
   parent_.subtract_life(10);
}

int main()
{
    parent p(100);
    p.give_birth();
    p.feed_children();
}

In this example, we know that the parent owns all the little vampiric children, so they can be sure that it’s safe to suck the life from their parent.

It’s important to remember the distinction between stack and heap allocated objects. A stack allocated object one like p above, will live the lifetime from the line after it’s defined on, to the closing brace it resides within. A heap allocated object will live from the line after new, to the line which delete is called. You can have long and short lived stack objects, and long and short lived heap objects. Though typically, you’ll want short lived objects to live on the stack, and sometimes you’ll want long lived objects to live on the heap. However, it can also make sense to have long lived objects that are on the stack, allocated at the beginning of your program and live until it completes.

If we understand object lifetime and ownership model, it can start to paint a picture in our mind about how we can use RAII and object passing semantics to ensure that our applications are leak free. In fact, we can design our applications in a way that we know there isn’t a leak. We can do this with 3 basic steps.

Subscribe to RAII, ensure any sandwich code is properly encapsulated in an RAII wrapper, which acquires the resource in the constructor, and releases the resource in the destructor. Use smart pointers for heap allocation of objects.
Define a well known ownership and lifetime model. Define long and short lived objects, and have the longer lived objects own the shorter lived objects, where possible.
Based on ownership and lifetime, choose the appropriate argument passing semantics.

The first two are relatively straightforward, we want to prefer stack allocation where possible, but use smart pointers when we need to allocate, and use an RAII wrapper when we need to acquire / release a resource. Then we want a well-defined ownership and lifetime model. Meaning that we know which objects are long-lived, and which are short, and who owns what. Then, we let these two models dictate how we handle object interaction within our program.

void do_something(std::string s);
void do_something_else(const std::string &s);
void do_something_to(std::string &s);
void do_something_again(std::string *s);
void do_something_pass_ownership(std::string &&s);

If you’re coming from a language like C#, or JavaScript these function declarations might look strange. Let’s be honest, to a rookie C++ developer the difference is subtle, and more often than not, the first pick is the default. Just because in other languages you don’t have to decorate your type when you’re passing a variable. In C++ though, those signatures all behave differently, and all have different meanings, and when you’re authoring an application they serve as self-documentation to allow clients of your class, to understand the functions affect on the ownership and lifetime model of your application.

void do_something(std::string s);

The first signature, says to the caller “you’ll have to pass me a copy of a valid object”, this implicitly means that the object is a) valid, and b) the callee can muck with it because it’s copy, and it has no effect on the original. Unfortunately, this is the go-to object passing method for beginner C++ developers, because it looks just like every other programming language’s method to pass arguments. What’s often misunderstood, is the cost associated with this operation. For small, plain-old-data type objects, it’s often faster to pass a copy rather than a reference, so it makes sense. But when we’re talking about full objects, larger than a couple of bytes, things start to change. With something like a string or vector, you’re paying to copy all the objects contained within. This means a heap allocation, and a copy. So you want to save passing by value for small objects, and when it’s intentional. Such as the case where you want to guarantee you’re receiving an object, you want to muck with it, but you don’t want to mess up the original.

void do_something_else(const std::string &s);

Next up, pass by const-ref. This should be the default argument passing semantic you reach for. This tells your caller “you have to pass me a valid object, and I won’t modify it”. If you start here as the base, you can start to change the signature based on your functions needs. If your need is that you just want to read the object being passed in, then this is great. The only ask from the callee, is that the reference is to a valid object. This is where defining your ownership & lifetime model will allow you to ensure what you’re passing will in-fact be referencing a valid object.

void do_something_to(std::string &s);

You want a valid object, and you want to mutate it. This is the signature for you. That’s exactly what you’re telling the caller, “pass me a valid object, and I’m going to mutate it”. Just like with const-ref, passing by reference says you want a guarantee of an object, but removing the const lets you muck with the object they’ve passed in. Historically, this has been used for out-params, where you basically fill the object being passed in. However, this is modern C++, if you need to get an object out, just return it.

void do_something_again(std::string *s);

Now, the nice thing about passing by reference is that you’ve got a guarantee you’re getting an object, where in some other languages you don’t have a method to guarantee that. So you’ve got to do null checks for defensive programming. Well, in C++ we can explicitly ask for that type of parameter passing. Pass by pointer (or const pointer). This tells your caller something a little different than the “by reference” options of above. This tells them “pass me a pointer to an object, or nothing (nullptr)”. The difference between pass by reference, and pass by pointer, is the ability to pass nullptr to the function. This comes in extremely handy when you want to indicate the beginning or end, or a special case with the nullptr. However, this comes with the expectation that we are able to handle both cases. So always always always do a nullptr check when passing by pointer. Though, it should be an indicator to change your argument passing semantic if you’re not handling this case in the first place.

void do_something_pass_ownership(std::string &&s);

This && is a bit weird, and to other languages that are reference counted such as C# and JavaScript, it really doesn’t have a parallel. But this is the r-value passing semantic, and it’s used to denote the passing of ownership. Arguments should be passed from the caller using std::move, this makes the transfer of ownership explicit, allowing you to maintain your ownership model.

Now, armed with RAII, a solid ownership and lifetime model, and proper passing semantics you can start to craft code that is Leak Free, by Design.

#include <memory>
#include <vector>

int make_handle();
void free_handle(int);

class widget_manager;

struct window_event {};

class widget
{

public:
    virtual ~widget()=default;
    virtual void dispatch(const window_event &e) = 0;

};

class widget_manager
{
public:
    widget_manager()=default;
    widget_manager(widget_manager &)=delete;

    void dispatch(const window_event &e)
    {
        for(auto &wptr : widgets_)
        {
            wptr->dispatch(e);
        }
    }
    void add_widget(std::unique_ptr<widget> ptr)
    {
        widgets_.emplace_back(std::move(ptr));
    }

private:
    std::vector<std::unique_ptr<widget>> widgets_;
};

class main_widget : public widget
{


public:

    main_widget(widget_manager &manager)
    :manager_(manager), handle_(make_handle())
    {
        
    }    
    ~main_widget() { free_handle(handle_); }

    virtual void dispatch(const window_event &e)
    {
        // do something main here

    }

private:
    widget_manager &manager_;
    int handle_; 
};


class widget_factory
{

public:
   std::unique_ptr<widget> make_main_widget(widget_manager &m)
   {
       return std::make_unique<main_widget>(m);
   }

};

class application
{

public:
    application(widget_manager &manager, widget_factory &factory)
    :manager_(manager), factory_(factory)
    {
            auto widget = factory_.make_main_widget(manager_);
            manager_.add_widget(std::move(widget));

    }  

    window_event get_window_event() 
    {
        // get a window event, if end return nullptr
        return window_event{};
    }
    void run()
    {
        while(1)
        {
            auto event = get_window_event();
            manager_.dispatch(event);   
        }
    }

private:
    widget_manager &manager_;
    widget_factory &factory_;

};

int main()
{
    widget_manager manager;
    widget_factory factory;
    application the_app(manager, factory);
}

The problem with contrived examples for complex problems, is that they never quite illustrate the point you’re trying to make. For me, at least. The point this code illustrates, is that you can come up with a model for long, and short lived objects. You can have long lived objects own shorter lived objects. Knowing this set of information can dictate how you want to pass these objects around. You want to use the appropriate smart pointers, i.e. unique_ptr for unique ownership, and shared_ptr for shared ownership, and always use RAII wrapper classes when you have sandwich code, in this example make_handle() and free_handle(int). If you make these rules front of mind when you’re programming in C++, you’ll be able to sleep a little easier knowing you’re not going to leak resources.

Now, if your application isn’t modern, or you’re dealing with legacy code. Well, then you should look to a tool like DeLeaker to help you find those leaks. Once you spot them, you can hopefully refactor your code pulling from some of the above guidelines to guide you into the Leak Free direction.

You can find DeLeaker here. It’s important to note that I’m not endorsing this product, nor was there any financial incentive, only a copy of DeLeaker itself. I just want to help out others in the community.

The other interesting topic that had come to mind when I was writing this post, would be how something like a leak detection software would work. If you’re interested in this, let me know. If you enjoyed reading this, and would like to read more, make sure you subscribe and share!

Until then, stay healthy and Happy Coding!

“Design and programming are human activities; forget that and all is lost.”
― Bjarne Stroustrup

References:
Guideline Standards Library
GSL Passing Parameters
Leak Freedom…. By Default – Herb Sutter
Resource Model – Bjarne Stroustrup, et al.

Unparalleled Adventure

Leave a comment Cancel reply

Leak Free, by Design

Share this:

Leave a comment Cancel reply