The one where we reverse engineered Microsoft’s C++ Unit Test Framework (Part 1)


Have you ever really been interested in Microsoft’s C++ Unit Test Framework? I mean really interested? Like where you’d go to great lengths to figure out how it works. Well, I have… The story goes back a few years, maybe 3 or so.

At this point in my career I was deep into C++ development, and I had fallen in love with unit testing. I had fallen in love with Visual Studio’s Test Window and the ease with which it let me get quick feedback on my development. My de facto standard was the built-in C++ Unit Test Framework. It was accessible. I didn’t need to install anything (past VS), and I didn’t need to download anything special. The project templates came built in, so it was the easiest path to unit testing my work. I loved it. However, as with many things in life, there are always the ‘little things’ you have to learn to love.

My main gripe was that if I wrote a function and a std::exception escaped, for whatever reason, the test would fail with the message “An unexpected C++ exception occurred”. Thanks Microsoft, for this useless information… I put up with it. I would use tactics like wrapping my calls in try/catch; I even wrote a new header that extended the TEST_METHOD macro with a function-level try/catch. It wasn’t enough for me. I could not believe that this wasn’t built in. For instance, if an Exception escapes in the C# test framework, you get the data about the Exception. This is a novel idea, so why doesn’t it work in C++?

My second major stumbling block was that if you didn’t have the right dependencies in the right directory, you would get an error on all your tests, something along the lines of “Failed to setup execution context.” Also a very, very helpful error. This was the straw that broke the camel’s back. The number of times I had run into this, the number of times that junior developers scrambled to figure it out… it was too much. Something had to be done.

Rather than divorce myself from Microsoft’s framework and use something like boost::test, like so many had said I should do, I decided to do the sane thing and just write my own. Not write my own as in re-invent the wheel. Just write my own test executor. I already had all the tests written, and I didn’t want to redo that work in some new framework. I wanted to just build my own engine to run the tests I already had. My thought was that if someone at Microsoft could build it, so could I. They’re humans too, I think. Armed with only naive curiosity, my trusty Visual Studio, and the internet, I set out to do just that.

Where do I even start? Let’s just start at discovering the tests. How can we discover what tests are available in the binary? Well, if you’re familiar with the C# Unit Test framework, defining test classes and methods is done with attributes, similar to the C++ macros. My thought is that the C# Test Discoverer must use reflection to look for the attributes and discover the test classes and methods. I don’t know this for sure, but I would bet that is the case. Cool. Well, apart from some third-party libraries, there’s no built-in reflection in C++. So that can’t be the case for the C++ tests, can it? Maybe they load the assembly and have it tell the discoverer what tests are available? That’s what I would do if I engineered this.

Stop for a minute. Let’s reflect.

I said that my second problem with the framework was that when you tried to run the tests and the dependencies couldn’t be loaded, you would get the error “Failed to setup execution context”. Now, let’s think about this. If you’re able to see all the tests, yet the assembly can’t be loaded due to missing dependencies, how are we able to see what tests are in the binary? Magic! That’s how. Just kidding, I don’t believe in magic. It means that they’re not “loading” the library, which means that information about the tests lives somewhere in metadata in the binary… Reflection… Could it be???

Well, the magic was right there in front of us the whole time, if you’re using the framework. The magic lies in the ‘CppUnitTest.h’ header file. It took me a few beers, and a few hours to figure out just exactly WTF they were doing in there. It was essentially like trying to decipher Cuneiform.

If you’re unfamiliar, a typical TEST_CLASS and TEST_METHOD(s) looks like this.

#include "CppUnitTest.h"

TEST_CLASS(DummyClass)
{
    TEST_METHOD(DummyAssert)
    {
        /// My Code Under Test Here
    }
};

If you build and discover this, you’ll end up with a test class named DummyClass and a test in your Test Window, that says DummyAssert. So the magic lives in that TEST_METHOD macro. We will ignore the TEST_CLASS for now. Let’s look at TEST_METHOD. This is the macro, pulled directly from ‘CppUnitTest.h’

///////////////////////////////////////////////////////////////////////////////////////////
//Macro for creating test methods.
#define TEST_METHOD(methodName)\
    static const EXPORT_METHOD ::Microsoft::VisualStudio::CppUnitTestFramework::MemberMethodInfo* CALLING_CONVENTION CATNAME(__GetTestMethodInfo_, methodName)()\
    {\
        __GetTestClassInfo();\
        __GetTestVersion();\
        ALLOCATE_TESTDATA_SECTION_METHOD\
        static const ::Microsoft::VisualStudio::CppUnitTestFramework::MethodMetadata s_Metadata = {L"TestMethodInfo", L#methodName, reinterpret_cast<const unsigned char*>(__FUNCTION__), reinterpret_cast<const unsigned char*>(__FUNCDNAME__), __WFILE__, __LINE__};\
\
        static ::Microsoft::VisualStudio::CppUnitTestFramework::MemberMethodInfo s_Info = {::Microsoft::VisualStudio::CppUnitTestFramework::MemberMethodInfo::TestMethod, NULL, &s_Metadata};\
        s_Info.method.pVoidMethod = static_cast<::Microsoft::VisualStudio::CppUnitTestFramework::TestClassImpl::__voidFunc>(&methodName);\
        return &s_Info;\
    }\
    void methodName()

Okay — so humour me and let’s ignore the __GetTestClassInfo(); and __GetTestVersion(); calls and look to the line ALLOCATE_TESTDATA_SECTION_METHOD, which if we scan a little higher in the file is found here.

///////////////////////////////////////////////////////////////////////////////////////////
//Macros for creating sections in the binary file.
#pragma section("testvers$", read, shared)
#pragma section("testdata$_A_class", read, shared)
#pragma section("testdata$_B_method", read, shared)
#pragma section("testdata$_C_attribute", read, shared)

#define ALLOCATE_TESTDATA_SECTION_VERSION __declspec(allocate("testvers$"))
#define ALLOCATE_TESTDATA_SECTION_CLASS __declspec(allocate("testdata$_A_class"))
#define ALLOCATE_TESTDATA_SECTION_METHOD __declspec(allocate("testdata$_B_method"))
#define ALLOCATE_TESTDATA_SECTION_ATTRIBUTE __declspec(allocate("testdata$_C_attribute"))

But what does it all mean Basil? Well, without diving into too much history, we need to at least know about Windows’ binary formats. If you didn’t already know, every form of executable “binary” on the Windows platform is in the format of a Portable Executable (which was an extension of the COFF format). This is what allows the operating system to load and run executables, dynamic libraries, etc. It’s a well defined format, see the Wiki link above if you don’t believe me. A PE file looks like this.

[Figure: structure of a 32-bit Portable Executable file]

I’m not going to explain everything in this image, only the relevant information. If you look down just past the DOS STUB, on the very right (your right) you’ll see a 2 byte number called #NumberOfSections; this tells us the count of sections in the binary. That’s something we care about. I know this, because I know they’ve made sections where the data lives. I know this, because of the

#pragma section("testdata$_B_method", read, shared)

and the

#define ALLOCATE_TESTDATA_SECTION_METHOD __declspec(allocate("testdata$_B_method"))

Then, if you look at the bottom, you’ll see the ‘Section Table’. It means that after the COFF Header and the Optional Header that follows it, there live N entries in the Section Table, one per section. In there, we will find our “testdata$_B_method” section, and in there, we will find SOMETHING! Are you bored yet? Because when I got this far, you couldn’t pull me away. I was like a 13 year old watching my first R rated movie. What did they store in there? What was it used for? The only thing I could do was dive a little deeper. My best bet was that these MethodMetadata were stored in that section.

ALLOCATE_TESTDATA_SECTION_METHOD\
static const ::Microsoft::VisualStudio::CppUnitTestFramework::MethodMetadata s_Metadata = {L"TestMethodInfo", L#methodName, reinterpret_cast<const unsigned char*>(__FUNCTION__), reinterpret_cast<const unsigned char*>(__FUNCDNAME__), __WFILE__, __LINE__};

It would be a block of data, that would contain a bunch of strings. The first being a wide character string, “TestMethodInfo”, the next a wide character string of the ‘methodName’ defined in the macro, the next a character string of the __FUNCTION__, next the string of __FUNCDNAME__, a wide character string of the filename __WFILE__ , and lastly the __LINE__. (If you’re interested in a list of Predefined Macros there you go.)
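To make that concrete, here is a minimal sketch of the same trick in portable C++. DemoMethodMetadata, DEMO_TEST_METHOD and the DemoGetInfo_ prefix are names made up for illustration, not part of CppUnitTest.h, and the sketch leaves out the section placement and __FUNCDNAME__, which are MSVC-specific. It only shows how a macro can stash the test’s name, the helper function’s name and the line number in a static struct.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-in for the framework's MethodMetadata.
struct DemoMethodMetadata {
    const wchar_t* tag;
    const wchar_t* methodName;
    const char* helperName;
    int lineNo;
};

// L"" #name widens the stringized token in a portable way. The real header
// writes L#name, which (as far as I know) relies on MSVC's traditional
// preprocessor gluing the L prefix onto the stringized result.
#define DEMO_TEST_METHOD(name)                                   \
    inline const DemoMethodMetadata* DemoGetInfo_##name() {      \
        static const DemoMethodMetadata metadata = {             \
            L"TestMethodInfo", L"" #name, __func__, __LINE__};   \
        return &metadata;                                        \
    }                                                            \
    void name()

// Expands to the helper function above plus "void DummyAssert()".
DEMO_TEST_METHOD(DummyAssert)
{
    // code under test would go here
}
```

Calling DemoGetInfo_DummyAssert() hands back a pointer to a struct whose methodName is L"DummyAssert", without ever running the test body.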

This was my assumption, but I couldn’t know for sure unless I saw it with my own two eyes. But how do I do that? Well there are a few third-party tools that will dump the PE (I’ll let you figure out what to search…), but I needed to write my own tool anyways so I just jumped in feet first. A few quick Bing searches (just kidding I used Google), and I found out I needed to open the binary as a flat file, and then map that file into memory. From there, I could get a pointer to the start of the file and use some macros, structures and functions in Windows.h to move about this file.  The “pseudo” algorithm is as follows

1) Open the binary as a flat file
2) Map the binary into memory
3) Obtain a view of the map (a pointer to the map)
4) Navigate the PE to understand file type, and number of sections
5) Iterate through each section definition until we find the correct section
6) Using the mapping offset, along with the section definition, find our data

There we go, simple as that. Let’s try it. The memory when we do that, and we point to the section table, looks like this.

0x07440210  2e 74 65 78 74 62 73 73 00 00 01 00 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a0 00 00 e0 2e 74 65 78 74 00 00 00 82 15 02 00 00 10 01 00 00 16 02 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 20 00 00 60 2e 72  .textbss............................ ..à.text............................... ..`.r
0x07440262  64 61 74 61 00 00 c3 bd 00 00 00 30 03 00 00 be 00 00 00 1a 02 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 40 2e 64 61 74 61 00 00 00 cc 0e 00 00 00 f0 03 00 00 0c 00 00 00 d8 02 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 c0 2e 69 64 61  data..Ã....0......................@..@.data...Ì....ð.......Ø..............@..À.ida
0x074402B4  74 61 00 00 86 29 00 00 00 00 04 00 00 2a 00 00 00 e4 02 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 40 2e 6d 73 76 63 6a 6d 63 79 01 00 00 00 30 04 00 00 02 00 00 00 0e 03 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 c0 74 65 73 74 64 61  ta...).......*...ä..............@..@.msvcjmcy....0......................@..Àtestda
0x07440306  74 61 49 06 00 00 00 40 04 00 00 08 00 00 00 10 03 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 50 74 65 73 74 76 65 72 73 09 01 00 00 00 50 04 00 00 02 00 00 00 18 03 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 50 2e 30 30 63 66 67 00 00  taI....@......................@..Ptestvers.....P......................@..P.00cfg..
0x07440358  04 01 00 00 00 60 04 00 00 02 00 00 00 1a 03 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 40 2e 72 73 72 63 00 00 00 3c 04 00 00 00 70 04 00 00 06 00 00 00 1c 03 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 40 2e 72 65 6c 6f 63 00 00 69 1a  .....`......................@..@.rsrc...<....p......................@..@.reloc..i.
0x074403AA  00 00 00 80 04 00 00 1c 00 00 00 22 03 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 42 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ...€......."..............@..B....................................................
0x074403FC  00 00 00 00 cc cc cc cc cc e9 86 36 01 00 e9 41 78 01 00 e9 9c 33 00 00 e9 b7 93 01 00 e9 b3 73 01 00 e9 7d b4 00 00 e9 68 a8 00 00 e9 13 62 00 00 e9 7e 76 00 00 e9 09 6f 01 00 e9 a4 59 00 00 e9 8f e4 00 00 e9 aa 9d 01 00 e9 98 73 01 00 e9 ce ab

So what gives??? I don’t see any section called “testdata$_B_method”. I can, however, see a ‘testdata’ section. At this point I had no research to back it up, only this anecdotal evidence, which leads me to believe the ‘$’ is some kind of delimiter in the section name. I guess we have to assume. We assume that the “testdata” section will contain our test method metadata. The problem now is that there are other things that sit in this section. There’s class, method, and attribute metadata. So, if it’s all lined up in a single section, how do we decipher what is what? Meaning, if we’re just trying to use pointers to walk around, how will we ever know what type we’re pointing to?
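For what it’s worth, the PE Format documentation describes this under “Grouped Sections (Object Only)”: the linker drops the ‘$’ and everything after it when naming the section in the image, and uses the suffix to order the contributions. A tiny sketch of that rule, with hypothetical helper names and assuming all the names share the same prefix:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Name the section ends up with in the linked image:
// everything before the first '$' (or the whole name if there is none).
inline std::string imageSectionName(const std::string& objectSectionName) {
    return objectSectionName.substr(0, objectSectionName.find('$'));
}

// Order in which same-prefix contributions are laid out: sorted by suffix,
// which for a shared prefix is the same as sorting the full names.
inline std::vector<std::string> contributionOrder(std::vector<std::string> names) {
    std::sort(names.begin(), names.end());
    return names;
}
```

So “testdata$_A_class”, “testdata$_B_method” and “testdata$_C_attribute” would all land in one ‘testdata’ section, classes first, then methods, then attributes.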

Did anything strike you as odd about the MethodMetadata structure? Maybe, if I show you the structure definitions of all the metadata objects, you might see something.

struct ClassMetadata
{
    const wchar_t *tag;
    const unsigned char *helpMethodName;
    const unsigned char *helpMethodDecoratedName;
};

struct MethodMetadata
{
    const wchar_t *tag;
    const wchar_t *methodName;
    const unsigned char *helpMethodName;
    const unsigned char *helpMethodDecoratedName;
    const wchar_t *sourceFile;
    int lineNo;
};

struct ModuleAttributeMetadata
{
    enum AttributeType { MODULE_ATTRIBUTE };
    const wchar_t *tag;
    const wchar_t *attributeName;
    const wchar_t *attributeValue;
    AttributeType type;
};

struct ClassAttributeMetadata
{
    enum AttributeType { CLASS_ATTRIBUTE };
    const wchar_t *tag;
    const wchar_t *attributeName;
    const void *attributeValue;
    AttributeType type;
};

struct MethodAttributeMetadata
{
    enum AttributeType { METHOD_ATTRIBUTE };
    const wchar_t *tag;
    const wchar_t *attributeName;
    const void *attributeValue;
    AttributeType type;
};

Huh. If you look carefully, the first actual member of these structures is a wchar_t* called tag. Then if we go to our use of it.

static const ::Microsoft::VisualStudio::CppUnitTestFramework::MethodMetadata s_Metadata = {L"TestMethodInfo", L#methodName, reinterpret_cast<const unsigned char*>(__FUNCTION__), reinterpret_cast<const unsigned char*>(__FUNCDNAME__), __WFILE__, __LINE__};

You might notice that there’s an L"TestMethodInfo" set as the tag, so one could deduce, dear Watson, that that is how we can decipher our different metadata components. By their tag! Let’s readjust our rudders, and fly! If we get our ‘testdata’ section, then move to the area with the data, we should see a bunch of nicely spaced wide strings in memory, right? Wrong!

0x07471120  94 43 03 10 b8 43 03 10 d8 43 03 10 40 44 03 10 f8 44 03 10 67 00 00 00 00 00 00 00 94 43 03 10 6c 46 03 10 98 46  ”C..¸C..ØC..@D..øD..g.......”C..lF..˜F
0x07471146  03 10 08 47 03 10 f8 44 03 10 78 00 00 00 00 00 00 00 94 43 03 10 cc 47 03 10 08 48 03 10 80 48 03 10 f8 44 03 10  ...G..øD..x.......”C..ÌG...H..€H..øD..
0x0747116C  7e 00 00 00 00 00 00 00 94 43 03 10 28 4b 03 10 68 4b 03 10 e0 4b 03 10 f8 44 03 10 8b 00 00 00 00 00 00 00 94 43  ~.......”C..(K..hK..àK..øD..........”C
0x07471192  03 10 c8 4c 03 10 08 4d 03 10 80 4d 03 10 f8 44 03 10 95 00 00 00 00 00 00 00 94 43 03 10 4c 4e 03 10 78 4e 03 10  ..ÈL...M..€M..øD..........”C..LN..xN..
0x074711B8  e8 4e 03 10 f8 44 03 10 9d 00 00 00 00 00 00 00 94 43 03 10 dc 4f 03 10 20 50 03 10 a0 50 03 10 f8 44 03 10 a7 00  èN..øD..........”C..ÜO.. P.. P..øD..§.
0x074711DE  00 00 00 00 00 00 94 43 03 10 70 51 03 10 98 51 03 10 08 52 03 10 f8 44 03 10 af 00 00 00 00 00 00 00 94 43 03 10  ......”C..pQ..˜Q...R..øD..¯.......”C..
0x07471204  78 53 03 10 b0 53 03 10 28 54 03 10 f8 44 03 10 bb 00 00 00 00 00 00 00 94 43 03 10 0c 55 03 10 48 55 03 10 c0 55  xS..°S..(T..øD..».......”C...U..HU..ÀU
0x0747122A  03 10 f8 44 03 10 c3 00 00 00 00 00 00 00 94 43 03 10 dc 56 03 10 08 57 03 10 78 57 03 10 f8 44 03 10 cc 00 00 00  ..øD..Ã.......”C..ÜV...W..xW..øD..Ì...
0x07471250  00 00 00 00 94 43 03 10 6c 58 03 10 b8 58 03 10 38 59 03 10 f8 44 03 10 d2 00 00 00 00 00 00 00 94 43 03 10 30 5a  ....”C..lX..¸X..8Y..øD..Ò.......”C..0Z
0x07471276  03 10 60 5a 03 10 d0 5a 03 10 f8 44 03 10 de 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ..`Z..ÐZ..øD..Þ
0x0747110A  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

What the hell? I was 100% expecting there to be a string sitting pretty right there in my section. Back to the drawing board I guess… Let’s recap, the first member of that struct is a wchar_t *. Huh. The first member of the struct is a wchar_t *. What does that even mean? It means that it is a ‘pointer to a wchar_t’. A ‘pointer’ to a wchar_t! Oh right! A pointer! I remember from school that a pointer is just an address! So where we were expecting there to be some text sitting pretty was garbage. Or so we thought. Wrong again. It is a POINTER! That means sitting in that location is an address! An address to where though? It should be an address to my string, but where on Earth (quite literally) would that address be? It has to be somewhere that is constant, right?

If we study the sections of a PE file, there’s a section called ‘.rdata’. Microsoft defines this section as “Read-only initialized data”. Thinking back a moment, these are magic strings (heaven forbid), aka ‘static strings’, and static here means read-only. If I was to hazard a guess, it probably means that they live somewhere in that section, because the compiler has to put magic strings somewhere… So that garbage number is probably a pointer to a string somewhere in that ‘.rdata’ section. So, if we take that address and adjust it for where the section data lies, we can find the “TestMethodInfo” string.
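Here is roughly what that adjustment looks like, as a sketch with hypothetical names. The one thing I’m assuming rather than reading from the dumps is the image base of 0x10000000; the real value lives in the ImageBase field of the Optional Header, and pointers like 0x10034394 (the bytes 94 43 03 10, little-endian) certainly look like they are based on it. Because the file is mapped as plain data and not loaded by the OS, nothing has been relocated, so the stored pointers still assume that base.

```cpp
#include <cassert>
#include <cstdint>

// The three section table fields needed for the translation.
struct SectionRange {
    std::uint32_t virtualAddress;   // RVA where the section starts when loaded
    std::uint32_t virtualSize;      // size of the section in memory
    std::uint32_t pointerToRawData; // offset of the section's bytes in the file
};

// Turn a pointer stored in the binary into an offset within the file.
// Returns false when the pointer does not fall inside the given section.
inline bool pointerToFileOffset(std::uint32_t pointer, std::uint32_t imageBase,
                                const SectionRange& section, std::uint32_t& fileOffset) {
    if (pointer < imageBase) return false;
    const std::uint32_t rva = pointer - imageBase;  // relative virtual address
    if (rva < section.virtualAddress) return false;
    const std::uint32_t delta = rva - section.virtualAddress;
    if (delta >= section.virtualSize) return false;
    fileOffset = section.pointerToRawData + delta;
    return true;
}
```

Plugging in the ‘.rdata’ entry from the section table dump above (VirtualAddress 0x33000, VirtualSize 0xBDC3, PointerToRawData 0x21A00), the pointer 0x10034394 becomes RVA 0x34394 and then file offset 0x22D94. My mapping started at 0x07440000 (the section table dump sits at 0x07440210), so the string should be at 0x07462D94.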

0x07462D94  54 00 65 00 73 00 74 00 4d 00 65 00 74 00 68 00 6f 00 64 00 49 00 6e 00 66 00 6f 00 00 00 00 00 00 00 00 00 44 00  T.e.s.t.M.e.t.h.o.d.I.n.f.o.........D.
0x07462DBA  75 00 6d 00 6d 00 79 00 41 00 73 00 73 00 65 00 72 00 74 00 00 00 00 00 00 00 00 00 00 00 43 50 50 55 6e 69 74 54  u.m.m.y.A.s.s.e.r.t...........CPPUnitT
0x07462DE0  65 73 74 49 6e 76 65 73 74 69 67 61 74 6f 72 54 65 73 74 3a 3a 6e 65 73 74 65 64 3a 3a 44 75 6d 6d 79 43 6c 61 73  estInvestigatorTest::nested::DummyClas
0x07462E06  73 3a 3a 5f 5f 47 65 74 54 65 73 74 4d 65 74 68 6f 64 49 6e 66 6f 5f 44 75 6d 6d 79 41 73 73 65 72 74 00 00 00 00  s::__GetTestMethodInfo_DummyAssert....
0x07462E2C  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 3f 5f 5f 47 65 74 54 65 73 74 4d 65 74 68 6f 64 49 6e  ....................?__GetTestMethodIn
0x07462E52  66 6f 5f 44 75 6d 6d 79 41 73 73 65 72 74 40 44 75 6d 6d 79 43 6c 61 73 73 40 6e 65 73 74 65 64 40 43 50 50 55 6e  fo_DummyAssert@DummyClass@nested@CPPUn
0x07462E78  69 74 54 65 73 74 49 6e 76 65 73 74 69 67 61 74 6f 72 54 65 73 74 40 40 53 47 50 42 55 4d 65 6d 62 65 72 4d 65 74  itTestInvestigatorTest@@SGPBUMemberMet
0x07462E9E  68 6f 64 49 6e 66 6f 40 43 70 70 55 6e 69 74 54 65 73 74 46 72 61 6d 65 77 6f 72 6b 40 56 69 73 75 61 6c 53 74 75  hodInfo@CppUnitTestFramework@VisualStu
0x07462EC4  64 69 6f 40 4d 69 63 72 6f 73 6f 66 74 40 40 58 5a 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  dio@Microsoft@@XZ

Voila! At last we finally found it. We found a ‘TestMethodInfo’ (if you’re wondering why it’s T.e.s.t.M.e.t.h.o.d.I.n.f.o…, it’s because it’s a wide character string so each character is 2 bytes. Unicode amirite?).
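Reading one of these strings back out of the raw bytes is only a few lines. A sketch, with made-up helper names, that decodes little-endian UTF-16 into a std::u16string so it doesn’t depend on the size of wchar_t on whatever machine runs the tool:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Read a NUL-terminated UTF-16LE string starting at the given offset.
// Stops at the terminator or when the buffer runs out.
inline std::u16string readWideString(const std::vector<std::uint8_t>& bytes, std::size_t offset) {
    std::u16string text;
    while (offset + 1 < bytes.size()) {
        const char16_t unit = static_cast<char16_t>(bytes[offset] | (bytes[offset + 1] << 8));
        if (unit == 0) break;
        text.push_back(unit);
        offset += 2;
    }
    return text;
}

// "TestMethodInfo" is the only tag value shown in this post, so it is the
// only one this sketch checks for.
inline bool isMethodRecordTag(const std::u16string& tag) {
    return tag == u"TestMethodInfo";
}
```

Feed it the bytes at the resolved tag address and isMethodRecordTag tells you whether the record you started from is a MethodMetadata.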

To recap, we’ve mapped the DLL into memory and walked to the section. We’ve taken the tag pointer, adjusted the address to find it in the ‘.rdata’ section, and now we know that we’re looking at the MethodMetadata structure. So, we can take the original pointer, and cast that to a MethodMetadata object.

struct MethodMetadata
{
    const wchar_t *tag;
    const wchar_t *methodName;
    const unsigned char *helpMethodName;
    const unsigned char *helpMethodDecoratedName;
    const wchar_t *sourceFile;
    int lineNo;
};

Then, for each of the other members, which are pointers to strings in the .rdata section we can just adjust and capture the same way we did the tag. In fact, the compiler did us a favour, and laid them out so you can see them in the memory dump above. Next, we just advance our section pointer the distance of the size of a MethodMetadata, and we are able to get the next test name!!! (This is mostly true, I’ve glossed over some details of the other things that can be in the section)
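One way to sketch that walk without casting at all is below. It spells out the 32-bit layout by hand, which matters if the tool reading the file isn’t built with the same pointer size as the DLL. In the ‘testdata’ dump above the records actually repeat every 28 bytes, not 24 (there are 4 bytes of padding after each one), so this sketch looks for the already-resolved tag pointer and skips anything else, not relying on a fixed stride. The names are hypothetical, and it could be fooled if some other field happened to hold the same value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// MethodMetadata as laid out in a 32-bit binary: five 4-byte pointers and an int.
struct RawMethodMetadata32 {
    std::uint32_t tag;
    std::uint32_t methodName;
    std::uint32_t helpMethodName;
    std::uint32_t helpMethodDecoratedName;
    std::uint32_t sourceFile;
    std::int32_t lineNo;
};

// Read a little-endian 32-bit value; the caller checks bounds.
inline std::uint32_t readU32(const std::vector<std::uint8_t>& b, std::size_t off) {
    return static_cast<std::uint32_t>(b[off]) |
           (static_cast<std::uint32_t>(b[off + 1]) << 8) |
           (static_cast<std::uint32_t>(b[off + 2]) << 16) |
           (static_cast<std::uint32_t>(b[off + 3]) << 24);
}

// Collect every record whose first field equals the tag pointer that was
// already resolved to "TestMethodInfo"; step over padding 4 bytes at a time.
inline std::vector<RawMethodMetadata32> findMethodRecords(
        const std::vector<std::uint8_t>& section, std::uint32_t methodTagPointer) {
    std::vector<RawMethodMetadata32> records;
    std::size_t off = 0;
    while (off + 24 <= section.size()) {
        if (readU32(section, off) != methodTagPointer) {
            off += 4;
            continue;
        }
        RawMethodMetadata32 r;
        r.tag = readU32(section, off);
        r.methodName = readU32(section, off + 4);
        r.helpMethodName = readU32(section, off + 8);
        r.helpMethodDecoratedName = readU32(section, off + 12);
        r.sourceFile = readU32(section, off + 16);
        r.lineNo = static_cast<std::int32_t>(readU32(section, off + 20));
        records.push_back(r);
        off += 24;
    }
    return records;
}
```

Run over the first two records in the dump with the tag pointer 0x10034394, this yields line numbers 0x67 and 0x78, and each of the string pointers can then be resolved the same way as the tag.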

I really hope you’re thinking “Damn! That was so much fun!” Because I definitely was! You can see now, that the steps to tie this into a test adapter aren’t too far away. I won’t get into that until a later post, but as for this post, we’ve uncovered how to discover what tests lie in a Microsoft C++ Unit Test Framework created test DLL. Wasn’t that just a blast digging into that?

I hope you will join me for the next part, where we figure out how to actually load and execute the tests from this metadata.

If you’re interested in seeing this code, my cobbled together project lives here. I apologize in advance for the code. I had a really bad habit of putting as many classes as I could in a single file. I don’t do this anymore. It was out of laziness.  The PeUtils.h/.cpp and CppUnitTestInvestigator.h/.cpp, in the CppUnitTestInvestigator project have the code for loading the DLL metadata.

“It’s still magic, even if you know how it’s done” — Terry Pratchett

Happy Coding!

 

** As a disclaimer, I don’t own any of the code snips above, they are directly pulled from the VsCppUnit C++ Unit Testing Framework, which is Copyright (C) Microsoft Corporation **

 

References:
PE Format
Section Specifications
DUMPBIN

6 thoughts on “The one where we reverse engineered Microsoft’s C++ Unit Test Framework (Part 1)

  1. For section names (such as specified in “#pragma section”), the character(s) after the dollar sign are an indexing/ordering tool. While this is documented well enough, it has confusingly low discoverability; neither the “#pragma section” pragma documentation nor the “/SECTION” link option documentation actually mention this, even though they’re the ones you’d most expect to at least give a passing mention.

    It is, however, documented in the PE Format documentation, as the section “Grouped Sections (Object Only)” ( https://docs.microsoft.com/en-us/windows/win32/Debug/pe-format#grouped-sections-object-only ). And, weirdly enough, also mentioned in a comment in one of the code examples on the “#pragma init_seg” pragma documentation here: https://docs.microsoft.com/en-us/cpp/preprocessor/init-seg?view=vs-2017
