In Defense of The Anemic Data Model

According to Martin Fowler, who coined the term, this pattern is an ‘Anti-Pattern’, and it’s “contrary to the basic idea of Object Oriented design”. I’m not an Engineer (although I’m sure my parents wish I was), but engineering is about picking the best approach to solve the problem. The fact of the matter is, depending on the problem at hand, sometimes an ‘Anti-Pattern’ fits. This particular one actually serves a purpose, and I think the benefits outweigh the costs. So come with me for a moment, as I defend this pattern.

Firstly, a major argument against this architecture is that it “violates encapsulation”. Whether it does depends on your definition of the word ‘encapsulation’. If we refer to Wikipedia, we get two different definitions:

  • A language mechanism for restricting direct access to some of the object’s components.
  • A language construct that facilitates the bundling of data with the methods (or other functions) operating on that data.

Some see the concept as either/or; some see it as both. The reality is I could argue either. In fact, to me (and I borrowed this from Uncle Bob’s Clean Architecture), C has a very good mechanism for 1) bundling data and functions, and 2) hiding direct access to functions of the object. That mechanism is the header / definition file pair. You expose everything you want the client to see via the header, and keep anything private in the compiled definition. Languages like C# and JAVA (shudder) expose ALL the functionality in one file, exposing to the client ALL the implementation details of the class. Isn’t that a violation of ‘encapsulation’? To me, having a class that stores data, a class that stores functions, and a module that exposes them together is, in fact, better encapsulation.
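
To make that concrete, here’s a minimal sketch of the mechanism (the file names, the struct and the functions are invented for illustration). The header is all the client ever sees; the helper stays locked away in the compiled definition file:

/* counter.h -- the public face of the module, everything the client may use */
struct counter { int value; };

void counter_increment(struct counter *c);

/* counter.c -- the private definition; clients never see what's in here */
#include "counter.h"

/* 'static' keeps this helper invisible outside this translation unit */
static int clamp(int v) { return v > 100 ? 100 : v; }

void counter_increment(struct counter *c)
{
    c->value = clamp(c->value + 1);
}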

Another massive benefit of this pattern is the ‘Separation of Concerns’. If you keep all logic within a class, say business logic, persistence logic, and presentation logic, you’ve just created a coupling between your storage, your business logic, and your presentation layer. Trust me when I tell you, I’ve been to this Hell, and it isn’t a picnic. It’s an unmaintainable nightmare, the kind that Freddy himself couldn’t engineer. You can fight to use language features like partial classes to manage the distinction, but it only helps a little.

You might argue that the idea is to keep only business logic with the class, and separate all else. Well, what happens when the line between presentation and business logic becomes fuzzy? People start coupling, and down the rabbit hole we go. This is illustrated in this statement by Fowler himself: “The logic that should be in a domain object is domain logic – validations, calculations, business rules – whatever you like to call it. (There are cases when you make an argument for putting data source or presentation logic in a domain object, but that’s orthogonal to my view of anemia.)” [1] It isn’t orthogonal; in fact, it is the problem. You might be able to make the argument, but someone else might not. So now you’ve got someone with a little less experience, and a little less insight, who sees this and replicates it for the problem they’re solving. Next thing you know, you’re wondering why your UI layer needs your database layer to compile.

Oh! What about code reviews and proper discipline? We ALL know how software development works: you’ll spend hours in review debating why this little piece of persistence logic fits, then why that little piece of UI logic fits. If you follow a clear-cut pattern, and don’t mix your logic with your data, you don’t have this issue, and it’s easier to catch in a code review, because there isn’t room for debate.

You can implement a very nice service layer that works on your data model AND uses OO techniques. It is possible.
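
Here’s a rough sketch of what that can look like (the types and the discount rule are made up for illustration). The data lives in a plain struct, and the behaviour lives behind an interface:

// The 'anemic' model: pure data, no behaviour.
struct order
{
    int id;
    double total;
};

// The service interface: behaviour lives here, not on the data.
class iorder_service
{
public:
    virtual ~iorder_service() = default;
    virtual double apply_discount(const order &o) = 0;
};

// A concrete service implementing the business rule.
class order_service : public iorder_service
{
public:
    double apply_discount(const order &o) override
    {
        // example rule: 10% off orders over 100
        return o.total > 100.0 ? o.total * 0.9 : o.total;
    }
};

The order struct can be persisted, serialized, or displayed by other services without dragging the business rules along with it, and the service itself can still be swapped or mocked through its interface.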

This pattern of keeping your data separate from your logic finds massive benefit when you’re working with things like RESTful APIs, or really any form of serialization / deserialization to POD data streams. This is because serializing behaviour (functions) is difficult at best. Rehydration of complex type hierarchies isn’t child’s play, and doesn’t lend itself nicely to simple interfaces (see OData as an example). So you want the objects you pass back and forth to be light and airy. These objects are often referred to as DTOs, or ‘Data Transfer Objects’. In this pattern, you pass them between services that can persist the object, do logic on the object, or display the object. You might decorate them to add functionality, but at the core, the object stands as data alone.

I’m not saying this ‘Anti-Pattern’ is a silver bullet, because the only silver bullet I believe in is Coors Light. I am, however, saying that if you’re running into the trap of a coupled nightmare, where one too many people made the argument that presentation and data source logic should be in your model, you might want to see if this architecture helps.

Happy Coding!

“There is nothing either good or bad but thinking makes it so.” – William Shakespeare

The Adventures of Dependency Injection and Inversion of Control.

I like to think my first words were ‘std::cout << “Hello World” << std::endl;’, the canonical C++ Hello World program. But alas, they weren’t. I cut my teeth on QBasic (Quick Beginners All-purpose Symbolic Instruction Code, not to be confused with QuickBASIC). I did my time in QBasic, writing ‘viruses’ that would show download progress bars, and print ‘Ha Ha’ infinitely (sorry Dan).

Oh look, I’ve digressed; back on topic. My first C++ experience was in CMPUT101. That’s when I fell in love with the language. It was the :: that did it. Also, I always liked the simplicity of the language (say what?). Anyways, as a young (and full of vigor, I might add) C++ software developer fresh out of school, I often found my code tangled and tightly coupled. Class design in C++ is hard. My class designs often consisted of a lot of value members, and a lot of concerns fused together.

A value member is an instance of a class (or POD) held directly as a member variable of another class. In C++, value members are laid out in memory in the order they’re declared in the class.

class foo
{
public:
   int A;
   int B;
};

class bar
{
public:
   foo fooA;
   foo fooB;
};

If sizeof(int) == 4, then sizeof(foo) == 8 and sizeof(bar) == 16. Hopefully this makes sense.

[I’m getting there!]

So the thing about value members and call-by-value in C++ is that there is no polymorphism. You only get polymorphism, i.e. virtual function calls, when you call through a reference or a pointer. Which means that when your class is full of value members, you’re EXTREMELY tightly coupled to the implementation of those members.

Here’s an example to illustrate what I’m talking about.

class file_to_database
{
private:
    file file_;
    database database_;
public:
    file_to_database(const std::string &filename, 
                    const std::string &connect_s)
    : file_(filename), database_(connect_s) // construct the concrete members directly
    {
    }

    void parse()
    {
         std::string line;
         while(file_.read_line(line))
         {
              std::string tablename;
              std::string v1, v2;
              // do some parse work
              database_.insert(tablename, v1, v2);
         }
    }
};

int main(int argc, char **argv)
{
    // argv[0] is the program name; the filename and connection string come after it
    file_to_database ftdb(argv[1], argv[2]);
    ftdb.parse();
}

As you can see, we’re very tightly coupled to the implementations of both the file and the database.

When you start delivering code for realz, and have to debug in the field, that’s about the time you remember back to school, and this thing called a ‘Unit Test’. Then it dawns on you, *that might not have just been a ploy to get me to do more homework, but actually useful!!!* So you think to yourself, *Self — I should probably unit test my code*. So you trudge down that road. Hi! Ho! Hi! Ho!

There’s only one problem with that. Everything in your class is a value member. Meaning no polymorphism. Meaning you’re stuck with the implementation you wrote, coupled to a database and all. *Shit.*

I headed on a path of discovery. I was going to solve this problem. How can I make my classes more testable and less coupled to implementation details?

If I boil down my research (and the years of experience since), it comes down to 3 rules.

  1. Favour composition over inheritance – inheritance is the tightest coupling you can get. Unless you need it, don’t use it.
  2. Develop to an interface – use well designed interfaces to separate implementation from algorithms.
  3. Use Dependency Injection – dun. dun. dun.

Let’s see where I went wrong.

  1. Check. My classes were generally composed of other types, as value members.
  2. No polymorphism, no interfaces.
  3. What is this even??

Now, at this point I was part of the almighty C++ development crew!! The language coursed through my veins. Oooh Rah. I had a major hate-on for Enterprise Developers and any type of managed language. (I still hate JAVA, so if you’re a JAVA developer stop reading right now and go learn a real programming language. Just kidding! 🙂 No but seriously, my language compiles your language.) However, I had heard about this thing called ‘Dependency Injection’ from the Enterprise team at work. I had also read about it on the internet, and took a look at some ‘Dependency Injection’ code examples. Uhhh. Okay, Mr. C-Sharp, where the heck does anything begin? Who owns what? Wow. I’m going to stay as far from ‘Dependency Injection’ as I can.

By this time, I had solved my unit testing woes. I had started using reference members to pure virtual classes (interfaces), and taking the implementations of those interfaces in my constructors. This allowed me to write tests against my classes and make sure they functioned correctly using mock implementations, then supply the real-world implementations in production. I could now mock out my databases and files, supply those in the constructor, and test my classes. This also really simplified my ownership model. I won. Check and mate, Mr. C-Sharp. No ‘Dependency Injection’ here.

  1. Check.
  2. Check.
  3. Hell no. Who needs that? I’ll control where things are created, destroyed and owned. Thank you very much.

It didn’t matter, because I was now cooking with dynamite: I could unit test! Hizuh! I don’t need your silly ‘Dependency Injection’, where I have no idea who made what, when, or who owns anything.

Here’s the example above, re-imagined to suit my testing needs.

class ifile
{
public:
   virtual ~ifile() = default;
   virtual bool read_line(std::string &line)=0;
};

class idatabase
{
public:
   virtual ~idatabase() = default;
   virtual void insert(const std::string &tablename,
                       const std::string &value1,
                       const std::string &value2)=0;
};

class file_to_database
{
private:
     ifile &file_;
     idatabase &database_;
public:
     file_to_database(ifile &file, idatabase &database)
     :file_(file), database_(database)
     {
     }

     void parse()
     {
         std::string line;
         while(file_.read_line(line))
         {
              std::string tablename;
              std::string v1, v2;
              // do some parse work
              database_.insert(tablename, v1, v2);
         }
     }
};

class file : public ifile
{
public:
    file(const std::string &filename)
    {
    }
    virtual bool read_line(std::string &line) override
    {
        // implementation
    }
};

class database : public idatabase
{
public:
   database(const std::string &connect_s)
   {
   }
   virtual void insert(const std::string &tablename,
                       const std::string &value1,
                       const std::string &value2) override
   {
        // implement
   }
};

int main(int argc, char **argv)
{
    file f(argv[1]);
    database db(argv[2]);

    file_to_database ftdb(f, db);
    ftdb.parse();
}

As you can see, this refactor takes the parsing code from nearly impossible to test without a real file and database, to extremely easy to test by mocking out our input (ifile) and our output (idatabase). The awesome thing is, I didn’t even touch the parsing algorithm. Just a couple of smart interfaces, and introducing the dependencies in the constructor. Bingo. Bango.
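
To show just how easy, here’s a minimal sketch of a test against the refactored class, using hand-rolled mocks (no particular test framework assumed; a plain assert stands in for a real assertion macro, and the test data is made up):

#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A fake input: serves up canned lines instead of touching the filesystem.
class mock_file : public ifile
{
public:
    std::vector<std::string> lines;
    std::size_t index = 0;

    bool read_line(std::string &line) override
    {
        if (index >= lines.size())
            return false;
        line = lines[index++];
        return true;
    }
};

// A fake output: records inserts instead of touching a database.
class mock_database : public idatabase
{
public:
    std::vector<std::string> inserted_tables;

    void insert(const std::string &tablename,
                const std::string &value1,
                const std::string &value2) override
    {
        inserted_tables.push_back(tablename);
    }
};

void test_parse_inserts_one_row_per_line()
{
    mock_file f;
    f.lines = {"table_a,1,2", "table_b,3,4"};
    mock_database db;

    file_to_database ftdb(f, db);
    ftdb.parse();

    assert(db.inserted_tables.size() == 2); // one insert per line read
}

Both mocks are dumb on purpose: the test only cares that the parsing algorithm reads every line and pushes a row to the database, not where either actually lives.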

So — a little while later, I’m attending CppCon and I’m having a couple wobblies with a friend. He’s a C++ Guru who did some time at a C# shop. We got onto this topic. I tell him, “Man, have you heard of this ‘Dependency Injection’ thing? What a nightmare. How do you even know when things are constructed? Who even owns what?”

He pauses. Then he delivers a mind-bottling statement. “Dependency Injection IS NOT Inversion of Control.” I said, “What?!?” He pauses again. “Dependency Injection IS NOT Inversion of Control. Dependency Injection is simple — just make your classes take what they depend on in the constructor… When you use interfaces, it means you can replace the implementation of the dependencies without modifying the class.” I think I almost fell off my stool (and it wasn’t from the number of wobblies I’d had). “No. No. No. Dependency Injection is this crazy-ass thing that obfuscates where objects get created and requires that everything on the planet be an interface, even simple things.” I cried.

He smiled, took a sip of his water — “You’re wrong. That’s an Inversion of Control container.” I take a sip of my beer. “Well. What’s an Inversion of Control container?”. He looked at me smugly, “What you just described. That is.”

It was in this moment that I learned the difference between ‘Dependency Injection’ and ‘Inversion of Control’. Dependency Injection is just that simple: make your classes take their dependencies in the constructor. If those dependencies happen to be interfaces, you can swap out implementations as needed and your class won’t change. This is sometimes referred to as the ‘Hollywood Principle’ — don’t call us, we’ll call you.

Inversion of Control is ceding control of object creation and ownership to a framework. It utilizes ‘Dependency Injection’ in order to build up objects in a hierarchical fashion. In my opinion, these were frameworks created because some developer had nothing better to do with his/her time. Enterprise developers, amirite!?! 😀 But in all seriousness, they really obfuscate ownership and creation, which in my opinion are fundamental to software design. This is done in the name of modularity, so I understand it. However, I’m not a fan. They do have their use in large applications that need extremely customizable and modular foundations, although I’d argue those are few and far between.

Lastly, neither of these is to be confused with ‘Dependency Inversion’. Sometimes people will confuse ‘Inversion of Control’ and ‘Dependency Injection’ and call the mix by the ugly stepchild name of ‘Dependency Inversion’. ‘Inversion of Control’ uses ‘Dependency Inversion’, and so does ‘Dependency Injection’. Essentially, ‘Dependency Inversion’ is ensuring that your higher-level components don’t depend on your lower-level components. The term, though, is generally used to refer to architectural design at the module level: by making both the high- and low-level modules depend on a shared interface, i.e. one caller and one implementer, the direct dependency is ‘inverted’.
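
Here’s a tiny sketch of what that inversion can look like in code (the names are invented for illustration): the high-level reporting module owns the abstraction, and the low-level storage module depends ‘upward’ on it, so neither depends directly on the other.

#include <string>

// Owned by the high-level module: the abstraction both sides agree on.
class istorage
{
public:
    virtual ~istorage() = default;
    virtual void save(const std::string &data) = 0;
};

// High-level policy: knows nothing about disks, databases, or networks.
class report_generator
{
public:
    explicit report_generator(istorage &storage) : storage_(storage) {}

    void generate()
    {
        storage_.save("report contents"); // no idea how or where this ends up
    }

private:
    istorage &storage_;
};

// Low-level detail: implements the abstraction, inverting the usual dependency.
class disk_storage : public istorage
{
public:
    void save(const std::string &data) override
    {
        // write the data to a file on disk
    }
};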

In summary, ‘Dependency Injection’ is the act of taking all class dependencies via the constructor. ‘Inversion of Control’, also called ‘Inversion of Sanity’, is the act of allowing a framework to control all of the important decisions of your program, while you pretend to control it via XML. ‘Dependency Inversion’ is neither of these, but an architectural pattern that helps decouple modules within a system by having high- and low-level components work through interfaces.

Happy Coding!

“Be yourself, because everyone else is already taken.” — Oscar Wilde


The Wedding Cake Architecture

First, let me start by stating that I’m not claiming to have invented this. Not even close. I ‘borrowed’ the idea from “Uncle Bob”. Interestingly enough, I stumbled upon it (kinda) before I ever read about it. Here’s the story.

I had had my doubts about frameworks and ORMs like Entity Framework for some time. What struck me as odd was that a lot of the examples show direct usage of the table classes (classes that directly represent the data in the tables) at the API or UI level. Now, I understand the need for this in a blog post or an example, for simplicity. What ends up happening, though, is that this becomes the foundation for reality. Then, once you’ve exposed classes that directly represent your table structure to an API, be it REST or WCF, you’ve unintentionally (or intentionally) coupled your API or UI to the database. This means that often, if you want to make a change to your table structure or persistence medium, you’d better be willing to shell out the time to fix the broken API clients or UI.

Armed with that thought, I went out to create an architecture that didn’t have this problem. I was dead-set on architecting a solution that ensured decoupling of the database and the clients of the API layer. I was going to develop an API that had a fluent business interface, and ensured that the persistence happened behind the curtains of the API.

The original design principle was dead simple. ‘Keep the persistence out of the API’. So there was a Business Logic Layer (BLL) which exposed a consumable API and housed the business and translation logic. Then a Data Access Layer (DAL) which housed the logic of storing the table classes. Classic separation of concerns. Classic.

[Diagram: the original two-layer architecture, with the Business Logic Layer on top of the Data Access Layer]

I went about my business of implementation. Then, next thing I know, this architecture worked, it performed, it was unit testable (mostly), and it was easy to understand! HooRay! Kinda.

To exemplify what I mean, here’s a canonical example of a bank transaction.

// Exposed via BLL DLL
public enum EntryTypes { Debit, Credit }

public class Transaction
{
    public Guid Id { get; }
    public Guid Account { get; }
    public DateTimeOffset Time { get; }
    public EntryTypes Type { get; }
    public double Value { get; }
}

public interface ITransactionManager
{
    Transaction Deposit(Guid account, double value);
    Transaction Withdraw(Guid account, double value);
}

public class TransactionManager : ITransactionManager
{
    IPersistence Persistence { get; }

   public Transaction Deposit(Guid account, double value)
   {
       // create a new transaction
       var transaction = new Transaction();
    
       // ...
       // translate the transaction to a transaction entry
    
       var entry = new TransactionEntry();
       // ... copy the fields
    
       Persistence.CreateTransactionEntry(entry);
       return transaction;
    }

    public Transaction Withdraw(Guid account, double value)
    {
       var accountEntry = Persistence.ReadAccount(account);
       if( accountEntry.Balance < value )
            throw new InsufficientFundsException();
       // create a new transaction
       var transaction = new Transaction();
       // ...
       // translate the transaction to a transaction entry
       var entry = new TransactionEntry();
       // ... copy the fields
       Persistence.CreateTransactionEntry(entry);
       return transaction;
     }
}

// Exposed via DAL DLL
[Table("Transactions")]
public class TransactionEntry
{
    public Guid Id { get; }
    public Guid Account { get; }
    public DateTimeOffset Time { get; }
    public EntryTypes Type { get; }
    public double Value { get; }
}

[Table("Accounts")]
public class AccountEntry
{
    public Guid Id { get; set; }
    public string Owner { get; set; }
    public double Balance { get; set; }
}

public interface IPersistence
{
    void CreateTransactionEntry(TransactionEntry entry);
    AccountEntry ReadAccount(Guid accountId);
}

public class Database : IPersistence
{
    public void CreateTransactionEntry(TransactionEntry entry)
    {
        // Code to store the table entry into a database.
    }
    // ...
}

Now, this example is contrived and won’t compile in any language, but it illustrates my point. You can see that from the client’s side of the API, there is no mention of persistence. This means that the clients of the API don’t need to deal in classes that directly represent your tables (or even know about them). I’ve intentionally let the two classes have identical fields, because this is just the reality for now. It leaves us the ability to change things in the future, if we ever need to change a table data type, add a column, or change where the values get stored.

But there’s a problem with this architecture. It’s subtle (at least it was for me; experienced people are probably screaming ‘YOU IDIOT’), but it’s there. The fact that the BLL even knows about the table classes is a problem. There are a few things wrong:

  1. I used Entity Framework in this example, so I’ve basically forced my business logic to use Entity Framework. This isn’t really a bad thing; Entity is widely used, supported by Microsoft, and easy to use (IMO). However, if you ever wanted to depart from Entity Framework, you’re going to have a bad time.
  2. I’ve coupled the Business Logic to the table columns (yet again). In this example, if you ever wanted to have different storage implementations, they’d better represent the data the same way across implementations (or you’d have to build a translation shim).
  3. I broke the Single Responsibility Principle on my BLL. Translating API objects to database objects isn’t Business Logic, so it’s got no ‘business’ being there.

Now, the saving grace of this architecture is that I did one thing right: I kept those details, the persistence details, away from the clients of the API. So even if I had released the API and had clients consuming it, I wouldn’t break them while fixing my mistake.

As the architecture began to grow, this oversight became more apparent, which is always the case. I started to realize that things like porting to a different storage medium, or changing ORMs, would pose a huge undertaking.

Another problem: since the only downstream interface coming out of the BLL used the table classes, all my tests were directly coupled to the table entries. I had a huge problem.

At this point I had already started reading “Clean Architecture” by Robert ‘Uncle Bob’ Martin. I started to understand where I had gone wrong; those three issues noted above started to become clear, and so did a solution — Clean Architecture. Like I said, I stumbled across this architecture. More like tripped, fell, and face-planted right into it.

Uncle Bob talks about a four-ringed solution. In the centre ring, we have ‘Entities’: these are your ‘Enterprise Business Rules’. In the next ring, you have ‘Use Cases’: these are your ‘Application Business Rules’. Next, ‘Interface Adapters’ are your controllers and presenters. Then finally, ‘Frameworks and Drivers’ are the external interfaces. He talks about never letting an inside ring know about things in an outside ring. Can you see my blunder now?

Okay — so what would be the point of this blog post, if I didn’t have a little perspective on that architecture? Otherwise, I could just say ‘go read the book’. Disclaimer: I’m not knocking Uncle Bob. I see the value in his architecture, and he’s got many, many more years on me.

I think it can be simplified… I gave it a KISS. My interpretation is the ‘Wedding Cake Architecture’, and it looks like this:

[Diagram: the Wedding Cake Architecture, drawn as layers of differing thicknesses]

It’s very, very heavily influenced by the Clean Architecture, except that I’ve shaved out a layer. I’ve also intentionally drawn the layers with differing thicknesses, to illustrate the weighted dispersal of the code. You’ll have less code in the Business Models module than you will down in the ‘everything else’ module. You personally might not have much code in those lower layers, but if you imagine the sheer amount of code behind Entity Framework, another ORM, or the ASP.NET Web API, you’ll see why that layer is so thick.

Here’s the example, reimagined:

// Exposed via BLL DLL
public enum EntryTypes { Debit, Credit }

public class Transaction
{
    public Guid Id { get; }
    public Guid Account { get; }
    public DateTimeOffset Time { get; }
    public EntryTypes Type { get; }
    public double Value { get; }
}

public interface ITransactionManager
{
    Transaction Deposit(Guid account, double value);
    Transaction Withdraw(Guid account, double value);
}

public class TransactionManager : ITransactionManager
{
    IPersistence Persistence { get; }

   public Transaction Deposit(Guid account, double value)
   {
       // create a new transaction
       var transaction = new Transaction();
       // setup transaction
       Persistence.CreateTransaction(transaction);
       return transaction;
    }

    public Transaction Withdraw(Guid account, double value)
    {
       var accountDetails = Persistence.ReadAccount(account);
       if( accountDetails.Balance < value )
            throw new InsufficientFundsException();
       // create a new transaction
       var transaction = new Transaction();
       // ...
       
       Persistence.CreateTransaction(transaction);
       return transaction;
     }
}

// Exposed via DAL DLL
[Table("Transactions")]
internal class TransactionEntry
{
    public Guid Id { get; }
    public Guid Account { get; }
    public DateTimeOffset Time { get; }
    public EntryTypes Type { get; }
    public double Value { get; }
}

[Table("Accounts")]
internal class AccountEntry
{
    public Guid Id { get; set; }
    public string Owner { get; set; }
    public double Balance { get; set; }
}

public interface IPersistence
{
    void CreateTransaction(Transaction transaction);
    // Account here is the BLL-level account model, not the AccountEntry table class
    Account ReadAccount(Guid accountId);
}

public class Database : IPersistence
{
    public void CreateTransaction(Transaction transaction)
    {
        // translate the transaction to a transaction entry 
        var entry = new TransactionEntry(); 
        // ... copy the fields
        // Code to store the table entry into a database.
    }
    // ...
}


Now, the Business Logic Layer has zero knowledge of anything other than its own logic. So let’s look at the benefits:

  1. Any clients of the API won’t know about the persistence (so I meet my design principle #1).
  2. We haven’t coupled our Business Logic to our persistence, so we have the flexibility to change our storage medium or have different storage implementations.
  3. Our unit tests will now be flexible to any of those changes, unlike before, when they were directly coupled to the table classes because we had to use them to check for proper output from the BLL.

Overall, I learned a lot from this little venture down Architecture Alley. I hope you can learn something from the tale of the adventure too.

Happy Coding!

PL