
.NET Core In-Memory Database

GIGO - Garbage In, Garbage Out.

It's a fundamental concept in software development. However high the quality of our code, if we feed in invalid data, we'll get invalid results.

Information In, Information Out

And when we're doing unit testing, we are specifically assessing the functionality of our code, not the validity of our data. As far as is possible, we want to test a small number of classes, perhaps even a single class. We don't want to be reading data from a database - or any external store - because if someone changes that data, our tests will give red lights, implying an error in the software, when in fact it's a problem in the data.

Mock Objects

So when testing, we use mock data - data structures that mimic as closely as possible those in the actual database, but which are populated with values provided by the test code itself. To make life easier, we'd like to be able to switch between this test setup and the actual runtime setup with the real database as seamlessly as possible.

Traditionally, these in-memory data structures have been collections such as arrays and generic lists. To allow the easy switch from testing to production, these are often hidden behind an interface (using the Repository pattern). Since the introduction of LINQ (in C# version 3), it's been possible to write identical queries and have underlying providers target them either at in-memory collections or at an external database, making the seamless switching much easier. But differences remain which can mean unit tests fail when the full system works fine or - much worse - vice versa.
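
As a sketch of that traditional approach, the same LINQ query can target either implementation behind the interface (the names here are illustrative, not from any particular codebase):

using System.Collections.Generic;
using System.Linq;

// Hypothetical generic repository interface.
public interface IRepository<T> where T : class
{
    IQueryable<T> Query();
    void Add(T entity);
}

// Test double backed by a List<T>; LINQ queries run against it in memory,
// while a production implementation would wrap a DbSet<T> instead.
public class InMemoryRepository<T> : IRepository<T> where T : class
{
    private readonly List<T> _items = new List<T>();

    public IQueryable<T> Query() => _items.AsQueryable();

    public void Add(T entity) => _items.Add(entity);
}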

Consider a simple parent-child structure:

using System.Collections.Generic;

public class Parent
{
    public int Id { get; set; }

    // Initialised so that callers can Add children without a null check.
    public ICollection<Child> Children { get; set; } = new List<Child>();
}

public class Child
{
    public int Id { get; set; }
    public Parent Parent { get; set; }
}

Entity Framework

Compared with an in-memory collection, a true database (in combination with the Entity Framework DbContext) behaves differently in three ways:

  1. When objects are saved, unique Id values will be correctly filled in.
  2. Reciprocal associations will be hooked up, e.g. if in code we had set a Child's Parent, then on saving the Child would automatically be added to the Parent's Children collection, and also to the overall DbSet<Child>.
  3. Related data must be loaded explicitly (eager loading). If we read a Parent from the database, we have to specify whether we also want to load that Parent's Children. With in-memory collections, the related objects are simply already there.

When using in-memory collections, we have to mimic this behaviour - and decide whether the effort of mimicking it is worthwhile. In my experience, auto-generation of Ids is pretty simple. Reciprocal associations are more tricky since each relationship needs to be addressed specifically, but it's usually worth the effort.
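
As a rough sketch of what that mimicking involves (the MockDataStore class and its names are hypothetical, purely for illustration):

using System.Collections.Generic;

// Hypothetical mock store mimicking two of the database behaviours:
// identity generation and reciprocal association fix-up.
public class MockDataStore
{
    public List<Parent> Parents { get; } = new List<Parent>();
    public List<Child> Children { get; } = new List<Child>();
    private int _nextId = 1;

    public void SaveParent(Parent parent)
    {
        if (parent.Id == 0)
            parent.Id = _nextId++;            // mimic auto-generated Ids

        Parents.Add(parent);

        foreach (Child child in parent.Children)
        {
            if (child.Id == 0)
                child.Id = _nextId++;

            child.Parent = parent;            // hook up the reciprocal association

            if (!Children.Contains(child))
                Children.Add(child);          // mimic the overall DbSet<Child>
        }
    }
}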

The hardest - to the point of impossibility - is dealing with eager loading. And it's a real headache, because it can cause green lights in testing (since the data is always there in memory) and failures in production. We can bypass the problem by using lazy loading, where the DbContext loads objects as and when they're needed. But lazy loading can, in many circumstances, have an enormous negative impact on performance. Furthermore, we want to avoid redefining the runtime behaviour just to make our tests work.

EF Core

In Entity Framework Core we have a solution to all three of these issues: the In-Memory Database.

We can create one like this:

var options = new DbContextOptionsBuilder<MyContext>()
    .UseInMemoryDatabase(databaseName: "MockDB")
    .Options;

var context = new MyContext(options);

The name ("MockDB") can be whatever we like, but bear in mind that if we do this twice with the same name within the same test run, there will only be one in-memory database.

To use it:

Parent p = new Parent();
Child c = new Child();

p.Children.Add(c);
context.Add(p);

context.SaveChanges();

After the SaveChanges(), p.Id and c.Id will be set up, and c.Parent will contain a reference to p. Neither of these things would happen with in-memory collections (not without a lot of additional coding).
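
A quick way to see this, assuming the objects from the snippet above:

// After SaveChanges, the in-memory database has filled in the keys
// and fixed up the reciprocal navigation.
Console.WriteLine(p.Id);           // a generated value, e.g. 1
Console.WriteLine(c.Parent == p);  // True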

And then the real gotcha:

public Parent Load(int id)
{
    return context.Parents.Single(p => p.Id == id);
}

Eager Loading

[TestMethod]
public void Load()
{
    ClassUnderTest obj = new ClassUnderTest();

    Parent loaded = obj.Load(27);

    Assert.AreEqual(3, loaded.Children.Count);
}
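
For that assertion to have data to test against, the Arrange phase needs to seed the in-memory database. A minimal sketch, assuming ClassUnderTest obtains a context built from the same options as earlier, and using a separate context instance for seeding so that the later load comes from the store rather than from already-tracked objects:

using (var seedContext = new MyContext(options))
{
    var parent = new Parent { Id = 27 };
    parent.Children.Add(new Child());
    parent.Children.Add(new Child());
    parent.Children.Add(new Child());

    seedContext.Add(parent);
    seedContext.SaveChanges();
}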

We haven't done an eager load on the Children, so this should give a red light, with loaded.Children.Count being zero (or loaded.Children being null, if the collection isn't initialised). But if we were using in-memory collections (assuming we'd loaded three children) then we'd get a green light, followed by a failure when we run the application. An in-memory database correctly gives us a red light.

To fix it, we need to change the code to:

// Include is an extension method from the Microsoft.EntityFrameworkCore namespace.
return context.Parents.Include(p => p.Children).Single(p => p.Id == id);

This eager-loads the children and gives us a green light, plus the correct runtime behaviour.

Moving Forward

An additional benefit is that we no longer need a Repository interface to switch easily between real and mock data - the switch is done simply by changing the DbContext configuration. It can seem like a wrench to throw away all the Repository code we've written, but the simplicity of the in-memory database gives us extremely reliable unit testing with only a few lines of code.
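
For example, the two setups might differ only in the options used to build the context (the connection string variable below is a placeholder; UseSqlServer comes from the Microsoft.EntityFrameworkCore.SqlServer package):

// Production: point the context at a real database.
var prodOptions = new DbContextOptionsBuilder<MyContext>()
    .UseSqlServer(connectionString)   // placeholder connection string
    .Options;

// Testing: point the same context at the in-memory database instead.
var testOptions = new DbContextOptionsBuilder<MyContext>()
    .UseInMemoryDatabase(databaseName: "MockDB")
    .Options;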


This piece was originally posted on Aug 13, 2020, and has been refreshed with updated styling.