Todd Schiller

Machine ✘ Human Intelligence

Improving Unit Testing with Data Provenance

Dec 29, 2013 by Todd Schiller

In an ideal world, each unit test would run in isolation. In practice, environment setup (e.g., database schema setup) is expensive, so it's common to share resources between tests.

In the worst case, interference between tests can actually mask bugs that the test suite would otherwise catch [1]. In my experience, a more common outcome has been that sharing resources hinders debugging by making it difficult to determine where data came from (its provenance).

An effective way to address this issue is to "tag" data, at the time of its creation, with information about the test context. For example, in one of my projects, we use the test name to identify data provenance:

:::C#
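using System.Diagnostics;
using System.Linq;
using Microsoft.VisualStudio.TestTools.UnitTesting;

// The class name is illustrative; any static helper class works.
public static class TestProvenance
{
    // Returns the name of the [TestMethod] currently on the call stack.
    public static string GetTestContext()
    {
        var testAttr = typeof(TestMethodAttribute);

        // Walk the call stack and keep the frames whose method is a test method.
        var testMethods = new StackTrace().GetFrames()
            .Where(frame => frame.GetMethod().GetCustomAttributes(testAttr, true).Any())
            .ToList();

        // Assert rather than throw if the test context is missing or ambiguous.
        Debug.Assert(testMethods.Count == 1, "Unique test context required");
        return testMethods[0].GetMethod().Name;
    }
}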

Note that Debug.Assert (or Trace.Assert for release builds) is used instead of throwing an exception. Unlike an exception, an assertion failure cannot be caught by the program under test or the test suite, so a violated assumption cannot be silently swallowed by a catch block or change the behavior being tested. As a rule, instrumentation code should never throw an exception; if one of its assumptions is violated, it should crash instead.

In our case, storing the test name with data records did not require any modifications to the data model: the records of interest already had a string property (used to store user-entered notes) that does not affect program behavior. However, attaching string metadata is not always an option. In those cases, you may be able to find a non-functional part of the data record (e.g., a unique identifier field) in which to store partial provenance information, maintaining a mapping to the test context as necessary.
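The identifier-based fallback could look something like the following minimal sketch. It assumes the record's identifier is a `Guid` chosen at creation time; `ProvenanceRegistry`, `NewTaggedId`, and `Lookup` are hypothetical names, not part of the project described here.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper: hands out record IDs and remembers which test requested each one.
public static class ProvenanceRegistry
{
    private static readonly ConcurrentDictionary<Guid, string> TestById =
        new ConcurrentDictionary<Guid, string>();

    // Create a fresh ID and record the test context that asked for it.
    public static Guid NewTaggedId(string testContext)
    {
        var id = Guid.NewGuid();
        TestById[id] = testContext;
        return id;
    }

    // Look up the test that created a record; returns null for untagged IDs.
    public static string Lookup(Guid id)
    {
        string testContext;
        return TestById.TryGetValue(id, out testContext) ? testContext : null;
    }
}
```

In this sketch the mapping lives only in memory, so it would need to be logged or persisted if records are inspected after the test process exits.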

The benefit of storing the information in the data record itself, as opposed to a more complex form of tracking (e.g., using a CLR program instrumentation tool), is that the provenance data is not lost when the record is persisted in an outside resource such as a database.

This leaves the question of how to invoke the provenance tagging methods. If the test suite itself sets the data, it's easiest to call the method directly. If not, the standard approach applies: abstract the data creation method behind an interface, then substitute a fake (i.e., a mock) when testing. An interesting alternative (which I haven't yet tried) is to use Microsoft's Moles tool to "detour" the property setters to insert the test context. Since Moles works via the CLR's just-in-time compiler, this could be done without modifying the program under test.
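As a rough illustration of the interface-and-fake approach, here is a minimal sketch. `Record`, `IRecordFactory`, `RecordFactory`, and `TaggingRecordFactory` are hypothetical names invented for this example, and the `Notes` property stands in for the user-entered notes field mentioned earlier.

```csharp
using System;

// Hypothetical record type with a free-form notes field.
public class Record
{
    public string Notes { get; set; }
}

// Hypothetical abstraction over record creation in the program under test.
public interface IRecordFactory
{
    Record Create();
}

// Production implementation: creates records without provenance.
public class RecordFactory : IRecordFactory
{
    public Record Create() { return new Record(); }
}

// Test fake: wraps another factory and tags each record with the test context.
public class TaggingRecordFactory : IRecordFactory
{
    private readonly IRecordFactory inner;
    private readonly Func<string> getTestContext;

    public TaggingRecordFactory(IRecordFactory inner, Func<string> getTestContext)
    {
        this.inner = inner;
        this.getTestContext = getTestContext;
    }

    public Record Create()
    {
        var record = inner.Create();
        record.Notes = getTestContext();
        return record;
    }
}
```

A test would hand the program under test a `TaggingRecordFactory` whose delegate is the `GetTestContext` method shown earlier. Because that method walks the call stack, this only works if `Create` is called on the same thread as the test method.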

References

[1] Test Dependence: Theory and Manifestation. Jochen Wuttke et al. University of Washington Technical Report UW-CSE-13-07-02. 2013.