alltom.com

Thoughts On: Automated Testing

Status: DRAFT

At work, we argue a lot about how “integrated” our tests should be. Outside my corporate bubble, many developers have decided that computers are big enough that there's no need to use fake objects any more. That is not true at scale, which makes the argument interesting: it's not a question of whether to use fakes, but of how much.

Where do you draw the lines?

The easiest test to write is a table of input/output pairs. But the length of that table grows exponentially relative to the number of dependent branches. One way to ameliorate that is to split the critical path into small chunks that contain only a few branches. Then it's easy to exhaustively test those branches, but with decreased confidence that the expected behavior of the system emerges from connecting all those segments.
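To make the growth concrete, here is a minimal sketch of a table-style test in Java. `Shipping.cost` and its two flags are hypothetical, invented only for illustration. With two dependent branches the exhaustive table already needs four rows, and each further flag would double it.

```java
import java.util.List;

// Hypothetical unit with two dependent branches (not from any real codebase).
final class Shipping {
    static int cost(boolean express, boolean international) {
        int cost = 5;
        if (express) {
            cost += 10;
        }
        if (international) {
            cost *= 2; // doubles whatever the first branch produced
        }
        return cost;
    }
}

final class ShippingTable {
    // One row per combination of branch outcomes: 2 branches -> 2^2 = 4 rows.
    record Row(boolean express, boolean international, int expected) {}

    static final List<Row> ROWS = List.of(
        new Row(false, false, 5),
        new Row(true, false, 15),
        new Row(false, true, 10),
        new Row(true, true, 30));

    // Returns the number of rows whose actual output differs from the expected one.
    static int countMismatches() {
        int mismatches = 0;
        for (Row row : ROWS) {
            if (Shipping.cost(row.express(), row.international()) != row.expected()) {
                mismatches++;
            }
        }
        return mismatches;
    }
}
```

Because the second branch acts on the result of the first, no row can be dropped without leaving a combination unchecked.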

We also pick out sequences of branches that recur and test that they work correctly under all the conditions they're likely to meet in the actual, long stream of branches.

Why do we lose trust when we test units?

If code never changed, I think we'd trust unit tests the same as integration tests. A good unit test verifies that the unit interacts with the fakes in accordance with the contract. If the test for the fake object verifies that it also behaves according to the contract, then the system is well-tested. When does that fail?

It fails if there is spotty coverage of the contract. I ignore this because test coverage will be spotty no matter what you do, even if all your tests are integration tests.

It also fails if the two sides verify different contracts. This can happen by accident, or when the contract changes but the change doesn't propagate to all affected tests.

Simple example: an API changes to throw an exception where it previously returned a default value. Unless you verify that every caller behaves in accordance with the new contract (which the tests will not do automatically), you won't know anything is broken until an exception is thrown in production. An integration test might have caught it.
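A minimal sketch of that drift in Java, using hand-written classes rather than a mocking framework. Every name here (`PriceLookup`, `Cart`, and so on) is hypothetical; the point is only that the fake and the real implementation have ended up honoring different versions of the same contract.

```java
// All names here are hypothetical, chosen only to illustrate the drift.
interface PriceLookup {
    // Old contract: returns 0 for an unknown SKU.
    // New contract: throws IllegalArgumentException for an unknown SKU.
    int priceOf(String sku);
}

// The real implementation, already updated to the new contract.
final class RealPriceLookup implements PriceLookup {
    @Override
    public int priceOf(String sku) {
        if (!sku.equals("apple")) {
            throw new IllegalArgumentException("unknown SKU: " + sku);
        }
        return 3;
    }
}

// The fake used by the caller's unit test, still following the old contract.
final class FakePriceLookup implements PriceLookup {
    @Override
    public int priceOf(String sku) {
        return sku.equals("apple") ? 3 : 0;
    }
}

// The caller, written against the old contract: it never expects an exception.
final class Cart {
    private final PriceLookup prices;

    Cart(PriceLookup prices) {
        this.prices = prices;
    }

    int total(String... skus) {
        int sum = 0;
        for (String sku : skus) {
            sum += prices.priceOf(sku);
        }
        return sum;
    }
}
```

With the fake, `new Cart(new FakePriceLookup()).total("apple", "pear")` returns 3, so the unit test keeps passing. The same call with `RealPriceLookup` throws `IllegalArgumentException`, which only a test wired to the real dependency would reveal.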

When is writing tests too expensive?

A friend I admire thinks that not every behavior needs to be tested. We agree that you should only write tests that pay for themselves over time, but disagree on the ROI. I have found that using dependency injection and a mocking framework makes writing Java and JavaScript tests so easy that it's always worth the cost. My friend writes primarily C and C++, which may explain the difference.

Nearly all of my tests look like this:

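@Test public void theThing_shouldDoSomething_whenPoked() {
  // Arrange: stub the dependency to return a canned value for this input.
  Input input = new Input("foo");
  when(aDependency.performCalculation(input))
      .thenReturn(precomputedReturnValue);

  // Act: exercise the unit under test.
  Result result = theThing.poke(input);

  // Assert: compare the output with the expected one.
  assertThat(result).isEqualTo(expectedResult);
}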

It's very nearly just a table of expected input/output pairs. With some next-level meta-programming skills, maybe I could turn it into a literal table. Since most of my tests look like this, I test every code path. As my TL says, the only reason not to test a behavior is if you don't care if it breaks.

But it falls apart if my units are too big. I try to keep them so small that no code path has more than ~2 branches, because the length of an exhaustive table of input/output pairs grows exponentially with the number of chained branches.

A code path through a full program probably consists of millions (billions?) of branches. But if you slice thinly enough, it's not that many tests.
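A back-of-the-envelope sketch of that arithmetic in Java. It assumes, purely for illustration, that every branch is a two-way choice and that the branches inside one unit all depend on each other; `TableSize` and its methods are hypothetical helpers, not part of any library.

```java
// Assumption for illustration: every branch is a two-way choice, and
// branches within one unit depend on each other.
final class TableSize {
    // Rows needed to cover one unit exhaustively: 2^branches.
    static long rowsForUnit(int branches) {
        if (branches < 0 || branches > 62) {
            throw new IllegalArgumentException("branches must be in [0, 62]");
        }
        return 1L << branches;
    }

    // Rows needed when a path of totalBranches is cut into units of
    // branchesPerUnit each (totalBranches must divide evenly).
    static long rowsWhenSliced(int totalBranches, int branchesPerUnit) {
        if (branchesPerUnit <= 0 || totalBranches % branchesPerUnit != 0) {
            throw new IllegalArgumentException("path must split into equal units");
        }
        long units = totalBranches / branchesPerUnit;
        return units * rowsForUnit(branchesPerUnit);
    }
}
```

Under those assumptions, a path of 20 chained branches needs `rowsForUnit(20)` = 1,048,576 rows as a single unit, but `rowsWhenSliced(20, 2)` = 40 rows as ten two-branch units. The count leaves out any tests of how the units connect, which is where the lost confidence discussed earlier comes from.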

Mocks are Ad-hoc Fakes

Mock objects are just fake objects that have a general-purpose API for configuring them.
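One way to picture the distinction, as a sketch in plain Java. The `Calculator` interface is a hypothetical stand-in for the dependency in the test above, and `ConfigurableCalculator` is a hand-rolled imitation of what a mocking framework provides, not the API of any real one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical dependency, loosely modeled on the one in the test above.
interface Calculator {
    int performCalculation(String input);
}

// A fake: a purpose-built stand-in with its own simple, fixed behavior.
final class FakeCalculator implements Calculator {
    @Override
    public int performCalculation(String input) {
        return input.length();
    }
}

// A hand-rolled "mock": the same kind of stand-in, except that each test
// configures its behavior through a general-purpose API.
final class ConfigurableCalculator implements Calculator {
    private final Map<String, Integer> answers = new HashMap<>();

    ConfigurableCalculator whenGiven(String input, int output) {
        answers.put(input, output);
        return this;
    }

    @Override
    public int performCalculation(String input) {
        Integer answer = answers.get(input);
        if (answer == null) {
            // A choice made for this sketch; real frameworks may behave differently.
            throw new IllegalStateException("no answer configured for: " + input);
        }
        return answer;
    }
}
```

A test would write `new ConfigurableCalculator().whenGiven("foo", 42)` to get a stand-in tailored to that one case, whereas `FakeCalculator` behaves the same way in every test that uses it.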

When do you use a fake, and when do you use a mock? I think it's similar to deciding when to extract duplicated code into a shared module.

Change detection

Store business logic in data structures and then only test that the code behaves correctly given the data.

At work, we usually don't go all the way: we use data structures that way, but still write tests as if the class were a black box. For example, if a class has a private table of inputs and outputs, we'll write a test that iterates over the table and checks that each input produces the matching output. It gives us the freedom to refactor the class to use a different data structure without needing to rewrite the tests. Whether that's correct depends on the size of your unit.
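A sketch of the black-box approach in Java. `HttpStatusText` and its table are hypothetical, and the sketch assumes the test keeps its own copy of the expected pairs, since the class's table is private.

```java
import java.util.Map;

// Hypothetical class whose business logic lives in a private table.
final class HttpStatusText {
    private static final Map<Integer, String> TEXT = Map.of(
        200, "OK",
        404, "Not Found",
        500, "Internal Server Error");

    static String describe(int code) {
        return TEXT.getOrDefault(code, "Unknown");
    }
}

final class HttpStatusTextCheck {
    // The test's own expected pairs; it never touches the private table.
    private static final Map<Integer, String> EXPECTED = Map.of(
        200, "OK",
        404, "Not Found",
        500, "Internal Server Error",
        418, "Unknown");

    // Returns true if every input maps to its expected output via the public method.
    static boolean allMatch() {
        for (Map.Entry<Integer, String> pair : EXPECTED.entrySet()) {
            if (!HttpStatusText.describe(pair.getKey()).equals(pair.getValue())) {
                return false;
            }
        }
        return true;
    }
}
```

Swapping the private `Map` for a `switch` or an array would leave the check untouched, which is the refactoring freedom described above. The price is that the pairs are written down twice.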