Spies, Integrations, and Contracts

At a recent guild meeting with some of my colleagues, we had an interesting discussion on the use of spies. After some wrangling over what a spy is, and the different types of test double in general, the discussion turned to where and how people were using them. One of the developers raised the issue of testing Express middleware, specifically using a spy to ensure that next() is called so that the middleware under test hands control to the next middleware in the chain.¹

At first glance, this seems like a reasonable approach: your middleware has logic, therefore it needs a unit test. The mistake is treating the Express wrapper as the unit. Good test practice is to test behaviour, not implementation. This means we should test that the middleware behaves as we expect it to, not that it’s implemented how we think it is. I’ve written about my dislike of spies like this before. Testing the implementation has many drawbacks, not least of which is that it prevents safe refactoring: if you change the implementation but not the behaviour, your tests will break. Here’s how the developer suggested we test this:

it('should call next to run the next middleware', function () {
  let sut = require('middleware/foo');
  let next = sinon.spy();

  sut({}, {}, next);

  next.called.should.equal(true);
});

Here’s the problem: the test proves only that our function calls the callback we passed as the third argument. Express middleware is a framework contract, and this test bypasses Express entirely. It also keeps the business rule tangled up with the wrapper, which makes the test look unit-shaped without giving us useful unit-level feedback.

Changing tests when you refactor implementation, without changing behaviour, is extremely dangerous. If you change code and tests at the same time you lose the confidence that the behaviour of your application hasn’t changed.
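To make that concrete, here is a sketch of a behaviour-preserving refactor that would break the spy test above. It assumes a hypothetical change in which the login rule becomes asynchronous (say, because it is looked up from a store). createSpy is a hand-rolled stand-in for sinon.spy(), used only so the snippet runs with nothing but Node:

```javascript
const { strictEqual } = require('node:assert');

// Hypothetical refactor: the rule now resolves asynchronously.
function requiresLogin(url) {
  return Promise.resolve(url === '/protected');
}

function foo(req, res, next) {
  requiresLogin(req.url).then(function (needsLogin) {
    if (needsLogin) {
      return res.redirect('/login');
    }
    next();
  });
}

// Hand-rolled stand-in for sinon.spy(): records whether it was called.
function createSpy() {
  function spy() {
    spy.called = true;
  }
  spy.called = false;
  return spy;
}

const next = createSpy();
foo({ url: '/hello' }, {}, next);

// The original test asserts at this point, synchronously, and would now fail.
const calledSynchronously = next.called;
strictEqual(calledSynchronously, false);

// next() still runs once the promise has resolved.
setImmediate(function () {
  strictEqual(next.called, true);
});
```

Through Express, the request still ends up at the next middleware, so nothing the customer sees has changed. The spy assertion, however, has to be rewritten alongside the code.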

Integration

When middleware contains business logic, the first step is to extract that logic and unit test it directly. Leave a small wrapper around Express, then integration-test the wrapper in an Express application, where next(), redirects, and middleware order have their real meaning.

Here’s an example with the rule extracted from the middleware:

function requiresLogin(url) {
  return url === '/protected';
}

function foo(req, res, next) {
  if (requiresLogin(req.url)) {
    return res.redirect('/login');
  }

  next();
}
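Because requiresLogin is a plain function, it can be checked without Express, a request object, or a test double. A minimal sketch using Node’s built-in assert module rather than any particular test runner; the URLs other than /protected are illustrative inputs, not routes from the real app:

```javascript
const { strictEqual } = require('node:assert');

// Same rule as above, repeated so the sketch runs on its own.
function requiresLogin(url) {
  return url === '/protected';
}

// The protected page requires a login...
strictEqual(requiresLogin('/protected'), true);

// ...and anything else is allowed through (illustrative URLs).
strictEqual(requiresLogin('/hello'), false);
strictEqual(requiresLogin('/'), false);

// The match is exact, so a trailing slash is not treated as protected.
// Whether that is the desired rule is the kind of question a unit test surfaces.
strictEqual(requiresLogin('/protected/'), false);
```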

The unit tests belong around requiresLogin. The integration test covers the Express behaviour:

describe('middleware/foo', function () {
  const express = require('express');
  const request = require('supertest');
  const middleware = require('middleware/foo');

  it('should redirect when the page is /protected', function (done) {
    let app = express();
    app.use(middleware);

    request(app)
      .get('/protected')
      .expect(302)
      .expect('Location', '/login')
      .end(done);
  });
});

This covers the redirect path through Express, but it doesn’t check that a request which is allowed through reaches the next middleware in the chain, so now we need another test for that.

it('should run the next middleware in the chain', function (done) {
  let app = express();

  app.use(middleware);
  app.use(function (req, res) {
    res.status(200).send('hello');
  });

  request(app)
    .get('/hello')
    .expect(200)
    .expect('hello', done);
});

This works, but it is quite verbose, and I wouldn’t like to do it a lot. Although it is slightly better than the original (it doesn’t spy on next directly), it’s still not as good as it could be. Instead, let’s test that the next middleware in our real app runs, because that is the behaviour we actually want.

Integrated

An integrated test is a type of integration test that exercises the entire app stack, rather than the combination of two small parts of it. Technically, the tests above are also integrated tests. They just test a dummy application instead of your real one. In the PHP world, you’ve probably written integrated tests if you’ve used your framework’s own TestCase class instead of PHPUnit\Framework\TestCase directly. It’s basically a cut-down version of your full stack that runs in memory, instead of spinning up real web servers, browsers, and databases like your end-to-end tests would.

Here’s the next middleware in the chain:

function bar(req, res, next) {
  res.setHeader('X-Bar', 'baz');
  next();
}

Now using it in our actual application:

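const express = require('express');

const app = express();

app.use(require('middleware/foo'));
app.use(require('middleware/bar'));

app.get('/hello', function (req, res) {
  res.status(200).end();
});

// Export the app so the tests can load it with require('app').
module.exports = app;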

And to test:

describe('app', function () {
  const request = require('supertest');
  const app = require('app');

  it('should redirect to /login when the page is /protected', function (done) {
    request(app)
      .get('/protected')
      .expect(302)
      .expect('Location', '/login')
      .end(done);
  });

  it('should set an X-Bar header with value baz', function (done) {
    request(app)
      .get('/hello')
      .expect(200)
      .expect('X-Bar', 'baz')
      .end(done);
  });
});

We’re now testing that our application behaves as we expect it to, without making any assertions on implementation. Tests like this prove the customer-facing behaviour for the paths we care about, including the application wiring, and in many applications that is enough.

The limit appears when we expect app-level integration tests to prove every small collaboration inside the codebase. This is the problem behind the Integrated Test Fallacy.

Integrated Test Fallacy

There are two major problems with integrated tests when we use them as programmer tests. The first is that they’re slow, and the second is that you can never write enough of them to cover your code adequately.

As a result, you end up writing just enough to cover your primary paths, and some errors. Then whenever a customer picks up a bug, you write more tests to cover that bug, but you can never cover everything.

This is fine for integrated tests that are used as Customer Tests: functional, system, or end-to-end tests that prove the application behaves how the customer expects. It becomes a problem when we use the same kind of test as a Programmer Test, where the goal is to prove the correctness of that functionality and the collaborations behind it.

With the example above, the app-level tests prove two customer paths: /protected redirects, and /hello includes the X-Bar header. They don’t give us a reusable guarantee that each middleware obeys its contract in every composition. If foo and bar only live in this app and those paths are all we care about, that may be fine.

If we want stronger confidence, trying to get it by adding more app-level tests gets expensive quickly. Every new route, middleware order, and error path creates another combination to exercise through the whole app.

You will never write enough integrated tests for that to work as a general programmer-test strategy.
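A back-of-the-envelope sketch shows why. Assume, purely for illustration, a chain of components where each one has a handful of distinct outcomes (pass through, redirect, error, and so on). Covering every end-to-end combination grows multiplicatively, while checking each component in isolation grows additively. The function names and numbers below are hypothetical:

```javascript
// Number of end-to-end paths if every outcome of every component can combine.
function integratedPaths(outcomesPerComponent) {
  return outcomesPerComponent.reduce(function (total, n) {
    return total * n;
  }, 1);
}

// Rough count for isolated tests: each outcome is checked once from the
// client side (with a double) and once against the real implementation.
function isolatedChecks(outcomesPerComponent) {
  return outcomesPerComponent.reduce(function (total, n) {
    return total + 2 * n;
  }, 0);
}

const chain = [3, 3, 3, 3, 3]; // five components, three outcomes each

console.log(integratedPaths(chain)); // 243
console.log(isolatedChecks(chain)); // 30
```

This is an upper bound rather than a measurement. A redirect or an error short-circuits the chain, so real counts are lower, but the multiplicative growth is the point.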

Collaborators & Contracts

At this point, we could decide that the app-level tests are enough. If we need stronger confidence around the contract between components, we can use a collaborator test with a spy, provided we add a contract test for the thing we doubled.

A collaborator test is an integration test that uses a test double to simulate a collaborator’s behaviour. This is what my colleague was originally calling a unit test (it’s not). In this case, the collaborator is Express’s next callback:

it('should call next to run the next middleware', function () {
  let sut = require('middleware/foo');
  let next = sinon.spy();

  sut({}, {}, next);

  next.called.should.equal(true);
});
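The same style of test can cover the redirect branch by doubling res as well. The sketch below repeats the foo and requiresLogin functions from earlier so that it runs on its own. It uses small hand-rolled doubles instead of sinon, and createSpy is a hypothetical helper, not part of any library:

```javascript
const { deepStrictEqual, strictEqual } = require('node:assert');

function requiresLogin(url) {
  return url === '/protected';
}

function foo(req, res, next) {
  if (requiresLogin(req.url)) {
    return res.redirect('/login');
  }
  next();
}

// Hypothetical helper: a spy that records the arguments of each call.
function createSpy() {
  function spy(...args) {
    spy.calls.push(args);
  }
  spy.calls = [];
  return spy;
}

// Protected page: redirects and does not continue down the chain.
const res = { redirect: createSpy() };
const next = createSpy();
foo({ url: '/protected' }, res, next);

deepStrictEqual(res.redirect.calls, [['/login']]);
strictEqual(next.calls.length, 0);

// Any other page: continues without redirecting.
const otherRes = { redirect: createSpy() };
const otherNext = createSpy();
foo({ url: '/hello' }, otherRes, otherNext);

strictEqual(otherRes.redirect.calls.length, 0);
strictEqual(otherNext.calls.length, 1);
```

Both res.redirect and next are doubles here, so each one carries an assumption about how Express really behaves.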

Here’s the secret, though: For every double that you put in a collaborator test, you have to write a contract test that ensures the behaviour of your double is checked against the real implementation.

The benefit of this is that if you don’t know how to write a unit test for the collaborator you’ve doubled, you don’t understand its interface well enough to be writing a double for it. You need to go away and learn it better.

By differentiating your tests this way, and sticking to the rule of creating a contract test for every double in your collaborator tests, you avoid many of the drawbacks of using doubles.

If you refactor your original code (for example, no longer requiring the call to next in middleware), you would expect your contract tests to fail, but not any other type of test.

Once you update your contract tests, you are forced to update your doubles to match. Your collaborator tests should still pass after that, and you can be confident that all of your doubles are up to date with the real-world implementation and that the behaviour hasn’t changed.

The contract behind the spy is that Express runs the next middleware when a middleware function calls next(). The contract test validates that assumption against Express itself:

describe('Express middleware contract', function () {
  const request = require('supertest');
  const express = require('express');

  it('should run the next middleware when next() is called', function (done) {
    const app = express();

    app.use(function (req, res, next) {
      res.setHeader('X-First', 'yes');
      next();
    });

    app.use(function (req, res, next) {
      res.setHeader('X-Second', 'yes');
      next();
    });

    app.use(function (req, res) {
      res.status(200).end();
    });

    request(app)
      .get('/')
      .expect(200)
      .expect('X-First', 'yes')
      .expect('X-Second', 'yes')
      .end(done);
  });
});

While the example of next() in Express is a trivial one, the same approach applies to every component of our application that integrates with another. If this test starts to fail after an Express upgrade, the failure points at the framework assumption behind the double rather than at foo’s business rule.

Conclusion

I think this quote from J. B. Rainsberger sums it up better than I can:²

Strong integration tests, consisting of collaboration tests (clients using test doubles in place of collaborating services) and contract tests (showing that service implementations correctly behave the way clients expect) can provide the same level of confidence as integrated tests at a lower total cost of maintenance.

¹ Express, Using middleware.
² J. B. Rainsberger, Clearing Up the Integrated Tests Scam.