A Test of Knowledge

Kevlin Henney
Apr 24, 2018
This article has been grown from an earlier, shorter version, which first appeared in the magazine accompanying the XP Day 2011 conference.

Software development is often described as knowledge work. This label is invariably used as a shorthand for “work that doesn’t involve getting your hands dirty”, “jobs your parents never had and you struggle to explain to them” or, without any apparent irony over the use of fingers on keyboards and screens, “the digital economy”. But step back from these simple substitutions and an obvious yet deeper truth becomes visible: knowledge work is about knowledge. It’s about knowing stuff and, most often, also about how you deal with stuff you don’t know.

Far and away the most popular response to ignorance is to ignore it. Sound good? Feel free to stop reading! You must have stumbled across this page by mistake.

Still here? In that case I’ll assume curiosity on your part. Which is a good thing: curiosity is key to how we address a lack of knowledge. Questions are the agents of curiosity. Even without answers, questions can help us to increase and refine our knowledge, to learn.

Software testing is a form of learning. A set of tests can be considered a set of questions. The most obvious question a unit test poses is “Does the code pass?” to which there are two simple answers: yes or no. A test allows us to move from belief to knowledge — for example, to move from merely believing something works to knowing that, in a particular case, it does or does not. Even limiting the scope of testing to just this question and these two answers reveals more than a binary set of possible outcomes:

  • It passes, which is what we expected.
  • It passes and we are surprised, as this is not what we expected.
  • It fails, which is what we expected.
  • It fails and we are surprised and disappointed, as this is not what we had hoped or expected.

The reasons for failure (or success) run even deeper: a test might fail because the test is at fault, not the code; a test might pass because both the test and the code are at fault and in sympathy; and so on.
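To make the "in sympathy" case concrete, here is a hypothetical leap-year function and a test that share the same misconception: both forget that a century year is a leap year only if it is divisible by 400. The test passes against the buggy code. Against the corrected code it would fail, and then it is the test, not the code, that is at fault.

```python
def is_leap_year(year):
    # Hypothetical, buggy: ignores the century rule
    return year % 4 == 0


def is_leap_year_fixed(year):
    # Gregorian rule: divisible by 4, except centuries not divisible by 400
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def test_1900_is_a_leap_year():
    # Shares the code's misconception, so code and test agree, wrongly
    assert is_leap_year(1900)
```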

Although we might like to generalise and say that a set of passing tests means the code works, we need to keep in mind Dijkstra’s observation that “testing shows the presence of bugs, not their absence.” A set of passing tests may increase our confidence in the general working of the code, but a single failing test undoubtedly shows something is wrong — the question is, what is that something?

For a good unit test, a failed test indicates that the code under test has a fault. In other cases, it is the test that is at fault: the test may contain a simple thinko or it may reflect a deeper misunderstanding, or it may be too brittle, too precisely reliant on a particular outcome where something less specific or less strict will satisfy the requirements. For tests that take in external dependencies, failure may arise from problems in the outside world — file permissions, network availability, etc. — rather than the code under test.
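A minimal sketch of a test that is too precise, using a hypothetical `total_distance` function that adds up floating-point values. Neither 0.1 nor 0.2 is exactly representable in binary floating point, so the exact-equality check fails. Assuming the requirement only asks for a result close to 0.3, the code is fine and the test is at fault. `math.isclose`, with its default relative tolerance of 1e-09, is one less strict alternative.

```python
import math


def total_distance(legs_km):
    # Hypothetical code under test: plain floating-point accumulation
    total = 0.0
    for leg in legs_km:
        total += leg
    return total


def brittle_check():
    # Too strict: demands bit-for-bit equality with a decimal literal
    return total_distance([0.1, 0.2]) == 0.3


def tolerant_check():
    # Less strict: accepts any result within a small relative tolerance
    return math.isclose(total_distance([0.1, 0.2]), 0.3)
```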

A test is an act of communication — both the writing of the test and the outcome of running the test — which suggests the name and structure of a test should be geared up to tell you what is meant by failure or success. If your tests are numbered — test1, test2, etc. — what is being communicated is the order in which the tests were written. Perhaps not the best use of naming bandwidth. If your tests simply name the artefact they are testing — testThisFunction, testThatClass, etc. — you have said nothing about the properties being tested, but you have at least communicated that your tests are uncohesive.

There are many variations and competing styles for what qualifies as a good name, all of which are based on the idea that the name says something about usage and outcome, capturing specification and intent. Perhaps the simplest is the propositional style, where the name of the test, when passing, indicates a contingent truth about the code, and, when failing, it indicates a property that is not true or a capability that is missing. For example, A_new_stack_is_empty or Stacks_are_initially_empty rather than testStackConstructor, testEmptyStack or testNewStack.

A good test should, therefore, tell you something when it fails, but it should also tell you something when it succeeds.
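A minimal sketch of the propositional style, assuming a hypothetical `Stack` class and a tiny hand-rolled runner. The runner exists only so that the function names can be the bare propositions. By default, unittest and pytest collect only tests whose names start with `test`, so with those frameworks you would write something like `test_a_new_stack_is_empty`.

```python
class Stack:
    # Hypothetical minimal stack, used only to illustrate test naming
    def __init__(self):
        self._items = []

    def is_empty(self):
        return not self._items

    def push(self, item):
        self._items.append(item)

    def top(self):
        return self._items[-1]


def A_new_stack_is_empty():
    assert Stack().is_empty()


def A_stack_is_not_empty_after_a_push():
    stack = Stack()
    stack.push(42)
    assert not stack.is_empty()


def The_top_of_a_stack_is_the_last_item_pushed():
    stack = Stack()
    stack.push(1)
    stack.push(2)
    assert stack.top() == 2


def report_on(tests):
    # Hypothetical runner: each name, read as a sentence, says what was checked
    report = []
    for test in tests:
        claim = test.__name__.replace("_", " ")
        try:
            test()
            report.append(f"{claim}: passed")
        except AssertionError:
            report.append(f"{claim}: FAILED")
    return report


if __name__ == "__main__":
    for line in report_on([A_new_stack_is_empty,
                           A_stack_is_not_empty_after_a_push,
                           The_top_of_a_stack_is_the_last_item_pushed]):
        print(line)
```

Whether each line ends in "passed" or "FAILED", it reads as a claim about the stack rather than as a test number or the name of a constructor.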

What if a test always succeeds? What if a test does not fail for over a year? Is it useless? Is it not providing you with any information? Should you delete it? Although that may be a possibility, it is only one of many possibilities and, depending on the general quality of tests, it is not particularly likely. Instead, let us look to understand what the test’s recurrent success might be telling us, what we can actually learn from it rather than leaping into the jaws of an unquantified assumption:

  • The code hasn’t changed in a year. You have stable code and it has value. Should you need to change it, you would like to preserve that value. Keep the tests.
  • The code is dead. Your code is stable because no one needs it, so it has no value. Remove the code and, therefore, the tests.
  • The code has been changed, and all the changes have passed the tests. Well done, you’re awesome! Keep the tests — they confirm your awesomeness, but they may also catch you on your less awesome days.
  • The test isn’t being run. Check your build scripts and test set-up.

And you can probably think of more.
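Here is one concrete way for a test never to run, shown with Python's unittest. The default `TestLoader` collects only methods whose names start with its `testMethodPrefix`, which is `"test"`, so a misnamed method is skipped silently and can never fail. The class and method names are hypothetical, and the second method's assertion is deliberately wrong.

```python
import unittest


class ListTests(unittest.TestCase):
    def test_a_new_list_is_empty(self):
        self.assertEqual(len([]), 0)

    def a_list_of_one_item_has_length_two(self):
        # Deliberately wrong, but missing the "test" prefix, so never collected
        self.assertEqual(len([1]), 2)


def collected_test_names():
    # The method names the default loader would actually run
    return unittest.TestLoader().getTestCaseNames(ListTests)


def run_suite():
    suite = unittest.TestLoader().loadTestsFromTestCase(ListTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

The suite reports success, and it will keep doing so for a year or more without the wrong assertion ever being evaluated.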

There is a great deal of knowledge that can be uncovered by asking questions of a simple red or green. Of course, “can” is not “will” (having the opportunity is not the same as taking it), but the first part of any feedback-based process is generating the feedback; what you do with it becomes the next challenge.

But more learning opportunities are available from testing: in the formulation and writing of the tests, as opposed to their execution.

What does it mean if a test is hard to write? In particular, a unit test. Difficulty in writing unit tests is a principal demotivator among programmers trying to get into unit testing, causing many to get out of it. “We tried unit testing, but it took too much effort” is a common lament. A more experienced practitioner will often respond “if your code is difficult to test, it means your code is messy”, a comeback that is not necessarily unreasonable, but is also not necessarily correct. At best it is a provocative oversimplification intended to make you reflect. At worst it may prevent you from learning about your assumptions and the nature of your work.

The difficulty in being able to write a test can be boiled down to the two broad themes of complexity and ignorance, each manifested in a couple of different ways:

The essential complexity of the problem being solved. Such complexity is intrinsic to the problem domain and is not the fault of the software architecture, the code quality or, indeed, any of those who have worked on the code, past or present. If you’re looking for someone to blame, try the problem domain or the customer. Perhaps the customer would prefer a “Hello, World” app instead of software to control the rail network? That would be one way to eliminate the essential complexity, but probably not what the customer needed. The reality is that some things just are harder to code and test no matter how you write the production code or the test code. When complexity was being handed out, not all applications and domains of interest were created equal.

The accidental complexity of the solution being implemented. Such complexity is an artefact of the way the software has been developed, external to the nature of the problem. It is non-essential and, in principle, it is avoidable. This is the realm of technical debt, baroque frameworks and rococo architectures. It’s where speculative generality has come home to roost, copy-and-paste code has blossomed and coupling has dropped anchor. This is where the observation “if your code is difficult to test, it means your code is messy” may often apply.
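Coupling of this kind is easy to illustrate. In this hypothetical sketch, the first function reaches for the system clock directly, so any test of it depends on when the test happens to run. Passing the moment in as a parameter makes the logic straightforward to test and leaves only a thin wrapper touching the real clock. The opening hours are invented for illustration.

```python
from datetime import datetime, time

OPENING, CLOSING = time(9, 0), time(17, 30)  # invented hours, for illustration


def is_open_now_coupled():
    # Hard to test: the answer depends on when the test happens to run
    return OPENING <= datetime.now().time() < CLOSING


def is_open_at(moment):
    # Easy to test: the moment in question is passed in
    return OPENING <= moment.time() < CLOSING


def is_open_now():
    # The thin wrapper is now the only code that touches the real clock
    return is_open_at(datetime.now())
```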

Uncertainty over what the code should actually do. One of the most common responses to the question of why one should test one’s code is “to show that it works”. This offers all the illusion of being a satisfactory answer without any of the substance. It leaves us with a lingering, killer question: What exactly do we mean by “it works”? What is it actually supposed to do? In short, what’s the spec? Tests may be difficult to write because we don’t actually know exactly what we want. We may understand the gist, but not the detail. We may apparently be able to write the code (having followed the gist, we may have elaborated something in code that meandered along it), but writing a test throws our ignorance into sharp relief. Or perhaps not: being unaware of our own ignorance is itself a common form of ignorance, a blind spot that is, by its nature, hard to see. This kind of difficulty in testing is an open invitation to talk to someone else, to clarify and find out more.
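A small illustration of a gist that hides a spec question, assuming a hypothetical requirement that says only “round to the nearest whole number”. Code can follow that gist happily. A test, however, has to decide what happens to 2.5 and -2.5, and Python's `decimal` module offers several defensible answers. For comparison, the built-in `round` rounds halves to even, so `round(2.5)` is `2`.

```python
from decimal import Decimal, ROUND_HALF_DOWN, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_whole(amount: str, rule: str) -> int:
    # Round a decimal string to a whole number under an explicit tie-breaking rule
    return int(Decimal(amount).quantize(Decimal("1"), rounding=rule))


# Three readings of "round to the nearest whole number" that agree except on ties
READINGS = {
    "half to even": ROUND_HALF_EVEN,
    "half away from zero": ROUND_HALF_UP,
    "half towards zero": ROUND_HALF_DOWN,
}

if __name__ == "__main__":
    for name, rule in READINGS.items():
        print(name, [round_whole(s, rule) for s in ("0.5", "1.5", "2.5", "-2.5")])
```

The code runs under any of these rules; only a test, or a conversation with whoever owns the requirement, settles which one is meant.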

Lack of testing know-how. Programmers may be unaware of or unfamiliar with the techniques necessary to make a particular test easier, or even possible. If programmers don’t know about test doubles (mocks, stubs, etc.), what are the odds they will reinvent and rediscover this approach? Without help, would a typical programmer know how to test outputs whose sequence is non-deterministic or collections whose ordering is unspecified? What about testing the output of an image processing algorithm, such as lossy compression or edge detection? The difficulty of testing may reflect testing skills we need to acquire or techniques we need to improvise.
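For the ordering and non-determinism questions, one common approach is to assert properties of the result rather than one particular sequence, and to make randomness injectable so that a seed can pin it down. The functions `distinct_words`, `full_deck` and `shuffled_deck` are hypothetical code under test, and passing a `random.Random` in is an assumed design choice, not something the text prescribes.

```python
import random
from collections import Counter


def full_deck():
    # 52 cards as two-character strings, e.g. "AS" or "TD"
    return [rank + suit for suit in "SHDC" for rank in "A23456789TJQK"]


def shuffled_deck(rng):
    # Hypothetical code under test; rng is injected so a test can seed it
    deck = full_deck()
    rng.shuffle(deck)
    return deck


def distinct_words(text):
    # Hypothetical code under test; built from a set, so no order is promised
    return list(set(text.split()))


def test_distinct_words_come_in_no_particular_order():
    # Compare sorted results so the test does not depend on ordering
    assert sorted(distinct_words("to be or not to be")) == ["be", "not", "or", "to"]


def test_a_shuffled_deck_has_every_card_exactly_once():
    # Check a property of the output, not one particular sequence
    assert Counter(shuffled_deck(random.Random())) == Counter(full_deck())


def test_a_seeded_shuffle_is_repeatable():
    # Same seed, same sequence: the non-determinism is brought under control
    assert shuffled_deck(random.Random(2018)) == shuffled_deck(random.Random(2018))
```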

Of course, these aspects are not entirely independent of one another: a challenging domain may often require novel techniques for testing; uncertainty over the functionality may reflect a complex or, by being ill-defined, seemingly complex domain; and so on. Nonetheless, by recognising these four aspects we can learn more about our assumptions, our knowledge, our ignorance and the nature of our work than we would by either writing off tests as too hard to be worthwhile or assuming that messy code is necessarily the root cause.

Testing? It’s an education.


