Last evening I had an interesting exchange with a @GeePawHill on twitter that went something like this:

@GeePawHill: if a function involves a chain of objects, A -> B -> C -> D, and we satisfy ourselves, A works if B works. B works if C works, and so on... ..we have come very close to satisfying ourselves that ABCD works.

My reply (summarized) was the following

@sklivvz: What this is really testing is that "if the A->D function is implemented as ABCD, then..." but this way of testing hinders refactoring.
@sklivvz: testing a specific "chain" of units is testing a specific implementation of a function. Whereas one should be testing the contract of the function, and only that.

We agreed to disagree and I certainly don't want to escalate the discussion here, but this point has come up enough times in my career that it's worthwhile writing a blog post explaining why in my professional opinion fine grained tests are harmful to the health of your code base.

What unit tests are meant to do

My code is testng
source

The original idea of unit tests, as far as I can tell, comes from the necessity of testing libraries. If your product depends on many libraries, we want to ensure that the way libraries work does not change as new versions evolve. This is still a huge use case of unit tests: any good library I know makes extensive use of unit tests to ensure there are no "regressions" when releasing new versions. In fact some libraries go as far as testing whether historical mistakes which are depended upon are still present!

I'd like you to focus on the assertion that the original purpose of unit tests is to ensure there are no changes to the behavior of a library. Another way of saying this is that unit tests enforce contracts. Like interfaces, they make sure that an API has the correct "form", but they go beyond syntax and check some of the semantics of the library. For example, they check that a sin(x) function exists at compile time (or run time in some languages) and that it actually returns a sine and not a cosine at run time.

I'd also like you to focus on one thing library unit tests don't do. They don't check how sin(x) is implemented. This is not a mistake. This is by design. The library author reserves the ability to change their implementation at any time, while they guarantee the tests will keep on passing. In other words, unit tests are meant to enable refactoring, not to hinder it.

What unit testing won't get you

I agree with the twitter user that testing each library we depend on is important, but this check gives us no information, by itself, on whether our code is buggy or not. I'm pretty sure the .Net class libraries are well tested, however that doesn't mean that my .Net code is bug free!

Of course, the devil is in the detail, it is how these tested blocks are used and chained together that makes our code buggy. The "chain" example is faulty because it somehow means to remind us of "a chain is as strong as its weakest link", but this is actually completely wrong when referred to code. A code chain is as strong as the way it's linked together, as well as how strong its links are.

This means that we need to test more than libraries to ensure correctness, which is great, but unit tests might not be the right tool for this. If a project has an API, then certainly unit tests can test it, but they need to stay at that level and not depend on the internals! If they do, then they are not only specifying a contract, but they are also freezing an implementation, thus preventing refactoring.

Did you ever have to delete or rewrite bad unit tests that broke because you refactored something and writing a different implementation broke them "at a distance"? Yeah. Welcome to unit test hell.

Handle your units like pizza
Handle your units like pizza

The only thing that we are allowed to unit test is the "crust" of our application unit, not the "soft middle" We want the soft middle to be able to freely change, while keeping the application functional. Does this mean that we should give up unit tests and only perform integration tests? Not at all. We should perform the appropriate tests depending on our circumstances. If we have a ton of separate libraries, unit test away! If there are unchangeable surfaces within our code, unit test as much as you like, but please remember that unit tests freeze contracts and as such they have clear consequences. Indiscriminate use on small, fine grained units is a disaster and makes a code base unmaintanable faster than having no tests at all (reference: I worked with both for many years)

It's about time we stopped overstating how useful unit tests are. There are very important features that we cannot unit test, like performance or quality and many of these are fundamental to a good application implementation. Unit tests don't check how well sin(x) is implemented. A mediocre implementation of an algorithm passes unit tests just as well as a good one.

Terrible trouble taking tests to the extreme

Frozen implementations are not the only kind of damage induced by abuse of unit tests, because bad implementations add up over time. I understand that this is a bit of a straw man, that no one is saying that unit tests are enough etc... But TDD proponents do say that implementing tests is a good way to design software. I'd like to see evidence that using a methodology that is completely blind to a whole class of bugs produces good code. In my experience, not only does it not do that, but it generates bad code, quite consistently.

Here's how it happens. TDD is meant to work in three steps repeated cyclically:

  1. Write a unit test
  2. Change your application so it makes the test pass using the minimum amount of work
  3. Refactor

The problem with this is the "...then a miracle occurs" factor in step 3.

...then a miracle occurs cartoon

You see, every doubt one might have about quality of code, is solved by mentioning "refactor". The solution found in step 2 is a bad hack? Refactor! The algorithm is O(N23)? Refactor! Real life is not like that, friends.

People refactor within the limits of passing unit tests, but they might make something else perform really badly. Say your refactoring is changing your JSON library from a fast but incomplete one to a complete but bloated one and—you can see where I'm going!—you end up with apparently nicer, "refactored" code and bad performance, and all your tests keep on passing, giving you the impression that it's a valid move! This kind of errors is not merely hypothetical. They are common errors made from all kinds of programmers, including experienced ones, and relying mostly on unit tests being green means that these errors accumulate until the application has so much debt that it's nearly impossible to refactor.

Conclusion

Do I suggest that we stop using unit tests, or that unit tests are harmful in general? Nope. But we should totally stop overstating how useful they are, and how appropriate they are in every conceivable circumstance. Unit tests are merely tools that can be appropriately used to improve our code and that can equally be misused to make our code terrible. Knowing the appropriate use of this tool is a matter of experience, judgement and professional maturity.

Unit tests are not a nearly universal solution to improve code quality as they are claimed to be. In the end, there's not (yet) a good substitute for a skilled engineer using their head appropriately.

discuss on Hacker News


A software engineer & Stack Overflow alumnus in London. I write about software development, coding, architecture and team leadership. I also speak at conferences worldwide.

About me

Follow me on Twitter

Gleanings

How Aristotle Created the Computer
Chris Dixon • Mar 20, 2017

What began, in Boole’s words, with an investigation “concerning the nature and constitution of the human mind,” could result in the creation of new minds—artificial minds—that might someday match or even exceed our own.

Read more…