I have now been on both sides of the code coverage debate: I have advocated that test suites should achieve higher code coverage, and I have also advocated against using code coverage to block PR merges.
And fair warning: what I say next is based on my lived experience as a developer, and may not apply to all software engineering contexts.
Some things I have learned to be true over my time researching and practicing software engineering:
- You can execute the same line of code a million times without ever catching a bug in that line (see the sketch after this list).
- Higher code coverage does not mean better tests.
- Code coverage can be gamed.
- Some code cannot be executed naturally in a test environment. A lot of UI and configuration code falls neatly into this category.
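
The first two points are easy to demonstrate. Here is a minimal sketch in Python (the function and its bug are invented for illustration): the test executes every line of `apply_discount`, so a line-coverage tool reports 100% for it, yet the bug survives.

```python
def apply_discount(price: float, quantity: int) -> float:
    """Bulk orders of 10 or more are supposed to get 25% off."""
    if quantity > 10:  # BUG: the spec says ">= 10"
        return price * quantity * 0.75
    return price * quantity


def test_apply_discount():
    # Both branches execute, so line coverage of apply_discount is 100%,
    # but neither input probes quantity == 10, the boundary where the
    # bug lives. Full coverage, bug not caught.
    assert apply_discount(5.0, 2) == 10.0    # regular-price branch
    assert apply_discount(5.0, 20) == 75.0   # discount branch
```

The same shape of test also shows how the metric can be gamed: tests with weak (or no) assertions drive the coverage number up without verifying anything.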
Given that, I have landed on a very basic philosophy around writing tests, and how code coverage factors in:
I do not optimize my test code for higher code coverage. Instead, I use coverage data to prioritize what I need to test and what I can leave untested.
The coverage metric itself is not meaningful to me. I would rather have only 30% code coverage while testing the most critical sections of the codebase than cover 70% of code that is peripheral to the program’s implementation.
Beyond the coverage metric itself, I am more interested in the set of lines that got executed by the tests. And I like to examine this data per method or function. That tells me whether the lines I think should be executed are actually getting executed. I also use that insight to think about the program inputs I need to conjure to exercise the lines that are being skipped.
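
To make that concrete, here is a minimal sketch using coverage.py’s Python API (assumed installed via `pip install coverage`); `shipping.py`, `estimate_shipping`, and its parameters are hypothetical stand-ins for a real module under test:

```python
import coverage

cov = coverage.Coverage()
cov.start()

# Import inside the measured region so the module's lines are recorded.
from shipping import estimate_shipping
estimate_shipping(weight_kg=2.0, express=False)  # the input under test

cov.stop()
cov.save()

# analysis2() reports, per file: the executable lines, the excluded
# lines, and the lines that were never executed.
_, statements, _, missing, _ = cov.analysis2("shipping.py")
print(f"executable lines: {statements}")
print(f"never executed:   {missing}")
```

Mapping `missing` onto the body of `estimate_shipping` shows exactly which branches my input skipped, and suggests the next input to conjure (here, presumably one with `express=True`).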
Indeed, that sort of analysis can be time consuming. But to me it is no different from the time I spend investigating bugs in production code with breakpoints and step-through debuggers. I use the full set of covered (and uncovered) lines as data for gaining a deeper understanding of how my test code is working.
Personal take: Code coverage is not something that should be optimized for. It should be treated as an honest reflection of what got executed by the tests, and what did not. That honest reflection gives real insight into the inner mechanics of the test code.
