Suppose a fool wants to play Russian Roulette with you. You wisely decline, because the game is dangerous. The fool proceeds with the first round anyway. They load the revolver with a single cartridge, spin the cylinder, aim at themself, and pull the trigger. Nothing happens beyond the click: No round is fired. “The game is obviously safe,” claims the fool. “We have existence proof that pulling the trigger does not fire the weapon. Your turn!”
Now, imagine an engineer claiming some code is free of bugs. They run the code, and it produces the desired output. “As you can see,” they proclaim, “the software works as expected.”
A finite set of favorable results can easily lead to a misguided sense of confidence. For example, if our customers seem happy with our product, it’s easy to presume the product works as intended. On the back of such a presumption, we may block the introduction of quality control measures such as test suites, type systems, and formal verification, on the grounds that they are redundant.
It appears that there are enormous differences of opinion as to the probability of a failure with loss of vehicle and of human life. The estimates range from roughly 1 in 100 to 1 in 100,000. . . . Since 1 part in 100,000 would imply that one could put a Shuttle up each day for 300 years expecting to lose only one, we could properly ask “What is the cause of management’s fantastic faith in the machinery?”
—Richard Feynman, Personal observations on the reliability of the Shuttle
Indulge me, if you will, in an interview question I have often asked engineering candidates: We are replacing some messy old code with clean new code that is better factored, tested, etc. We want to make sure that switching to the new code will not introduce any bugs. Suppose you are asked to supply sample input to both the old and new code, to check that the outputs match, and to release the new code to our users once you have validated it. You run both sets of code, and find a discrepancy. You investigate, only to find that the new code is correct: The old code has been producing bad results for years, but no one seems to have noticed. What do you do? Do you release the new code?
I ask this question because the situation arises all the time. Almost always, the shiny new code uncovers long-standing problems. Ugly code—badly formatted, unclearly documented, full of spelling errors, etc.—is invariably bug-ridden. Unpleasant though the truth may be, when it comes to code quality: Where there’s smoke, there’s fire. (The right answer, by the way, is to escalate the decision to management. Whether to fix a long-standing, apparently undetected bug is for a Product Manager to decide, not an engineer. If our users have already worked around the problem without telling us, then fixing it on our end may actually cause them problems. At the least, roll-out should be carefully coordinated.)
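The comparison step in the interview question is a form of differential testing: feed the same inputs to both implementations and report any disagreements. Here is a minimal sketch; `legacy_impl` and `new_impl` are hypothetical stand-ins, with a deliberately planted bug in the old code of the kind such comparisons tend to surface.

```python
def legacy_impl(x: int) -> int:
    # Stand-in for the messy old code. The special case at x == 3 is a
    # deliberate, long-standing bug that the comparison will uncover.
    return x * x + (1 if x == 3 else 0)

def new_impl(x: int) -> int:
    # Stand-in for the clean rewrite.
    return x * x

def diff_test(inputs):
    """Return (input, old_output, new_output) for every disagreement."""
    return [(x, legacy_impl(x), new_impl(x))
            for x in inputs
            if legacy_impl(x) != new_impl(x)]

mismatches = diff_test(range(10))
print(mismatches)  # a non-empty list means the behaviors diverge
```

Note that a non-empty result only tells us the two implementations differ; deciding which side is correct, and whether to ship the change, is precisely the judgment call the anecdote is about.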
Existence proves that something can happen, not that something else cannot.