In 9 years of blogging, I’m guessing at least half of what I’ve written centered around code quality or debug. For all the writing and time I’ve put into these topics, it took me until yesterday, for whatever reason, to realize I’m without a solid definition for debug. Or even a definition for bug for that matter. I guess I was relying on a common understanding of what bugs are and what it means to debug code. Realizing that may not be the case, I came up with a definition that captures what bugs are to me…
Bug: A state where a feature is defined, implemented and/or expected to fulfill a specific purpose but fails to do so.
This is a pretty broad definition. I want to say it’s an obvious definition, but considering it took me so long to come up with – never mind the fact I’ve revised it a few times in the course of writing this! – it’s got me wondering whether that’s actually the case. Let me explain how I came to it.
First, I think perspective matters and it seems to me there are three distinct perspectives when it comes to hardware bugs. The first is the perspective of the architect or decision maker; their perspective becomes the definition. The second is that of the developer and the implementation of a feature. Last is the perspective of the user and their expectation of how that feature factors into their usage model. At a macro level, it seems pretty easy to say the success scenario requires all three perspectives to be in alignment: a feature must be defined as expected and correctly implemented. Practically speaking, however, it’s more complicated than it sounds.
(There tends to be a lot of wiggle room when it gets to the details. Complete alignment relies on teams developing and working under common assumptions. I know we as an industry like to believe in documented specifications as product gospel, but I tend to see common assumptions as being more influential to a feature lifecycle than any specification ever could be. And they can only come from working together.)
Now move on to the “but fails to do so” part to see the different possible failure modes. We can define a feature in a way that fails to meet expectations: maybe it produces an incorrect result by design; maybe it produces the right result but in an unacceptable format; maybe it fails to produce some expected result altogether.
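To make that first failure mode concrete, here’s a hypothetical sketch in Python (the spec and function names are mine, purely for illustration): a feature implemented exactly as defined, where the definition itself is what violates the user’s expectation.

```python
from datetime import datetime, timezone

# Hypothetical spec (invented for illustration): "log_event() records
# the event time as a local-time string." The implementation below
# matches that definition exactly -- but users reading logs across
# time zones expect UTC. The result is "incorrect by design": a
# definition bug, not an implementation bug.

def log_event(name):
    stamp = datetime.now().isoformat(timespec="seconds")  # local time, per spec
    return f"{stamp} {name}"

def log_event_expected(name):
    # What users actually expected: UTC with an explicit offset.
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return f"{stamp} {name}"
```

Note that no amount of testing against the spec would flag `log_event` as broken; only the user’s perspective exposes the bug.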
Then there are implementation bugs, which we can split into a few sub-categories. The most common would be developers understanding the definition but not producing an implementation to match. Less frequent but still common is a solid implementation of a misunderstanding. Of course we can have problems with the format and completeness of the implementation as well.
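The “solid implementation of a misunderstanding” case is maybe the easiest to sketch. Assume a hypothetical spec – invented here for illustration, as are the function names – that asks for rounding, which the developer reads as truncation:

```python
# Hypothetical spec (invented for illustration): "average() returns
# the mean of its inputs, rounded to the nearest integer."

def average_misread(values):
    # A clean, well-tested implementation of a misunderstanding:
    # the developer read "rounded to nearest" as "truncated".
    return sum(values) // len(values)

def average_per_spec(values):
    # What the definition actually asked for.
    return round(sum(values) / len(values))

print(average_misread([1, 2, 5]))   # 2 -- truncates 8/3 = 2.67
print(average_per_spec([1, 2, 5]))  # 3 -- rounds to nearest
```

The misreading passes every test the developer writes against their own understanding, which is exactly why this sub-category is so hard to catch without the definition and implementation perspectives in alignment.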
Notice that I’m defining bug in terms of expectation. Sometimes the expectation is valid and the design or implementation needs to change; sometimes it’s the expectation itself that needs to change.
Finally, when is it safe to have an expectation (i.e. when does a bug become a bug)? For me, if a feature is released to users, or simply made accessible to users, it can be expected to work. If it doesn’t, there’s a bug. That’s a high bar to clear but it’s the minimum for maintaining a consistent level of quality through development and deployment. Not many hardware teams do this, which is probably why we are where we are. Personally, I know that whenever I cheat my way around that rule, someone gets burned by a bug that slips through the cracks.
And speaking of bugs slipping through the cracks, here’s a definition for debug…
Debug: Effort spent reacting to and recovering from a bug.
Another (intentionally) broad definition. When we look at it end-to-end, bugs can motivate a long chain of events that’d never otherwise exist. There’s recognizing the existence of a bug; sometimes they’re obvious, other times not so much. Then there’s the root cause analysis. Basic flaws may go straight to someone designing and applying a patch. For more complex bugs, a proper resolution can require long email threads, water cooler discussions and/or more formal meetings. There’s also dealing with the bug reporting database. And don’t forget the time managers dedicate to grooming the database, building reports, etc. How about the time required to regress and re-regress patches? Finally there’s releasing patches, which may or may not require more meetings and/or coordination between teams. That’s a lot, no? Probably some steps I’m missing here but bottom line: debug is more than just the time we dedicate to the analysis and re-coding.
For bonus points: how about putting a price on bugs considering that broad definition of debug?
Anyway… give this definition of (de)bug some thought and let me know what you think!