Hardware Bugs Need to Die

I used to accept bugs as a part of what happens in hardware development. Start a new project, write a bunch of code, deal with the bugs that inevitably arise, stress out about whether I can fix them before development milestones, cross my fingers that regressions pass the day before tape-out, repeat.

After several years of acceptance I’ve started taking bugs personally. I hate bugs. They’re no longer acceptable and I’ve changed the way I write code to avoid the embarrassment of creating them. For me that means using SVUnit to unit test my testbench code as I write it.

Knowing that test-driven development and unit testing with SVUnit have improved my code in every way, that others have found the same, and that it’s become an option for everyone, I’ve turned advocating for SVUnit into a bit of a hobby…


svunit-smokes

I’ve grown to attribute bugs not so much to developer ability, quality of specification, language, or other commonly held positions, but to the habits we rely on to write design and testbench code. Primarily, I’m talking about our debug-later approach to writing code. Write the RTL, then debug the RTL; write the testbench, then debug the testbench. We don’t think about it, we just do it. It’s a habit. It kills our productivity, just like cigarettes kill people.

While EDA reacts to long debug cycles by investing in new debug tools, the only way I see to improve quality is by adopting proactive coding techniques (test-driven development is the one I personally use; there are others). But before we do anything, we need to realize that bugs are formed out of habit. Kick the habit and voila… fewer bugs.


rtl-freeze

This summarizes the entire reason I started using SVUnit. I’d be working on a block-level testbench. We’d be coming up to an important development milestone, one that’s visible to the entire team/management/etc. Everything would be fine. Then, suddenly, we’d find a bug that I built. It’d be serious enough that fixing it required another week, which meant we’d blow by our delivery milestone.

Realizing that you’re the one who will have to stand up the next day and explain why a bug you created is going to cause a week-long schedule slip for the entire team is stressful. Incredibly stressful. It’s also what drove me to unit test my code, all in the hope that I could avoid it.


svunit-p1

I know this position tends to hit a nerve with other verification engineers, but after a few years of using UVM I honestly wonder if we’re better off with the extra overhead and complexity that comes with it. There’s a place for it, that I agree with. But we’ve come to the point where UVM is the default decision. We don’t have testbenches anymore, only UVM testbenches. And they happen regardless of complexity.

I think it’s goofy to jump through all the hoops people associate with UVM by default. There are situations where UVM is absolutely overkill and we verification engineers should be able to look past it when that’s the case.


ransom

When you sit down and think about it, bugs dictate development flow. They pop up if/when they see fit; effort is diverted accordingly. Ultimately, bugs decide when an SoC tapes out or a new FPGA load goes to a customer, which means they are the true gatekeepers when it comes to delivery. In some teams, bug rates and trends are used to estimate quality and delivery dates. Think about that: bugs have actually convinced us that we need them to make delivery decisions!

Bugs come out of nowhere to take the development process hostage. Productivity grinds to a halt. We use bug databases to negotiate our own release. It all ends when we cough up a ransom in the form of time spent debugging. Then we close the issue and wait for it to happen again. And again. All the while, we naively believe we have some kind of control or influence over when and how things happen.


dont-mess-with-rtl

Going for the visual on this one! Real bugs are pretty gross. They’re also intolerable when they’re buzzing around and won’t leave you alone. I put my life on hold to kill real bugs… which is exactly what we should be doing with hardware bugs.

Bugs are disease-carrying irritants. The longer they live, the more disease they spread and the more havoc they create. We need to take bugs personally. Bugs are not something that we should tolerate or accept as part of hardware development; they’re a plague that needs to die. In fact, I killed this one personally, with my bare hands, and left it as an example for other bugs in my yard. We need to hate bugs and show them that they are truly unwelcome in hardware, from the first keystroke to the last. Prevent them from getting into the product by any means necessary. If by some stroke of luck a bug does make it into the code: take a second to feel the shame, then murder it as soon as possible.

-neil
