For a seemingly straightforward technology that’s almost as old as dirt, the concept of code coverage is still kind of confusing. When you ask people what code coverage means, or better, what value it brings to FPGA/ASIC development, you’ll get answers going in a lot of different directions. Comments start at useless and run all the way up to fundamental when it comes to factoring code coverage results into sign-off decisions. And reasoning for why code coverage is both useless and fundamental comes in several shades of grey. It starts feeling like that thing we all assume we understand… but it’s rare to find someone who can explain it in a way we all agree with.
Admittedly, I expect this post to be a tad controversial so today will not be the day for consensus 🙂.
Open the discussion of code coverage up to include both simulation and formal coverage and things get even fuzzier. Though that shouldn’t be surprising considering few of us use formal tools and roughly zero of us use formal code coverage. If you’re part of the sim crowd you might not even know there is such a thing as formal code coverage.
So why would I dive into the similarities between simulation and formal code coverage?
Well… one of my goals for the past few years has been to build a mental model for how simulation and formal tools can be used in a complementary way. Both technologies have their strengths and weaknesses. Logically, identifying where they’re best applied allows us to optimize verification strategies that include both to (hopefully) make us more productive.
I captured a version 1.0 model in The Ideal Verification Timeline back when I was at Siemens. Of all the obscure topics and opinions I’ve written into the world, I’m most proud of the graphic in that post because it captured an idea I hadn’t seen anywhere else. Lots of people do simulation, others use formal, but I haven’t seen many people take a shot at a complete, collaborative view that includes both.
When I pitched that ideal verification timeline I was only thinking about integrating applications; I wasn’t thinking about integrating results. But that’s where I am now: integrating results. For example, when you have code coverage data from simulation and code coverage data from formal, what do you do with them? Can you just put them together? Or is that even possible? And how do you interpret the outcome?
Here’s my take on what simulation and formal code coverage are separately and what they mean when you put them together.
Simulation Code Coverage
Simulation code coverage tells you if a line of code has been exercised. That’s it. We’ll call this a generous interpretation of covered because the lone criterion of being exercised sets the bar very low.
Exercised is easy. For example, here’s a (very) basic chunk of RTL and testbench. The RTL subjected to the stimulus from this testbench will give you a result that’s overlaid in green and red; the green lines are covered but the red lines are not. You can see what I mean by generous. With just a clock, reset and default value for flintstones, much of the RTL is exercised and marked covered.
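The post shows that example as an annotated screenshot rather than text, so here’s a rough, hypothetical sketch of what such a design and testbench could look like. Only the flintstones and wilma signal names come from the post; the module names, ports and logic are my own assumptions for illustration.

```systemverilog
// Hypothetical sketch only; the actual RTL in the post is an image.
// flintstones/wilma come from the post, everything else is assumed.
module rubble (
  input  logic       clk,
  input  logic       rst_n,
  input  logic [3:0] flintstones,
  output logic       wilma,
  output logic       barney
);
  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      wilma  <= 1'b0;                  // exercised by reset alone -> green
      barney <= 1'b0;                  // exercised by reset alone -> green
    end else begin
      wilma  <= (flintstones == 4'd0); // default flintstones value exercises this -> green
      if (flintstones > 4'd7)
        barney <= 1'b1;                // never reached by clock/reset-only stimulus -> red
    end
  end
endmodule

// A (very) basic testbench: just a clock, a reset and a default value for flintstones.
module tb;
  logic       clk = 0, rst_n = 0;
  logic [3:0] flintstones = '0;
  logic       wilma, barney;

  rubble dut (.*);

  always #5 clk = ~clk;

  initial begin
    repeat (2) @(posedge clk);
    rst_n = 1;
    repeat (10) @(posedge clk);
    $finish;
  end
endmodule
```

Even this do-nothing stimulus marks most of the design covered, which is exactly the generosity I’m talking about.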
With simulation code coverage there is no link between being covered and being correct. It’s critical to understand that. Regardless of what you have in your testbench or the confidence you have in your checking strategy, the code coverage itself does not convey any aspect of quality or correctness, only activity. For that reason, there are no safe conclusions to draw about the correctness of the green statements. Therefore closing simulation code coverage to 100% shouldn’t hold much weight. Literally every line of RTL could be wrong and simulation code coverage would tell you nothing about it. Green is not super useful.
The red on the other hand, that is useful. The red identifies the code you haven’t exercised (aka: the holes). The holes in your report should motivate you to write more tests (likely) and/or prune code that isn’t necessary (less likely). So closing code coverage in simulation is more about identifying what you haven’t done (red) and nothing to do with what you have done (green).
Formal Code Coverage
Formal code coverage is different… and more complicated. In layman’s terms, a line of code is reported covered if it’s part of the cone of influence of a formal proof. We’ll call this a selective interpretation of code coverage; only the lines necessary for a proof are marked as covered.
For example, applying a property to the same RTL gives a much different result than the clock-n-reset simulation. The property proves the outcome of wilma and only the logic required to do that is covered.
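As with the RTL, the property in the post appears as an image; a hypothetical SVA sketch along those lines, written against the made-up module from the earlier sketch (the property wording and module names are my assumptions), might look like this:

```systemverilog
// Hypothetical property module for the rubble sketch above.
// Only the logic feeding wilma falls in the proof's cone of influence,
// so only those lines would be marked covered; the barney logic would not.
module rubble_props (
  input logic       clk,
  input logic       rst_n,
  input logic [3:0] flintstones,
  input logic       wilma
);
  // After reset, wilma reflects whether flintstones was zero on the previous cycle.
  assert property (@(posedge clk) disable iff (!rst_n)
                   (flintstones == 4'd0) |=> wilma);
endmodule
```

In a real flow this would typically be bound to the design, but the point here is simply that the covered set is limited to the cone of influence of the proof.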
That formal includes an element of correctness is a major difference relative to simulation. Both technologies tell you a line of code is exercised but only formal code coverage tells you that a line of code is exercised to validate an outcome.
We may be inclined to trust the green because it does have that link to correctness. But! Just because a line of code performs correctly for one property doesn’t mean it’s correct for all potential properties. So while the covered lines in a formal code coverage report are higher quality than what you get in simulation, formal code coverage is still incomplete and thus not trustworthy as a signoff metric on its own. So really, we’re back to red being the true value of a formal code coverage report. As with simulation, the red is definitely a hole. That hole should motivate you to write more properties (likely) and/or prune code that isn’t necessary (less likely).
Are Simulation and Formal Code Coverage Compatible?
I’ve found people who say ‘yes’ and others who are a hard ‘no’, so I’d encourage you to make up your own mind. As you may have guessed by the title, I’m a ‘yes’. Let me walk you through my reasoning…
To start, I’ll recognize that simulation code coverage and formal code coverage represent fundamentally different information. The fact that simulation has no link to correctness whereas formal does, in my mind, makes formal code coverage a far more objective and trustworthy result. If I had to pick between trusting formal as a signoff metric or simulation, I would take formal hands-down.
That said, on their own, neither formal nor simulation code coverage tell you definitively that a line of code is exercised and correct. Which means the only objective value of either is the red holes they identify. So compatibility hinges on whether or not the holes identified by the two technologies are compatible. And that, to me, hinges on whether you are generous or selective. It’s pretty easy to know which camp you’re in.
If you are formal-driven, you are selective. For you, formal and sim code coverage are NOT compatible because dumping generously interpreted simulation data in with selective formal data destroys the integrity of that selective data (i.e. you lose the link to correctness). To you, closing code coverage means closing coverage with formal only.
If you are simulation-driven, you are generous. Simulation and formal code coverage ARE compatible for you. Adding selective formal data into your generous simulation coverage pot doesn’t change the quality of your results. Once merged, the two sources of data are indistinguishable; generous + selective still equals generous. For you, closing coverage is a collaborative affair between your simulation and formal folks.
Going back to our RTL example, the merged result from the simulation and formal teams points to one remaining hole in our collaborative and complementary testsuite. The next step is to decide whether another sim or another property is the right way to go. Given we treat the unexercised holes as equivalent between the two technologies, it’s valid to pick either technology to fill them.
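In terms of the earlier (purely hypothetical) sketch, a merged report would read something like this:

```systemverilog
// Illustrative merged view of the made-up rubble module:
//   wilma  reset assignment : green (sim: clock/reset) + formal (cone of influence)
//   wilma  data assignment  : green (sim: default flintstones) + formal (cone of influence)
//   barney reset assignment : green (sim only)
//   barney data assignment  : red; the one remaining hole, to be filled by a test or a property
```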
What Do You Take From This?
If you’ve been doing formal all this time, I hope you have a better understanding of simulation code coverage. It’s pretty weak relative to formal, but it’s useful nonetheless provided you use it for the right purpose (i.e. identifying holes). If you’re an engineer who’s done simulation exclusively, now you know formal isn’t just about proofs, it’s also about coverage. As such, it gives you another tool to assist with code coverage closure.
As to the validity of merging coverage from different sources, I know some will object to that. Objections I’ve heard so far come primarily from the formal crowd and usually revolve around merged coverage masking bugs. That’s a legitimate reflex but one I think is a non-issue.
Simulation only tells you what’s been exercised, not what’s verified, and being exercised counts for very little. The way I see it, sim code coverage is so generous it’s already masking bugs. Seriously. That’s why you can’t trust the green, only the red. Pulling in more selective code coverage data doesn’t change that fact. Ideally, using the two technologies together propels you faster into the more trustworthy metric of functional coverage coordinated with test planning and checker coverage.
In summary, code coverage is good for detecting holes in your testsuite. That goes for simulated tests and formal proofs. That’s basic, but useful nonetheless. If you read more into it than that, reconsider, because I don’t see that as safe. If your team relies primarily on simulation, merging code coverage from formal can be a more productive way to identify holes in your testsuite. It also gives you a choice of technology to address them.
That’s my thinking anyway. If you’re thinking otherwise, I’d be curious to hear why!
-neil
Thanks a lot for this! It’s very insightful to me. One additional aspect to bring in is functional coverage. It would be even more helpful if you could provide your thoughts on how functional coverage fits into this space.
I’d like to pull in functional coverage but I don’t have much experience with it yet. Someday I’ll hopefully get there too.
There are multiple flavors of formal code coverage. You can also just cover whether your formal constraints allow lines in the RTL to be reached, which gives you the same amount of information as “regular” simulation code coverage.
There are multiple flavors of simulation code coverage. You can also cover whether mutating a line/condition/etc. in the RTL has any effect on your checks, which gives you similar information to what you get from cone of influence formal coverage. You can also mutate the RTL in formal verification and get stronger results than just cone of influence coverage. (Is mutation coverage from simulation as “strong” as mutation coverage from formal? Is cone of influence formal coverage “stronger” than mutation coverage in simulation? I don’t know, I haven’t really thought about it yet.)
I think ideally you’d want to keep the nuances when merging code coverage, so you can make more informed decisions. Some tools will do this, by storing not just “hit” and “not hit” for a code coverage element, but also the source of this information.
Thanks for this Neil! Very informative. I am at this very moment trying to explain the difference between simulation and formal code coverage to a team who are new to the topic and this will help a lot.
How does this discussion change when applying conditional code coverage?
Hey Mike. The short answer is that I’m not certain yet. I’ve spent enough time with statement coverage and toggle coverage that I’m pretty confident about those. I’m less certain with condition, expression, etc. That said, I haven’t heard a convincing argument against the other types of code coverage yet, so I’m optimistic a little more experience will show they can be combined for the same purpose.