That’s what sets in after we’re done hitting all the easy stuff and have moved on to the more obscure corners of a coverage model. The pace slows. Progress stalls. We sit through review meetings debating the merits of coverpoints, some of which we may not even understand. Through trial and error we plod along as best we can until someone declares that whatever we have is good enough (because 100% coverage is impossible). Then we shrug our shoulders, add a few exclusions, write up a few waivers, shake off the hangover and move on.
I’ve had a coverage hangover several times. I’m sure we all have. With some devices – the really massive SoCs – there are verification engineers who live through coverage hangover for months at a time. Their only reprieve, if you can call it that, tends to be bug fixing. If they’re lucky, they’ll get to implement a new feature now and then. Otherwise, it’s a cycle of regression, analysis, tweak and repeat.
The worst part of a coverage hangover is that the next hangover is guaranteed to be worse because the next device is always bigger. At least that’s what happens with current verification strategies. I’d like to propose we break common practice with a reset. Regrettably, it won’t change the fact that coverage space will continue to grow. But it will give some relief for the next few generations while folks smarter than me find better ways to define, collect and analyze coverage.
When it comes to closing coverage, our focus and timing have always struck me as out of tune with the reality of the mission. I’m proposing we change that by breaking coverage into a series of steps that we can focus on independently, moving from one type to the next as features mature.
Code coverage comes first – line coverage at a minimum but also expression coverage – collected, analyzed and closed immediately after a line of RTL is written. Closing code coverage first is a major deviation from the norm, and months earlier than most teams attempt it.
There are options for closing code coverage this early. The best option – option ‘A’ in my opinion – is a designer-written unit test suite that’s built concurrently with the RTL. Second best are unit tests and directed tests written by the verification team. The fallback would be code coverage collected from a constrained random test suite (i.e. what most of us do now).
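To make option ‘A’ concrete, here’s a minimal sketch of what a designer-written unit test might look like, assuming a hypothetical 4-deep ‘fifo’ block with push/pop and full/empty ports (all names are invented for illustration). The point isn’t the testbench style; it’s that line and expression coverage on the block can be collected and closed while the RTL is still being written.

```systemverilog
// Minimal designer-written unit test sketch for a hypothetical fifo block.
// Goal: exercise every line and expression as the RTL is written, not to
// replace the verification team's environment.
module fifo_unit_test;
  logic       clk = 0;
  logic       rst_n;
  logic       push, pop;
  logic [7:0] wr_data, rd_data;
  logic       full, empty;

  fifo #(.DEPTH(4)) dut (.*);   // hypothetical DUT, port names assumed to match

  always #5 clk = ~clk;

  initial begin
    rst_n = 0; push = 0; pop = 0; wr_data = '0;
    repeat (2) @(negedge clk);
    rst_n = 1;

    // fill the fifo to hit the 'full' lines and expressions
    repeat (4) begin
      @(negedge clk);
      push    = 1;
      wr_data = $urandom;
    end
    @(negedge clk);
    push = 0;
    assert (full) else $error("expected full after 4 pushes");

    // drain it to hit the 'empty' lines and expressions
    repeat (4) begin
      @(negedge clk);
      pop = 1;
    end
    @(negedge clk);
    pop = 0;
    assert (empty) else $error("expected empty after 4 pops");

    $finish;
  end
endmodule
```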
I’m aware that most people will see option ‘A’ and push back with the idea that designers should not be trusted – or bothered, for that matter – to exhaustively test their own code before handing it to the verification team. That, and it sounds like the least efficient means of verification possible. If that’s your response, I acknowledge it’s a stretch, but I stand by the suggestion. The absolute best person to verify the intent of a line of RTL is the person who wrote it. Further, I think having designers test their own code as they write it helps all subsequent verification activity proceed much more smoothly than any other option would. By a mile. So it’s ok to push back, but if you’re quoting extra effort as the reason, keep in mind it’s effort you’ll spend one way or another. And the cost of that effort increases dramatically for every day a line of RTL goes untested.
Atomic functional coverage is next, where functional snapshots of a design are captured and analyzed. Atomic functional coverage applies to individual properties and is derived from atomic transactions. Then comes atomic cross-coverage, derived from the same transactions but across a combination of properties. Last are temporal functional coverage and temporal cross-coverage, derived over multiple transactions.
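As a rough illustration of that progression, here’s a minimal sketch assuming a hypothetical packet transaction with ‘kind’ and ‘length’ properties (both invented for the example). The first two coverpoints are atomic functional coverage, the cross is atomic cross-coverage, and the transition bin is a simple form of temporal coverage across back-to-back transactions.

```systemverilog
// Hypothetical packet transaction properties for the example
typedef enum {READ, WRITE} pkt_kind_e;

covergroup pkt_cg with function sample (pkt_kind_e kind, int unsigned length);
  // atomic functional coverage: one property of one transaction at a time
  cp_kind   : coverpoint kind;
  cp_length : coverpoint length {
    bins small  = {[1:64]};
    bins medium = {[65:512]};
    bins large  = {[513:1518]};
  }

  // atomic cross-coverage: properties combined within a single transaction
  kind_x_length : cross cp_kind, cp_length;

  // temporal functional coverage: behaviour across consecutive transactions
  cp_kind_seq : coverpoint kind {
    bins write_then_read = (WRITE => READ);
  }
endgroup
```

A monitor would instantiate the group and call its sample() method once per observed transaction; temporal cross-coverage follows the same pattern, crossing properties sampled over a window of transactions.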
Up to this point, code and functional coverage apply to exhaustive verification of subsystems and IP. I think there’s value in planning, implementing, collecting, analyzing, refining and closing each type of coverage in order: they build on each other, and compartmentalizing increases focus by limiting the problem space. A further compartmentalization – necessary in my mind – is approaching each type of coverage on a feature-by-feature basis (i.e. incrementally closing coverage on features as they are built instead of closing coverage on entire designs).
After subsystem coverage comes integration coverage.
An integration coverage model can be derived exclusively from device and subsystem IO – which probably includes a good portion of the software interface – and would contain a combination of code coverage and functional coverage focused on (a) connectivity and (b) functionality.
The connectivity aspect of integration coverage is easier to define: toggle coverage for subsystem connectivity, complemented by a minimum atomic functional coverage subset on each interface. Pin-level activity under adequate load gives us confidence in connectivity.
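For the per-interface functional subset, a minimal sketch might look like the monitor below, assuming a hypothetical AXI-Lite style boundary between two subsystems (signal names are illustrative). Toggle coverage on the same pins would come from the simulator’s code coverage options; the covergroup only confirms that basic read, write and response traffic crossed the boundary under load.

```systemverilog
// Minimal functional subset for one subsystem boundary (hypothetical signals)
module axi_boundary_monitor (
  input logic       clk,
  input logic       awvalid, awready,
  input logic       arvalid, arready,
  input logic       bvalid,  bready,
  input logic [1:0] bresp
);
  covergroup boundary_cg @(posedge clk);
    // has each interface carried basic traffic under load?
    cp_write : coverpoint (awvalid && awready) { bins write_seen = {1}; }
    cp_read  : coverpoint (arvalid && arready) { bins read_seen  = {1}; }
    cp_resp  : coverpoint bresp iff (bvalid && bready) {
      bins okay   = {2'b00};
      bins slverr = {2'b10};
    }
  endgroup

  boundary_cg cg = new();
endmodule
```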
The functional aspect of integration coverage is more of a grey area that segues into our next topic: constraining the size of coverage models. Before we get there, one rule of thumb I’d treat as mandatory: a system-level functional coverage model must be derived from customer use cases. A system-level functional coverage model must never be defined as some aggregation of subsystem functional coverage.
(fwiw… this blog is just an excuse to write that last sentence. If there’s one takeaway here, please make it be that).
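To show the difference, here’s a minimal sketch of a use-case-derived system model, assuming a hypothetical set of customer scenarios and power profiles (all names invented). Nothing in it refers to subsystem detail; it only records whether the device was driven the way customers will drive it.

```systemverilog
// Hypothetical customer scenarios and power profiles for the example
typedef enum {BOOT_FROM_FLASH, CAMERA_STREAM, AUDIO_PLAYBACK, STANDBY} use_case_e;
typedef enum {PERFORMANCE, ECO} profile_e;

covergroup system_cg with function sample (use_case_e uc, profile_e prof);
  cp_use_case : coverpoint uc;
  cp_profile  : coverpoint prof;
  // cross only the combinations customers will actually run
  uc_x_profile : cross cp_use_case, cp_profile {
    ignore_bins boot_is_never_eco =
      binsof(cp_use_case) intersect {BOOT_FROM_FLASH} &&
      binsof(cp_profile)  intersect {ECO};
  }
endgroup
```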
Compartmentalizing coverage could be a great way to relieve the coverage hangover, but reducing the amount of coverage we collect would bring even more relief.
Our HVL coverage constructs make it very easy to create massive coverage models – hence massive coverage hangovers – with just a few lines of code. For example, as the sketch after this list shows, it’s easy to:
- cover all values of ‘A’
- cover all values of ‘B’
- cross cover ‘A’ and ‘B’
- cross cover a history of ‘A’ and ‘B’
- if ‘A’ and ‘B’ why not ‘C’
- and ‘D’…
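Here’s a minimal sketch of how fast that adds up, assuming two hypothetical 8-bit configuration fields ‘A’ and ‘B’ on some transaction:

```systemverilog
covergroup explode_cg with function sample (bit [7:0] A, bit [7:0] B);
  option.auto_bin_max = 256;   // one bin per value instead of the default 64
  cp_a  : coverpoint A;        // 256 bins
  cp_b  : coverpoint B;        // 256 bins
  a_x_b : cross cp_a, cp_b;    // 65,536 cross bins from one line
  // add a history of A and B, or a 'C' and a 'D', and the model is far
  // larger than any regression will realistically fill
endgroup
```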
My one suggestion here is to think more critically about the functional coverage models we create, with an emphasis on keeping them as small as we reasonably can. Differentiate between ‘must have’ coverage items that observe necessary functionality and derivative ‘nice to have’ items that are easy to observe but don’t add much in terms of design quality or confidence. It’s a tough distinction to make given our penchant for risk aversion, paired with the coverage hyperbole we’re constantly subjected to, but it’s necessary to contain the cost of verification.
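And the trimmed-down counterpart – same hypothetical A and B fields, kept to the ‘must have’ items:

```systemverilog
covergroup trimmed_cg with function sample (bit [7:0] A, bit [7:0] B);
  cp_a : coverpoint A {
    bins zero   = {0};
    bins max    = {255};
    bins middle = {[1:254]};
  }
  cp_b : coverpoint B {
    bins legal[] = {[1:16]};   // only the values the spec allows
  }
  // deliberately no cross of A and B - easy to write, but it adds 'nice to
  // have' bins that do little for design quality or confidence
endgroup
```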
The lines may blur between each of these coverage steps, but trying to compartmentalize our coverage nightmare as much as we can would go a long way toward mitigating our coverage hangover. Likewise for constraining the size of our coverage models.