Pretty much every testbench I’ve ever built, used or seen has a free-running clock that’s driven within a while or forever loop. Not much can happen without the clock in a synchronous design so defining the clock logic is usually the first and most obvious thing we do as verification engineers.
Assuming your design-under-test is synchronous to the positive edge (it almost always is), testbench components usually do their work somewhere off the posedge to avoid races. Even in a simple testbench these days, there will be several components, each with its own thread, pushing or pulling data from various interfaces on the DUT. The free-running clock is that steady bass drum beat that holds everything together.
When I started unit testing RTL in SVUnit, I carried on with this same clocking strategy. I had a free-running clock in my unit test harness and everything that happened in the unit tests did so relative to edges on the clock. After a while, though, synchronizing to the free-running clock in my unit tests started to feel a bit awkward. I can’t quite put my finger on why, but I think it had something to do with constantly being tripped up by not knowing exactly where I was in a sim relative to the clock and what exactly was happening. Was I waiting, driving stimulus, checking outputs or some/all of the above? As the free-running clock started to feel awkward, I took a step back and overhauled my clocking strategy.
I think “normal” testbenches use the free-running clock because many components need to stay synchronized and having 1 clock master is really the only way to manage that. But in a unit test, there is no need for the same kind of multi-component synchronization. As long as your tests stay simple, there isn’t much in terms of synchronization that happens elsewhere. So what I ended up trying is a clocking strategy that looks more like my unit tests are stepping through a code debugger.
Just as you’d see in a debugger, when I need to move ahead in time I call a step(n) task. The ‘n’ argument is the number of cycles I’m stepping through. If I want to advance 4 clocks, for example, I ‘step(4)’ instead of the ‘repeat (4) @(posedge clk)’ I might do with a free-running clock. This change alone gives my tests a slightly different feel.
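A minimal sketch of what a step(n) task could look like when the test masters the clock directly. The module name, the 10ns CLK_PERIOD and the choice to end each cycle on the rising edge are assumptions made for illustration, not a copy of the harness described here.

```systemverilog
module step_sketch;
  timeunit 1ns; timeprecision 1ps;

  // Assumed clock period; use whatever suits the UUT.
  localparam time CLK_PERIOD = 10ns;

  logic clk = 1;

  // Advance n clock cycles by toggling clk directly.
  // Each cycle ends on a rising edge, so step() returns at a posedge.
  task automatic step(int n = 1);
    repeat (n) begin
      #(CLK_PERIOD/2) clk = 0;
      #(CLK_PERIOD/2) clk = 1;
    end
  endtask

  initial begin
    step(4);  // instead of: repeat (4) @(posedge clk);
    assert ($time == 4 * CLK_PERIOD)
      else $error("unexpected time %0d", $time);
    $finish;
  end
endmodule
```

Because nothing toggles clk except step(), simulation time only moves when the test asks it to.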
Considering unit tests can master the clock directly with the step(n) task, I felt like I also had a chance to update how I drive and sample IO on my UUT. What I wanted was to use the step(n) task to go to a specific point in time, then drive and sample everything necessary without inadvertently causing any race conditions. In other words, I wanted simple procedural tests with 1 thread of execution (i.e. no fork/join) where I could forget about the timing details.
To start, I added a nextSamplePoint() task that would take me to a safe place away from the positive edge. Assuming I’m on the posedge when I call it, nextSamplePoint() takes me 1ns past the posedge (I chose 1ns but this could be any delay you like).
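In its simplest form, such a task is just a fixed delay. The sketch below assumes a SAMPLE_DELAY parameter and a module name that are purely illustrative.

```systemverilog
module sample_point_sketch;
  timeunit 1ns; timeprecision 1ps;

  // Assumed offset from the posedge; 1ns here, but any small delay works.
  localparam time SAMPLE_DELAY = 1ns;

  // Move away from the clock edge before driving or sampling.
  task automatic nextSamplePoint();
    #(SAMPLE_DELAY);
  endtask

  initial begin
    // Time 0 stands in for a posedge in this sketch.
    nextSamplePoint();
    assert ($time == 1) else $error("unexpected time %0d", $time);
    $finish;
  end
endmodule
```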
I made all these changes incrementally, as it made sense to do so, so by this stage in the overhaul I could drive an input, use step(n) to jump ahead ‘n’ clocks, then use nextSamplePoint() to advance to where I can safely sample a response. Here’s what that would look like in a test where I stall data flow on the first sample point, step ahead 1 clock, proceed to the next sample point and check outputs on a write port…
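A sketch of a test with that shape follows. The UUT is a hypothetical one-stage pipeline, and the signal names stall, in_valid and wr_valid are invented for the example, so treat it as an illustration of the flow rather than the real test.

```systemverilog
module stall_test_sketch;
  timeunit 1ns; timeprecision 1ps;
  localparam time CLK_PERIOD = 10ns;  // assumed period

  logic clk = 1;
  logic stall = 0, in_valid = 0, wr_valid;

  // Hypothetical UUT: one pipeline stage that holds its output while stalled.
  always_ff @(posedge clk) if (!stall) wr_valid <= in_valid;

  task automatic step(int n = 1);
    repeat (n) begin
      #(CLK_PERIOD/2) clk = 0;
      #(CLK_PERIOD/2) clk = 1;
    end
  endtask

  task automatic nextSamplePoint();
    #1ns;
  endtask

  initial begin
    // Setup: push one valid beat through the stage.
    nextSamplePoint();
    in_valid = 1;
    step(1);

    // Stall data flow on this sample point.
    nextSamplePoint();
    in_valid = 0;
    stall = 1;

    // Step ahead 1 clock, move to the next sample point, check the write port.
    step(1);
    nextSamplePoint();
    assert (wr_valid === 1'b1) else $error("wr_valid changed during stall");
    $finish;
  end
endmodule
```

In an SVUnit test the body of the initial block would sit between the SVTEST and SVTEST_END macros, with the assertion replaced by one of the FAIL_IF/FAIL_UNLESS macros.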
To make the tests shorter and more readable, I ended up calling the nextSamplePoint() task from within the drive/sample helper methods. In doing so, the above test became…
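A sketch of that refactoring, using the same hypothetical one-stage UUT. The helper names setValid(), stallDataFlow() and expectWriteValid() are made up for the example; the point is only that each helper calls nextSamplePoint() before touching the UUT.

```systemverilog
module helper_test_sketch;
  timeunit 1ns; timeprecision 1ps;
  localparam time CLK_PERIOD = 10ns;  // assumed period

  logic clk = 1;
  logic stall = 0, in_valid = 0, wr_valid;

  // Hypothetical UUT: one pipeline stage that holds its output while stalled.
  always_ff @(posedge clk) if (!stall) wr_valid <= in_valid;

  task automatic step(int n = 1);
    repeat (n) begin
      #(CLK_PERIOD/2) clk = 0;
      #(CLK_PERIOD/2) clk = 1;
    end
  endtask

  task automatic nextSamplePoint();
    #1ns;
  endtask

  // Hypothetical drive/sample helpers: each syncs before acting.
  task automatic setValid(logic v);
    nextSamplePoint();
    in_valid = v;
  endtask

  task automatic stallDataFlow();
    nextSamplePoint();
    stall = 1;
  endtask

  task automatic expectWriteValid(logic expected);
    nextSamplePoint();
    assert (wr_valid === expected) else $error("wr_valid mismatch");
  endtask

  initial begin
    setValid(1);
    step(1);
    stallDataFlow();
    step(1);
    expectWriteValid(1);
    $finish;
  end
endmodule
```

The test body no longer mentions timing at all; it reads as drive, step, check.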
The last situation is where I need to drive and/or sample several pins on the DUT simultaneously. Here, I wanted to sync once, drive and/or sample several of the IO, then carry on. What I chose to do was update the nextSamplePoint() task so it only takes effect the first time it’s called on a posedge (i.e. advance 1ns only if I’m at a timestamp that’s an exact multiple of clkPeriod), which means it ends up looking like this…
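One way to write that conditional delay is sketched below. It assumes posedges always fall on exact multiples of CLK_PERIOD (a hypothetical parameter standing in for clkPeriod) and that simulation time is counted in nanoseconds.

```systemverilog
module conditional_sample_point_sketch;
  timeunit 1ns; timeprecision 1ps;
  localparam time CLK_PERIOD = 10ns;  // assumed period

  // Only move off the edge if we are still sitting on it.
  task automatic nextSamplePoint();
    if ($time % CLK_PERIOD == 0) #1ns;
  endtask

  initial begin
    // Time 0 stands in for a posedge in this sketch.
    nextSamplePoint();  // on the edge: advances 1ns
    nextSamplePoint();  // already off the edge: no further delay
    assert ($time == 1) else $error("unexpected time %0d", $time);
    $finish;
  end
endmodule
```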
and a test that drives and samples the UUT looks like this…
It’s not shown here, but both expectEgressPixel() and expectEgressPixelRead() are checkers that first call nextSamplePoint(). Because only the first call advances the simulation time 1ns, both expectations are evaluated on the same timestamp. So regardless of how many drive and sample methods I call without stepping, if they all call the nextSamplePoint() task before doing anything else, they all take effect on the same timestamp.
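To make the same-timestamp behaviour concrete, here is a hypothetical pair of checkers along those lines. The argument lists, the egress_pixel and egress_pixel_read signals and their values are assumptions; only the pattern of calling nextSamplePoint() first comes from the description above.

```systemverilog
module same_timestamp_sketch;
  timeunit 1ns; timeprecision 1ps;
  localparam time CLK_PERIOD = 10ns;  // assumed period

  // Stand-ins for UUT outputs; names, widths and values are hypothetical.
  logic [7:0] egress_pixel      = 8'hA5;
  logic       egress_pixel_read = 1'b1;

  // Timestamps at which each checker actually sampled.
  time t_first, t_second;

  task automatic nextSamplePoint();
    if ($time % CLK_PERIOD == 0) #1ns;
  endtask

  task automatic expectEgressPixel(logic [7:0] expected);
    nextSamplePoint();
    t_first = $time;
    assert (egress_pixel === expected) else $error("pixel mismatch");
  endtask

  task automatic expectEgressPixelRead(logic expected);
    nextSamplePoint();
    t_second = $time;
    assert (egress_pixel_read === expected) else $error("read mismatch");
  endtask

  initial begin
    expectEgressPixel(8'hA5);
    expectEgressPixelRead(1'b1);
    // Only the first checker moved time, so both sampled together.
    assert (t_first == t_second) else $error("checkers ran at different times");
    $finish;
  end
endmodule
```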
The last code snippet I have is an update to the step(n) task so that it accounts for the 1ns delay of the nextSamplePoint() task. The step(n) task would normally advance an entire clock period. If, however, nextSamplePoint() has just been called, step(n) would advance 1 clock period less the 1ns delay.
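A sketch of how that compensation could be written. It assumes the offset from the last posedge is always smaller than half a clock period, and it shortens only the first cycle of the step; both are choices made for this example.

```systemverilog
module compensated_step_sketch;
  timeunit 1ns; timeprecision 1ps;
  localparam time CLK_PERIOD = 10ns;  // assumed period

  logic clk = 1;

  task automatic nextSamplePoint();
    if ($time % CLK_PERIOD == 0) #1ns;
  endtask

  // Advance n cycles and always land exactly on a posedge.
  task automatic step(int n = 1);
    // How far past the last posedge we are (1ns after nextSamplePoint()).
    time offset = $time % CLK_PERIOD;
    repeat (n) begin
      #(CLK_PERIOD/2 - offset) clk = 0;
      #(CLK_PERIOD/2) clk = 1;
      offset = 0;  // only the first cycle needs shortening
    end
  endtask

  initial begin
    nextSamplePoint();                          // 1ns past the edge
    step(2);                                    // back on an edge
    assert ($time == 2 * CLK_PERIOD) else $error("step() drifted off the edge");
    nextSamplePoint();                          // moves off the edge again
    assert ($time == 2 * CLK_PERIOD + 1) else $error("unexpected sample point");
    $finish;
  end
endmodule
```

Keeping every posedge on a multiple of the clock period is what lets the conditional check in nextSamplePoint() keep working after any number of steps.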
I’m not sure if this would qualify as a major deviation from the norm but it did feel major enough to offer it as a suggestion for people doing RTL unit tests. Now that I’ve gone through all this, I feel like actively mastering the clock from within unit tests has a few advantages over the free-running clock:
- actively mastering the clock with the step(n) task (like you would in a debugger) feels more intuitive than passively synchronizing to a free-running clock;
- the conditional delay in the nextSamplePoint() task makes it easier to simultaneously drive and sample IO on the UUT without needing to think about delta cycles and race conditions;
- the conditional delay in the nextSamplePoint() task is a little like defining delays in interface clocking blocks… though in all honesty, those clocking block delays have always kind of confused me (am I the only one??) whereas this nextSamplePoint() task is pretty easy to understand; and
- by packaging the step(n) and nextSamplePoint() tasks in a macro, I can use the same logic in multiple test harnesses so clock mastering and synchronization is the same across many unit tests.
Granted, I could probably have provided the same step(n) and nextSamplePoint() API with a free-running clock, so maybe the real lesson here is to remove any direct dependencies between unit tests and the clock, not so much to duplicate the implementation I’ve chosen (though obviously I prefer what I have or I wouldn’t have written about it 🙂 ). I will point out that one downside is that this strategy falls apart when you have multiple clocks, but the lessons I’ve learned about design partitioning will at least partially address that particular downside!
More lessons to come, but so far this looks like a move in the right direction in terms of writing decent unit tests for RTL.
3 thoughts on “Stepping Through an RTL Unit Test”
I’ve used something similar to your step(…) task for my e unit tests. What I don’t agree with is the use of your nextSamplePoint() task. Synchronous logic implies that signals should be sampled at the appropriate edge of the clock. I get that your only focus is RTL and you don’t care about gate level portability (right now, but what about the future?), but IMO you shouldn’t be encouraging such a practice. I like to be as close as possible to what happens in real life. If you drive some stimulus to your DUT, you should sample its response on the next clock edge, as that’s when it’s guaranteed to be valid.
and here I thought the “testbench on the negative edge to avoid race conditions” was common practice! silly me 🙂
I see your point though I can’t ever see myself running these tests on gates. maybe that’ll change some day. if it did become an issue, at least the logic is centralized so you’d only have to change your nextSamplePoint() task.
Interesting approach to addressing the need to have unit tests be independent of clocking mechanisms & mimic debug methodologies. I liked the approach & I agree one can accomplish these things in a variety of implementations. Thanks for writing it up, it’s always nice to see other approaches to common issues.
As far as multiple clocks are concerned (& assuming all clock periods are different, as 180 degree out of phase clocks won’t work with this approach without some minor modifications), I’ve created an “and_clk” which is “a_clk & b_clk & c_clk” & then utilized the negedge of the “and_clk”. This guarantees you are always at some midpoint location between posedges for all clocks that comprise the “and_clk”. I typically use this when I need to de-assert a reset signal, to ensure I do it with hold/setup to the previous/next clock edge for all clocks, which minimizes non-value-added headaches & also allows for dropping in gate models with minimal fuss.
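A small sketch of the “and_clk” idea for reference. The three clock periods, the rst_n signal and the initial 1ns wait (to skip the start-up settling of and_clk) are assumptions chosen for the example.

```systemverilog
module and_clk_sketch;
  timeunit 1ns; timeprecision 1ps;

  logic a_clk = 0, b_clk = 0, c_clk = 0;
  logic rst_n = 0;

  // Three free-running clocks with different (assumed) periods.
  always #5  a_clk = ~a_clk;
  always #7  b_clk = ~b_clk;
  always #11 c_clk = ~c_clk;

  // High only while all three clocks are high.
  wire and_clk = a_clk & b_clk & c_clk;

  initial begin
    #1ns;                 // let and_clk settle to 0 after time 0
    @(negedge and_clk);   // one clock just fell while the others were high
    rst_n = 1;            // de-assert reset away from every posedge
    $display("reset released at %0d ns", $time);
    $finish;
  end
endmodule
```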