Pretty much every testbench I’ve ever built, used or seen has a free-running clock that’s driven within a while or forever loop. Not much can happen without the clock in a synchronous design so defining the clock logic is usually the first and most obvious thing we do as verification engineers.
Assuming your design-under-test is synchronous to the positive edge (they almost always are), testbench components usually do their work somewhere off the posedge to avoid races. With even a simple testbench these days, there will be several components, each with their own thread, pushing or pulling data from various interfaces on the DUT. The free-running clock is that steady bass drum beat that holds everything together.
When I started unit testing RTL in SVUnit, I carried on with this same clocking strategy. I had a free-running clock in my unit test harness and everything that happened in the unit tests did so relative to edges on the clock. After a while, though, synchronizing to the free-running clock in my unit tests started to feel a bit awkward. I can’t quite put my finger on why, but I think it had something to do with constantly being tripped up by not knowing exactly where I was in a sim relative to the clock and what exactly was happening. Was I waiting, driving stimulus, checking outputs or some/all of the above? As the free-running clock started to feel awkward, I took a step back and overhauled my clocking strategy.
I think “normal” testbenches use the free-running clock because many components need to stay synchronized and having one clock master is really the only way to manage that. But in a unit test, there is no need for the same kind of multi-component synchronization; as long as your tests stay simple, very little synchronization happens anywhere else. So what I ended up trying is a clocking strategy that makes my unit tests feel more like stepping through a code debugger.
Just as you’d see in a debugger, when I need to move ahead in time I call a step(n) task. The ‘n’ argument is the number of cycles I’m stepping through. If I want to advance 4 clocks, for example, I ‘step(4)’ instead of the ‘repeat (4) @(posedge clk)’ I might do with a free-running clock. This change alone gives my tests a slightly different feel.
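A minimal sketch of what that step(n) task might look like, assuming the test harness owns the clock directly (the `clk` and `clkPeriod` names here are my own placeholders, not from the original harness):

```systemverilog
// Hypothetical harness clock; the real signal names aren't shown here.
logic clk = 1;                    // start on a posedge at time 0
parameter time clkPeriod = 10ns;

// Advance the simulation by 'n' full clock cycles, ending on a posedge.
task automatic step(int n = 1);
  repeat (n) begin
    #(clkPeriod/2) clk = 0;       // falling edge mid-cycle
    #(clkPeriod/2) clk = 1;       // rising edge ends the step
  end
endtask
```

Because the task ends right on a rising edge, every test statement after a step() call knows exactly where it sits in the cycle.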
Considering unit tests can master the clock directly with the step(n) task, I felt like I also had a chance to update how I drive and sample IO on my UUT. What I wanted was to use step(n) to go to a specific point in time, then drive and sample everything necessary without inadvertently causing any race conditions. In other words, I wanted simple procedural tests with one thread of execution (i.e. no fork/join) where I could forget about the timing details.
To start, I added a nextSamplePoint() task that would take me to a safe place away from the positive edge. Assuming I’m on the posedge when I call it, nextSamplePoint() takes me 1ns past the posedge (I chose 1ns, but this could be any delay you like).
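A first cut of nextSamplePoint() could be as simple as a fixed delay, sketched here under the assumption that it's only ever called from a posedge:

```systemverilog
// Move 1ns past the posedge so drives and samples can't race the clock.
// The 1ns offset is arbitrary; any delay clear of the edge would do.
task nextSamplePoint();
  #1ns;
endtask
```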
I made all these changes incrementally as it made sense to do so, so by this stage in the overhaul I could drive an input, use step(n) to jump ahead ‘n’ clocks, then use nextSamplePoint() to advance to where I can safely sample a response. Here’s what that would look like in a test where I stall data flow on the first sample point, step ahead 1 clock, proceed to the next sample point and check outputs on a write port…
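Something along these lines, where `stall` and `wr_en` are hypothetical UUT pins I've invented for illustration, and the \`SVTEST/\`FAIL_UNLESS macros come from SVUnit:

```systemverilog
`SVTEST(stall_blocks_write)
  nextSamplePoint();          // first sample point, 1ns past the posedge
  stall = 1;                  // stall data flow
  step();                     // advance 1 clock
  nextSamplePoint();          // next sample point
  `FAIL_UNLESS(wr_en === 0);  // write port should be idle while stalled
`SVTEST_END
```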
To make the tests shorter and more readable, I ended up calling the nextSamplePoint() task from within the drive/sample helper methods. In doing so, the above test became…
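With nextSamplePoint() folded into hypothetical helpers (setStall() and expectWriteIdle() are names I've made up to match the earlier example), the same test might shrink to something like:

```systemverilog
// Helpers that sync themselves before touching the pins.
task setStall(bit s);
  nextSamplePoint();
  stall = s;
endtask

task expectWriteIdle();
  nextSamplePoint();
  `FAIL_UNLESS(wr_en === 0);
endtask

// ...so the test body becomes:
`SVTEST(stall_blocks_write)
  setStall(1);
  step();
  expectWriteIdle();
`SVTEST_END
```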
Last is the situation where I need to drive and/or sample several pins on the DUT simultaneously. In these situations, I wanted to sync once, drive and/or sample several of the IO, then carry on. What I chose to do here was update the nextSamplePoint() task so it only takes effect the first time it’s called on a posedge (i.e. advance 1ns only if I’m at a timestamp that’s an exact multiple of clkPeriod), which means it ends up looking like this…
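A sketch of that conditional version, assuming posedges land exactly on multiples of the hypothetical `clkPeriod` parameter:

```systemverilog
// Only the first call at a given posedge advances time; subsequent
// calls at the same sample point are no-ops.
task nextSamplePoint();
  if ($time % clkPeriod == 0)  // sitting exactly on a posedge?
    #1ns;
endtask
```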
and a test that drives and samples the UUT looks like this…
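A hedged sketch of such a test; driveIngressPixel() is a driver name I've invented, while expectEgressPixel() and expectEgressPixelRead() are the checkers described below, each of which calls nextSamplePoint() before doing anything else:

```systemverilog
`SVTEST(pixel_passthrough)
  driveIngressPixel(8'hA5);
  step();
  expectEgressPixel(8'hA5);   // only the first of these two calls
  expectEgressPixelRead();    // advances time; both sample together
`SVTEST_END
```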
It’s not shown here, but both expectEgressPixel() and expectEgressPixelRead() are checkers that first call nextSamplePoint(). Because only the first call advances simulation time by 1ns, both expectations are evaluated at the same timestamp. So regardless of how many drive and sample methods I call without stepping, if they all call the nextSamplePoint() task before doing anything else, they all take effect at the same timestamp.
The last code snippet I have is an update to the step(n) task that accounts for the 1ns delay of the nextSamplePoint() task. The step(n) task would normally advance an entire clock period. If, however, nextSamplePoint() has just been called, step(n) advances 1 clock period less the 1ns delay.
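That adjustment might look like the following sketch, again assuming posedges fall on exact multiples of `clkPeriod`:

```systemverilog
// If nextSamplePoint() already advanced us 1ns past the posedge,
// shorten the first half-cycle so step() still lands on a posedge.
task automatic step(int n = 1);
  time offset = $time % clkPeriod;  // 0 on a posedge, 1ns otherwise
  repeat (n) begin
    #(clkPeriod/2 - offset) clk = 0;
    #(clkPeriod/2)          clk = 1;
    offset = 0;                     // only the first cycle is short
  end
endtask
```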
I’m not sure if this would qualify as a major deviation from the norm but it did feel major enough to offer it as a suggestion for people doing RTL unit tests. Now that I’ve gone through all this, I feel like actively mastering the clock from within unit tests has a few advantages over the free-running clock:
- actively mastering the clock with the step(n) task (like you would in a debugger) feels more intuitive than passively synchronizing to a free-running clock;
- the conditional delay in the nextSamplePoint() task makes it easier to simultaneously drive and sample IO on the UUT without needing to think about delta cycles and race conditions;
- the conditional delay in the nextSamplePoint() task is a little like the input/output skews you can define in interface clocking blocks… though in all honesty, those skews have always kind of confused me (am I the only one??) whereas this nextSamplePoint() task is pretty easy to understand; and
- by packaging the step(n) and nextSamplePoint() tasks in a macro, I can use the same logic in multiple test harnesses so clock mastering and synchronization is the same across many unit tests.
Granted, I could probably have provided the same step(n) and nextSamplePoint() API with a free-running clock, so maybe the real lesson here is to remove any direct dependencies between unit tests and the clock, not so much to duplicate the implementation I’ve chosen (though obviously I prefer what I have or I wouldn’t have written about it 🙂 ). I will point out that one downside is that this strategy falls apart when you have multiple clocks, but the lessons I’ve learned about design partitioning will at least partially address that particular downside!
More lessons to come, but so far this looks like a move in the right direction in terms of writing decent unit tests for RTL.