Operation Basic Sanity: A Faster Way To Sane Hardware

Neil Johnson – Principal Consultant, XtremeEDA Corp

Basic Sanity in Hardware Development

You’ve written your test plan, defined–and re-defined–your coverage model, architected your verification environment, coded your environment, written your first test, used that test to debug your own code and resolve the first serious issues with your design. Time has passed and after months of effort, you finally–finally–have your first packet through. It doesn’t matter that it’s the simplest packet possible or that it barely qualifies as being part of a real use case; this packet is special because it means that this week you get to report real progress: you put a packet in and you got a packet out.

Basic sanity is a critical milestone in hardware development because it provides the first objective measurement of a team’s progress. Basic sanity is a celebration milestone. Everyone understands it, development team and management team alike. It’s significant technically because it provides an indication that a device is verified to be at least partially correct. It’s also significant mentally and emotionally because it’s the first time the team sees weeks and months of hard work paying off.

This article aims to reinforce the importance of the basic sanity milestone with an approach to building a sane design faster. The most significant characteristic of the approach is that it requires no new tools nor does it depend on state-of-the-art languages, verification libraries or techniques. Teams do the things they’ve always done. All that changes is when they do them.

The Usual Road To Design Sanity

There are a number of common steps that every team will take on their way to finding a functionally sane design.

Planning

Most products start as a high-level functional specification. From that functional specification, the design, DV and modeling teams derive a set of project artifacts to define development with respect to their particular function. Design teams create detailed architecture documents. DV teams create a verification plan that includes a description of the DV environment, functional coverage model and/or test list. A dedicated modeling team builds their own architecture document with descriptions of the modeling algorithms. Each team also creates a detailed schedule that is rolled into one master schedule.

The amount of time dedicated to planning and the rigor of the process vary greatly depending on the team. The results, however, are similar between teams in that the planning phase is used to define the details and pace of development for the entire project, kick-off to delivery.

Coding

Plans in hand, the design team goes away and creates the design, the DV team creates the DV environment and the modeling team builds the reference models. Development of each is normally measured block-by-block where developers complete one block before moving on to the next.

Compilation

While developers more than likely compile their code as they write it, there is a point where both the DUT and modeling components are integrated into the verification environment and the entire blob of code is compiled and simulated for the first time. Tabulating the list of files and cleaning up first-order syntax errors are the first priority at this point. Any obvious problems with respect to low-level compatibility are rectified here as well.
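
As a sketch of what that first integrated build might look like, a simulator file list pulls the RTL, reference model and testbench together in one place. Every file name and directory here is invented for illustration:

  // files.f -- hypothetical compile list for the first full-environment build
  +incdir+tb/include
  rtl/ingress.sv
  rtl/egress.sv
  rtl/top.sv
  models/ref_model.sv
  tb/tb_pkg.sv
  tb/tb_top.sv

Compiling everything through a single file list (most simulators accept one via a -f switch) also gives the whole team one answer to "which files and which versions" from day one.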

Sandbox Test Debug

What’s a sandbox test? It’s not the sanity test but it’s a start. The sandbox test is an un-official, possibly even private test (i.e. not submitted for revision control) that doesn’t exist in the verification plan. A sandbox test is where a lot of design and DV engineers begin the debug process. It starts as an extremely basic test and evolves as the maturity of the design, DV environment and reference model permit.

Because a developer is working with pre-sanity code, they are looking for visual confirmation that development is on the right track. That being the case, the checks in a sandbox test are log messages, file I/O, waveforms and eyeballs as opposed to automated checkers and scoreboards.
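
To make the idea concrete, here is a minimal sketch of an early sandbox test. Every name in it is invented, and the one-cycle passthrough module stands in for a real DUT so the sketch runs on its own:

  // sandbox.sv -- a throwaway sandbox test; the "checker" is $display plus eyeballs
  module passthru (input  logic clk,
                   input  logic [7:0] d_in,
                   input  logic       v_in,
                   output logic [7:0] d_out,
                   output logic       v_out);
    // one-cycle passthrough stand-in for the real DUT
    always_ff @(posedge clk) begin
      d_out <= d_in;
      v_out <= v_in;
    end
  endmodule

  module sandbox;
    logic       clk = 0;
    logic [7:0] d_in, d_out;
    logic       v_in = 0, v_out;

    always #5 clk = ~clk;

    passthru u_dut (.*);

    initial begin
      // drive one beat of stimulus...
      @(posedge clk);
      d_in <= 8'hA5;
      v_in <= 1;
      @(posedge clk);
      v_in <= 0;

      // ...and eyeball the result in the log; no scoreboard yet
      wait (v_out);
      $display("[%0t] d_out = 0x%0h (expecting 0xA5 by eyeball)", $time, d_out);
      $finish;
    end
  endmodule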

A sandbox test will go through a few recognizable stages. As each stage is cleared, the sandbox test gets a little more involved until it eventually becomes a sanity test.

Clock and Reset Sanity

For clock and reset sanity, verifying clock and reset propagation throughout the design is the goal. The simplest thing to do here is to look for X’s and Z’s that point to drive conflicts and no-connects. This is normally pretty easy from a functional standpoint, depending on the number of clock and reset domains in the design. But as simple as it is, if clock and reset sanity is skipped it’s sure to cause problems down the road (i.e. “why the heck am I getting X’s on my data here?”).
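
A sketch of what such a check might look like using a SystemVerilog assertion, assuming an active-low reset (the signal names are placeholders); a checker like this can be attached to existing blocks with a bind statement:

  // x_check.sv -- flag X/Z on a signal once reset has deasserted
  module x_check #(parameter W = 8)
                 (input logic clk, rst_n,
                  input logic [W-1:0] data);
    // $isunknown returns 1 if any bit of its argument is X or Z
    assert property (@(posedge clk) disable iff (!rst_n) !$isunknown(data))
      else $error("[%0t] X/Z on data after reset deassertion", $time);
  endmodule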

Design Configuration and Power-up

There aren’t many cutting-edge ASIC and FPGA developments that don’t rely on some sort of embedded software component. For sanity, however, embedded processors and software are usually excluded because of the performance hit. What is verified during this step is access to the relevant configuration space and the power-up procedure. This is where the first real functional bugs are found. Incompatibilities on the pin-level interface or in the definition of the address space are common and relatively easy to identify. Incorrect register settings or misunderstandings when it comes to the power-up procedure are more difficult to diagnose and often creep further into the process of finding sanity.
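
As an illustration of the configuration half of this step, the sketch below boils it down to a couple of register accesses. The addresses, the ID value and the reg_read/reg_write tasks are all stand-ins for a real bus-functional model; a tiny register array is included so the sketch runs standalone:

  module config_sanity_sketch;
    logic [31:0] regs [4];  // stand-in register file, in place of a real DUT

    task automatic reg_write(input logic [31:0] addr, data);
      regs[addr[3:2]] = data;
    endtask

    task automatic reg_read(input logic [31:0] addr, output logic [31:0] data);
      data = regs[addr[3:2]];
    endtask

    initial begin
      logic [31:0] rdata;
      regs[0] = 32'hCAFE_0001;  // pretend this is an ID register's reset value

      // step 1: confirm the address map by reading a known reset value
      reg_read(32'h0, rdata);
      if (rdata !== 32'hCAFE_0001) $error("bad device ID: 0x%0h", rdata);

      // step 2: confirm write access; set an enable bit and read it back
      reg_write(32'h4, 32'h1);
      reg_read (32'h4, rdata);
      if (rdata[0] !== 1'b1) $error("enable bit did not stick");
      else $display("config/power-up sanity passed");
    end
  endmodule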

Confirm Design Response

Once the design is configured and through the power-up procedure, it’s time to start applying stimulus. If it’s not part of the default behaviour of the DV environment, the sandbox test is updated to initiate a data transfer, exercise a peripheral, trigger an ISR, toggle an input or perform some other relevant behavior that is supposed to result in a response from the design. That response rarely happens on the first shot and it never happens correctly.

This is the start of what is usually the longest and most intense debug cycle of the entire development effort. Debug starts with how the DV environment is stimulating the design. Eyeball checking ensues as debug effort filters down through design layers and eventually ends with an eyeball confirmation of an expected response. If the team has done its homework, the worst case outcome will be bug fixes. In some cases, however, this step will uncover the first major architectural deficiencies and initiate the necessary changes.

Closing The Loop

With the design responding correctly to sanity-level stimulus, the final hurdle to jump is the termination of that response in the DV environment.

In a directed test, the design response might be compared against some pre-determined, test specific response. In a constrained random environment, this is more involved and means debugging the automated checking path through the reference model and scoreboards. The time dedicated to debugging the automated checking can rival that spent on debugging the design because most of the same characteristics apply to both the reference model and design. It is also the first opportunity to identify unintended differences in functionality between the design and reference model.
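
As a sketch of the automated checking path being closed, assuming an in-order packet stream (the class and method names here are invented, not from the article):

  class packet;
    rand bit [7:0] data;
    function bit compare(packet other);
      return (data === other.data);
    endfunction
  endclass

  class scoreboard;
    packet expected_q[$];  // filled via the reference model path

    // the reference model pushes what the design should produce
    function void write_expected(packet p);
      expected_q.push_back(p);
    endfunction

    // the output monitor pushes what the design actually produced;
    // "closing the loop" means this fires and the compare passes
    function void write_actual(packet p);
      packet exp;
      if (expected_q.size() == 0) begin
        $error("actual packet arrived with nothing expected");
        return;
      end
      exp = expected_q.pop_front();
      if (!exp.compare(p))
        $error("mismatch: expected 0x%0h, got 0x%0h", exp.data, p.data);
    endfunction
  endclass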

Committing A Sanity Test

The design is sane so it’s time to put the finishing touches on the sanity test and commit it for revision control. The sanity test likely requires some fine-tuning and cleanup to qualify as part of a formal test list. A team may also try running the sanity test with several different seeds to ensure that any configuration and stimulus have been properly constrained and resolved.
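
One portable way to manage those seeds, rather than relying on simulator-specific switches, is to pass the seed as a plusarg and seed the test’s own random number generator. A minimal sketch, with +SEED as an invented convention:

  // vary +SEED=<n> across runs to shake out under-constrained
  // configuration and stimulus
  module seed_sketch;
    initial begin
      int unsigned seed = 1;  // default seed
      void'($value$plusargs("SEED=%d", seed));
      process::self().srandom(seed);  // seed the RNG for this process
      $display("running sanity with seed %0d", seed);
    end
  endmodule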

How Sanity Can Drive You Crazy

Initial debug of any design can feel like an open-ended process where you never really know where you are with any certainty until you’re almost done. What makes the road to sanity so frustrating?

  1. “It’s been a while, do we know if this thing works yet?”: that’s a common question from management. The team has done a lot of development and made good progress with the planning and coding but many managers seem to only understand tests passing[1]. Until your first test passes, it’s hard to come up with a good, objective measure of how well the design is functioning that satisfies a manager’s curiosity.
  2. Sanity requires code that’s done: when development proceeds block-by-block, the last block to be coded is normally the one that gates sanity testing (give or take a few weeks). Waiting this long to verify a design is sane can lead to intense schedule pressure.
  3. You can’t find compatible versions of the DV environment, RTL and model: there are tons of files and you don’t know where they all live or which versions to use.
  4. You’re forced to debug code that doesn’t matter yet: You’re only ever interested in such a small part of the design, DV environment and model to achieve sanity. Yet because it’s common practice to write all the code up front, you end up hitting problems that ideally you shouldn’t have to encounter until well into the DV effort.
  5. Intent has changed but code hasn’t been updated: take anybody that has been writing code for 3 or 4 or 6 months straight–maybe even longer–and compare the code they wrote first to the code they wrote last. Then measure all that code against the developer’s intent, realizing that their intent has very likely evolved over time. A lot of early code will either need to be updated or removed entirely, though that never seems to happen before hitting that stale or superfluous code during sanity testing.
  6. Nobody recognizes their code when asked to debug it: take the same person and ask them to debug the code that they wrote 3 or 4 or 6 months ago. That code will be hard to recognize, to say the least, making bugs hard to diagnose.
  7. You don’t have a goal for sanity: this is a trap primarily for teams doing constrained random testing. You start with a constrained random test (more random than constrained) and go wherever the test takes you as opposed to limiting stimulus and configuration to a sanity subset. This happens when teams fall for the mystical 1-test-testsuite: one unconstrained test that is supposedly able to cover the entire design space.
  8. The DV environment doesn’t match the design: as a verification engineer, it can be frustrating to build a bus model according to specification, only to find that what you’ve built is incompatible with the current version of the design and no one has told you what has changed.
  9. The design doesn’t match the reference model: same as the above but this can be particularly frustrating for DV engineers because they’re debugging reams of code that belong to two different teams.
  10. “Uh-oh… I didn’t know <it> worked like that”: the granddaddy of them all… it’s 4:30 on a Friday, you’re 5 months into the project and you’ve just now discovered that the assumptions made during planning the design, DV environment and/or reference model are entirely incorrect. Back to the drawing board!

A Faster Way To Sane Hardware

All engineers are used to using decomposition as a means of simplifying problems. The practice of module and sub-system level verification prior to verification of a full chip is a perfect example. Verifying an entire chip all at once can be an extreme challenge so teams break the challenge into smaller manageable pieces, verify them in isolation and then put the pieces back together. This is decomposition: turn big problems into a series of smaller problems and then solve those problems one manageable piece at a time.

The approach detailed in this section involves treating one project as a series of smaller projects. The first project is called Operation Basic Sanity and the first deliverable is a sane device. Treating one project as several smaller projects is simply an extra level of decomposition that is not normally done in hardware development. Instead of stopping at the module or sub-system boundary, the development challenges are decomposed further to the feature/function boundary. A chip is broken into modules; the modules are broken into features. Features are incrementally verified to deliver a module; the modules are individually verified to deliver a chip. The planning is the same. The code is the same. The deliverables are the same. The only difference is that all of the above are done in smaller pieces.

Start With a Definite Goal

Since achieving basic sanity is the first truly objective milestone the team is due to hit, make that the development goal from the outset. Start with a goal of achieving basic sanity and make sure that goal is shared by the entire front-end development team. Determine a set of minimum criteria for basic sanity and work together to achieve it.

Rev 0.1 Architecture Planning

When teams take the time to plan in detail to the end of the project, they fool themselves into thinking they’ve got all the bases covered, but plans necessarily change due to dynamic project circumstances. Change will happen regardless of how strongly it’s resisted. The only viable option for teams is to adopt a planning strategy that allows them to absorb change.

This approach to achieving basic sanity as quickly as possible assumes the team will spend a comparable length of time dedicated to planning. The planning, however, gets spread over the life of the project as opposed to it all happening up-front. The planning starts with basic architectural decisions. Later on comes the implementation details.

In the initial planning, a team should limit itself to coarse architectural decisions. Call this the rev 0.1 architecture planning. It should be descriptive yet also as concise as possible. The details will almost inevitably change so avoid them until later in the process.

Design Architecture Planning

  • Identify modules and interfaces/protocols
  • Create block diagrams that show partitioning, hierarchy and connectivity
  • Estimate memory requirements
  • Identify high-level operating modes
  • Assign functions and requirements across each module

DV Environment and Reference Model Planning

  • Identify test hooks, environment components, transactions and interfaces
  • Create block diagrams that show partitioning, hierarchy, connectivity and interaction
  • Assign requirements and functions across each component
  • Devise a high-level re-use strategy
  • Devise a model integration strategy

Integrate First

Integrating code first is intended to confirm some of the architectural decisions made during the rev 0.1 planning. Integration requires that components be defined and connected, though they don’t yet contain functionality. In effect, the team builds a skeleton of the design and reference model and integrates both into the testbench.

Building A Design Skeleton

  • Define modules and interfaces
  • Instantiate and connect modules and memories (block and top level)
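
As an illustration, a skeleton for a hypothetical two-block design might look like the following. All names are invented, and the outputs are tied off so the skeleton elaborates cleanly without spraying X’s downstream:

  // skeleton: ports defined and connected, no functionality yet
  module ingress (input  logic       clk,
                  input  logic       rst_n,
                  input  logic [7:0] rx_data,
                  input  logic       rx_valid,
                  output logic [7:0] pkt_data,
                  output logic       pkt_valid);
    // functionality comes later; tie off outputs for now
    assign pkt_data  = '0;
    assign pkt_valid = 1'b0;
  endmodule

  module egress (input  logic       clk,
                 input  logic       rst_n,
                 input  logic [7:0] pkt_data,
                 input  logic       pkt_valid,
                 output logic [7:0] tx_data,
                 output logic       tx_valid);
    assign tx_data  = '0;
    assign tx_valid = 1'b0;
  endmodule

  module top (input  logic       clk,
              input  logic       rst_n,
              input  logic [7:0] rx_data,
              input  logic       rx_valid,
              output logic [7:0] tx_data,
              output logic       tx_valid);
    logic [7:0] pkt_data;
    logic       pkt_valid;

    ingress u_ingress (.*);
    egress  u_egress  (.*);
  endmodule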

Defining DV And Modeling Components

  • Define interfaces, methods, components, transactions and other public members required for connectivity
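
If the team uses a UVM-style class-based environment (the article doesn’t prescribe one), a skeleton component can be as thin as the sketch below: the public connectivity is declared so other components can hook up now, while behaviour is deferred.

  // skeleton agent: connectivity only, behaviour comes later
  import uvm_pkg::*;
  `include "uvm_macros.svh"

  class pkt_agent extends uvm_agent;
    `uvm_component_utils(pkt_agent)

    // placeholder analysis port so subscribers can connect now
    uvm_analysis_port #(uvm_sequence_item) ap;

    function new(string name, uvm_component parent);
      super.new(name, parent);
      ap = new("ap", this);
    endfunction

    // build_phase/run_phase intentionally left empty at this stage
  endclass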

Integrate The Design and Model Into The DV Environment(s)

  • Instantiate and connect components in block level environments
  • Integrate the DUT and model into the block level environments
  • Create a top-level environment by instantiating and connecting components and/or reusable block level environments
  • Integrate the DUT and model into the top-level environment
  • Ensure all of the above compile successfully in the simulator
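
Pulling those steps together, the integrate-first deliverable looks something like the testbench top below: the skeleton DUT is instantiated, the whole thing compiles, elaborates and clocks, and nothing more. This sketch reuses the invented names from the skeleton above:

  module tb_top;
    logic       clk = 0, rst_n = 0;
    logic [7:0] rx_data = '0, tx_data;
    logic       rx_valid = 0, tx_valid;

    always #5 clk = ~clk;

    top u_dut (.*);  // the skeleton DUT from the previous sketch

    // a skeleton reference model would be instantiated here the same way

    initial begin
      repeat (4) @(posedge clk);
      rst_n = 1;
      repeat (20) @(posedge clk);
      $display("skeleton compiled, elaborated and clocked");
      $finish;
    end
  endmodule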

Review And Revise The Rev 0.1 Plan

The point of integrate first is to confirm assumptions made during the architecture planning, so before moving on it’s important to measure the skeleton code against the rev 0.1 plan. There will almost certainly be cases where the exercise of defining and integrating components has nullified some assumptions made during planning. For each of those cases, use the knowledge gained from building the skeleton code to update the rev 0.1 plan.

The review of the rev 0.1 plan is the first of many checkpoints where the team reviews its progress against the plan and makes updates as required. From here on, development follows the same simple cycle:

  1. complete a portion of the detailed planning
  2. write the code that corresponds to that portion of the plan
  3. review the code against the plan, updating either or both as required
  4. repeat steps 1–3 until done

Rev 0.2 Implementation Planning

With some architectural and partitioning decisions confirmed by the integrate first exercise, it’s time to start planning the details of implementation for each of the design, DV environment and reference model. Consider this the start of a project within the project where the deliverables are of a much smaller scope (i.e. a small device with a handful of sanity-level features).

The rev 0.2 updates to the existing plan include all the implementation details required to get to basic sanity but not much further. Just as any team would normally start, consider the features that are required. Plan how those features are to be implemented at the same level of detail the team is used to (i.e. module/component/class descriptions, timing diagrams, FSM state diagrams, etc).

Write The Sanity Test

The team has already decided the criteria for basic sanity, so forgo the sandbox test and go straight to writing the sanity test. Compile and run the test with the skeleton DV environment (hint: the test should fail!). Commit the test to revision control so that it is accessible to the entire team.

If your team is using a constrained random approach, seriously consider a directed test for basic sanity. A directed test is another way to narrow down the state space and avoid losing focus on the sanity objective.
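
A sketch of what a directed sanity test might look like against the invented skeleton from earlier. Note that it fails, by timeout, until the sanity features are actually coded, which is exactly what’s wanted at this stage:

  module sanity_test;
    logic       clk = 0, rst_n = 0;
    logic [7:0] rx_data = '0, tx_data;
    logic       rx_valid = 0, tx_valid;

    always #5 clk = ~clk;

    top u_dut (.*);  // skeleton DUT; the test fails until it's populated

    initial begin
      repeat (4) @(posedge clk);
      rst_n = 1;

      // directed stimulus: one fixed byte
      @(posedge clk);
      rx_data  <= 8'h3C;
      rx_valid <= 1;
      @(posedge clk);
      rx_valid <= 0;

      // directed check: expect the same byte back within a bounded window
      fork
        begin
          wait (tx_valid);
          if (tx_data === 8'h3C) $display("SANITY PASS");
          else                   $error("SANITY FAIL: got 0x%0h", tx_data);
        end
        begin
          repeat (100) @(posedge clk);
          $error("SANITY FAIL: no response from the design");
        end
      join_any
      disable fork;
      $finish;
    end
  endmodule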

Write The Sanity Code

Code the features and functions that are required for sanity within the skeleton, populating each component to the extent that it supports the basic sanity objective. Be aware that the team will not progress block-by-block, as would normally be the case. While some components may be completed entirely in the first round of coding, others may include very little code or even none at all. This applies to all of the design, DV environment and reference model.
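
One way to keep the unwritten parts honest is to have partially populated components fail fast on anything outside the sanity subset, rather than silently misbehaving. A sketch, with all names invented:

  typedef enum { SIMPLE, JUMBO, ERRORED } pkt_kind_e;

  class pkt_txn;
    pkt_kind_e kind = SIMPLE;
    bit [7:0]  data;
  endclass

  class pkt_driver;
    // only the sanity subset is implemented; everything else fails fast
    task drive(pkt_txn t);
      if (t.kind != SIMPLE)
        $fatal(1, "pkt_driver: %s packets not implemented yet", t.kind.name());
      $display("driving SIMPLE packet 0x%0h", t.data);
      // pin wiggling for the simple case would go here
    endtask
  endclass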

Debug The Sanity Test Pass

The only substantial difference when it comes to debugging the sanity test, relative to what is traditionally done in hardware development (where all the coding is done at once), is that there is less code. Regardless of the difference in code base at the time the sanity test first runs, that code is sanitized using the same steps: verify clocking and reset logic, verify configuration and power-up procedures, apply stimulus and verify the response from the design, close the loop through the automated checking and commit the final version of the sanity test.

Review And Revise The Rev 0.2 Plan

Sanity ends with a review of the rev 0.2 plan. As a group, compare the code and functionality of the sanity test against the rev 0.2 implementation planning. Revise the plan if necessary to keep it current.

Final Considerations

As discussed, the approach outlined above does not require a team to move very far out of its comfort zone in order to dramatically change how it delivers results, particularly early in a project. The underlying theme of the approach is to decompose a design further into pieces that correspond to functions and features. Then, working together as a front-end team of designers, DV and modeling engineers, build and verify those functions and features. While sanity is the first major functional milestone, teams are encouraged to continue with the approach of developing features in plan, code and review cycles.

The final word on this approach is reserved for planning. The motivation behind spreading detailed planning across the entire project comes down to a matter of visibility. If a team feels that they have absolute visibility into the requirements of a product and they are confident that the details of the plan will not change, they may choose to complete their planning before any development begins. If, like most projects, however, there is some feature creep expected and/or some development or exploration is required before the details of the design solidify fully, seriously consider delaying planning and implementation decisions until they are absolutely necessary. This just-in-time (JIT) decision making can help to eliminate re-work of original planning, enable teams to make better decisions based on feedback from past development and keep project documentation current.

Summary

To close, let’s restate the importance of basic sanity. Basic sanity is the first time a product is objectively verified as having performed some subset of its functionality successfully. Basic sanity is tangible, identifiable and meaningful to members of the development team, management and customers. With it comes a confirmation of direction, increased confidence within the development team and a tangible baseline for subsequent development.

Development teams should be getting to basic sanity as fast as possible. The approach detailed in this article, coupled with the tools and techniques a team has always used, can help them do it.


[1] To be fair, some managers are starting to understand functional coverage results. But coverage results don’t mean much until they come from a passing test.
