A handy bit of guidance I’ve gleaned from books on lean product development comes from the recognition that unfinished work sitting in someone’s in-queue, which in lean manufacturing lingo is called inventory, is waste in your development process. Lean software practitioners take things a step further and describe untested and/or unreleased code sitting on a file server as inventory. I reckon doing the same can benefit us in hardware development.
It makes sense to me that built up inventory can be responsible for poor productivity or quality. Why? Think about it… do you do a better job when the amount of work in your queue is manageable or overwhelming? Do you work better when someone shows up at your cube with 10 feature requests or 1? Do you work better under the weight of 25 outstanding bug reports or no bug reports?
I’ll take manageable over overwhelming any day so I can see minimizing the amount of untested and/or unreleased code… I mean inventory… in your development process over time is a good thing. Minimize inventory and you’re minimizing waste. Minimize waste and you’re maximizing productivity and quality.
How about an example that applies to hardware development?
Think about how work flows through RTL design. Designers are writing new code, reacting to bugs filed by the verification team and reacting to bugs filed by the implementation team. That’s a combination of new features (the green files in the pictures) and bug reports (the red files in the pictures) being added to their in-queue while they push new/revised RTL to their out-queue. As long as a designer can handle the work in his/her in-queue without getting bogged down, everything is fine… which is usually the case near the start of the project when there are only new features and not many bugs. Effectively, a designer has one input stream and one output stream where he/she can manage the flow without interruption; new features come from a spec, new RTL passes smoothly to verification and implementation. Everyone has something to work on and things are manageable.
Now pile on a few bugs from the verification and implementation teams and watch system inventory start to grow. With 3 steady streams feeding his/her input queue (1 for new features and 2 for bug reports), the designer is suddenly overwhelmed and the inventory grows. While productivity might stay the same – which I’ve actually learned is highly unlikely once a person is loaded beyond 80% capacity – it’s clear the designer has become a bottleneck. The implementation and verification teams are now waiting for new code so they’re not happy either.
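The way a backlog behaves around the capacity limit can be sketched with a toy queue simulation (the daily rates here are made up purely for illustration): while total arrivals stay below the designer’s capacity the backlog stays near zero, but the moment the combined streams exceed capacity, inventory grows without bound.

```python
def simulate(arrival_rates, service_rate, days=100):
    """Toy single-server queue: each day the input streams deposit
    work items (new features plus bug reports) into the designer's
    in-queue and the designer clears up to service_rate of them;
    whatever is left over is inventory."""
    backlog = 0.0
    for _ in range(days):
        backlog += sum(arrival_rates)               # work arrives
        backlog = max(0.0, backlog - service_rate)  # designer clears what he/she can
    return backlog

# Early in the project: one stream of new features, the designer keeps up.
print(simulate([0.8], service_rate=1.0))            # backlog stays at 0

# Later: two bug streams join and total arrivals exceed capacity,
# so roughly 0.2 items/day pile up indefinitely.
print(simulate([0.8, 0.2, 0.2], service_rate=1.0))
```

The point of the sketch is the asymmetry: below capacity, inventory is self-correcting; above it, the backlog never recovers on its own, which is why a bottlenecked designer needs outside help rather than patience.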
So what do you do here?
- Keep piling up the inventory? I’d hope not. Your designer is going to burn out. Quality and productivity are going to suffer.
- Add a designer? Possibly. You need increased design bandwidth and adding a designer is an obvious way to do that, but do you have a designer to spare? If not, then you’re going to have to get a little more creative than simply throwing more people at the problem.
- Train implementation and/or verification engineers to clear design inventory? Yes! The designer has become a process bottleneck. You don’t have anyone else to chuck into the fire. Implementation and verification engineers are already assigned to the project. They know the product and are working at a reduced throughput. Why not have them help clear design inventory?
I like the last option best, which coincidentally is found in lean manufacturing. Pretend you have 2 guys on a line, Joe passing an assembly to Jim. On Tuesday, Joe is moving a little faster so the line is starting to back up at Jim’s station. Jim’s feeling the heat so instead of continuing to pile on, Joe steps over, gives Jim a hand for a few minutes, then goes back to his own station. Jim is relieved and everything goes back to normal. Next day, Jim is moving a little faster. Instead of sitting and waiting for Joe, Jim takes a few steps over and gives Joe a hand clearing some of his inventory, then steps back to his own station.
Pretty basic teamwork, no?
Now let’s imagine the same technique used in hardware development. In my example where the designer’s inventory is building up, implementation or verification engineers can help clear it instead of continuing to allow it to grow. Depending on the skill-set of the individual, the extra help could come in the form of some exploratory bug diagnoses, full-on coding of new features or anything in between. Obviously this does require implementation and/or verification engineers to step out of their area of expertise temporarily, but they’re doing so to improve overall process throughput.
It all comes down to working toward the good of the team instead of letting one person toil while 2 others wait around.
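The help-when-backlogged policy can be sketched as a toy simulation (all numbers invented for illustration): a single designer whose daily arrivals exceed capacity, plus a cross-trained teammate who pitches in only when the backlog crosses a threshold. Without help the inventory climbs steadily; with help it stays bounded.

```python
def simulate(days, arrivals, capacity, helper_capacity, threshold):
    """Designer's in-queue over time; a cross-trained implementation
    or verification engineer adds helper_capacity whenever the
    backlog exceeds the threshold, then returns to his/her own work."""
    backlog = 0.0
    for _ in range(days):
        backlog += arrivals
        effort = capacity
        if backlog > threshold:
            effort += helper_capacity   # teammate steps over to help
        backlog = max(0.0, backlog - effort)
    return backlog

# Designer alone: 1.2 items/day arriving against 1.0 items/day of capacity,
# so inventory grows steadily.
alone = simulate(days=100, arrivals=1.2, capacity=1.0,
                 helper_capacity=0.0, threshold=3.0)

# Same load, but a helper contributes 0.5 items/day once the backlog
# passes 3 items; inventory never runs away.
helped = simulate(days=100, arrivals=1.2, capacity=1.0,
                  helper_capacity=0.5, threshold=3.0)
print(alone, helped)
```

Nothing about the helper is permanent here: most days the teammate does his/her own job, and the occasional borrowed effort is enough to keep the whole line flowing.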
So you’re telling me to turn my implementation and verification experts into design experts?
I knew you’d ask that question! No… I’m not suggesting that at all. I’m pointing out that with minimal cross-training and a little hand holding, you could easily teach implementation and verification engineers to diagnose and/or patch RTL to burn off a little inventory in your development process. Hardware developers are smart, responsible people, so to say it’s not doable seems odd to me. From experience in other industries, that kind of basic teamwork can improve quality and productivity, so why not give it a try?
-neil
Neil – all good points. I’ve noticed two other reasons why queues are so damaging to projects. The less time that has passed from when the designer writes the code to when a bug is found, the faster the designer can fix the bug, because it’s easier for the designer to remember what they were thinking when they wrote the code. Also, with a short interval there will have been fewer subsequent changes made to the code, so there’s less chance that debugging will follow paths that didn’t even exist in the version tested by verification. This is one of the great virtues of TDD in the software world. (I’m not aware of anyone doing TDD for RTL.) So, for the designer, long queues for bugs increase the amount of time it takes to fix the bugs, and the decrease in real productivity can be quite large. (That’s one of the reasons that bugs discovered in implementation can be so hard to find.)
The other reason I’ve seen for queues to be so damaging is that long queues mask the real progress made on a project. It’s very difficult to know whether untested work is really “done.” Is it sort of done? Done done? It’s way too common that functionally complete but untested code is considered 90% done, when in fact the testing and debugging may take as much time as the original coding. Long queues can hide the fact that an entire project is far behind schedule and cannot be completed on time. No matter whether it’s software, embedded software or RTL, we’ve found that reducing queues significantly reduces the overall project risk.
Brian
Hi Neil,
Interesting juxtaposition of hardware development and lean manufacturing. Also, your definition of “inventory” in hardware development is very appropriate.
To me a completely opposite solution comes to mind … rather than verification engineers doing design work (which I feel won’t work barring just the simple cases), have them focus on their own deliverables.
Change the flow of information and deploy the SoC (separation of concerns) principle. Instead of the spec feeding the designer, who passes on his/her finished goods to the verification team, have the designers and the verification engineers directly access the spec and build their deliverables.
Designers go through the testing/verification phase before checking in their code so that the need to file bugs is eliminated. This reduces the inventory throughout and, as you suggest, improves the quality.
However, this assumes a clear, constantly updated spec is written in the first place. If the spec is outdated, or management doesn’t give it a high priority, or the verification team has to constantly hunt down the designer for information, then this scheme probably won’t work.
Anupam
Anupam, thanks for the comment.
I think if you look at the overall goal as minimizing inventory (aka: untested code) then I’d expect our respective “solutions” to be complementary rather than opposing. Ultimately, teams should be looking at techniques that synchronize the progress of design and dv such that neither gets too far ahead or too far behind. That has to be the goal. If it isn’t, then you end up with the situation that Brian points out, where the quality of the code comes into question and people can’t remember what they did when it comes around to debugging it.
What you suggest is an approach resembling TDD (or more likely ATDD) and I’m just starting to realize the impact it could have on bug prevention. After some pondering over the last month, I’ve come to think TDD will eventually become standard practice in hardware development, just as has happened in sw. There would still be periodic imbalance though, so when design (or dv) inventory grows, you use a temporary re-assignment of people from one discipline to the other to correct it. Used together, I think you end up with a very good system for minimizing untested code.
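For readers unfamiliar with the rhythm, here is a deliberately tiny software sketch of TDD (plain Python standing in for RTL and its testbench; the interrupt-register layout and names are invented for illustration): the test describing the new behavior is written first, it fails against the old code, and only then is the smallest design change made so that it passes.

```python
import unittest

# Hypothetical interrupt-status bits; IRQ_TIMER is the new feature.
IRQ_RX    = 0b001
IRQ_TX    = 0b010
IRQ_TIMER = 0b100

def int_pending(status, mask):
    """Design under test: report whether any unmasked interrupt
    bit is raised. In TDD, support for the new bit is added only
    after the failing test below demanded it."""
    return (status & mask) != 0

class TestTimerInterrupt(unittest.TestCase):
    def test_timer_bit_raises_interrupt(self):
        # Written first: this defines what "done" means for the feature.
        self.assertTrue(int_pending(IRQ_TIMER, mask=0b111))

    def test_masked_timer_bit_is_quiet(self):
        self.assertFalse(int_pending(IRQ_TIMER, mask=IRQ_RX | IRQ_TX))

if __name__ == "__main__":
    unittest.main()
```

The payoff in inventory terms: the code is tested the moment it is written, so nothing sits in the “untested” pile waiting for a verification pass weeks later.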
thanks again for the comment!
neil
With the entry of OOP- and AOP-based verification themes, I see the gap between designers and verifiers widening. Some designers shun the idea of a cumbersome and bulky OOP model and are more comfortable with the traditional approaches. Take a simple example: an RTL fellow adds a new interrupt bit for an already occurring situation in the RTL. That takes him 1 hour at most to compile his code clean and check in. Mr. Verifier then proceeds to add a bit to his user configuration class, adds a testcase to create the scenario, tweaks his generators, tweaks his scoreboard, then runs his entire regression to see nothing else broke, and voila, it’s 2 days. And a very candid RTLer can guffaw at all this, citing examples of the days when traditional techniques ruled and single threading was the norm. So what I’m trying to bring to light here is that maybe Joe could help Jim, but Jim wouldn’t be as eager to help Joe.
But as stated, I do see a tremendous opportunity for load balancing if RTL and verif help each other out. Verifiers can do better debug and hand over a very thorough investigation; an RTLer can suggest test scenarios upfront if found missing in a testplan, and also suggest ways in which phased testing of a feature can be done, rather than testing a feature full blown on day one.
In the end, it’s the camaraderie that matters and leads to the least-buggy RTL.
On the idea of completely testing an RTL block before moving to the next one: one situation where this is difficult is when you are designing a number of new-technology blocks that all have inherent risk (new algorithms, for example). Until you have actually coded and sanity-tested them, it is hard to evaluate your gate count, speed and even your project schedule. To alleviate the risk, we have found it is better to code up a block and move onto the next block right away. But, as your article predicts, this has led to verification engineers debugging and fixing RTL.
A verification engineer who can dig in and debug and fix RTL code is far more valuable than one that is afraid to do so, or not approved to do so. And multi-functional engineers are more suited to the Agile principle of self-organization.
Paul,
From your text, I read “waterfall”: the RTL engineer hands over some quality level of RTL to the verification engineer, who then has to pick up the tab for all the “faults” in the RTL! TDD would say that the verification engineer does his work in parallel with the RTL development, and you may end up with RTL engineers debugging the verification.