Martin d’Anjou <martin.danjou14@gmail.com>
version 1.0, 24 October 2009
(C) Copyright Martin d’Anjou – all rights reserved
Abstract
This article gives a simplified description of a continuous integration (CI) system that automatically performs the tasks of building, testing, and tagging the source code in an ASIC development environment.
The main goal of the CI system described in this article is to automatically find, on a frequent, on-demand basis, first order issues such as syntax errors, coding rule violations, simple synthesis rule violations, missing files, and other similar problems. This gives the ASIC designers and verifiers checkouts from the CVS database that are known to compile and pass sanity tests.
Introduction
Continuous integration is widely accepted as best practice in the agile community. In particular, it is recognized as a primary practice of eXtreme Programming. The motivation behind continuous integration is to limit the time lost debugging integration conflicts that inevitably arise during development with even the smallest teams. It advocates regular integration of small updates to the code base that are as localized as possible and easily understood. This runs counter to the tendency of some developers to code for days without integration, leading to widespread and possibly incompatible adjustments to the mainline code. The purpose of the CI system is to automate the process of continuous integration so that regular integration is as fast and easy as possible.
The first part of the article describes the CI system from the point of view of the ASIC developer and the user commands to interact with the CI system. The second part of the article focuses on the details of the CI system itself. Lastly, we briefly talk about the project specific program that builds the ASIC and runs the sanity checks and tests.
The CI system from the ASIC developer point of view
The ASIC developers are mostly interested in creating new RTL code for implementing deliverable features, or in fixing existing bugs in the current code. Both activities are indistinguishable from the point of view of the CI system.
Tags are used by the CI system to manage the most recent correctly functioning version of code in the mainline development. To start a new feature development or a bug fix, the ASIC designer checks out a copy of the ASIC database from a central repository, using the most recent tag assigned by the CI system. For example, a feature called KXE needs to be developed on the gigantor ASIC, with the file versions being managed by CVS:
$ cvs co -d KXE -r ci_750_pass gigantor
This pulls in the CVS gigantor project at the ci_750_pass tag into a directory called KXE. This KXE directory becomes the working area for the KXE feature.
Pulling a stable baseline to a local working copy
With a local working copy in hand, the designer starts editing the necessary files to implement the KXE feature, and only those files. It is not recommended to tangle multiple features in the same working area. In this working copy, the designer is free to run all kinds of tests to verify his local changes. While this takes place, the CI system continues to create tags which lie ahead of the baseline checked out by the designer. At the same time, other ASIC developers continue to work and commit files to the HEAD of the existing files.
The files in the CVS database advance while local changes are made
When the KXE code is done, or when the designer needs to obtain changes that lie ahead of the ci_750_pass tag, the designer must then catch up on either or both of two fronts:
- CI tags that lie ahead of the local working copy (i.e. > ci_750_pass)
- the HEAD of the locally modified files
This can be accomplished by using raw CVS commands, or more easily with CI utility commands. The first catch up to do is with the tags that lie ahead of the local working copy:
$ ci-catchup
This command pulls in all the file sets that have passed the sanity tests at the CI server since the local checkout of tag ci_750_pass. If the catch up brings in new files that do not yet exist in the local working copy, they are added to it. If it includes files that have been removed from the most recent view of the central repository, they are removed locally as well. If conflicts arise due to the catch up, the designer must resolve them. The whole idea is to bring the local working copy to a state identical to the central repository’s most recent stable baseline, leaving the local modifications uncommitted in the local working copy.
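How ci-catchup finds the most recent stable baseline is not detailed in this article. As a minimal sketch, assuming the tags present in the repository are available one per line (the step that lists them from CVS is left out), the most recent passing tag could be picked as follows. The function name latest_pass_tag is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not the actual ci-catchup code.
# Reads tag names on stdin, one per line, and prints the passing CI tag
# with the highest request id. Anything that is not of the form
# ci_<number>_pass (for example ci_751_fail) is ignored.
latest_pass_tag() {
    sed -n 's/^ci_\([0-9][0-9]*\)_pass$/\1/p' |
        sort -n |
        tail -n 1 |
        sed 's/^\(.*\)$/ci_\1_pass/'
}
```

Fed the tags ci_750_pass, ci_751_fail and ci_752_pass, this prints ci_752_pass. It prints nothing when no passing tag is present.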
The other catch up is to the HEAD of CVS for the locally modified files. This is done with a little CVS helper script:
$ cvs-update-locally-modified-to-head
If there are merging conflicts, they must be resolved.
Once all the merges are done and the conflicts are solved, the designer can opt to execute the CI tests locally:
$ cd <top of CVS checkout>
$ ./bin/ci-local-tests
…
All tests pass. Done.
$
This command prints information as it progresses through the local tests. All output and logfiles are stored in shared temporary disk space. All the logfile parsing is performed under the hood, and the ci-local-tests command returns an exit status of zero if it passes, or non-zero if it fails. It also prints the pass/fail outcome. The designer fixes any problems and, when satisfied, commits the local changes:
$ cvs ci -m "KXE feature" file1.v file2.v file3.v
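The exit status convention of ci-local-tests described above can be illustrated with a small wrapper. This is only a sketch under stated assumptions: run_checks is a hypothetical name, each check is passed in as a command string, and the output of each check is discarded here, whereas the real program stores logfiles on shared disk space.

```shell
#!/bin/sh
# Hypothetical sketch of a ci-local-tests style wrapper.
# Each argument is a command to run. The wrapper prints the pass/fail
# outcome of every command and returns zero only if all of them passed.
run_checks() {
    failed=0
    for check in "$@"; do
        # Output is thrown away in this sketch; a real wrapper would log it.
        if sh -c "$check" >/dev/null 2>&1; then
            echo "PASS: $check"
        else
            echo "FAIL: $check"
            failed=1
        fi
    done
    if [ "$failed" -eq 0 ]; then
        echo "All tests pass. Done."
    else
        echo "Some tests failed."
    fi
    return "$failed"
}
```

Because the outcome is carried by the exit status, a commit such as the one shown above can be made conditional on the local tests passing, for example by chaining the two commands with the shell && operator.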
The CVS database immediately after this commit looks like this:
View of the CVS database after the above commit and before sending CI request 753
As hinted in the figure above, the CI request is formed and sent to the CI system. A single command performs this operation:
$ ci-request "KXE feature"
Your request: 753
The ci-request program analyzes the local working area, forms a list of all files that differ from the most recent CI catchup performed, and sends this list to the CI system.
[NOTE] To simplify the explanation, all locally modified files are included as part of the CI request as opposed to letting the user cherry pick a subset of locally modified files.
The ci-request program returns to stdout the CI request id number. This request id number will become part of the tag after the CI system is done with the request.
A CI request file for the KXE feature CI request looks like this:
REASON: KXE feature
FILES:
gigantor/design/rtl/file1.v 1.234
gigantor/verification/kxe/file2.v 1.327
…
DELETED FILES:
gigantor/lib/file3.vr
It is simply a list of file names and version numbers, accompanied by a reason for the request. Deleted files do not have a version number, since it is the HEAD revision that is used for the files that are to be removed from the baseline by the CI request. Removing a file from the baseline does not mean removing the file from CVS; it simply means that the CI tag will not be applied to the file, preventing its inclusion in subsequent stable baselines. The deleted files can later be brought back into the baseline, or removed from CVS.
[NOTE] CI tagging is a separate issue from physically deleting the file or removing it with cvs remove.
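To make the request file format concrete, here is a minimal sketch of a parser for it. It assumes the layout shown above (a REASON line, a FILES section with one "path version" pair per line, and a DELETED FILES section with one path per line) and assumes paths contain no spaces. The function name parse_ci_request and the tab-separated output records are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical parser for the CI request file format shown above.
# Reads the request from the named file(s), or from stdin, and prints
# one tab-separated record per item:
#   reason<TAB>text
#   file<TAB>path<TAB>version
#   deleted<TAB>path
parse_ci_request() {
    awk '
        /^REASON:/        { sub(/^REASON:[ \t]*/, ""); print "reason\t" $0; next }
        /^FILES:/         { section = "file"; next }
        /^DELETED FILES:/ { section = "deleted"; next }
        NF == 0           { next }
        section == "file" && NF == 2    { print "file\t" $1 "\t" $2 }
        section == "deleted" && NF == 1 { print "deleted\t" $1 }
    ' "$@"
}
```

Normalizing the request into one record per line makes the later steps (applying the versions, skipping the deleted files when tagging) simple line-oriented operations.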
Once the CI request is sent, the local working copy can be removed immediately. The two reasons for this are that the CI request is processed in the CI system’s account and not in the designer’s local working copy, and that all the files have been committed. When the CI request has been fully processed by the CI system and the outcome of the CI tests is known, the files are automatically tagged by the CI system. Two tags are possible and correspond to the two possible outcomes. In the example above, the tag ci_753_pass is applied when all the tests have passed, and the tag ci_753_fail is applied if one of the tests fails. The tag is applied to all the files in the CVS repository, except the files that are deleted. A CI report is emailed to the designer, with links and paths to the logfiles. In the worst case, the designer needs to perform a checkout at the ci_753_fail tag and execute bin/ci-local-tests to troubleshoot a failure.
Using multiple CVS checkouts
ASIC designers want to make the best use of their time so they tend to develop multiple features at the same time on the same files.
There is a big problem with this practice. Multiple features are tangled in the same local working copy. It is very hard to quickly deliver a feature when it is tangled like that. For this reason, it is best to have one working copy per feature being worked on, and to update those local working copies using the commands shown above to catch up with the most recent stable code baseline being advanced by the CI system. Multiple local working copies are represented in the figure below:
Designer can handle multiple working copies (and is happy)
CI system overview
The CI system is completely automated, meaning that no user intervention is needed for its operation, other than sending CI requests to the CI system.
As seen above, the CI request is a list of versions of files that are proposed to become part of the next stable baseline.
CI System overview
The CI system is a series of programs which together process a CI request in five steps:
- receive, check and queue CI requests
- dequeue the CI request and populate a working area
- build, check and test the code
- tag the database
- send an email notification back to the designer
The next paragraphs explain each step in detail.
Step 1: Receive, check and queue CI requests
The ci-request command is used to submit single CI requests to the CI system. This command stores the CI request in the SQL database, which then assigns it a unique id. This id is returned to the user on stdout, as we have seen earlier in this article. The SQL database is used to track CI requests over periods of time that exceed the life span of a CI request, and provides an overall project performance metric.
Pulling from the SQL database, a web server outputs pages where the CI request queue is made visible to all users, so they can see the upcoming features and bug fixes. Managers love to visit this web page. But most importantly, ASIC developers at Neterion can no longer live without this page working, as I sometimes find out in my Monday morning inbox!
Once the CI request id is obtained, the ci-request command submits the request via email to the CI robot’s email account, which is just a unix account with email. This email account is capable of running the unix procmail program. Upon reception of the email, the robot (implemented as a program called by procmail) checks the legitimacy of the request and re-orders requests by CI request id number (emails are not always received in the order CI request ids are created – so they are reordered here).
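The reordering of out-of-order emails can be sketched as follows. This is an illustration only: it assumes the robot knows the next request id it expects and holds back any request that arrives ahead of a missing one, which is one plausible reading of the behavior described above. The function name release_in_order is hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of reordering CI requests by id.
# $1 is the next expected request id; the remaining arguments are the
# ids received so far, in arrival order. Prints, in increasing order,
# the ids that can be queued now. Ids that come after a gap are held.
release_in_order() {
    next=$1
    shift
    for id in $(printf '%s\n' "$@" | sort -n); do
        if [ "$id" -eq "$next" ]; then
            echo "$id"
            next=$((next + 1))
        fi
    done
}
```

For example, with 752 expected next and requests 754, 752, 753 and 756 received, the ids 752, 753 and 754 are released in that order, while 756 waits for 755.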
The CI requests are queued in a specially configured Platform LSF queue, by the program executed under the procmail environment. It is not really the CI request that is queued, but rather the ci-request-processing program, with the request id as one of its command line arguments.
When the instance of the ci-request-processing program that processes CI request 753 reaches the head of the queue, it is dequeued and executed.
Step 2: Dequeue and populate
The dequeuing step is performed by the queueing system. In the case of Platform LSF, it takes the command at the head of the queue, sets up the environment and the paths, and executes the ci-request-processing program. The ci-request-processing program calls other programs that perform the rest of step 2, as well as steps 3 through 5.
The first operation performed by the ci-request-processing program after dequeuing is to populate the robot’s local working copy with the most recent passing code baseline (e.g. ci_752_pass). On top of this baseline, CI request 753 is applied. Applying the CI request is done by running cvs update commands with the versions listed in the CI request database entry for CI request 753.
If anything fails at this step, the CI request is rejected, and ci-request-processing jumps to step 5 immediately.
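Applying a CI request on top of the baseline can be pictured as one cvs update per requested file. The sketch below only prints the commands instead of running them. It assumes the request has already been reduced to "path revision" pairs, and that the commands would be run from a directory where those paths are valid. The function name print_update_commands is hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: print (do not run) one cvs update command per
# file in a CI request. Reads "path revision" pairs on stdin and skips
# lines that do not carry both fields.
print_update_commands() {
    while read -r path rev; do
        [ -n "$path" ] && [ -n "$rev" ] || continue
        printf 'cvs update -r %s %s\n' "$rev" "$path"
    done
}
```

For the KXE request, this would print a cvs update -r 1.234 command for gigantor/design/rtl/file1.v and a cvs update -r 1.327 command for gigantor/verification/kxe/file2.v. Printing the commands first is a design choice of this sketch: it keeps the plan inspectable before anything touches the working area.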
Step 3: Build, check and test the code
Surprisingly, the CI system does not know how to perform the details of this step. It only knows to run the special ci-local-tests command and to examine its exit status, so that is what it does. The ci-local-tests is a project specific program that builds and runs the sanity tests for the project managed by the robot. The ci-local-tests command also runs synthesis checks when either RTL is submitted or synthesis scripts are changed. The ci-local-tests file itself can be modified by users and submitted to the CI system as it is not part of the CI system but rather part of the system under test.
If ci-local-tests passes, its exit status is zero, and if it fails its exit status is non-zero. This is all the ci-request-processing program looks at.
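Since the exit status is the only input, the pass/fail decision reduces to a single test. As a minimal sketch, using the ci_<request_id>_pass and ci_<request_id>_fail tag names from this article and a hypothetical function name:

```shell
#!/bin/sh
# Hypothetical sketch: derive the CI tag name from the request id ($1)
# and the exit status of ci-local-tests ($2).
ci_tag_for_status() {
    request_id=$1
    status=$2
    if [ "$status" -eq 0 ]; then
        echo "ci_${request_id}_pass"
    else
        echo "ci_${request_id}_fail"
    fi
}
```

Calling it with 753 and an exit status of 0 prints ci_753_pass; any non-zero status prints ci_753_fail.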
[NOTE] It is worth noting that the faster ci-local-tests runs, the quicker the turn around time at the CI server. Techniques that make ci-local-tests execute as fast as possible include the extensive use of GNU Make and its ability to build prerequisites in parallel on a queueing system. Using GNU Make to optimize this step has resulted in a 12-minute turn around time at the CI server per CI request at Neterion Corporation where such a system has been in use for several years. The ci-local-tests program covers key block level and top level environments, synthesis checks, and critical script changes.
[NOTE] It is also worth noting that the CI system is not concerned with logfile parsing as the exit status is all that is needed to make a pass/fail decision. All logfile parsing is buried in the deeper layers of the programs called by ci-local-tests.
All the logfiles, compiler output and post-processed files are available to the users while the CI request is being processed, as well as after the CI request is fully processed. A cronjob ensures all output resulting from processing a CI request is available for a period of up to 7 days. After this period, it archives the most important data (logfiles) and throws away the intermediate compiler output. The last 7 requests are always kept in full.
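The retention rule combines two conditions: age, and membership in the last 7 requests. A sketch of the selection logic, assuming the cronjob can produce a list of "request_id age_in_days" pairs (how it does so is not covered here), and treating "older than 7 days" as a strict comparison, which is an assumption. The function name prunable_requests is hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of the retention rule.
# Reads "request_id age_in_days" pairs on stdin and prints the ids whose
# intermediate output may be discarded: older than 7 days AND not among
# the 7 most recent requests. Ids are printed newest first.
prunable_requests() {
    sort -k1,1nr |
        awk 'NR > 7 && $2 > 7 { print $1 }'
}
```

A request that is older than 7 days but still among the 7 most recent ones is kept in full, which matters during quiet periods when few requests are submitted.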
Step 4: Tag the code
If ci-local-tests passes, a tag of ci_<request_id>_pass is applied to the entire CVS database marking a new baseline of stable code (except for the deleted files which are not tagged). In the example we’ve used so far, this would be ci_753_pass.
If ci-local-tests fails, a tag of ci_<request_id>_fail is applied. In this case, the tag marks a failed attempt at moving the stable baseline. The value of marking the failed attempt is that it makes it easy to reproduce the failed baseline and fix it with a new CI request.
The new baseline tagged
All the files (except the deleted files) are tagged with ci_753_pass or ci_753_fail: the files of CI request 753 are tagged at the versions specified in the CI request, and the files from the previous stable baseline, which are not specified in the CI request, are tagged at their previous stable baseline (ci_752_pass).
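The composition of the new baseline can be sketched as a merge of three lists. This is an illustration under assumptions: the previous baseline and the CI request are each available as a file of "path revision" lines, the deleted files as a file of paths, and paths contain no spaces. The function name plan_new_baseline is hypothetical, and the sketch only computes which revision of each file would receive the new tag; it does not run any CVS command.

```shell
#!/bin/sh
# Hypothetical sketch: compute the contents of the new baseline.
# $1: file of "path revision" lines for the previous stable baseline
# $2: file of "path revision" lines for the files in the CI request
# $3: file of paths deleted by the CI request
# Prints "path revision" for every file that receives the new tag.
plan_new_baseline() {
    awk '
        NF == 0             { next }
        FILENAME == ARGV[1] { rev[$1] = $2; next }   # previous baseline
        FILENAME == ARGV[2] { rev[$1] = $2; next }   # request overrides it
        FILENAME == ARGV[3] { delete rev[$1] }       # deleted: not tagged
        END { for (path in rev) print path, rev[path] }
    ' "$1" "$2" "$3" | sort
}
```

Files named in the request take the requested revision, files absent from the request keep their previous baseline revision, and deleted files drop out of the list entirely.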
Step 5: Send email notification to the designer
The last step is to send an email back to the original designer, with a copy to the CI administrator. With proper email forwarding, the designers can reply to this email and contact the CI administrator in case they need clarification on the status of their CI request.
This concludes the tour of the CI system. Many details have been left out, but hopefully this gives the reader a good view of a Continuous Integration System for an ASIC development environment using CVS for source code management.
The next section briefly talks about the program behind which the project specific magic happens: ci-local-tests.
ci-local-tests: the project specific command
The ci-local-tests program is not a part of the CI system, but rather a project specific program which builds all the block level and top level environments, executes the sanity tests, and runs various other checks.
Among other things, the presence of RTL in the CI request triggers the execution of various RTL synthesis and checking programs. When RTL is present, the turn around time at the CI server ranges from 20 minutes to 2 or 3 hours per request. However, we have found that the turn around time of non-RTL CI requests is so fast that it encourages frequent commits and thus continuous integration of new code into the ASIC database.
The ASIC verification engineers use the ci-local-tests program to run a quick verification-only unit test on the packet generator. This keeps the verification environment stable.
Designers have found that running ci-local-tests locally on uncommitted code is much faster than attempting to run each block environment sanity test by hand because ci-local-tests is optimized for parallel execution. It has thus become the standard command to run to verify that local changes do not break existing code.
Since the ci-local-tests runs quickly, iterating through first order bugs can be done multiple times per hour, in all the block level and top level environments at the same time, without the need to wait for long regressions.
[NOTE] At the same time, anything not covered by a sanity test can still suffer from basic integration problems.
Having a single point of entry like ci-local-tests into the CVS project has allowed the CI system to remain independent of the CVS projects it manages. As long as those CVS projects provide the ci-local-tests program in the same relative path, the same CI system can be used without being re-configured for each ASIC project.
The CI system is however strongly tied to the CVS source code management tool.
Conclusion
Fast turn around time at the CI server and ease of integration with current CVS practices are the two most important characteristics of the CI system presented in this article. There is great value in having a CI system continuously at work to move the stable code baseline ahead several times per day.
The ASIC designers and verifiers at Neterion are no longer hampered by broken code. The physical layout team can pull from any of the passing CI tags whenever it needs to start a synthesis round, knowing that all first order issues have already been addressed and pre-synthesis checks have passed.
With the CI system, the team gets feedback on the code several times per day. The team has adopted ways to implement features that maximize the chance of having successful CI requests. They are also better able to predict the scope of the upcoming features and bug fixes because they are constantly aware of all the sanity tests that keep their code in check. They commit code often and want it tested often too. Gone are the days when code was thrown over the wall to verifiers or to the backend team. ASIC designers no longer keep multi-feature, massive code changes to themselves for weeks, and now mostly prefer smaller, more frequent incremental commits because they get LOTS of passing CI requests for their work.
Although this CI system is based on CVS source code management, the same principles are applicable to other source code management tools.
Acknowledgments
The CI system has been in use for several years at Ottawa-based Neterion Corporation, and is the result of years of fine tuning and incalculable user feedback. The current version is written almost entirely in GNU Bash version 4, and ci-local-tests uses a locally modified version of GNU Make 3.81 which aborts at the first error. The CI system is an original idea by L. Smith, implemented by C. J. Sheppard, improved by B. Morris and re-written and optimized by the author. The author would also like to thank the reviewers, P. Bird and N. Johnson, for carefully reviewing this document.
The author also wants to thank ams for the help with the GNU Make hack that made the 12-minute turn around time possible, Paul D. Smith for patiently answering his numerous GNU Make questions over the years, and greycat for his amazing BashFaq, Wiki and patience on the freenode #bash channel.
Suggested reading
- The GNU Make manual
- Recursive Make Considered Harmful
- bash version 4 man page
- A CVS book
- How to get make to terminate itself at the first error
- Procmail tips