Eliminating the "Long Loop" in FPGA Design

Contributor: GateRocket, Inc.

July 12, 2010 -- The number of FPGA design starts continues to grow at an ever-increasing rate. In some cases, teams which previously focused on ASIC designs are migrating to FPGA implementations. This is because modern, high-end FPGAs have the capacity and performance capabilities required by many applications, but without the expense, risks, and time-to-market delays associated with ASIC technologies. In other cases, research and development teams working on projects like robotic vision systems may need some way to accelerate their algorithms, and FPGAs offer an ideal solution.

One thing that is common to newcomers to the FPGA domain is an underlying belief that working with these components is relatively easy, fast, and painless. Most folks involved in any form of electronics design have at least a rudimentary understanding of how FPGAs work. In particular, they know that SRAM-based FPGAs can be reprogrammed with a new configuration as required. Unfortunately, there is also an unstated impression that the process of capturing a design, translating it into a configuration file, and loading that configuration into the FPGA consumes relatively little time and effort.

This may have been true 20 years or so ago when FPGAs contained the equivalent of only a few thousand logic gates. But today's state-of-the-art FPGAs can contain the equivalent of millions of logic gates, thousands of DSP functions, megabits of RAM, and a multitude of other hard IP core functions. This causes major problems with regard to verifying the functionality of the design because a software simulation run of the full-chip RTL that once completed in hours can now take days or weeks.

The solution is to migrate as much of the design into a physical FPGA as soon as possible, because this will allow those portions to be run at-speed, and it will also dramatically reduce the loading on the software simulator. Unfortunately, many elements of the design process are being stressed to the breaking point. For example, full-chip logic synthesis and place-and-route (PAR) runs that used to complete during lunch can now exceed 24 hours. This means that whenever a bug slips through to the system test lab and requires a change to the FPGA design, it can take more than a day to get the device re-programmed with a fix ready for testing.

The result is a "Long Loop" with regard to detecting, isolating, debugging, and fixing a bug. In many cases, actually identifying the source of a bug can be problematic, because bugs can be introduced at any stage of the design process. Furthermore, since one bug may mask several others, it is not uncommon to re-spin the FPGA and re-test it in the system, only to discover that additional changes are required. It's easy to see how this slow, iterative, "Long Loop" process can become unwieldy, and can lead to weeks or months of project delays. So, is there any way in which we can eliminate the "Long Loop"? Read on...

I see bugs everywhere...

When it comes to FPGA design, bugs can be introduced anywhere in the design flow. Consider the (very high-level) view of the design flow illustrated in Figure 1. Purely for the sake of these discussions, we'll restrict ourselves to considering only a few of these design flow elements: IP selection and integration, RTL design, synthesis, and place-and-route.
 

Figure 1. Bugs can be introduced at any point in the design flow.


Let's start with the IP. In the case of ASIC designs, any third-party IP is typically presented in the form of RTL (it may be encrypted, but it is still RTL). This means that the RTL representations that are used for initial software simulations are subsequently synthesized, placed, and routed along with the rest of the design. This provides a reasonably high level of confidence that the RTL and gate-level representations of the design are functionally equivalent.

In the FPGA domain, by comparison, it's common to be presented with two models: a high-level representation containing behavioral constructs for use in simulation, and a gate-level representation to be incorporated into the FPGA. The problem is that there may be subtle differences between the behavioral and gate-level representations, and these differences only manifest themselves when the FPGA design is deployed in its target system.

Or consider the RTL that you capture yourself. Following software simulation, you typically have a high level of confidence that your RTL is functionally correct, so when you synthesize the design and load it into the target FPGA, you may not initially consider these functions as being the source of any errors. Eventually, you realize that it's your RTL that's at fault. You re-check your simulation results but these still appear to be correct. So next you add some debugging logic around what you think may be the problem and then re-run synthesis and place-and-route, all of which may take hours or days.

And still the design in the FPGA doesn't work? What is going on? What you don't see is that your simulation runs are ignoring the pragmas you added into the RTL for use by the synthesis engine. Perhaps one of these pragmas told the synthesis tool that it could make arbitrary decisions about unspecified choices; maybe this results in a register being overwritten when an unspecified address is written to inadvertently; and maybe this is contrary to what happens in the software simulator.
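The kind of mismatch described above can be sketched in a few lines of Python. This is only an illustrative model, not output from any real simulator or synthesis tool: the three-register write decoder, the function names, and the particular way the unspecified address 3 gets resolved are all assumptions made for the example.

```python
# Hypothetical model: a 2-bit address selects one of three registers (0..2).
# Address 3 is left unspecified in the RTL.

def simulate_write(regs, addr, data):
    """Simulator view: the pragma is ignored, so an unspecified
    address matches no branch and every register holds its value."""
    regs = list(regs)
    if addr in (0, 1, 2):
        regs[addr] = data
    return regs

def synthesized_write(regs, addr, data):
    """One possible synthesized view: told that address 3 never occurs,
    the tool is free to decode register 2 from the top address bit alone."""
    regs = list(regs)
    if addr & 0b10:          # matches 2 and, as a side effect, 3
        regs[2] = data
    elif addr == 1:
        regs[1] = data
    else:
        regs[0] = data
    return regs

def first_divergence(writes):
    """Replay the same writes through both models and return the index
    of the first write after which the register contents differ."""
    sim = hw = [0, 0, 0]
    for i, (addr, data) in enumerate(writes):
        sim = simulate_write(sim, addr, data)
        hw = synthesized_write(hw, addr, data)
        if sim != hw:
            return i, sim, hw
    return None
```

As long as the stimulus only writes to addresses 0, 1, and 2, the two models agree. They part ways only when an inadvertent write to address 3 occurs, which is why a bug of this kind can survive simulation and surface only in the programmed device.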

Or consider the tools themselves. Generally speaking, we tend to believe that synthesis tools are much more robust than they actually are. In reality, even though some synthesis tools have been around for years and years, users are still logging bugs against them. One problem is that today's designs are extremely large and their corresponding synthesis runs can take a long time, so the developers of the synthesis engine start to perform aggressive optimizations. But every time a corner is cut it's necessary to account for an enormous set of conditions, which sets the scene for errors to be introduced.

And problems aren't limited to differences between simulation and synthesis. In many cases these two tools may perform their roles as expected, and then the place-and-route engines make their own decisions and optimizations that introduce unexpected functionality (read "bugs") into the design. For example, the place-and-route engines may decide that a register has to be initialized to some state, so they make arbitrary choices that can cause the silicon to do something odd and unexpected.
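A rough Python sketch shows why an arbitrary power-up value matters. It assumes a hypothetical one-bit toggle register with no reset; `None` stands in for the simulator's "unknown" value, and the function name is invented for the example.

```python
def run_toggle(initial, cycles):
    """Toggle a one-bit register on each clock. None models an unknown
    value, which stays unknown when inverted."""
    value = initial
    trace = []
    for _ in range(cycles):
        value = None if value is None else 1 - value
        trace.append(value)
    return trace

simulation = run_toggle(None, 4)   # simulator: register never initialized
power_up_0 = run_toggle(0, 4)      # silicon, if the tools happen to pick 0
power_up_1 = run_toggle(1, 4)      # silicon, if the tools happen to pick 1
```

In this toy model the simulator reports an unknown value on every cycle, while the two possible hardware outcomes produce opposite sequences, so which behavior the system sees depends on a choice the designer never made explicitly.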

All of these bugs are insidious, because you don't know what is happening and you can't identify the problem because every step in the process appears to produce the results you expect ... until you reach the programmed FPGA. Simulation was fine, synthesis was fine, place-and-route completed without any warnings or errors, the netlist loads without issues, but the FPGA doesn't work in the system and resolving the issue is going to require you to cycle many times around the "Long Loop."

Eliminating the "Long Loop"

One solution is to combine actual FPGA hardware and RTL simulation models in the same verification run. This can be achieved by means of a RocketDrive and associated RocketVision software from GateRocket. The RocketDrive is presented in the form of a removable "caddy" that plugs into a standard drive bay on a desk-side workstation. RocketDrives come in a variety of models, each targeted toward a different family of FPGAs from Altera or Xilinx. In each case, the RocketDrive contains the largest member of the family with which you are working.

Let's consider a typical scenario involving a new project. In some cases this new project will be based on a previous generation of the product and/or platform, in which case you will have access to a number of previously proven functional blocks. Using RocketVision, you can direct the system to place all of the previously proven blocks in the RocketDrive, and to keep any unverified third-party IP blocks and any new blocks you've developed in the software simulator. This immediately accelerates much of the design, yielding dramatically faster simulation iterations.

As each new block is verified at the RTL (or behavioral) level in the context of your full-chip design, its synthesized/gate-level equivalent can be moved over into the physical FPGA in the RocketDrive. As soon as a problem manifests itself, the verification run can be repeated with the RTL version of the suspect block resident in the simulation world running in parallel with the gate-level version realized in the physical FPGA. By means of RocketVision, the signals from the peripheries of these blocks (along with any designated signals internal to the blocks) can be compared "on-the-fly."
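Conceptually, this kind of comparison amounts to stepping two models of the same block in lockstep and flagging the first cycle on which any monitored signal differs. The Python sketch below illustrates that idea only. It is not the RocketVision interface, and the function name, the signal names, and the one-dictionary-per-cycle trace format are assumptions made for the example.

```python
def first_mismatch(rtl_trace, hw_trace, signals):
    """Walk two per-cycle signal traces in lockstep and report the first
    cycle at which a monitored signal differs, or None if they all agree."""
    for cycle, (rtl, hw) in enumerate(zip(rtl_trace, hw_trace)):
        diffs = {s: (rtl[s], hw[s]) for s in signals if rtl[s] != hw[s]}
        if diffs:
            return cycle, diffs
    return None

# Hypothetical traces: one dict of signal values per clock cycle.
rtl_trace = [{"valid": 0, "data": 0}, {"valid": 1, "data": 5}, {"valid": 1, "data": 6}]
hw_trace = [{"valid": 0, "data": 0}, {"valid": 1, "data": 5}, {"valid": 1, "data": 7}]

result = first_mismatch(rtl_trace, hw_trace, ["valid", "data"])
```

Here `result` identifies cycle 2 and the `data` signal as the point of divergence. Because `zip` consumes its inputs one item at a time, the same function also works on generators that produce values as the run proceeds, which is closer in spirit to an on-the-fly check.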

Using this technology — combining conventional simulation with physical hardware and an appropriate debugging environment — it's possible to very quickly detect, isolate, and identify bugs, irrespective of where they originated in the FPGA design flow. Once a bug has been isolated to one block of the design, a change can be made to the RTL representation of that block, which can then be re-run along with the hardware representation of the other blocks. In this way, a fix can be immediately tested and verified without re-running synthesis and place-and-route, and with only the suspect block running in the software simulator.

This technique provides the ability to make multiple design-change-debug iterations in a single day. This approach can also reduce the number of RTL-to-bitstream iterations by 50%.

The end result of RocketDrive and RocketVision is that design and verification engineers can now see how the design behaves in the physical chip, running as it will in-system, while still having access to all the capabilities and flexibility of a software simulator. This new technique lets engineers quickly detect, identify, and correct differences between the original RTL and the physical chip. In addition to accelerating verification runs by a factor of 10X or more, this new approach reduces the in-silicon debugging process by a factor of 30X, thereby dramatically speeding the debugging of the FPGA design.

Using this technique to eliminate the "Long Loop" in the FPGA design flow can save weeks or months of valuable engineering time and resources, speeding time-to-market and time-to-profit.
 

By Dave Orecchio.

Dave Orecchio is President and CEO of GateRocket, Inc. and has 24 years of semiconductor industry experience at four venture-backed companies with a focus on semiconductors, ASIC, and FPGA design and development. His leadership brought three of the four companies to successful exits for the investors. Prior to GateRocket, he held executive positions in marketing, sales, and general management at LTX, Viewlogic Systems, Synopsys, Innoveda, Parametric Technologies, and DAFCA.


Reprinted from SOCcentral.com, your first stop for ASIC, FPGA, EDA, and IP news and design information.
Copyright 2002 - 2011 Tech Pro Communications, 1209 Colts Circle, Lawrenceville, NJ 08648