Designing a flexible, programmable DSP system architecture is a daunting task. From evolving mobile standards to the newest video compression techniques, the latest algorithms are rapidly growing in complexity. For example, a customer that previously was satisfied with standard-definition MPEG-2 video compression may now demand that the next product support high-definition H.264, which will require more than an order-of-magnitude increase in system performance. At the same time, the pressure to increase system channel count is unrelenting as network capabilities continue to grow. Consequently, when starting a new design, the engineer must consider not just today's requirements, but understand that this system might also be called upon to address unforeseen challenges.
So what are the different design options? Historically, the choices for building a high performance DSP design for high-speed digital communications or real-time video processing have been limited. A typical approach was to populate a board with as many DSP processors as possible (colloquially known as a DSP farm) and then hope that the software engineers would not write applications that outstripped the maximum processing capacity.
In addition, issues such as high design complexity and total system power limited the scalability of this method. Further, this design methodology hinged on the assumption that DSP processor vendors could continue to increase clock speeds and reduce power consumption, which was never guaranteed. Now, however, thanks to remarkable improvements over the last few years in FPGA performance and the incorporation of hard embedded multipliers in these devices, there are new architectural options that can address the issues of performance, flexibility, and scalability.
An FPGA co-processing architecture can be an ideal approach to tackling these challenges. However, there are numerous issues to consider before heading down this path.
Figure 1. By intelligently partitioning a DSP algorithm between a DSP processor and an FPGA co-processor, a number of benefits can be realized including dramatically boosted performance and a reduction in total system costs.
Specific system requirements and the preferences of the engineering team will play a large role in the final architecture decision. Some of the dos and don'ts system designers should consider when designing an FPGA co-processor solution for a high performance DSP system include:
Don't assume that you can develop DSP algorithms on an FPGA the same way you would on a DSP processor. It is tempting to think that you can simply instantiate a soft DSP processor on the FPGA and create code in a similar manner to traditional DSP software development. This is a common misunderstanding. A completely different approach must be used. To get the benefits of FPGA co-processing, the datapath must be re-architected and implemented in a parallel manner, not in the serial, sequential DSP processor coding style. While a DSP processor and an FPGA both have embedded multipliers, FPGA-based designs can potentially execute a much greater number of multiply-accumulate (MAC) operations per cycle than traditional DSP processors. Evaluate your DSP system and the required algorithms and consider how they might be "parallelized." Careful architectural planning and development of the FPGA co-processor can provide an order-of-magnitude performance boost over DSP processor-based designs.
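The serial-versus-parallel distinction can be sketched in plain C (not HDL; `fir_serial` and `fir_parallel` are illustrative names, and a 4-tap filter keeps the sketch short). On a single-MAC DSP core the loop takes one cycle per tap; on an FPGA, each product can map to its own hard multiplier and the sums to an adder tree, completing in a single clock:

```c
#define TAPS 4

/* Serial style: one MAC per iteration, the way a single-MAC
 * DSP processor core would execute it (TAPS cycles). */
static int fir_serial(const int *x, const int *h) {
    int acc = 0;
    for (int i = 0; i < TAPS; i++)
        acc += x[i] * h[i];
    return acc;
}

/* Parallel style: every product formed at once. On an FPGA each
 * multiply below would occupy its own hard multiplier and the
 * sums would form an adder tree, finishing in one clock. */
static int fir_parallel(const int *x, const int *h) {
    int p0 = x[0] * h[0];
    int p1 = x[1] * h[1];
    int p2 = x[2] * h[2];
    int p3 = x[3] * h[3];
    return (p0 + p1) + (p2 + p3);  /* adder tree */
}
```

The two functions compute the same value; the point is the structure, which is what re-architecting the datapath for an FPGA actually changes.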
Understand what DSP design flow methodology will work best for your designers, especially those unfamiliar with FPGA design flows. One of the first questions to ask: how does the algorithm group prefer to prototype the DSP system? Will the group develop in-house models written in C that are not tied to any specific tool or environment? If so, there is a great deal of flexibility when choosing a DSP design flow. The team can then take a modular approach, creating a hardware implementation for each block with whichever method fits best. This preference may determine the best starting point for the FPGA co-processor design.
Perhaps the team is more comfortable using a simulation environment to quickly model and simulate the algorithms that have been specified for the project. This may be a welcome approach for a team that has more experience with DSP software implementations. Does the team have a background in an ASIC or FPGA design flow? If so, it is also possible to develop the DSP datapath by writing VHDL or Verilog directly, bypassing higher-level design abstraction tools. While this is potentially the most labor-intensive and time-consuming path, it allows the final design to be optimized for size and performance. What about a C-to-gates methodology? A few EDA vendors have introduced C-entry tools specifically targeted at DSP applications that can generate HDL code ready to be synthesized and incorporated into FPGA design software. All of these approaches can be incorporated into DSP design flows to implement an FPGA co-processor.
Decide how the DSP algorithms will be partitioned in a DSP processor/FPGA co-processor architecture. A straightforward, well-understood approach is to offload the most computationally intensive pieces of a DSP algorithm to an FPGA and let the DSP handle the more control-flow oriented pieces. This datapath/control path architecture, while simple to visualize, may not necessarily be optimal for your project. The popularity of soft embedded processors instantiated on an FPGA makes it possible to execute a large part, if not all, of the control path on the FPGA. In fact, multiple soft processors can be incorporated to provide a finer degree of granularity to the control flow. On the other hand, the existence of legacy DSP code might make the team hesitant to implement the entire datapath processing on the FPGA, especially when a number of man-years have already been invested to develop libraries on a DSP processor platform. In this case, the team may decide only to move smaller and/or newer parts of the processing chain over to the FPGA at first.
Remember, flexibility is one of the key benefits of this architectural approach. Suppose for the first FPGA-based design that you take a conservative approach and only implement a small portion of the processing on the FPGA and leave the rest to be executed on the DSP processors in the system. For the next generation design, shift more of this processing to the FPGA and boost system performance without having to redesign the current board architecture. To provide this kind of extensibility will require careful planning.
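One way to plan for that kind of extensibility, sketched here with invented names, is to route each stage of the processing chain through an indirection point. A stage can then be re-pointed from the DSP software routine to an FPGA co-processor driver in a later revision without touching the rest of the chain:

```c
#include <stddef.h>

typedef void (*stage_fn)(const short *in, short *out, size_t n);

/* Stand-in for the legacy DSP software implementation of a stage. */
static void filter_sw(const short *in, short *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = in[i];
}

/* Stand-in for a driver that would hand the buffer to the FPGA
 * co-processor and collect the result. Both stubs just copy so
 * the sketch is self-contained. */
static void filter_fpga(const short *in, short *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = in[i];
}

/* First-generation design: conservative, stay in software.
 * Next generation: flip this pointer to filter_fpga. */
static stage_fn filter_stage = filter_sw;

static void run_chain(const short *in, short *out, size_t n) {
    filter_stage(in, out, n);  /* rest of the chain is unaffected */
}
```

The indirection costs almost nothing and is what lets the next-generation design shift work onto the FPGA without a board redesign.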
Evaluate whether to "make or buy" the key DSP intellectual property in the design. Is the target DSP design composed of standard DSP blocks, or will most of it be a completely proprietary effort? More than likely, the final design will use a combination of classic textbook IP cores and your team's own custom logic. The best design option will depend on the project requirements, which might include cost considerations, future design reuse, or time-to-market. Using off-the-shelf cores might be a less expensive, faster option than building a block from scratch, assuming the cores are well supported and have the right feature set.
The next question is whether you can identify a provider to meet the design requirements. Certainly a large third-party IP network has existed around DSP processors to fulfill this need. A similar ecosystem has developed around FPGAs in the last few years to accommodate the large number of FPGA-based DSP designs. The most common blocks such as FIR filters, fast Fourier transforms (FFTs), and forward error correction (FEC) cores are readily supplied and have been successfully deployed. Even more specialized IP, such as H.264 video codecs, is available from IP vendors as packaged FPGA cores. Finally, make sure that the seller can provide complete documentation, performance benchmarks, verification test benches, and a well-staffed support organization to address any issues you might have.
Determine how the FPGA co-processor system integration will be performed. Once the processing partition has been decided, how will the two halves be integrated? Specifically, what will be the primary hardware interface between the DSP and FPGA? The peripheral feature set of the DSP will likely determine what choices are available. More than likely there will be multiple links between the DSP processors and FPGAs in the system. Will they be low-speed serial connections for control or high-speed parallel connections to shuttle data between the devices?
Depending on the processing partition between the devices, the interface with the appropriate throughput will have to be chosen. Perhaps the FPGA will be called upon to create an ad hoc bridge for proprietary audio or video data buses in your system. FPGAs can be used to increase the capabilities of the DSP processor by providing peripheral and memory expansion. This can be especially useful when trying to adapt a design to meet emerging industry standards that had not been envisioned by DSP processor vendors.
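A quick back-of-envelope calculation helps match the interface to the required throughput. Assuming, purely for illustration, that uncompressed 1920x1080 YUV 4:2:0 video at 30 frames/s must cross the DSP-FPGA link:

```c
/* Sustained bandwidth needed to move raw video across the link.
 * YUV 4:2:0 sampling averages 1.5 bytes per pixel. Numbers here
 * are illustrative, not from any particular product. */
static unsigned long video_bytes_per_sec(unsigned long width,
                                         unsigned long height,
                                         double bytes_per_pixel,
                                         unsigned long fps) {
    return (unsigned long)(width * height * bytes_per_pixel * fps);
}
```

For 1920 x 1080 x 1.5 bytes x 30 frames/s this works out to roughly 93 MB/s sustained, far beyond a low-speed serial control link, so a partition that streams raw video would demand one of the high-speed parallel connections.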
Now that you have chosen your preferred hardware interface, does the FPGA design flow incorporate a seamless method to integrate this interface into your design? While it is possible to create a custom block to perform this function, there are comprehensive system integration tools that can perform the potentially tedious task of connecting it all together. This software typically includes libraries of peripheral components to address a wide range of connectivity options. Also ask whether the design tool will generate an application programming interface (API) or a memory-mapped header file that can be incorporated into the DSP software integrated development environment. Don't underestimate the value of this step. The integration of the hardware-accelerated algorithms into the DSP software architecture is critical to extracting the benefits of the FPGA co-processor architecture.
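The kind of memory-mapped header such a tool might emit, and the DSP-side driver call built on top of it, can be sketched as follows. All offsets, bit definitions, and names here are hypothetical, and a small array stands in for the co-processor's register file so the sketch is self-contained (on real hardware `reg_write`/`reg_read` would go through a volatile pointer to the FPGA's base address):

```c
#include <stdint.h>

/* Hypothetical generated register map for an FPGA FIR accelerator. */
#define FIR_CTRL_OFFSET    0x00u  /* write 1 to start */
#define FIR_STATUS_OFFSET  0x04u  /* bit 0 set when done */
#define FIR_DATA_OFFSET    0x08u  /* input sample register */

/* Stand-in for the FPGA's memory-mapped register file. */
static uint32_t fake_regs[16];

static void reg_write(uint32_t offset, uint32_t value) {
    fake_regs[offset / 4] = value;
    /* Model the accelerator finishing instantly when started. */
    if (offset == FIR_CTRL_OFFSET && value == 1)
        fake_regs[FIR_STATUS_OFFSET / 4] = 1;
}

static uint32_t reg_read(uint32_t offset) {
    return fake_regs[offset / 4];
}

/* What a DSP-side driver call might look like: load a sample,
 * kick off the accelerator, and report the done bit. */
static int fir_start_and_wait(uint32_t sample) {
    reg_write(FIR_DATA_OFFSET, sample);
    reg_write(FIR_CTRL_OFFSET, 1);
    return (int)(reg_read(FIR_STATUS_OFFSET) & 1u);
}
```

Whether the tool emits something like this automatically, or the team writes and maintains it by hand, is exactly the integration cost the article warns against underestimating.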
Don't be constrained by the requirements of the initial design. Now that you have created your first FPGA co-processing architecture, you are ready to exploit the benefits of this flexible, scalable platform. If the system feature set needs to be enhanced or you need to reduce the system bill of materials (BOM) cost, you have several options that do not involve redesigning the current board. FPGA vendors typically offer pin-compatible devices across a range of densities to allow vertical migration. To reduce manufacturing costs, you could decide to use a smaller FPGA (design permitting). Alternatively, you could move more of the functionality from the DSP processors into the FPGA and reduce the total number of components without changing the current board layout. To add performance to your platform, use a higher density FPGA and build a more powerful design with greater capabilities. This approach will allow you to maximize design reuse and shorten your next generation product's time-to-market. Just make sure that your original design is made as modular as possible to enable this option.
By Alex Soohoo, Altera Corporation
Go to the Altera Corp. website to learn more.