July 12, 2010 -- With their steadily increasing performance and density and their decreasing prices, FPGAs today are finding their way into more markets and systems than ever before. Many designers turn to FPGAs to avoid being "locked in" to costly ASIC platforms, particularly during the early design phases, and these same FPGAs frequently remain in place as part of the end-user product. Consequently, it has become important to account for—and minimize—FPGA power consumption because there may not be a low-power ASIC waiting in the wings to replace the programmable device when the prototype evolves into a final design. This article presents an overview of low-power principles and related FPGA design techniques available today.
Power consumption basics
Power consumption in FPGAs is made up of three key components: static power, dynamic power, and I/O power.
Static power is consumed by a programmed FPGA when it is powered up, but no clocks are operating. As transistor geometry gets smaller and device "size" increases, leakage current increases, adding to the power consumption:
…where N corresponds to the size of the device in terms of FPGA Gates.
Dynamic power, on the other hand, is consumed when the device is operating, signals are toggling, and capacitive loads are charging and discharging:
…where alpha is the activity of the gate, C is the parasitic capacitance of the gate, and F is the clock frequency.
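A commonly used first-order form of these two relationships, offered as a sketch rather than as the article's exact equations, is shown below. The symbols V (supply voltage) and I_leak (average leakage current per gate) are assumptions introduced here; N, alpha, C, and F are as defined above.

```latex
P_{\text{static}} \approx N \cdot I_{\text{leak}} \cdot V
\qquad
P_{\text{dynamic}} = \alpha \, C \, V^{2} \, F
```

In this form, dynamic power scales linearly with activity and clock frequency, so halving either one halves the dynamic contribution, while static power is present whenever the device is powered.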
I/O power is related to I/O current, which is calculated using the I/O capacitive coefficient. The slew rate and drive strength of the device’s receivers and transmitters are key variables that affect I/O power. FPGA designers can choose from a host of I/O standards and features to control I/O power consumption, including LVDS, HSLVDCI, HSTL, SSTL, and IODELAY.
Of course, all three of these power entities also affect ASICs. But ASIC synthesis tools have long included facilities to control power across several abstraction levels of the design cycle. These tools offer user-defined power formats to help designers manage the static power of the chip.
With the growing pressure from end-users to similarly minimize FPGA power levels, leading FPGA vendors are announcing devices designed to reduce power consumption. Products such as the Virtex-6 from Xilinx, Inc. and the Stratix-IV from Altera Corp. apply cell architectures designed to control both static and dynamic power. For example, the Xilinx device family supports suspend-based system-power management. This feature shuts down all internal operations under specific conditions, considerably improving the device’s static power consumption.
The low-power design flow
When implementing a design on an FPGA, low-power optimizations should first be considered at the architectural level. Once the architecture is settled, the RTL description is written in HDL. During RTL logic synthesis, the design is mapped to architectural cells consisting of look-up tables (LUTs), registers, DSPs, and RAM primitives. The synthesized netlist is then fed to the FPGA vendor’s place-and-route (P&R) tool. The routed netlist and the constraints generated after P&R are used by power-analysis tools to estimate power consumption. Lastly, the generated bitstream file is used for FPGA programming.
Figure 1 depicts an FPGA design and implementation approach that is power-aware from beginning to end. The approach is divided into three steps encompassing the RTL level, the synthesis flow, and the FPGA vendor implementation flow.
Figure 1. A power-aware FPGA flow.
Starting at the top: The RTL flow
At the RTL level, users can improve dynamic power by controlling the clock flow and minimizing the design area. This can be as easy as following a few simple coding tips:
- Ensure that internal logic blocks are driven at their slowest required clock frequency by using multiple clock domains. Even though it may be feasible to sample certain design regions at the faster default clock, a lower frequency will significantly reduce power consumed in the given logic block and in the design overall.
- FPGA devices have dedicated resources available to control the clock line. These include both global clock buffers with chip enable and local clock buffers. These elements can be instantiated by hand, but ideally the RTL synthesis tool should infer them from the RTL design and use them to reduce activity on the clock paths.
- Use low-power embedded blocks such as FIR filters, RAMs, and RAM-based FIFOs. These implementations are available from various FPGA vendors, either through synthesis inference from the RTL or by cell instantiation. In part, these "hard IP" blocks reduce static power simply by reducing the transistor count. They also reduce dynamic power by eliminating as much programmable interconnect as possible.
- Insert controls to stop the toggling of design blocks when they are not in use.
- For embedded RAM blocks, gate the memory clock with a memory Read/Write Enable to control memory access.
- Avoid using double data rate registers when possible. These increase the toggle rate, which increases power consumption commensurately.
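As a rough illustration of the first and fourth tips, the sketch below applies the widely used relation P = alpha x C x V^2 x F to a handful of logic blocks. The block names, capacitances, supply voltage, clock frequencies, and idle fraction are all invented for illustration and are not taken from any device data sheet or from the examples in this article.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One logic block; all values are hypothetical, for illustration only."""
    name: str
    activity: float                # alpha: switching activity of the block
    capacitance_f: float           # C: effective switched capacitance in farads
    clock_hz: float                # F: frequency of the clock driving the block
    enabled_fraction: float = 1.0  # share of time the block is not gated off


def dynamic_power_w(block: Block, vdd: float) -> float:
    """Return alpha * C * V^2 * F, scaled by the time the block is active."""
    return (block.activity * block.capacitance_f * vdd ** 2
            * block.clock_hz * block.enabled_fraction)


def total_power_w(blocks, vdd: float) -> float:
    """Sum the dynamic power of every block in the list."""
    return sum(dynamic_power_w(b, vdd) for b in blocks)


VDD = 1.0  # assumed core voltage in volts

# Baseline: every block runs from the fast default clock and never idles.
baseline = [
    Block("datapath", 0.125, 2e-9, 200e6),
    Block("control", 0.125, 1e-9, 200e6),
    Block("config_regs", 0.125, 0.5e-9, 200e6),
]

# Power-aware: slow blocks move to slower clock domains, and the datapath
# is stopped while idle (assumed here to be half of the time).
power_aware = [
    Block("datapath", 0.125, 2e-9, 200e6, enabled_fraction=0.5),
    Block("control", 0.125, 1e-9, 50e6),
    Block("config_regs", 0.125, 0.5e-9, 25e6),
]

if __name__ == "__main__":
    p0 = total_power_w(baseline, VDD)
    p1 = total_power_w(power_aware, VDD)
    print(f"baseline:    {p0 * 1e3:.2f} mW")
    print(f"power-aware: {p1 * 1e3:.2f} mW")
    print(f"saving:      {100 * (p0 - p1) / p0:.1f} %")
```

The model deliberately ignores static and I/O power, as well as the extra logic needed for clock-domain crossings, so it only indicates the direction of the saving, not its real magnitude.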
Synthesis flow offers many power-saving opportunities
Some advanced synthesis technologies can perform transformations and optimizations to reduce power consumption while minimizing the impact of those processes on overall design performance and area. Optimizations include:
- Using clock buffers with enable pins. Synthesis can detect the use of enable pins of sequential elements and map them to the enable pins of clock buffering cells. When the enable pin is low in such a transformation, the entire area of that clock network is turned off, avoiding the unnecessary toggling of registers, embedded RAMs, DSPs, or other sequential elements.
- Implementing small memories with generic logic elements rather than using embedded RAM resources. This technique is specific to Xilinx environments.
- Optimizing RAM resources so that they are only enabled during actual Read or Write transactions.
- Implementing state machines with embedded RAM resources where possible, and similarly, implementing counters and adders in embedded DSP resources when possible.
- Using shift registers with enables wherever possible, rather than implementing them with simple registers. This eases the routing task as well as the packing effort for the placement tool.
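The first optimization in this list can be pictured with a small behavioral model. The sketch below is not a description of any vendor's clock buffer. It is a cycle-level Python model, with a hypothetical `simulate` function, that compares a register whose enable drives its own CE pin against the same register fed from a clock buffer whose enable input gates the clock. Glitch-free gating and timing are assumed and not modeled.

```python
def simulate(data, enable, gated):
    """Run one register for len(data) cycles.

    data:   value presented at the register input in each cycle
    enable: True in cycles where the register must capture its input
    gated:  False -> enable drives the register's CE pin
            True  -> enable drives the clock buffer's enable input

    Returns (register value after each cycle, clock edges seen by the register).
    """
    q = 0
    history = []
    clock_edges = 0
    for d, en in zip(data, enable):
        if gated:
            # The clock reaches the register only while enable is high.
            if en:
                clock_edges += 1
                q = d
        else:
            # The clock edge always arrives; CE decides whether to load.
            clock_edges += 1
            if en:
                q = d
        history.append(q)
    return history, clock_edges


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5, 9, 2, 6]
    enable = [True, False, False, True, False, False, False, True]
    for gated in (False, True):
        values, edges = simulate(data, enable, gated)
        print(f"gated={gated}: values={values}, clock edges={edges}")
```

Both styles produce the same register contents, but the gated version delivers a clock edge only in the cycles where the enable is high. Multiplied across every sequential element on the same clock network, that difference is the source of the clock-power saving.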
Some synthesis tools apply power-optimal pipelining to implement a glitch-free data path. But this can rapidly increase the number of registers required. It is usually just as effective and more efficient to use clock gating techniques to minimize glitches.
Three ways to estimate power
In an FPGA flow, there are three popular methods for measuring power consumption. The first two approaches depend on power-analysis tools provided by FPGA vendors that use the routed netlist and constraints as input.
- Vectorless flow — With this approach, the FPGA vendor’s proprietary power-analysis tool assumes a default toggle rate (typically 12.5%) and provides results based on the input data and timing constraints. The power analyzer typically has the option to change the toggle rate in various areas of the design, such as at the input ports, certain registers, and memories, to check variations. It is worthwhile to examine the transition rate’s effect on the design when using the vectorless flow method.
- SAIF/VCD/Vector-based flow — With this method, specification test vectors are used to get the VCD (Value Change Dump) file or similar output file that describes the toggling activity of the circuit during gate level simulation. The test vectors should encompass all expected transitions from the use cases in the IP specification. This flow is more accurate than the vectorless approach. The user provides the toggle rate with the input data (e.g., VCD file and routed netlist) and timing constraints.
- On-board measurement — This is the most accurate approach to measuring power consumption data for a design. It is beyond the scope of the hypothetical examples in the following section.
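To make the notion of toggle rate in the first two methods concrete, the sketch below counts value changes for scalar signals in a small hand-written VCD fragment and expresses them as toggles per clock cycle. The `toggle_rates` function and the sample dump are hypothetical. Vendor power analyzers read VCD or SAIF files with their own parsers and may define toggle rate differently, so this is only an approximation of the idea.

```python
SAMPLE_VCD = """
$timescale 1ns $end
$scope module top $end
$var wire 1 ! clk $end
$var wire 1 % data $end
$upscope $end
$enddefinitions $end
#0
0!
0%
#5
1!
1%
#10
0!
#15
1!
#20
0!
#25
1!
0%
#30
0!
#35
1!
#40
0!
"""


def toggle_rates(vcd_text, clock_name):
    """Return {signal name: toggles per clock cycle} for 1-bit signals."""
    names = {}    # VCD identifier code -> signal name
    last = {}     # identifier code -> last value seen
    toggles = {}  # identifier code -> number of 0/1 value changes
    for raw in vcd_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("$var"):
            # Declaration format: $var <type> <width> <id> <name> $end
            parts = line.split()
            if parts[2] == "1":  # keep scalar signals only
                names[parts[3]] = parts[4]
                toggles[parts[3]] = 0
        elif line[0] in "01" and line[1:] in names:
            # Scalar value change: <value><id>; x/z values are ignored here.
            code, value = line[1:], line[0]
            if code in last and last[code] != value:
                toggles[code] += 1
            last[code] = value
    by_name = {names[code]: count for code, count in toggles.items()}
    clock_cycles = by_name[clock_name] / 2  # two clock transitions per cycle
    return {name: count / clock_cycles
            for name, count in by_name.items() if name != clock_name}


if __name__ == "__main__":
    for name, rate in toggle_rates(SAMPLE_VCD, "clk").items():
        print(f"{name}: {rate:.1%} toggles per clock cycle")
```

A vectorless run replaces these measured rates with a single assumed default, which is why sweeping that default is a useful sanity check before simulation vectors are available.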
Power estimation examples
Imagine a designer developing a device (chip) whose bus protocols include PCI, DDR, and USB. This designer must maintain a chip power checklist for the end-user’s reference, and the list must summarize the power consumption for each of the bus interfaces. This will require a test case that covers all three types of bus transactions while estimating power numbers for each. Though power consumption is dependent on design and device selection, the following examples will demonstrate the power savings that can be accomplished using an implementation solution with low-power optimization capabilities.
The first design example is a communication decoder synthesized for a Xilinx Virtex-6, consuming approximately 1,500 slices. The results for this example are based on a netlist generated using a low-power synthesis flow with Xilinx ISE place-and-route. Power consumption is estimated using the vectorless flow method and the default toggle rate.
Table 1 summarizes the power measurement results for the design implemented with a standard FPGA flow and no special consideration for low-power goals.
Table 1. Dynamic power estimation of a standard FPGA flow.
In contrast, when the design goes through a low-power implementation flow, several optimizations are performed at the synthesis level. As shown in Figures 2 and 3, a global clock buffer with Enable pin is used in place of the sequential element enable pins in selected areas of the design.
Figure 2. Circuit prior to low-power optimization.
Figure 3. Synthesis result after low-power optimization using a global clock buffer with Enable input.
Table 2 shows the power-estimation results for the design implemented with the low-power implementation flow. Almost all the power figures have been reduced to some degree, and clock power has been reduced by more than 50%. This contributes to a substantial reduction in the total dynamic power value as well.
Table 2. Dynamic power estimation of a low-power FPGA flow.
The second example is a DMA controller synthesized for an Altera Stratix-IV, occupying approximately 4,500 ALMs. The result is derived from a flow consisting of low-power synthesis, Altera Quartus II place-and-route, and vectorless flow-based power estimation using the default toggle rate. The design contains one HDL-defined memory array. Power analysis estimated dynamic power to be 74.66 mW. In a low-power synthesis flow, the clock and clock enable of the RAM are swapped by the synthesis tool. Final results showed a lower dynamic power consumption of 69.45 mW, a reduction of about 7%.
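The quoted saving for the DMA controller follows directly from the two dynamic power estimates:

```latex
\frac{74.66\ \text{mW} - 69.45\ \text{mW}}{74.66\ \text{mW}}
  = \frac{5.21}{74.66} \approx 0.0698 \approx 7\%
```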
Results from both the communication decoder and the DMA controller confirm that dynamic power consumption can be reduced by astutely using the low-power synthesis optimizations described earlier.
It's important to note that because performance and area results are influenced by the degree of power optimization, a user should be able to control the "optimization effort" in order to balance the impact on quality of results.
The future of low power FPGA design
Because the FPGA design community is confronted with both static and dynamic power challenges, FPGA vendors are steadily improving their device architectures for low-power applications. EDA vendors must stay in step with this progress by introducing ever more powerful, easier-to-use solutions to optimize FPGA designs for low power at the implementation level. The goal is to achieve explicit control over power just as it exists for area and timing results today.
As an example, users in the future should be able to directly map low-power domains of the design to low-power areas on the device. This can be guided using power formats, in which the user specifies low-power domains and synthesis maps these into low-power cells (from the device library). Downstream in the implementation flow, P&R can use the same information for better placement and routing to give the expected result.
For emerging FPGA applications, the concept of quality-of-results (QoR) is no longer limited to Fmax performance and area. Power consumption is fast becoming the third dimension of QoR, an axis that is carefully scrutinized before project sign-off. Each phase of the design cycle must contribute to meeting aggressive power budget requirements. While HDL coding style and device selection are major factors in meeting such requirements, the software implementation flow also plays an important role. The message to designers is clear: choose a flow whose features make power conservation a priority, not an afterthought.
By Saurabh Kumar Shrimal.
Saurabh Kumar Shrimal is Lead Member Technical Staff at Mentor Graphics. He has over 6 years of experience in the ASIC/FPGA industry. His areas of expertise include RTL design, synthesis and timing, and low-power design techniques. He is currently working for the Precision FPGA Synthesis Team.