May 30, 2013 -- It's well known and accepted that the decisions with the most impact on power (or performance, for that matter) occur early in the design process, at the architectural level. Some experts, such as Dr. Gary Delp at LSI, claim that architectural decisions have the potential for reducing power by 80% [Delp09]. Understandably, an 80% power reduction may raise some eyebrows, so let me restate that another way: architectural decisions that are poor for power (usually because power was not considered) can cause a 5X increase in power over the best power architectures. Although it says the same thing, that's much more palatable. Given that understanding, the question becomes, "How do I know where I am in that five-fold range of power architectures?"
Relying on previous design experience is often the best guide, but it is possible that the previous implementation could have been designed for better power efficiency.
Answering the question is further complicated by the fact that power dissipation needs to be estimated at the register-transfer level (RTL) or gate level. By the time you get there, the amount of architectural change that could actually be implemented is severely limited. Even something as architecturally straightforward as changing to half- or quarter-speed memories would be intractable at that point, given typical design schedules.
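To see why an 80% reduction and a 5X increase are the same statement, it helps to write the conversion out. The following is a minimal Python sketch of that arithmetic; the function names are my own, introduced only for illustration.

```python
def increase_factor(reduction):
    """Return how many times more power the worse design dissipates.

    `reduction` is the fractional saving of the better design relative to
    the worse one (0.80 means "80% less power").
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in the range [0, 1)")
    return 1.0 / (1.0 - reduction)


def reduction_from_factor(factor):
    """Inverse conversion: the fractional saving implied by an N-fold spread."""
    if factor < 1.0:
        raise ValueError("factor must be at least 1")
    return 1.0 - 1.0 / factor


if __name__ == "__main__":
    # Illustrative names only; the 80% and 5X figures come from the text above.
    print(f"80% reduction -> {increase_factor(0.80):.2f}X spread")
    print(f"5X spread -> {reduction_from_factor(5.0):.0%} reduction")
```

The same conversion works for any claimed saving, which makes it a convenient sanity check when comparing numbers quoted as percentages against numbers quoted as multiples.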
Exploring low-power architectures
One proven solution is to spend more time on architectural modeling and to automate the path to RTL and gates via high-level synthesis (HLS), a.k.a. ESL (electronic system level) synthesis. With that approach, you can literally click a button (or press a key, if you prefer) and change the architecture to use full-speed, half-speed, or quarter-speed memories. From there, you can automatically generate the RTL code or gates and get more-accurate power, performance, and area estimates with your downstream RTL tool set.
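As a rough illustration of what it means to swap the memory architecture under an unchanged algorithm, here is a toy Python model. It is not the input language or API of any HLS tool; `MemoryConfig`, `clock_divider`, `words_per_access`, and the cycle-count formula are all assumptions made for this sketch. In particular, it assumes that a slower memory is made proportionally wider.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryConfig:
    """Hypothetical memory architecture description (not a real tool's API)."""
    name: str
    clock_divider: int     # 1 = full speed, 2 = half speed, 4 = quarter speed
    words_per_access: int  # assumed: slower memories return wider words


def read_cost(n_words, mem):
    """Return (memory accesses, logic-clock cycles) needed to stream n_words."""
    accesses = math.ceil(n_words / mem.words_per_access)
    cycles = accesses * mem.clock_divider
    return accesses, cycles


def block_sum(samples):
    """The 'algorithm': deliberately unaware of the memory architecture."""
    return sum(samples)


# Three candidate architectures, selected by changing one entry, not the algorithm.
CANDIDATES = [
    MemoryConfig("full-speed", clock_divider=1, words_per_access=1),
    MemoryConfig("half-speed", clock_divider=2, words_per_access=2),
    MemoryConfig("quarter-speed", clock_divider=4, words_per_access=4),
]

if __name__ == "__main__":
    samples = list(range(1024))
    result = block_sum(samples)  # same behavior for every candidate
    for mem in CANDIDATES:
        accesses, cycles = read_cost(len(samples), mem)
        print(f"{mem.name}: result={result} accesses={accesses} cycles={cycles}")
```

In this toy model all three candidates sustain the same throughput while the memory is accessed less often. Whether that actually saves power is exactly what the generated RTL and the downstream power tools would have to tell you.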
A growing number of companies are doing that today. Some of them have been doing that for years. In 2005, one of the users I was working with reported that they reduced power by 59% in a re-spin of a video codec. The previous implementation was hand-coded RTL, and they achieved these power savings by backing up and spending more design time on the architectural code. HLS was used to generate RTL code so they could evaluate the impact of the architectural changes. In the end, they also improved area at the same time.
Hopefully, this shows the importance of spending design time at the architectural level, and that HLS is an enabling technology for this. But to really improve power, HLS must provide more than the capability to explore multiple architectures. This is a necessary capability, but not a sufficient one.
The role of HLS with existing power tools and techniques
The high-level synthesis adopter must make sure that the HLS tool will comfortably coexist with existing low-power design techniques, tools, and methodologies. For example, to support multiple memory speeds as discussed above, the HLS tool needs to seamlessly synthesize RTL code for any of a variety of memory architectures without requiring changes to the input algorithm. Similarly, to support architectural power techniques such as power gating, the HLS tool must support multiple modules. Even simpler techniques, such as using multiple clock frequencies for different blocks, still require that the HLS tool support clock-domain crossing.
As with architectural exploration, however, coexistence with existing power techniques and tools is necessary, but again not sufficient. What you really need is an HLS tool that will not only allow exploration while coexisting with existing low-power tools and techniques, but also one that will make power-aware decisions as it is creating the RTL code.
High-level synthesis has a unique and broad view of the design, including both its intent (the behavior) and its implementation (the RTL design). With that knowledge, it can perform high-impact power optimizations during synthesis to further reduce the dynamic power dissipation of the implementation.
For example, an HLS tool can analyze the algorithmic code and use that knowledge during synthesis to implement fine-grained clock-gating opportunities that are often impossible to find when looking only at RTL code, decreasing register power and clock-tree power. In many cases, clock-tree power can be further reduced when the HLS tool recognizes that it can increase the size of the gated clock domains by enabling more registers with the same conditions.
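The two ideas in the preceding paragraph, gating a register's clock when it does not need to update and sharing one gated clock among registers with the same enable condition, can be sketched with a small Python model. This is only a counting exercise under assumptions of my own: the register names, the condition names, and the four-cycle activity trace are hypothetical, and clock events are used as a crude stand-in for power.

```python
from collections import defaultdict

# Hypothetical design: each register is tagged with the condition under which
# it must capture a new value.
REGISTER_ENABLES = {
    "acc": "mac_active",
    "coeff": "mac_active",
    "out": "result_valid",
}

# Hypothetical activity trace: the value of each condition in each clock cycle.
TRACE = [
    {"mac_active": True, "result_valid": False},
    {"mac_active": True, "result_valid": False},
    {"mac_active": False, "result_valid": True},
    {"mac_active": False, "result_valid": False},
]


def group_by_enable(register_enables):
    """Group registers that share an enable condition into one gated domain."""
    domains = defaultdict(list)
    for register, condition in register_enables.items():
        domains[condition].append(register)
    return dict(domains)


def clocked_register_cycles(register_enables, trace, gated):
    """Count how many register clock events occur over the trace.

    Without gating, every register is clocked every cycle. With gating, a
    register is clocked only in cycles where its enable condition is true.
    """
    total = 0
    for cycle in trace:
        for condition in register_enables.values():
            if not gated or cycle[condition]:
                total += 1
    return total


if __name__ == "__main__":
    domains = group_by_enable(REGISTER_ENABLES)
    print("gated clock domains:", domains)
    print("ungated clock events:",
          clocked_register_cycles(REGISTER_ENABLES, TRACE, gated=False))
    print("gated clock events:",
          clocked_register_cycles(REGISTER_ENABLES, TRACE, gated=True))
```

Grouping `acc` and `coeff` under one condition means one gating cell serves both registers instead of one cell per register, which is the "larger gated clock domain" effect described above. A real tool would derive the conditions from the algorithm's control flow rather than from hand-written tags.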
Similarly, an HLS tool can reduce datapath power by using knowledge of the design to implement the finite-state machine (FSM) in a way that minimizes glitches, especially those on the select lines of multiplexors. These glitches can be especially costly in terms of power because they propagate throughout the datapath. Memory power can similarly be reduced via HLS by optimizing memory accesses against the design's power and performance constraints.
A key to creating power-optimized designs is choosing the correct architecture, and high-level synthesis that generates high-quality RTL code is an enabling technology for this. In addition, the HLS tool must be designed to play nicely with existing low-power tools and techniques, such as reduced-speed memories. Finally, the HLS tool should be able to improve the power of the design during its synthesis, including all aspects of the dynamic power of the design.
[Delp09] Delp, G., et al., "Design & Verification of Low Power SoCs," ISQED09, Session 5D.
By David Pursley
David Pursley is Director of Product Marketing for Forte Design Systems, Inc. and based in Pittsburgh, Penn. Previously, he held various positions as a Field Applications Engineer, Technical Marketing Engineer, Marketing Manager, and Product Line Manager in the fields of Electronic Design Automation (EDA) and Embedded Computer Technology (ECT).