December 6, 2011 -- Low power is now a central concern of digital design, especially for handheld and wireless devices, but also for servers and other computation-intensive applications where the cost of cooling can be quite high. Consequently, along with performance and area, power optimization is an essential factor in meeting and improving quality of results (QoR).
Thus far, power-optimization efforts have centered on RTL models and gate-level netlists. This is unfortunate because the further a design moves downstream, the less effective power-optimization techniques become. Opportunities for optimizing power are significantly greater at the electronic system level (ESL) of abstraction, with as much as an 80% improvement over what can be achieved at the RTL and gate levels. Once the architecture is set and the design implemented at the RTL, many of these opportunities to reduce power are forgone because of limited visibility into the design and the impracticality of trying different micro-architectures.
Figure 1. The ability to optimize power at the architectural level far exceeds that at lower levels of abstraction.
High-level synthesis (HLS) operates at the architectural level, providing the ability to significantly reduce power through functionality-aware, low-power optimization. The latest generation of HLS tools, such as Catapult C Synthesis® from Calypto®, automates clock gating, multiple-clock-domain management, and dynamic voltage and frequency scaling — eliminating painstaking, manual techniques for optimizing power. The highly productive HLS process gives designers time to explore various micro-architectures for the optimal balance of power, performance, and area. Furthermore, a tight coupling with power-analysis and optimization tools, such as Calypto PowerPro®, provides the added advantage of quickly verifying the power goals of the design.
Based on feedback from the power-analysis tool, the HLS tool can be configured (by the designer or automatically) to generate more power-optimized RTL. In addition, HLS tools can automate setup of the power-analysis tool, which, if done manually, can be very time-consuming and error-prone. HLS tools can also generate several solutions that can all be analyzed for power in parallel, so the best solution can be selected for further implementation.
Yet, in order to achieve the full potential to reduce power at the ESL, HLS tools and methodologies must also possess three basic characteristics: a wide application scope, features that enable immediate deployment, and the assurance of a high QoR. Indeed, a mature HLS solution greatly outperforms manually coded RTL in meeting power goals. This alone is a compelling reason for hardware engineers to adopt HLS.
Casting a wider net
Expanding HLS to support SOC design entails automated generation of all of the new RTL, support of legacy IP, and the means to bring the two together in a single design and verification environment. An HLS methodology that targets all aspects of the design is the most productive way to reduce power and guarantee the QoR for the entire design. HLS tools must be able to synthesize the full chip as a whole and as individual blocks. To do this, they must support multiple standard languages as both inputs (C/C++, SystemC) and outputs (VHDL, Verilog).
Such a mixed-language tool flow offers several advantages. Mixed-language HLS flows that combine un-timed C++ with SystemC enable full SOC synthesis, including high-quality data paths, control logic, interconnects, and complex bus interfaces. HLS tools that support these popular standard languages also enable the reuse and integration of legacy RTL and existing IP with the newly synthesized portions of the design.
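To make "un-timed C++" concrete, here is a minimal sketch of the kind of purely functional description an HLS flow might start from. The function name fir_filter, the tap count, and the data widths are hypothetical and are not taken from any tool's documentation; real flows typically add bit-accurate data types and tool-specific constraints, which are omitted here.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical 4-tap FIR filter written as plain, un-timed C++.
// There is no clock, reset, or cycle-level behavior here, only the function
// to be computed. Scheduling is left to the synthesis step.
constexpr std::size_t kTaps = 4;

std::int32_t fir_filter(std::int16_t sample,
                        std::array<std::int16_t, kTaps>& delay_line,
                        const std::array<std::int16_t, kTaps>& coeffs) {
    // Shift the delay line by one position and insert the new sample.
    for (std::size_t i = kTaps - 1; i > 0; --i) {
        delay_line[i] = delay_line[i - 1];
    }
    delay_line[0] = sample;

    // Multiply-accumulate across all taps.
    std::int32_t acc = 0;
    for (std::size_t i = 0; i < kTaps; ++i) {
        acc += static_cast<std::int32_t>(delay_line[i]) * coeffs[i];
    }
    return acc;
}
```

Because the description says nothing about cycles, the same source could in principle be mapped to a fully parallel or a fully serial datapath, which is what makes architectural exploration possible.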
HLS tools must integrate easily with the rest of the design flow, including both front-end and back-end tools. A mixed-language flow makes it easier to maximize power savings because it uses the same language as architectural-level tools that support virtual prototyping and architectural design. For example, SystemC opens the door to transaction-level modeling (TLM) platforms. TLM platforms allow designers to pull in power characteristics for the hardware blocks in order to see the power characteristics of the full SOC when running real software. Because the power consumption of an SOC depends on its application software, this information can be leveraged to dramatically reduce power.
Figure 2. Power usage of three applications running on the same platform (two process technologies). Source: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Tony Givargis, Frank Vahid, and Jörg Henkel.
HLS tools provide essential information that makes this even more effective. HLS allows users to apply different technology and architectural constraints to the same functional model during synthesis, resulting in different RTL implementations with different power characteristics. The distinctive power characteristics of different hardware solutions are annotated up from the HLS tool to the virtual prototyping tool. The design architects then compare the impact of various derivatives on overall power consumption. Conversely, timing and power budgets specified by design architects using TLM can be passed on as synthesis constraints to the HLS tool, which then implements the RTL.
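The comparison step described above can be sketched as a small selection routine. The Candidate struct, its fields, and the budget parameters are hypothetical placeholders for illustration; they do not correspond to the data model of any HLS, virtual prototyping, or power-analysis tool.

```cpp
#include <string>
#include <vector>

// Hypothetical summary of one RTL implementation produced from the same
// functional model under a different set of constraints.
struct Candidate {
    std::string name;
    double power_mw;    // power reported by a power-analysis run
    double area_um2;    // reported area
    double latency_ns;  // time to produce one result
};

// Returns a pointer to the lowest-power candidate that meets the latency and
// area budgets, or nullptr if none qualifies. The pointer refers into
// `candidates` and is valid only while that vector is unchanged.
const Candidate* pick_lowest_power(const std::vector<Candidate>& candidates,
                                   double max_latency_ns,
                                   double max_area_um2) {
    const Candidate* best = nullptr;
    for (const Candidate& c : candidates) {
        if (c.latency_ns > max_latency_ns || c.area_um2 > max_area_um2) {
            continue;  // violates a budget set by the design architect
        }
        if (best == nullptr || c.power_mw < best->power_mw) {
            best = &c;
        }
    }
    return best;
}
```

In this sketch the budgets play the role of the constraints passed down from the architect, and the power figures play the role of the characteristics annotated back up from synthesis.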
For example, a design architect makes decisions about how to achieve specific power budgets, given certain voltage and frequency scaling. The synthesis tool generates the corresponding RTL and runs a power analysis and optimization tool that provides power consumption information. Seamless links into existing power tools are another hallmark of a mature HLS flow. Now designers can run accurate power reports based on the actual implementation, instead of estimated numbers at the C-level.
Power analysis tools accurately measure the power consumption of various candidate scenarios in order to pick the best fit. Combined with an HLS solution and power optimization, these tools become even more useful because they can analyze and compare several netlists. By contrast, the time and effort required to create hand-coded RTL means that only one netlist is produced. That netlist can also be analyzed and tweaked, but the opportunity to choose from a range of possible configurations in order to find the optimal design is lost. A good coupling between HLS and power analysis and optimization tools facilitates the invocation of power optimization from within the HLS flow.
Finally, designers will spend only a fraction of the time verifying the chosen RTL design, because the C++ code has already been exhaustively verified. Completing the low-power design chain, the RTL generated by advanced HLS tools supports clock gating and other optimization techniques used by back-end power-optimization tools.
Teams must be able to quickly put a newly adopted HLS solution to use executing power-optimization technologies, and the HLS tool must be deployable the first time it is used in a production flow. This requires that the tool fit easily into existing design environments and that it deliver functionally correct RTL.
Mixed-language tools plug into whatever methodology design teams have in place in a non-disruptive fashion, including compatibility with high-level models and verification of power-reducing features at both the C and register transfer levels. This enables designers to easily go from abstract C++ source code to different RTL variants; determine the optimal configuration of performance, area, and power; then go to gates.
A readily deployable HLS solution must be architected with verification in mind. If it is, functional equivalence between the C and RTL descriptions can be guaranteed. This is essential because, as the C code goes through synthesis, a number of things are done that make the resulting RTL fundamentally different from the source code; yet the C and RTL representations must remain functionally equivalent.
Low-power techniques are a case in point. Clock gating, multiple clock domains, dynamic voltage and frequency scaling, and hardware architecture exploration generate RTL that is structurally different from a standard piece of RTL. For example, with clock gating, the HLS tool inserts a MUX structure that identifies a register or bank of registers that can be gated. When designers make these changes to the RTL through an automated HLS process, the HLS tool must be able to easily verify that the RTL is still functionally equivalent to the original C design, because this would be difficult or impossible to do manually.
Figure 3. Through its automated, multi-level clock-gating optimization, Catapult C Synthesis provides a major, low-power design technique that is often quite tedious to implement at the RTL.
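One way to picture why an enable MUX marks a register as a gating candidate is a small behavioral model. The sketch below is an illustration only: RegisterModel and both step functions are hypothetical names, and the edge counter is a rough proxy for clock activity, not a power figure or a description of how any tool represents gating.

```cpp
#include <cstdint>

// Behavioral model of one register, counting the clock edges that reach it.
struct RegisterModel {
    std::uint32_t value = 0;
    unsigned clock_edges_seen = 0;
};

// Enable implemented as a feedback MUX: the register is clocked every cycle
// and reloads its own value whenever enable is low.
void step_mux_enable(RegisterModel& reg, bool enable, std::uint32_t data) {
    ++reg.clock_edges_seen;
    reg.value = enable ? data : reg.value;
}

// Same enable implemented as a clock gate: the clock edge is suppressed when
// enable is low, so the register simply keeps its value.
void step_clock_gated(RegisterModel& reg, bool enable, std::uint32_t data) {
    if (!enable) {
        return;
    }
    ++reg.clock_edges_seen;
    reg.value = data;
}
```

Driving both models with the same enable and data sequence leaves them holding identical values every cycle, while the gated model sees a clock edge only on cycles where enable is high. That identity of stored values is the property an equivalence check between the C source and the gated RTL has to confirm.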
Furthermore, readily deployable tools must be able to reuse verification components and environments. In order to reduce the effort and time required for RTL verification of all aspects of the design, including power, HLS flows should leverage existing C-level testbenches and apply the same stimulus to both the golden source model and the generated RTL. Because the functionality remains untouched, even when the constraints are changed, designers can retain the transaction-level functionality of a particular block as a golden input model, facilitating verification.
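The idea of applying one stimulus set to both the golden model and the generated implementation can be sketched as follows. This is a simplified assumption-laden illustration: in a real flow the implementation side would be RTL running in a simulator behind a wrapper, whereas here a second C++ callable stands in for it, and the names Model and first_mismatch are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Any model of the block under test: takes one input, returns one output.
using Model = std::function<std::int32_t(std::int32_t)>;

// Applies the same stimulus to the golden model and to the implementation
// model. Returns the index of the first mismatching vector, or -1 if all
// outputs agree.
long first_mismatch(const Model& golden, const Model& implementation,
                    const std::vector<std::int32_t>& stimulus) {
    for (std::size_t i = 0; i < stimulus.size(); ++i) {
        if (golden(stimulus[i]) != implementation(stimulus[i])) {
            return static_cast<long>(i);
        }
    }
    return -1;
}
```

Because the stimulus vector and the golden model are independent of the synthesis constraints, the same check can be rerun unchanged against each RTL variant.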
The final requirement for a production-ready tool is that it is easy to use. Even with a highly automated process, users want and need to interact with the tool, so the controls for changing power and other constraints must be intuitive and straightforward. This is another quality of a mature HLS tool.
Better than ever
The potential to improve the QoR for power is a major technological advance presented by HLS. The ability to explore a broad range of area, performance, and power configurations; the automation of preeminent low-power techniques; and the visibility into the architecture contribute to the superior power results of HLS-generated RTL over manual coding.
Users of early generations of HLS tools were happy to come within 5% to 10% of QoR targets for area and performance. As synthesis technology evolved, this became a given, with some HLS tools consistently improving upon hand-coded RTL results. However, it is in the area of power where HLS can significantly exceed what designers can achieve by hand-coding RTL. This is a significant milestone in the evolution of HLS and will encourage more design teams to adopt it.
Whereas RTL designers are able to identify only 20% to 30% of the potential clock-gating candidates within their designs, HLS can achieve 100% clock gating, meaning that everything that can be gated in the design is gated. HLS can do this because it builds the design from a completely un-timed description. It therefore has information about all of the I/Os and data flows throughout the design, giving it full visibility into the design. With this insight, HLS applies power algorithms to identify all clock-gating candidates.
HLS tools also have the ability to sweep the frequency of the design to explore various power architectures. This lets designers try different mixes of frequencies and throughput rates and see how they affect area and timing. For example, an architecture that runs at 16 MHz and meets a 500-ns throughput specification has only 8 clock cycles per result, so it requires more parallel hardware and therefore more area. A 100-MHz version that meets the same 500-ns rate has 50 cycles per result, so it can be more compact but requires more cycles.
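The arithmetic behind the 16-MHz and 100-MHz comparison can be written out directly. The helper names below are hypothetical, and the second function assumes a simplified workload of single-cycle operations that can be spread evenly across identical units, which real designs only approximate.

```cpp
#include <cstdint>

// Number of whole clock cycles available within one throughput interval.
// MHz * ns = 1e6 * 1e-9 = 1e-3, hence the division by 1000.
constexpr std::uint64_t cycles_per_interval(std::uint64_t clock_mhz,
                                            std::uint64_t interval_ns) {
    return clock_mhz * interval_ns / 1000;
}

// Minimum number of parallel functional units needed to finish `operations`
// single-cycle operations within `cycles` cycles (ceiling division).
// Assumes cycles > 0.
constexpr std::uint64_t min_parallel_units(std::uint64_t operations,
                                           std::uint64_t cycles) {
    return (operations + cycles - 1) / cycles;
}
```

At 16 MHz a 500-ns interval holds 8 cycles, and at 100 MHz it holds 50. With an assumed workload of 40 operations per result, the 16-MHz design would need at least 5 units working in parallel, while the 100-MHz design could reuse a single unit across its 50 cycles.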
QoR is also improved because HLS libraries are more refined and accurate than the initial generation of libraries. This is particularly critical as the industry moves toward smaller geometries, which require much higher-quality library definitions. These improved libraries are also beginning to include power characteristics. So for the first time, designers can perform up-front power estimations and start making power trade-offs without leaving the HLS tool.
SOC interconnects, multi-language support, integration into the overall flow, and the automation of low-power techniques are essential to optimizing power and improving the QoR for the entire design. Today's leading HLS tools, such as Calypto's Catapult C Synthesis, embody all of these capabilities and, therefore, should overcome the reluctance to adopt what may seem like a whole new way of doing things. The latest generation of HLS tools is production-proven and deployable on a wide scale, delivering production-quality ROI faster than traditional methods.
By Shawn McCloud.
Shawn McCloud is Vice President of Marketing at Calypto Design Systems, Inc. He was previously Product Line Director for the Mentor Graphics Corp. high-level synthesis technology and before that, a senior system architect responsible for RISC and CISC-based micro-processor design.