July 12, 2007 -- When manufactured and deployed in the real world, digital ICs must operate across a range of temperatures and voltages, yield across all of the possible manufacturing process corners, and work in various functional modes for the design to be successful.
In the not-so-distant past, a typical chip would have relatively few functional modes; for example, a standard ("normal") mode and a test mode. Similarly, it was necessary to verify the design for relatively few environmental and manufacturing process extremes, or "corners," such as worst-case (minimum and maximum) operating temperatures and voltages.
The majority of today’s timing analysis engines are capable of evaluating a single scenario comprising a single functional mode and a single process/temperature/voltage (PVT) corner at a time. [In the context of this article, the term "scenario" refers to a design being verified for a particular set of operating conditions and in a particular operational mode.] At best, these engines may have an on-chip variation (OCV) capability that can be used to evaluate a fast and slow corner simultaneously. For this reason, the traditional development flow is based on analyzing and optimizing the design on a scenario-by-scenario basis.
Even in the case of designs with few operating modes, optimizing the design to achieve timing closure for one scenario may cause problems in a second scenario. Similarly, re-optimizing the design to achieve timing closure for the second scenario may introduce new problems in the first scenario.
One problem is that today's increasingly large and complex designs typically have a large number of operating modes and have to be verified for a large number of corners. If designers ping-pong back and forth from one scenario to another using conventional timing analysis and optimization engines, they may completely miss their market window (in the worst case, they may never achieve timing closure). Another problem is that the signoff timing verification engine is typically different from the timing analysis engine used to perform the design optimizations. If the design fails its signoff verification for one scenario, the fix may disturb one or more of the other scenarios, and the entire ping-pong process starts all over again.
The solution is simple. When designers work on timing closure, they want the timing analysis that is driving the optimization to exactly match the timing analysis they will use to sign off the design. Furthermore, when working on timing closure, designers want the timing and optimization engines to be able to comprehend – and work with – all of the possible scenarios simultaneously, and to automatically adjust the design to meet all of its timing specifications.
Unfortunately, existing computing engines and algorithms have proved themselves inadequate for the task. This article first introduces the concepts of corners and modes that result in the different scenarios for which the design has to be verified. Next, it considers the problems associated with the various conventional techniques that have been employed in an attempt to solve the multi-scenario timing optimization problem. Finally, a new approach is introduced that can perform true concurrent multi-scenario timing optimization without requiring any changes to existing design tools and flows.
The misconception of statistical timing analysis
Traditional timing analysis engines have been of a form known as static timing analysis (STA). This form of analysis involves adding all of the delay elements forming a path through the chip, repeating this for all of the paths in the chip, and analyzing the results to determine the minimum and maximum delays for each path and to identify any potential problems.
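As a rough illustration of the path-summing idea described above, the following Python sketch propagates earliest and latest arrival times through a small timing graph. The graph, the node names, and the delay values are invented for illustration, and real STA engines handle far more (clocks, slews, constraints) than this toy does.

```python
from collections import defaultdict

def arrival_times(edges):
    """Return {node: (earliest, latest)} arrival times for an acyclic timing graph.

    edges: list of (src, dst, min_delay, max_delay).
    Nodes with no incoming edge are treated as launch points at time 0.
    """
    fanout = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for src, dst, dmin, dmax in edges:
        fanout[src].append((dst, dmin, dmax))
        indegree[dst] += 1
        nodes.update((src, dst))

    ready = [n for n in nodes if indegree[n] == 0]
    times = {n: (0.0, 0.0) for n in ready}
    while ready:  # visit nodes in topological order
        node = ready.pop()
        early, late = times[node]
        for dst, dmin, dmax in fanout[node]:
            cand = (early + dmin, late + dmax)
            prev = times.get(dst, cand)
            # Keep the shortest and the longest path seen so far.
            times[dst] = (min(prev[0], cand[0]), max(prev[1], cand[1]))
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)
    return times

# Hypothetical two-path circuit: in -> g1 -> out and in -> g2 -> out.
EDGES = [
    ("in", "g1", 1.0, 2.0),
    ("in", "g2", 0.5, 1.0),
    ("g1", "out", 1.0, 1.5),
    ("g2", "out", 2.0, 4.0),
]
times = arrival_times(EDGES)
print(times["out"])
```

The earliest arrival at `out` comes from the path through `g1` and the latest from the path through `g2`, which is the kind of per-endpoint minimum and maximum that is then compared against the timing requirements.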
In the case of today's chips, increasingly significant intra-die variations mean that it is no longer possible to assume that all delay paths in a chip are running fast or slow; instead, some areas of the chip may be running fast, others may be running slow, and others may be running somewhere in-between.
In order to address this problem, there is an increasing interest in statistical static timing analysis (SSTA). As opposed to working with fixed delay values, SSTA employs statistical probability functions to account for small intra-die variations.
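To make the contrast with fixed delay values concrete, here is a minimal Python sketch that treats each stage delay as an independent normal distribution, so that means add and variances add along a path. The independence and normality assumptions, the function name, and the numbers are all simplifications chosen for illustration; production SSTA tools model correlations and non-Gaussian effects.

```python
import math

def path_delay_distribution(stages):
    """Combine independent normal stage delays given as (mean, sigma) pairs."""
    mean = sum(m for m, _ in stages)
    sigma = math.sqrt(sum(s * s for _, s in stages))
    return mean, sigma

# Hypothetical two-stage path, delays in picoseconds.
STAGES = [(100.0, 3.0), (150.0, 4.0)]

mean, sigma = path_delay_distribution(STAGES)
statistical = mean + 3 * sigma                    # 3-sigma bound on the whole path
worst_case = sum(m + 3 * s for m, s in STAGES)    # every stage at its own 3-sigma
print(mean, sigma, statistical, worst_case)
```

Under these assumptions the statistical bound is tighter than stacking every stage at its individual extreme, which is the pessimism that SSTA is intended to remove.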
There is a misconception, however, that SSTA removes the necessity to perform timing optimization for multiple corner conditions. In reality, SSTA does not allow the user to perform a single timing optimization run that encompasses the entire temperature range under which the device may be operating (0ºC to 100ºC, for example). Instead, SSTA conceptually "rides on top of" existing corners and accounts for any intra-die variations at each of these corners.
Currently, the EDA industry is divided as to the use of STA versus SSTA. If SSTA does become mainstream, it will still be necessary to be able to analyze and optimize for multiple scenarios simultaneously. Thus, for the purposes of this paper, SSTA may simply be regarded as just another timing analysis approach that needs to be accounted for.
The requirement for multi-scenario optimization comes from two sources, which are usually referred to as corners and modes.
Corners
Until recently, it was necessary to consider only three variables: process, voltage, and temperature (PVT). In the case of timing paths, for example, variations in the processes used to create the chips could result in slightly faster or slower devices. Meanwhile, if the supply voltage to the device were lower or higher than the nominal value, the device would run slower or faster, respectively. Similarly, a lower operating temperature would cause the device to run faster, while a higher operating temperature would cause the device to run slower.
In the case of this simple PVT scenario, the three variables equate to 2³ = eight corners. However, this did not equate to eight independent scenarios that needed to be verified. In reality, it was necessary to evaluate only two scenarios: the first for which variations in the manufacturing process combined with the lowest possible supply voltage and the highest operating temperature resulted in the maximum (slowest) delays; the second for which variations in the manufacturing process combined with the highest possible supply voltage and the lowest operating temperature resulted in the minimum (fastest) delays.
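The corner arithmetic above can be sketched in a few lines of Python. The labels for the two settings of each variable are placeholders, and the mapping of settings to the slow direction simply restates the monotonic behavior described in the text.

```python
from itertools import product

# Two extreme settings per variable (labels are illustrative).
PVT = {
    "process": ("slow", "fast"),
    "voltage": ("low", "high"),
    "temperature": ("cold", "hot"),
}
# Setting of each variable that increases delay, per the monotonic behavior
# described in the text (low voltage and high temperature slow the device).
SLOW_SETTING = {"process": "slow", "voltage": "low", "temperature": "hot"}

corners = list(product(*PVT.values()))  # 2**3 = 8 combinations

slowest = tuple(SLOW_SETTING[k] for k in PVT)
fastest = tuple(next(v for v in PVT[k] if v != SLOW_SETTING[k]) for k in PVT)

print(len(corners), slowest, fastest)
```

Because each variable pushes delay in one direction only, the maximum and minimum delays must sit at `slowest` and `fastest`, and the other six combinations need not be analyzed separately.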
The reason the eight corners associated with the three PVT variables could be reduced to only two scenarios was that their "curves" (functions) were monotonic (in this context, "monotonic" refers to a function whose values are always increasing or always decreasing). For example, consider the temperature "curves" shown in Figure 1.
Figure 1. Switching speed and output slew delays versus temperature for a device created at the 180-nm technology node.
Assuming minimum and maximum operating temperatures of, say, 0ºC and 100ºC, respectively, transistors and logic gates would switch faster (have smaller delays) at the minimum temperature and switch slower (have larger delays) at the maximum temperature. Similarly, the delays associated with output slew (the "slope" of the output signal from a logic gate) would be smaller (faster) at lower temperatures and larger (slower) at higher temperatures.
The situation is no longer this simple with regard to devices created at the 90-nm technology node and below, because a variety of deep-submicron effects come into play and the "curves" describing the actions of the various variables are no longer guaranteed to be monotonic. In the case of switching speed, for example, beyond a certain point increasing the temperature results in increased electron mobility inside the transistors. This causes more free electrons to come into play, which causes the transistor switching speeds to increase and their delays to fall, as illustrated in Figure 2(a).
Figure 2. Switching speed and output slew delays versus temperature for a device created at the 65-nm technology node.
However, the delays associated with the output slew of a gate (which are themselves associated with the wires) remain monotonic because they are dominated by resistive effects as illustrated in Figure 2(b). The end result is that it is now necessary to verify the design for three scenarios.
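The following Python sketch shows why a non-monotonic curve adds a scenario. Both delay functions are made-up shapes, not measured data: one rises steadily with temperature, the other rises and then falls so that its maximum lies inside the operating range.

```python
def is_monotonic(values):
    """True if the sequence never changes direction."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return all(d >= 0 for d in diffs) or all(d <= 0 for d in diffs)

def extreme_temperatures(delay_at, temps):
    """Return (temperature of minimum delay, temperature of maximum delay)."""
    return min(temps, key=delay_at), max(temps, key=delay_at)

# Hypothetical delay-versus-temperature curves (picoseconds, degrees C).
def delay_180nm(t):
    return 100.0 + 0.2 * t                 # monotonic: hotter is always slower

def delay_65nm(t):
    return 120.0 - 0.008 * (t - 60) ** 2   # rises, peaks at 60, then falls

TEMPS = list(range(0, 101, 10))

for curve in (delay_180nm, delay_65nm):
    best, worst = extreme_temperatures(curve, TEMPS)
    # Temperatures that must be analyzed: both ends of the range plus any
    # interior point at which an extreme occurs.
    to_check = sorted({TEMPS[0], TEMPS[-1], best, worst})
    print(curve.__name__, is_monotonic([curve(t) for t in TEMPS]), to_check)
```

For the monotonic curve only the two end points matter, whereas the second curve also requires the interior temperature at which the delay peaks, giving three temperatures to verify.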
And this is only a very simple example; designs created at the 90-nm technology node and below can exhibit many such effects. Furthermore, many modern chip designs feature the use of multiple "voltage islands," which refers to different functional blocks being powered by different core voltages. Also, each of these voltage islands may contain a mixture of transistors with different switching thresholds. The end result can be a dramatic increase in the number of corners for which the design needs to be verified and timing optimizations performed.
The only practical approach to address this using conventional timing analysis and optimization engines that are capable of processing only a single corner at a time is to over-constrain the design. In addition to making the designers and the tools work longer and harder, these over-constrained designs leave performance on the table, which is unacceptable in today's extremely competitive marketplace.
Modes
In the early days of digital IC design, a chip would typically have only a single functional mode; that is, it was required to perform only a single function. Later, it became common for chips to have two modes: a test mode and the main functional run mode. Both of these modes would have to be run independently under all of the corner-condition-based scenarios associated with the device, thereby doubling the number of scenarios.
In the case of modern designs, it is not uncommon for a chip to have a substantial number of modes. In the case of a personal wireless communications device, for example, there may be a search mode (locating available base stations), an idle mode (waiting for an incoming or outgoing call to be initiated), a receive mode, and a transmit mode. In addition, there may be an MP3 mode, a digital camera mode, a global positioning satellite (GPS) mode, and so on.
Furthermore, many devices feature low-power and "sleep" modes. Each of these modes needs to be verified in isolation, and also with respect to switching from one context (mode) to another. For example, it may be required to interrupt the playback of an MP3 file to answer an incoming call, and then resume the MP3 mode after the call is terminated. Alternatively, when returning from a "sleep" mode, it is necessary to verify that the contents of the appropriate registers and memory elements have been maintained as required.
Once again, when combined with the increased number of process/environmental variables and corners, the end result can be a dramatic increase in the number of scenarios for which the design needs to be verified and timing optimizations performed.
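The growth in scenario count is simply the product of modes and corners, as this short Python sketch shows. The mode names follow the wireless-device example above; the three corner names are placeholders, and a real design could well have many more of each.

```python
from itertools import product

# Modes taken from the wireless-device example; corner names are placeholders.
MODES = ["search", "idle", "receive", "transmit", "mp3", "camera", "gps"]
CORNERS = ["slow", "fast", "temperature_inversion"]

# One scenario per (mode, corner) pair.
scenarios = [f"{mode}@{corner}" for mode, corner in product(MODES, CORNERS)]

print(len(scenarios))   # 7 modes x 3 corners
print(scenarios[:3])
```

Adding a single corner adds one scenario per mode, and adding a single mode adds one scenario per corner, which is why the total climbs so quickly.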
Problems with current design environments and flows
Hardware design engineers spend a large proportion of their time creating constraint files that capture the timing requirements associated with the various input-to-register-to-output paths through the chip. For example, out of a six-month period working on a design, two or more months may be devoted to defining these constraints.
In the vast majority of design flows, separate constraint files are created for each scenario. A typical design flow involves the users working with the most problematic scenario until all of the timing issues have been resolved. The users then attempt to optimize the remaining scenarios one by one. The problem with this sequential approach is that fixing issues in one scenario may cause a ripple-on effect by introducing new timing issues in other scenarios.
Some design flows attempt to address the standard sequential technique by forcing the users to merge the individual constraint files into a single "super-constraint" file. This file is then used to perform timing optimization for all scenarios simultaneously on a single expensive computing engine containing many CPUs and tens of gigabytes of memory. Even during the early portions of the development process, there are a number of problems with this approach:
- Creating the super-constraint file is resource-intensive and time-consuming.
- In many cases, the constraints from two scenarios may be mutually exclusive; for example, a timing path might support a latency of only one clock cycle in one mode, but it might allow a latency of three cycles in another mode.
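The second problem can be illustrated with a small Python sketch. It assumes, purely for illustration, that a merged file must adopt the tightest requirement for each path; the article does not say how any particular flow performs the merge, and the path and mode names are invented.

```python
def merge_constraints(per_scenario):
    """Merge per-scenario {path: allowed_cycles} maps, keeping the tightest value.

    Assumption for illustration: one file can hold only one requirement per path.
    """
    merged = {}
    for constraints in per_scenario.values():
        for path, cycles in constraints.items():
            merged[path] = min(cycles, merged.get(path, cycles))
    return merged

def over_constrained(per_scenario, merged):
    """List, per scenario, the paths that became tighter than that scenario needs."""
    return {
        name: [p for p, cycles in constraints.items() if merged[p] < cycles]
        for name, constraints in per_scenario.items()
    }

# Hypothetical path that is single-cycle in one mode and three-cycle in another.
PER_SCENARIO = {
    "mode_a": {"u1/q->u2/d": 1},
    "mode_b": {"u1/q->u2/d": 3},
}
merged = merge_constraints(PER_SCENARIO)
print(merged)
print(over_constrained(PER_SCENARIO, merged))
```

Under this assumption the path is held to one cycle everywhere, so the second mode is optimized against a requirement it never had.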
Signoff verification considerations
Signoff verification is almost invariably based on checking each scenario (corner-based and mode-based) in isolation with its own unique constraint file. If each scenario was verified with its own unique constraint file during the development process, then the team has already spent an inordinate amount of time solving timing problems, but at least the signoff verification should proceed relatively smoothly.
If a "super-constraint" file approach was used, however, then the individual signoff verification runs may uncover problems that were not revealed by the "super-constraint" approach. In this case, the design team has two main choices:
- Cycle around modifying the "super-constraint" file, re-optimizing, and then returning to the signoff verification process.
- Fall back to performing sequential optimization using the same individual scenario-specific constraint files that are used to perform the signoff verification. In this case, the users are placed back in the time-consuming situation where fixing an issue in one scenario may spawn new issues in other scenarios.
The Solution – concurrent multi-scenario optimization
The obvious solution to the problem of designs requiring an increasing number of timing optimization scenarios to be handled is to process all of the scenarios concurrently. This means that any change (optimization) introduced in one scenario will be immediately tested in all of the other scenarios to ensure that this modification doesn’t "break" anything in the other scenarios. (If a change does cause problems, it can be immediately "undone" and an alternative optimization can be attempted.)
There are a number of considerations, however, that need to be addressed. One key consideration is the fact that design teams already have a significant investment in existing design tools, including timing analysis and optimization engines. Another consideration is that these existing engines are not inherently equipped to perform concurrent multi-scenario timing optimization. Even if their creators were interested in doing so – which is not typically the case – augmenting these engines to perform multi-scenario timing optimization is a non-trivial task.
To address these issues, the scientists and engineers at Athena Design Systems have developed an innovative solution to this problem based on a unique distributed computing model (see Figure 3).
Figure 3. Performing multi-scenario timing optimization using a distributed computing model.
Each scenario is run on its own compute engine/node (each compute engine may have multiple processors; also, it is possible to run several scenarios on a single compute engine if required). Furthermore, the master control node (which, among other things, maintains a copy of the master netlist) also resides on its own compute engine. Due to the fact that each compute node is running only a single scenario, these engines can be relatively small and affordable.
The master control node suggests optimizations that are then tested on each scenario running on remote network machines concurrently. If an optimization in one scenario causes issues in another, that optimization is discarded and a different approach is evaluated.
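The propose, test everywhere, and keep-or-discard loop can be sketched in Python as follows. This is a toy model and not Athena's implementation: threads stand in for remote compute nodes, a single path delay stands in for the design, and the scenario names, derating factors, and acceptance rule are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy scenarios: each returns slack in picoseconds for a given nominal path delay.
def setup_slack_slow_corner(delay):
    return 1100 - delay * 120 // 100     # delays stretch by 20% at the slow corner

def hold_slack_fast_corner(delay):
    return delay * 80 // 100 - 600       # delays shrink by 20% at the fast corner

SCENARIOS = {
    "func_slow": setup_slack_slow_corner,
    "func_fast": hold_slack_fast_corner,
}

def evaluate(delay):
    """Evaluate every scenario concurrently (threads stand in for remote nodes)."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(check, delay) for name, check in SCENARIOS.items()}
        return {name: future.result() for name, future in futures.items()}

def optimize(delay, candidate_changes):
    """Try each proposed delay change; keep it only if no scenario is broken."""
    slacks = evaluate(delay)
    for delta in candidate_changes:
        trial = evaluate(delay + delta)
        breaks = any(trial[s] < 0 <= slacks[s] for s in slacks)
        improves = min(trial.values()) > min(slacks.values())
        if improves and not breaks:
            delay, slacks = delay + delta, trial
        # Otherwise the change is never committed, which amounts to undoing it.
    return delay, slacks

final_delay, final_slacks = optimize(1000, [-300, -100])
print(final_delay, final_slacks)
```

In this example the aggressive speed-up fixes the setup violation at the slow corner but creates a hold violation at the fast corner, so it is discarded; the gentler change satisfies both scenarios and is kept.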
There are several key aspects to this solution that should be noted as follows:
- No changes are required to existing timing constraint files. Similarly, no changes are required to any existing (third-party) timing analysis and optimization engines, because each instantiation of these engines runs on only a single scenario.
- With regard to the previous point, the system does come equipped with its own tightly coupled timing analysis engine that is fully incremental and aware of both OCV and signal integrity (SI). This tight coupling provides extreme speeds that are not possible when working with loosely coupled third-party timing analysis engines. The idea here is to use the tightly coupled engine at the beginning of the process so as to quickly perform the vast majority of the timing optimizations and clear the vast majority of timing issues. Once this is done, the user can swap in existing STA (or even SSTA) engines to perform the final timing optimization (and verification) runs.
- By supporting discrete constraints per scenario, it provides the same accuracy of interpreting the timing constraints as in the traditional approach.
- Once the initial scenario databases have been established on the various nodes, only incremental changes to the database need to be propagated from the master control node to the sub-nodes. Similarly, only incremental changes to the timing are returned from the sub-nodes to the master node. All this serves to minimize network bandwidth requirements.
- With regard to the previous point, the underlying database is designed from the ground-up to address this usage scenario. That is, the database is not file based, but is designed to stream incremental changes to the data structures as required.
- The use of a single optimization engine coupled with many processors with small memory footprints – as opposed to a single humongous machine that is used to perform all optimization and timing analysis tasks – provides many advantages, including:
  - Extremely fast turnaround time (TAT) when optimizing one scenario while verifying all scenarios in parallel
  - Scalability in the number of corners and modes by adding more computers
- No special IT setup is required and all complexity is hidden from the user. For example, if a compute node crashes or becomes disabled for any reason, on recovery (when the machine comes up again) the system automatically knows what state the node was in, automatically reinitializes it to that state, and continues with the timing optimization process.
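The incremental-update idea in the list above can be sketched in Python. This is only an illustration of sending deltas rather than whole databases: the dictionary-based netlist, the delta format, and the cell names are invented, and nothing here reflects the actual database design.

```python
def diff(old, new):
    """Describe how to turn netlist `old` into `new` ({instance: cell} maps)."""
    changed = {k: v for k, v in new.items() if k not in old or old[k] != v}
    removed = [k for k in old if k not in new]
    return {"changed": changed, "removed": removed}

def apply_delta(replica, delta):
    """Bring a node's local copy up to date using only the delta."""
    replica.update(delta["changed"])
    for instance in delta["removed"]:
        replica.pop(instance, None)

# Hypothetical master netlist and the copy held by one scenario node.
master = {"u1": "NAND2_X1", "u2": "INV_X1", "u3": "BUF_X2"}
replica = dict(master)

# The optimizer upsizes u1 and removes the buffer u3.
updated = {"u1": "NAND2_X4", "u2": "INV_X1"}
delta = diff(master, updated)
apply_delta(replica, delta)

print(delta)
print(replica == updated)
```

Only the two affected instances cross the network, while the unchanged `u2` is never retransmitted; the same principle applies in the other direction for timing results.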
In summary
Performing a single timing analysis and optimization run on a large, complex design can take hours or even days. Furthermore, fixing a timing issue in one scenario may trigger multiple failures in other scenarios, and fixing those problems may cause ripple-on effects in further scenarios. Solving all of these problems in a sequential manner could take so long that the design would be obsolete long before it could be made to work (if, indeed, it could ever be made to work).
The solution is to perform all of the timing analysis and optimization runs concurrently. In order to address this problem, the scientists and engineers at Athena Design Systems have developed an innovative concurrent multi-scenario timing optimization technology based on a unique distributed computing model. Using this technology, any change (optimization) introduced in one scenario is immediately tested in all of the other scenarios to ensure that this modification doesn’t "break" anything in the other scenarios.
In addition to requiring only standard, affordable workstations, Athena's solution does not require any modifications to existing timing and optimization engines and/or constraint files. The result is to dramatically increase productivity by significantly shortening the timing optimization portion of the design process.
By Dimitris Fotakis.
Fotakis is Founder and President of Athena Design Systems, Inc. He also serves as Athena Design’s chief technology officer and leader of its multi-disciplined development team that brings together expertise in extraction, layout, timing analysis, signal integrity, optimization and multi-processing applications. Fotakis previously co-founded AmmoCore Technology, after holding R&D and engineering positions at Cadence, High-Level Design Systems and National Semiconductor. He received his M.S. degree in Computer and Electrical Engineering from George Mason University in 1986, and currently holds two patents.
The author wishes to acknowledge the contributions of his colleagues during the development of this white paper on behalf of Athena Design Systems, Inc.
For additional Athena Design white papers on the topic of timing design closure, visit www.athenadesign.com/datasheets/design-closure-crisis.pdf
Go to the Athena Design Systems, Inc. website to learn more.