Advanced process nodes enable the implementation of highly complex systems on chip (SoCs): multimillion-gate designs that integrate memories, processor cores, analog circuitry, DSP cores, and PLLs. For these large, complex designs, designers are using more hard macros to help speed logic-gate placement and to alleviate some tool-capacity challenges. Most of the logic gates, and therefore most of the hard macros used to place these gates, are dedicated to memories.
To succeed in their role, physical designers must deliver a competitive product “right” to market: the right features and functionality at the right time and cost. This means that the die size (area) cannot be too large, the design cost must remain within the range of profitability, the design must operate at its target frequency, and the entire design — from concept to silicon — must be delivered on time. As a result, accurate hard macro placement can make the difference between a design success and a design failure, and is becoming mission-critical to SoC design.
In this article, we look at the challenges — and solution — to accurate hard macro placement.
Figure 1. ASIC technology roadmap (design starts by drawn linewidth). Advanced designs have significantly higher gate counts. (From the January 2004 Gartner report, “ASIC and FPGA Suppliers Answer the Call, Market Trends”.)
Figure 2. For new designs, significantly more logic gates are dedicated to memories. In 2002, about one-third of all ASIC gates were memory. By 2007, it is expected that almost half of the gates in a design will be allocated to memories. (From the January 2004 Gartner report, “ASIC and FPGA Suppliers Answer the Call, Market Trends”.)
Hard macro placement: The challenges
In the past, physical designers could hand place designs that used a relatively small number of hard macros. However, as the number of logic gates and associated hard macro instances increases, hand placement becomes nearly impossible. As a result, designers are turning to design automation solutions to help speed and regulate hard macro placement.
Most placement automation solutions cannot effectively handle a combination of hard macros and standard cells. Many of these solutions are geared to handle smaller objects — approximated as point objects — during the optimization process.
Therefore, for automated solutions to become effective, they must incorporate cost functions to address the new challenges of:
- Handling predefined placement locations
- Dealing with widely varying sizes and shapes
- Defining macro orientations and pin positions
- Simultaneous standard cell and macro placement
- Congestion and timing-driven placement
- Producing placements that support multiple power nets
- Producing results quickly
Handling predefined placement locations
Packaging requirements usually define relative I/O cell locations. Some hard macros, such as PLLs, analog-to-digital converters, and digital-to-analog converters, must be placed next to specific I/O cells. The pins of these hard macros must align with, and be close to, those I/O cells to allow wide-wire connections.
It is a normal practice to pre-place these macros in specific locations within the core area, and then “fix” their placements so they cannot be moved by automated placement algorithms. However, this practice of pre-placing and fixing hard macro locations produces placement blockages for the rest of the design logic. Pre-placing macros around peripheral I/Os results in a rectilinear — rather than rectangular — core area for the rest of the design. Placement algorithms need to be capable of handling rectilinear placement areas.
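As a rough illustration of how fixed macros turn a rectangular core into a rectilinear one, the following Python sketch subtracts pre-placed macro rectangles from a single standard cell row and returns the spans that remain placeable. The function name `free_segments`, the `(x0, y0, x1, y1)` rectangle convention, and all coordinates are assumptions made for this example; they are not taken from any particular tool.

```python
def free_segments(row_x0, row_x1, row_y0, row_y1, blockages):
    """Return the (x0, x1) spans of one row not covered by fixed macros.

    blockages is a list of (x0, y0, x1, y1) rectangles for pre-placed macros.
    """
    # Keep only the macros that vertically overlap this row, sorted by left edge.
    cuts = sorted((b[0], b[2]) for b in blockages
                  if b[1] < row_y1 and b[3] > row_y0)
    segments = []
    cursor = row_x0  # left end of the span currently being built
    for bx0, bx1 in cuts:
        if bx0 > cursor:
            segments.append((cursor, min(bx0, row_x1)))
        cursor = max(cursor, bx1)
        if cursor >= row_x1:
            break
    if cursor < row_x1:
        segments.append((cursor, row_x1))
    return segments


# Hypothetical core, 1000 units wide, with three fixed macros.
fixed = [
    (0, 0, 200, 50),       # PLL in the lower-left corner
    (800, 0, 1000, 120),   # ADC in the lower-right corner
    (400, 300, 600, 500),  # RAM in the middle of the core
]
print(free_segments(0, 1000, 0, 10, fixed))     # [(200, 800)]
print(free_segments(0, 1000, 300, 310, fixed))  # [(0, 400), (600, 1000)]
```

Rows near the periphery lose their ends to the pre-placed macros, while rows through the middle are split in two, which is the kind of irregular placement area a placer has to handle.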
Dealing with widely varying sizes and shapes
It is not uncommon for today’s designs to have over a hundred hard macro instances whose sizes and shapes vary considerably. Sizes can vary from as small as 10x the size of an average standard cell to as large as 100x, 200x, or even 1000x the size of an average standard cell. Additionally, aspect ratios of macros vary widely.
Figure 3. Standard cells surrounded by hard macros are unroutable; designs can be optimized to improve routability by placement algorithms that do not create these types of areas.
Placing macros of varying sizes without good optimization engines can result in fragmentation of placement and routing space, making a design unroutable. Narrow columns or closed spaces between several macros can cause routing bottlenecks. Figure 3 shows an example of a closed space and the solution produced by an algorithm that avoids creating one.
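One simple way to spot the narrow columns mentioned above is to compare the facing edges of every pair of macros. The sketch below is illustrative only: the function name `narrow_channels`, the rectangle format `(x0, y0, x1, y1)`, and the `min_width` threshold are assumptions, and a production tool would use far more efficient geometry queries than this pairwise check.

```python
from itertools import combinations


def overlap(a0, a1, b0, b1):
    """Length of the overlap of intervals [a0, a1] and [b0, b1] (negative if disjoint)."""
    return min(a1, b1) - max(a0, b0)


def narrow_channels(macros, min_width):
    """Return (name_a, name_b, gap) for macro pairs separated by a channel
    narrower than min_width.

    macros maps a macro name to its (x0, y0, x1, y1) rectangle.
    """
    found = []
    for (na, a), (nb, b) in combinations(sorted(macros.items()), 2):
        # Side-by-side macros: their y spans overlap, so check the x gap.
        if overlap(a[1], a[3], b[1], b[3]) > 0:
            gap = max(a[0], b[0]) - min(a[2], b[2])
            if 0 < gap < min_width:
                found.append((na, nb, gap))
        # Stacked macros: their x spans overlap, so check the y gap.
        if overlap(a[0], a[2], b[0], b[2]) > 0:
            gap = max(a[1], b[1]) - min(a[3], b[3])
            if 0 < gap < min_width:
                found.append((na, nb, gap))
    return found


macros = {
    "ram_a": (0, 0, 100, 100),
    "ram_b": (110, 0, 200, 100),   # only 10 units to the right of ram_a
    "ram_c": (0, 150, 100, 250),   # 50 units above ram_a
}
print(narrow_channels(macros, min_width=20))  # [('ram_a', 'ram_b', 10)]
```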
Macro orientations and pin positions
Hard macros are usually several times larger than standard cells. In many placement algorithms, standard cells are approximated as objects with all pins at the center, so the length of a net between two such objects is simply the Manhattan distance between their centers. Distances calculated this way are easily formulated as an optimization problem that minimizes total net wire length. For larger macros, approximating the pins to be at the center introduces significant inaccuracies, rendering this model useless. Using the actual pin locations for distance calculations makes the optimization process more difficult. In addition, pins located away from the center make macro orientation a significant factor in distance calculations: depending on the orientation of the macro, the lengths of the nets connecting to it can differ, which further complicates the optimization formulation.
Given these factors, new techniques are required that include macro sizes, pin positions and orientations in optimizing placement wire length.
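The effect of pin position and orientation on net length can be seen in a small Python sketch. The `Macro` class, the four orientation labels ("N", "S", "FN", "FS", covering only flips and the 180-degree rotation), and the dimensions are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Macro:
    x: float            # lower-left corner of the placed macro
    y: float
    w: float            # width and height in the reference ("N") orientation
    h: float
    orient: str = "N"   # "N", "S" (rotated 180), "FN" (mirrored left-right), "FS" (mirrored top-bottom)

    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)

    def pin_xy(self, px, py):
        """Absolute pin position, given the pin offset (px, py) from the
        lower-left corner in the reference orientation."""
        if self.orient == "N":
            ox, oy = px, py
        elif self.orient == "S":
            ox, oy = self.w - px, self.h - py
        elif self.orient == "FN":
            ox, oy = self.w - px, py
        elif self.orient == "FS":
            ox, oy = px, self.h - py
        else:
            raise ValueError(f"unsupported orientation: {self.orient}")
        return (self.x + ox, self.y + oy)


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


ram = Macro(x=0, y=0, w=100, h=50)
cell = (120, 25)      # a standard cell to the right of the macro
pin = (5, 25)         # pin offset: on the left side of the macro

print(manhattan(ram.center(), cell))       # center model: 70.0
print(manhattan(ram.pin_xy(*pin), cell))   # actual pin, orientation N: 115
ram.orient = "FN"
print(manhattan(ram.pin_xy(*pin), cell))   # actual pin, mirrored: 25
```

In this example the center approximation reports 70 units, while the true connection is 115 units in one orientation and 25 in the mirrored one, so the orientation choice alone changes the net length by more than a factor of four.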
Simultaneous standard cell and macro placement
Traditionally, designs with hard macros are placed in two steps. First, designers manually place the macros, pushing them to the edges of the chip or a block. Then, designers fix the macro locations and let state-of-the-art placement tools place the standard cells, treating the fixed macros as placement blockages. This does not result in an optimized placement, for two reasons. First, macros and standard cells are not considered simultaneously during wire-length optimization. Second, the simple heuristic of pushing all the macros to the edge of the chip or block, though it ensures maximal contiguous (minimally fragmented) space for standard cells, may result in long routes, and therefore inferior timing, for macros that have strong connections in multiple directions. A better algorithm will consider both standard cells and macros simultaneously during optimization. Furthermore, an optimal tradeoff between standard cell space fragmentation and wire length is necessary for superior results.
Congestion and timing-driven placement
Hard macros present congestion problems in two forms. First, the hard macros themselves are significant routing obstructions. Most hard macros completely block the first three to four layers of routing. Therefore, any congestion estimates need to consider routing blockages due to macros. The second problem arises around the hard macro pins. Since hard macros typically have a large number of pins that need to be accessed, a sufficient number of tracks free of standard cells is needed to access these pins. Automatic estimation of required tracks and accounting for them is essential to alleviate congestion.
Long routes can degrade timing. Reducing total wire length and the length of critical nets is typically desirable for better results. Timing estimation should account for route detours caused by the presence of hard macros.
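To make the detour effect concrete, the sketch below computes the shortest rectilinear route length between two pins when a single rectangular macro may lie between them. It rests on simplifying assumptions that are not from the article: the macro blocks every routing layer, routes may run along the macro boundary, neither pin lies inside the macro, and only one blockage is considered. The name `detour_length` is hypothetical.

```python
def detour_length(p, q, macro):
    """Shortest rectilinear length from pin p to pin q around one blockage.

    p and q are (x, y) points; macro is an (x0, y0, x1, y1) rectangle.
    """
    (px, py), (qx, qy) = p, q
    x0, y0, x1, y1 = macro
    base = abs(px - qx) + abs(py - qy)   # plain Manhattan distance
    xlo, xhi = min(px, qx), max(px, qx)
    ylo, yhi = min(py, qy), max(py, qy)

    # Macro lies between the pins in x and covers their whole y range:
    # the route must climb over or dip under the macro.
    if xlo <= x0 and x1 <= xhi and y0 <= ylo and yhi <= y1:
        return base + 2 * min(ylo - y0, y1 - yhi)
    # Same situation with the roles of x and y exchanged.
    if ylo <= y0 and y1 <= yhi and x0 <= xlo and xhi <= x1:
        return base + 2 * min(xlo - x0, x1 - xhi)
    # Otherwise some staircase route of Manhattan length avoids the macro.
    return base


pin_a, pin_b = (0, 50), (200, 50)
print(detour_length(pin_a, pin_b, (50, 0, 150, 80)))   # blocked: 260
print(detour_length(pin_a, pin_b, (50, 60, 150, 80)))  # not blocked: 200
```

In the blocked case the route is 30 percent longer than the Manhattan estimate, which is the kind of error a timing estimate makes if it ignores macro blockages.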
Producing placements that support multiple power nets
As discussed earlier, the majority of hard macro instances in a design tend to be RAMs. RAMs can be sensitive to noise on the power supplies. A fairly common method of protecting RAMs against noise on power supplies is to physically isolate power routing to the RAMs from power routing that supplies the synthesized logic of the design. Groups of RAMs need to be placed together to accommodate distribution of isolated power.
Producing results quickly
Producing results quickly is key to meeting design schedule requirements. The added complexity introduced by large numbers of hard macros, which are invariably optimized manually, results in long delays before results of acceptable quality are obtained.
Measuring the quality of hard macro placement
Given the new implementation challenges presented by many hard macros in a design, a quick determination of the quality of hard macro placement is difficult. This section discusses some of the measures that help to determine the quality of hard macro placements. Ultimately, the final measure of quality is a completed chip that meets size and operating frequency targets. However, it is not practical to wait until a chip is completed to measure the quality of the hard macro placement. Design teams need a way to measure hard macro placement quality as early as possible in the design development process.
Given a placement solution, some of the measures are as follows:
- Routability
- Timing
- Wire length
- Dataflow
- Standard cell placement area
Of these measurements, routability and timing are most important. Wire length, dataflow, and standard cell placement area are good measurements, but of secondary importance. For example, if the placement solution is routable and meets timing requirements, sacrificing some wire length is acceptable.
Routability - Routability is a key measurement of quality. If a design cannot be routed, it cannot be manufactured. Routability can be assessed by analyzing the routing congestion produced by global routers. Congestion analysis data generally consist of two parts: a textual report of statistics and a visual heat map. The statistics provide a good indication of how much of the design is congested. Highly congested designs are generally not routable. The heat map provides a way to see where the routing congestion exists within the design. Areas of high congestion, or hot spots, are where trouble is likely to be. Hard macros tend to create congestion around their edges and corners. Hard macro placements should be carefully analyzed for routability early in the design cycle.
Timing - Given a routable placement, one should assess the timing of the design. (If a design is not routable, it does not make sense to check timing; it is customary to resolve the congested areas first.) The timing calculations can use the placement information to better estimate interconnect loading. In the presence of macros, interconnect timing calculations should include routing detours caused by large macros. A good placement will minimize timing violations.
Wire length - Total wire length is a good overall indicator of placement quality when comparing multiple placement solutions. Generally, those that have smaller values of total wire length are better. This is a value to record and monitor when assessing multiple placement solutions.
Another aspect of wire length is localized to hard macros. Looking at the signal connections of a hard macro can quickly reveal if the hard macro is placed well. For example, seeing a hard macro placed in the lower left corner of the core area that is connected to logic in the upper right corner of the die indicates a poor location for the hard macro.
Dataflow - An engineer who knows the logic and intended operation of the design is best equipped to assess data flow. They can assess data flow by observing where logic modules are placed and how they are placed with respect to each other. Implementation engineers who are not familiar with the logic structure and intended operation of the design can use fly-line analysis to observe data flow.
Standard cell placement area - Standard cell placement area is assessed visually. Small areas surrounded by hard macros usually cause congestion hotspots. Unless the connections to and from standard cells in these areas are completely localized, it is hard to complete all the connections from within these areas to the objects outside these areas. Generally, a contiguous standard cell placement area without bottlenecks is desired.
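Although standard cell placement area is usually assessed visually, the "small areas surrounded by hard macros" described above can also be found programmatically. The following sketch is one possible approach, not a description of any tool: it rasterizes the core onto a coarse grid (an assumption, with `M` marking macro cells and `.` marking free cells) and flood-fills the free space to list the size of each connected region.

```python
from collections import deque


def free_regions(grid):
    """Return the sizes of 4-connected regions of free ('.') cells, largest first."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != "." or seen[r][c]:
                continue
            # Breadth-first flood fill starting from this unvisited free cell.
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:
                cr, cc = queue.popleft()
                size += 1
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and grid[nr][nc] == "." and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            sizes.append(size)
    return sorted(sizes, reverse=True)


core = [
    "......",
    ".MMMM.",
    ".M..M.",   # two free cells fully enclosed by macros
    ".MMMM.",
    "......",
]
print(free_regions(core))  # [18, 2]
```

Any region other than the largest is a candidate pocket; a small pocket such as the two-cell region here corresponds to the enclosed area that is likely to become a congestion hotspot.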
Hard macro placement: The solution
Design planning solutions should consider hard macro placement in the context of floorplanning to deliver accurate views of logic-gate placement for design prototyping and implementation. Using placement algorithms that address the challenges described above, such solutions should place standard cells and hard macros simultaneously while minimizing global wire length. The floorplanner should automatically align and pack hard macros that naturally group together, minimizing wire length and routing space fragmentation. In this way, when standard cell and macro placement is optimized simultaneously, the hard macro placement produces optimal locations to help timing and congestion.
To avoid macro congestion, an effective floorplanner should automatically compute the required routing channel area around and between hard macros. In addition, a user should be able to specify additional channel area around certain macros, if needed.
The floorplanner should also consider macro orientations and their influence on wire length and congestion. The floorplanner should automatically rotate and flip macros to achieve optimal placement results. Macro pin locations should be accounted for when wire lengths are computed and optimized.
Traditional techniques for macro placement push macros to the sides of the core or blocks, fix their placement, and then create the standard cell placement. This can lead to routability issues downstream. Therefore, an effective floorplanner should determine the best location holistically —while exploring the entire solution space — to optimize wire length and alleviate congestion and timing issues.
The floorplanner’s placement engine should support optional automatic hard macro grouping and alignment features. Often, groups of hard macros are used to implement a function. For example, to minimize read/write access time, it is sometimes better to use a group of small RAMs to make a larger RAM. An effective floorplanner could allow user inputs to determine macro locations. The inputs could include exact macro locations (such as those for pre-placed macro constraints) or more sophisticated macro constraints. Relative placement constraints, alignment constraints, grouping constraints, and region constraints could be defined prior to placement and honored during the optimization process. For example, if a user knows a set of macros requires isolated power routing, the user should be able to assign the macros to a group and the floorplanner should honor the group assignment when making placement optimizations.
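A grouping constraint tied to a region can be expressed very simply. The sketch below only checks a finished placement against such constraints; it does not show how a placer would honor them during optimization. The data layout, the group name `iso_power`, and the function `region_violations` are hypothetical and used only for illustration.

```python
def region_violations(placement, region_constraints):
    """Return (group, macro) pairs where a macro lies outside its group's region.

    placement maps macro name -> (x0, y0, x1, y1).
    region_constraints maps group name -> (member names, region rectangle).
    """
    bad = []
    for group, (members, region) in sorted(region_constraints.items()):
        rx0, ry0, rx1, ry1 = region
        for name in members:
            x0, y0, x1, y1 = placement[name]
            # The whole macro outline must fit inside the region.
            if x0 < rx0 or y0 < ry0 or x1 > rx1 or y1 > ry1:
                bad.append((group, name))
    return bad


placement = {
    "ram0": (0, 0, 50, 50),
    "ram1": (60, 0, 110, 50),
    "ram2": (300, 0, 350, 50),   # placed far from the other two RAMs
}
constraints = {
    # RAMs that share an isolated power supply must stay in one region.
    "iso_power": (["ram0", "ram1", "ram2"], (0, 0, 200, 100)),
}
print(region_violations(placement, constraints))  # [('iso_power', 'ram2')]
```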
The floorplanner’s placement engine should address the challenges of hard macro placement and standard cell placement simultaneously. Even with all the considerations, placement runtimes should be extremely fast.
An example of a floorplanning solution that offers these many capabilities, Synopsys JupiterXT™ has demonstrated simultaneous hard macro and standard cell placements for a 250K instance design, with nearly 400 hard macros, in less than five minutes. Another design having 2.2 million instances with nearly 100 hard macros was completed in less than 40 minutes. (These tests were done on a machine with two 1.8-GHz CPUs.)
The effective floorplanner should also offer rich editing capabilities that aid macro pre-placement. To enable designers to quickly refine macro pre-placements, the floorplanning interface should allow macros to be selected, grouped, aligned, and moved using simple mouse clicks.
The combination of high-quality automatic macro placement, a rich set of user constraints, and easy manual editing capabilities would ensure the highest quality of results during hard macro placement. Again, JupiterXT, an example of this type of floorplanner, offers these capabilities and has been tapeout-proven in designs with over 1000 macros. Figure 4 shows examples of automated placement results.
Figure 4. Automated placement results.
Hard macro placement assessment
In addition to the comprehensive placement algorithms and options to support optimal placement of hard macros, an effective floorplanner should provide designers with a rich set of floorplan analysis tools.
The flow chart below shows steps in a design planning flow and notes the type of quality assessment that can be performed at each step. The following sections give more detailed descriptions of the steps in the flow.
Figure 5. Hard macro placement in the design planning flow.
As noted previously, the floorplanner’s placement engine should simultaneously place standard cells and hard macros. Once a placement is complete, designers can immediately assess total wire length, data flow, and standard cell placement areas. However, given these are of a secondary nature, it is recommended to proceed to congestion and timing analysis first. If there are routability or timing problems, assessment of data flow and standard cell placement areas can often provide the designer with data necessary to solve the problems.
Assessing total wire length
Total wire length should be output in the placement log. Generally, the wire length number can be noted and monitored if the designer chooses to perform multiple placements. Smaller total wire length would generally indicate a better placement solution. (Total wire length is usually compromised by the traditional macro placement approach, which pushes macros to the periphery of the design, fixes their locations, then performs a standard cell placement.)
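When a tool does not report the number directly, total wire length can be estimated from placed coordinates. The sketch below uses half-perimeter wire length (HPWL), a widely used estimate that equals the Manhattan distance for two-pin nets; choosing HPWL is an assumption here, since the article does not say which estimate a given tool reports. The net names, instance names, and coordinates are made up, and each instance is treated as a point for brevity.

```python
def hpwl(points):
    """Half-perimeter of the bounding box around a net's pin positions."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_wire_length(nets, placement):
    """Sum HPWL over all nets.

    nets maps net name -> list of instance names.
    placement maps instance name -> (x, y).
    """
    return sum(hpwl([placement[inst] for inst in insts]) for insts in nets.values())


nets = {"n1": ["ram0", "u1"], "n2": ["ram0", "u2", "u3"]}

# Placement A: macro pushed to the corner of the core.
place_a = {"ram0": (0, 0), "u1": (100, 100), "u2": (90, 80), "u3": (80, 90)}
# Placement B: macro placed near the logic it connects to.
place_b = {"ram0": (60, 60), "u1": (100, 100), "u2": (90, 80), "u3": (80, 90)}

print(total_wire_length(nets, place_a))  # 380
print(total_wire_length(nets, place_b))  # 140
```

Recording this value for each candidate placement gives a quick basis for comparison before running the more expensive congestion and timing analyses.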
Fly line analysis, another aspect of wire length assessment that is more specific to macro placement, should also be supported. Ideally, the floorplanner should highlight the connections from all or selected sets of macros as fly lines to support quick, visual assessment of the quality of the macro location and orientation.
Figure 6. Fly lines, such as those supported in Synopsys’ JupiterXT, can help show short connects from hard macros to associated standard cell logic after initial placement.
Assessing data flow
An effective floorplanner should preserve the relationship of the logical hierarchy to the physical hierarchy throughout the design planning flow. Designers should have access to the logical hierarchy by means of a tool such as a hierarchy browser at any point in the design flow. Logical modules could be colorized to distinguish the children of the logical modules in the physical layout (using the same color to highlight all of a logic module’s children). Knowing the relationship between hard macros and logical modules in this way would help designers assess the related data flow by letting them observe where logical modules and hard macros are physically placed. Knowing the logical data flow, designers can then make better placement decisions to fine-tune the floorplan as needed.
Figure 7. Coloring by logic module, as supported in Synopsys’ JupiterXT, enables visualization of data flow for critical paths from hard macro to hard macro.
Assessing standard cell placement areas
The floorplanner’s placement engine should avoid creating highly fractured placement areas with minimum wire length degradation. In addition, users should be given parametric control over column widths between hard macros. Placement solutions should visually guide the user to help minimize column width — and control overall die area — while allowing enough space to accommodate interconnect routing. If no control or visual guide is offered, columns may become blocked such that standard cells cannot be placed within them. For assessment, users should be able to easily identify any trouble areas and have a complete view of the placement areas.
A global router is used to assess routability. To support routability analysis, the global router should create congestion analysis data, including a log of statistics showing the total number of tiles and the number of over-congested tiles. Additionally, the global router should produce a heat map of the design to enable designers to see exactly where congested areas exist.
Synopsys’ JupiterXT offers a global router that is unique in this regard: it provides correlation to final routing to ensure design convergence.
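The two forms of congestion data described above, tile statistics and a heat map, can be illustrated with a toy example. The sketch assumes the global router has already produced per-tile routing demand and capacity as two grids of numbers; the function name, the 0.8 "nearly full" threshold, and the text rendering are illustrative assumptions rather than the output format of any real router.

```python
def tile_ratio(demand, capacity):
    """Demand-to-capacity ratio; a tile with no capacity overflows on any demand."""
    if capacity == 0:
        return float("inf") if demand > 0 else 0.0
    return demand / capacity


def congestion_report(demand, capacity):
    """Summarize per-tile congestion as statistics plus a text heat map.

    '#' marks an over-congested tile (demand above capacity),
    '+' marks a tile above 80% utilization, '.' marks everything else.
    """
    tiles = 0
    overflow = 0
    heat_map = []
    for demand_row, capacity_row in zip(demand, capacity):
        line = []
        for d, c in zip(demand_row, capacity_row):
            tiles += 1
            ratio = tile_ratio(d, c)
            if ratio > 1.0:
                overflow += 1
                line.append("#")
            elif ratio > 0.8:
                line.append("+")
            else:
                line.append(".")
        heat_map.append("".join(line))
    pct = 100.0 * overflow / tiles if tiles else 0.0
    return {"tiles": tiles, "overflow_tiles": overflow,
            "overflow_pct": pct, "heat_map": heat_map}


demand = [[2, 9, 12],
          [5, 7, 3]]
capacity = [[10, 10, 10],
            [10, 10, 10]]
report = congestion_report(demand, capacity)
print(report["tiles"], report["overflow_tiles"])  # 6 1
print("\n".join(report["heat_map"]))
```

Here one of six tiles is over-congested, and the heat map shows that the hot spot sits at the right end of the first row, next to a tile that is nearly full.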
For timing assessment, floorplanning and design routing should be completed using a common timing engine. Also, timing assessment using tape-out proven technology is a must to ensure rapid design convergence. Synopsys’ JupiterXT is offered within the Galaxy Physical Design solution, which supports sign-off driven physical design using the PrimeTime® and Star-RCXT timing and extraction engines.
Today’s SoC designs have hundreds of hard macros, with trends indicating that many of tomorrow’s designs will include hundreds — if not thousands — more. Placement and floorplanning tools developed to handle standard cells, alone, cannot produce best area and timing results, due to the implementation challenges presented by hard macros.
To be effective, floorplanning placement engines must now incorporate cost functions to address the new challenges presented by having hundreds or thousands of hard macros in a design. The ability to quickly assess macro placement and overall floorplan quality, combined with the use of tape-out proven implementation engines, enables users to quickly converge to the best area and timing results. The placement paradigm has shifted and JupiterXT delivers solutions to address the present and future needs of today’s designers.
By Neeraj Kaul and Steve Kister, Synopsys, Inc.