Monterey, Synopsys lift veil on their physical-design tools
Richard Goering
4/14/2000 4:48 PM EDT
SAN DIEGO - Engineers got their closest public look to date at two key IC physical-design products, Monterey Design Systems Inc.'s Dolphin and Synopsys Inc.'s Physical Compiler, at a pair of presentations at the International Symposium on Physical Design (ISPD) 2000 conference earlier this week. Speakers from both companies disclosed technical details that until now had remained under wraps.
Two speakers from Monterey (Sunnyvale, Calif.) described Dolphin as a "design closure" product that carefully exploits parallel processing. An executive from Synopsys (Mountain View, Calif.) described how PhysOpt, the technology brought to market as the Physical Compiler product, uses a new placement algorithm that's tied to every step of register-transfer-level synthesis.
These two products are vying to become the cornerstones of next-generation, deep-submicron design. Dolphin is a complete layout system that includes optimization, placement, routing and extraction, while Physical Compiler couples RTL synthesis with placement, but not detailed routing.
Monterey has been particularly tight-lipped about what's under the hood of Dolphin, so the ISPD presentation was a significant disclosure.
Phiroze Parakh, a member of the technical staff at Monterey, said that linking synthesis with placement is a step forward, but it isn't enough. "The problem is that we need design closure, not just timing closure," he said. "You have to look at clocks, power and test."
Shankar Krishnamoorthy, R&D director for physical synthesis at Synopsys, countered with actual taped-out examples in which the "timing closure" permitted by Physical Compiler made a big difference. He spoke of a 750,000-gate block that reached 105 MHz only after three months and 14 iterations with a traditional design flow, but was quickly brought to the original goal of 125 MHz with Physical Compiler.
The annual ISPD conference, which has become the premier showcase for IC physical-design research, is attracting a good deal of industry attention. In addition to papers from professors and graduate students, the 2000 event offered "technology trends" presentations from Intel and IBM; a keynote address by Aart de Geus, chairman and CEO of Synopsys; and a panel on EDA and the Internet.
Most ISPD papers should be available at www.acm.org/ispd within a week's time, said Dwight Hill, Synopsys architect and ISPD 2000 chairman.
Parallel power
The basic idea behind Dolphin, said Parakh, is "continuous model refinement." Monterey uses parallel processing throughout the physical-design flow to speed this refinement. That's important, Parakh said, because the computational power of individual processors is no longer doubling every two years, and a new generation of shared-memory multiprocessors offers great promise for EDA software.
The downside has been the need to develop technology in load balancing and interprocess communications, in addition to IC placement and routing. Moreover, Parakh acknowledged, global optimization and routing have proved difficult to apply in a parallel fashion.
Dolphin simultaneously works on five tasks: placement, logic optimization, timing, clocking and routing. As placement is developed, for example, routing models are refined. The accuracy of congestion analysis improves as the layout progresses, and timing estimation becomes accurate enough to address crosstalk. The final placement and routing thus converge in terms of timing, power and clocking, Parakh said.
One key technology is placement refinement based on a scheme called "quadrisection": basically, a way of doing placement with progressively finer granularity, with partitions that allow parallel processing. Global optimization comes first, and then the placement moves toward greater and greater detail. Parakh said that Monterey uses partitioning technology similar to the h-Metis algorithm, cited in many academic papers.
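For readers who want a feel for the idea, the following is a toy sketch of quadrisection, not Monterey's implementation. It assumes each cell already has rough coordinates from a hypothetical global pass, and it splits by coordinate rather than with a netlist partitioner such as h-Metis. The function name `quadrisect` and the sample grid are illustrative only.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def quadrisect(cells: Dict[str, Point], depth: int) -> List[List[str]]:
    """Recursively split cells into four quadrants, `depth` levels deep."""
    names = list(cells)
    if depth == 0 or len(names) <= 1:
        return [names]
    # Split into left/right halves by x, then each half into bottom/top by y.
    by_x = sorted(names, key=lambda n: cells[n][0])
    mid = len(by_x) // 2
    regions: List[List[str]] = []
    for half in (by_x[:mid], by_x[mid:]):
        by_y = sorted(half, key=lambda n: cells[n][1])
        m = len(by_y) // 2
        for quad in (by_y[:m], by_y[m:]):
            if quad:
                sub = {n: cells[n] for n in quad}
                regions.extend(quadrisect(sub, depth - 1))
    return regions


# Hypothetical 4x4 grid of cells, named c0..c15.
cells = {f"c{i}": (float(i % 4), float(i // 4)) for i in range(16)}
level1 = quadrisect(cells, 1)  # four coarse regions
level2 = quadrisect(cells, 2)  # sixteen finer regions
```

Each deeper level yields more, smaller regions that do not share cells, which is what would let separate workers refine them independently.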
As placement is being refined, routes are continually monitored and Dolphin is modeling timing and congestion. Any given net might start at a high level of abstraction and then work its way through placement and optimization. A statistical model is created for each net.
As quadrisection proceeds, partitions are physically smaller and spaced farther apart, and parallel threads focus more on local improvements. In a multiprocessing system, Dolphin can launch three parallel threads to solve three local optimization problems. But the original global optimization doesn't allow much parallelism, Parakh said.
Further details were described by Emre Tuncer, member of the technical staff at Monterey. He said Dolphin uses a global refinement approach to static-timing analysis, with an algorithm that "runs multiple passes and tightens the windows." It uses timing "cones," he said, each of which can be run as an independent job.
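As a rough illustration of running each timing cone as its own job, the sketch below walks the fan-in cone of every endpoint in a separate thread and lets the first thread to reach a node claim it, so cones that share logic do not repeat work. This is an assumption-laden toy, not Dolphin's algorithm: the netlist, the `analyze_cones` name and the lock-based marking are all hypothetical.

```python
import threading
from typing import Dict, List


def analyze_cones(fanin: Dict[str, List[str]], endpoints: List[str]) -> Dict[str, str]:
    """Walk each endpoint's fan-in cone in its own thread; return node -> owner."""
    owner: Dict[str, str] = {}
    lock = threading.Lock()

    def walk(endpoint: str) -> None:
        stack = [endpoint]
        while stack:
            node = stack.pop()
            with lock:
                if node in owner:
                    # Another thread has already marked this node; skip it.
                    continue
                owner[node] = endpoint  # mark our way through the cone
            stack.extend(fanin.get(node, []))

    threads = [threading.Thread(target=walk, args=(e,)) for e in endpoints]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return owner


# Hypothetical netlist: two output cones that share gate g2 and input a.
fanin = {
    "out1": ["g1", "g2"],
    "out2": ["g2", "g3"],
    "g1": ["a"],
    "g2": ["a", "b"],
    "g3": ["b", "c"],
}
owner = analyze_cones(fanin, ["out1", "out2"])
```

Which thread ends up owning the shared nodes depends on scheduling, but every node in either cone is visited exactly once.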
"The problem arises when cones overlap," Tuncer said. "We let a thread go through an overlap area and mark its way. If another thread comes there, it will know a job is already in progress."
Problem regions
Dolphin performs a host of logic optimizations, Tuncer said. First the tool selects problem regions. It then looks for ways to minimize global slack, correct slope violations and relieve congestion. Dolphin can run optimizations like buffering and sizing in parallel.
Clock trees are synthesized and continually refined, Tuncer said. In multiclock systems, each clock domain can run as a separate job in parallel. Power distribution is accounted for during congestion analysis, and the scan chain is continually updated.
Detailed placement is "really localized," he said, to take maximum advantage of parallelism. Dolphin creates "windows" in which placement optimization takes place. Global routing, on the other hand, does not lend itself well to parallelism, but Monterey came up with a scheme that helps out somewhat: a multilevel partitioning approach to distribute nets to the smallest possible partitions. Even so, the partitions can cover a large area.
The global routing feeds the detailed router, which is shape-based, gridless and "signal-integrity aware," said Tuncer. It handles antenna violations and crosstalk. Detailed routing is "easy to thread," he noted, and is heavily parallelized. The chip is divided into small partitions, each routed as a separate job.
During global refinement, parasitic extraction uses a statistical model. This is replaced with actual routing information following detailed routing. Tuncer showed the speed gains that various tasks achieve on a four-processor system. Jobs like timing analysis, sizing, clock generation, and detailed placement and routing show a close-to-linear, three- to fourfold speedup. Global routing is two to three times faster, and global placement roughly three times faster.
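One way to read those figures is through parallel efficiency (speedup divided by processor count) and, assuming Amdahl's law applies, the fraction of each task that must be running in parallel. The helper names below are illustrative, and the Amdahl model is an assumption of this sketch, not something Monterey presented.

```python
def efficiency(speedup: float, processors: int) -> float:
    """Parallel efficiency: the share of ideal linear speedup achieved."""
    return speedup / processors


def parallel_fraction(speedup: float, processors: int) -> float:
    """Parallelizable fraction implied by Amdahl's law, S = 1 / ((1 - p) + p / N)."""
    return (1 - 1 / speedup) / (1 - 1 / processors)


# Speedups quoted for a four-processor system.
for task, s in [("detailed routing, low end", 3.0),
                ("detailed routing, high end", 4.0),
                ("global routing, low end", 2.0)]:
    print(f"{task}: efficiency {efficiency(s, 4):.0%}, "
          f"parallel fraction {parallel_fraction(s, 4):.0%}")
```

Under that model, a threefold speedup on four processors corresponds to roughly 89 percent of the work running in parallel, while a twofold speedup corresponds to about 67 percent.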
Synopsys' PhysOpt technology, meanwhile, is not currently designed for parallelism, Krishnamoorthy said. "Things like incremental costing are very hard to do in a parallel framework," he said. Krishnamoorthy described PhysOpt as "synthesis-based" technology that adds placement to model wiring accurately. This, he said, allows maximum flexibility for synthesis and logic optimization; offers an ability to handle flat and hierarchical design approaches; and provides a common timing engine for synthesis and physical design.
He listed its five main components as gate-level physical optimization, RTL synthesis with placement, linear wire-length placement, congestion removal and timing analysis. The PhysOpt engine ties all these capabilities together.
Structured guidance
Gate-level optimization includes resizing and buffer insertion, and goes beyond these common techniques to perform local optimizations on logic structure and gate selection. Krishnamoorthy said that fast, incremental timing analysis provides a "cost structure" to guide this optimization. Even so, he noted, gate-level optimization can't reduce levels of logic drastically.
To gain more power, he said, Synopsys has integrated placement with every step of RTL synthesis, including resource allocation, implementation selection, logic structuring, technology mapping and gate-level optimization. This opens a challenging question: What kind of placement can be done with technology-independent gates before any mapping has occurred? "We've done a lot of work in this area," he said.
Because logic structuring and gate selection are performed with a "full placement view," he said, PhysOpt can yield very accurate wire estimates early in the process.
Synopsys' FlexPlace technology does the linear wire-length minimization. Krishnamoorthy said it allows greater flexibility for synthesis algorithms than a more traditional quadratic approach. In apparent contrast to Monterey's technology, PhysOpt avoids partitioning, and is thus not restricted to placement optimization in one specific area. Cost functions, like timing and density, are part of the placement solution.
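The difference between a linear and a quadratic wire-length objective can be shown with a small sketch. It uses half-perimeter wire length as the linear measure and the sum of squared pin-to-pin distances as the quadratic one; both are common textbook formulations and are assumed here, since the presentation did not spell out FlexPlace's cost function. The two sample placements are made up for illustration.

```python
from itertools import combinations
from typing import List, Tuple

Pin = Tuple[float, float]


def linear_length(pins: List[Pin]) -> float:
    """Half-perimeter of the net's bounding box (a linear measure)."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def quadratic_length(pins: List[Pin]) -> float:
    """Sum of squared distances over all pin pairs (a quadratic measure)."""
    return sum((ax - bx) ** 2 + (ay - by) ** 2
               for (ax, ay), (bx, by) in combinations(pins, 2))


# Two hypothetical placements of the same two 2-pin nets.
balanced = [[(0, 0), (5, 0)], [(0, 1), (5, 1)]]  # two medium-length nets
skewed = [[(0, 0), (1, 0)], [(0, 1), (9, 1)]]    # one short net, one long net

linear_totals = [sum(map(linear_length, p)) for p in (balanced, skewed)]
quadratic_totals = [sum(map(quadratic_length, p)) for p in (balanced, skewed)]
```

Both placements have the same total linear length (10), but the quadratic measure rates the skewed one far worse (82 versus 50) because it penalizes long nets disproportionately.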
Congestion analysis captures the routability of the design. Krishnamoorthy said global routing predicts congestion accurately and quickly by using probabilistic models, as opposed to laying down all the wires.
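A probabilistic congestion estimate of this general kind can be sketched as follows. The model is a simple textbook-style assumption, not Synopsys' actual method: each two-pin net spreads its expected routing demand evenly over the bins of its bounding box instead of being routed, and the grid and nets are hypothetical.

```python
from typing import List, Tuple

Pin = Tuple[int, int]


def congestion_map(nets: List[Tuple[Pin, Pin]], cols: int, rows: int) -> List[List[float]]:
    """Estimate per-bin routing demand without routing any wire."""
    demand = [[0.0] * cols for _ in range(rows)]
    for (x1, y1), (x2, y2) in nets:
        xlo, xhi = sorted((x1, x2))
        ylo, yhi = sorted((y1, y2))
        w, h = xhi - xlo + 1, yhi - ylo + 1
        # A shortest route crosses w + h - 1 bins; spread that demand
        # uniformly over the w * h bins of the bounding box.
        share = (w + h - 1) / (w * h)
        for y in range(ylo, yhi + 1):
            for x in range(xlo, xhi + 1):
                demand[y][x] += share
    return demand


# Hypothetical 3x2 grid with two nets that overlap near the origin.
nets = [((0, 0), (2, 0)), ((0, 0), (1, 1))]
demand = congestion_map(nets, cols=3, rows=2)
```

Bins covered by both bounding boxes accumulate the most demand, flagging likely hot spots at a cost proportional to bounding-box area rather than to a full routing run.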
One hallmark of PhysOpt is that a common timing engine is used throughout the tool flow. Synopsys' distributed-parasitic model takes layer-by-layer resistance-capacitance variations into account, Krishnamoorthy said. The technology uses several techniques for predicting wire loads. In one tapeout, the tool predicted 148-MHz frequency and the outcome was 147 MHz. Over 90 percent of the nets were within 10 percent of the post-route delay results.
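The two accuracy figures quoted above are simple to compute. The sketch below shows the arithmetic; the per-net delay values are invented purely for illustration, and only the 148-MHz and 147-MHz figures come from the presentation.

```python
from typing import List


def relative_error(predicted: float, actual: float) -> float:
    """Absolute prediction error as a fraction of the actual value."""
    return abs(predicted - actual) / actual


def fraction_within(predicted: List[float], actual: List[float], tol: float = 0.10) -> float:
    """Share of nets whose predicted delay is within `tol` of the actual delay."""
    hits = sum(relative_error(p, a) <= tol for p, a in zip(predicted, actual))
    return hits / len(actual)


# Frequency figures cited for the tapeout: predicted 148 MHz, achieved 147 MHz.
freq_error = relative_error(148.0, 147.0)  # under 1 percent

# Made-up per-net delays in nanoseconds, for illustration only.
pre_route = [1.00, 2.10, 0.50, 3.00]
post_route = [1.05, 2.00, 0.70, 3.10]
share = fraction_within(pre_route, post_route)
```

On the invented data, three of the four nets fall within the 10 percent band, so `share` is 0.75; the cited tapeout reported more than 0.90 by the same kind of measure.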
Krishnamoorthy said the PhysOpt engine can do many tasks concurrently. For example, it applies synthesis transformations as the placement engine is converging on a solution. "A synthesis-based timing-closure system has greater flexibility and produces superior performance," he said.



