Researchers: time to overhaul design
Richard Goering
12/6/2002 12:15 PM EST
MONTEREY, Calif. It's time to change the way digital circuits are designed, researchers told the ACM/IEEE Tau workshop here last week. Speakers campaigned for a shift from static timing analysis to statistical, or "probabilistic," analysis; laid out new ways to combat parasitics; and emphasized the importance of delay fault testing.
The rub is that statistical timing analysis tools are not here yet, and some technical problems still beg for solutions.
Tau is an annual workshop on timing issues in the specification and synthesis of digital systems. This year, keynote speaker Ivan Sutherland, a vice president and fellow at Sun Microsystems Laboratories, called for a shift to asynchronous design to cope with increasing system complexity.
Statistical timing analysis would signal a departure from the static timing analysis that underlies today's IC design flow. But presenters said that a statistical approach is the only way to accurately account for device, interconnect and process variations. They said it could cut both the excessive number of analysis runs now required for timing closure and the inaccuracy of current techniques.
"It's not just that we need to tune up static timing analysis," said Kurt Keutzer, professor of electrical engineering and computer science at the University of California at Berkeley. "I think we need to fundamentally rethink the way we design circuits." He asked designers to envision chips not as deterministic devices but as "stochastic computing mechanisms."
"I believe the era of probabilistic design is here and that deterministic design is gone," said Chandu Visweswariah, research staff member at IBM's Thomas J. Watson Research Center. He called for a statistical approach to modeling and methodology as well as analysis.
But Avi Efrati, system architect for performance verification at Intel Israel, sounded a note of caution. "Static timing analysis is a key component of chip design," he said. "It will evolve to support other things, but I'm not sure a revolution is coming so quickly."
Static timing analysis today is deterministic, meaning that the analysis uses fixed delays for gates and wires and doesn't consider statistical variations in the underlying silicon. In the current methodology, best-case, worst-case and nominal parameter sets are constructed using Spice simulation, and the timing analyzer is run once for each set to report the resulting numbers.
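To make the deterministic flow concrete, here is a minimal Python sketch of corner-based static timing analysis on a tiny timing graph. The circuit, node names and delay values are hypothetical; each corner is simply a table of fixed delays, and the analyzer is rerun once per table. Real tools also handle slews, required times and vastly larger graphs.

```python
# Minimal sketch of deterministic, corner-based static timing analysis.
# The circuit, node names and delays (picoseconds) are hypothetical.
from graphlib import TopologicalSorter

# Timing graph: each node maps to its list of fan-in nodes.
FANIN = {
    "in_a": [],
    "in_b": [],
    "g1": ["in_a", "in_b"],
    "g2": ["in_b"],
    "g3": ["g1", "g2"],
    "out": ["g3"],
}

# One fixed delay per node for each parameter set ("corner").
CORNERS = {
    "best":    {"in_a": 0, "in_b": 0, "g1": 20, "g2": 15, "g3": 25, "out": 5},
    "nominal": {"in_a": 0, "in_b": 0, "g1": 30, "g2": 22, "g3": 35, "out": 8},
    "worst":   {"in_a": 0, "in_b": 0, "g1": 45, "g2": 33, "g3": 50, "out": 12},
}

def arrival_times(fanin, delay):
    """Longest-path arrival time at every node, using fixed delays."""
    arrival = {}
    # static_order() yields every node after all of its fan-in nodes
    for node in TopologicalSorter(fanin).static_order():
        latest_input = max((arrival[p] for p in fanin[node]), default=0)
        arrival[node] = latest_input + delay[node]
    return arrival

# One analysis run per corner; each run yields a single number.
report = {name: arrival_times(FANIN, d)["out"] for name, d in CORNERS.items()}
print(report)
```

Each run returns one arrival time per corner; nothing in the result says how likely any of those numbers is.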
A statistical approach would use random variables, not fixed delays. It would produce a statistical distribution, not lists of best- and worst-case numbers. It could, for instance, tell the designer that 50 percent of his or her circuits will run at 225 MHz. In this sense, yield and timing prediction become pretty much the same thing.
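A statistical analyzer replaces those fixed numbers with distributions. The Monte Carlo sketch below is one simple way to illustrate the idea, not the method of any tool or speaker mentioned here: each gate delay on a single hypothetical 10-gate critical path is drawn from a normal distribution, and the output is the fraction of simulated chips that reach a target frequency. The delay mean and sigma are invented so that the median lands near 225 MHz.

```python
# Monte Carlo sketch of statistical timing on one critical path.
# Path length, mean and sigma are hypothetical illustration values.
import random
import statistics

N_GATES = 10        # gates on the critical path (hypothetical)
MEAN_PS = 444.0     # mean gate delay in picoseconds (hypothetical)
SIGMA_PS = 40.0     # standard deviation of each gate delay (hypothetical)

def sample_chip_fmax_mhz(rng):
    """Draw one simulated chip: sum random gate delays, convert to max frequency."""
    path_ps = sum(rng.gauss(MEAN_PS, SIGMA_PS) for _ in range(N_GATES))
    return 1e6 / path_ps  # 1 / (delay in ps), expressed in MHz

def fraction_meeting(target_mhz, n_chips=20000, seed=1):
    """Return (fraction of chips at or above target, median fmax)."""
    rng = random.Random(seed)
    fmax = [sample_chip_fmax_mhz(rng) for _ in range(n_chips)]
    hits = sum(f >= target_mhz for f in fmax)
    return hits / n_chips, statistics.median(fmax)

yield_at_225, median_fmax = fraction_meeting(225.0)
print(f"{yield_at_225:.1%} of chips run at 225 MHz or faster; "
      f"median fmax {median_fmax:.0f} MHz")
```

The printed fraction is a parametric yield at 225 MHz, which is the sense in which timing and yield prediction merge in a statistical flow.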
While static timing analysis handles die-to-die variations well, it can't accurately model variations within a single die, researchers say. Process variations are perhaps the most obvious problem, but there's also a "fundamental randomness" in the behavior of silicon structures, Keutzer said.
For example, he noted, gates vary according to width, length, threshold voltages, oxides and doping. Interconnects vary according to line width, metal thickness and interlayer dielectric thickness. And parametric delay faults occur because of random particles landing on chips.
"If you try to sweep all these variations under the rug of worst-case models, the result will be overly conservative," he said. "Worse yet, it's not reliable. Static timing analysis can't catch violations due to uncorrelated path delays." Static timing analysis assumes that gate and wire delays are perfectly correlated across a chip, which is not the case, he said.
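The effect of uncorrelated path delays shows up even with two paths that have identical nominal delays. In the hypothetical Python sketch below, the chip is limited by the slower of the two paths. When the paths vary together (the perfect-correlation assumption), the chip is no slower than a single path; when they vary independently, the maximum shifts upward and more chips miss a fixed clock period. The delay statistics and the 1,050-ps period are assumptions for illustration.

```python
# Sketch: chip delay = slower of two paths, with the paths' variations either
# perfectly correlated or independent. All numbers are hypothetical.
import random
import statistics

MEAN_PS, SIGMA_PS = 1000.0, 50.0   # delay statistics of each path
PERIOD_PS = 1050.0                 # clock period to check against
N_CHIPS = 50000

def chip_delays(correlated, seed=7):
    """Simulated chip delays under fully correlated or independent path variation."""
    rng = random.Random(seed)
    delays = []
    for _ in range(N_CHIPS):
        a = rng.gauss(MEAN_PS, SIGMA_PS)
        # correlated: both paths move together; otherwise draw independently
        b = a if correlated else rng.gauss(MEAN_PS, SIGMA_PS)
        delays.append(max(a, b))  # the slower path sets the chip's speed
    return delays

for label, correlated in (("perfectly correlated", True), ("uncorrelated", False)):
    d = chip_delays(correlated)
    miss = sum(x > PERIOD_PS for x in d) / len(d)
    print(f"{label:>20}: mean {statistics.mean(d):.0f} ps, "
          f"{miss:.1%} miss {PERIOD_PS:.0f} ps")
```

For two independent normal delays the expected maximum is the mean plus sigma divided by the square root of pi (about 28 ps here), and the miss rate should come out near 29 percent instead of about 16 percent, even though each path on its own is unchanged.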
As a result, Keutzer said, it may take six weeks to close timing on a 200-MHz ASIC at 0.13 micron, and when the chip comes back, designers might find it actually runs at 250 MHz.
But, Keutzer noted, there are some technical challenges to be resolved with statistical timing analysis. One is economical delay testing. "The report says that 50 percent of my chips will run at 225 MHz, but which 50 percent?" he asked. "You've got to test each chip at speed."
Real-world experience
At IBM, Visweswariah said, designers want to predict parametric or "circuit-limited" yield loss, but what's needed is a combination of statistical timing, yield prediction, design-centering and design-for-manufacturability techniques. He used charts and graphs to show that worst-case static timing analysis is "quirky at best" when it comes to shedding light on such comparisons as yield vs. slack.
Static timing analysis, he said, takes in netlists, assertions, delays and slew models, and produces reports on slack and diagnostics. A statistical analysis would also take information on sources of variability and provide a yield curve along with diagnostics.
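A yield curve can be tied back to the familiar slack number under a simple assumption: if the critical-path delay is normally distributed around its nominal value, the fraction of chips that meet timing is the normal CDF of nominal slack divided by the delay sigma. The sketch below uses Python's standard `statistics.NormalDist`; the 0.13-ns sigma is hypothetical, and a real statistical timer combines many correlated paths rather than one.

```python
# Sketch: timing yield as a function of nominal slack for one path whose
# delay is assumed normal around its nominal value. SIGMA_NS is hypothetical.
from statistics import NormalDist

SIGMA_NS = 0.13  # standard deviation of the critical-path delay

def timing_yield(nominal_slack_ns, sigma_ns=SIGMA_NS):
    """P(actual delay <= required time) when delay ~ N(nominal, sigma).

    delay = nominal + e, e ~ N(0, sigma); timing is met when e <= slack.
    """
    return NormalDist(mu=0.0, sigma=sigma_ns).cdf(nominal_slack_ns)

for slack in (-0.13, 0.0, 0.13, 0.26, 0.39):
    print(f"nominal slack {slack:+.2f} ns -> yield {timing_yield(slack):.1%}")
```

A path that just meets timing at nominal (zero slack) yields only 50 percent, one illustration of why a single worst-case slack figure says little about yield.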
ASIC designers don't want a huge methodology change, but they will warm to statistical techniques once it's shown that timing runs can be substantially reduced, Visweswariah said.
Intel's Efrati said Intel wants better modeling for the impact of crosstalk on noise and timing, with an approach that allows a trade-off between accuracy and run-time. He said timing analysis has to consider cells, devices and abstract models in the same run, with a single timing graph.
Efrati called for better support for multiple-input switching and for sleep transistors used for leakage control. "Crosstalk, multiple-input switching and variability all need better solutions," he said. "We will require more dynamic, device-level analysis in timing tools."
One paper on process variations supported the case for statistical analysis and had sobering news for designers. A team from the Massachusetts Institute of Technology showed that test chip measurements varied considerably between dice from the same wafer when ring oscillator frequencies were compared.
In a session on process variations, papers called for various kinds of statistical or probabilistic analysis. Lou Scheffer, a fellow at Cadence Design Systems, presented an approach that computes performance as a function of process variation. It considers interchip variations as well as intrachip deterministic and statistical variations.
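The article does not detail how Scheffer's approach works. As a hypothetical illustration of the three kinds of variation it names, the sketch below uses a simple additive model (an assumption, not his formulation): every gate delay is a nominal value plus an interchip offset shared by the whole die, a deterministic intrachip term that depends on position, and an independent random intrachip term.

```python
# Hypothetical additive variation model: nominal + interchip (shared per die)
# + intrachip deterministic (position-dependent) + intrachip random.
import random
import statistics

NOMINAL_PS = 100.0
INTER_DIE_SIGMA = 20.0    # ps, one draw per die, shared by all its gates
GRADIENT_PS_PER_MM = 2.0  # ps/mm, deterministic across-die trend
RANDOM_SIGMA = 5.0        # ps, independent for every gate

def gate_delay(die_offset, x_mm, rng):
    """Delay of one gate at position x_mm on a die with the given offset."""
    return (NOMINAL_PS + die_offset + GRADIENT_PS_PER_MM * x_mm
            + rng.gauss(0.0, RANDOM_SIGMA))

def simulate(n_dice=20000, seed=3):
    """Two gates per die, 10 mm apart; return near-gate delays and skews."""
    rng = random.Random(seed)
    near, skew = [], []
    for _ in range(n_dice):
        offset = rng.gauss(0.0, INTER_DIE_SIGMA)  # drawn once per die
        d_near = gate_delay(offset, 0.0, rng)
        d_far = gate_delay(offset, 10.0, rng)
        near.append(d_near)
        skew.append(d_far - d_near)  # the shared die offset cancels here
    return near, skew

near, skew = simulate()
print(f"gate delay spread across dice: {statistics.stdev(near):.1f} ps")
print(f"skew: mean {statistics.mean(skew):.1f} ps, "
      f"spread {statistics.stdev(skew):.1f} ps")
```

In this model the die-to-die term dominates the spread of any single delay but drops out of the skew between two gates on the same die, which is one reason the two classes of variation are worth modeling separately.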
A paper from Ghent University in Belgium outlined a probabilistic approach to clock cycle prediction that captures the impact of parallelism using system-level interconnect prediction techniques.
One of several papers from the University of Michigan, in cooperation with Motorola, told of a new method for path-based statistical timing analysis. Another outlined an approach for evaluating worst-case skew in light of power supply variations. A third described a statistical timing technique that uses bounds and selective enumeration in an attempt to tame the exponential run-times that plague some statistical techniques.
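The Michigan/Motorola methods are not described in detail here. As a generic illustration of path-based statistical timing, the sketch below treats each enumerated path delay as a sum of normally distributed gate delays, so means and variances simply add under the simplifying (and hypothetical) assumption that gate delays are independent, and ranks paths by their probability of missing the clock. The number of paths in a circuit can grow exponentially with logic depth, which is why full enumeration gets expensive; the bounding and selective enumeration mentioned above are not modeled.

```python
# Generic path-based statistical timing sketch. Gate statistics and paths are
# hypothetical; gate delays are assumed independent and normally distributed.
from math import sqrt
from statistics import NormalDist

# gate name -> (mean delay ps, sigma ps)
GATES = {"g1": (30, 3), "g2": (22, 4), "g3": (35, 5), "g4": (50, 2), "g5": (18, 6)}
# enumerated paths, each an ordered list of gates
PATHS = {"p1": ["g1", "g3", "g5"], "p2": ["g2", "g4"], "p3": ["g1", "g2", "g5"]}

def path_delay(gates):
    """Sum of independent normals is normal: means add, variances add."""
    mean = sum(GATES[g][0] for g in gates)
    variance = sum(GATES[g][1] ** 2 for g in gates)
    return NormalDist(mean, sqrt(variance))

def rank_paths(period_ps):
    """Paths sorted by probability of exceeding the clock period, worst first."""
    risk = {name: 1.0 - path_delay(g).cdf(period_ps) for name, g in PATHS.items()}
    return sorted(risk.items(), key=lambda item: item[1], reverse=True)

for name, p in rank_paths(period_ps=90):
    print(f"{name}: P(delay > 90 ps) = {p:.2e}")
```

Path p2 has a larger mean delay than p3 but a much smaller sigma, so it ranks lower: a list sorted by nominal delay and a list sorted by risk need not agree.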
In another session, researchers attacked parasitics by looking at ways to model crosstalk noise. One paper chose to turn the monster against itself. Himanshu Kaul of the University of Michigan examined shielding techniques and proposed that, rather than simply being tied to ground or Vdd, shielding lines could carry active signals that either canceled or reinforced coupling.
Kaul showed data, based on extractions and simulations of proposed structures, on the behavior of signal lines surrounded by shielding. He demonstrated that, depending on the geometry of the lines, either capacitive coupling or inductive coupling had the dominant impact on timing. When the coupling was primarily capacitive, Kaul showed, driving the shielding lines in the same direction as the transitions on the signal lines resulted in significantly lower delay. But when the coupling was primarily inductive, driving the shield lines with the inverse of the signal significantly improved delay.
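A common first-order way to see the capacitive case is the Miller coupling factor: a coupling capacitance whose far side switches in the same direction as the victim ideally draws no charge, a quiet neighbor loads the victim with the capacitance itself, and a neighbor switching the opposite way acts roughly like twice the capacitance. The Python sketch below applies that textbook approximation to a lumped-RC victim line between two shields; the resistance and capacitance values are hypothetical, and the inductive case Kaul also studied is not modeled.

```python
# First-order Miller-coupling sketch for a victim line between two shields.
# Values are hypothetical; inductive coupling is not modeled.
MILLER_FACTOR = {"same": 0.0, "quiet": 1.0, "opposite": 2.0}

def victim_delay_ps(r_ohm, c_ground_ff, c_couple_ff, shields):
    """50% delay (0.69 * R * C) of a lumped-RC victim line.

    c_couple_ff is the coupling capacitance to EACH of the two shields;
    `shields` says how both shields switch relative to the victim
    ("quiet" corresponds to conventional shields tied to ground or Vdd).
    """
    c_eff_ff = c_ground_ff + 2 * MILLER_FACTOR[shields] * c_couple_ff
    return 0.69 * r_ohm * c_eff_ff * 1e-3  # ohm * fF = 1e-15 s = 1e-3 ps

for case in ("same", "quiet", "opposite"):
    print(f"shields {case:>8}: {victim_delay_ps(200, 50, 40, case):.1f} ps")
```

With these numbers, driving the shields in the same direction removes the coupling term entirely, while opposite-direction switching more than quadruples the delay relative to that case.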
Taking an entirely different tack, Brian Floyd, a researcher at IBM's Thomas J. Watson Research Center, suggested a radical approach to eliminating the impact of parasitics on clock distribution: send the clock through the air. Floyd described work, begun with associates at the University of Florida (Gainesville) and continued at IBM, on using antennas to transmit clock signals across an IC wirelessly.
Floyd's work uses 15-GHz power amps, low-noise amplifiers (LNAs) and frequency dividers with planar metal dipole antennas, all fabricated in a stock 0.18-micron CMOS technology, to replace the global clock wiring on moderate-size test chips. The antennas described were 2-mm zigzag dipoles for both transmitter and receiver ends. A 15-GHz oscillator drives a power amp, which in turn drives a dipole antenna fabricated in one of the upper metal layers of the chip. Receiver antennas elsewhere on the die pick up the wave from the dipole and relay it to an LNA, which drives an 8-to-1 frequency divider, producing a 1.875-GHz clock synchronized to the original 15-GHz signal.
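The frequency plan is simple arithmetic: 15 GHz divided by 8 gives 1.875 GHz. The behavioral Python sketch below models an 8-to-1 divider as three cascaded divide-by-2 stages, which behave like the top bit of a 3-bit counter, and measures the output frequency by counting rising edges. The counter structure is an assumption for illustration; the article does not say how the dividers on Floyd's test chips were built.

```python
# Behavioral sketch of an 8-to-1 divider: three cascaded divide-by-2 stages,
# modeled as the top bit of a 3-bit counter (an illustrative assumption).
def divided_clock(n_input_cycles, stages=3):
    """Level of the divided clock after each rising edge of the input clock."""
    count = 0
    levels = []
    for _ in range(n_input_cycles):
        count = (count + 1) % (2 ** stages)
        levels.append((count >> (stages - 1)) & 1)  # top bit of the counter
    return levels

def rising_edges(levels):
    """Count 0 -> 1 transitions in a sequence of clock levels."""
    return sum(1 for prev, cur in zip(levels, levels[1:]) if prev == 0 and cur == 1)

F_IN_GHZ = 15.0
levels = divided_clock(800)
f_out_ghz = F_IN_GHZ * rising_edges(levels) / len(levels)
print(f"output clock: {f_out_ghz} GHz")
```

Because every output edge coincides with an input edge, the divided clock stays phase-locked to the 15-GHz carrier, consistent with the synchronization the paragraph describes.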
In another Tau session, Bernd Koenemann of Cadence discussed yet another aspect of delay variability: the growing need for manufacturing test to cover not just stuck-at faults, but delay faults. Koenemann defined delay faults as situations in which a net functions properly but fails to meet its timing requirements.
Manufacturing defects
Such faults can be caused by unanticipated process variations, but Koenemann said they are often caused by manufacturing defects that are significant but not severe enough to cause an outright short or open circuit. He said that such faults will cause field failures but generally cannot be detected by stuck-at tests, so Iddq and delay testing are necessary. Koenemann warned that delay faults were now so common that failure rates were unlikely to fall below 2,000 ppm without delay fault testing.
For delay testing, transitions must be applied at the inputs to a block and the results must be latched precisely one clock period later. To accomplish this, Koenemann suggested using the transitions created by the scan chain itself as it shifts data through the input latches. By ensuring that the input latches change state on the last shift cycle (that is, that the bit being shifted into each latch differs from the one being shifted out of it), test engineers can guarantee a transition. Then it is necessary only to switch the output-side latch from test to normal mode and clock it one clock cycle later.
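The launch condition can be checked with a small model of a scan chain; the scheme is commonly known as launch-on-shift or skewed-load testing. In the hypothetical Python sketch below, each shift moves every latch's value one position down the chain, and a latch sees a transition on the last shift exactly when the bit shifted into it differs from the bit shifted out of it. The chain length and bit patterns are illustrative, and the capture step (switching the output-side latch to normal mode one cycle later) is not modeled.

```python
# Hypothetical scan-chain model of launch-on-shift: which latches toggle on
# the final shift cycle? Capture in normal mode is not modeled.
def shift(chain, scan_in_bit):
    """One scan shift: each latch takes its upstream neighbor's value."""
    return [scan_in_bit] + chain[:-1]

def launch_on_shift(initial, scan_in_bits):
    """Apply all shifts; return (V1, V2, indices of latches that toggle).

    V1 is the chain state before the last shift, V2 the state after it.
    Latch i toggles when the bit shifted in (V1[i-1] or the scan-in bit)
    differs from the bit shifted out (V1[i]).
    """
    chain = list(initial)
    for bit in scan_in_bits[:-1]:
        chain = shift(chain, bit)
    v1 = chain
    v2 = shift(v1, scan_in_bits[-1])
    toggled = [i for i, (a, b) in enumerate(zip(v1, v2)) if a != b]
    return v1, v2, toggled

v1, v2, toggled = launch_on_shift([0, 0, 0, 0], [1, 0, 1, 0, 1])
print("before:", v1, "after:", v2, "toggled latches:", toggled)
```

An alternating pattern launches a transition at every latch, while a run of identical bits launches none, so the shifted-in pattern directly controls which block inputs are exercised.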
Koenemann also cited the emergence and growing understanding of DFT-aware testers, with their potentially large impact on test cost.
In Koenemann's view, the most interesting new development is photon-emission systems that can detect the instantaneous emission of photons from transistor junctions during transitions. These tiny bursts of light are visible through the substrate of a flip-chip-mounted die, Koenemann said, and can give a transition-by-transition, transistor-by-transistor visual map of circuit activity with very fine temporal resolution. He described their promise for fault diagnosis as wonderful.
Further information about the Tau workshop is available online.