Design Article

RTL Timing Analysis: An Enabling Technology for Next-Generation Digital IC Designs

Wendell Baker

12/22/2003 12:00 AM EST

Today's large and complex digital integrated circuit (IC) and system-on-chip (SoC) designs often contain tens of millions of logic gates. Ensuring that these designs will function as planned and meet performance requirements involves extensive timing analysis. The most cost-effective approach—in terms of engineering resources and time-to-market—is to start performing accurate timing analysis as early as possible in the design cycle. Due to a lack of solutions, however, conventional design flows continue to perform timing analysis on gate-level representations of the design, which is both computationally intensive and time consuming.


Figure 1:  Deploying RTA causes minimal disruption to existing flows

A new technique called register transfer level (RTL) timing analysis (RTA) can provide accurate timing analysis as early as the RTL stage of the design cycle. It augments existing design flows with minimal disruption (Figure 1). By performing timing analysis 40 to 50 times faster than conventional approaches, RTA reduces the load on engineering resources and shortens the chip design cycle (Figure 2).


Figure 2:  Every RTL design engineer in the team can have a copy of RTA

Why is RTL Timing Analysis Needed?
An engineering "rule of thumb" is that detecting, isolating, and resolving a problem at any stage of the design, implementation or deployment process costs 10 times more than addressing the same problem at the previous stage in the process. For digital ICs, timing analysis can be performed at three distinct levels of abstraction:

  • RTL
  • Gate level
  • Layout level

Performing timing analysis at the RTL (and weeding out obvious timing problems there) is faster and much more cost effective than waiting to find the same problems during timing analysis at the gate level or layout level.
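The rule of thumb can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that the 10x multiplier applies uniformly between the three abstraction levels listed above; the function name and the stage labels are hypothetical.

```python
# Illustrative only: relative cost of fixing a timing problem at each level,
# ASSUMING the 10x-per-stage rule of thumb applies uniformly between levels.
STAGES = ["RTL", "gate level", "layout level"]
COST_FACTOR = 10  # rule-of-thumb multiplier per stage


def relative_fix_cost(stage: str) -> int:
    """Return the cost of a fix at `stage`, relative to fixing it at the RTL."""
    return COST_FACTOR ** STAGES.index(stage)


for stage in STAGES:
    print(f"{stage:>12}: {relative_fix_cost(stage)}x")
```

Under that assumption, a problem left until layout costs on the order of 100 times what it would have cost to fix at the RTL.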

Timing closure, regardless of the level at which it is performed, is an iterative process: the analyze-detect-correct steps must often be run many times before the design converges.

Timing analysis at the layout level can be accurate, but iterating a layout is expensive in both cost and time, so design teams try to avoid making timing changes at this level.

At the gate level, accurate timing analysis can be achieved following logic synthesis and in-place optimization (IPO), though getting to this post-IPO point using conventional flows requires physically-aware synthesis tools to provide a placed gate-level netlist. Large blocks often require days of CPU time to go through the full synthesis, physical synthesis, and timing analysis process. This stretches out the design and timing closure process, and ties up chip production software that could be used for chip implementation instead of timing analysis.

The ideal place to begin timing analysis and quickly identify paths that will cause downstream timing problems is at the RTL. Historically, static timing analysis has always been associated with gate-level netlists. RTA changes the paradigm by moving timing analysis up to the RTL.

For example, assume that each RTL designer is working with a block of code that will ultimately equate to around 400K logic gates. Using a physically aware synthesis approach, synthesis will take approximately seven hours, and timing analysis on the resulting gate-level output will take approximately three more. By comparison, performing RTA on the same block takes approximately 15 minutes, which is 40 times faster than the combined ten hours (Figure 3).
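The 40x figure follows directly from the numbers in this example, as the short calculation below shows. The constant and function names are illustrative only.

```python
# Figures taken from the 400K-gate example in the text.
SYNTHESIS_HOURS = 7        # physically aware synthesis
GATE_LEVEL_STA_HOURS = 3   # timing analysis on the gate-level output
RTA_MINUTES = 15           # RTA on the same block


def speedup(conventional_minutes: float, rta_minutes: float) -> float:
    """Ratio of conventional turnaround time to RTA turnaround time."""
    return conventional_minutes / rta_minutes


conventional_minutes = (SYNTHESIS_HOURS + GATE_LEVEL_STA_HOURS) * 60
print(conventional_minutes)                        # 600
print(speedup(conventional_minutes, RTA_MINUTES))  # 40.0
```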


Figure 3:  RTA generates timing reports 40x faster

Conventional Timing Analysis Versus RTA
A conventional gate-level timing approach has two primary drawbacks. The first is the potential negative impact on the project schedule. Depending on the size of the design, a large block can consume days of CPU time in the process of generating a timing report. Because timing closure is iterative, this can add weeks to the schedule as designers wait days after each change to see its impact.

A second drawback is the cost of tying up chip production software for use in analyzing RTL code. Physically-aware synthesis seats typically cost $300K and are usually purchased for chip production, not for use in a timing analysis flow. These seats are tied up in extensive RTL timing closure iterations and are not available for chip tapeouts.

By comparison, RTA operates at the RTL and provides fast turnaround on timing reports.

How Does RTA Work?
RTA is not based on some form of fast synthesis. It operates directly on the RTL and does not employ any gate-level representations.

Library Characterization
Before considering the algorithms used by RTA as it performs its analyses, it is necessary to understand the library characterization step. InTime Software, for example, has developed an application that performs library characterization and generates design kits employed by RTA. The application accepts industry-standard Liberty and LEF logical and physical representations of a cell library and automatically generates a corresponding InTime Design Kit.

The design kit is not a library of characterized gates, but is instead a database of characterized logical functions such as counters and XOR trees. The library characterization step captures the behavior of these logical functions, including timing and area estimates, for later use. Design kits can be supplied "off-the-shelf" by InTime for the majority of today's leading foundries. For libraries that are not currently supported, the library characterization application is available to end users, enabling them to generate their own design kits.

An Example of an RTA Application
An RTA application, such as InTime's Time Director, accepts three inputs: the RTL code for the design block, the timing constraints associated with that block in industry-standard SDC format, and the design kit associated with the target cell library.

As the RTA application reads in the RTL, it converts it into a netlist of entities called work functions. The way to visualize this is that there is a certain amount of computation that takes place between inputs and outputs of a continuous assignment or an "always" block. Each work function is an abstraction of such a block that directly maps onto an equivalent function in the RTA application's Design Kit. This approach avoids details and complexity of an explicit gate-level representation.
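One way to picture a netlist of work functions is as a list of records, each naming a logical function, its bit width, and the signals it connects. The sketch below is a hypothetical data structure written for illustration; the class name, field names, and function kinds are assumptions and do not describe Time Director's internal format.

```python
from dataclasses import dataclass, field


@dataclass
class WorkFunction:
    """Hypothetical record for one work function (not a real tool format)."""
    name: str
    kind: str                       # e.g. "adder", "xor_tree" (assumed labels)
    width: int                      # bit width of the operation
    inputs: list = field(default_factory=list)
    output: str = ""


# Two continuous assignments, assumed for illustration:
#   assign sum    = a + b;
#   assign parity = ^sum;
netlist = [
    WorkFunction("u_add", "adder", 8, ["a", "b"], "sum"),
    WorkFunction("u_par", "xor_tree", 8, ["sum"], "parity"),
]


def fanin_functions(netlist, wf):
    """Return the work functions that drive the inputs of `wf`."""
    drivers = {w.output: w for w in netlist}
    return [drivers[sig] for sig in wf.inputs if sig in drivers]


print([w.name for w in fanin_functions(netlist, netlist[1])])  # ['u_add']
```

Each record stands in for a whole block of computation, so no individual gates appear anywhere in the structure.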

Once the RTL has been converted into a netlist of work functions, the RTA application performs the same logic optimizations that would otherwise be performed at the gate level, including common sub-expression elimination, constant propagation, loop unrolling, and the removal of all redundant functional computations.
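Two of these optimizations, constant propagation and common sub-expression elimination, can be sketched on a toy netlist. This is a minimal illustration under assumed data formats, not the RTA application's algorithm: each entry is a hypothetical `(output, op, a, b)` tuple in topological order, and only three commutative operators are modeled. Loop unrolling and other redundancy removal are not shown.

```python
import operator

# Toy operator set; all three are commutative, which the CSE key relies on.
OPS = {"add": operator.add, "and": operator.and_, "xor": operator.xor}


def simplify(netlist):
    """Apply constant propagation and CSE to (out, op, a, b) tuples."""
    value = {}   # signal name -> known constant value
    seen = {}    # canonical (op, a, b) -> signal that already computes it
    alias = {}   # eliminated signal -> surviving equivalent signal
    kept = []

    def resolve(x):
        x = alias.get(x, x)          # follow an eliminated duplicate
        return value.get(x, x)       # substitute a known constant

    for out, op, a, b in netlist:
        a, b = resolve(a), resolve(b)
        if isinstance(a, int) and isinstance(b, int):
            value[out] = OPS[op](a, b)         # constant propagation
            continue
        key = (op, *sorted((a, b), key=str))   # order-independent key
        if key in seen:
            alias[out] = seen[key]             # common sub-expression
            continue
        seen[key] = out
        kept.append((out, op, a, b))
    return kept, alias, value


netlist = [
    ("k", "add", 2, 3),        # constant expression, folds to 5
    ("t1", "add", "x", "k"),   # becomes x + 5
    ("t2", "add", "k", "x"),   # duplicate of t1 after commutation
    ("y", "xor", "t2", "z"),   # rewritten to use t1
]
kept, alias, value = simplify(netlist)
print(kept)   # [('t1', 'add', 'x', 5), ('y', 'xor', 't1', 'z')]
```

Four entries reduce to two, which is the kind of minimal network the subsequent estimation steps would operate on.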

The RTA application uses the minimal irredundant network of work functions to perform a "virtual placement" of these functions. The placement yields accurate area estimates, which are subsequently used to generate accurate timing estimates. Working in conjunction with the design kit, the RTA application also models how various synthesis engines weight different factors and modify their implementation strategies to meet specified timing constraints. All of these factors are taken into account when generating the resulting timing report.
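Once each work function has a delay estimate, path timing can be computed by the same arrival-time propagation used in any static timing analysis. The sketch below is a simplified illustration: the per-function delays are invented numbers standing in for design-kit data, and it ignores wire delay, setup time, and the virtual placement step that the article describes.

```python
# HYPOTHETICAL design-kit entries: delay in picoseconds per function kind.
KIT_DELAY_PS = {"adder": 900, "mux": 200, "xor_tree": 400}

# (output, kind, inputs) in topological order; primary inputs arrive at t = 0.
NETLIST = [
    ("sum", "adder", ["a", "b"]),
    ("sel", "mux", ["sum", "c"]),
    ("par", "xor_tree", ["sel"]),
]


def arrival_times(netlist, kit):
    """Propagate worst-case arrival times through the work-function graph."""
    arrival = {}
    for out, kind, inputs in netlist:
        latest_input = max(arrival.get(sig, 0) for sig in inputs)
        arrival[out] = latest_input + kit[kind]
    return arrival


def slack(arrival, endpoint, clock_period_ps):
    """Positive slack meets timing; negative slack is a violation."""
    return clock_period_ps - arrival[endpoint]


arr = arrival_times(NETLIST, KIT_DELAY_PS)
print(arr["par"])               # 1500
print(slack(arr, "par", 1200))  # -300
```

With an assumed 1200 ps clock period, the path ending at `par` misses timing by 300 ps and would appear in the timing report as a violation.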

How Accurate is RTA?
RTA acts as a "gatekeeper" whose role is to quickly and efficiently identify timing paths that will not meet their timing requirements during implementation. Any modern synthesis tool can achieve the required timing if its starting point is within 20 to 30% of the target. It is the paths that are off by 80%, 150%, 200%, and more that cause problems. Using RTA, these paths can be detected, identified, and corrected while working at the RTL.
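The gatekeeper role amounts to a simple triage over the reported paths. The sketch below assumes a 30% threshold (the upper end of the range quoted above) and uses made-up path names and delays; it is an illustration of the idea, not a feature of any particular tool.

```python
def overshoot(estimated_ps, required_ps):
    """Fraction by which the estimated delay exceeds the required delay."""
    return (estimated_ps - required_ps) / required_ps


def triage(paths, threshold=0.30):
    """Return names of paths that synthesis is unlikely to fix on its own.

    `paths` holds (name, estimated_ps, required_ps) tuples. The 0.30
    threshold is an ASSUMPTION taken from the 20 to 30% range in the text.
    """
    return [name for name, est, req in paths if overshoot(est, req) > threshold]


# Made-up paths, all against a 1000 ps requirement.
paths = [
    ("p_ok", 1100, 1000),      # 10% over: left to the synthesis tool
    ("p_bad", 1800, 1000),     # 80% over: fix in the RTL
    ("p_worse", 3000, 1000),   # 200% over: fix in the RTL
]
print(triage(paths))  # ['p_bad', 'p_worse']
```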

When it comes to accuracy, it is also necessary to specify which timings are being compared. As discussed earlier, the first point at which relatively accurate timing analysis can be achieved with conventional flows is at the gate level following synthesis, final placement, and IPO. Measured against that reference, timing reports generated by RTA typically correlate to post-IPO delays with an error of 20% or less.
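A correlation figure of this kind is usually a relative error measured against the post-IPO delay. The sketch below shows that calculation with invented delay values; the function name and the numbers are assumptions for illustration only.

```python
def correlation_error(rta_ps, post_ipo_ps):
    """Relative error of the RTA estimate against the post-IPO delay."""
    return abs(rta_ps - post_ipo_ps) / post_ipo_ps


# Invented example: RTA estimates 1150 ps, post-IPO analysis reports 1000 ps.
error = correlation_error(1150, 1000)
print(f"{error:.0%}")  # 15%
```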


About the Author
Wendell Baker joined InTime in September 2001. He has an extensive background in EDA, having started in 1984 at SDA (which merged with ECAD to form Cadence). Subsequent EDA experience includes technical and technical management positions at Cadence, Redwood Design Automation, Escalade, and most recently Atrenta, where he was Director of Engineering. Wendell has BSCS, MSEE and Ph.D. degrees from UC Berkeley. His email address is wbaker@intimesw.com.




