
G4 PowerPC and C6000: A Comparison

Rodger Hosking

9/28/2001 12:00 AM EDT


 

 
ABOUT THE AUTHOR

Rodger H. Hosking is vice president and co-founder of Pentek where he is responsible for new product definition, technical marketing, and strategic alliances. With over 25 years in the electronics industry, he has served as engineering manager at Wavetek/Rockland, and holds patents in frequency synthesis and spectrum analysis techniques. He received a BS degree in Physics from Allegheny College and BSEE and MSEE degrees from Columbia University.
 

Traditional DSP chips with specialized on-board hardware, such as multipliers and sophisticated address generators to facilitate digital signal processing algorithms, are now being challenged by a new generation of RISC microprocessors.

This article compares two radically different processors: Motorola's G4 PowerPC RISC device, with its new AltiVec engine, and Texas Instruments' C6000 DSP family. It examines both hardware and software issues and highlights the strengths and weaknesses of each device. Topics include processor architectures, memory utilization, data movement, software, and development tools.


Chip Architectures
Although new processors are introduced continually, we will focus on devices that are currently available (Table 1). These devices all target high-performance computing tasks, but in significantly different markets. The C6000 devices are aimed at highly embedded, real-time applications such as wireless telecom, wideband modems, and real-time image processing. The major market for the PowerPC is Apple Computer's highest-performance G4 workstation, where it must support a full operating system, user interfaces, and network peripherals.


| Manufacturer | Name | Part Number | Type |
|---|---|---|---|
| Texas Instruments | C6701 | TMS320C6701 | Fixed + Floating |
| Texas Instruments | C6201 | TMS320C6201 | Fixed |
| Texas Instruments | C6203 | TMS320C6203 | Fixed |
| Motorola | 7410 | MPC7410 | Fixed + Floating |

Table 1:  Texas Instruments C6000 and Motorola PowerPC processors

To meet these needs, both classes of processors must do two things very well: process data and move data. Taking best advantage of these inherent capabilities requires a deeper look inside each of the architectures.

The C6000 DSP family uses the VelociTI processor core, which is Texas Instruments' VLIW (Very Long Instruction Word) architecture. The core is organized as eight functional units operating on two register files (Figure 1). Since each of the eight units can execute its own 32-bit instruction every processor clock cycle, the core consumes a 256-bit VLIW instruction every cycle.

Figure 1:  Texas Instruments VelociTI core using VLIW instructions

In order to take advantage of this powerful VLIW resource, Texas Instruments designed its optimizing C compiler and assembler tools to maximize the number of functional units performing useful operations during each cycle. Helping this process is the orthogonality of the instruction set for each of the eight units, allowing many typical operations to be performed on more than one functional unit. This gives the optimization tools more flexibility in resource allocation.

Motorola first introduced the AltiVec technology to the popular PowerPC product line with the advent of the MPC74xx series, also known as the G4 (generation 4) PowerPC. These processors augment the basic MPC750 CPU core with the AltiVec vector engine (Figure 2), which performs both fixed and floating-point functions on 128-bit vectors.

Figure 2:   Motorola's G4 PowerPC core with AltiVec vector unit

The AltiVec unit can process several data formats including 8-, 16-, and 32-bit signed and unsigned integers, as well as 32-bit IEEE floating-point words. During each processor clock cycle the unit processes one complete vector, regardless of the data type.

The resulting peak calculation rates for the C6000 and 7410 devices are shown in Table 2, along with the processor clock speed and corresponding instruction cycle times. The C6000 family executes up to eight instructions per cycle using its VLIW architecture; the PowerPC executes up to three through the use of some compound instructions, such as multiply-add-store.

Since the C6201 and C6203 devices are fixed-point processors with scalar execution units, the number of fixed-point operations-per-second equals the number of instructions-per-second. The C6701 can execute both fixed- and floating-point instructions but only six of the execution units are capable of floating-point operation, resulting in the lower number shown.
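As a rough check, the C6000 figures are consistent with multiplying the clock rate by the number of units that can issue an instruction each cycle. The sketch below is illustrative arithmetic in Python, not vendor code; the function and constant names are assumptions made for this example, and the clocks are the rounded values quoted in this article.

```python
# Illustrative arithmetic only. Names are hypothetical; clock rates are the
# rounded values quoted in the article.
C6000_CLOCK_MHZ = {"C6701": 167, "C6201": 200, "C6203": 300}
FUNCTIONAL_UNITS = 8       # functional units in the VelociTI core
FLOAT_CAPABLE_UNITS = 6    # C6701 units capable of floating-point operation


def cycle_time_ns(clock_mhz):
    """Instruction cycle time in nanoseconds for a clock given in MHz."""
    return 1000.0 / clock_mhz


def peak_mips(clock_mhz, units=FUNCTIONAL_UNITS):
    """Peak millions of instructions per second: one instruction per unit per cycle."""
    return clock_mhz * units


for name, mhz in C6000_CLOCK_MHZ.items():
    print(name, round(cycle_time_ns(mhz), 2), "ns,", peak_mips(mhz), "MIPS peak")

print("C6701 floating point:", peak_mips(167, FLOAT_CAPABLE_UNITS), "MFLOPS peak")
```

With the rounded 167 MHz clock, the C6701 works out to 1336 and 1002 rather than the 1333 and 1000 listed in Table 2, which suggests the published figures were computed from an unrounded clock.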

Using the AltiVec engine alone, the MPC7410 can perform 8000 million fixed-point operations-per-second (fixed-point MOPS) with sixteen 8-bit integers in the vector, but only 4000 or 2000 MOPS with 16- and 32-bit integers, respectively. For floating-point operations, the 7410 packs four 32-bit IEEE floating-point values in the vector for the rate of 2000 MOPS. For compound instructions, this rate is proportionally higher.
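The AltiVec rates above follow from how many elements fit in one 128-bit vector. A minimal Python sketch of that arithmetic, assuming one complete vector is processed per clock cycle as described earlier (the function names are hypothetical, chosen for this example):

```python
# Illustrative arithmetic only; function names are hypothetical.
VECTOR_BITS = 128  # AltiVec vector width


def lanes(element_bits):
    """Number of elements of the given width that fit in one vector."""
    return VECTOR_BITS // element_bits


def peak_mops(clock_mhz, element_bits):
    """Peak millions of operations per second, assuming one vector per cycle."""
    return clock_mhz * lanes(element_bits)


for bits in (8, 16, 32):
    print(f"{bits}-bit elements: {lanes(bits)} per vector, "
          f"{peak_mops(500, bits)} MOPS at 500 MHz")
```

The 32-bit case covers both 32-bit integers and 32-bit IEEE floating-point values, since four of either fit in a vector.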


| | C6701 | C6201 | C6203 | 7410* |
|---|---|---|---|---|
| Clock (MHz) | 167 | 200 | 300 | 500 |
| Instruction Cycle (ns) | 6 | 5 | 3.33 | 2 |
| Instructions Per Cycle | 1 - 8 | 1 - 8 | 1 - 8 | 1 - 3 |
| Million Instructions/Sec. | 1333 | 1600 | 2400 | 500 |
| Million Fixed-Point Ops/Sec. | 1333 | 1600 | 2400 | 8000 |
| Million Floating-Point Ops/Sec. | 1000 | | | 2000 |

* AltiVec Unit Only

Table 2:  A comparison of peak processing rates for the TI C6000 DSP engines and Motorola MPC7410

The impressive computational numbers shown for the C6000 family are due to its VLIW architecture. The 7410 owes its horsepower to the AltiVec engine, sometimes referred to as a SIMD (single instruction multiple data) architecture. Note that these are two very effective, although completely different, ways of boosting performance.


On-Chip Memory Resources
Although peak processing rates may be a useful comparison, one of the main factors affecting performance is the size of internal memory. In the case of the C6000 family, the only way to realize the benefit of the VLIW architecture is to have the program code stored in on-chip program memory. Internally, this memory is organized as 256-bit words so that on each clock cycle, all 256 bits can be delivered to the execution units in parallel. If the program code for a critical loop is too large to fit in program memory, the C6000 must fetch code from external memory through a 32-bit bus. With eight 32-bit fetches per VLIW word, execution speed suffers dramatically. In the C6203, with a six-fold increase in program memory space over the C6201 and C6701, this problem is greatly reduced.
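The cost of running from external memory can be sketched with simple arithmetic. The Python below is a hedged, best-case estimate: it assumes a hypothetical zero-wait-state external bus running at the core clock, which real memory systems will not generally achieve, and all function names are invented for this example. The program memory sizes are those listed in Table 3.

```python
# Best-case sketch only. Assumes a zero-wait-state external bus at the core
# clock (an assumption, not a datasheet figure). Names are hypothetical.
VLIW_WORD_BITS = 256
INSTRUCTION_BITS = 32
PROGRAM_MEMORY_KB = {"C6701": 64, "C6201": 64, "C6203": 384}  # from Table 3


def bus_fetches_per_vliw_word(bus_bits=32):
    """External bus reads needed to assemble one 256-bit VLIW word."""
    return VLIW_WORD_BITS // bus_bits


def fits_on_chip(device, code_kb):
    """True if a code block of code_kb kilobytes fits in on-chip program memory."""
    return code_kb <= PROGRAM_MEMORY_KB[device]


def best_case_external_mips(clock_mhz, bus_bits=32):
    """Upper bound on instruction rate when every instruction comes over the bus."""
    return clock_mhz * bus_bits // INSTRUCTION_BITS


print(bus_fetches_per_vliw_word())    # bus reads per VLIW word
print(fits_on_chip("C6201", 100))     # a 100 KB loop on the C6201
print(fits_on_chip("C6203", 100))     # the same loop on the C6203
print(best_case_external_mips(200))   # C6201 running from external memory
```

Under these assumptions a 200 MHz C6201 is limited to about 200 million instructions per second from external memory, against a 1600 peak from on-chip memory.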


| | C6701 | C6201 | C6203 | 7410 |
|---|---|---|---|---|
| Program Memory | 64 KB | 64 KB | 384 KB | 32 KB |
| Data Memory | 64 KB | 64 KB | 512 KB | 32 KB |
| Total Memory | 128 KB | 128 KB | 896 KB | 64 KB |

Table 3:  On-chip memory resources for the TI C6000 DSP engines and Motorola MPC7410

The 7410 has relatively little on-chip memory but instead relies on an external L2 cache.


External Data and Address Buses
Perhaps even more critical than internal memory is the data transfer speed to external memory and I/O peripherals. Here again, the C6000 and the PowerPC differ considerably. As shown in Table 4, the C6201 and C6701 each have a single 32-bit data bus capable of running at the full processor clock speed.

On the other hand, the C6203 and 7410 both have two external buses, so that while one bus is busy moving data into and out of memory, the other can handle program and data fetches for the CPU.


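| | | C6701 | C6201 | C6203 | 7410 |
|---|---|---|---|---|---|
| Bus 1 | Bus Interface | Prog | Prog | Prog | MPX |
| | Data Width (Bits) | 24 / 32 | 24 / 32 | 24 / 32 | 32 / 64 |
| | Clock Rate (MHz) | 167 | 200 | 150 | 133 |
| | Transfer Rate (MB/Sec.) | 667 | 800 | 600 | 1064 |
| Bus 2 | Bus Interface | | | XBUS | Cache |
| | Data Width (Bits) | | | 32 | 64 |
| | Transfer Rate (MB/Sec.) | | | 150 | 500 |
| DMA | Number of Channels | 4 | 4 | 4 | |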

Table 4:  External data and address buses for the TI C6000 DSP engines and Motorola MPC7410
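The Bus 1 transfer rates in Table 4 are consistent with the data width in bytes multiplied by the bus clock. The Python sketch below reproduces that arithmetic, assuming one full-width transfer per bus clock; the function and dictionary names are hypothetical, and the widths used are the 32-bit C6000 data bus and the 64-bit MPX data bus.

```python
# Illustrative arithmetic only; assumes one full-width transfer per bus clock.
# Names are hypothetical.
def transfer_rate_mb_per_s(data_width_bits, clock_mhz):
    """Peak transfer rate in MB/s for a bus of the given width and clock."""
    return (data_width_bits // 8) * clock_mhz


# (data width in bits, bus clock in MHz) for Bus 1, taken from Table 4
BUS1 = {
    "C6701": (32, 167),
    "C6201": (32, 200),
    "C6203": (32, 150),
    "7410": (64, 133),
}

for device, (width, clock) in BUS1.items():
    print(device, transfer_rate_mb_per_s(width, clock), "MB/s")
```

The C6701 result is 668 MB/s with the rounded 167 MHz clock, against 667 MB/s in the table; the other three entries match exactly.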

The primary bus on the C6203 is the 32-bit EMIF bus (similar to the C6201 and C6701) with programmable characteristics for glueless interfaces to many different types of external memory including SRAMs, SDRAMs, Sync-FIFOs, FLASH, and EEPROMs. The secondary 32-bit XBUS is designed to support direct connections to high-speed synchronous devices such as SDRAM and Sync-FIFOs.

The 7410 uses the 64-bit MPX bus for its primary interface, which is connected to an external device called a node controller. The node controller includes device-specific interfaces to different types of memory and peripherals. The secondary bus on the 7410 is the 64-bit L2 cache interface for up to 2 MBytes of external cache memory. This is required to support the high-speed transfers to the internal L1 cache for the CPU and AltiVec engines.

Typical of most DSP chips, the C6000 devices have a built-in, programmable four-channel DMA controller capable of quickly moving data between internal and external memory without requiring intervention by the CPU. The PowerPC must rely on external devices (namely, the node controller) to handle this function.


Software Development Tools
Texas Instruments supplies most of the strategic tools for the C6000 processor family, which collectively fall under TI's eXpressDSP initiative. One major component is the IDE (integrated development environment) called Code Composer Studio. It includes TI's Optimizing C Compiler and Optimizing Assembler. Both of these tools are aimed at making the most efficient use of the VLIW instruction set.

Debugging is handled by the user-configurable Code Composer Debugger, which features tools for displaying data graphically, profiling CPU utilization, and managing software project directories.

TI's RTDX (real-time data exchange) coupled with Pentek's SwiftNet supports development and runtime connectivity between the target DSP and the host workstation. TI's XDAIS (eXpressDSP Algorithm Standard) defines a platform for writing algorithms so that third-party library offerings are all compatible with TI's environment.

Lastly, TI defined a small, scalable real-time kernel operating system called DSP/BIOS, which forms a consistent, low-level driver environment for real-time operating systems and code generation tools.


| Tools | C6000 | G4 PowerPC |
|---|---|---|
| C Compiler / Assem | TI Optimizing Tools | DIAB C/C++ |
| Debugger | Code Composer | CrossWind |
| Host / Target Connect | RTDX / SwiftNet | VxWorks |
| DSP Libraries | XDAIS Algorithm Std | VSIPL |
| Operating Systems | DSP/BIOS II | VxWorks / Linux |

Table 5:  Software development tools for the TI C6000 DSP engines and Motorola G4 PowerPC

While the overwhelming majority of AltiVec PowerPCs are run under Apple Computer's MacOS, embedded applications are largely supported by Wind River's Tornado tool set. It includes the Tornado IDE as a graphical user interface for developing code under the VxWorks operating system.

Also part of the Tornado tool set is the DIAB C/C++ Compiler, which includes support for the 162 instructions added to the basic instruction set of the PowerPC as AltiVec extensions. The CrossWind Debugger, which includes several graphical tools for visualizing data and interacting easily with the compiler, handles debugging. Other utilities in the Tornado package include profilers, task monitors, and software project managers.

DSP algorithms for the PowerPC G4 are derived from an effort by the VSIPL Forum, a consortium of corporate, university, and government organizations. VSIPL (Vector/Signal/Image Processing Library) is a set of library functions published in source code, which you may then optimize for specific processor architectures. A subset called VSIPL Core Lite comprises 127 of the 513 functions; MPI Software Technology optimized these for the AltiVec engine.


Conclusions
When choosing between the C6000 and PowerPC G4 architectures, several factors come into play. For demanding real-time applications with high-speed streaming data interfaces, the on-chip DMA controllers of the C6000 can become extremely useful. Likewise, if the system needs to respond quickly to several asynchronous external events, the on-chip interrupt handlers on the C6000 can help reduce interrupt latency.

The PowerPC has no on-chip DMA controllers or interrupt handlers, but relies instead on the node controller for these functions. Indeed, the choice of the node controller for a particular board design is just as important as the PowerPC itself, since all traffic in and out of the processor must first pass through this device. You must tailor low-level libraries for handling interrupts, DMA transfers, and serial and parallel peripheral interfaces specifically for the node controller, since the PowerPC is completely unaware of these functions.

The outstanding performance of the C6000 depends on the VLIW architecture, which dictates that critical code must fit within, and be executed from, on-chip program memory.

Familiarity with the optimization features of the TI compiler and assembler is essential for efficiently packing this code across the eight execution units. The 7410 gains its performance from the AltiVec engine with its SIMD architecture, but it must first load data into cache memory before its impressive benchmarks are achieved. Moving data in and out of cache is, once again, the job of the node controller.

Some algorithms, such as the FFT, can take good advantage of the vector math performed by the 7410. Its benchmark for a 1024-point, floating-point complex FFT is just 27 µsec, compared with 108 µsec for the C6701. Other algorithms, such as turbo decoders for wireless applications, are handled more efficiently by the VLIW architecture of the C6203.
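To put the FFT benchmarks in context, they can be converted to a sustained floating-point rate. The Python sketch below uses the common benchmarking convention of counting 5 N log2 N floating-point operations for an N-point complex FFT. That convention is an assumption introduced here, not something stated by either vendor, and it does not reflect the operations either chip actually executes; the function names are hypothetical.

```python
import math

# Illustrative only. The 5*N*log2(N) operation count is a benchmarking
# convention assumed for this sketch; function names are hypothetical.


def fft_flops(n):
    """Nominal floating-point operation count for an n-point complex FFT."""
    return 5 * n * math.log2(n)


def sustained_mflops(n, time_us):
    """Nominal MFLOPS: operations per microsecond equals millions per second."""
    return fft_flops(n) / time_us


print("7410: ", round(sustained_mflops(1024, 27)), "MFLOPS")
print("C6701:", round(sustained_mflops(1024, 108)), "MFLOPS")
print("Speed ratio:", 108 / 27)
```

Under this convention the 7410 benchmark corresponds to roughly 1900 MFLOPS and the C6701 benchmark to roughly 470 MFLOPS, a four-fold difference. These figures are only indicative when set against the peak rates in Table 2, since compound instructions raise the AltiVec peak and the nominal count differs from the real instruction mix.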

Texas Instruments either provides or strongly promotes the major tools for the C6000 through its extensive third-party program, the XDAIS standard, and the DSP/BIOS kernel. On the other hand, companies other than Motorola provide almost all of the tools for embedded applications for the 7410. In spite of this minimal contribution from the chip vendor, the PowerPC benefits from a much broader base of offerings, with many of the tools available as shareware.

Lastly, for systems that require high-speed data interfaces, it is essential that the board and system architectures can sustain the peak and average data rates to and from each processor. It is not unusual for I/O bottlenecks to limit the computational power of these new high-performance processors unless fast inter-board and inter-processor links are present.




