G4 PowerPC and C6000: A Comparison
Rodger Hosking
9/28/2001 12:00 AM EDT
ABOUT THE AUTHOR
Rodger H. Hosking is vice president and co-founder of Pentek, where he is
responsible for new product definition, technical marketing, and
strategic alliances. With over 25 years in the electronics
industry, he has served as engineering manager at Wavetek/Rockland,
and holds patents in frequency synthesis and spectrum analysis
techniques. He received a BS degree in Physics from Allegheny
College and BSEE and MSEE degrees from Columbia University.
Traditional DSP chips with specialized on-board hardware, such as multipliers and sophisticated address generators to facilitate digital signal processing algorithms, are now being challenged by a new generation of RISC microprocessors.
This article compares two radically different processors: Motorola's G4 PowerPC RISC device with the new AltiVec engine, and Texas Instruments' C6000 DSP family. It examines both hardware and software issues and highlights the strengths and weaknesses of each device. Topics include processor architectures, memory utilization, data movement, software, and development tools.
| Manufacturer | Name | Part Number | Type |
| Texas Instruments | C6701 | TMS320C6701 | Fixed + Floating |
| Texas Instruments | C6201 | TMS320C6201 | Fixed |
| Texas Instruments | C6203 | TMS320C6203 | Fixed |
| Motorola | 7410 | MPC7410 | Fixed + Floating |
Table 1: Texas Instruments C6000 and Motorola PowerPC processors
To meet these needs, both classes of processors must do two things very well: process data and move data. Taking best advantage of these inherent capabilities requires a deeper look inside each of the architectures.
The C6000 DSP family uses the VelociTI processor core, which is Texas Instruments' VLIW (Very Long Instruction Word) architecture. The core is organized as eight functional units operating on two register files (Figure 1). Since each of the eight units can execute its own 32-bit instruction every processor clock cycle, the core consumes a 256-bit (8 x 32) VLIW instruction every cycle.
Figure 1: Texas Instruments VelociTI core using VLIW instructions
In order to take advantage of this powerful VLIW resource, Texas Instruments designed its optimizing C compiler and assembler tools to maximize the number of functional units performing useful operations during each cycle. Helping this process is the orthogonality of the instruction set for each of the eight units, allowing many typical operations to be performed on more than one functional unit. This gives the optimization tools more flexibility in resource allocation.
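The benefit of letting an operation run on more than one unit can be illustrated with a toy packing model. This is only a sketch: the unit names follow the C6000's .L/.S/.M/.D convention, but the operation lists and the greedy `pack` function are hypothetical and far simpler than TI's actual optimizing tools.

```python
# Toy model: pack independent operations into VLIW cycles.
# The operation lists and their "allowed units" are hypothetical,
# chosen only to show why instruction-set orthogonality helps.
UNITS = ["L1", "S1", "M1", "D1", "L2", "S2", "M2", "D2"]


def pack(ops):
    """Greedily assign ops to cycles; each unit takes one op per cycle.

    ops is a list of (name, allowed_units). The ops are assumed to be
    independent, so only unit availability limits how many issue together.
    """
    cycles = []  # each cycle maps unit -> op name
    for name, allowed in ops:
        for cycle in cycles:
            free = [u for u in allowed if u not in cycle]
            if free:
                cycle[free[0]] = name
                break
        else:
            # No existing cycle has a free unit for this op: open a new one.
            cycles.append({allowed[0]: name})
    return cycles


# Eight additions restricted to a single unit need one cycle each.
rigid = [(f"add{i}", ["L1"]) for i in range(8)]
# The same additions, allowed on six units, share cycles.
flexible = [(f"add{i}", ["L1", "L2", "S1", "S2", "D1", "D2"]) for i in range(8)]

rigid_cycles = len(pack(rigid))
flexible_cycles = len(pack(flexible))
print(rigid_cycles, flexible_cycles)
```

In this model the restricted operations take eight cycles while the flexible ones take two, which is the kind of freedom the optimization tools exploit.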
Motorola first introduced the AltiVec technology to the popular PowerPC product line with the advent of the MPC74xx series, also known as the G4 (generation 4) PowerPC. These processors augment the basic MPC750 CPU core with the AltiVec vector engine (Figure 2), which performs both fixed and floating-point functions on 128-bit vectors.
Figure 2: Motorola's G4 PowerPC core with AltiVec vector unit
The AltiVec unit can process several data formats including 8-, 16-, and 32-bit signed and unsigned integers, as well as 32-bit IEEE floating-point words. During each processor clock cycle the unit processes one complete vector, regardless of the data type.
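As a rough illustration of how one 128-bit vector holds a different number of elements for each data type, the following sketch models a single vector add in plain Python. The function names are hypothetical, and the wraparound (modulo) behavior for unsigned integers is an assumption of this model rather than a statement about any particular AltiVec instruction.

```python
VECTOR_BITS = 128  # AltiVec vector width, from the text


def lanes(element_bits):
    """Number of elements of the given width that fit in one vector."""
    return VECTOR_BITS // element_bits


def vector_add(a, b, element_bits):
    """Model one vector add: all lanes are combined in a single step.

    Unsigned integers wrap around at the element width (an assumption
    made to keep the model simple).
    """
    n = lanes(element_bits)
    if len(a) != n or len(b) != n:
        raise ValueError(f"expected {n} elements per vector")
    mask = (1 << element_bits) - 1
    return [(x + y) & mask for x, y in zip(a, b)]


lane_counts = {bits: lanes(bits) for bits in (8, 16, 32)}
result = vector_add([250] * 16, [10] * 16, 8)
print(lane_counts)
print(result)
```

One call to `vector_add` stands in for one vector instruction: sixteen 8-bit, eight 16-bit, or four 32-bit elements are handled per operation.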
The resulting peak calculation rates for the C6000 and 7410 devices are shown in Table 2, along with the processor clock speed and corresponding instruction cycle times. The C6000 family executes up to eight instructions-per-cycle using its VLIW architecture and the PowerPC up to three through the use of some compound instructions, such as multiply-add-store.
Since the C6201 and C6203 devices are fixed-point processors with scalar execution units, the number of fixed-point operations-per-second equals the number of instructions-per-second. The C6701 can execute both fixed- and floating-point instructions but only six of the execution units are capable of floating-point operation, resulting in the lower number shown.
Using the AltiVec engine alone, the MPC7410 can perform 8000 million fixed-point operations-per-second (fixed-point MOPS) with sixteen 8-bit integers in the vector, but only 4000 or 2000 MOPS with 16- and 32-bit integers, respectively. For floating-point operations, the 7410 packs four 32-bit IEEE floating-point values in the vector for the rate of 2000 MOPS. For compound instructions, this rate is proportionally higher.
| | C6701 | C6201 | C6203 | 7410* |
| Clock (MHz) | 167 | 200 | 300 | 500 |
| Instruction Cycle (ns) | 6 | 5 | 3.33 | 2 |
| Instructions Per Cycle | 1 - 8 | 1 - 8 | 1 - 8 | 1 - 3 |
| Million Instructions/Sec. | 1333 | 1600 | 2400 | 500 |
| Million Fixed-Point Ops/Sec. | 1333 | 1600 | 2400 | 8000 |
| Million Floating-Point Ops/Sec. | 1000 | | | 2000 |
Table 2: A comparison of peak processing rates for the TI C6000 DSP engines and Motorola MPC7410
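The peak figures in Table 2 follow from the clock rate multiplied by the number of operations issued per cycle. The sketch below reproduces them under two assumptions: the C6701 clock is taken as 1000/6 MHz (the 6 ns cycle time, listed as 167 MHz), and the helper names are illustrative only.

```python
VLIW_UNITS = 8        # C6000 functional units, one instruction each per cycle
FP_UNITS_C6701 = 6    # units capable of floating-point operation on the C6701
VECTOR_BITS = 128     # AltiVec vector width


def peak_mops(clock_mhz, ops_per_cycle):
    """Peak millions of operations per second, rounded to the nearest integer."""
    return round(clock_mhz * ops_per_cycle)


# The C6701 clock is derived from its 6 ns cycle time (assumption: 167 MHz is rounded).
clocks_mhz = {"C6701": 1000 / 6, "C6201": 200, "C6203": 300, "7410": 500}

c6000_fixed = {
    name: peak_mops(clocks_mhz[name], VLIW_UNITS)
    for name in ("C6701", "C6201", "C6203")
}
c6701_float = peak_mops(clocks_mhz["C6701"], FP_UNITS_C6701)

# AltiVec: one vector per cycle, so ops per cycle = elements per vector.
altivec = {
    bits: peak_mops(clocks_mhz["7410"], VECTOR_BITS // bits)
    for bits in (8, 16, 32)
}

print(c6000_fixed, c6701_float, altivec)
```

These are peak rates only; they assume every functional unit or vector lane does useful work on every cycle.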
The impressive computational numbers shown for the C6000 family are due to its VLIW architecture. The 7410 owes its horsepower to the AltiVec engine, sometimes referred to as a SIMD (single instruction multiple data) architecture. Note that these are two very effective, although completely different, ways of boosting performance.
| | C6701 | C6201 | C6203 | 7410 |
| Program Memory | 64 KB | 64 KB | 384 KB | 32 KB |
| Data Memory | 64 KB | 64 KB | 512 KB | 32 KB |
| Total Memory | 128 KB | 128 KB | 896 KB | 64 KB |
Table 3: On-chip memory resources for the TI C6000 DSP engines and Motorola MPC7410
The 7410 has relatively little on-chip memory but instead relies on an external L2 cache.
On the other hand, the C6203 and 7410 both have two external buses, so that while one bus is busy moving data into and out of memory, the other can handle program and data fetches for the CPU.
| | | C6701 | C6201 | C6203 | 7410 |
| Bus 1 | Bus Interface | Prog | Prog | Prog | MPX |
| | Data Width (Bits) | 24 / 32 | 24 / 32 | 24 / 32 | 32 / 64 |
| | Clock Rate (MHz) | 167 | 200 | 150 | 133 |
| | Transfer Rate (MB/Sec.) | 667 | 800 | 600 | 1064 |
| Bus 2 | Bus Interface | | | XBUS | Cache |
| | Data Width (Bits) | | | 32 | 64 |
| | Transfer Rate (MB/Sec.) | | | 150 | 500 |
| DMA | Number of Channels | 4 | 4 | 4 | |
Table 4: External data and address buses for the TI C6000 DSP engines and Motorola MPC7410
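The Bus 1 transfer rates in Table 4 are consistent with one transfer per bus clock across the full data width. The sketch below checks that arithmetic under stated assumptions: the second number in each width entry (32 or 64) is taken as the data width, the C6701 bus clock is taken as 1000/6 MHz (listed as 167 MHz), and the function name is hypothetical.

```python
def peak_transfer_mb_s(data_bits, clock_mhz):
    """Peak MB/s assuming one full-width transfer on every bus clock."""
    return round(data_bits / 8 * clock_mhz)


# (data width in bits, bus clock in MHz) for the primary bus of each device
bus1 = {
    "C6701": (32, 1000 / 6),  # assumption: 167 MHz is a rounded 6 ns cycle
    "C6201": (32, 200),
    "C6203": (32, 150),
    "7410": (64, 133),
}

rates = {name: peak_transfer_mb_s(bits, mhz) for name, (bits, mhz) in bus1.items()}
print(rates)
```

The 7410's wider 64-bit bus gives it the highest primary-bus rate despite the lowest bus clock of the four devices.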
The primary bus on the C6203 is the 32-bit EMIF bus (similar to the C6201 and C6701) with programmable characteristics for glueless interfaces to many different types of external memory including SRAMs, SDRAMs, Sync-FIFOs, FLASH, and EEPROMs. The secondary 32-bit XBUS is designed to support direct connections to high-speed synchronous devices such as SDRAM and Sync-FIFOs.
The 7410 uses the 64-bit MPX bus for its primary interface, which is connected to an external device called a node controller. The node controller includes device-specific interfaces to different types of memory and peripherals. The secondary bus on the 7410 is the 64-bit L2 cache interface for up to 2 MBytes of external cache memory. This is required to support the high-speed transfers to the internal L1 cache for the CPU and AltiVec engines.
Typical of most DSP chips, the C6000 devices have a built-in, programmable four-channel DMA controller capable of quickly moving data between internal and external memory without requiring intervention by the CPU. The PowerPC must rely on external devices (namely, the node controller) to handle this function.
Debugging is handled by the user-configurable Code Composer Debugger, which features tools for displaying data graphically, profiling CPU utilization, and managing software project directories.
TI's RTDX (real-time data exchange) coupled with Pentek's SwiftNet supports development and runtime connectivity between the target DSP and the host workstation. TI's XDAIS (eXpressDSP Algorithm Standard) defines a platform for writing algorithms so that third-party library offerings are all compatible with TI's environment.
Lastly, TI defined a small, scalable real-time kernel operating system called DSP/BIOS, which forms a consistent, low-level driver environment for real-time operating systems and code generation tools.
| Tools | C6000 | G4 PowerPC |
| C Compiler / Assem | TI Optimizing Tools | DIAB C/C++ |
| Debugger | Code Composer | CrossWind |
| Host / Target Connect | RTDX / SwiftNet | VxWorks |
| DSP Libraries | XDAIS Algorithm Std | VSIPL |
| Operating Systems | DSP/BIOS II | VxWorks / Linux |
Table 5: Software development tools for the TI C6000 DSP engines and Motorola G4 PowerPC
While the overwhelming majority of AltiVec PowerPCs are run under Apple Computer's MacOS, embedded applications are largely supported by Wind River's Tornado tool set. It includes the Tornado IDE as a graphical user interface for developing code under the VxWorks operating system.
Also part of the Tornado tool set is the DIAB C/C++ Compiler, which includes support for the 162 instructions added to the basic instruction set of the PowerPC as AltiVec extensions. The CrossWind Debugger, which includes several graphical tools for visualizing data and interacting easily with the compiler, handles debugging. Other utilities in the Tornado package include profilers, task monitors, and software project managers.
DSP algorithms for the PowerPC G4 are derived from an effort by the VSIPL Forum, a consortium of corporate, university, and government organizations. VSIPL (Vector/Signal/Image Processing Library) is a set of library functions published in source code, which you may then optimize for specific processor architectures. VSIPL Core Lite, a 127-function subset of the full 513 functions, has been optimized for the AltiVec engine by MPI Software Technology.
The PowerPC has no on-chip DMA controllers or interrupt handlers, but relies instead on the node controller for these functions. Indeed, the choice of the node controller for a particular board design is just as important as the PowerPC itself, since all traffic in and out of the processor must first pass through this device. You must tailor low-level libraries for handling interrupts, DMA transfers, and serial and parallel peripheral interfaces specifically for the node controller, since the PowerPC is completely unaware of these functions.
The outstanding performance of the C6000 depends on the VLIW architecture, which dictates that critical code must fit within, and be executed from, on-chip program memory.
Familiarity with the optimization features of the TI compiler and assembler is essential for efficiently packing this code across the eight execution units. The 7410 gains its performance from the AltiVec engine with its SIMD architecture, but it must first load data into cache memory before its impressive benchmarks are achieved. Moving data in and out of cache is, once again, the job of the node controller.
Some algorithms, such as the FFT, can take good advantage of the vector math performed by the 7410. Its benchmark for a 1024-point, floating-point complex FFT is just 27 µsec, compared with 108 µsec for the C6701. Other algorithms, such as turbo decoders for wireless applications, are handled more efficiently by the VLIW architecture of the C6203.
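To put the FFT benchmarks in context, they can be converted to an approximate sustained floating-point rate. The sketch below uses the conventional nominal count of 5 N log2 N operations for an N-point complex FFT; that operation count is an assumption brought in for illustration, not a figure from the benchmarks themselves, and the function names are hypothetical.

```python
import math


def fft_flops(n):
    """Nominal floating-point operation count for an n-point complex FFT.

    Uses the conventional 5 * n * log2(n) estimate (an assumption).
    """
    return 5 * n * math.log2(n)


def sustained_mflops(n, time_us):
    """Operations per microsecond, which equals millions of ops per second."""
    return fft_flops(n) / time_us


ppc_rate = sustained_mflops(1024, 27)    # 7410 benchmark time from the text
c67_rate = sustained_mflops(1024, 108)   # C6701 benchmark time from the text
speedup = 108 / 27

print(round(ppc_rate), round(c67_rate), speedup)
```

Under this estimate, the 7410's FFT runs four times faster than the C6701's and lands close to the 7410's 2000 MOPS floating-point peak.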
Major tools for the C6000 are either provided or strongly promoted by Texas Instruments through its extensive third-party program, the XDAIS standard, and the DSP/BIOS kernel. By contrast, almost all of the tools for embedded 7410 applications come from companies other than Motorola. Despite this minimal contribution from the chip vendor, the PowerPC benefits from a much broader base of offerings, with many of the tools available as shareware.
Lastly, for systems that require high-speed data interfaces, it is essential that the board and system architectures can sustain the peak and average data rates to and from each processor. It is not unusual for I/O bottlenecks to limit the computational power of these new high-performance processors unless fast inter-board and inter-processor links are present.
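A back-of-the-envelope check shows how quickly I/O can become the limit. The sketch below estimates the bus traffic needed to keep back-to-back 1024-point FFTs fed, assuming each complex point is two 32-bit floats and that every input and output block crosses the primary external bus exactly once. Both are simplifying assumptions, and the function name is hypothetical.

```python
BYTES_PER_COMPLEX = 8  # two 32-bit floats per complex point (assumption)


def required_mb_per_s(points, time_us):
    """Bus traffic needed to stream data in and results out of each FFT.

    Bytes per microsecond equals MB/s (with 1 MB = 10**6 bytes).
    """
    bytes_moved = 2 * points * BYTES_PER_COMPLEX  # one block in, one block out
    return bytes_moved / time_us


# FFT times from the text; peak primary-bus rates from Table 4.
ppc_need = required_mb_per_s(1024, 27)
c67_need = required_mb_per_s(1024, 108)
ppc_share = ppc_need / 1064
c67_share = c67_need / 667

print(round(ppc_need), round(c67_need))
print(round(ppc_share, 2), round(c67_share, 2))
```

In this simplified model the faster processor needs a much larger share of its peak bus bandwidth, so any shortfall in sustained I/O rate translates directly into idle compute cycles.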




