MAJC combines multiprocessing, multithreading features
Will Wade
8/19/1999 4:12 PM EDT
PALO ALTO, Calif. - Sun Microsystems Inc. disclosed at the Hot Chips technical conference this week that its Microprocessor Architecture for Java Computing (MAJC) designs will feature both chip multiprocessing and multithreading in order to beef up performance. While other papers at the conference also proposed using the same techniques, Sun is the only chip company that has announced plans to build and ship an architecture that will implement both methods.
"Multithreading will be the next hot topic in microprocessor architectures," said Marc Tremblay, chief architect for the MAJC project. "Most chips today that are attempting to incorporate parallel processing are doing it at the instruction level, but we are focusing on the thread level, which delivers much more performance."
Multithreading involves breaking down the various tasks in one or more applications into several threads and scheduling them for the most efficient use of the microprocessor's computing power. When one task results in a cache miss, the processor sits idle while the system fetches the data from main memory. Multithreading allows the CPU to work on a different thread during that downtime. "The overall throughput is much higher," said Tremblay. "While the first thread will finish at about the same time as it would on a traditional CPU, in this case the same processor can complete several other things at the same time."
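The latency-hiding effect Tremblay describes can be illustrated with a toy model. The sketch below is not Sun's design: the function name `simulate`, the switch-on-miss policy, and the cycle counts (10 cycles of work per burst, 30 cycles per cache miss) are all assumptions chosen for illustration. Each thread alternates between a burst of computation and a cache miss, and the core switches to another ready thread instead of waiting.

```python
def simulate(num_threads, bursts_per_thread, compute_cycles, miss_cycles):
    """Toy switch-on-miss core. Returns (total_cycles, busy_cycles).

    Every compute burst ends in a cache miss (an assumption of this model).
    """
    remaining = [bursts_per_thread] * num_threads  # bursts left per thread
    ready_at = [0] * num_threads                   # cycle when each thread's data arrives
    clock = 0
    busy = 0
    while any(r > 0 for r in remaining):
        # Threads that still have work and are not waiting on memory.
        runnable = [t for t in range(num_threads)
                    if remaining[t] > 0 and ready_at[t] <= clock]
        if not runnable:
            # Every unfinished thread is stalled: the core idles until data returns.
            clock = min(ready_at[t] for t in range(num_threads) if remaining[t] > 0)
            continue
        t = runnable[0]
        clock += compute_cycles
        busy += compute_cycles
        remaining[t] -= 1
        ready_at[t] = clock + miss_cycles          # this burst ends in a miss
    return clock, busy


# Hypothetical numbers: 4 bursts per thread, 10 compute cycles, 30-cycle miss.
print(simulate(1, 4, 10, 30))  # (130, 40): one thread, core busy 40 of 130 cycles
print(simulate(4, 4, 10, 30))  # (160, 160): four threads, core never idles
```

In this model a single thread needs 130 cycles, and four threads together need 160, so four times the work completes in roughly 1.2 times the time. The first thread still finishes at cycle 130 in both runs, which matches the behavior described in the quote.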
In addition to multithreading, the MAJC architecture (pronounced "Magic") is designed for chip multiprocessing (CMP), or the placement of several MPU cores on a single die. Just as today's high-performance servers feature several discrete processor chips, more MPUs mean more computing horsepower. And putting them all on the same chip means even faster performance because of the shorter distances and faster bus speeds.
The first MAJC chip will most likely feature two processor cores, although Tremblay said the architecture can scale far beyond that. Although two is faster than one, the combination does not produce a straight linear advance because some of the power is diverted to accurately dividing up the threads and sending tasks to each core. Tremblay said that initial tests have shown a two-chip MAJC implementation running 60 percent faster than a single-chip design. "Once it scales past four processors, we could see a more linear performance increase," he predicted. Sun will disclose more details of the early MAJC chips at the Microprocessor Forum in October.
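To put the 60 percent figure in context, the arithmetic can be worked through in a few lines. "60 percent faster" is read here as a speedup of 1.6 on two processors. The second function applies Amdahl's law, which is an assumption brought in for illustration; the article attributes the shortfall to the overhead of dividing and dispatching threads, not to any particular model, and the function names are hypothetical.

```python
def parallel_efficiency(speedup, processors):
    """Fraction of the ideal (linear) speedup actually achieved."""
    return speedup / processors


def amdahl_parallel_fraction(speedup, processors):
    """Solve Amdahl's law, S = 1 / ((1 - p) + p / n), for the parallel fraction p."""
    return (1 - 1 / speedup) / (1 - 1 / processors)


def amdahl_speedup(parallel_fraction, processors):
    """Amdahl's law: speedup for a given parallel fraction and processor count."""
    return 1 / ((1 - parallel_fraction) + parallel_fraction / processors)


# Reported figure: two processors run 60 percent faster than one.
print(round(parallel_efficiency(1.6, 2), 2))       # 0.8, i.e. 80 percent of linear
print(round(amdahl_parallel_fraction(1.6, 2), 2))  # 0.75 under the Amdahl assumption
```

Under that assumption, a 1.6x speedup on two processors corresponds to about three quarters of the work running in parallel. This is only one way to interpret the number and says nothing about how the design behaves beyond two processors.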
Other chip companies are also playing in the multithreading and multiprocessing fields, and the Hot Chips conference saw discussions on both topics. They included a CMP effort from Hitachi, and a student-led effort from Stanford University that will feature both multithreading and single-chip multiprocessing. Lance Hammond, project leader for Stanford's Hydra chip, said the group is developing a prototype now that may be the first actual implementation of a CMP device in silicon.
The four-core Hydra is built around Integrated Device Technology Inc.'s RC32364 processor, which uses a 0.25-micron process, and will run at 250 MHz. "As manufacturing processes and the microprocessor design is proven, it becomes easy to replicate the core several times on a single die," said Hammond. "CMP can be better than a single, large monolithic processor because there is no delay when signals don't have to travel so far across a large die."
In terms of performance gains, there are a lot of similarities between the two approaches, according to Linley Gwennap, editorial director for Microprocessor Report (Sunnyvale, Calif.). "Multithreading may deliver more performance within a given area, while chip multiprocessors will require more silicon, but both are means to get more work out of a single chip," he said. "Otherwise, I can't think of any major advantages of one technology over the other."
If multithreading makes a single core work more effectively, and chip multiprocessors put more cores on a device to work together, then the logical next step is to combine them both, as Sun is attempting.
Gwennap points out that this is only effective, however, when the software in the application and the operating system is powerful enough to break up the computing tasks into enough threads to give every core a project. Otherwise, some cores may spend significant periods of time waiting for work, rather than actually processing data.
Another factor that shouldn't be discounted is the difficulty of implementing two different architectural advances in a single generation. "Sun has really bitten off a lot," said Gwennap.
IBM Corp. has come to the same conclusions. While the company is already implementing multithreading in its Power3 chips, it has abandoned that approach in favor of chip multiprocessing for its upcoming line of Power4 devices. "The two concepts are totally compatible," said Joel Tendler, senior technology analyst for IBM's server group in Austin, Texas. "But in our view, the software is not yet mature enough to efficiently separate the applications into enough threads. When the software matures, and it will, then multithreading will be more feasible with chip multiprocessors."
Sun's Tremblay agrees that the key is creating enough threads. "The holy grail in the industry is breaking a single-thread application into multiple threads."
A key factor to keep in mind is the target market for the MAJC designs, which will be aimed at very high-performance, professional workstations, Web servers receiving millions of individual requests and digital home entertainment systems. One of the main functions of the chips, according to Tremblay, is to process digital versions of analog information. That could mean downloading an entire movie over the Internet, rendering a scene from a digitally animated cartoon, or processing millions of user requests at a Web site. All of those involve discrete tasks, or large tasks that can be broken into multiple smaller tasks.
"Chip multiprocessing is only useful if every processor has something to do," said Nathan Brookwood, principal analyst for Insight 64 (Saratoga, Calif.). That means in a traditional server, with one user performing one task, a second processor may not be fully utilized unless the software that divides up the tasks into threads can really allocate an appropriate amount of work to each CPU. However, CMP devices can be very effective for systems such as Web servers, which receive large numbers of small information requests that are already divided into discrete tasks. "To be sure, if you put multiple processors on a single die or in a system, you will get more performance," he said. "As long as there are multiple users or multiple tasks, then multiprocessing can benefit from multithreading."
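Brookwood's point about keeping every processor busy can be sketched with a simple scheduling model. The code below is an illustration only: the function name `schedule`, the greedy "next free core" policy, and the task sizes are assumptions, not a description of any real scheduler. It compares one large indivisible task with a stream of small independent requests on two cores.

```python
import heapq


def schedule(task_cycles, cores):
    """Assign each task, in order, to the core that becomes free first.

    Returns (makespan, utilization), where utilization is the share of
    available core-cycles spent doing work.
    """
    loads = [0] * cores
    free_at = [(0, c) for c in range(cores)]  # (cycle when free, core id)
    heapq.heapify(free_at)
    for cycles in task_cycles:
        load, core = heapq.heappop(free_at)
        load += cycles
        loads[core] = load
        heapq.heappush(free_at, (load, core))
    makespan = max(loads)
    utilization = sum(loads) / (cores * makespan) if makespan else 0.0
    return makespan, utilization


# One user, one indivisible 100-cycle task: the second core has nothing to do.
print(schedule([100], 2))      # (100, 0.5)

# Ten small 10-cycle requests, as a Web server might see: both cores stay busy.
print(schedule([10] * 10, 2))  # (50, 1.0)
```

The same 100 cycles of work finish in half the time when they arrive as ten independent requests, which is why workloads that are already divided into discrete tasks suit chip multiprocessors.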



