Multiprocessing drive promises profits, problems for Intel
Rick Merritt
3/18/1999 3:54 PM EST
NEW YORK - A key building block in Intel Corp.'s drive toward a new performance plateau for Pentium-based systems was unveiled here Wednesday (March 17) by the company's Corollary subsidiary. The move came with Corollary's first public demonstration of its Profusion silicon, a chip set for eight-way multiprocessing systems.
While Profusion and a new turn of Pentium III processors launched here hold the promise of bolstering Intel's bottom line, the high-performance push also presents unique design challenges for Intel and for competitors planning their own high-performance products.
Intel's new products will boost the company into a realm where it could charge more than $3,600 for a 500-MHz Pentium III Xeon with 2 Mbytes of L2 cache, and perhaps $50,000 for a fully configured eight-way system using the parts, a promising boon at a time of sub-$1,000 desktops with sub-$100 CPUs. But efforts to design adequate memory and I/O subsystems to keep pace with expected advances in the Intel processor bus will tax designers, especially in the multiprocessing arena that's seen as the next tier in the advancing PC architecture.
Symmetric multiprocessing systems are among the most tortuous of computer-engineering designs, a fact well documented by Corollary's four-year effort to get Profusion out the door. The chip set, originally geared for the 66-MHz bus of the Pentium Pro, was redesigned for the 100-MHz Pentium III bus. Corollary's original Pentium Pro boards were redesigned to reduce the costs and size of the boards, which Intel plans to start selling before July.
Competitors promoting a move to switched fabrics or directory-based systems are already criticizing Profusion's bus-based architecture as limited. But Corollary president George White defended the Profusion architecture, which links into a single backplane two mezzanine boards that hold four processors each.
"There will be a few more generations where you should be able to put four CPUs on a bus, but four to five years out, you will have to move to a switched fabric design," said White.
John Miner, general manager of Intel's enterprise server group, said Profusion is "a key ingredient in taking a high-volume server from the four-way systems we have been shipping since '93 to a new category of processing."
While Intel aims at a new volume space for high-performance systems, a number of OEMs are already shipping Pentium III Xeon systems with larger numbers of processors based on homegrown designs and chip sets. And at least one new competitor hopes to leapfrog Corollary's work with a switch-based system built around the upcoming K7 processor from Advanced Micro Devices Inc.
"We have a point-to-point architecture that puts CPUs on a network and runs everything through a switched fabric with as many end points as you need," said Rick Shriner, chief executive officer of Poseidon Technology Inc. (San Jose, Calif.), and former chief executive officer of PowerPC clone maker Exponential Technology.
The Poseidon silicon, which should ship by the end of the year, is geared to the 200-MHz, 3.2-Gbyte/second Alpha EV-6 bus used on the AMD K7. Its 14-port switch architecture is geared for a range of four- and eight-way servers. The company, founded by Daniel Foo of Digital Equipment Corp., got its start in 1993 designing four-way chip sets for the Pentium, but switched to AMD when it could not obtain a license to the Intel P6 processor bus.
Lower-end competitors also dog Corollary's heels. At Reliance Computer Corp., founder and chief executive officer Raju Vegesna said Reliance will introduce a new generation of chip-set technology this spring that will leverage the Xeon technology in the symmetric-multiprocessing market. "What we see happening is that in the future, the baseline in servers will be four-way, compared with today's server sweet spot of two-way systems. And by next year, the baseline for workstations will be two-way, moving up the ladder," Vegesna said.
Separately, a number of OEMs are already shipping systems with more than eight Pentium III CPUs based on their own homegrown architectures and core logic, a trend that is expected to accelerate with the rollout of Intel's Merced next year. For systems beyond eight CPUs, the Profusion architecture "just doesn't cut it," said Martin Whittaker, who heads R&D at Hewlett-Packard Co.'s PC server group. Nevertheless, HP expects to adopt Profusion for what it considers midrange systems. "It's been a long haul for Corollary, and they still have some implementation issues they are working through, but their fundamental architecture is solid," Whittaker said.
Another challenge Corollary faces is in memory subsystems. Profusion uses PC-100 memories now and may move to 133-MHz SDRAMs for a follow-on generation. However, an interim step may yet be required before higher-performance Direct Rambus memories are able to take on the memory densities servers require.
White suggested one solution might employ standard SDRAMs with "specialized silicon using high-performance signaling technology to sit between the memory subsystem and the chip set" to help ratchet up the frequency of the memory subsystem without driving up the pin count of the chip set. That approach would likely emerge as an ad hoc solution from various designers, rather than as any standard.
For its part, Silicon Graphics Inc. (Mountain View, Calif.) adopted a unique memory architecture for a four-way workstation it showed in tandem with the Pentium III launch. SGI's Cobalt is a three-piece chip set that embodies "everything we've learned" over the past few years about multiprocessing workstation development, including a unified memory architecture in which main memory handles virtually all of the requirements of the system, said Tom Furlong, senior vice president and general manager of the workstation division at SGI.
SGI's memory controller interfaces to PC-100 SDRAM, in a wide configuration that delivers 3.2 Gbytes/s of bandwidth. SGI's core competency, a graphics controller core with 8 million transistors, resides on the same die. Furlong said SGI placed the graphics core right next to the memory controller "to get a deep and wide memory access, without the use of any special memories. The graphics memory, the video frame buffer, Ethernet buffer memory, all of that runs out of main memory."
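As a rough check on the memory figures quoted above, peak bandwidth on a synchronous bus is simply clock rate times data width. The sketch below works backward from the 3.2-Gbytes/s figure and the 100-MHz PC-100 clock to the data-path width that figure would imply. The function names are illustrative, the 64-bit width of a single SDRAM module is an assumption not stated in the article, and the result is a theoretical peak, not anything SGI disclosed.

```python
def peak_bandwidth_bytes_per_s(clock_hz, bus_width_bits, transfers_per_clock=1):
    """Theoretical peak bandwidth: clock rate x transfers per clock x bytes per transfer."""
    return clock_hz * transfers_per_clock * bus_width_bits // 8


def implied_bus_width_bits(bandwidth_bytes_per_s, clock_hz, transfers_per_clock=1):
    """Data-path width (in bits) needed to reach a given peak bandwidth."""
    return bandwidth_bytes_per_s * 8 // (clock_hz * transfers_per_clock)


PC100_CLOCK_HZ = 100_000_000            # PC-100 SDRAM runs at 100 MHz
QUOTED_BANDWIDTH = 3_200_000_000        # 3.2 Gbytes/s, as quoted for SGI's controller
ASSUMED_MODULE_WIDTH_BITS = 64          # assumption: one standard 64-bit SDRAM module

# Width implied by the quoted figure (an inference, not a published spec).
width_bits = implied_bus_width_bits(QUOTED_BANDWIDTH, PC100_CLOCK_HZ)

# Peak bandwidth of a single assumed 64-bit PC-100 channel, for comparison.
single_channel = peak_bandwidth_bytes_per_s(PC100_CLOCK_HZ, ASSUMED_MODULE_WIDTH_BITS)

print(f"implied data path: {width_bits} bits")
print(f"single 64-bit PC-100 channel: {single_channel / 1e6:.0f} Mbytes/s")
print(f"ratio: {QUOTED_BANDWIDTH // single_channel}x")
```

Under those assumptions, the quoted figure corresponds to a data path several times wider than a single module, which is consistent with the "wide configuration" described.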
At the I/O crossroads
On the I/O front, Corollary is implementing 64-bit, 66-MHz PCI today. But the company has customers on both sides of a roaring debate over next-generation architectures: the so-called Next-Generation I/O (NGIO) backed by Intel and Dell, among others, and the Future I/O (FIO) plan backed by Compaq, IBM and HP.
"Intel wants to ship NGIO in the time frame of its Foster CPU, which ships late next year," said one source close to the debate. "I think Intel has a belief that whoever gets to the market first with something that works will win, and that's a sticking point right now. I'm not as optimistic [about a resolution between the two camps] as I once was."
"I think the NGIO and FIO efforts are technically much closer than most people know," said Shriner of Poseidon. "I hope they converge into one standard, but if it takes them longer than six months we will be hurting."
Shriner took comfort, however, in the fact that "the world is only now moving to 64/66 PCI and the number of available cards for it is not that large. There will be a lag in transition to any new I/O standard."
"I'm not radically perturbed about the I/O situation," said White. "That is a half-step removed from what I have to build."
A more immediate transition will involve the processor bus itself, which is expected to shift up to 133 MHz with Intel's next-generation server chip, dubbed Cascades. Profusion's use of mezzanine processor boards with their own processor voltage regulators is expected to help the design migrate to that CPU.
A new design, however, would be required to accommodate the Foster CPU, a new 32-bit microarchitecture Intel is expected to roll out late next year, at about the same time as the 64-bit Merced CPU. The South Carolina-based Octascale design team Intel acquired last year from NCR Corp. is believed to be at work on a multiprocessing chip set targeted at Foster systems.
Whether Intel can turn its complex multiprocessor designs fast enough to keep up with those processor bus, memory and I/O transitions could be a major factor in how well it executes its plans to reap new profits from the server market.
In the meantime, Intel is making hay with claims of the raw performance of its uniprocessors. At the rollout, Miner showed EDA benchmarks that rated performance for Intel's 550-MHz Pentium III Xeon systems with 512 kbytes cache as "up to a third greater than Sun" Ultra 60 workstations at 340 MHz with 4 Mbytes of L2 cache. "If you were developing Sparc processors, you would probably want to do it on a Pentium III-based system," quipped Miner.
In lots of 1,000, the Pentium III Xeon chips range in price from $931 for a 500-MHz CPU with 512 kbytes cache to $3,692 for a chip with 2 Mbytes cache. The 550-MHz versions of the chip will spill out over the next two quarters. Board-level products based on Profusion will be available in the second quarter.
Additional reporting by David Lammers.



