Design Article

Dealing with the benefits, pressures of system disaggregation

Tim Miller, President, StarFabric Trade Association, Vice President, Marketing and Sales, StarGen, Inc., Marlborough, Mass.

1/17/2003 8:19 AM EST

In today's market environment, projects that will gain traction for either internal or external investment must be performance-driven, efficiency-driven, or both. Performance-driven projects must greatly increase the performance of one or more mission-critical parameters while maintaining current power and cost budgets. Efficiency-driven projects must maintain current performance levels while greatly reducing cost, power and, in many cases, physical space budgets.

One architectural design approach that is emerging as a way to answer such conflicting demands is disaggregation. With this approach, functional subsystems are isolated on their own circuit board or blade. Connecting these subsystems via a switched interconnect technology, such as StarFabric/PCI Express Advanced Switching (AS), creates the system. This approach offers many benefits because the resulting adaptive infrastructure can be configured and reconfigured as required. The architecture has benefits throughout a system's lifecycle and, in many cases, can extend the useful life of a system indefinitely.

But achieving the benefits of system disaggregation puts enormous pressure on the choice of interconnect architecture. Whichever the designer selects, StarFabric or PCI Express AS, it must meet critical requirements across numerous embedded applications.

With disaggregation, the system is architected so that each blade, or board, performs a single function. The basic building blocks of the system would be processor blades and I/O blades, with the blades becoming the atomic components. The processor blades could contain CPUs, DSPs, NPUs, or an application acceleration (for example, encryption) processor. Each blade would typically contain one type of processor, although a blade could contain multiple types. Similarly, I/O blades would contain the input and output devices required by the particular system, the most common being storage blades and network blades. The storage blades would contain any number of disk drives with any interface (e.g., SCSI, IDE, or Fibre Channel). The disk controller would interface to the fabric via PCI or PCI-X today and PCI Express in the future. The network blades would likely be Gigabit Ethernet and, in the future, 10G Ethernet. In embedded distributed processing systems, there is an array of application-specific I/O devices. Depending on the system, the I/O blades could control a wide range of devices, including cameras, video display devices, and scanners.

These components could either be co-located in a rack-based system or be some distance apart. StarFabric, for example, supports interlink distances of up to 40 feet, which can be extended to 80 feet if a switch sits in the middle. This eliminates traditional mechanical constraints and gives system designers new degrees of freedom: each component can be placed in its optimal location.

There are a number of benefits to system disaggregation and the creation of an adaptive infrastructure. One such benefit: each component blade can be optimized in terms of function, cost, and power. Each component will likely be less complex, as each only has to perform a single function.

The lifecycle benefits of the disaggregated system are tremendous, extending the useful life almost indefinitely. Starting with a single chassis, only the components required need to be populated. As the system needs grow, additional components can be added to scale the performance. Once a single chassis is full, a second chassis can be added and populated as required. In a simple two-chassis example, even once both are filled with the required components, the system has not reached the end of its useful life.

Processor blades could be replaced with the next-generation blades without having any effect on the other components. Similarly, storage blades could be updated to higher density devices, and I/O blades could be swapped out for the next-generation devices. A parallel benefit to system designers and end users would be vendor independence. The purchasing decision for CPUs, storage, and other I/O devices can be de-coupled.

Reconfiguration options
With a disaggregated system, connecting the components via a switch fabric has many useful features. One such feature is that the adaptive system can be configured and reconfigured over time. The management system could assign a number of processor blades, a number of storage blades, and a number of I/O blades to function as System 1 performing some specific application. It could then assign a different set of each of the components to function as System 2 performing a separate function. An entirely separate set of components could be spares for both systems. For example, in a blade server, a group of components could be configured to run a database and another group of components to run email. The management system could also re-configure the system, so that at midnight, a subset of components, both active and standby, could perform a backup of the entire system.
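The partitioning described above can be sketched in a few lines of Python. This is only an illustration of the idea, not StarFabric or AS management software: the `BladePool` class, the blade names, and the "database" and "email" system labels are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BladePool:
    """Hypothetical management view of the blades attached to the fabric."""
    # Maps a blade ID to its kind, for example "cpu", "storage" or "io".
    blades: dict
    # Maps a blade ID to the logical system that currently owns it.
    owner: dict = field(default_factory=dict)

    def free(self, kind: str) -> list:
        """Return the unassigned (spare) blades of one kind, sorted by ID."""
        return sorted(b for b, k in self.blades.items()
                      if k == kind and b not in self.owner)

    def assign(self, system: str, **counts: int) -> list:
        """Give `system` the requested number of spare blades of each kind."""
        chosen = []
        for kind, wanted in counts.items():
            available = self.free(kind)
            if len(available) < wanted:
                raise ValueError(f"not enough spare {kind} blades")
            chosen.extend(available[:wanted])
        # Ownership is recorded only after every request has been satisfied.
        for blade in chosen:
            self.owner[blade] = system
        return chosen

    def release(self, system: str) -> None:
        """Return all of a system's blades to the spare pool."""
        self.owner = {b: s for b, s in self.owner.items() if s != system}


pool = BladePool({"cpu0": "cpu", "cpu1": "cpu", "cpu2": "cpu",
                  "disk0": "storage", "disk1": "storage",
                  "net0": "io", "net1": "io"})
database = pool.assign("database", cpu=1, storage=1, io=1)
email = pool.assign("email", cpu=1, storage=1, io=1)
spare_cpus = pool.free("cpu")  # blades left over as spares for both systems
```

A scheduled reconfiguration, such as the midnight backup, would amount to calling `release` on one logical system and `assign` on another.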

New embedded distributed processing systems that can benefit from system disaggregation are put in mission-critical roles. In some of these roles, as in the case of a radar system, a failure could literally mean life or death, while in other, non-life-threatening applications it could mean profit or loss. These systems, often big-ticket items, must be highly reliable and support ever-increasing levels of availability. The trend is toward essentially zero downtime, with continuous operation and graceful degradation when a failure occurs. To achieve this, high availability (HA) cannot be an afterthought in the system design or in the interconnect architecture chosen.

Switched interconnect architectures like StarFabric/PCI Express AS were designed from the outset with HA in mind. While HA in general is a multi-layer issue, these interconnect technologies have a number of hardware-based HA features and additional primitives for the upper layers to use.

The use of redundant links with hardware recovery is one of the essential silicon-based HA features. Designers can architect a system with parallel redundant fabrics, in which each end node has an interface to both the primary and the secondary fabric. In the event that the primary link fails, the silicon automatically switches to the backup path.

To accomplish this, a CRC is used. At each node in the fabric, the CRC is checked and regenerated for transmission to the next node. If the CRC is invalid, the sending node is notified of the failure, and each packet that failed is retransmitted. If the failure continues after the link has attempted resynchronization, the fabric automatically notifies all the nodes in the system that the particular link has failed.
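A rough software model of this check, retransmit and fail-over sequence might look like the following. It is a sketch under several assumptions: CRC-32 from Python's `zlib` stands in for whatever CRC the silicon actually uses, the retry limit of three is invented for illustration, a link is modeled as a plain function from bytes to bytes, and the resynchronization and system-wide notification steps are reduced to simply giving up on the link.

```python
import zlib
from typing import Callable, Optional, Tuple

Link = Callable[[bytes], bytes]  # models one link: frame in, frame out
MAX_RETRIES = 3                  # assumed; the article gives no retry limit


def frame(payload: bytes) -> bytes:
    """Append a CRC-32 to the payload (a stand-in for the fabric's CRC)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check(framed: bytes) -> Optional[bytes]:
    """Return the payload if the trailing CRC matches, otherwise None."""
    payload, crc = framed[:-4], framed[-4:]
    return payload if zlib.crc32(payload).to_bytes(4, "big") == crc else None


def send_over_link(payload: bytes, link: Link) -> Optional[bytes]:
    """Send across one link, retransmitting each time the CRC check fails."""
    for _ in range(1 + MAX_RETRIES):
        received = check(link(frame(payload)))
        if received is not None:
            return received
    return None  # retries exhausted: treat the link as failed


def send(payload: bytes, primary: Link, secondary: Link) -> Tuple[str, bytes]:
    """Try the primary fabric first, then fall back to the secondary."""
    for name, link in (("primary", primary), ("secondary", secondary)):
        received = send_over_link(payload, link)
        if received is not None:
            return name, received
    raise ConnectionError("both fabrics failed")


def good_link(framed: bytes) -> bytes:
    """A healthy link delivers the frame unchanged."""
    return framed


def broken_link(framed: bytes) -> bytes:
    """A failed link flips one bit, which the CRC check will catch."""
    return bytes([framed[0] ^ 0x01]) + framed[1:]
```

Calling `send(b"hello", broken_link, good_link)` exhausts the retries on the primary path and then delivers the packet over the secondary one. In the real fabric this switch is made by the silicon, not by software.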

StarFabric's 2.5 Gbit/s links consist of four 622 Mbit/s differential pairs. To support graceful degradation, StarFabric can continue to run on "fragile" links, which occur when one or more of the differential pairs are broken in some fashion. The links will in fact operate all the way down to a single operational pair.
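As a back-of-the-envelope illustration, the raw rate of a fragile link can be estimated from the number of pairs still working. The sketch below assumes that the rate scales linearly with the number of operational pairs, which the figures above suggest but do not state outright; the function name is hypothetical.

```python
PAIRS_PER_LINK = 4     # from the article
PAIR_RATE_MBITS = 622  # from the article, per differential pair


def link_rate_mbits(working_pairs: int) -> int:
    """Estimate the raw link rate, assuming it scales with working pairs."""
    if not 1 <= working_pairs <= PAIRS_PER_LINK:
        raise ValueError("a link needs between 1 and 4 working pairs")
    return working_pairs * PAIR_RATE_MBITS


full_rate = link_rate_mbits(4)     # 4 x 622 = 2488 Mbit/s, about 2.5 Gbit/s
minimum_rate = link_rate_mbits(1)  # a single surviving pair: 622 Mbit/s
```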

The ability to support multiple types and classes of traffic is a critical feature of the required interconnect technology. Both StarFabric and PCI Express AS are protocol-agnostic architectures. AS takes StarFabric's architecture to a new level of multi-protocol support. Essentially, AS is an encapsulation architecture. Figure 2 shows how a StarFabric packet would be encapsulated into AS. Each protocol is given a Protocol Encapsulation Interface (PEI) number. The first eleven (PEI 0 to 10) are pre-assigned, predominantly to fabric management functions. A further 211 PEIs are assigned to standard protocols, and 11 are user-defined.

Additionally, both StarFabric and AS support multiple classes of service. StarFabric defines eight types of traffic: provisioning, high-priority isochronous, isochronous, high-priority asynchronous, asynchronous, multicast, address routed, and special. AS defines eight classes of service as well. These traffic classes (TCs) are then mapped to virtual channels (VCs). AS provides an additional degree of freedom in that each TC can be mapped to a specific VC at run time. For example, if a device supports only four VCs, the designer can map the eight TCs onto the four VCs in a manner that fine-tunes the system traffic.
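The TC-to-VC mapping can be pictured as a small lookup table. The sketch below is illustrative only: the function names, the even split, and the hand-tuned table are hypothetical and are not taken from the AS specification.

```python
NUM_TCS = 8  # the article gives eight traffic classes for AS


def even_mapping(num_vcs: int) -> dict:
    """Spread the eight TCs evenly over the available VCs, in order."""
    if not 1 <= num_vcs <= NUM_TCS:
        raise ValueError("expected between 1 and 8 virtual channels")
    return {tc: tc * num_vcs // NUM_TCS for tc in range(NUM_TCS)}


def validate(tc_to_vc: dict, num_vcs: int) -> None:
    """Check that every TC is mapped, and only to a VC the device supports."""
    if sorted(tc_to_vc) != list(range(NUM_TCS)):
        raise ValueError("every traffic class 0..7 needs exactly one VC")
    if any(not 0 <= vc < num_vcs for vc in tc_to_vc.values()):
        raise ValueError("mapping refers to an unsupported VC")


# Default for a four-VC device: two traffic classes share each channel.
default_map = even_mapping(4)

# A hand-tuned alternative that isolates TC 6 and TC 7 on their own channels.
tuned_map = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 2, 7: 3}
validate(tuned_map, num_vcs=4)
```

Because the table is plain data, swapping `default_map` for `tuned_map` while the system runs is the software analogue of the run-time remapping that AS allows.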

Being protocol-agnostic and supporting multiple classes of service provides a number of benefits to disaggregated embedded distributed processing systems. Only a single interconnect fabric is required, greatly simplifying the system. This fabric can then be made redundant for nonstop operation.

Whether the project's emphasis is on performance or efficiency, designers must reuse existing hardware and software investments wherever possible. StarFabric bridges, such as the SG2010 PCI-to-StarFabric Bridge, have two modes of operation: "bridge mode" and "gateway mode." The first mode is the familiar transparent bridge function, whereas the second mode is a fabric-native gateway function. When operating in bridge mode, the SG2010 looks like a standard PCI-to-PCI bridge. All existing legacy software, including operating systems, BIOSes, and drivers, can be used without any modification. This mode of operation is also called address routing. Here, a single flat global address space is supported. Each device within the system is visible and can be accessed by its own unique address range.
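Address routing over a flat global address space can be modeled as a lookup from an address to the device that owns the enclosing range. The following is a minimal sketch, not a description of how the SG2010 decodes addresses; the `AddressMap` class, the device names, and the address values are hypothetical.

```python
import bisect


class AddressMap:
    """Hypothetical flat address map: one non-overlapping range per device."""

    def __init__(self):
        self._starts = []  # sorted start addresses, kept for bisect
        self._ranges = []  # (start, end_exclusive, device), same order

    def add(self, device: str, start: int, size: int) -> None:
        """Register a device's address range, rejecting any overlap."""
        if size <= 0:
            raise ValueError("size must be positive")
        end = start + size
        i = bisect.bisect_left(self._starts, start)
        if i > 0 and self._ranges[i - 1][1] > start:
            raise ValueError("range overlaps the previous device")
        if i < len(self._ranges) and self._ranges[i][0] < end:
            raise ValueError("range overlaps the next device")
        self._starts.insert(i, start)
        self._ranges.insert(i, (start, end, device))

    def route(self, address: int):
        """Return the device that owns `address`, or None if unmapped."""
        i = bisect.bisect_right(self._starts, address) - 1
        if i >= 0:
            start, end, device = self._ranges[i]
            if address < end:
                return device
        return None


fabric = AddressMap()
fabric.add("disk0", 0x1000_0000, 0x1000)
fabric.add("net0", 0x2000_0000, 0x1000)
target = fabric.route(0x1000_0800)  # falls inside disk0's range
```

Because every device sits at a unique range, legacy software that already issues ordinary memory-mapped accesses needs no knowledge of the fabric in between.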




