News & Analysis
Intelligent platform management interface gives network builders control
Pete Rendek, Director of Product Marketing, ComStruct Media Solutions, Dave Halliday, Telecom Architect, Active within the PICMG Committee, Motorola Computer Group, Tempe, Ariz.
7/15/2002 8:48 AM EDT
Sometimes you have to give up some control to gain control. Sound contradictory? It's really not. For example, CompactPCI systems adhering to the PICMG (PCI Industrial Computer Manufacturers Group) 2.0 standard initially used the PCI bus to control and manage card slots.
For a high-reliability system, this is not an ideal approach, because control and management over the PCI bus relies on a large number of signals being operational. Worse, the failure of any of a number of components can corrupt the shared PCI bus and render the whole system useless.
The solution? A simpler serial connection between card slots and key peripherals and components. The PICMG 2.9 working group was formed to specify and standardize a method of chassis and inter-chassis management that allows scalability and improves system robustness to the point where the system designer can claim five- or six-nines reliability.
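To put "five or six nines" in perspective, the arithmetic can be sketched in a few lines of Python. This is only an illustration: it assumes that "N nines" means an availability of 1 - 10^-N measured over an average year, and the function name is hypothetical.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, leap days included


def downtime_minutes_per_year(nines: int) -> float:
    """Downtime budget per year for an availability of `nines` nines.

    Assumption: "N nines" means availability = 1 - 10**-N.
    """
    unavailability = 10.0 ** -nines
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (3, 4, 5, 6):
        minutes = downtime_minutes_per_year(n)
        print(f"{n} nines: {minutes:.2f} minutes of downtime per year")
```

Under these assumptions, five nines leaves a little over five minutes of downtime per year, and six nines roughly half a minute.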
The PICMG 2.9 specification is based on the Intelligent Platform Management Interface (IPMI). IPMI comprises three specifications: the IPMI specification, which describes the interface between the platform and the system management software, which may be running locally or remotely; the Intelligent Platform Management Bus (IPMB) specification, which describes the internal management bus; and the Intelligent Chassis Management Bus (ICMB) specification, which defines an external management bus for connecting additional IPMI-based systems to the platform. By design, IPMI calls for a much simplified and separate (out-of-band) management plane based on a two-wire serial interface.
The Intelligent Platform Management Interface philosophy is one of autonomous monitoring and recovery based on events. IPMI is provided as a separate control plane that does not depend on the main application processors or operating system. It describes both hardware and software processes: the hardware layer is based on the I2C protocol, and the management software is based on an event-driven mechanism. Essentially, this scheme relies on a subscribed system resource generating an event when its state changes, such as the hot swap event a payload card generates when it is inserted or extracted.
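The event-driven scheme can be sketched in Python as follows. This is a simplified illustration, not the IPMI message format or API: the class names, event fields, and state strings are all hypothetical, and the I2C transport is replaced by a direct method call.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List


@dataclass(frozen=True)
class Event:
    source: str     # hypothetical resource name, e.g. "slot-3"
    kind: str       # hypothetical event type, e.g. "hot_swap"
    new_state: str  # e.g. "inserted" or "extracted"


class ManagementController:
    """Receives events from subscribed resources and calls handlers."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, kind: str, handler: Callable[[Event], None]) -> None:
        self._handlers[kind].append(handler)

    def dispatch(self, event: Event) -> None:
        for handler in self._handlers[event.kind]:
            handler(event)


class PayloadCard:
    """A resource that reports an event only when its state changes."""

    def __init__(self, slot: str, controller: ManagementController) -> None:
        self.slot = slot
        self.state = "extracted"
        self._controller = controller

    def set_state(self, new_state: str) -> None:
        if new_state == self.state:
            return  # no state change, so no event
        self.state = new_state
        self._controller.dispatch(Event(self.slot, "hot_swap", new_state))


if __name__ == "__main__":
    controller = ManagementController()
    controller.subscribe("hot_swap", lambda e: print(f"{e.source}: {e.new_state}"))
    card = PayloadCard("slot-3", controller)
    card.set_state("inserted")
    card.set_state("extracted")
```

The point of the design is that the controller does not poll: it reacts only when a resource reports a change.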
IPMI allows for the two basic functions in a computing system: monitoring and event response. While PICMG 2.9 covers the application of IPMI in embedded computing platforms, IPMI also finds its way into newer desktop computers in the form of the underlying I2C bus, which is the path that returns fan speeds and CPU temperatures to the operating system.
A typical IPMI system architecture might include both intelligent and non-intelligent payload cards, which may be organized in a redundant manner (N+1 or 2N), multiple redundant power supplies, and a chassis alarm display panel to visually indicate alarm conditions. The central alarm management controller (AMC) is the in-chassis resource responsible for monitoring and controlling IPM devices over the I2C bus; the AMC, too, may be deployed as a redundant pair.
The IPMI functionality allows the computing system to monitor for faults and begin to take corrective action. If a fault of some type has caused a card to fail, for example, IPMI can be used to send a restart command to the card. If this fails, IPMI allows the card to be power cycled. As the card begins to start up, more sophisticated control, such as application restart, is handled through the operating system, such as the Advanced High Availability (AHA) Linux offering.
As the card start-up process continues, information needed to initialize cards and applications can be routed through an Ethernet interface. If this process is successful, the card has been restored to operation without restarting the entire platform. If the card continues to cause trouble, it can be shut down, and maintenance personnel can be alerted. An alarm may be indicated either visually on the alarm display panel or through system management software to a remote monitoring station.
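The escalation described above (restart, then power cycle, then shut down and raise an alarm) can be sketched as a short Python routine. This is a minimal illustration under stated assumptions: the function and its callback parameters are hypothetical stand-ins for whatever commands a real management controller would issue, and each callback is assumed to return True when the card comes back healthy.

```python
from typing import Callable


def recover_card(
    slot: int,
    restart: Callable[[int], bool],
    power_cycle: Callable[[int], bool],
    shut_down: Callable[[int], None],
    alert: Callable[[str], None],
) -> str:
    """Try the least disruptive recovery first, then escalate."""
    if restart(slot):
        return "restarted"
    if power_cycle(slot):
        return "power_cycled"
    # Both attempts failed: isolate the card and notify maintenance.
    shut_down(slot)
    alert(f"slot {slot}: recovery failed, card shut down")
    return "shut_down"


if __name__ == "__main__":
    outcome = recover_card(
        slot=4,
        restart=lambda s: False,     # simulated: restart command fails
        power_cycle=lambda s: True,  # simulated: power cycle succeeds
        shut_down=lambda s: None,
        alert=print,
    )
    print(outcome)
```

In every branch only the affected slot is touched, which mirrors the article's point that a card can be restored or removed from service without restarting the entire platform.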
What does that mean to me?
IPMI allows an open-standards approach to system management, meaning that boards from different vendors may be integrated into systems and managed in a seamless and efficient way. This open approach gives the system engineer a number of distinct advantages, including flexible and interoperable access to platform information.
Since IPMI v1.0 was released in 1998, well over 70 companies have adopted the IPMI approach to system management across various platforms, including CompactPCI. This uptake is reflected in newer CompactPCI chassis and board-level equipment, which is building in support for PICMG 2.9 based around the IPMI v1.5 specification. The result for the IT operative or system engineer is that more information on the platform's condition is available; this enables robust monitoring and proactive fault detection, which in turn equates to increased system reliability.
The specification allows several ways to physically implement IPMI. The simplest is to run a single bus to each slot. However, a failure along any part of the bus could inhibit control of potentially all of the cards in the chassis. To eliminate single points of failure, the messaging paths in the chassis need to provide redundancy.
One way to accomplish this is to have a dual bus architecture, in which messages are passed simultaneously on each bus. Dual bus segments provide path redundancy, but designers must ensure that no attached device has a common failure point that would cause the loss of both bus segments from a single failure.
An alternative physical implementation for high reliability uses a star architecture, where the IPMI bus consists of multiple separate lines, one running to each slot. The star architecture ensures that any card-level failure, or any failure within a specific bus segment, will affect the operation of only a single card. To ensure that the entire control path is resilient through failovers, a redundant management card or alarm card is required.
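The difference between the three wiring options can be illustrated with a toy Python model that reports which slots remain manageable after a set of link failures. This is a deliberately coarse sketch: the link names ("bus0", "bus1", "line1", and so on) are hypothetical, and a failure on a shared bus is assumed, as the worst case, to disable that entire bus.

```python
from typing import Set


def reachable_slots(topology: str, slot_count: int, failed_links: Set[str]) -> Set[int]:
    """Return the slots the management controller can still reach."""
    slots = set(range(1, slot_count + 1))
    if topology == "single":
        # One shared bus: a failure can cut off every slot.
        return set() if "bus0" in failed_links else slots
    if topology == "dual":
        # Two shared buses: control is lost only if both fail.
        both_down = {"bus0", "bus1"} <= failed_links
        return set() if both_down else slots
    if topology == "star":
        # One dedicated line per slot: a failure isolates only that slot.
        return {s for s in slots if f"line{s}" not in failed_links}
    raise ValueError(f"unknown topology: {topology}")


if __name__ == "__main__":
    print(reachable_slots("single", 4, {"bus0"}))
    print(reachable_slots("dual", 4, {"bus0"}))
    print(reachable_slots("star", 4, {"line2"}))
```

The model ignores the common-mode failures mentioned above, such as an attached device that takes down both buses at once, which is exactly the case designers of a dual bus system have to rule out.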
All of these IPMI architectural approaches are covered in the current PICMG 2.9 specification. In addition, current PICMG committee discussions will lead to enhancements that will support the PICMG 2.x class of platforms and include the capabilities needed to address next-generation PICMG 3.0 platforms. This will allow IPMI physical and software implementations to cover a wide range of current platforms and to have the extensibility to grow into much larger platforms in the future.



