Integrating the operation of heterogeneous systems
Robert Largren, Product Manager, OSE Systems, Inc., San Jose, Calif.
2/24/2003 7:35 AM EST
Today voice and data already share the high-bandwidth pipes into an enterprise, through frame relay protocols, voice over IP, firewalls for communications systems and automated call center systems. As data and voice networks continue to converge, there will be an increasing requirement to integrate the operation of heterogeneous systems.
However, Internet-centric communications convergence will only happen when the embedded networking and telecommunications infrastructure is tightly integrated with data center servers. Computers, networking devices and telephony equipment will need to seamlessly communicate, exchange information and provide a means of sharing resources between the platforms when required.
Data center servers and software are highly flexible, performing many different functions simultaneously with moderate reliability, whereas data and telecommunications devices are highly specific, optimizing a single defined set of specialized functions with very high reliability. With a clearly defined set of functions, the specialized communications equipment runs dedicated software that optimizes the performance of a limited number of tasks, providing high reliability and the capability for fault recovery.
Integration of these two types of computing is a complex software task, because the contradictory goals of each category of equipment must be reconciled. Applications such as call centers, monitoring and control of network and telecommunications devices, tracking and allocating bandwidth, and optimizing the use of alternative data and storage channels all require communications at a low level.
To realize the full potential of voice and data convergence, engineers must enable reliable inter-process communication between the computer servers, routers and switches that make up the communications infrastructure. Achieving this communication between processes that are running on different processors, with different operating systems and perhaps even different programming languages, is a difficult challenge.
Most systems today use software techniques such as TCP sockets or Common Object Request Broker Architecture (CORBA) to enable communications between embedded devices and servers, whereas communications within embedded communications devices use direct asynchronous message passing. Although either enterprise approach solves some of the problems, both have issues and limitations that can result in a suboptimal final system.
TCP sockets use a software library that enables connections between an application and a network protocol. With a sockets library, a program can send and receive TCP/IP messages by opening a network socket and reading and writing data to and from the socket. This approach enables the programmer to concentrate on data transfers and lets the operating system transport messages across the network correctly.
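As a rough illustration of the sockets model just described, the following Python sketch opens a TCP socket on the loopback interface, writes a message to it and reads the reply. The function names (echo_once, round_trip) and the payload are made up for this example and do not come from any product discussed here.

```python
import socket
import threading


def echo_once(server_sock):
    """Accept a single connection and send back whatever arrives first."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)


def round_trip(payload: bytes) -> bytes:
    """Send payload to a local echo server over TCP and return the reply."""
    # Port 0 asks the operating system to pick any free port.
    with socket.create_server(("127.0.0.1", 0)) as server:
        port = server.getsockname()[1]
        worker = threading.Thread(target=echo_once, args=(server,))
        worker.start()
        # The client only opens a socket, writes and reads; the operating
        # system handles segmentation, retransmission and ordering.
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(payload)
            # A single recv is enough for a tiny payload on loopback; real
            # code must loop because TCP does not preserve message boundaries.
            reply = client.recv(1024)
        worker.join()
    return reply


if __name__ == "__main__":
    print(round_trip(b"link status?"))
```

Even in this small case the caller has to know an address and a port, and has to decide how much to read, which hints at the integration burden discussed next.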
However, enterprise application-level integration using sockets tends to be highly complex to code, requiring specialized programming skills, and must be implemented separately for every application using the system. There can also be subtle differences in socket semantics and in the configuration of network connections that make such code less portable between different platforms.
Also, using the low-level socket application programming interface (API) to connect at the network protocol level requires detailed knowledge of the underlying network, including network addresses and packet structure. To overcome these obstacles, it is important to have a direct message-passing operating system with a simple API layered above the sockets API for application integration, and with a transparent abstraction of the underlying network.
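To make the idea of a simple API with a transparent network abstraction more concrete, here is a minimal Python sketch in which processes address each other by name rather than by network address. Everything in it is hypothetical: MessageBus, Endpoint and the process names are invented for illustration, and an in-process queue stands in for the sockets layer that a real implementation would sit on.

```python
import queue


class MessageBus:
    """Hypothetical name-based layer; a real one would map names to sockets."""

    def __init__(self):
        self._mailboxes = {}

    def register(self, name):
        """Create a mailbox for a named process and return its endpoint."""
        self._mailboxes[name] = queue.Queue()
        return Endpoint(self, name)

    def deliver(self, sender, target, signal):
        if target not in self._mailboxes:
            raise LookupError(f"no process named {target!r}")
        # Asynchronous delivery: the sender queues the signal and moves on.
        self._mailboxes[target].put((sender, signal))

    def mailbox(self, name):
        return self._mailboxes[name]


class Endpoint:
    """What application code sees: names and signals, no addresses or ports."""

    def __init__(self, bus, name):
        self._bus = bus
        self.name = name

    def send(self, target, signal):
        self._bus.deliver(self.name, target, signal)

    def receive(self, timeout=None):
        """Return (sender, signal); raises queue.Empty if timeout expires."""
        return self._bus.mailbox(self.name).get(timeout=timeout)


if __name__ == "__main__":
    bus = MessageBus()
    server_app = bus.register("call_center")
    device = bus.register("line_card")
    server_app.send("line_card", {"op": "query_bandwidth"})
    print(device.receive(timeout=1))
```

The application code never mentions an address, port or packet layout; where the named process actually runs is a detail of the layer underneath.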
CORBA is an architecture that enables pieces of programs to communicate with one another regardless of what programming language they were written in or what operating system they're running on. The CORBA specification defines the Interface Definition Language (IDL) and the API that enable client/server object interaction within a specific implementation of an Object Request Broker (ORB).
It is a highly complex set of technologies. The API is complex and can be difficult to use correctly, and an ORB is required to broker objects between software components. Overall, the TCP/IP and CORBA alternatives lack the tight integration of operating system components, can be technically difficult to implement, and do not inherently provide fault tolerance to individual software components. The result is a complex mixture of software technologies that may not provide the required level of performance and reliability.
Wrapper power
With the drawbacks inherent in the use of both TCP sockets and CORBA, future communications systems will demand a simpler, more scalable and more reliable approach. The ideal solution would be to implement a message passing architecture that operates for designs incorporating multiple operating systems in a heterogeneous execution environment typical of today's mixed-traffic networks. This can be done by providing an embedded library that can be linked to application code on the enterprise machine. This library provides a wrapper through which a client application running on an application or Web server communicates transparently with other nodes in a distributed system, bringing the power of direct asynchronous message passing traditionally only found in some real-time embedded operating systems to the enterprise.
The wrapper code interfaces with a daemon running on the enterprise server, to provide a gateway between the enterprise server and the embedded device.
This gateway approach can also increase the reliability of communications systems by enabling fault-tolerance capabilities for processes running on the enterprise server. If a process crashes and communications cannot continue, most solutions today simply force the sender of a message to wait for an acknowledgement, and if it is not received, the sender is left to trigger some sort of fault analysis and recovery after a failure. In cases where time might be critical, this approach is unlikely to deliver the required performance. In addition, this approach to fault tolerance typically has to be built into the design at an added cost.
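The wait-for-acknowledgement pattern described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: queues stand in for the communication channel, and send_with_ack is an invented name. The point it shows is that when the receiver has crashed, the sender learns nothing until the full timeout has elapsed.

```python
import queue
import time


def send_with_ack(outbox, acks, message, timeout):
    """Send message, then block until an ack arrives or timeout expires.

    Returns (acknowledged, seconds_waited).
    """
    outbox.put(message)
    start = time.monotonic()
    try:
        acks.get(timeout=timeout)
        return True, time.monotonic() - start
    except queue.Empty:
        # No acknowledgement: the sender only now knows something is wrong
        # and must start its own fault analysis and recovery.
        return False, time.monotonic() - start


if __name__ == "__main__":
    # Nobody is reading the outbox, which simulates a crashed receiver.
    ok, waited = send_with_ack(queue.Queue(), queue.Queue(), "reroute", 0.5)
    print(f"acknowledged={ok}, waited {waited:.2f}s before noticing")
```

The failure is discovered late, by the sender, and only because the sender happened to try to communicate.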
If the operating system on the embedded devices offers built-in communications supervision to all processes, a gateway like this naturally extends this capability to the enterprise system.
The approach taken by this type of message-based gateway addresses the limitations of existing technologies by providing a transparent mechanism for communication among processes across operating systems. The implementation is independent of network addressing schemes or application programming interfaces on client operating systems, and enables developers to build complex systems at a higher conceptual level, improving reliability and time to market.
In effect, it enables heterogeneous distributed systems to have the same level of efficiency, reliability, and high availability as the real-time embedded operating system itself. Native applications use the simple Gateway Client Library to communicate directly, process to process, with a simple operating system process.
From the standpoint of the distributed system, this message-based gateway enables the client process to behave as if it were an embedded operating system process: the "foreign processes" effectively become phantom OSE processes. The gateway client provides an API that consists of Meta calls, normal OSE-type system calls and a non-blocking receive mechanism (wait for host events and OSE signals). The Meta calls allow the client to perform operations such as finding and connecting to Gateway servers. The system calls manage the communications between the enterprise workstation and the server. The non-blocking receive mechanism meets the need for processes on the workstation to wait for both OSE signals and native OS events.
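The idea of waiting on native OS events and gateway signals at the same time can be illustrated with Python's standard selectors module. This is a sketch of the general technique, not the Gateway Client Library API: wait_for_any, demo and the labels are invented names, and local socket pairs stand in for the gateway connection and for a host event source.

```python
import selectors
import socket


def wait_for_any(sources, timeout):
    """Return the sorted labels of sources readable within timeout seconds."""
    sel = selectors.DefaultSelector()
    try:
        for label, sock in sources.items():
            sel.register(sock, selectors.EVENT_READ, data=label)
        # One call covers every source, so neither one starves the other.
        return sorted(key.data for key, _events in sel.select(timeout))
    finally:
        sel.close()


def demo():
    # Stand-ins: one pair for the gateway link, one for a native OS event.
    gw_local, gw_remote = socket.socketpair()
    host_local, host_remote = socket.socketpair()
    try:
        gw_remote.send(b"signal")  # only the gateway side has data pending
        sources = {"gateway_signal": gw_local, "host_event": host_local}
        return wait_for_any(sources, timeout=1.0)
    finally:
        for sock in (gw_local, gw_remote, host_local, host_remote):
            sock.close()


if __name__ == "__main__":
    print(demo())
```

A workstation process built this way can keep servicing its own user interface or file events while still reacting promptly to signals arriving from the embedded side.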
The supervisor can monitor all desired processes, including phantom processes on foreign nodes, and is entirely managed by the RTOS running on the nodes. If any supervised process goes away, notification is sent and recovery can happen immediately, before someone tries to use the disabled process and fails. Catching problems when they occur, at the point of failure and before they propagate across the system, is a fundamental precept of fault-tolerant operation.
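The supervision model described above, in which interested parties are told as soon as a watched process disappears, might be sketched as follows. The Supervisor class, its attach and report_exit methods and the process names are assumptions made for this example; in the system the article describes, this bookkeeping is done by the RTOS rather than by application code.

```python
import queue


class Supervisor:
    """Hypothetical registry that notifies watchers when a process exits."""

    def __init__(self):
        self._watchers = {}  # target process name -> list of mailboxes

    def attach(self, target, mailbox):
        """Ask to be notified through mailbox if target goes away."""
        self._watchers.setdefault(target, []).append(mailbox)

    def report_exit(self, target):
        """Called at the point of failure; pushes a notice to every watcher."""
        for mailbox in self._watchers.pop(target, []):
            mailbox.put(("process_gone", target))


if __name__ == "__main__":
    supervisor = Supervisor()
    recovery_inbox = queue.Queue()
    supervisor.attach("phantom_client", recovery_inbox)

    # The failure is reported when it happens, not when someone next tries
    # to talk to the dead process.
    supervisor.report_exit("phantom_client")
    print(recovery_inbox.get_nowait())
```

In contrast with an acknowledgement timeout, the watcher here does not have to send anything, or wait for anything, to find out that its peer is gone.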



