News & Analysis
Reliable Real-Time Performance in Windows NT
Alex Doumani
12/1/1997 12:00 AM EST
A number of significant business and technical reasons are helping drive Windows NT into the Telecom and Datacom equipment markets. Within the last year there has been a discernible shift in the operating system focus of many of the major players in these markets. For the first time, this top end of the Microsoft operating system offering is being very seriously considered, and in some cases already adopted, for the next generation of products. Though Windows NT has many features that make it an appropriate operating system for these applications, it also has some limitations that must be overcome before it can become a ubiquitous operating system in this area. These limitations include the lack of determinism and concerns over fault tolerance. This article addresses these limitations and proposes alternative solutions to them.
Until recently, these applications required higher performance and reliability than was available in standard desktop operating systems and hardware. Windows NT, however, has been designed from the ground up to be a responsive, reliable, general purpose operating system with features such as a pre-emptive, priority-based multitasking kernel, and built-in protection and security mechanisms. Thus, Windows NT running on PC-compatible hardware is being utilized in more and more non-desktop applications that in the past required specialized or proprietary operating systems and hardware.
Another important requirement for real-time software is robustness and reliability. While a programming error in a word processing program can lower the productivity of the user, an error in a real-time control program can mean costly downtime, damage to expensive equipment or even the loss of human life. Tools and protection mechanisms must be made available to the developer in order to minimize the occurrence of typical programming errors such as stray pointers, memory leaks and uninitialized variables, as well as errors in program logic. In the event of faulty code or a software crash, protection must be provided to minimize the impact of the crash on the critical processes being controlled.
As we have already discussed, determinism (the ability to meet deadlines predictably) is an important requirement for a real-time system, and a deterministic system can only be developed if events can be reliably predicted. This can only be achieved by giving the developer extensive control of the relative priorities of all operations and events. Windows NT restricts the developer's ability to control and predict operations in a number of areas:
Because the Windows NT priority spectrum places all interrupts at a higher level than normal thread execution, user-level threads are subject to being pre-empted by any interrupt source, regardless of its priority. This means that even the lowly mouse can generate an interrupt that pre-empts what to the developer is a high priority operation. Only kernel-level threads are allowed to raise or lower the interrupt request level (IRQL) to mask or unmask interrupts. In many real-time operating systems, thread priorities are interleaved with interrupt priorities, giving the developer total control, at the application level, of the relationship between interrupts and threads.
Windows NT provides a Deferred Procedure Call (DPC) mechanism to increase the responsiveness of the system to interrupts. A correctly designed interrupt service routine (ISR) minimizes interrupt latency by performing only critical processing in the ISR itself and queuing a DPC for later execution. These DPCs are placed into a single FIFO, with no provisions for the priority of the operation. This means that a low priority DPC queued earlier will execute first, regardless of the priority of DPCs queued behind it. Although you can cheat and place a DPC at the head of the line, this doesn't solve the problem because you may inadvertently be deferring the execution of an even higher priority operation that is already in the FIFO. Additionally, since DPCs are lower on the priority spectrum than all other types of interrupts, DPCs will not be able to execute while interrupts are active, even low priority device interrupts.
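The ordering problem can be illustrated with a small simulation. This is only a sketch of the queueing behavior described above, not Windows NT code: the DPC names and the numeric "importance" values are hypothetical, and the importance-ordered queue is shown purely for contrast with the FIFO.

```python
from collections import deque
import heapq

def run_fifo(dpcs):
    """Drain queued DPCs strictly in arrival order, ignoring importance."""
    queue = deque(dpcs)
    order = []
    while queue:
        name, _importance = queue.popleft()
        order.append(name)
    return order

def run_by_importance(dpcs):
    """Drain DPCs most-important first; ties keep arrival order."""
    heap = [(-importance, seq, name)
            for seq, (name, importance) in enumerate(dpcs)]
    heapq.heapify(heap)
    order = []
    while heap:
        _neg_importance, _seq, name = heapq.heappop(heap)
        order.append(name)
    return order

# (name, importance): a larger number means more urgent (assumption of this sketch).
queued = [("mouse", 1), ("disk", 5), ("motion_control", 9)]

fifo_order = run_fifo(queued)               # mouse runs before motion_control
priority_order = run_by_importance(queued)  # motion_control runs first

# "Cheating" by inserting at the head: the newcomer now delays
# motion_control, which was already waiting in the FIFO.
head_insert = deque(queued)
head_insert.appendleft(("network", 5))
head_order = run_fifo(head_insert)
```

With the FIFO, the most urgent routine runs last simply because it arrived last, and inserting at the head only moves the problem to whatever was already queued.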
In Windows NT, multiple requests for a synchronization object (such as a semaphore or a mutex) are queued in FIFO order without regard for the priority of the requesting thread. Thus a higher priority thread may have to wait for a lower priority thread to complete its operation before proceeding. This not only affects determinism, it may also lead to priority inversion.
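To make the cost concrete, the following sketch computes how long a given thread waits for a free synchronization object under FIFO ordering versus priority ordering. The thread names, priorities and hold times are invented for illustration, and the model ignores scheduling effects other than the order in which waiters are granted the object.

```python
def wait_time(waiters, target, policy):
    """Return how long `target` waits before it is granted the object.

    waiters: list of (name, priority, hold_ms) in arrival order;
             a larger priority number means more urgent (assumption).
    policy:  "fifo" grants in arrival order, "priority" grants the
             most urgent waiter first (ties keep arrival order).
    """
    if policy == "fifo":
        order = list(waiters)
    elif policy == "priority":
        order = sorted(waiters, key=lambda w: -w[1])  # sorted() is stable
    else:
        raise ValueError("unknown policy: " + policy)

    elapsed = 0
    for name, _priority, hold_ms in order:
        if name == target:
            return elapsed
        elapsed += hold_ms  # target waits while this thread holds the object
    raise KeyError(target)

# Hypothetical waiters: the control thread arrives last but is most urgent.
waiters = [("logger", 1, 4), ("ui", 2, 6), ("control", 9, 1)]

fifo_wait = wait_time(waiters, "control", "fifo")
priority_wait = wait_time(waiters, "control", "priority")
```

Under FIFO ordering the control thread's wait grows with the number and hold time of the lower priority threads ahead of it, which is exactly what makes its response time hard to bound.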
Solving the classic problem of priority inversion requires the ability to inherit the priority of another thread, which is not available in Windows NT. The problem can be described as follows: there must be at least three threads, A, B, and C, with A being the highest priority and C the lowest. Let's say that C has previously locked a resource, A is now waiting on that resource, but C is unable to complete its job because it is being pre-empted by B. Priority inversion has occurred because A has been effectively held off by a lower priority thread, B. Temporarily boosting C's priority to A's priority would remedy the problem.
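The scenario can be traced tick by tick. The sketch below is a simplified model, assuming a single processor, strict priority pre-emption, and made-up work amounts for B and C; it is not an implementation of any particular kernel's inheritance protocol.

```python
def ticks_until_a_gets_resource(inherit, b_work=5, c_critical=2):
    """Simulate threads A (high), B (medium), C (low) on one CPU.

    C already holds the resource and needs `c_critical` more ticks to
    release it. A is blocked on that resource for the whole simulation.
    B needs `b_work` ticks of CPU and never touches the resource.
    Returns the tick at which C releases the resource to A.
    """
    base = {"A": 3, "B": 2, "C": 1}  # larger number = higher priority
    remaining = {"B": b_work, "C": c_critical}
    tick = 0
    while remaining["C"] > 0:
        effective = dict(base)
        if inherit:
            # C temporarily runs at the priority of its waiter, A.
            effective["C"] = max(base["C"], base["A"])
        runnable = [t for t in ("B", "C") if remaining[t] > 0]
        running = max(runnable, key=lambda t: effective[t])
        remaining[running] -= 1
        tick += 1
    return tick

without_inheritance = ticks_until_a_gets_resource(inherit=False)
with_inheritance = ticks_until_a_gets_resource(inherit=True)
```

Without inheritance, A's delay depends on how much work B happens to have, even though B never uses the resource. With the temporary boost, A waits only for the remainder of C's critical section.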
- Restrict Windows NT to soft real-time applications. If
your application can handle occasional "hiccups" or delays,
you may be able to use standard Windows NT. Although the
actual window of predictability is up for debate, your
application should be able to handle timing variations in the
1-20 millisecond range with the realization that there are no
guarantees. The cost of missing deadlines should be
relatively low, and not result in a system failure or
unacceptable performance degradation.
- Create a finely tuned, highly constrained environment. By
paying careful attention to the system load, interaction with
other processes (via the network or locally), and in effect
"closing" the system, you can limit the amount of spurious or
unpredictable behavior. You may also need to write most of
your application in Windows NT's kernel mode, with the
majority of your code in device drivers. A Windows NT expert
who knows what's going on under the hood and knows where the
hidden dangers lie may be able to construct a finely tuned
system that meets your requirements, for now. However,
developing any substantial applications with such
restrictions neutralizes many of the benefits of an OS such
as Windows NT and will result in code that is difficult to
support and maintain.
- Provide a Win32 API wrapper around an RTOS. This approach
does not leverage Windows NT at all, but rather provides an
alternative API to an existing real-time operating system.
Standard Windows NT applications cannot be utilized with this
approach, limiting your options for future expandability.
Also, since the target system is not Windows NT, you are
forced to use specialized tools for compilation and
debugging.
- Couple a real-time operating system with Windows NT. In
most cases, this means running Windows NT and a real-time
operating system (RTOS) on separate systems. For this
approach, the Windows NT system is used only for the operator
interface and other non-real-time functions. The dedicated
RTOS system is used for the actual real-time control. This
scenario requires you to learn and maintain two separate
development environments and also increases the cost of the
total system by requiring at least two computers. Running the
two operating systems on the same system eliminates the extra
hardware costs, but still requires two separate development
environments.
- Modify the Windows NT kernel. Because Microsoft does not
license the source code to the Windows NT kernel to third
parties, this is an option that is only available to
Microsoft. Because of their focus on the broader OS market,
indications are that these types of modifications will not be
coming from them.
- Modify the Hardware Abstraction Layer (HAL). The HAL is a
layer of code between the Windows NT executive and the
hardware platform that hides hardware-dependent details such
as I/O interfaces and interrupt controllers. Microsoft
routinely grants HAL source code licenses to hardware vendors
who need to do special adaptations for their hardware to run
Windows NT. Microsoft has also granted HAL source code
licenses to various companies, including RadiSys, for
products that attempt to extend Windows NT with real-time
capabilities. Performing extensive modifications to the HAL,
such as manipulating the clock or rewriting the way in which
interrupts are processed, represents an unprecedented,
unproven use of the HAL, creates a non-standard environment,
and may pose serious maintainability challenges. Even more
importantly, because the "non-real-timeness" of Windows NT is
rooted in basic Windows NT kernel mechanisms, modifying the
HAL can only result in slightly improved soft real-time
performance. As long as the standard Windows NT executive is
used to schedule and process threads and interrupts, hard
real-time determinism is not possible.
- Complement the standard Windows NT kernel with a real-time kernel. Any solution that claims to bring hard real-time performance to Windows NT must provide an alternate kernel to handle real-time task scheduling and execution, running in conjunction with the standard Windows NT kernel. In fact, the three major solutions to real-time Windows NT available on the market today have taken this approach. Introducing such a kernel into the Windows NT environment, however, may actually decrease system reliability unless the kernel is at least as reliable as the Windows NT kernel. It is critical, therefore, that the real-time kernel be proven in real-life applications, with extensive testing that can only come through repeated usage over time. Other important considerations are memory and address space protection, as well as the ability to survive catastrophic system failures (Windows NT "blue screen" crashes). Finally, clean integration with the Windows NT environment, for example leveraging the same development tools and APIs where possible, is critical for the general usability of the solution.
At first glance, putting a real-time kernel inside a Windows NT interrupt service routine (ISR) or device driver is the most straightforward approach and the easiest to implement. However, with such an approach the user is forced to develop real-time applications in the Windows NT kernel mode (as opposed to user mode, the "normal" development mode). In the Windows NT kernel mode, code has privileged access to the entire memory space, including the Windows NT kernel and other device drivers, with no address isolation or memory protection offered. Thus, a real-time thread could easily overwrite the address space of another process, including other real-time processes. Because these types of programming errors are typically extremely difficult to detect and result in spurious but critical failures, achieving reliable operation often requires extensive testing and debugging, with many errors not detected until after a system has been deployed in the field. Writing a complex, multithreaded real-time application in this privileged mode is contrary to the programming model espoused by Windows NT.
Equally serious is the challenge of maintaining reliable operation of the real-time kernel in the event of a Windows NT blue screen crash. By definition, when Windows NT crashes something catastrophic has occurred such that Windows NT itself cannot recover. The integrity of all of Windows NT is in question, including interrupt handling, the operation of all device drivers and HAL services. Continued operation of a real-time kernel that is encapsulated within the Windows NT kernel space will be unreliable at best, and will likely lead to the crash of the real-time processes.
Through a unique combination of proven real-time technology and seamless integration with Windows NT, INtime makes it possible to extend Windows NT applications into the real-time world. INtime applications consist of non-real-time Windows NT processes and threads, and real-time processes and threads. Real-time processes typically handle time-critical I/O and control, while non-real-time processes handle the human interface, network communication and data storage.
INtime consists of:
- Real-Time Kernel
The real-time kernel, based on the proven iRMX operating system kernel, provides deterministic scheduling and execution of real-time threads. Real-time interrupts and active INtime threads immediately pre-empt the execution of any Windows NT threads and disable all non-real-time interrupts.
- RT API
Real-time threads access the capabilities of the real-time kernel via a Win32-extension real-time application programming interface (RT API). To develop real-time applications, you use standard Windows NT development tools, including Microsoft Visual C/C++ Developer Studio, "Wizard" extensions (for real-time processes), and a Windows NT-based real-time dynamic debugger.
- NTX Driver
The NTX driver is a Windows NT device driver that provides centralized support for the OSEM. The NTX driver facilitates communications between real-time kernel threads and Windows NT threads.
- NTX API
The NTX API extends the Win32 API to enable non-real-time threads to communicate and synchronize with real-time threads. Mechanisms such as semaphores, mailboxes and shared memory are provided.
- Patented OS encapsulation mechanism (OSEM)
The OSEM manages the simultaneous operation and integrity of the Windows NT kernel and the real-time kernel, and provides memory protection and address isolation between processes for added reliability and robustness.
- Modified Windows NT Hardware Abstraction Layer (HAL)
INtime includes a special version of the Windows NT HAL that improves the overall reliability and robustness of the system.
The INtime kernel supports 256 priority levels with round-robin scheduling supported within each level. Application threads and interrupt handlers share the same priority structure, allowing priority inter-mixing between handlers and application threads. The INtime kernel also supports a full complement of inter-process communication and synchronization mechanisms including data and object mailboxes, counting semaphores, access-controlled regions and timer management.
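A scheduler with this shape, fixed priority levels with round-robin rotation inside each level, can be sketched in a few lines. This is a generic illustration rather than the INtime kernel's implementation; the class and method names are hypothetical, and the sketch assumes that level 0 is the most urgent, which the text above does not specify.

```python
from collections import deque

NUM_LEVELS = 256  # one ready queue per priority level

class Scheduler:
    def __init__(self):
        self.ready = [deque() for _ in range(NUM_LEVELS)]

    def make_ready(self, name, level):
        """Add a thread or interrupt handler to its level's ready queue."""
        self.ready[level].append(name)

    def block(self, name, level):
        """Remove a thread that is waiting on something and cannot run."""
        self.ready[level].remove(name)

    def pick_next(self):
        """Return the next thread to run, or None if nothing is ready.

        Scans from the most urgent level (0, by assumption) downward and
        rotates the chosen queue so peers at the same level take turns.
        """
        for queue in self.ready:
            if queue:
                thread = queue.popleft()
                queue.append(thread)  # round-robin within the level
                return thread
        return None

sched = Scheduler()
sched.make_ready("ctrl_a", 128)
sched.make_ready("ctrl_b", 128)
sched.make_ready("irq_handler", 10)   # handlers share the same spectrum

first = sched.pick_next()             # the handler outranks both threads
sched.block("irq_handler", 10)        # handler finishes its work
rotation = [sched.pick_next() for _ in range(3)]
```

Because handlers and application threads live in the same set of levels, the developer decides which application threads outrank which interrupt sources simply by choosing level numbers.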
INtime provides a set of real-time application and device driver "wizards," integrated with the Microsoft Developer Studio, for faster development of real-time applications and device drivers. The wizards guide the developer through the design decisions required when developing a real-time application and generate the corresponding code fragments.
The NT Extensions (NTX) API enables non-real-time Win32 threads to communicate and synchronize with real-time threads. Win32 threads that utilize the NTX API may synchronize with a real-time thread (in other words, a thread that uses the RT API) through real-time semaphores. Communication may occur through real-time mailboxes as well as via a shared memory interface.
In a standard Windows NT configuration, the bulk of the OS runs in the confines of a single hardware task. Additional hardware tasks are normally only defined to handle catastrophic software induced failures such as stack faults and double faults where a safe, known environment is required from which to handle the failure. INtime transparently creates a hardware task for the real-time kernel and manages the switching and execution of the standard Windows NT hardware task and the real-time hardware task. This approach guarantees the integrity of both the Windows NT kernel and the real-time kernel, and enables the successful operation of real-time processes even in the event of a total Windows NT failure. It is this mechanism that adds a new level of fault tolerance to Windows NT. By putting critical processes under the control of INtime, these processes are guaranteed to continue operation through any failure of the Windows applications in the system or even a failure of Windows NT itself.
The OSEM encapsulates the entire Windows NT priority spectrum in the lowest INtime real-time priority level. This ensures that real-time threads and interrupts will always have priority over Windows NT threads and that the end system will operate deterministically, regardless of Windows NT activity.
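The effect of this encapsulation on scheduling decisions can be modeled as follows. The sketch is a conceptual illustration only: it assumes level 255 is the least urgent real-time level, that real-time threads use more urgent levels than that, and the function and thread names are hypothetical.

```python
LOWEST_RT_LEVEL = 255  # assumption: 0 is most urgent, 255 least urgent
WINDOWS_NT = "<Windows NT>"

def choose_next(rt_ready, nt_has_work):
    """Pick what runs next on the processor.

    rt_ready:    dict mapping each ready real-time thread to its level.
    nt_has_work: True if anything in Windows NT wants the CPU. All of
                 NT (threads of any NT priority, ISRs and DPCs) is
                 modeled as a single task at LOWEST_RT_LEVEL.
    """
    candidates = dict(rt_ready)
    if nt_has_work:
        candidates[WINDOWS_NT] = LOWEST_RT_LEVEL
    if not candidates:
        return None
    return min(candidates, key=candidates.get)

# Any ready real-time thread wins, however busy Windows NT is.
busy_case = choose_next({"control_loop": 200}, nt_has_work=True)
# Windows NT runs only when no real-time thread is ready.
idle_case = choose_next({}, nt_has_work=True)
```

In this model, nothing that happens inside Windows NT can change which real-time thread runs next, which is the property the determinism claim relies on.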
Because the OSEM provides a separate, protected environment for the real-time processes, INtime users are relieved of the burden of writing code in the Windows NT kernel mode space. The result is improved reliability and robustness, as well as simplified programming and debugging. For each real-time process created on top of the INtime kernel, a separate 32-bit protected memory segment is automatically created. This segment is separate and distinct from those used by Windows NT and provides address isolation and protection not only between real-time processes, but between real-time processes and non-real-time Windows NT code. This memory protection is provided automatically to the INtime developer, using standard Windows NT compilers (such as Microsoft Visual C/C++).
Finally, the OSEM provides a clean, well defined interface that minimizes interaction with Windows NT to a few key areas, resulting in improved product reliability and simplifying compatibility with future Windows NT releases.
- Trap attempts to modify the real time clock and time-of-day clock so that the real-time kernel can control the system time base and synchronization of the time-of-day clock
- Trap attempts to assign Windows NT interrupt handlers to interrupts reserved by the user for INtime real-time use
- Ensure that interrupts reserved for INtime real-time kernel use are never masked.
By utilizing proven real-time technology and seamless integration with Windows NT, RadiSys has made reliable real-time Windows NT a reality. Corporations will now be able to utilize industry standard Windows NT across the entire organization, from desktop and business applications to high-end embedded applications such as telecommunication and data communication equipment, and all the way down to the factory floor. These applications can take full advantage of Windows NT's standard user interface, network capabilities, development tools and off-the-shelf software and still deliver rock solid, reliable performance of real-time tasks.



