Design Article

I/O Virtualization (IOV) & its uses in the Network Infrastructure: Part 2

Nabil Damouny & Rolf Neugebauer

6/2/2009 5:35 PM EDT

Some Hypervisors, including Xen and VMware ESX Server (both discussed briefly in Part 1 of this series), allow the direct assignment of PCI devices to Guest VMs. This is a relatively small extension to the techniques needed to run device drivers inside the Management VM. Assigning PCI devices directly to Guest VMs eliminates the remaining overhead and added latencies of the MQ NIC IO virtualization approach.

Figure 3 below shows the common architecture hypervisors use to support PCI device assignment: the hypervisor provides mechanisms to directly access a PCI device's hardware resources, and the Management VM needs to provide a way for Guest VMs to discover the PCI devices assigned to them and their associated resources.

Device discovery by a Guest VM is typically achieved by providing a virtual PCI bus. The Management VM normally owns the physical PCI buses and enumerates all physical devices attached to them. If a PCI device is assigned to a Guest VM, it is enumerated on a virtual PCI (vPCI) bus exported to that Guest VM.

This allows the guest to access the PCI configuration space of the device assigned to it. Importantly, all PCI configuration space accesses by a Guest VM are transferred to the Management VM, which can pass them through to the device, intercept and emulate them, or discard them. This also allows the Management VM to enable or configure the hardware resources the Guest VM requires to use the device.
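The three choices open to the Management VM can be sketched as a small policy function. This is only an illustration: the register offsets are those of the standard PCI type 0 configuration header, but the `Action` enum, the `classify_write` function, and the policy itself (which registers are passed through, emulated, or discarded) are assumptions made for this example, not the behavior of Xen or any other hypervisor.

```python
from enum import Enum


class Action(Enum):
    PASS_THROUGH = "pass_through"  # forward the write to the real device
    EMULATE = "emulate"            # handle the write in software
    DISCARD = "discard"            # silently drop the write


# Offsets in the standard PCI type 0 configuration header.
COMMAND = 0x04                 # Command register
BAR_START, BAR_END = 0x10, 0x28  # six 32-bit BARs occupy 0x10..0x27
INTERRUPT_LINE = 0x3C          # Interrupt Line register


def classify_write(offset: int) -> Action:
    """Hypothetical policy for a guest write to configuration space."""
    if BAR_START <= offset < BAR_END:
        # Assumed: the guest must not relocate real device memory.
        return Action.EMULATE
    if offset == INTERRUPT_LINE:
        # Assumed: interrupt routing is virtualized, so keep a shadow value.
        return Action.EMULATE
    if offset == COMMAND:
        # Assumed: the guest may enable or disable the device itself.
        return Action.PASS_THROUGH
    # Assumed: everything else is dropped in this toy policy.
    return Action.DISCARD
```

A real Management VM would use a much finer-grained table, typically per register and per bit, but the decision structure is the same.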

Figure 3: PCI device assignment. Guest VMs can directly access hardware devices, eliminating all IO virtualization overheads.

There are three different types of hardware resources a Guest VM must have access to in order to run a device driver for a physical device: device memory, device IO ports, and device interrupts.

The first two, device memory and IO ports, are described in the device's PCI configuration space as Base Address Registers (BARs). In order for a Guest VM to access device memory, the Management VM instructs the Hypervisor that a given Guest VM is allowed to map the physical addresses at which the device memory is located into its virtual address space.
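As a rough illustration of how a BAR distinguishes device memory from IO ports, the sketch below decodes a raw BAR value. The bit layout follows the PCI specification (bit 0 selects IO space; for memory BARs, bits 2:1 give the type and bit 3 marks prefetchable memory); the function name `decode_bar` and the returned dictionary are assumptions made for this example.

```python
def decode_bar(low: int, high: int = 0) -> dict:
    """Decode a raw BAR value read from PCI configuration space.

    `low` is the 32-bit BAR; `high` is the following BAR, which holds the
    upper address bits when `low` describes a 64-bit memory region.
    """
    if low & 0x1:
        # Bit 0 set: IO port region; bits 1:0 are not part of the address.
        return {"kind": "io", "base": low & 0xFFFFFFFC}

    bar_type = (low >> 1) & 0x3   # 0b00 = 32-bit, 0b10 = 64-bit
    is_64bit = bar_type == 0x2
    base = low & 0xFFFFFFF0       # bits 3:0 are flags, not address bits
    if is_64bit:
        base |= high << 32
    return {
        "kind": "memory",
        "is_64bit": is_64bit,
        "prefetchable": bool(low & 0x8),
        "base": base,
    }
```

For example, `decode_bar(0x0000C001)` describes an IO port region starting at port `0xC000`, while `decode_bar(0xFEB00000)` describes a 32-bit, non-prefetchable memory region at physical address `0xFEB00000`. It is this physical address range that the Hypervisor permits the Guest VM to map.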

The hypervisor can use the memory protection provided by the CPU's MMU (Memory Management Unit) to ensure that a Guest VM accesses only the device memory belonging to the assigned device. Access to IO ports can be restricted in a similar way on x86 processors, using the IO permission bitmap in the Task State Segment (TSS).
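On x86, the port restriction relies on an IO permission bitmap held in the TSS: one bit per port, where a clear bit allows access and a set bit causes a fault. The following is a minimal software model of that bitmap, assuming the hypervisor starts with every port denied and then opens only the range described by the device's IO BAR. The helper names are hypothetical, and the model checks a single port at a time, whereas real hardware checks every port covered by a multi-byte access.

```python
NUM_PORTS = 65536  # size of the x86 IO port address space


def make_bitmap() -> bytearray:
    """Return a bitmap with every port denied (all bits set)."""
    return bytearray([0xFF] * (NUM_PORTS // 8))


def allow_ports(bitmap: bytearray, first: int, count: int) -> None:
    """Clear the bits for ports first..first+count-1, granting access."""
    for port in range(first, first + count):
        bitmap[port // 8] &= ~(1 << (port % 8)) & 0xFF


def is_allowed(bitmap: bytearray, port: int) -> bool:
    """A clear bit means the guest may access the port directly."""
    return ((bitmap[port // 8] >> (port % 8)) & 1) == 0
```

With `allow_ports(bitmap, 0xC000, 32)`, a guest could touch ports `0xC000` through `0xC01F` directly, while an access to any other port would trap to the Hypervisor.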

Physical interrupts originating from a device need to be handled by the Hypervisor, as interrupts are only delivered to the most privileged software entity. The Hypervisor then virtualizes the physical interrupts and delivers them to the Guest VMs.

In order to reduce interrupt latencies it is important that physical interrupts are delivered to the same CPU core that the destination Guest VM is using to handle the resulting virtual interrupt.
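One way to picture this is a router that retargets a device's physical interrupt whenever the owning guest is scheduled onto a different core. The sketch below is a toy model only; the class `IrqRouter` and its methods are invented for illustration and do not correspond to any particular hypervisor's interface.

```python
class IrqRouter:
    """Toy model: keep each assigned device's IRQ on its guest's core."""

    def __init__(self):
        self.irq_to_guest = {}   # physical IRQ number -> owning guest
        self.guest_to_cpu = {}   # guest -> core it currently runs on
        self.irq_affinity = {}   # physical IRQ number -> target core

    def bind(self, irq: int, guest: str) -> None:
        """Record that `irq` belongs to a device assigned to `guest`."""
        self.irq_to_guest[irq] = guest
        self._retarget(irq)

    def schedule(self, guest: str, cpu: int) -> None:
        """Note that `guest` now runs on `cpu` and move its IRQs along."""
        self.guest_to_cpu[guest] = cpu
        for irq, owner in self.irq_to_guest.items():
            if owner == guest:
                self._retarget(irq)

    def _retarget(self, irq: int) -> None:
        cpu = self.guest_to_cpu.get(self.irq_to_guest[irq])
        if cpu is not None:
            self.irq_affinity[irq] = cpu
```

When the interrupt already arrives on the guest's core, the Hypervisor can inject the virtual interrupt locally instead of signalling another core first.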

In the MQ section above we argued that descriptors need to be passed through the Management VM to prevent breaches of VM isolation due to rogue DMA setups. This is not required for PCI device assignment, since modern chipsets include IO MMUs, such as Intel's VT-d, which can be set up by the Hypervisor to allow a device to access only certain pages of host memory.

This is achieved by setting up a page table mapping in the IO MMU to map host memory into a device's DMA address space. On memory read and write requests from a PCI device to host memory, the chipset selects an IO MMU page table based on the Requester ID used by the PCI device.

Thus, when a device is assigned to a Guest VM, the Hypervisor sets up the IO MMU page tables for that device to map only the memory belonging to that Guest VM. This prevents a Guest VM from intentionally or accidentally accessing other VMs' memory areas via the device's DMA engines.
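The lookup described above can be modelled in a few lines. In this simplified sketch the Requester ID is the usual PCI bus/device/function triple packed into 16 bits, and each Requester ID selects a flat, single-level table of 4 KB pages. Real IO MMUs such as VT-d use multi-level tables and additional context structures, and the class and method names here are assumptions made for illustration.

```python
PAGE_SHIFT = 12                      # 4 KB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


def requester_id(bus: int, device: int, function: int) -> int:
    """Pack bus (8 bits), device (5 bits), function (3 bits) into 16 bits."""
    return (bus << 8) | (device << 3) | function


class ToyIommu:
    """Single-level model: one page table per Requester ID."""

    def __init__(self):
        self.tables = {}  # requester ID -> {DMA page number: host page number}

    def assign(self, rid: int, mappings: dict) -> None:
        """Install the page table for a device (done by the Hypervisor)."""
        self.tables[rid] = dict(mappings)

    def translate(self, rid: int, dma_addr: int) -> int:
        """Translate a device DMA address, or fault if it is not mapped."""
        table = self.tables.get(rid)
        if table is None:
            raise PermissionError("no page table for this Requester ID")
        page = dma_addr >> PAGE_SHIFT
        if page not in table:
            raise PermissionError("DMA address not mapped for this device")
        return (table[page] << PAGE_SHIFT) | (dma_addr & PAGE_MASK)
```

Because the table for a device contains only pages owned by the Guest VM it is assigned to, a DMA request to any other address fails translation instead of reaching another VM's memory.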

Of all the IO virtualization options, direct PCI device assignment has the lowest overhead and the least added latencies. The Management VM is not involved in the data path; it just provides infrequent access to the device's PCI configuration space.

The Hypervisor itself is only involved in the virtualization of device interrupts, which can be achieved with relatively low overhead, especially if physical interrupts are delivered to the same CPU cores on which the recipient Guest VM is executing.

However, it is clearly infeasible to have a separate PCI device for every Guest VM in a system, even if multifunction devices were used. The PCI-SIG introduced SR-IOV to address this issue.

