Understanding GPU Peer Access¶
A Graphics Processing Unit (GPU) has peer access to another GPU when the two GPUs can access each other's memory directly, without going through a shared pool of system memory attached to the Central Processing Unit (CPU). This typically means that the GPUs have a direct hardware connection to one another, such as NVLink or NVLink Switch.
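Whether a particular pair of GPUs has peer access can be queried through the CUDA runtime. The following is a minimal, standalone sketch (not part of the solver) that checks peer capability between devices 0 and 1 with cudaDeviceCanAccessPeer; the device IDs are placeholders for your own system.

```cpp
// Minimal sketch: ask the CUDA runtime whether two GPUs can access each
// other's memory directly. Device IDs 0 and 1 are placeholders.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    if (deviceCount < 2) {
        std::printf("Fewer than two GPUs visible; nothing to check.\n");
        return 0;
    }

    int canAccess01 = 0, canAccess10 = 0;
    cudaDeviceCanAccessPeer(&canAccess01, 0, 1);  // can GPU 0 access GPU 1's memory?
    cudaDeviceCanAccessPeer(&canAccess10, 1, 0);  // can GPU 1 access GPU 0's memory?

    std::printf("GPU 0 -> GPU 1 peer access: %s\n", canAccess01 ? "yes" : "no");
    std::printf("GPU 1 -> GPU 0 peer access: %s\n", canAccess10 ? "yes" : "no");
    return 0;
}
```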
The Base System¶
In the example system below, we have 4 GPUs connected to the motherboard via the PCIe bus. The black arrows indicate hardware connections over which data can flow.
If GPU 0 and GPU 1 need to exchange data, there are a few ways this can be accomplished. The exact method used at runtime depends on your system configuration.
When you do not have peer access, the data are first copied from GPU 0 to system memory and then from system memory to GPU 1. This is shown in the diagram below, with the two red arrows indicating the two copy operations. Note that the data also traverse the relatively slow PCIe bus and memory bus. This path is only supported when you have CUDA-aware MPI, and it is not supported on Microsoft Windows.
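To make the two-copy path concrete, here is a minimal CUDA sketch, independent of the solver and of MPI, in which the data are staged through a pinned host buffer. The buffer size and device IDs are illustrative placeholders.

```cpp
// Minimal sketch of the no-peer-access path: the buffer is copied from
// GPU 0 to pinned system memory, then from system memory to GPU 1
// (two copy operations in total).
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 1 << 20;  // 1 MiB example payload
    void *src = nullptr, *dst = nullptr, *host = nullptr;

    cudaSetDevice(0);
    cudaMalloc(&src, bytes);       // source buffer on GPU 0
    cudaSetDevice(1);
    cudaMalloc(&dst, bytes);       // destination buffer on GPU 1
    cudaMallocHost(&host, bytes);  // pinned staging buffer in system memory

    // Copy 1: GPU 0 -> system memory
    cudaSetDevice(0);
    cudaMemcpy(host, src, bytes, cudaMemcpyDeviceToHost);
    // Copy 2: system memory -> GPU 1
    cudaSetDevice(1);
    cudaMemcpy(dst, host, bytes, cudaMemcpyHostToDevice);

    cudaFreeHost(host);
    cudaFree(dst);
    cudaSetDevice(0);
    cudaFree(src);
    return 0;
}
```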
If your system provides peer access over PCIe, the data are copied only once. Note that the data still flow over the relatively slow PCIe interface. The diagram below shows this data flow for a system that does have peer access. Peer access over PCIe is not guaranteed and is system-dependent, and it is not supported on Microsoft Windows.
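A hedged sketch of the single-copy path follows. After peer access is enabled for the pair, one cudaMemcpyPeer call moves the data directly between the two GPUs over whatever link the system provides; sizes and device IDs are again placeholders.

```cpp
// Minimal sketch of the peer-access path: with peer access enabled,
// a single cudaMemcpyPeer call moves the data between the GPUs without
// a trip through system memory.
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 1 << 20;
    void *src = nullptr, *dst = nullptr;
    int canAccess = 0;

    cudaSetDevice(0);
    cudaMalloc(&src, bytes);
    cudaSetDevice(1);
    cudaMalloc(&dst, bytes);

    cudaDeviceCanAccessPeer(&canAccess, 0, 1);
    if (canAccess) {
        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);  // flags argument must be 0
        cudaSetDevice(1);
        cudaDeviceEnablePeerAccess(0, 0);

        // One copy, GPU 0 -> GPU 1
        cudaMemcpyPeer(dst, 1, src, 0, bytes);
    }

    cudaFree(dst);
    cudaSetDevice(0);
    cudaFree(src);
    return 0;
}
```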
Adding NVLink¶
What if we could connect GPUs directly to one another, completely separate from the PCIe bus, chipset, and system memory? Good news: that is exactly what NVLink and NVLink Switch do. This hardware guarantees peer access and provides a dedicated, high-speed data path between the GPUs. When building new systems that need to run on multiple GPUs, we always recommend using this hardware. In the diagram below, data are copied from GPU 0 to GPU 1 over the NVLink bridge.
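The same runtime calls are used whether the peer link is PCIe or NVLink; what changes is the bandwidth of the link. As a rough sketch, cudaDeviceGetP2PAttribute can report whether peer access is supported for a device pair and give a relative performance rank for the link (refer to the CUDA documentation for how the rank is defined on your platform).

```cpp
// Minimal sketch: query peer-to-peer attributes for the device pair (0, 1).
// The performance rank is only a relative value; consult the CUDA
// documentation for how to interpret it on a given platform.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int supported = 0, rank = 0;
    cudaDeviceGetP2PAttribute(&supported, cudaDevP2PAttrAccessSupported, 0, 1);
    cudaDeviceGetP2PAttribute(&rank, cudaDevP2PAttrPerformanceRank, 0, 1);
    std::printf("P2P 0 -> 1 supported: %d, relative performance rank: %d\n",
                supported, rank);
    return 0;
}
```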
There's just one problem: these NVLink bridges can only connect two GPUs. What if we want to run a simulation with all four GPUs?
If we have a system with two bridges connecting these four GPUs, GPU 0 and GPU 2 are not connected via NVLink, and you could end up without peer access between them.
In the example below, the system does have peer access, but data copies from GPU 0 to GPU 2 will be relatively slow because they go over PCIe rather than NVLink. This is not supported on Microsoft Windows.
No peer access between GPU 0 and GPU 2! In this case, communication between them must go through system memory, which is only supported when you have CUDA-aware MPI and is not supported on Microsoft Windows.
Adding NVLink Switch (NVSwitch)¶
To connect more than two GPUs together, you need to design your system with NVLink Switch hardware. Note that PCIe-based GPUs cannot be used in these systems; you need SXM-based GPUs, which have a special interface that does not plug into PCIe.
Peer access for all!
NVLink Switch also allows high-speed network adapters to be connected in order to build multi-node systems.
How do I determine if my system has peer access?¶
If you are using the Solver GUI, peer access is queried and reported in the log window at program startup.
Refer to Platform Diagnostics.
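If you prefer to check outside the solver, the following minimal sketch prints a pairwise peer-access report for all visible GPUs, similar in spirit to what is written to the log window at startup.

```cpp
// Minimal standalone sketch: print a pairwise peer-access report for all
// visible GPUs.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    std::printf("Found %d CUDA device(s)\n", n);

    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (i == j) continue;
            int can = 0;
            cudaDeviceCanAccessPeer(&can, i, j);
            std::printf("GPU %d -> GPU %d : %s\n", i, j,
                        can ? "peer access" : "no peer access");
        }
    }
    return 0;
}
```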
Notes for Microsoft Windows Users¶
For multi-GPU systems, you must set your GPUs to TCC mode, a special mode that puts the GPU in a compute-only state. See Set GPUs to TCC Mode; a sketch for checking the current driver mode programmatically appears after these notes.
NVLink is required for peer access.
CUDA-aware MPI is not available on Windows.
If a GPU is needed for display purposes, it must be a dedicated GPU set to WDDM mode, and that GPU cannot be used in multi-GPU runs.
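As a hedged convenience, the sketch below reports which driver model each visible GPU is using; cudaDeviceProp::tccDriver is 1 for a GPU running the TCC driver and 0 otherwise (WDDM on Windows).

```cpp
// Minimal sketch for Windows: report whether each GPU is running the TCC
// driver. cudaDeviceProp::tccDriver is 1 for TCC, 0 otherwise (WDDM).
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    for (int i = 0; i < n; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        std::printf("GPU %d (%s): %s mode\n", i, prop.name,
                    prop.tccDriver ? "TCC" : "WDDM");
    }
    return 0;
}
```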
Notes for Linux Users¶
More flexibility is available if you have CUDA-aware Open MPI; a minimal example appears after these notes.
Display processes can share the same GPUs used for compute, even in multi-GPU configurations (no TCC/WDDM limitation, as on Windows).
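As a minimal sketch, assuming a CUDA-aware build of Open MPI and one MPI rank per GPU, a device pointer can be handed directly to MPI send and receive calls, and the library selects the transfer path (peer copy over PCIe or NVLink, or staging through system memory). The rank-to-GPU mapping and message size below are illustrative placeholders.

```cpp
// Hedged sketch of CUDA-aware MPI on Linux: device pointers are passed
// directly to MPI_Send / MPI_Recv. Requires a CUDA-aware MPI build.
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) { MPI_Finalize(); return 0; }  // need at least two ranks

    cudaSetDevice(rank);          // assume one rank per GPU (placeholder mapping)
    const int count = 1 << 20;    // number of doubles to exchange
    double* buf = nullptr;
    cudaMalloc(&buf, count * sizeof(double));

    if (rank == 0) {
        // Device pointer handed straight to MPI_Send (CUDA-aware MPI only)
        MPI_Send(buf, count, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(buf, count, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    cudaFree(buf);
    MPI_Finalize();
    return 0;
}
```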
Further Reading¶
NVIDIA NVLink Bridge: https://www.nvidia.com/en-us/design-visualization/nvlink-bridges/
NVIDIA NVLink Switch: https://www.nvidia.com/en-us/data-center/nvlink/
NVIDIA NVLink Switch Generations: https://developer.nvidia.com/blog/upgrading-multi-gpu-interconnectivity-with-the-third-generation-nvidia-nvswitch/