Understanding GPU peer access

A Graphics Processing Unit (GPU) has peer access to another GPU when the two GPUs can access each other's memory without going through a shared pool of system memory attached to the Central Processing Unit (CPU). This typically means that the GPUs have a direct hardware connection to one another, such as NVLink or NVLink Switch.

The base system

In the example system below, we have 4 GPUs connected to a motherboard via the PCIe bus. The black arrows indicate hardware connections over which data can flow.

System diagram
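For readers who want to see this layout from software, the short CUDA runtime sketch below (an illustration only, not part of the product) lists each visible GPU together with the PCIe bus address it sits on; the device count and bus IDs reported will depend on your machine.

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        int count = 0;
        cudaGetDeviceCount(&count);                         // number of visible GPUs
        for (int dev = 0; dev < count; ++dev) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, dev);            // device name and capabilities
            char busId[32];
            cudaDeviceGetPCIBusId(busId, sizeof(busId), dev);  // PCIe address of this GPU
            std::printf("GPU %d: %s (PCIe %s)\n", dev, prop.name, busId);
        }
        return 0;
    }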

If GPU 0 and GPU 1 need to communicate data, there are a few ways this can be accomplished. The exact method used at runtime depends on your system configuration.

When you do not have peer access, the data is first copied from GPU 0 to system memory, and then copied from system memory to GPU 1. This is shown in the diagram below, with the two red arrows indicating the two copy operations that are required. Also note that the data traverses the relatively slow PCIe bus and memory bus. This path is only supported when you have CUDA-aware MPI, and is therefore not supported on Microsoft Windows.

System diagram
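As an illustration of this two-copy path, the CUDA runtime sketch below stages the transfer through a host buffer. It is a minimal sketch only (the buffer pointers and size are assumed to be set up by the caller); in practice the solver and its MPI layer perform these copies for you.

    #include <cstddef>
    #include <cuda_runtime.h>

    // Move 'bytes' of data from a buffer on GPU 0 to a buffer on GPU 1 without
    // peer access: two copies, staged through a buffer in system memory.
    void copyViaHost(void* dstOnGpu1, const void* srcOnGpu0, size_t bytes) {
        void* staging = nullptr;
        cudaMallocHost(&staging, bytes);                                // pinned system memory

        cudaSetDevice(0);
        cudaMemcpy(staging, srcOnGpu0, bytes, cudaMemcpyDeviceToHost);  // copy 1: GPU 0 -> host

        cudaSetDevice(1);
        cudaMemcpy(dstOnGpu1, staging, bytes, cudaMemcpyHostToDevice);  // copy 2: host -> GPU 1

        cudaFreeHost(staging);
    }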

If your system provides peer access over PCIe, the data is copied only once. Note that the data still flows over the relatively slow PCIe interface. The diagram below shows this data flow on a system that does have peer access. Peer access over PCIe is not guaranteed and is system dependent, and it is not supported on Microsoft Windows.

System diagram
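For comparison, here is a minimal CUDA runtime sketch of the single-copy path. It assumes two GPUs (indices 0 and 1) that report peer access to each other; it illustrates the mechanism and is not the solver's implementation.

    #include <cstddef>
    #include <cuda_runtime.h>

    // Move 'bytes' of data directly from a buffer on GPU 0 to a buffer on GPU 1
    // using peer access: a single copy, with no staging through system memory.
    void copyPeerToPeer(void* dstOnGpu1, const void* srcOnGpu0, size_t bytes) {
        int canAccess01 = 0, canAccess10 = 0;
        cudaDeviceCanAccessPeer(&canAccess01, 0, 1);   // can GPU 0 access GPU 1's memory?
        cudaDeviceCanAccessPeer(&canAccess10, 1, 0);   // and the reverse?
        if (!canAccess01 || !canAccess10) return;      // no peer access: stage via host instead

        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);              // let GPU 0 access GPU 1's memory
        cudaSetDevice(1);
        cudaDeviceEnablePeerAccess(0, 0);              // let GPU 1 access GPU 0's memory

        cudaMemcpyPeer(dstOnGpu1, 1, srcOnGpu0, 0, bytes);  // one direct GPU-to-GPU copy
    }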

How do I determine if my system has peer access?

  • If you are using the Solver GUI, peer access is queried and reported in the log window at program startup.

  • Refer to Platform Diagnostics. A programmatic check of peer access is also sketched after this list.
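The sketch below shows one way to query this from code with the CUDA runtime; it is an illustration only and is not part of the solver. It prints, for every ordered pair of GPUs, whether peer access is reported as available.

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        int count = 0;
        cudaGetDeviceCount(&count);
        for (int a = 0; a < count; ++a) {
            for (int b = 0; b < count; ++b) {
                if (a == b) continue;
                int canAccess = 0;
                cudaDeviceCanAccessPeer(&canAccess, a, b);  // can GPU a access GPU b's memory?
                std::printf("GPU %d -> GPU %d: peer access %s\n",
                            a, b, canAccess ? "available" : "not available");
            }
        }
        return 0;
    }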

Notes for Microsoft Windows users

  • For multi-GPU systems, you must set your GPUs to TCC mode. This is a special mode that puts the GPU in a compute-only configuration. See Set GPUs to TCC Mode.

  • NVLink is required for peer access

  • CUDA-aware MPI is not available on Windows

  • A dedicated GPU is required for display purposes and must be set to WDDM mode. This GPU cannot be used in multi-GPU runs.

Notes for Linux users

  • More flexibility is available if you have CUDA-aware OpenMPI

  • Display processes can share the same GPUs used for compute, even in multi-GPU configurations (no TCC/WDDM limitation as on Windows)

Further reading