Understanding GPU peer access¶
A Graphics Processing Unit (GPU) has peer access with another GPU when the two GPUs can access each other's memory without going through a shared pool of system memory attached to the Central Processing Unit (CPU). This typically means that the GPUs have a direct hardware connection with one another using NVLink or NVLink Switch.
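Peer access can also be queried programmatically with the CUDA runtime. The sketch below is a minimal example, assuming at least two CUDA-capable GPUs and the CUDA toolkit; the device indices are illustrative.

```cpp
// Minimal sketch: query peer access between GPU 0 and GPU 1 with the CUDA runtime API.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int canAccess01 = 0, canAccess10 = 0;

    // Peer access is reported per direction, so query both directions.
    cudaDeviceCanAccessPeer(&canAccess01, /*device=*/0, /*peerDevice=*/1);
    cudaDeviceCanAccessPeer(&canAccess10, /*device=*/1, /*peerDevice=*/0);

    std::printf("GPU 0 -> GPU 1 peer access: %s\n", canAccess01 ? "yes" : "no");
    std::printf("GPU 1 -> GPU 0 peer access: %s\n", canAccess10 ? "yes" : "no");
    return 0;
}
```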
The base system¶
In the example system below we have 4 GPUs connected to a motherboard via the PCIe bus. The black arrows indicate hardware connections over which data can flow.
If GPU 0 and GPU 1 need to exchange data, there are a few ways this can be accomplished. The exact method used at runtime depends on your system configuration.
When you do not have peer access, the data is first copied from GPU 0 to system memory, and then copied from system memory to GPU 1. This is shown in the diagram below, where the two red arrows indicate the two data copy operations. Also note that the data traverses the relatively slow PCIe bus and memory bus. This path is only supported when you have CUDA-aware MPI, and it is not supported on Microsoft Windows.
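For illustration, the sketch below shows what this two-copy path looks like with the CUDA runtime API: the data is staged through a pinned buffer in system memory. The buffer names and the 64 MiB payload size are arbitrary.

```cpp
// Minimal sketch of the no-peer-access path: GPU 0 -> system memory -> GPU 1.
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 64 << 20;      // 64 MiB payload (arbitrary)
    void *src = nullptr, *dst = nullptr, *staging = nullptr;

    cudaSetDevice(0);
    cudaMalloc(&src, bytes);            // source buffer on GPU 0
    cudaSetDevice(1);
    cudaMalloc(&dst, bytes);            // destination buffer on GPU 1
    cudaMallocHost(&staging, bytes);    // pinned staging buffer in system memory

    // Copy 1: GPU 0 -> system memory (over PCIe and the memory bus).
    cudaSetDevice(0);
    cudaMemcpy(staging, src, bytes, cudaMemcpyDeviceToHost);

    // Copy 2: system memory -> GPU 1 (again over PCIe).
    cudaSetDevice(1);
    cudaMemcpy(dst, staging, bytes, cudaMemcpyHostToDevice);

    cudaFreeHost(staging);
    cudaFree(dst);
    cudaSetDevice(0);
    cudaFree(src);
    return 0;
}
```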
If your system provides peer access over PCIe, then the data is only copied once, although it still flows over the relatively slow PCIe interface. The diagram below shows this data flow on a system that does have peer access. Peer access over PCIe is not guaranteed and is system dependent, and it is not supported on Microsoft Windows.
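A minimal sketch of the single-copy path, assuming cudaDeviceCanAccessPeer reports access between GPU 0 and GPU 1 (buffer names and sizes are again arbitrary):

```cpp
// Minimal sketch of a direct peer-to-peer copy between GPU 0 and GPU 1.
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 64 << 20;
    void *src = nullptr, *dst = nullptr;

    cudaSetDevice(0);
    cudaMalloc(&src, bytes);
    cudaDeviceEnablePeerAccess(1, 0);   // let GPU 0 reach GPU 1's memory

    cudaSetDevice(1);
    cudaMalloc(&dst, bytes);
    cudaDeviceEnablePeerAccess(0, 0);   // and the reverse direction

    // One copy, GPU 0 -> GPU 1, without bouncing through system memory.
    // Over plain PCIe this is still limited by PCIe bandwidth.
    cudaMemcpyPeer(dst, /*dstDevice=*/1, src, /*srcDevice=*/0, bytes);

    cudaFree(dst);
    cudaSetDevice(0);
    cudaFree(src);
    return 0;
}
```

Note that cudaMemcpyPeer still works when peer access is unavailable; the driver then falls back to staging through system memory, as in the previous sketch.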
Adding NVLink¶
What if we were able to directly connect GPUs to one another, completely separate from the PCIe bus, chipset, and system memory? Well, good news, that is exactly what NVLink and NVLink Switch do! This hardware guarantees peer access and provides a dedicated high-speed data pipe between the GPUs. When building new systems that need to execute on multiple GPUs, we always recommend using this hardware. In the diagram below, data is copied from GPU 0 to GPU 1 over the NVLink bridge.
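To see the benefit of the dedicated link, you can time a peer copy with CUDA events. The sketch below is a rough bandwidth probe, assuming peer access is available between GPU 0 and GPU 1; the 256 MiB payload size is arbitrary.

```cpp
// Minimal sketch: time a GPU 0 -> GPU 1 peer copy and report the bandwidth.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 256 << 20;     // 256 MiB payload (arbitrary)
    void *src = nullptr, *dst = nullptr;

    cudaSetDevice(0);
    cudaMalloc(&src, bytes);
    cudaDeviceEnablePeerAccess(1, 0);
    cudaSetDevice(1);
    cudaMalloc(&dst, bytes);
    cudaDeviceEnablePeerAccess(0, 0);

    // Time the copy on GPU 0's default stream.
    cudaSetDevice(0);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    cudaMemcpyPeerAsync(dst, /*dstDevice=*/1, src, /*srcDevice=*/0, bytes, 0);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    std::printf("GPU 0 -> GPU 1: %.1f GB/s\n", (bytes / 1e9) / (ms / 1e3));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dst);  // unified addressing allows freeing either buffer from here
    cudaFree(src);
    return 0;
}
```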
There is just one problem: these NVLink bridges can only connect 2 GPUs. What if we want to run a simulation with all 4 GPUs?
If we have a system with 2 bridges connecting these 4 GPUs, note that GPU 0 and GPU 2 are not connected via NVLink, so you could end up without peer access between those GPUs.
In the example below, the system does have peer access, but data copies from GPU 0 to GPU 2 will be relatively slow. This is not supported on Microsoft Windows.
No peer access between GPU 0 and GPU 2! This configuration is only supported when you have CUDA-aware MPI, and it is not supported on Microsoft Windows.
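When the topology is uneven like this, one common pattern is to pick the copy path per GPU pair at runtime. The sketch below uses an illustrative helper (copy_between_gpus is not a library call): it performs a direct peer copy when the pair reports peer access and otherwise stages the data through pinned system memory.

```cpp
// Minimal sketch: choose the copy path per GPU pair at runtime.
#include <cuda_runtime.h>

void copy_between_gpus(void* dst, int dstDev, const void* src, int srcDev, size_t bytes) {
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, srcDev, dstDev);

    if (canAccess) {
        // Direct path: the driver can use P2P over PCIe or NVLink when available.
        cudaMemcpyPeer(dst, dstDev, src, srcDev, bytes);
    } else {
        // Fallback: two copies through a pinned staging buffer in system memory.
        // (Simplified: changes the current device as a side effect.)
        void* staging = nullptr;
        cudaMallocHost(&staging, bytes);
        cudaSetDevice(srcDev);
        cudaMemcpy(staging, src, bytes, cudaMemcpyDeviceToHost);
        cudaSetDevice(dstDev);
        cudaMemcpy(dst, staging, bytes, cudaMemcpyHostToDevice);
        cudaFreeHost(staging);
    }
}
```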
Adding NVLink Switch (NVSwitch)¶
To connect more than 2 GPUs together, you need to design your system with NVLink Switch hardware. Note that PCIe-based GPUs can no longer be used in these systems; you need SXM-based GPUs, which have a special interface that does not plug into PCIe.
Peer access for all!
NVLink Switch also allows connecting high-speed network adapters in order to build multi-node systems.
How do I determine if my system has peer access?¶
If you are using the Solver GUI, peer access is queried and reported in the log window at program startup.
Refer to Platform Diagnostics
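If you prefer to check from your own code, the sketch below prints a peer-access matrix for every GPU pair, similar in spirit to what the Solver GUI reports at startup (the output format is illustrative):

```cpp
// Minimal sketch: print a pairwise peer-access matrix for all GPUs in the system.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);

    std::printf("Peer access matrix (%d GPUs):\n      ", count);
    for (int j = 0; j < count; ++j) std::printf("GPU%d ", j);
    std::printf("\n");

    for (int i = 0; i < count; ++i) {
        std::printf("GPU%d ", i);
        for (int j = 0; j < count; ++j) {
            int can = 0;
            if (i != j) cudaDeviceCanAccessPeer(&can, i, j);
            std::printf("   %s ", (i == j) ? "-" : (can ? "Y" : "N"));
        }
        std::printf("\n");
    }
    return 0;
}
```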
Notes for Microsoft Windows users¶
For multi-GPU systems, you must set your GPUs to use TCC mode. This is a special mode that puts the GPU in a compute-only state. See Set GPUs to TCC Mode
NVLink is required for peer access
CUDA-aware MPI is not available
A dedicated GPU is required for display purposes and must be set to WDDM mode. This GPU cannot be used in multi-GPU runs.
Notes for Linux users¶
More flexibility is available if you have CUDA-aware OpenMPI
Display processes can share the same GPUs used for compute, even in multi-GPU configurations (no TCC/WDDM limitation as on Windows)
Further reading¶
NVIDIA NVLink Bridge: https://www.nvidia.com/en-us/design-visualization/nvlink-bridges/
NVIDIA NVLink Switch: https://www.nvidia.com/en-us/data-center/nvlink/
NVIDIA NVLink Switch generations: https://developer.nvidia.com/blog/upgrading-multi-gpu-interconnectivity-with-the-third-generation-nvidia-nvswitch/