Linux (Multi Node)

M-Star CFD supports NCCL as of version 3.9.54, which greatly improves the reliability and compatibility of the solver in multi-node environments. Installation requires NVIDIA HPC-X and nv_peer_mem.

Setup

Install the latest versions of NVIDIA HPC-X and the NVIDIA CUDA driver; see the NVIDIA HPC-X documentation.

Software Prerequisites

  • CUDA version 11.1+

  • NVIDIA HPC-X 2.13.1+ (see the NVIDIA HPC-X documentation)

  • nv_peer_mem (part of the NVIDIA HPC-X installation)
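The minimum versions listed above can be checked with a small comparison helper. The following is a minimal sketch, not part of M-Star CFD: `version_ge` is a hypothetical function name, the `cuda_version` value is a placeholder for whatever `nvidia-smi` reports on your system, and the helper assumes GNU `sort` with the `-V` (version sort) option.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 is greater than or equal to $2.
# Assumes GNU sort, which provides -V (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Placeholder value for illustration; read the real one from nvidia-smi.
cuda_version="12.2"

if version_ge "$cuda_version" "11.1"; then
    echo "CUDA $cuda_version meets the 11.1+ requirement"
else
    echo "CUDA $cuda_version is older than 11.1"
fi
```

The same helper applies to the HPC-X minimum, for example `version_ge "$hpcx_version" "2.13.1"`, where `hpcx_version` is again a value you supply.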

Operating systems supported

  • Red Hat compatible 7+

  • Ubuntu compatible 20+

Hardware Prerequisites

See the NVIDIA HPC-X documentation.

  • Full GPU peer-to-peer within each node

  • Node-to-node communication requires NVSWITCH/NVLINK along with a compatible HCA for optimal GPU communication.

Notional Hardware Topology Diagram

[Figure: notional hardware topology diagram (linux-multi-node.png)]

Note that the HCA connects directly to the NVSWITCH, so data is transferred directly to and from the GPU instead of being transmitted to the CPU first.

Accessing the Software

To access M-Star CFD:

  1. Go to M-Star Downloads.

  2. Choose the solver-only package that indicates the CUDA version you have. For example: mstarcfd-solver-3.9.54-oracle7-cuda12.tar.gz.

Important

Package file names that are prefixed with mstarcfd-solver and are version 3.9.54 or later are NCCL-enabled and are always preferred for multi-node environments.
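To confirm that a downloaded file is a solver-only package for the right CUDA version, its name can be split into fields. This is a sketch only: `parse_pkg` is a hypothetical helper, and the naming pattern (version, platform tag, CUDA major version) is an assumption inferred from the single example file name above, so other packages may not follow it.

```shell
#!/bin/sh
# Hypothetical helper: print "<version> <platform> <cuda-major>" for a
# solver-only package name, or nothing if the name does not match.
# The pattern is assumed from the one example file name in this guide.
parse_pkg() {
    printf '%s\n' "$1" | sed -n -E \
        's/^mstarcfd-solver-([0-9]+\.[0-9]+\.[0-9]+)-([a-z0-9]+)-cuda([0-9]+)\.tar\.gz$/\1 \2 \3/p'
}

parse_pkg "mstarcfd-solver-3.9.54-oracle7-cuda12.tar.gz"
# prints: 3.9.54 oracle7 12
```

An empty result means the file name is not a solver-only package in the assumed format.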

Installation

Instructions below use “3.9.54” as the example version. Replace this with the actual version you downloaded.

  1. Extract the files from the tarball.

mkdir -p /opt/mstar/3.9.54
cd /opt/mstar/3.9.54
tar -xzf /tmp/mstarcfd-solver-3.9.54-oracle7-cuda12.tar.gz
  2. Create an environment file or module file.

  • If using environment files, paste the text below into /opt/mstar/3.9.54/mstar.sh

export PATH=/opt/mstar/3.9.54/bin:$PATH
export LD_LIBRARY_PATH=/opt/mstar/3.9.54/lib:$LD_LIBRARY_PATH
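  • If using module files, paste the text below into a new module file /PATH-TO-MODULEFILES/mstar/3.9.54. The content is a Tcl modulefile, so do not give the file a .lua extension if your site uses Lmod, which reads .lua files as Lua modulefiles.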

#%Module1.0###############
##
## M-Star CFD module
##

proc ModulesHelp { } {
puts stderr "This module adds M-Star CFD 3.9.54 to your path"
}

module-whatis "This module adds M-Star CFD 3.9.54 to your path\n"

module load hpcx

set             basedir            /opt/mstar/3.9.54
prepend-path    PATH               $basedir/bin
prepend-path    LD_LIBRARY_PATH    $basedir/lib
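After sourcing the environment file or loading the module, it is worth confirming that the paths took effect before submitting a job. The sketch below assumes the example install location /opt/mstar/3.9.54; `path_contains` is a hypothetical helper, and `mstar-cfd-mgpu` is the solver executable used in the run command later in this guide.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when directory $1 is an entry in the
# colon-separated list $2 (for example "$PATH" or "$LD_LIBRARY_PATH").
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumed install prefix from the example above; adjust to your version.
basedir="/opt/mstar/3.9.54"

if path_contains "$basedir/bin" "$PATH"; then
    echo "$basedir/bin is on PATH"
else
    echo "$basedir/bin is not on PATH; source mstar.sh or load the module first"
fi

# Confirm the shell can actually resolve the solver executable.
if command -v mstar-cfd-mgpu >/dev/null 2>&1; then
    echo "mstar-cfd-mgpu found"
else
    echo "mstar-cfd-mgpu not found"
fi
```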

System Verification

For additional information on validating your NVIDIA HPC-X installation, see the NVIDIA HPC-X documentation.

Check the NVIDIA driver:

# CUDA version and GPU name should be displayed
nvidia-smi

Check MLNX_OFED InfiniBand:

# run as root
hca_self_test.ofed

Check the nv_peer_mem kernel module:

# verify that this indicates status is good
service nv_peer_mem status

# verify this shows module is loaded
lsmod | grep nv_peer_mem
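The `lsmod | grep` check above also matches any module whose name merely contains nv_peer_mem. For use in a script, a stricter test can compare the module name column exactly. This is a minimal sketch; `has_module` is a hypothetical helper that reads `lsmod` output on standard input.

```shell
#!/bin/sh
# Hypothetical helper: reads lsmod output on stdin and succeeds only when
# a module named exactly $1 is listed (the header line is skipped).
has_module() {
    awk -v m="$1" 'NR > 1 && $1 == m { found = 1 } END { exit !found }'
}

if command -v lsmod >/dev/null 2>&1; then
    if lsmod | has_module nv_peer_mem; then
        echo "nv_peer_mem is loaded"
    else
        echo "nv_peer_mem is not loaded"
    fi
else
    echo "lsmod not available on this system"
fi
```

Because the helper returns an exit status, it can gate a job script, for example by exiting early on each node when the module is missing.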

ClusterKit is a utility provided with HPC-X for more thorough HPC validation and testing; see the ClusterKit documentation. We encourage you to work through these tests.

Running M-Star

Submit a job to your HPC cluster that requests two nodes with four GPUs each. Be sure to adjust the environment loading for your specific system. The snippet below shows how one might execute M-Star in a job submission script:

# load HPC-X environment
module use $HPCX_HOME/modulefiles
module load hpcx

# load mstar environment
source /path/to/mstar.sh

# mpirun        Invoke Open MPI mpirun
# -x VAR        Export the named environment variable to all ranks
# -np 8         Run on a total of 8 GPUs (2 nodes x 4 GPUs each)
# --gpu-auto    Automatically select GPUs on each node
mpirun -x PATH -x LD_LIBRARY_PATH -np 8 mstar-cfd-mgpu -i input.xml -o out --gpu-auto
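The `-np` value is the number of nodes multiplied by the GPUs per node, so it has to change whenever the allocation does. The sketch below derives it instead of hard-coding it. `build_mpirun_cmd` is a hypothetical helper that only prints the command for review; it uses the same flags as the snippet above, and the node and GPU counts are values you supply to match your job request.

```shell
#!/bin/sh
# Hypothetical helper: print the mpirun command line for a given allocation.
# Arguments: <nodes> <gpus-per-node> <input-file> <output-dir>
build_mpirun_cmd() {
    np=$(( $1 * $2 ))   # one MPI rank per GPU
    printf 'mpirun -x PATH -x LD_LIBRARY_PATH -np %d mstar-cfd-mgpu -i %s -o %s --gpu-auto\n' \
        "$np" "$3" "$4"
}

# Two nodes with four GPUs each, as in the example above.
build_mpirun_cmd 2 4 input.xml out
# prints: mpirun -x PATH -x LD_LIBRARY_PATH -np 8 mstar-cfd-mgpu -i input.xml -o out --gpu-auto
```

Printing the command first makes it easy to check the rank count in the job log before the solver starts.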

Troubleshooting Pre and Post startup

See Troubleshooting M-Star Pre/Post start up for additional information.