Hardware Guide
Updated March 2023
GPU technology is constantly changing, so it can be confusing to know what hardware to purchase. The purpose of this document is to explain the current state of GPU technology and provide recommendations.
tl;dr
Here are some options to consider for the impatient. Many users will land on a variation of the “High End Workstation”, which provides a great price-performance ratio without jumping to more expensive server hardware. However, users with complex modeling needs and large models may need to consider server-class hardware.
- Entry Level PC
1x NVidia GeForce 20-series or 30-series, 32–64GB System memory, 1TB Disk
- Middle-of-the-Road Workstation
1x NVidia RTX A5500, 64–96GB System memory, 2TB Disk
- High End Workstation
2x NVidia RTX A6000 with NVLINK, 256GB System memory, 2TB Disk
Note: Add another GPU dedicated for display when running Microsoft Windows.
- Entry Level Server
4x A100 SXM with full NVLINK/NVSWITCH
- High End Server
8x A100 SXM or H100 SXM with full NVLINK/NVSWITCH
- Ultra Mega High End Server
16x or more H100 SXM with full NVLINK/NVSWITCH
First Considerations
The first consideration in designing a computer to run M-Star should be which GPUs you want to use; you can then design the computer around those GPUs. To select GPUs, consider the average problem size you need to solve. Larger simulations with additional physics will require more GPU memory. Simple simulations with moderate to coarse fluid resolution will require less GPU memory.
While you can speed up M-Star by running across more GPUs, bear in mind that each GPU needs a sufficient amount of work to run efficiently. For example, a simulation with 1M nodes running on 8x A100s would not run much faster than on a single A100, because GPU-to-GPU communication becomes the performance bottleneck when the simulation is spread too thinly across the GPUs. It is best to design the computing resource based on how much GPU memory you actually need. This behavior is discussed further in Scaling Performance.
Important
Design your computing resource based on the GPU memory requirement for typical M-Star models you will run.
How much GPU memory do I need?
The size of the simulation, in terms of lattice density and particle count, is limited by the local GPU RAM. As a first-order approximation, 1 GB of GPU RAM can support 2–4 million grid points and 1 million particles. Adding scalar fields or custom variables may change this scaling.
Loosely speaking, most simulations contain 1–100 million lattice points and/or 1–10 million particles. Such simulations can typically be performed on a single high-performance GPU, which contains 16–80 GB of RAM. Simulations with larger memory requirements may require a multi-GPU configuration.
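As a worked example of the rule of thumb above, the sketch below estimates the GPU memory needed for a given lattice and particle count and the minimum number of GPUs of a given size. The helper name and the conservative choice of 2 million points per GB are illustrative assumptions, not part of M-Star; actual usage depends on the physics options enabled.

```python
import math

# Rough GPU memory estimate from the rule of thumb above:
# roughly 2-4 million lattice points per GB (the conservative end, 2 million,
# is used here) and roughly 1 million particles per GB. Illustrative only;
# actual usage depends on physics options, scalar fields, and custom variables.

def estimate_gpu_memory_gb(lattice_points, particles=0,
                           points_per_gb=2e6, particles_per_gb=1e6):
    """Return an approximate GPU memory requirement in GB."""
    return lattice_points / points_per_gb + particles / particles_per_gb

if __name__ == "__main__":
    # Example: 100 million lattice points and 5 million particles
    need_gb = estimate_gpu_memory_gb(100e6, particles=5e6)
    print(f"Estimated GPU memory: {need_gb:.0f} GB")  # ~55 GB

    # Minimum GPU count for a few common card sizes
    for card_gb in (24, 48, 80):
        gpus = math.ceil(need_gb / card_gb)
        print(f"{card_gb} GB card: at least {gpus} GPU(s)")
```

In this example, the estimate fits on a single 80 GB data center GPU, while a 48 GB workstation card would require a 2-GPU (NVLink) configuration.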
GPU Spec Tables
Always reference the official NVidia datasheet for authoritative specifications. When multiple GPU variants are available, the one with more memory and/or cores is always listed.
Tables are grouped into three main types: Data Center, Workstation, and Consumer. Each table is sorted by TFLOPS in descending order.
- Name
Name of the GPU.
- TFlops
Theoretical single-precision teraflops of a single GPU, based on the boost clock frequency and CUDA core count. Typically referred to as FP32 TFlops in NVidia data sheets and other sources.
- Memory
Amount of memory of a single GPU in gigabytes.
- NVLink N
The number of GPUs that may be connected to each other via NVLink. A value of zero indicates NVLink is not supported. For more information, see GPU Topology.
- ECC
Error-correcting code memory. A value of ‘y’ indicates this feature is supported. ECC protects against data corruption in memory.
Note
Regarding NVLink:
Most PCIe-based GPUs allow for either zero or two GPU connections. For example, if NVLink N = 2, this means a single NVLink bridge may be used to connect two GPUs.
In contrast, SXM-based GPUs typically allow for many NVLink connections to be made via NVLINK/NVSWITCH hardware.
Data Center GPUs
This category contains the top performing GPUs. These are typically recommended for server class hardware in a data center and for solving the largest problems.
| Name | TFlops | Memory (GB) | NVLink N | ECC |
|---|---|---|---|---|
| L4 | 30.3 | 24 | 0 | n |
| L40S | 91.6 | 48 | 0 | y |
| L40 | 90.5 | 48 | 0 | y |
| H100 SXM | 67.0 | 80 | 256 | y |
| H100 PCIe | 51.0 | 80 | 2 | y |
| A40 | 37.4 | 48 | 2 | y |
| A10 | 31.2 | 24 | 0 | n |
| A100 PCIe | 19.5 | 80 | 2 | y |
| A100 SXM | 19.5 | 80 | 16 | y |
| V100 SXM2 | 15.7 | 32 | 8 | y |
| V100 PCIe | 14.0 | 32 | 2 | y |
| A30 | 10.3 | 24 | 2 | n |
| T4 | 8.1 | 16 | 0 | y |
Workstation GPUs
PCIe-based GPUs intended for workstations and servers. These are recommended for mid-range memory capacity needs.
| Name | TFlops | Memory (GB) | NVLink N | ECC |
|---|---|---|---|---|
| RTX 6000 Ada | 91.1 | 48 | 0 | y |
| RTX A6000 | 38.7 | 48 | 2 | y |
| RTX A5500 | 34.1 | 24 | 2 | y |
| RTX A5000 | 27.8 | 24 | 2 | y |
| RTX A4500 | 23.7 | 20 | 2 | y |
| RTX A4000 | 19.2 | 16 | 0 | y |
| Quadro GV100 | 14.8 | 32 | 2 | y |
| RTX A2000 | 8.0 | 12 | 0 | y |
Consumer/Gaming GPUs
PCIe-based GPUs intended for gaming. These GPUs tend to be lower cost and lack features such as NVLink or ECC memory. Where NVLink is supported, it is currently available on Linux only.
| Name | TFlops | Memory (GB) | NVLink N | ECC |
|---|---|---|---|---|
| RTX 4090 | 82.6 | 24 | 0 | n |
| RTX 4080 | 48.8 | 16 | 0 | n |
| RTX 4070 Ti | 40.1 | 12 | 0 | n |
| RTX 3090 Ti | 40.0 | 24 | 2 | n |
| RTX 3090 | 35.7 | 24 | 2 | n |
| RTX 3080 Ti | 34.2 | 12 | 0 | n |
| RTX 3080 | 30.6 | 12 | 0 | n |
| RTX 3070 Ti | 21.7 | 8 | 0 | n |
| RTX 3070 | 20.4 | 8 | 0 | n |
| RTX 3060 Ti | 16.2 | 8 | 0 | n |
| RTX 2080 Ti | 14.3 | 11 | 2 | n |
| RTX 3060 | 12.8 | 12 | 0 | n |
| RTX 2080 Super | 11.2 | 8 | 2 | n |
| RTX 2080 | 10.6 | 8 | 2 | n |
| RTX 3050 | 9.1 | 8 | 0 | n |
| RTX 2070 Super | 8.8 | 8 | 2 | n |
| RTX 2070 | 7.9 | 8 | 0 | n |
| RTX 2060 Super | 7.2 | 8 | 0 | n |
| RTX 2060 | 7.2 | 12 | 0 | n |
CPU
M-Star CFD is not a CPU-bound process. This selection is left to the user.
System Memory
We recommend 1.5–2x the total GPU memory in the machine. For example, if you have two GPUs with 48GB of memory each, the total GPU memory is 96GB, so system memory should be in the range of 144–192GB. ECC memory should be preferred for shared workstations and server-class hardware.
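The same sizing rule written out as a minimal sketch (the helper name is illustrative; the 1.5–2x multipliers simply restate the guidance above):

```python
# System memory sizing: 1.5-2x the total GPU memory in the machine.

def recommended_system_memory_gb(gpu_count, memory_per_gpu_gb):
    """Return the (low, high) recommended system memory range in GB."""
    total_gpu_gb = gpu_count * memory_per_gpu_gb
    return 1.5 * total_gpu_gb, 2.0 * total_gpu_gb

low, high = recommended_system_memory_gb(2, 48)  # e.g. two 48GB GPUs
print(f"Recommended system memory: {low:.0f}-{high:.0f} GB")  # 144-192 GB
```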
Disk Storage
The requirements for disk storage can vary wildly depending on how a simulation is configured. A good starting point is to have 1–2TB of working storage for M-Star. This should be fast storage, preferably local SSD-based storage to optimize the write speed of large output files.