Hardware Guide¶
Updated August 2025
GPU technology is constantly changing, so it can be confusing to know what hardware to purchase. The purpose of this document is to explain the current state of GPU technology and to provide purchasing recommendations.
tl;dr¶
Here are some options to consider for the impatient. Many users will land on a variation of the High End Workstation, which provides a great price-performance ratio without jumping to more expensive server hardware. However, users with complex modeling needs and large models may need to consider server-class hardware.
- Entry Level PC
1x NVidia GeForce 40-series or 50-series, 32–64GB System memory, 1TB Disk
- Middle-of-the-Road Workstation
1x NVidia RTX 6000 Ada, 64–128GB System memory, 2TB Disk
- High End Workstation
1x NVidia RTX PRO 6000, 256GB System memory, 2TB Disk
Note: Add another GPU dedicated to driving the display when running Microsoft Windows.
- Entry Level Server
4x H100 SXM with full NVLINK/NVSWITCH
- High End Server
8x H100 SXM or B100 SXM with full NVLINK/NVSWITCH
- Ultra Mega High End Server
16x or more B200 SXM with full NVLINK/NVSWITCH
First Considerations¶
The first consideration in designing a computer to run M-Star should be which GPUs you want to use; you can then design the rest of the computer around those GPUs. To choose GPUs, consider the typical problem size you need to solve. Larger simulations with additional physics require more GPU memory; simple simulations with moderate-to-coarse fluid resolution require less.
While you can speed up M-Star by running across more GPUs, bear in mind that each GPU needs a sufficient amount of work to run efficiently. For example, a simulation with 1M nodes run on 8x A100s would not run much faster than on a single A100, because GPU-to-GPU communication becomes the bottleneck when the simulation is spread too thinly across the GPUs. It is best to design the computing resource based on how much GPU memory you actually need. This behavior is further discussed in Scaling Performance.
Important
Design your computing resource based on the GPU memory requirement for typical M-Star models you will run.
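As a rough illustration of the load-per-GPU point above, the sketch below divides a total lattice count across candidate GPU counts and flags configurations that leave each GPU underutilized. The 1-million-nodes-per-GPU threshold is an illustrative assumption, not a measured M-Star figure; see Scaling Performance for real data.

```python
# Rough sketch: check whether a proposed GPU count gives each GPU enough work.
# The nodes-per-GPU threshold is an illustrative assumption, not an official
# M-Star recommendation; see Scaling Performance for measured behavior.
MIN_NODES_PER_GPU = 1_000_000  # assumed minimum load for worthwhile scaling

def nodes_per_gpu(total_lattice_nodes: int, gpu_count: int) -> float:
    """Average number of lattice nodes assigned to each GPU."""
    return total_lattice_nodes / gpu_count

# The 1M-node example from the text: adding GPUs spreads the work too thin.
for gpus in (1, 2, 4, 8):
    per_gpu = nodes_per_gpu(1_000_000, gpus)
    status = "OK" if per_gpu >= MIN_NODES_PER_GPU else "likely underutilized"
    print(f"{gpus} GPU(s): {per_gpu:,.0f} nodes/GPU -> {status}")
```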
How much GPU memory do I need?¶
The size of the simulation, in terms of lattice density and particle count, is limited by the local GPU RAM. As a first-order approximation, 1 GB of GPU RAM can support 2–4 million grid points and 1 million particles. Adding scalar fields or custom variables may change this scaling.
Loosely speaking, most simulations contain 1–100 million lattice points and/or 1–10 million particles. These simulations can usually be performed on a single high-performance GPU, which typically contains 16–80 GB of RAM. Simulations with larger memory requirements may require a multi-GPU configuration.
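The rule of thumb above can be turned into a quick sizing estimate. The sketch below is a minimal calculation assuming the 2–4 million grid points and 1 million particles per GB figures quoted above; the function name and the safety margin are illustrative choices, not part of M-Star.

```python
# Minimal GPU-memory sizing sketch based on the rule of thumb above:
# 1 GB of GPU RAM ~ 2-4 million grid points and ~1 million particles.
# The 20% safety margin is an illustrative assumption.

def estimate_gpu_memory_gb(grid_points: float, particles: float,
                           points_per_gb: float = 2e6,   # conservative end of 2-4M/GB
                           particles_per_gb: float = 1e6,
                           margin: float = 1.2) -> float:
    """Rough GPU memory estimate in GB for a lattice + particle simulation."""
    return margin * (grid_points / points_per_gb + particles / particles_per_gb)

# Example: 100 million lattice points and 10 million particles
print(f"~{estimate_gpu_memory_gb(100e6, 10e6):.0f} GB of GPU RAM")  # ~72 GB
```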
What memory bandwidth do I need?¶
Bandwidth is the rate at which data can be transferred between the GPU’s processing cores and its on‑board memory (VRAM). For NVidia GPUs it is reported in gigabytes per second (GB/s). Higher bandwidth allows the GPU to access and process data faster, improving overall performance.
After total memory capacity and raw compute performance (see TFlops), bandwidth is the next specification to compare. When two GPUs have similar memory size and computational speed, prefer the one with the higher memory bandwidth.
Note
Bandwidth matters most for memory‑bound workloads (large lattices or particle counts with comparatively low arithmetic intensity). If your models fit comfortably in memory and are compute‑bound, differences in bandwidth may have less impact.
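As a back-of-the-envelope illustration of why bandwidth matters for memory-bound runs, the sketch below estimates the minimum time per time step imposed by memory traffic alone. The bytes-per-node figure is a placeholder assumption, not an M-Star constant; the point is only that step time scales inversely with bandwidth.

```python
# Back-of-the-envelope: lower bound on time per time step set by memory
# bandwidth alone. BYTES_PER_NODE_PER_STEP is a placeholder assumption.
BYTES_PER_NODE_PER_STEP = 500  # assumed read+write traffic per lattice node

def min_step_time_ms(lattice_nodes: float, bandwidth_gb_s: float) -> float:
    """Bandwidth-limited lower bound on step time, in milliseconds."""
    bytes_moved = lattice_nodes * BYTES_PER_NODE_PER_STEP
    return 1e3 * bytes_moved / (bandwidth_gb_s * 1e9)

# 100M-node lattice on a 2000 GB/s GPU vs. a 1000 GB/s GPU
print(min_step_time_ms(100e6, 2000))  # 25.0 ms
print(min_step_time_ms(100e6, 1000))  # 50.0 ms
```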
GPU Spec Tables¶
Always reference the datasheet provided by NVidia for official specifications. When multiple variants of a GPU are available, the variant with more memory and/or cores is listed.
Tables are grouped into three main types: Data Center, Workstation, and Consumer. Each table is sorted by memory capacity, then by TFlops, in descending order.
- Name
Name of the GPU.
- TFlops
Theoretical single-precision teraflops (trillions of floating-point operations per second) of a single GPU, based on the boost clock frequency and CUDA core count. Typically reported as FP32 TFlops in NVidia data sheets and other sources.
- Memory
Amount of memory of a single GPU in gigabytes.
- Memory Bandwidth
Peak memory bandwidth of a single GPU in gigabytes per second (GB/s). Higher is better for memory‑bound workloads. See What memory bandwidth do I need?.
- NVLink N
The number of GPUs that may be connected to each other via NVLink. A value of zero indicates NVLink is not supported. For more information, see GPU Topology.
- ECC
Error Correcting Code memory. A value of ‘y’ indicates the feature is supported. ECC protects against data corruption in memory.
Note
Regarding NVLink:
Most PCIe-based GPUs allow for either zero or two GPU connections. For example, if NVLink N = 2, this means a single NVLink bridge may be used to connect two GPUs.
In contrast, SXM-based GPUs typically allow for many NVLink connections to be made via NVLINK/NVSWITCH hardware.
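To compare the tables below against hardware you already have, one option is to query the driver with nvidia-smi. The sketch below is a minimal Python wrapper around that command; it assumes nvidia-smi is on the PATH and that your driver reports the ecc.mode.current field.

```python
# Minimal sketch: list installed NVidia GPUs with name, total memory, and
# current ECC mode by calling nvidia-smi. Assumes nvidia-smi is on the PATH.
import subprocess

def list_gpus() -> list[dict]:
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=name,memory.total,ecc.mode.current",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    gpus = []
    for line in out.strip().splitlines():
        name, memory, ecc = (field.strip() for field in line.split(","))
        gpus.append({"name": name, "memory": memory, "ecc": ecc})
    return gpus

if __name__ == "__main__":
    for gpu in list_gpus():
        print(gpu)
```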
Data Center GPUs¶
This category contains the top performing GPUs. These are typically recommended for server class hardware in a data center and for solving the largest problems.
| Name | TFlops | Memory (GB) | Memory Bandwidth (GB/s) | NVLink N | ECC |
|---|---|---|---|---|---|
| B200 PCIe | 80 | 192 | 8000 | 576 | y |
| B100 SXM | 60 | 192 | 8000 | 576 | y |
| H100 SXM | 67 | 80 | 3360 | 8 | y |
| H100 PCIe | 51 | 80 | 2000 | 2 | y |
| A100 SXM | 19.5 | 80 | 1935 | 16 | y |
| A100 PCIe | 19.5 | 80 | 1935 | 2 | y |
| L40S | 91.6 | 48 | 864 | 0 | y |
| L40 | 90.5 | 48 | 864 | 0 | y |
| A40 | 37.4 | 48 | 696 | 2 | y |
| A10 | 31.2 | 24 | 600 | 2 | y |
| L4 | 30.3 | 24 | 300 | 0 | y |
| A30 | 10.3 | 24 | 933 | 2 | y |
Workstation GPUs¶
PCIe-based GPUs intended for workstations and servers. These are recommended for middle range memory capacity.
| Name | TFlops | Memory (GB) | Memory Bandwidth (GB/s) | NVLink N | ECC |
|---|---|---|---|---|---|
| RTX PRO 6000 Workstation | 125 | 96 | 1792 | 0 | y |
| RTX PRO 6000 Server | 120 | 96 | 1597 | 0 | y |
| RTX PRO 6000 Max-Q Workstation | 110 | 96 | 1792 | 0 | y |
| RTX 6000 Ada | 91.1 | 48 | 960 | 0 | y |
| RTX PRO 5000 | 73.2 | 48 | 1344 | 0 | y |
| RTX A6000 | 38.7 | 48 | 768 | 2 | y |
| RTX 5000 Ada | 65.3 | 32 | 576 | 0 | y |
| Quadro GV100 | 14.8 | 32 | 870 | 2 | y |
| RTX PRO 4000 | 46.9 | 24 | 672 | 0 | y |
| RTX A5500 | 34.1 | 24 | 768 | 2 | y |
| RTX A5000 | 27.8 | 24 | 768 | 2 | y |
| RTX 4000 Ada | 26.7 | 20 | 360 | 0 | y |
| RTX A4500 | 23.7 | 20 | 640 | 2 | y |
| RTX A4000 | 19.2 | 16 | 448 | 0 | y |
| RTX A2000 | 8 | 12 | 288 | 0 | y |
Consumer/Gaming GPUs¶
Consumer (gaming) GPUs are PCIe boards intended primarily for gaming and general desktop workloads. They offer a lower cost per unit of theoretical compute, but typically lack data-center/workstation features such as NVLink, ECC memory, and broader validation and support. For modest single-GPU simulations that fit comfortably in memory, a high-end consumer GPU can provide excellent price/performance.
| Name | TFlops | Memory (GB) | Memory Bandwidth (GB/s) | NVLink N | ECC |
|---|---|---|---|---|---|
| RTX 5090 | 106.1 | 32 | 1792 | 0 | n |
| RTX 4090 | 82.6 | 24 | 1008 | 0 | n |
| RTX 3090 Ti | 40 | 24 | 1008 | 2 | n |
| RTX 3090 | 35.7 | 24 | 936 | 2 | n |
| RTX 5080 | 56.8 | 16 | 960 | 0 | n |
| RTX 4080 | 48.7 | 16 | 717 | 0 | n |
| RTX 5070 Ti | 43.9 | 16 | 896 | 0 | n |
| RTX 5060 Ti (16GB) | 23.7 | 16 | 448 | 0 | n |
| RTX 4070 Ti | 40.1 | 12 | 504 | 0 | n |
| RTX 3080 Ti | 34.1 | 12 | 912 | 0 | n |
| RTX 5070 | 30.9 | 12 | 672 | 0 | n |
| RTX 3080 | 29.8 | 12 | 912 | 0 | n |
| RTX 3060 | 12.7 | 12 | 360 | 0 | n |
| RTX 5060 Ti (8GB) | 23.7 | 8 | 448 | 0 | n |
| RTX 3070 Ti | 21.7 | 8 | 608 | 0 | n |
| RTX 3070 | 20.4 | 8 | 448 | 0 | n |
| RTX 5060 | 19.2 | 8 | 448 | 0 | n |
| RTX 3060 Ti | 16.2 | 8 | 448 | 0 | n |
| RTX 3050 | 9.1 | 8 | 224 | 0 | n |
CPU¶
M-Star CFD is not a CPU-bound process, so CPU selection is left to the user.
System Memory¶
We recommend 1.5–2x the total GPU memory in the machine. For example, two GPUs with 48GB of memory each give 96GB of total GPU memory, so target roughly 144–192GB of system memory. ECC memory should be preferred for shared workstations and server-class hardware.
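For quick sizing, the 1.5–2x rule is easy to script. The helper below is a minimal sketch; the function name and the default factor are illustrative choices.

```python
# Minimal sketch of the system-memory rule of thumb above:
# recommended system RAM = 1.5-2x total GPU memory.

def recommended_system_memory_gb(gpu_memory_gb: list[float],
                                 factor: float = 2.0) -> float:
    """System RAM (GB) suggested for a machine with the given GPUs."""
    return factor * sum(gpu_memory_gb)

# Example from the text: two 48 GB GPUs -> 96 GB total GPU memory
print(recommended_system_memory_gb([48, 48], factor=1.5))  # 144.0 GB
print(recommended_system_memory_gb([48, 48], factor=2.0))  # 192.0 GB
```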
Disk Storage¶
The requirements for disk storage can vary wildly depending on how a simulation is configured. A good starting point is to have 1–2TB of working storage for M-Star. This should be fast storage, preferably local SSD-based storage to optimize the write speed of large output files.