Hardware Guide

Updated August 2025

GPU technology is constantly changing, so it can be confusing to know what hardware to purchase. The purpose of this document is to explain the current state of GPU technology and to provide purchasing recommendations.

tl;dr

Here are some options to consider for the impatient. Many users will land on a variation of the High End Workstation, which provides a great price-to-performance ratio without jumping to more expensive server hardware. However, users with complex modeling needs and large models may need to consider server-class hardware.

Entry Level PC

1x NVidia GeForce 40-series or 50-series, 32–64GB System memory, 1TB Disk

Middle-of-the-Road Workstation

1x NVidia RTX 6000 Ada, 64–128GB System memory, 2TB Disk

High End Workstation

1x NVidia RTX PRO 6000, 256GB System memory, 2TB Disk

Note: When running Microsoft Windows, add a second GPU dedicated to driving the display.

Entry Level Server

4x H100 SXM with full NVLINK/NVSWITCH

High End Server

8x H100 SXM or B100 SXM with full NVLINK/NVSWITCH

Ultra Mega High End Server

16x or more B200 SXM with full NVLINK/NVSWITCH

First Considerations

The first consideration in designing a computer to run M-Star is which GPUs you want to use; the rest of the system can then be designed around them. To choose GPUs, consider the average size of the problems you need to solve. Larger simulations with additional physics require more GPU memory; simple simulations with moderate to coarse fluid resolution require less.

While you can speed up M-Star by running across more GPUs, bear in mind that each GPU needs sufficient work to run efficiently. For example, a simulation with 1M nodes run on 8x A100s would not be much faster than on a single A100: when the work is spread too thinly across GPUs, GPU-to-GPU communication becomes the performance bottleneck. It is best to design the computing resource based on how much GPU memory you actually need. This behavior is discussed further in Scaling Performance.

Important

Design your computing resource based on the GPU memory requirement for typical M-Star models you will run.
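
To make this concrete, here is a minimal sketch of the sizing logic. The 20-million-nodes-per-GPU floor is an illustrative assumption, not an official M-Star figure; benchmark your own models to find the real threshold.

    # Cap the GPU count so each GPU keeps enough work to stay efficient.
    # MIN_NODES_PER_GPU is an assumed utilization floor for illustration.
    MIN_NODES_PER_GPU = 20e6

    def suggested_gpu_count(total_nodes: float, available_gpus: int) -> int:
        max_useful = max(1, int(total_nodes // MIN_NODES_PER_GPU))
        return min(available_gpus, max_useful)

    print(suggested_gpu_count(1e6, 8))    # 1M nodes: extra GPUs add little -> 1
    print(suggested_gpu_count(200e6, 8))  # 200M nodes: all 8 GPUs stay busy -> 8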

How much GPU memory do I need?

The size of the simulation, in terms of lattice density and particle count, is limited by the local GPU RAM. As a first-order approximation, 1 GB of GPU RAM can support 2–4 million grid points and 1 million particles. Adding scalar fields or custom variables may change this scaling.

Loosely speaking, most simulations contain 1–100 million lattice points and/or 1–10 million particles. These simulations can typically be performed on a single high-performance GPU, which typically contains 16–80 GB of RAM. Simulations with larger memory requirements may require a multi-GPU configuration.
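
As a sketch of this rule of thumb, the helper below converts a model size into a first-order GPU memory estimate. The midpoint density (3 million points per GB) and the safety factor are assumptions; extra scalar fields or custom variables will shift them.

    # First-order GPU memory estimate from the 1 GB per 2-4M grid points
    # and per 1M particles rule of thumb. All factors are assumptions.
    def estimate_gpu_memory_gb(lattice_points: float,
                               particles: float = 0.0,
                               points_per_gb: float = 3e6,    # midpoint of 2-4M
                               particles_per_gb: float = 1e6,
                               safety_factor: float = 1.25) -> float:
        base = lattice_points / points_per_gb + particles / particles_per_gb
        return base * safety_factor

    # 100M lattice points plus 5M particles fits on an 80 GB card:
    print(f"{estimate_gpu_memory_gb(100e6, 5e6):.0f} GB")  # ~48 GB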

What memory bandwidth do I need?

Bandwidth is the rate at which data can be transferred between the GPU’s processing cores and its on‑board memory (VRAM). For NVidia GPUs it is reported in gigabytes per second (GB/s). Higher bandwidth allows the GPU to access and process data faster, improving overall performance.

After total memory capacity and raw compute performance (see TFlops), bandwidth is the next specification to compare. When two GPUs have similar memory size and computational speed, prefer the one with the higher memory bandwidth.

Note

Bandwidth matters most for memory‑bound workloads (large lattices or particle counts with comparatively low arithmetic intensity). If your models fit comfortably in memory and are compute‑bound, differences in bandwidth may have less impact.
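
For intuition, a memory-bound lattice update rate can be bounded from above by bandwidth divided by the bytes moved per node per step. The sketch below assumes a D3Q19 lattice in single precision (19 values read plus 19 written, 4 bytes each); M-Star's actual kernels and throughput will differ.

    # Upper bound on lattice updates per second for a memory-bound solver.
    # Assumption: D3Q19 lattice, FP32, one read + one write per population.
    BYTES_PER_NODE_UPDATE = 2 * 19 * 4  # = 152 bytes per node per step

    def peak_updates_per_second(bandwidth_gb_s: float) -> float:
        return bandwidth_gb_s * 1e9 / BYTES_PER_NODE_UPDATE

    # Compare two 48 GB cards from the tables below:
    for name, bw in [("RTX 6000 Ada", 960), ("L40S", 864)]:
        print(f"{name}: ~{peak_updates_per_second(bw) / 1e9:.1f} billion updates/s")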

GPU Spec Tables

Always reference the datasheet provided by NVidia for official specifications. When multiple variants of a GPU are available, the variant with more memory and/or cores is listed.

Tables are grouped into three main types: Data Center, Workstation, and Consumer. Each table is sorted by memory capacity, then by TFlops, in descending order.

Name

Name of the GPU.

TFlops

Theoretical single-precision teraflops (trillions of floating-point operations per second) of a single GPU, computed from the boost clock frequency and the CUDA core count. Typically referred to as FP32 TFlops in NVidia datasheets and other sources.

Memory

Amount of memory of a single GPU in gigabytes.

Memory Bandwidth

Peak memory bandwidth of a single GPU in gigabytes per second (GB/s). Higher is better for memory‑bound workloads. See What memory bandwidth do I need?.

NVLink N

The number of GPUs that may be connected to each other via NVLink. A value of zero indicates NVLink is not supported. For more information, see GPU Topology.

ECC

Error-Correcting Code. Marked ‘y’ if this feature is supported. ECC detects and corrects memory bit errors, protecting against silent data corruption.

Note

Regarding NVLink:

Most PCIe-based GPUs allow for either zero or two GPU connections. For example, if NVLink N = 2, this means a single NVLink bridge may be used to connect two GPUs.

In contrast, SXM-based GPUs typically support many NVLink connections via NVLINK/NVSWITCH hardware.
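
To check how the GPUs in an existing machine are actually connected, you can print the interconnect matrix reported by the NVIDIA driver (requires the driver's nvidia-smi tool on the PATH). NVLink pairs show up as NV# entries, plain PCIe paths as PIX/PXB/PHB; the small Python wrapper below is just one way to invoke it.

    # Print the GPU interconnect topology matrix via the NVIDIA driver tool.
    import subprocess

    result = subprocess.run(["nvidia-smi", "topo", "-m"],
                            capture_output=True, text=True)
    print(result.stdout)  # NV# = NVLink; PIX/PXB/PHB = PCIe paths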

Data Center GPUs

This category contains the top-performing GPUs. These are typically recommended for server-class hardware in a data center and for solving the largest problems.

Data Center Class

Name        TFlops   Memory (GB)   Memory Bandwidth (GB/s)   NVLink N   ECC
B200 PCIe   80       192           8000                      576        y
B100 SXM    60       192           8000                      576        y
H100 SXM    67       80            3360                      8          y
H100 PCIe   51       80            2000                      2          y
A100 SXM    19.5     80            1935                      16         y
A100 PCIe   19.5     80            1935                      2          y
L40S        91.6     48            864                       0          y
L40         90.5     48            864                       0          y
A40         37.4     48            696                       2          y
A10         31.2     24            600                       2          y
L4          30.3     24            300                       0          y
A30         10.3     24            933                       2          y

Workstation GPUs

PCIe-based GPUs intended for workstations and servers. These are recommended for mid-range memory capacities.

Workstation Class

Name                             TFlops   Memory (GB)   Memory Bandwidth (GB/s)   NVLink N   ECC
RTX PRO 6000 Workstation         125      96            1792                      0          y
RTX PRO 6000 Server              120      96            1597                      0          y
RTX PRO 6000 Max-Q Workstation   110      96            1792                      0          y
RTX 6000 Ada                     91.1     48            960                       0          y
RTX PRO 5000                     73.2     48            1344                      0          y
RTX A6000                        38.7     48            768                       2          y
RTX 5000 Ada                     65.3     32            576                       0          y
Quadro GV100                     14.8     32            870                       2          y
RTX PRO 4000                     46.9     24            672                       0          y
RTX A5500                        34.1     24            768                       2          y
RTX A5000                        27.8     24            768                       2          y
RTX 4000 Ada                     26.7     20            360                       0          y
RTX A4500                        23.7     20            640                       2          y
RTX A4000                        19.2     16            448                       0          y
RTX A2000                        8        12            288                       0          y

Consumer/Gaming GPUs

Consumer (gaming) GPUs are PCIe boards intended primarily for gaming and general desktop workloads. They offer a lower cost per unit of theoretical compute, but typically lack data-center/workstation features such as NVLink, ECC memory, and extended validation and support. For modest single-GPU simulations that fit comfortably in memory, a high-end consumer GPU can provide excellent price/performance.

Consumer/Gaming Class

Name                 TFlops   Memory (GB)   Memory Bandwidth (GB/s)   NVLink N   ECC
RTX 5090             106.1    32            1792                      0          n
RTX 4090             82.6     24            1008                      0          n
RTX 3090 Ti          40       24            1008                      2          n
RTX 3090             35.7     24            936                       2          n
RTX 5080             56.8     16            960                       0          n
RTX 4080             48.7     16            717                       0          n
RTX 5070 Ti          43.9     16            896                       0          n
RTX 5060 Ti (16GB)   23.7     16            448                       0          n
RTX 4070 Ti          40.1     12            504                       0          n
RTX 3080 Ti          34.1     12            912                       0          n
RTX 5070             30.9     12            672                       0          n
RTX 3080             29.8     12            912                       0          n
RTX 3060             12.7     12            360                       0          n
RTX 5060 Ti (8GB)    23.7     8             448                       0          n
RTX 3070 Ti          21.7     8             608                       0          n
RTX 3070             20.4     8             448                       0          n
RTX 5060             19.2     8             448                       0          n
RTX 3060 Ti          16.2     8             448                       0          n
RTX 3050             9.1      8             224                       0          n

CPU

M-Star CFD is not a CPU-bound process, so CPU selection is left to the user.

System Memory

We recommend 1.5–2x the total GPU memory in the machine. For example, if you have two GPUs with 48GB of memory each, the total GPU memory is 96GB, so target 144–192GB of system memory. ECC memory should be preferred for shared workstations and server-class hardware.
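
As a quick sketch of the arithmetic:

    # System-memory sizing per the 1.5-2x rule above.
    def system_memory_range_gb(gpu_mem_gb_each: float, gpu_count: int):
        total_gpu = gpu_mem_gb_each * gpu_count
        return 1.5 * total_gpu, 2.0 * total_gpu

    low, high = system_memory_range_gb(48, 2)  # two 48 GB GPUs
    print(f"{low:.0f}-{high:.0f} GB")          # -> 144-192 GB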

Disk Storage

The requirements for disk storage vary widely depending on how a simulation is configured. A good starting point is 1–2TB of working storage for M-Star. This should be fast storage, preferably a local SSD, to sustain the write speed of large output files.
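
For a rough sense of how output volume scales, the estimate below multiplies lattice size by field count, bytes per value, and snapshot count. Every figure is an assumption for illustration; actual output layout and compression depend on solver settings.

    # Back-of-envelope output-size estimate; all parameters are assumptions.
    def estimate_output_gb(lattice_points: float,
                           fields: int = 4,           # e.g. pressure + 3 velocity components
                           bytes_per_value: int = 4,  # FP32 output assumed
                           snapshots: int = 100) -> float:
        return lattice_points * fields * bytes_per_value * snapshots / 1e9

    print(f"~{estimate_output_gb(50e6):.0f} GB")  # 50M nodes, 100 snapshots -> ~80 GB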