Hardware Guide

Updated March 2023

GPU technology is constantly changing, so it can be confusing to know what hardware to purchase. The purpose of this document is to explain the current state of GPU technology and provide purchasing recommendations.

tl;dr

Here are some options to consider for the impatient. Many users will land on a variation of the “High End Workstation,” which provides a great price-performance ratio without jumping to more expensive server hardware. However, users with complex modeling needs and large models may need to consider server-class hardware.

Entry Level PC

1x NVidia GeForce 20-series or 30-series, 32–64GB System memory, 1TB Disk

Middle-of-the-Road Workstation

1x NVidia RTX A5500, 64–96GB System memory, 2TB Disk

High End Workstation

2x NVidia RTX A6000 with NVLINK, 256GB System memory, 2TB Disk

Note: When running Microsoft Windows, add another GPU dedicated to driving the display.

Entry Level Server

4x A100 SXM with full NVLINK/NVSWITCH

High End Server

8x A100 SXM or H100 SXM with full NVLINK/NVSWITCH

Ultra Mega High End Server

16x or more H100 SXM with full NVLINK/NVSWITCH

First Considerations

The first consideration in designing a computer to run M-Star should be which GPUs you want to use; you can then design the rest of the computer around those GPUs. To choose GPUs, consider the average size of the problems you need to solve. Larger simulations with additional physics require more GPU memory; simple simulations with moderate to coarse fluid resolution require less.

While you can speed up M-Star by running across more GPUs, bear in mind that each GPU needs a sufficient amount of work in order to run efficiently. For example, a simulation with 1M nodes run on 8x A100s would not run much faster than on a single A100, because GPU-to-GPU communication becomes the performance bottleneck when the simulation is spread too thinly across the GPUs. It is best to design the computing resource based on how much GPU memory you actually need, as the sketch below illustrates. This behavior is further discussed in Scaling Performance.

Important

Design your computing resource based on the GPU memory requirement for typical M-Star models you will run.
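As a quick sanity check on per-GPU load, here is a minimal Python sketch of the reasoning above. The minimum-load threshold is an illustrative assumption, not an official M-Star figure; the actual break-even point depends on the GPUs and their interconnect.

```python
# Rough pre-flight check of per-GPU load before scaling out.
MIN_POINTS_PER_GPU = 2_000_000  # assumed rule of thumb, not an official figure

def check_per_gpu_load(total_lattice_points: int, n_gpus: int) -> None:
    points_per_gpu = total_lattice_points / n_gpus
    if points_per_gpu < MIN_POINTS_PER_GPU:
        print(f"{points_per_gpu:,.0f} points/GPU: likely communication-bound; "
              "consider fewer GPUs")
    else:
        print(f"{points_per_gpu:,.0f} points/GPU: enough work per GPU to scale")

check_per_gpu_load(1_000_000, 8)    # the 1M-node example above: spread too thin
check_per_gpu_load(100_000_000, 8)  # a large model that can keep 8 GPUs busy
```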

How much GPU memory do I need?

The size of the simulation, in terms of lattice density and particle count, is limited by the local GPU RAM. As a first-order approximation, 1 GB of GPU RAM can support 2–4 million grid points and 1 million particles. Adding scalar fields or custom variables may change this scaling.

Loosely speaking, most simulations contain 1–100 million lattice points and/or 1–10 million particles. These simulations can typically be performed on a single high-performance GPU, which usually carries 16–80 GB of RAM. Simulations with larger memory requirements may require a multi-GPU configuration.
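A minimal sketch of this first-order estimate, using only the scaling constants quoted above (treat the result as a starting point, not a guarantee):

```python
# First-order GPU memory estimate: 1 GB of GPU RAM supports roughly
# 2-4 million grid points plus 1 million particles.
def estimate_gpu_memory_gb(grid_points: float, particles: float = 0.0,
                           points_per_gb: float = 2e6) -> float:
    # points_per_gb=2e6 is the conservative end; pass 4e6 for the optimistic end
    return grid_points / points_per_gb + particles / 1e6

# 100M lattice points with 10M particles: roughly 35-60 GB of GPU RAM
print(estimate_gpu_memory_gb(100e6, 10e6))       # 60.0 (conservative)
print(estimate_gpu_memory_gb(100e6, 10e6, 4e6))  # 35.0 (optimistic)
```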

GPU Spec Tables

Always reference the datasheet provided by NVidia for official specifications. When multiple variants of a GPU are available, the variant with more memory and/or cores is the one listed.

Tables are grouped into three main types: Data Center, Workstation, and Consumer. Each table is sorted by TFLOPS in descending order.
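To compare these tables against an existing machine, you can query the installed GPUs from Python by wrapping nvidia-smi (a minimal sketch; it assumes the NVIDIA driver's nvidia-smi utility is on PATH):

```python
# List the GPUs in the current machine, with their memory, so they can
# be compared against the spec tables below.
import subprocess

result = subprocess.run(
    ["nvidia-smi", "--query-gpu=index,name,memory.total", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
)
print(result.stdout, end="")
# Output varies by machine, e.g.: 0, NVIDIA RTX A6000, 49140 MiB
```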

Name

Name of the GPU.

TFlops

Theoretical single-precision teraflops of a single GPU, based on the boost clock frequency and CUDA core count. Typically referred to as FP32 TFLOPS in NVidia datasheets and other sources.

Memory

Amount of memory of a single GPU in gigabytes.

NVLink N

The number of GPUs that may be connected to each other via NVLink. A value of zero indicates NVLink is not supported. For more information, see GPU Topology.

ECC

Error-correcting code memory. A value of 'y' indicates this feature is supported. ECC prevents data corruption in memory.

Note

Regarding NVLink:

Most PCIe-based GPUs allow for either zero or two GPU connections. For example, if NVLink N = 2, this means a single NVLink bridge may be used to connect two GPUs.

In contrast, SXM-based GPUs typically allow for many NVLink connections to be made via NVLINK/NVSWITCH hardware.
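To verify how the GPUs in a given machine are actually connected, nvidia-smi can print the interconnect topology matrix (a minimal sketch, again assuming nvidia-smi is on PATH):

```python
# Print the GPU interconnect topology matrix. Entries such as NV1/NV2
# indicate NVLink connections; PIX/PXB/PHB/SYS indicate PCIe paths.
import subprocess

subprocess.run(["nvidia-smi", "topo", "-m"], check=True)
```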

Data Center GPUs

This category contains the top performing GPUs. These are typically recommended for server class hardware in a data center and for solving the largest problems.

Data Center Class

Name         TFlops   Memory (GB)   NVLink N   ECC
L40S         91.6     48            0          y
L40          90.5     48            0          y
H100 SXM     67.0     80            256        y
H100 PCIe    51.0     80            2          y
A40          37.4     48            2          y
A10          31.2     24            0          n
L4           30.3     24            0          n
A100 PCIe    19.5     80            2          y
A100 SXM     19.5     80            16         y
V100 SXM2    15.7     32            8          y
V100 PCIe    14.0     32            2          y
A30          10.3     24            2          n
T4           8.1      16            0          y

Workstation GPUs

PCIe-based GPUs intended for workstations and servers. These are recommended for mid-range memory requirements.

Workstation Class

Name           TFlops   Memory (GB)   NVLink N   ECC
RTX 6000 Ada   91.1     48            0          y
RTX A6000      38.7     48            2          y
RTX A5500      34.1     24            2          y
RTX A5000      27.8     24            2          y
RTX A4500      23.7     20            2          y
RTX A4000      19.2     16            0          y
Quadro GV100   14.8     32            2          y
RTX A2000      8.0      12            0          y

Consumer/Gaming GPUs

PCIe-based GPUs intended for gaming. These GPUs tend to be lower cost and lack features such as NVLINK or ECC memory. On the cards that do support it, NVLink is currently limited to Linux.

Consumer/Gaming Class

Name             TFlops   Memory (GB)   NVLink N   ECC
RTX 4090         82.6     24            0          n
RTX 4080         48.8     16            0          n
RTX 4070 Ti      40.1     12            0          n
RTX 3090 Ti      40.0     24            2          n
RTX 3090         35.7     24            2          n
RTX 3080 Ti      34.2     12            0          n
RTX 3080         30.6     12            0          n
RTX 3070 Ti      21.7     8             0          n
RTX 3070         20.4     8             0          n
RTX 3060 Ti      16.2     8             0          n
RTX 2080 Ti      14.3     11            2          n
RTX 3060         12.8     12            0          n
RTX 2080 Super   11.2     8             2          n
RTX 2080         10.6     8             2          n
RTX 3050         9.1      8             0          n
RTX 2070 Super   8.8      8             2          n
RTX 2070         7.9      8             0          n
RTX 2060 Super   7.2      8             0          n
RTX 2060         7.2      12            0          n

CPU

M-Star CFD is not a CPU-bound process, so CPU selection is left to the user.

System Memory

We recommend 1.5–2x the total GPU memory in the machine. For example, if you have two GPUs with 48GB of memory each, the total GPU memory is 96GB, so target roughly 144–192GB of system memory. ECC memory should be preferred for shared workstations and server-class hardware.
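A trivial worked example of the rule (nothing assumed beyond the 1.5–2x guideline above):

```python
# System-memory sizing from the 1.5-2x rule of thumb.
def system_memory_range_gb(gpu_memory_gb_each: float, n_gpus: int):
    total_gpu_memory = gpu_memory_gb_each * n_gpus
    return 1.5 * total_gpu_memory, 2.0 * total_gpu_memory

# Two 48 GB GPUs -> 96 GB total GPU memory -> 144-192 GB system memory
print(system_memory_range_gb(48, 2))  # (144.0, 192.0)
```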

Disk Storage

Disk storage requirements can vary widely depending on how a simulation is configured. A good starting point is 1–2TB of working storage for M-Star. This should be fast storage, preferably local SSDs, to sustain the write speed of large output files.