Add initial end-to-end CUDA FGMRES solver path by LwhJesse · Pull Request #2825 · su2code/SU2

LwhJesse · 2026-06-01T14:37:52Z

Proposed Changes

This PR adds an initial end-to-end CUDA FGMRES linear solve path on top of the existing CUDA BSR SpMV path.

It intentionally bundles the minimal pieces required for a reviewable GPU linear-solve slice, rather than submitting the intermediate infrastructure-only pieces separately. The scope is limited to one GPU Krylov solver path (FGMRES), one simple GPU preconditioner path (JACOBI), and the vector operations and dispatch/lifecycle changes strictly required to make that path run.

Concretely, this PR:

caches the cuSPARSE SpMV resources needed by the solver path
adds CUDA FGMRES scaffolding and internal dispatch while keeping the public solver entry point unchanged
adds the CUDA vector primitives needed by the solver path
implements an initial CUDA FGMRES solve path
adds a simple CUDA Jacobi preconditioner path
keeps cuSPARSE for SpMV
keeps cuBLAS for dot / norm
uses custom CUDA kernels for vector-vector operations

This PR does not add further GPU Krylov solvers or more advanced GPU preconditioners, does not remove the current host-driven Krylov control flow, and does not perform broader cache / portability / cleanup work beyond this minimal slice.
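For reviewers who want the algorithmic shape in one place, here is a minimal host-side sketch of a single FGMRES cycle with a pluggable preconditioner. It is not the code in this PR: `FgmresSketch`, `Dot`, `Axpy`, `Vec`, and `Op` are hypothetical names, there is no restart logic, and the matrix is assumed nonsingular. The comments mark where the operations listed above (SpMV, dot / norm, vector-vector updates, Jacobi) would sit in a host-driven loop.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
using Op = std::function<void(const Vec& in, Vec& out)>;  // out = op(in)

// Host stand-ins for the reduction and vector-update primitives (hypothetical names).
double Dot(const Vec& a, const Vec& b) {
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}
void Axpy(double alpha, const Vec& x, Vec& y) {  // y += alpha * x
  for (std::size_t i = 0; i < x.size(); ++i) y[i] += alpha * x[i];
}

// One FGMRES cycle of at most m iterations, starting from the guess in x.
// Returns the residual norm estimate from the Givens-rotated right-hand side.
double FgmresSketch(const Op& matVec, const Op& precond, const Vec& b, Vec& x,
                    int m, double tol) {
  const std::size_t n = b.size();
  Vec r(n);
  matVec(x, r);  // SpMV
  for (std::size_t i = 0; i < n; ++i) r[i] = b[i] - r[i];
  const double beta = std::sqrt(Dot(r, r));
  if (beta <= tol) return beta;

  std::vector<Vec> V(m + 1, Vec(n, 0.0));  // Krylov basis
  std::vector<Vec> Z(m, Vec(n, 0.0));      // preconditioned directions (the "flexible" part)
  std::vector<Vec> H(m + 1, Vec(m, 0.0));  // Hessenberg matrix, H[row][col]
  Vec cs(m, 0.0), sn(m, 0.0), g(m + 1, 0.0);
  for (std::size_t i = 0; i < n; ++i) V[0][i] = r[i] / beta;
  g[0] = beta;

  int k = 0;
  while (k < m) {
    precond(V[k], Z[k]);      // e.g. Jacobi, or a plain copy for NONE
    matVec(Z[k], V[k + 1]);   // SpMV
    for (int i = 0; i <= k; ++i) {  // modified Gram-Schmidt
      H[i][k] = Dot(V[k + 1], V[i]);
      Axpy(-H[i][k], V[i], V[k + 1]);
    }
    H[k + 1][k] = std::sqrt(Dot(V[k + 1], V[k + 1]));
    if (H[k + 1][k] > 0.0)
      for (double& v : V[k + 1]) v /= H[k + 1][k];

    for (int i = 0; i < k; ++i) {  // apply the earlier Givens rotations
      const double t = cs[i] * H[i][k] + sn[i] * H[i + 1][k];
      H[i + 1][k] = -sn[i] * H[i][k] + cs[i] * H[i + 1][k];
      H[i][k] = t;
    }
    const double d = std::hypot(H[k][k], H[k + 1][k]);  // new rotation
    cs[k] = H[k][k] / d;
    sn[k] = H[k + 1][k] / d;
    H[k][k] = d;
    H[k + 1][k] = 0.0;
    g[k + 1] = -sn[k] * g[k];
    g[k] = cs[k] * g[k];
    ++k;
    if (std::fabs(g[k]) <= tol) break;  // host-side convergence check
  }

  Vec y(k, 0.0);  // back substitution on the k-by-k triangular system
  for (int i = k - 1; i >= 0; --i) {
    double s = g[i];
    for (int j = i + 1; j < k; ++j) s -= H[i][j] * y[j];
    y[i] = s / H[i][i];
  }
  for (int i = 0; i < k; ++i) Axpy(y[i], Z[i], x);  // x += Z * y
  return std::fabs(g[k]);
}
```

Passing a dense 3x3 `matVec` and a diagonal-scaling `precond` is enough to exercise it; the small Hessenberg and Givens work stays on the host in this sketch, which mirrors the host-driven control flow that this PR leaves in place.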

Related Work

This PR follows the review direction discussed in #2822, where the request was to show a working end-to-end GPU linear solve path before splitting out additional infrastructure work.

It also follows the implementation preferences discussed in #2816:

cuSPARSE for SpMV
cuBLAS for dot / norm
custom CUDA kernels for vector-vector operations
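As a point of reference for the SpMV piece, the following is a small host-side sketch of a BSR matrix-vector product that could serve as a CPU check against the cuSPARSE result. The `BsrMatrix` struct and `BsrSpMV` function are hypothetical and do not mirror SU2's matrix class; the sketch assumes square blocks stored row-major and a square matrix.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical BSR container: square blocks, row-major within each block.
struct BsrMatrix {
  int numBlockRows;
  int blockSize;
  std::vector<int> rowPtr;     // size numBlockRows + 1
  std::vector<int> colInd;     // block-column index of each stored block
  std::vector<double> values;  // blockSize * blockSize entries per stored block
};

// y = A * x, computed block row by block row.
std::vector<double> BsrSpMV(const BsrMatrix& A, const std::vector<double>& x) {
  const int bs = A.blockSize;
  std::vector<double> y(static_cast<std::size_t>(A.numBlockRows) * bs, 0.0);
  for (int ib = 0; ib < A.numBlockRows; ++ib) {
    for (int k = A.rowPtr[ib]; k < A.rowPtr[ib + 1]; ++k) {
      const double* blk = &A.values[static_cast<std::size_t>(k) * bs * bs];
      const int jb = A.colInd[k];
      for (int r = 0; r < bs; ++r)
        for (int c = 0; c < bs; ++c)
          y[ib * bs + r] += blk[r * bs + c] * x[jb * bs + c];
    }
  }
  return y;
}
```

Comparing this output against the device result for the same `x` is one simple way to isolate SpMV discrepancies from the rest of the solver.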

Suggested review order:

53bacf193f Cache CUDA SpMV cuSPARSE resources
08fde80e1e Add CUDA FGMRES and Jacobi scaffolding
fde2c145cf Implement CUDA vector primitives
2b4f9d8716 Implement CUDA FGMRES solve path
9c344ee793 Implement CUDA Jacobi preconditioner

Validation

Validated locally with:

python3.12 -m pre_commit run --all-files
serial CUDA build compilation
mixed-precision CUDA build compilation
serial CPU build compilation
OpenMP CPU build compilation
CPU/GPU numerical comparison on 6 representative cases, each tested with LINEAR_SOLVER_PREC=NONE and LINEAR_SOLVER_PREC=JACOBI
nsys profiling
ncu profiling

Representative cases used for validation:

periodic2d_sector
udf_lam_flatplate_s
udf_lam_flatplate_m
udf_lam_flatplate_l
udf_test_11_probes_s
udf_test_11_probes_m

In short: this branch compiles, the end-to-end CUDA FGMRES path runs successfully on the tested cases, and the GPU-side results are numerically consistent with the CPU-side results. Across the tested cases, the CPU and GPU residual histories either match exactly or differ only at the level of floating-point roundoff.
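One way to quantify "roundoff level" when comparing residual histories is a worst-case relative difference. The helper below is an illustrative sketch only: `MaxRelDiff` is a hypothetical name, and this PR does not specify the comparison method or tolerance that was actually used.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Largest entry-wise relative difference between two residual histories.
// Returns infinity when the histories have different lengths, since that
// already indicates different iteration counts.
double MaxRelDiff(const std::vector<double>& cpu, const std::vector<double>& gpu) {
  if (cpu.size() != gpu.size()) return std::numeric_limits<double>::infinity();
  double worst = 0.0;
  for (std::size_t i = 0; i < cpu.size(); ++i) {
    const double scale = std::max(std::fabs(cpu[i]), std::fabs(gpu[i]));
    if (scale > 0.0) worst = std::max(worst, std::fabs(cpu[i] - gpu[i]) / scale);
  }
  return worst;
}
```

A result of exactly zero corresponds to matching histories, while a value within a few orders of magnitude of machine epsilon is what a roundoff-level difference would look like in double precision.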

Performance was also checked on the same representative cases against both a serial CPU build and a 20-thread OpenMP CPU build. The GPU path is faster than the serial CPU baseline on the medium and large cases. Against the 20-thread OpenMP baseline, it offers no benefit on the smallest cases but still shows a clear speedup on the medium and large ones.

The simple Jacobi path is numerically valid, but is not yet a net performance win on these cases.

PR Checklist

I am submitting my contribution to the develop branch.
My contribution generates no new compiler warnings (try with --warnlevel=3 when using meson).
My contribution is commented and consistent with SU2 style (https://su2code.github.io/docs_v7/Style-Guide/).
I used the pre-commit hook to prevent dirty commits and used pre-commit run --all to format old commits.
I have added a test case that demonstrates my contribution, if necessary.
I have updated appropriate documentation (Tutorials, Docs Page, config_template.cpp), if necessary.

LwhJesse added 5 commits June 1, 2026 01:48

Cache CUDA SpMV cuSPARSE resources

53bacf1

Add CUDA FGMRES and Jacobi scaffolding

08fde80

Implement CUDA vector primitives

fde2c14

Implement CUDA FGMRES solve path

2b4f9d8

Implement CUDA Jacobi preconditioner

9c344ee

LwhJesse wants to merge 5 commits into su2code:develop from LwhJesse:gpu/initial-cuda-fgmres
