
fix(cuda): include cooperative_groups/reduce.h for CUDA 13 (libkernels.so build) #123

Merged: dndungu merged 1 commit into main from fix/cuda13-cg-reduce-include on Jun 9, 2026

dndungu commented Jun 9, 2026

Problem

On CUDA 13 (nvcc 13.x), make shared fails while compiling gemv_q4k_sm121.cu:

error: namespace "cooperative_groups" has no member "plus"
    acc = cg::reduce(warp, acc, cg::plus<float>());

CUDA 13 moved cg::reduce and cg::plus out of <cooperative_groups.h> into <cooperative_groups/reduce.h>. Because gemv_q4k_sm121.cu included only <cooperative_groups.h>, the entire libkernels.so build fails on CUDA 13 toolchains (e.g. the GB10 DGX running CUDA 13.0).
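
For context on the failing line: on a full 32-lane tile, cg::reduce(warp, acc, cg::plus<float>()) returns the sum of every lane's acc to all lanes of the warp. The sketch below is an illustrative hand-written equivalent using __shfl_xor_sync, which needs no cooperative_groups header at all. It is not the library's internal implementation, and the names warp_sum_shfl, warp_sum_shfl_kernel and warp_sum_shfl_on_device are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <cmath>

// Illustrative sketch: a butterfly reduction that yields the same warp-wide sum
// as cg::reduce(warp, acc, cg::plus<float>()) on a full 32-lane tile.
// NOT the library's internal implementation; since the addition order may
// differ, results are not guaranteed bit-identical for arbitrary float inputs.
__device__ float warp_sum_shfl(float acc) {
    // Each step adds the value held by lane (lane ^ offset). After the five
    // steps (offset = 16, 8, 4, 2, 1) every lane holds the full sum.
    for (int offset = 16; offset > 0; offset >>= 1) {
        acc += __shfl_xor_sync(0xffffffffu, acc, offset);
    }
    return acc;
}

// Launch with exactly one warp (32 threads); lane 0 writes the result.
__global__ void warp_sum_shfl_kernel(const float* in, float* out) {
    float acc = warp_sum_shfl(in[threadIdx.x]);
    if (threadIdx.x == 0) {
        *out = acc;
    }
}

// Hypothetical host helper: sums 32 floats on the GPU.
// Returns NAN if any CUDA call fails.
float warp_sum_shfl_on_device(const float* host_in) {
    float* d_in = nullptr;
    float* d_out = nullptr;
    float result = NAN;
    if (cudaMalloc(&d_in, 32 * sizeof(float)) != cudaSuccess) {
        return NAN;
    }
    if (cudaMalloc(&d_out, sizeof(float)) != cudaSuccess) {
        cudaFree(d_in);
        return NAN;
    }
    bool ok = cudaMemcpy(d_in, host_in, 32 * sizeof(float),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        warp_sum_shfl_kernel<<<1, 32>>>(d_in, d_out);
        ok = cudaGetLastError() == cudaSuccess;
    }
    // A device-to-host cudaMemcpy waits for the kernel to finish.
    if (ok && cudaMemcpy(&result, d_out, sizeof(float),
                         cudaMemcpyDeviceToHost) != cudaSuccess) {
        result = NAN;
    }
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

The XOR butterfly leaves the total in every lane, which mirrors cg::reduce handing the result to all threads of the tile; only lane 0 needs to store it.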

Fix

Add #include <cooperative_groups/reduce.h> to gemv_q4k_sm121.cu. Verified: make shared CUDA_ARCH=sm_121 builds cleanly on the GB10 DGX (CUDA 13.0) with this change. No behavior change is expected on older CUDA versions: the include only adds declarations and leaves the kernel code untouched.
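
A minimal sketch of the fixed pattern, assuming a standalone test file rather than the real gemv_q4k_sm121.cu (whose kernel body is not part of this PR). The kernel warp_sum_cg_kernel and the host helper warp_sum_cg_on_device are hypothetical; the point is that the cg::reduce / cg::plus call compiles once <cooperative_groups/reduce.h> is included alongside <cooperative_groups.h>.

```cuda
#include <cooperative_groups.h>
#include <cooperative_groups/reduce.h>  // declares cg::reduce and cg::plus; needed on CUDA 13
#include <cuda_runtime.h>
#include <cmath>

namespace cg = cooperative_groups;

// Hypothetical reproducer (not the actual gemv kernel): one warp sums 32 floats
// using the same cg::reduce / cg::plus call that failed to compile.
__global__ void warp_sum_cg_kernel(const float* in, float* out) {
    cg::thread_block block = cg::this_thread_block();
    auto warp = cg::tiled_partition<32>(block);  // 32-thread tile = one warp

    float acc = in[warp.thread_rank()];
    acc = cg::reduce(warp, acc, cg::plus<float>());  // every lane receives the sum

    if (warp.thread_rank() == 0) {
        *out = acc;
    }
}

// Hypothetical host helper: sums 32 floats on the GPU with one warp.
// Returns NAN if any CUDA call fails.
float warp_sum_cg_on_device(const float* host_in) {
    float* d_in = nullptr;
    float* d_out = nullptr;
    float result = NAN;
    if (cudaMalloc(&d_in, 32 * sizeof(float)) != cudaSuccess) {
        return NAN;
    }
    if (cudaMalloc(&d_out, sizeof(float)) != cudaSuccess) {
        cudaFree(d_in);
        return NAN;
    }
    bool ok = cudaMemcpy(d_in, host_in, 32 * sizeof(float),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        warp_sum_cg_kernel<<<1, 32>>>(d_in, d_out);
        ok = cudaGetLastError() == cudaSuccess;
    }
    // A device-to-host cudaMemcpy waits for the kernel to finish.
    if (ok && cudaMemcpy(&result, d_out, sizeof(float),
                         cudaMemcpyDeviceToHost) != cudaSuccess) {
        result = NAN;
    }
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

Keeping both includes is harmless on toolkits where <cooperative_groups.h> alone was enough, which matches the PR's point that the change is additive.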

CUDA 13 moved cg::reduce / cg::plus out of <cooperative_groups.h> into
<cooperative_groups/reduce.h>. Without the explicit include, gemv_q4k_sm121.cu
fails to compile (libkernels.so build breaks) under nvcc 13.x. Verified: the
sm_121 (GB10) kernel build succeeds with this include added.
dndungu merged commit bcbdd9d into main on Jun 9, 2026
1 check passed
dndungu deleted the fix/cuda13-cg-reduce-include branch on June 9, 2026 at 06:06