SGEMM Optimization

This repository is a CUDA SGEMM case study presented as a technical whitepaper and kernel academy. It starts from readable FP32 baselines, climbs through tiled, bank-conflict-aware, double-buffered, and guarded Tensor Core WMMA paths, and frames every performance claim with explicit validation boundaries.
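The tiling step of the ladder can be illustrated on the host. The following is a minimal C++ sketch, not the repository's kernel: `sgemm_naive`, `sgemm_tiled`, and `TILE = 16` are hypothetical names and values. It assumes row-major storage and shows how the K loop is split into TILE-wide slabs. A tiled CUDA kernel stages each slab of A and B in shared memory, and each output tile maps to one thread block.

```cpp
#include <algorithm>
#include <vector>

// Naive reference: C = A * B, row-major; A is M x K, B is K x N, C is M x N.
void sgemm_naive(int M, int N, int K, const std::vector<float>& A,
                 const std::vector<float>& B, std::vector<float>& C) {
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k) acc += A[i * K + k] * B[k * N + j];
            C[i * N + j] = acc;
        }
}

// Blocked version (hypothetical TILE size). Each (i0, j0) pair is one output
// tile, the unit a thread block would own. Each k0 step is one "phase" in
// which a GPU kernel would load a TILE x TILE piece of A and of B into
// shared memory before accumulating. The min() bounds handle ragged edges.
template <int TILE = 16>
void sgemm_tiled(int M, int N, int K, const std::vector<float>& A,
                 const std::vector<float>& B, std::vector<float>& C) {
    std::fill(C.begin(), C.end(), 0.0f);
    for (int i0 = 0; i0 < M; i0 += TILE)
        for (int j0 = 0; j0 < N; j0 += TILE)
            for (int k0 = 0; k0 < K; k0 += TILE) {
                const int iMax = std::min(i0 + TILE, M);
                const int jMax = std::min(j0 + TILE, N);
                const int kMax = std::min(k0 + TILE, K);
                for (int i = i0; i < iMax; ++i)
                    for (int j = j0; j < jMax; ++j) {
                        float acc = C[i * N + j];  // partial sum from earlier phases
                        for (int k = k0; k < kMax; ++k)
                            acc += A[i * K + k] * B[k * N + j];
                        C[i * N + j] = acc;
                    }
            }
}
```

The arithmetic is unchanged; only the traversal order differs. On a GPU, this reordering is what lets each loaded element of A and B be reused TILE times from shared memory instead of being re-read from global memory.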

Why it stands out

Readable optimization ladder: every kernel stage exists to expose one bottleneck shift.
Evidence-first public story: correctness policy, benchmark scope, and local-versus-CI trust boundaries stay attached to every claim.
Interview-grade positioning: the Pages site is written so the project can be explained, defended, and audited under technical pressure.
Bilingual mirrored docs: English and Chinese routes stay structurally aligned across the full public site.
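A correctness policy for SGEMM usually compares against a reference result with a tolerance instead of requiring exact equality, because different kernels accumulate in different orders. The following is a minimal sketch under that assumption. `allclose_fp32` is a hypothetical helper, and its default `rtol`/`atol` values are illustrative, not the thresholds this repository uses.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: element-wise check of a kernel result against a
// reference (for example cuBLAS output copied back to the host).
// Passes when |got - ref| <= atol + rtol * |ref| for every element,
// the same mixed absolute/relative rule used by numpy.allclose.
bool allclose_fp32(const std::vector<float>& got, const std::vector<float>& ref,
                   float rtol = 1e-4f, float atol = 1e-5f) {
    if (got.size() != ref.size()) return false;
    for (std::size_t i = 0; i < got.size(); ++i) {
        // NaN or Inf on either side is always a failure.
        if (!std::isfinite(got[i]) || !std::isfinite(ref[i])) return false;
        const float diff = std::fabs(got[i] - ref[i]);
        if (diff > atol + rtol * std::fabs(ref[i])) return false;
    }
    return true;
}
```

The relative term scales with the magnitude of each output, which matters because the FP32 rounding error grows with K. Reduced-precision Tensor Core paths would plausibly need looser bounds than the pure FP32 kernels.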

Quick start

git clone https://github.com/LessUp/sgemm-optimization.git
cd sgemm-optimization

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc)
./build/bin/sgemm_benchmark -a
ctest --test-dir build

Runtime tests and benchmarks require a local CUDA-capable machine. Hosted CI covers repository integrity, documentation, OpenSpec validation, and Pages buildability.

GitHub Pages entry points

The README is the executive summary. The long-form technical narrative lives on Pages.

Goal	Entry point
Open English home	English Home
Open Chinese home	中文首页
Get oriented quickly	Project Guide
Inspect system structure	Architecture
Study the kernel ladder	Academy
Check what the evidence proves	Validation
Trace papers and related repos	Research Desk
Read normative repository requirements	OpenSpec Specs

Validation boundary

Environment	What it can prove
Hosted CI	Docs structure, route integrity, OpenSpec consistency, Pages buildability
Local CUDA GPU	Runtime correctness, fallback behavior, benchmark performance

This split is deliberate. CI keeps the repository coherent, but only local GPU execution can validate runtime behavior and speed claims.
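Fallback behavior is listed as something only a local GPU can prove. The decision itself is plain host logic, though. The following is a hedged sketch of what such a guard could look like. `SgemmPath` and `choose_sgemm_path` are hypothetical, and the multiple-of-16 requirement assumes a simple 16x16x16 WMMA kernel without edge handling. The compute-capability floor of 7.0 is the hardware requirement for WMMA.

```cpp
// Hypothetical dispatch guard for the WMMA path.
// Assumptions (not taken from the repository):
//  - the Tensor Core kernel uses 16x16x16 WMMA fragments with no boundary
//    handling, so M, N and K must each be multiples of 16;
//  - cc_major would come from cudaGetDeviceProperties (cudaDeviceProp::major).
enum class SgemmPath { TensorCoreWmma, Fp32Fallback };

SgemmPath choose_sgemm_path(int cc_major, int M, int N, int K) {
    const bool has_tensor_cores = cc_major >= 7;  // WMMA needs sm_70 or newer
    const bool tile_aligned = (M % 16 == 0) && (N % 16 == 0) && (K % 16 == 0);
    return (has_tensor_cores && tile_aligned) ? SgemmPath::TensorCoreWmma
                                              : SgemmPath::Fp32Fallback;
}
```

CI can unit-test a guard like this, but whether the chosen kernel then runs correctly on real hardware still belongs on the local-GPU side of the boundary.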

Source map

src/kernels/   CUDA SGEMM implementations
src/utils/     CUDA RAII, verification, benchmark helpers
src/main.cu    benchmark CLI
tests/         Google Test coverage against cuBLAS
docs/          VitePress whitepaper and academy, mirrored under /en and /zh
openspec/      stable specs and change workflow
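Benchmark helpers for SGEMM conventionally report throughput in GFLOP/s. A product C = A * B with A of size M x K and B of size K x N performs M*N*K multiply-adds, or 2*M*N*K floating-point operations. The following is a minimal sketch of that conversion. `sgemm_gflops` is a hypothetical name, and it assumes the elapsed time is in milliseconds, which is the unit `cudaEventElapsedTime` reports.

```cpp
// Hypothetical helper: SGEMM throughput from a measured kernel time.
// FLOPs = 2 * M * N * K (one multiply and one add per inner-product term).
// GFLOP/s = FLOPs / (seconds * 1e9) = FLOPs / (milliseconds * 1e6).
double sgemm_gflops(long long M, long long N, long long K, double elapsed_ms) {
    const double flops = 2.0 * static_cast<double>(M) * N * K;
    return flops / (elapsed_ms * 1e6);
}
```

For example, a 1024 x 1024 x 1024 multiply that takes 1 ms corresponds to about 2147 GFLOP/s. A fused multiply-add counts as two operations under this convention.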

License

MIT. See LICENSE.md.
