perf: speed up SVD (~4.7x) and symmetric EVD (~1.7x) via transposed internal storage by lpatiny · Pull Request #167 · mljs/matrix

lpatiny · 2023-09-29T13:56:32Z

Summary

The decomposition algorithms iterate their hot inner loops down columns (the row index varies), but Matrix stores data row-major (data[row][col]). Those column walks thrash the CPU cache. This PR makes the hot loops scan memory sequentially by storing the worked-on matrices transposed internally and transposing them back before returning — so the public API and the numerical results are unchanged.
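To make the access-pattern change concrete, here is a minimal sketch, not the code from this PR. It assumes a matrix held as an array of row arrays (matching the data[row][col] layout described above); the function names `columnSumsStrided` and `columnSumsTransposed` are hypothetical. Both functions perform the same additions in the same order; only the memory they touch differs.

```javascript
// Build the transpose of a row-major matrix (array of Float64Array rows).
function transpose(data) {
  const rows = data.length;
  const cols = data[0].length;
  const out = [];
  for (let j = 0; j < cols; j++) {
    const row = new Float64Array(rows);
    for (let i = 0; i < rows; i++) row[i] = data[i][j];
    out.push(row);
  }
  return out;
}

// Column walk: the inner loop varies the row index, so every step
// reads from a different row array.
function columnSumsStrided(data) {
  const rows = data.length;
  const cols = data[0].length;
  const sums = new Float64Array(cols);
  for (let j = 0; j < cols; j++) {
    let s = 0;
    for (let i = 0; i < rows; i++) s += data[i][j];
    sums[j] = s;
  }
  return sums;
}

// Same arithmetic on the transposed copy: the inner loop now reads one
// contiguous row from start to end.
function columnSumsTransposed(dataT) {
  const sums = new Float64Array(dataT.length);
  for (let j = 0; j < dataT.length; j++) {
    const row = dataT[j];
    let s = 0;
    for (let i = 0; i < row.length; i++) s += row[i];
    sums[j] = s;
  }
  return sums;
}

const a = [Float64Array.of(1, 2, 3), Float64Array.of(4, 5, 6)];
const strided = columnSumsStrided(a);
const sequential = columnSumsTransposed(transpose(a));
console.log(Array.from(strided), Array.from(sequential));
```

On a 2×3 toy input the two variants are indistinguishable in speed; the difference only shows up once a column no longer fits in cache, which is the regime the 1000×1000 benchmarks exercise.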

This supersedes the original exploratory commit (which globally swapped get/set). A global swap is the wrong fix: it speeds SVD/EVD but makes LU ~1.6× slower, and breaks non-square matrices and every consumer that assumes row-major data. The win is captured per-algorithm instead.

What was done

SVD (src/dc/svd.js): work on at = value.transpose() and accumulate the U/V singular vectors in transposed storage, then transpose back. The Householder and QR-rotation inner loops become sequential.
Symmetric EVD (src/dc/evd.js, tred2/tql2): accumulate the eigenvectors V transposed, then transpose back.
Non-symmetric EVD (orthes/hqr2): left row-major on purpose. Its two O(n³) phases (the QR sweep and the eigenvector back-transform Σ V(i,k)·H(k,j)) have opposite layout preferences, so a single static transpose helps one phase at the expense of the other. The net gain does not justify a large, fragile rewrite of the complex-eigenvector code.
LU: untouched. Its only O(n³) loop is already a sequential row scan, so it is cache-optimal as-is.
Tests: EVD previously had only a single 2×2 example. Added a reconstruction oracle (A·V = V·D, orthonormality of symmetric eigenvectors, complex eigenvalue pairs) for symmetric (4×4, 12×12) and non-symmetric matrices.
Benchmark: scripts/benchmark.js is now deterministic (seeded inputs), warmed up, and reports both EVD paths.
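The reconstruction oracle mentioned under Tests can be sketched roughly as below. This is an illustration on plain nested arrays, not the test code added in this PR: the helper names `reconstructionResidual` and `orthonormalityResidual` are hypothetical, and only the real-eigenvalue case is covered (complex pairs would need a block-diagonal D).

```javascript
// Largest absolute entry of A·V − V·D, where D is diagonal and given
// as a plain list of real eigenvalues.
function reconstructionResidual(a, v, eigenvalues) {
  const n = a.length;
  let worst = 0;
  for (let i = 0; i < n; i++) {
    for (let j = 0; j < n; j++) {
      let av = 0;
      for (let k = 0; k < n; k++) av += a[i][k] * v[k][j];
      const vd = v[i][j] * eigenvalues[j]; // (V·D)(i,j) for diagonal D
      worst = Math.max(worst, Math.abs(av - vd));
    }
  }
  return worst;
}

// Largest absolute entry of Vᵀ·V − I (zero when the columns are orthonormal).
function orthonormalityResidual(v) {
  const n = v.length;
  let worst = 0;
  for (let i = 0; i < n; i++) {
    for (let j = 0; j < n; j++) {
      let dot = 0;
      for (let k = 0; k < n; k++) dot += v[k][i] * v[k][j];
      worst = Math.max(worst, Math.abs(dot - (i === j ? 1 : 0)));
    }
  }
  return worst;
}

// Known decomposition of [[2, 1], [1, 2]]: eigenvalues 1 and 3, with
// eigenvectors (1, -1)/√2 and (1, 1)/√2 stored as the columns of V.
const s = Math.SQRT1_2;
const A = [[2, 1], [1, 2]];
const V = [[s, s], [-s, s]];
const eigenvalues = [1, 3];

const recon = reconstructionResidual(A, V, eigenvalues);
const ortho = orthonormalityResidual(V);
console.log(recon < 1e-12, ortho < 1e-12);
```

Checking a residual instead of hard-coded eigenvectors keeps the test independent of eigenvector sign and ordering, which is why it can guard a storage refactor.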

Speed (1000×1000 unless noted, deterministic, warmed up)

Decomposition	before	after	speedup
SVD	17764 ms	3804 ms	~4.7×
EVD, symmetric (600×600)	666 ms	386 ms	~1.7×
EVD, symmetric (300×300)	78 ms	51 ms	~1.5×
LU	232 ms	238 ms	unchanged (intentional)
EVD, non-symmetric	—	—	left row-major (see above)

(Numbers from an M-series laptop; ratios are what matter.)
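For readers who want to reproduce this kind of measurement, the following is a minimal sketch of a seeded, warmed-up timing loop. It is not scripts/benchmark.js: the helpers `randomMatrix` and `timeIt` are hypothetical, the warm-up and run counts are arbitrary, and mulberry32 is used only as a self-contained stand-in for a seeded generator.

```javascript
// mulberry32: a small seeded PRNG that returns floats in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function random() {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fill a rows×cols matrix from the given generator.
function randomMatrix(rows, cols, random) {
  const data = [];
  for (let i = 0; i < rows; i++) {
    const row = new Float64Array(cols);
    for (let j = 0; j < cols; j++) row[j] = random();
    data.push(row);
  }
  return data;
}

// Run `fn` a few times untimed so the JIT has compiled the hot loops,
// then report the best of several timed runs in milliseconds.
function timeIt(fn, { warmup = 2, runs = 3 } = {}) {
  for (let i = 0; i < warmup; i++) fn();
  let best = Infinity;
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    fn();
    best = Math.min(best, performance.now() - start);
  }
  return best;
}

// The same seed always yields the same input matrix.
const first = randomMatrix(3, 3, mulberry32(42));
const second = randomMatrix(3, 3, mulberry32(42));
const sameInput = first.every((row, i) =>
  row.every((x, j) => x === second[i][j]),
);

// Placeholder workload; a real benchmark would run a decomposition here.
const ms = timeIt(() => randomMatrix(50, 50, mulberry32(1)));
console.log(sameInput, ms >= 0);
```

Seeding removes input variance between the "before" and "after" runs, and the warm-up keeps JIT compilation time out of the reported figure.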

Results are bit-identical, not just close

Because only storage location changes and never the order of arithmetic, every IEEE-754 operation sees the same operands in the same sequence. Verified by dumping full outputs (singular/eigen values + complete vector matrices, full float precision) from this branch and from main, then byte-comparing with cmp:

SVD: byte-for-byte identical across square / tall / wide shapes × autoTranspose on/off.
EVD: byte-for-byte identical across n = 3, 5, 10, 25, 40, 80 × {symmetric, symmetric auto-detected, general}.

Zero differing bytes in both.
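The distinction between "bit-identical" and "close" can be illustrated with a small sketch. The helper `bytesEqual` is hypothetical and only mimics, in memory, what a `cmp` on dumped files checks; it is not the verification script used for this PR.

```javascript
// Compare two lists of doubles by their raw IEEE-754 bytes.
function bytesEqual(x, y) {
  const a = new Uint8Array(Float64Array.from(x).buffer);
  const b = new Uint8Array(Float64Array.from(y).buffer);
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// Same operands, same order: identical bits.
const leftToRight = 0.1 + 0.2 + 0.3;
// Same operands, different grouping: a nearby but different double.
const regrouped = 0.1 + (0.2 + 0.3);

console.log(bytesEqual([leftToRight], [leftToRight])); // true
console.log(bytesEqual([leftToRight], [regrouped])); // false
console.log(bytesEqual([0], [-0])); // false, although 0 === -0
```

A byte comparison is stricter than `===` or a tolerance check: it flags reordered sums and even a flipped sign on zero, so passing it is strong evidence that only the storage layout changed.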

Follow-up

The non-symmetric EVD path could still be sped up (~1.5×) with a phase-split transpose (transpose V between the QR and back-transform phases). Left out of this PR to keep it low-risk; the new test oracle is in place to support it later.

codecov · 2023-09-29T13:58:53Z

Codecov Report

❌ Patch coverage is 64.97175% with 62 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.46%. Comparing base (67cda77) to head (93bdb7e).

Files with missing lines	Patch %	Lines
scripts/benchmark.js	0.00%	56 Missing and 1 partial ⚠️
src/dc/svd.js	95.38%	3 Missing ⚠️
src/dc/evd.js	94.28%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #167      +/-   ##
==========================================
+ Coverage   64.83%   68.46%   +3.62%     
==========================================
  Files          47       48       +1     
  Lines        5625     5727     +102     
  Branches      954     1013      +59     
==========================================
+ Hits         3647     3921     +274     
+ Misses       1967     1794     -173     
- Partials       11       12       +1


The SVD hot loops iterate over rows for a fixed column. With the row-major backing store (data[row][col]) these are column walks that thrash the cache. Work on the transpose of the input and store the U/V singular vectors transposed during the computation, then transpose them back before returning. The inner loops then scan memory sequentially. The public API and results are unchanged. Measured (1000x1000, deterministic input): 17764 ms -> 3804 ms (~4.7x). LU and EVD are unaffected. All tests pass. Also remove the dead exploratory get/set comments in matrix.js and make scripts/benchmark.js deterministic (seeded) and warmed up. Assisted-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…airs) The eigenvalue decomposition only had a single 2x2 example. Add reconstruction tests (A·V = V·D) for symmetric (4x4 and 12x12), non-symmetric real-eigenvalue, and complex-eigenvalue-pair matrices, plus orthonormality of symmetric eigenvectors. These exercise tred2/tql2 and orthes/hqr2 and guard the upcoming performance refactor. Assisted-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

tred2/tql2 (the symmetric eigenproblem) accumulate the eigenvectors in V with the row index varying in the hot loops, i.e. column walks of the row-major backing store. Store V transposed during the reduction and transpose it back before returning; the inner loops then scan memory sequentially. Measured (symmetric, deterministic input): 600x600 666 ms -> 386 ms (~1.7x), 300x300 ~1.5x. Results unchanged (guarded by the new reconstruction tests). The non-symmetric path (orthes/hqr2) is deliberately left row-major: its two O(n^3) phases (QR sweep vs eigenvector back-transform) have opposite layout preferences, so a single transposed storage cannot help both. Add an "EVD (symmetric)" column to scripts/benchmark.js. Assisted-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The transposed-storage optimization restored the logical layout of the output matrices with `M.transpose()`, which allocates a second full matrix while the old one is still live (transient ~1.5x peak memory). These outputs are square (SVD's V and EVD's V are always n x n; SVD's U is square whenever the input is), so transpose them in place via a new `transposeSquareInPlace` helper. The common square case needs no allocation, so the optimization is now memory-neutral versus the original implementation. The working copy `at = value.transpose()` already replaced the original `value.clone()`, so it is not an extra allocation. Results remain bit-identical (verified by byte comparison against main) and the speedups are unchanged. A non-square SVD U falls back to the allocating `transpose()`. Assisted-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
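An in-place square transpose of the kind this commit describes can be sketched as follows. The helper name is taken from the commit message, but the body is an assumption written against a plain array-of-rows matrix; the actual helper in the PR may differ.

```javascript
// Swap each entry above the diagonal with its mirror below it.
// No second matrix is allocated; the input is modified and returned.
function transposeSquareInPlace(data) {
  const n = data.length;
  for (let i = 0; i < n; i++) {
    if (data[i].length !== n) {
      throw new RangeError('matrix must be square');
    }
  }
  for (let i = 0; i < n; i++) {
    for (let j = i + 1; j < n; j++) {
      const tmp = data[i][j];
      data[i][j] = data[j][i];
      data[j][i] = tmp;
    }
  }
  return data;
}

const m = [
  [1, 2, 3],
  [4, 5, 6],
  [7, 8, 9],
];
transposeSquareInPlace(m);
console.log(m); // [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
```

The swap only works when rows and columns have the same count, which is why a non-square matrix still needs a freshly allocated result.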

Assisted-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Replace the hand-rolled mulberry32 PRNG with the ecosystem's ml-xsadd (XORSHIFT-ADD) generator via `new XSadd(seed).random`, in the EVD reconstruction tests and scripts/benchmark.js. Add ml-xsadd to devDependencies. Assisted-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Assisted-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat: optimize SVD ?

e82b2ba

lpatiny added 4 commits June 20, 2026 11:04

Merge main into speed

b50b922

lpatiny changed the title ~~feat: optimize SVD ?~~ perf: speed up SVD (~4.7x) and symmetric EVD (~1.7x) via transposed internal storage Jun 20, 2026

lpatiny marked this pull request as ready for review June 20, 2026 09:52

lpatiny added 4 commits June 20, 2026 12:03

style: apply prettier formatting to EVD tests

93bdb7e

Assisted-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chore: ignore .claude

903f3f7

Assisted-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

lpatiny requested review from Copilot and targos and removed request for Copilot June 20, 2026 11:54

Copilot started reviewing on behalf of lpatiny June 20, 2026 12:03 View session

lpatiny wants to merge 9 commits into main from speed
