Codecov Report
@@ Coverage Diff @@
## main #167 +/- ##
==========================================
+ Coverage 64.83% 68.46% +3.62%
==========================================
Files 47 48 +1
Lines 5625 5727 +102
Branches 954 1013 +59
==========================================
+ Hits 3647 3921 +274
+ Misses 1967 1794 -173
- Partials 11 12 +1
The SVD hot loops iterate over rows for a fixed column. With the row-major backing store (data[row][col]) these are column walks that thrash the cache. Work on the transpose of the input and store the U/V singular vectors transposed during the computation, then transpose them back before returning. The inner loops then scan memory sequentially. The public API and results are unchanged. Measured (1000x1000, deterministic input): 17764 ms -> 3804 ms (~4.7x). LU and EVD are unaffected. All tests pass. Also remove the dead exploratory get/set comments in matrix.js and make scripts/benchmark.js deterministic (seeded) and warmed up. Assisted-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
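To illustrate the access-pattern change described in this commit, here is a minimal sketch. It is not the code in src/dc/svd.js: it assumes a plain array-of-rows store (data[row][col]), and the helper names `transposeRows`, `columnNormSq` and `rowNormSq` are hypothetical. Summing squares down a column touches one element in every row array, while the same sum over a row of the transpose reads a single array in order, with the operands in the same sequence.

```javascript
// Hypothetical helpers for illustration; not part of ml-matrix.

// Build the transpose of an array-of-rows matrix.
function transposeRows(a) {
  const rows = a.length;
  const cols = a[0].length;
  const t = [];
  for (let j = 0; j < cols; j++) {
    const row = new Float64Array(rows);
    for (let i = 0; i < rows; i++) row[i] = a[i][j];
    t.push(row);
  }
  return t;
}

// Column walk: the row index varies, so each step jumps to another row array.
function columnNormSq(a, col) {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i][col] * a[i][col];
  return s;
}

// Same sum on the transposed store: one contiguous row, scanned in order.
function rowNormSq(at, row) {
  const r = at[row];
  let s = 0;
  for (let i = 0; i < r.length; i++) s += r[i] * r[i];
  return s;
}

const a = [
  [1, 2, 3],
  [4, 5, 6],
  [7, 8, 9],
  [10, 11, 12],
];
const at = transposeRows(a);
// Both calls add the same operands in the same order.
console.log(columnNormSq(a, 1), rowNormSq(at, 1));
```

Only the memory layout differs between the two loops, which is why the results can stay unchanged while the scan becomes sequential.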
…airs) The eigenvalue decomposition only had a single 2x2 example. Add reconstruction tests (A·V = V·D) for symmetric (4x4 and 12x12), non-symmetric real-eigenvalue, and complex-eigenvalue-pair matrices, plus orthonormality of symmetric eigenvectors. These exercise tred2/tql2 and orthes/hqr2 and guard the upcoming performance refactor. Assisted-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
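As a rough illustration of the reconstruction check A·V = V·D, the sketch below computes the largest absolute entry of A·V − V·D for a small matrix whose eigenpairs are known by hand. It uses plain arrays rather than ml-matrix, and `matMul` and `reconstructionResidual` are hypothetical names, not helpers from the test suite.

```javascript
// Hypothetical test oracle for illustration; plain arrays, no library.

// Multiply two dense matrices stored as arrays of rows.
function matMul(a, b) {
  const n = a.length;
  const k = b.length;
  const m = b[0].length;
  const c = [];
  for (let i = 0; i < n; i++) {
    const row = new Array(m).fill(0);
    for (let p = 0; p < k; p++) {
      for (let j = 0; j < m; j++) row[j] += a[i][p] * b[p][j];
    }
    c.push(row);
  }
  return c;
}

// Largest absolute entry of A·V - V·D; near zero for a valid decomposition.
function reconstructionResidual(A, V, D) {
  const left = matMul(A, V);
  const right = matMul(V, D);
  let worst = 0;
  for (let i = 0; i < left.length; i++) {
    for (let j = 0; j < left[i].length; j++) {
      worst = Math.max(worst, Math.abs(left[i][j] - right[i][j]));
    }
  }
  return worst;
}

// Known case: [[2, 1], [1, 2]] has eigenvalues 1 and 3.
const s = Math.SQRT1_2;
const A = [[2, 1], [1, 2]];
const V = [[s, s], [-s, s]]; // eigenvectors stored as columns
const D = [[1, 0], [0, 3]];
console.log(reconstructionResidual(A, V, D) < 1e-12); // true
```

A check of this shape does not depend on how the eigenvectors were computed, so it can guard a storage refactor without pinning exact output values.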
tred2/tql2 (the symmetric eigenproblem) accumulate the eigenvectors in V with the row index varying in the hot loops, i.e. column walks of the row-major backing store. Store V transposed during the reduction and transpose it back before returning; the inner loops then scan memory sequentially. Measured (symmetric, deterministic input): 600x600 666 ms -> 386 ms (~1.7x), 300x300 ~1.5x. Results unchanged (guarded by the new reconstruction tests). The non-symmetric path (orthes/hqr2) is deliberately left row-major: its two O(n^3) phases (QR sweep vs eigenvector back-transform) have opposite layout preferences, so a single transposed storage cannot help both. Add an "EVD (symmetric)" column to scripts/benchmark.js. Assisted-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The transposed-storage optimization restored the logical layout of the output matrices with `M.transpose()`, which allocates a second full matrix while the old one is still live (transient ~1.5x peak memory). These outputs are square (SVD's V and EVD's V are always n x n; SVD's U is square whenever the input is), so transpose them in place via a new `transposeSquareInPlace` helper. No allocation in the common square case, so the optimization is now memory-neutral versus the original implementation. The working copy `at = value.transpose()` already replaced the original `value.clone()`, so it is not an extra allocation. Results remain bit-identical (verified by byte comparison against main) and the speedups are unchanged. Non-square SVD U falls back to allocating transpose. Assisted-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
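The in-place idea can be sketched as follows. This is an assumption about the general technique, not the PR's `transposeSquareInPlace` itself; the function name `transposeSquareInPlaceSketch` and the array-of-rows input are hypothetical. Swapping each element above the diagonal with its mirror below it transposes a square matrix without allocating a second matrix.

```javascript
// Hypothetical sketch; the PR's transposeSquareInPlace may differ in detail.
function transposeSquareInPlaceSketch(data) {
  const n = data.length;
  for (let i = 0; i < n; i++) {
    if (data[i].length !== n) throw new RangeError('matrix must be square');
  }
  // Visit only the strict upper triangle and swap with the mirrored entry.
  for (let i = 0; i < n; i++) {
    for (let j = i + 1; j < n; j++) {
      const tmp = data[i][j];
      data[i][j] = data[j][i];
      data[j][i] = tmp;
    }
  }
  return data;
}

const m = [
  [1, 2, 3],
  [4, 5, 6],
  [7, 8, 9],
];
transposeSquareInPlaceSketch(m);
console.log(m); // [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
```

The swap only works when rows and columns have the same count, which matches the commit's note that a non-square U still needs the allocating transpose.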
Assisted-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the hand-rolled mulberry32 PRNG with the ecosystem's ml-xsadd (XORSHIFT-ADD) generator via `new XSadd(seed).random`, in the EVD reconstruction tests and scripts/benchmark.js. Add ml-xsadd to devDependencies. Assisted-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
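For readers unfamiliar with seeded inputs, the sketch below shows why a seeded generator makes tests and benchmarks reproducible. To stay dependency-free it uses a mulberry32 generator as a stand-in; the PR itself uses `new XSadd(seed).random` from ml-xsadd. The helpers `seededMatrix` and `timeMs` are hypothetical and only illustrate the seeded, warmed-up pattern.

```javascript
// Stand-in generator for illustration only; the PR uses ml-xsadd instead.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function random() {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper: an n x n array-of-rows matrix filled from a seeded stream.
function seededMatrix(n, seed) {
  const random = mulberry32(seed);
  const rows = [];
  for (let i = 0; i < n; i++) {
    const row = new Float64Array(n);
    for (let j = 0; j < n; j++) row[j] = random() * 2 - 1; // values in [-1, 1)
    rows.push(row);
  }
  return rows;
}

// Hypothetical timing helper: run untimed warm-up passes, then time one run.
function timeMs(fn, warmups = 2) {
  for (let i = 0; i < warmups; i++) fn();
  const start = Date.now();
  fn();
  return Date.now() - start;
}

const a = seededMatrix(4, 42);
const b = seededMatrix(4, 42);
console.log(a[2][3] === b[2][3]); // true: same seed, same input
```

With the same seed on both branches, any difference in timing or output comes from the code under test rather than from the input.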
Assisted-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
The decomposition algorithms iterate their hot inner loops down columns (the row index varies), but `Matrix` stores data row-major (`data[row][col]`). Those column walks thrash the CPU cache. This PR makes the hot loops scan memory sequentially by storing the worked-on matrices transposed internally and transposing them back before returning, so the public API and the numerical results are unchanged.

This supersedes the original exploratory commit (which globally swapped `get`/`set`). A global swap is the wrong fix: it speeds up SVD/EVD but makes LU ~1.6× slower, and breaks non-square matrices and every consumer that assumes row-major `data`. The win is captured per-algorithm instead.

What was done
- SVD (`src/dc/svd.js`): work on `at = value.transpose()` and accumulate the `U`/`V` singular vectors in transposed storage, then transpose back. The Householder and QR-rotation inner loops become sequential.
- Symmetric EVD (`src/dc/evd.js`, `tred2`/`tql2`): accumulate the eigenvectors `V` transposed, then transpose back.
- Non-symmetric EVD (`orthes`/`hqr2`): left row-major on purpose. Its two O(n³) phases (the QR sweep vs. the eigenvector back-transform Σ V(i,k)·H(k,j)) have opposite layout preferences, so a single static transpose cancels out. It is not worth the large, fragile rewrite of the complex-eigenvector code.
- Tests: EVD reconstruction tests (A·V = V·D, eigenvector orthonormality, complex eigenvalue pairs) for symmetric (4×4, 12×12) and non-symmetric matrices.
- Benchmark: `scripts/benchmark.js` is now deterministic (seeded inputs), warmed up, and reports both EVD paths.

Speed (1000×1000 unless noted, deterministic, warmed up)
(Numbers from an M-series laptop; ratios are what matter.)
Results are bit-identical, not just close
Because only storage location changes and never the order of arithmetic, every IEEE-754 operation sees the same operands in the same sequence. Verified by dumping full outputs (singular/eigen values + complete vector matrices, full float precision) from this branch and from `main`, then byte-comparing with `cmp`: `autoTranspose` on/off.
Zero differing bytes in both.
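The reasoning behind the bit-identical claim can be illustrated with a small sketch. It is not the verification used for the PR, which compared dumped output files with `cmp`; the helper `sameBytes` is hypothetical. Reading the same operands in the same order from a different layout yields the same bytes, while merely reordering the additions does not.

```javascript
// Hypothetical helper for illustration; the PR compared dumped files with cmp.
function sameBytes(x, y) {
  const a = new Uint8Array(Float64Array.from(x).buffer);
  const b = new Uint8Array(Float64Array.from(y).buffer);
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// The same three operands stored in two layouts.
const rowMajor = [[0.1, 0.2, 0.3]];
const transposed = [[0.1], [0.2], [0.3]];

// Same operands, same order, different storage.
let s1 = 0;
for (let k = 0; k < 3; k++) s1 += rowMajor[0][k];
let s2 = 0;
for (let k = 0; k < 3; k++) s2 += transposed[k][0];

// Same operands, reversed order of arithmetic.
let s3 = 0;
for (let k = 2; k >= 0; k--) s3 += rowMajor[0][k];

console.log(sameBytes([s1], [s2])); // true: layout alone changes nothing
console.log(sameBytes([s1], [s3])); // false: reordering changes the rounding
```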
Follow-up
The non-symmetric EVD path could still be sped up (~1.5×) with a phase-split transpose (transpose `V` between the QR and back-transform phases). Left out of this PR to keep it low-risk; the new test oracle is in place to support it later.