
Fix transitive propagation of matrix jobs through already-compiled dependants #770

Merged
f-f merged 1 commit into master from f-f/fix-transitive-cascade
May 28, 2026

Conversation

Member

@f-f f-f commented May 25, 2026

Follow-up to #763: with the current code we reach almost all the packages, except for a puzzling few (~50).

It turns out this was also caused by a chain of several bugs:

  • readMetadata was poisoning the AllMetadata cache: reading one metadata file would add it to the cached Map, and a later call to readAllMetadata would then return that partial Map instead of reading everything from disk, unless the git repo had been refreshed in between. This led to building an incorrect/incomplete CompilerIndex. This patch removes resetFromDisk so that we no longer cache single metadata reads.
  • the incomplete CompilerIndex was a problem because building it was lenient: if a package was present in the ManifestIndex but not in the MetadataIndex (as happens with the poisoned/partial AllMetadata described above), we would fall back to building the CompilerIndex with no purs version bounds for that package. This meant the package could be included in a solver plan for purs versions it was not compatible with. Since we now guarantee that every package has purs constraints, this patch no longer swallows this condition and crashes instead, because we would rather catch this kind of data inconsistency.
  • finally, the order in which packages are cascaded interacts badly with tight vs wide version bounds. With the current logic, a package that solves for a given compiler tries to enqueue all its dependants for matrix jobs, and we run the job for a dependant if all of its dependencies are compatible with that compiler. This means a package can fail to solve at the point in the cascade where it is tried, even though it would (and should) solve later on, and it is never tried again; concretely:
    • A depends on B which depends on C
    • B has very wide version bounds on C
    • there's a shared dependency D: older versions of C need old D, newer versions of C need new D, and A independently requires new D
    • C has many versions, and as soon as one that fits B's bound would compile with the new compiler, then we'd try to build B and succeed (with an old C, which pulls in old D)
    • we would then try to solve A, but that requires new D, so its build plan can't use that early C and it's forced onto a specific, newer C that only gets compiled later in the cascade
    • when that newer C finally compiles, the direct-dependants-only-cascade looks at the dependants of C, finds that B already built, and moves on
    • since A is only reachable through B it is never retried, even though it would now succeed
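
The first two bugs in the list above can be sketched in a few lines. This is only an illustration in Python, not the registry's code: `DISK`, `PoisonedCache`, `FixedCache` and `build_compiler_index` are hypothetical names, and the real readMetadata/readAllMetadata and CompilerIndex construction will differ in detail.

```python
# Hypothetical stand-in for the metadata files on disk.
DISK = {"a": "meta-a", "b": "meta-b", "c": "meta-c"}


class PoisonedCache:
    """Single reads write into the same map that read_all treats as complete."""

    def __init__(self):
        self.all_metadata = None  # None means "nothing cached yet"

    def read_one(self, name):
        if self.all_metadata is None:
            self.all_metadata = {}
        self.all_metadata[name] = DISK[name]  # the poisoning write
        return DISK[name]

    def read_all(self):
        # A non-empty cache is assumed to be complete, so a partial map leaks out.
        if self.all_metadata is None:
            self.all_metadata = dict(DISK)
        return self.all_metadata


class FixedCache:
    """Single reads never touch the cache of the full map."""

    def __init__(self):
        self.all_metadata = None

    def read_one(self, name):
        return DISK[name]

    def read_all(self):
        if self.all_metadata is None:
            self.all_metadata = dict(DISK)
        return self.all_metadata


def build_compiler_index(manifests, metadata, strict):
    """Pair every package in the manifest index with its metadata."""
    index = {}
    for name in manifests:
        if name in metadata:
            index[name] = metadata[name]
        elif strict:
            # Strict: a manifest without metadata is a data inconsistency.
            raise KeyError(f"{name} is in the manifest index but has no metadata")
        else:
            # Lenient: the package is kept with no compiler bounds at all.
            index[name] = None
    return index
```

With the poisoned cache, calling `read_one("a")` before `read_all()` yields a map containing only `a`, and the lenient index then carries `b` and `c` without bounds. The strict variant raises on the same input, which is the behaviour the patch opts for.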

To fix this last problem we change the cascading logic from "for every package that solves with compiler Y, try to solve and enqueue its dependants" to "for every package that solves with compiler Y, try to solve and enqueue its dependants; if a dependant is already compatible, recurse through its dependants until you reach packages that we have not solved yet, or the bottom of the tree".
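
A minimal sketch of the two cascading strategies, assuming a plain dependants map and a `can_solve` predicate. Both are hypothetical simplifications: the real code works on registry package versions and a solver, and none of these function names come from the patch.

```python
def direct_cascade(pkg, dependants, solved, can_solve):
    """Old behaviour: only look at the direct dependants of pkg."""
    enqueued = []
    for dep in dependants.get(pkg, []):
        if dep not in solved and can_solve(dep):
            enqueued.append(dep)
    return enqueued


def transitive_cascade(pkg, dependants, solved, can_solve):
    """New behaviour: look through dependants that are already compatible."""
    enqueued = []
    seen = set()
    stack = [pkg]
    while stack:
        current = stack.pop()
        for dep in dependants.get(current, []):
            if dep in seen:
                continue
            seen.add(dep)
            if dep in solved:
                # Already compatible: keep walking through its dependants.
                stack.append(dep)
            elif can_solve(dep):
                # Not solved yet: try it now and stop descending here.
                enqueued.append(dep)
    return enqueued


# The A -> B -> C chain from the description: B already built with an old C.
dependants = {"C": ["B"], "B": ["A"], "A": []}
solved = {"B"}


def can_solve(pkg):
    # Assumption for the example: the newer C has just compiled, so A now solves.
    return True


missed = direct_cascade("C", dependants, solved, can_solve)
reached = transitive_cascade("C", dependants, solved, can_solve)
```

In this setup the direct cascade enqueues nothing, because B is already built, while the transitive cascade walks through B and enqueues A. The `seen` set keeps the walk from visiting a shared dependant more than once.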

It's a bit convoluted, but I hope it makes sense. I have added tests for all these issues, and in particular there is an integration test that verifies that this new logic works properly.

Last but not least, we add updateCompilerIndex, which should let us update the CompilerIndex for each job instead of rebuilding it every time, hopefully shortening the job duration. For sanity I have not included that change here; I'd rather merge it separately.

@f-f f-f requested a review from thomashoneyman May 25, 2026 21:45
Member

@thomashoneyman thomashoneyman left a comment


Very good find 👏

@f-f f-f merged commit fa708f6 into master May 28, 2026
16 checks passed
@f-f f-f deleted the f-f/fix-transitive-cascade branch May 28, 2026 21:41
2 participants