Fix transitive propagation of matrix jobs through already-compiled dependants#770
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Followup on #763 - with the current code we almost get to all the packages, except for a puzzling few (~50).
It turns out this was also happening due to the chaining of several bugs:
readMetadatawas poisoning the cache for theAllMetadata: reading one metadata file would add it to the cached Map, which would then not be read in full when callingreadAllMetadataif the git repo was not refreshed. This lead to building an incorrect/incompleteCompilerIndex; this patch removesresetFromDiskso that we do not cache single metadata reads anymore.CompilerIndexwas an issue because it was lenient: if a package was present in theManifestIndexbut not in theMetadataIndex(as it would be the case for the case above with a poisoned/partialAllMetadata), then we would fall back to building theCompilerIndexwith nopursversion bounds for that package. This meant that the package could be included in a solver plan forpursversions for which it was not compatible with. Since we now guarantee that every package haspursconstraints, this patch does not swallow this condition anymore, but crashes instead because we should rather catch this kind of data inconsistencies.Xthat solves for compilerYtries to enqueue all its dependents for matrix jobs. If all the dependencies of a package were compatible with compilerYthen we'd run the job) we could have packages that do not solve at time X but would (and should) solve at time Y; concretely:Adepends onBwhich depends onCBhas very wide version bounds onCD: older versions ofCneed oldD, newer versions ofCneed newD, andAindependently requires newDChas many versions, and as soon as one that fitsB's bound would compile with the new compiler, then we'd try to buildBand succeed (with an oldC, which pulls in oldD)A, but that requires newD, so its build plan can't use that earlyCand it's forced onto a specific, newerCthat only gets compiled later in the cascadeCfinally compiles, the direct-dependants-only-cascade looks at the dependants ofC, finds thatBalready built, and moves onAis only reachable throughBit is never retried, even though it would now succeedTo fix this latter problem we change the cascading logic from "for every package that solves with compiler Y try to solve and enqueue its dependands" to "for every package that solves with compiler Y try to solve and enqueue its dependands; if a dependant is already compatible, recurse through its dependands until you reach packages that we have not solved yet, or the bottom of the tree".
It's a bit convoluted, but I hope it makes sense. I have added tests for all these issues, and in particular there is an integration test that verifies that this new logic works properly.
Last but not least, we add
updateCompilerIndexwhich should help to update theCompilerIndexfor each job so we don't need to rebuild it every time, hopefully shortening the job duration. For sanity I have not included that change here, I'd rather merge this separately.