fix(scoring): re-enqueue scoring after commit to avoid stuck SCORING … #2420
Open
AybH26 wants to merge 1 commit into
…rows

When the compute worker PATCHes a submission to status=SCORING, the API serializer used to call run_submission() synchronously inside the same DB transaction. If the broker (RabbitMQ) was unreachable at that exact moment, the status row would commit but the scoring task would never be published, leaving the submission stuck in SCORING forever (no recovery: the 24h cleanup only rescues RUNNING rows).

Move the enqueue into transaction.on_commit so the task is only published after the SCORING status is durably committed, and explicitly mark the submission as Failed (with a clear status_details) if the publish still fails, so the row never stays in a non-terminal limbo state. Wrap update() in @transaction.atomic to make the commit boundary explicit.
fix(scoring): re-enqueue scoring after commit to avoid stuck SCORING rows (#2419)

Closes #2419
Issue
Submissions can stay in Scoring forever. The compute worker PATCHes the submission to status=SCORING to trigger the scoring step, but if the broker (RabbitMQ) has a brief hiccup at that exact moment, the status row commits while the scoring task is never published. The submission then sits in Scoring indefinitely: the 24h cleanup (submission_status_cleanup, src/apps/competitions/tasks.py:797-806) only rescues RUNNING rows, so there is no recovery path today. Participants saw this as "stuck in Scoring for many hours" / "submitted ~12h ago, still scoring".

Root cause
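The stuck state from the Issue description can be reproduced with a toy model. This is a minimal sketch in plain Python, not the project's code: publish_scoring_task, worker_update_status and status_cleanup are hypothetical stand-ins, BrokerDown stands in for publish errors such as ConnectionResetError, the API and worker sides are collapsed into one call, and the cleanup is assumed to mark rescued rows Failed.

```python
# Toy model of the stuck-SCORING failure (plain Python, no Django/RabbitMQ).
# All names here are hypothetical stand-ins, not the project's real code.


class BrokerDown(Exception):
    """Stands in for publish errors such as ConnectionResetError."""


def publish_scoring_task(queue, submission_id, broker_up):
    # Hypothetical stand-in for publishing the scoring task to the broker.
    if not broker_up:
        raise BrokerDown("broker unreachable")
    queue.append(submission_id)


def worker_update_status(db, queue, submission_id, broker_up):
    # Old flow, API and worker collapsed into one call: the SCORING status
    # ends up committed even though the publish may fail.
    db[submission_id] = "SCORING"
    try:
        publish_scoring_task(queue, submission_id, broker_up)
    except Exception:
        # Mirrors the worker comment "Always catch exception and never raise error".
        pass


def status_cleanup(db):
    # Like the 24h cleanup: only RUNNING rows are rescued.
    # Assumption: rescued rows are marked "Failed".
    stuck = [sid for sid, status in db.items() if status == "RUNNING"]
    for sid in stuck:
        db[sid] = "Failed"


db, queue = {2: "RUNNING"}, []
worker_update_status(db, queue, submission_id=1, broker_up=False)
status_cleanup(db)
print(db, queue)  # {2: 'Failed', 1: 'SCORING'} []
```

The RUNNING row is rescued, while the SCORING row with no queued task is left untouched by the cleanup, which is exactly the gap described above.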
In SubmissionCreationSerializer.update() (src/apps/api/serializers/submissions.py), the scoring re-enqueue was published synchronously, inside the request transaction, with no retry and no on_commit guard.

Two problems compound:
1. run_submission is called before super().update() commits, so the broker can see (and start) the scoring task on a row that does not yet reflect SCORING.
2. Any failure to publish (ConnectionResetError, OperationalError, AMQP timeout) bubbles up from inside the PATCH handler. The worker side then swallows it in _update_status (literal comment in compute_worker/compute_worker.py:632-643: "Always catch exception and never raise error"), so the row remains in Scoring with no task ever queued.

Fix
Defer the scoring enqueue until after the DB transaction commits, and explicitly mark the submission Failed (with a clear status_details) if the publish itself fails, so no submission stays in a non-terminal limbo. update() is also wrapped in @transaction.atomic to make the commit boundary explicit.
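The ordering this fix relies on can be illustrated with a small model. In Django the real mechanism is transaction.on_commit() registered inside an @transaction.atomic block; the Transaction class below only imitates those semantics (buffer writes, commit, then run callbacks; drop both on an exception) and is not Django's implementation. update_to_scoring, the (status, status_details) tuples and the failure message are hypothetical.

```python
# Toy model of the fix. `Transaction` imitates the semantics of Django's
# transaction.atomic() + transaction.on_commit(); it is NOT Django's code.
# update_to_scoring, the (status, status_details) tuples and the message
# text are hypothetical.


class BrokerDown(Exception):
    """Stands in for publish errors such as ConnectionResetError."""


class Transaction:
    def __init__(self, db):
        self.db = db
        self.pending = {}    # writes not yet visible outside the transaction
        self.callbacks = []  # run only after a successful commit

    def on_commit(self, fn):
        self.callbacks.append(fn)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.db.update(self.pending)  # commit first...
            for fn in self.callbacks:     # ...then fire post-commit hooks
                fn()
        # On an exception, pending writes and callbacks are simply dropped
        # (rollback) and the exception propagates.
        return False


def update_to_scoring(db, queue, submission_id, broker_up):
    def enqueue_scoring():
        try:
            if not broker_up:
                raise BrokerDown("broker unreachable")
            # Record what a consumer would read at publish time.
            queue.append((submission_id, db[submission_id][0]))
        except BrokerDown:
            # Publish failed after commit: move the row to a terminal state
            # instead of leaving it in SCORING.
            db[submission_id] = ("Failed", "could not enqueue scoring task")

    with Transaction(db) as tx:
        tx.pending[submission_id] = ("SCORING", "")
        tx.on_commit(enqueue_scoring)


db, queue = {}, []
update_to_scoring(db, queue, 1, broker_up=True)
update_to_scoring(db, queue, 2, broker_up=False)
print(db)     # {1: ('SCORING', ''), 2: ('Failed', 'could not enqueue scoring task')}
print(queue)  # [(1, 'SCORING')]
```

Submission 1 is published only once SCORING is committed, so the consumer never reads a stale row; submission 2's publish fails after commit and the row ends in a terminal Failed state instead of lingering in SCORING. Catching the error inside the callback also matters in real Django: by default an exception raised from an on_commit callback propagates out of the commit and prevents later callbacks in the same transaction from running.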