fix(site_worker): remove watchmedo auto-restart causing BrokenPipeError loop #2418

Open
AybH26 wants to merge 1 commit into codalab:develop from AybH26:fix/site_worker

AybH26 commented Jun 16, 2026


fix(site_worker): remove watchmedo wrapper that caused restart loop (#2417)

Closes #2417

Issue

Uploading a competition bundle leaves the "unpacking" step spinning forever. The unpack_competition Celery task is enqueued on the site-worker queue but never executed, because the site_worker container, although reported as Up by Docker, has no consumer attached to RabbitMQ.
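One way to confirm this symptom is to ask the running workers which queues they actually consume. The sketch below is a minimal, hypothetical helper (`consumers_for_queue` and `check_live` are not part of the codebase). It assumes the mapping shape returned by Celery's `app.control.inspect().active_queues()`, namely `{worker_name: [{"name": queue_name, ...}, ...]}`, or `None` when no worker answers the broadcast.

```python
from typing import Dict, List, Optional

QUEUE_NAME = "site-worker"


def consumers_for_queue(
    active_queues: Optional[Dict[str, List[dict]]], queue: str
) -> List[str]:
    """Return the (sorted) names of workers that report consuming `queue`.

    `active_queues` is the value returned by
    `app.control.inspect().active_queues()`, or None if no worker replied.
    """
    if not active_queues:
        # No worker answered at all: nothing is consuming any queue.
        return []
    return sorted(
        worker
        for worker, queues in active_queues.items()
        if any(q.get("name") == queue for q in queues)
    )


def check_live(app) -> List[str]:
    """Broadcast an inspect request to all workers (needs a reachable broker)."""
    replies = app.control.inspect(timeout=2.0).active_queues()
    return consumers_for_queue(replies, QUEUE_NAME)
```

Inside the container, assuming the Celery app object is exposed as `app` in `celery_config.py` (an assumption; adjust to the project layout), `check_live(app)` returning an empty list matches the hang described here. The same information is available from the CLI via `celery -A celery_config inspect active_queues`.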

Root cause

The previous site_worker command wrapped Celery in watchmedo so that task code edits would auto-reload:

command: ["watchmedo auto-restart -p '*.py' --recursive -- celery -A celery_config worker -B -Q site-worker -l info -n site-worker@%n --concurrency=2"]

watchmedo watches the entire mounted /app tree, which includes:

  • __pycache__/*.pyc files written by Python on every import
  • src/celerybeat-schedule* SQLite files written continuously by Celery beat itself

These files change faster than Celery can finish initializing its prefork pool. Each aborted startup leaks pool worker processes and produces a BrokenPipeError in billiard. As a result, no consumer ever attaches to the site-worker queue and tasks pile up indefinitely.
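The restart loop can be illustrated with a toy model. This is a simplification, not watchmedo's actual implementation. Each file-change event restarts the child process, and the child only becomes ready after `startup_s` seconds pass without another event. The function name and the timings in the usage note are hypothetical and chosen for illustration, not measured.

```python
from typing import Iterable, Optional


def first_ready_time(
    event_times: Iterable[float],
    startup_s: float,
    horizon: float,
    start: float = 0.0,
) -> Optional[float]:
    """Return when the child first finishes starting, or None if it never
    does before `horizon`.

    Toy model: every file-change event restarts the child, so startup only
    succeeds if a full `startup_s` window passes with no event.
    """
    launched_at = start
    for t in sorted(event_times):
        if t < launched_at:
            continue  # event happened before the current launch; ignore it
        if t > horizon:
            break  # events past the observation window don't matter
        if t - launched_at >= startup_s:
            return launched_at + startup_s  # startup finished before this event
        launched_at = t  # event interrupted startup: child is restarted
    ready = launched_at + startup_s
    return ready if ready <= horizon else None
```

With an assumed 2 s startup and a file written every 0.5 s (as a continuously updated beat schedule might be), `first_ready_time([i * 0.5 for i in range(121)], startup_s=2.0, horizon=60.0)` is `None`. The child never gets a quiet window, which corresponds to no consumer ever attaching. With events every 5 s, the same call returns `2.0`.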

Fix

Remove the watchmedo wrapper from the site_worker command in docker-compose.yml:

   site_worker:
-    # This auto-reloads
-    command: ["watchmedo auto-restart -p '*.py' --recursive -- celery -A celery_config worker -B -Q site-worker -l info -n site-worker@%n --concurrency=2"]
+    command: ["celery -A celery_config worker -B -Q site-worker -l info -n site-worker@%n --concurrency=2"]
     working_dir: /app/src

Development

Successfully merging this pull request may close these issues.

site_worker stuck in restart loop — competition unpacking hangs forever