From 51bf29e7d551d79912495a51f5c8d85c3fe378cc Mon Sep 17 00:00:00 2001
From: Iskander
Date: Tue, 16 Jun 2026 17:37:05 +0200
Subject: [PATCH 1/2] fix(phase-0): log and surface repeated MCP version check failures (fixes #30)

On curl failure, the version check previously skipped silently with no trace.
Agents on flaky networks had no visibility into repeated failures.

Changes:
- Add `--max-time 10` to the curl call so a hung request can't stall the loop
- Add `circuit_breaker.mcp_version_check.fail_count` field to health.json template
- Update Phase 0 failure behavior: increment fail_count on each empty LATEST, write a warning to STATE.md after 3 consecutive failures, reset on next success

Mirrors the circuit_breaker pattern already used by the heartbeat phase.
---
 daemon/health.json | 6 +++++-
 daemon/loop.md     | 8 ++++++--
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/daemon/health.json b/daemon/health.json
index 5abab88..06ec2d3 100644
--- a/daemon/health.json
+++ b/daemon/health.json
@@ -19,7 +19,11 @@
     "outreach_cost_sats": 0,
     "idle_cycles_count": 0
   },
-  "circuit_breaker": {},
+  "circuit_breaker": {
+    "mcp_version_check": {
+      "fail_count": 0
+    }
+  },
   "last_discovery_date": "",
   "next_cycle_at": "2000-01-01T00:00:00.000Z"
 }
diff --git a/daemon/loop.md b/daemon/loop.md
index b80ccac..e268d87 100644
--- a/daemon/loop.md
+++ b/daemon/loop.md
@@ -24,7 +24,7 @@ Unlock wallet if STATE.md says locked. Load MCP tools if not present.
 Check if the MCP server has been updated since this loop started.
 
 ```bash
-LATEST=$(curl -s https://api.github.com/repos/aibtcdev/aibtc-mcp-server/releases/latest | python3 -c "import sys,json; print(json.load(sys.stdin).get('tag_name','').replace('mcp-server-v',''))" 2>/dev/null)
+LATEST=$(curl -s --max-time 10 https://api.github.com/repos/aibtcdev/aibtc-mcp-server/releases/latest | python3 -c "import sys,json; print(json.load(sys.stdin).get('tag_name','').replace('mcp-server-v',''))" 2>/dev/null)
 CACHED=$(python3 -c "import json; print(json.load(open('daemon/health.json')).get('mcp_version_cached','unknown'))" 2>/dev/null) || CACHED="unknown"
 [ -z "$CACHED" ] && CACHED="unknown"
 ```
@@ -33,7 +33,11 @@ CACHED=$(python3 -c "import json; print(json.load(open('daemon/health.json')).ge
 - **Version match**: Set `mcp_update_required` to `false` in health.json (clears the flag after a restart). Continue normally.
 - **Version mismatch** (`LATEST` != `CACHED`): set `mcp_update_required: true` **and** `mcp_version_cached` to `LATEST` in health.json. Complete the current cycle normally, then in Phase 9 (Sleep), exit instead of sleeping with message: "MCP update detected ({CACHED} -> {LATEST}). Exiting for restart. Run /loop-start to resume with updated version."
 
-On curl failure (no internet, API rate limit): skip check, continue normally. Do not block the cycle on a version check failure.
+On curl failure (`LATEST` is empty — no internet, GitHub API rate limit, or timeout):
+- Increment `circuit_breaker.mcp_version_check.fail_count` in health.json (field initialized to `0` in the health.json template).
+- If `fail_count` reaches **3**: write a warning line to STATE.md — `"⚠️ MCP version check failed 3 cycles in a row — check internet / GitHub API rate limit"` — then reset `fail_count` to `0`.
+- Continue normally. Do not block the cycle on a version check failure.
+- On the next **successful** check: reset `circuit_breaker.mcp_version_check.fail_count` to `0`.
 
 ---

From d4a9265e5424b485e8fd402ec827b4af70181365 Mon Sep 17 00:00:00 2001
From: Iskander
Date: Tue, 16 Jun 2026 17:50:09 +0200
Subject: [PATCH 2/2] fix(phase-0): address review feedback on MCP version check circuit-breaker
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Three items from arc0btc's review:

1. [suggestion] Add timestamp to STATE.md warning line — agents can now tell when the failures occurred vs whether the issue is stale.

2. [question] Migration path for existing agents — if fail_count is missing from health.json (initialized before this PR), treat it as 0 and initialize it before incrementing rather than erroring.

3. [nit] Framing fix — "On curl failure" was inaccurate since LATEST can be empty from a degraded API response (empty tag_name) even if curl succeeds. Changed to "On empty LATEST" with explicit list of causes.
---
 daemon/loop.md | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/daemon/loop.md b/daemon/loop.md
index e268d87..6270175 100644
--- a/daemon/loop.md
+++ b/daemon/loop.md
@@ -33,11 +33,12 @@ CACHED=$(python3 -c "import json; print(json.load(open('daemon/health.json')).ge
 - **Version match**: Set `mcp_update_required` to `false` in health.json (clears the flag after a restart). Continue normally.
 - **Version mismatch** (`LATEST` != `CACHED`): set `mcp_update_required: true` **and** `mcp_version_cached` to `LATEST` in health.json. Complete the current cycle normally, then in Phase 9 (Sleep), exit instead of sleeping with message: "MCP update detected ({CACHED} -> {LATEST}). Exiting for restart. Run /loop-start to resume with updated version."
 
-On curl failure (`LATEST` is empty — no internet, GitHub API rate limit, or timeout):
-- Increment `circuit_breaker.mcp_version_check.fail_count` in health.json (field initialized to `0` in the health.json template).
-- If `fail_count` reaches **3**: write a warning line to STATE.md — `"⚠️ MCP version check failed 3 cycles in a row — check internet / GitHub API rate limit"` — then reset `fail_count` to `0`.
+On empty `LATEST` (curl failure, GitHub API rate limit, timeout, or degraded API response returning no `tag_name`):
+- Read `circuit_breaker.mcp_version_check.fail_count` from health.json. If the field is missing (agent initialized health.json before this field was added), treat it as `0` and initialize it to `0` before incrementing.
+- Increment `fail_count` by 1 and write it back to health.json.
+- If `fail_count` reaches **3**: write a timestamped warning line to STATE.md — `"{TIMESTAMP} ⚠️ MCP version check failed 3 cycles in a row — check internet / GitHub API rate limit"` (where `TIMESTAMP` is the current UTC ISO-8601 timestamp, e.g. `2026-06-16T15:45:29Z`) — then reset `fail_count` to `0`.
 - Continue normally. Do not block the cycle on a version check failure.
-- On the next **successful** check: reset `circuit_breaker.mcp_version_check.fail_count` to `0`.
+- On the next **successful** check (non-empty `LATEST`): reset `circuit_breaker.mcp_version_check.fail_count` to `0`.
 
 ---
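
Reviewer note: the failure handling that the final loop.md text describes in prose could be sketched in Python, which the existing Phase 0 snippet already shells out to. This is only an illustrative sketch, not code from either patch. The function name `record_version_check`, its parameters, the `FAIL_THRESHOLD` constant, and the decision to append to STATE.md (rather than rewrite it) are assumptions introduced here. The field path, the threshold of 3, the timestamp format, and the warning text come from the patch.

```python
import json
from datetime import datetime, timezone

FAIL_THRESHOLD = 3  # the patch warns after 3 consecutive empty LATEST results


def record_version_check(latest, health_path="daemon/health.json",
                         state_path="STATE.md", now=None):
    """Update fail_count after one version check and return its new value.

    `latest` is the LATEST string from the shell snippet; empty means failure.
    `now` is a hypothetical hook so the timestamp can be fixed in tests.
    """
    with open(health_path, encoding="utf-8") as f:
        health = json.load(f)

    # Migration path: health.json files created before this field existed
    # get the nested objects created and fail_count treated as 0.
    breaker = health.setdefault("circuit_breaker", {}).setdefault(
        "mcp_version_check", {})
    fail_count = breaker.get("fail_count", 0)

    if latest:
        # Successful check (non-empty LATEST): reset the counter.
        fail_count = 0
    else:
        fail_count += 1
        if fail_count >= FAIL_THRESHOLD:
            stamp = (now or datetime.now(timezone.utc)).strftime(
                "%Y-%m-%dT%H:%M:%SZ")
            # Assumption: the warning is appended as a new line in STATE.md.
            with open(state_path, "a", encoding="utf-8") as f:
                f.write(f"{stamp} ⚠️ MCP version check failed 3 cycles in a "
                        "row — check internet / GitHub API rate limit\n")
            fail_count = 0  # reset after surfacing the warning

    breaker["fail_count"] = fail_count
    with open(health_path, "w", encoding="utf-8") as f:
        json.dump(health, f, indent=2)
    return fail_count
```

Under these assumptions, three consecutive calls with an empty `latest` would leave one timestamped line in STATE.md and `fail_count` back at `0`, and the function never raises on a missing field, so the cycle is not blocked.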