fix(update): honor yaml source_root so update stops mass-deleting the index #320
Merged
HumanBean17
added a commit
that referenced
this pull request
Jun 14, 2026
A pyarrow/lance worker thread (loaded via lancedb in lifecycle commands) can outlive CPython finalization in a one-shot CLI subprocess and trip PyGILState_Release (SIGABRT, exit -6). It's a thread-timing race — flaky — and it intermittently red-blocked unrelated PRs: it killed the erase step of test_cli_lifecycle_round_trip_init_increment_meta_erase on PR #320 (which touches only installer.py), while the same test passed on green master #319. Route the installed `java-codebase-rag` entry through _console_script_main, which flushes stdout/stderr and os._exit(rc) instead of returning into the racy teardown. main() stays return-based so in-process test callers keep working. Co-authored-by: Claude <noreply@anthropic.com>
… index

run_update passed the discovered config dir as an explicit source_root to resolve_operator_config, routing it into the branch that SKIPS the YAML source_root field. With a config living in a subdir next to `source_root: ../`, update then indexed that subdir (no Java) against the real index one level up, so cocoindex treated every indexed file as removed and deleted them. That explains the "Updating index (Lance + graph)..." hang, and the ever-growing Lance `_deletions` + 1000s+ increment after a ctrl+C left cocoindex.db mid-reconcile.

This is the same bug class #316 fixed for the MCP server (its docstring warns that a non-None source_root skips the YAML field); run_update was the last production caller still passing a discovered dir.

Pass source_root=None so the YAML source_root is honored exactly like increment/init/reprocess. run_install is unaffected (it passes the user-confirmed Java root).

Adds a regression test mirroring the reported layout (config in my-project-context/, source_root: ../, real index one level up) that captures the env handed to cocoindex and asserts SOURCE_ROOT resolves to the YAML root, not the config dir.

No schema, ontology, embedding, or env-var change. Existing indexes remain valid; no reindex required.

Co-Authored-By: Claude <noreply@anthropic.com>
053de82 to
a953461
Compare
HumanBean17
added a commit
that referenced
this pull request
Jun 15, 2026
Catch-up: master advanced (#322 installer cross_service_resolution, #323 config embedding.model resolution, #325 version 0.6.2, #326 PR-1 progress.py) while the index-output-rework stack was based on #320. This merges those in so the catch-up PR (#330) carries only PR-2/3/4.

Conflicts resolved (both add/add, feature branch is the superset):
- java_codebase_rag/progress.py (master had PR-1 state; branch has PR-1 + CallbackRenderer/make_relay/build_index_progress_context)
- tests/test_progress.py (master had PR-1's 14 tests; branch adds PR-2/3/4 tests)

Auto-merged cleanly: installer.py (#322 + PR-4), pyproject.toml (version 0.6.2 + rich>=14,<15), tests/test_installer.py.

Verified: ruff clean; full suite 833 passed, 13 skipped (heavy-gated).

Co-Authored-By: Claude <noreply@anthropic.com>
Summary
`java-codebase-rag update` resolved `source_root` differently from every other operator command and, for the documented nested-config layout, pointed cocoindex at the config subdir (no Java) while `index_dir` still pointed at the real, fully populated index, so cocoindex treated every indexed file as removed and mass-deleted them.
Root cause
`run_update` passed the discovered config dir as an explicit `source_root` to `resolve_operator_config` (in `installer.py`).
A non-`None` `source_root` routes into the explicit-override branch that skips the YAML `source_root` field (`config.py`). With a config in `my-project-context/` next to `source_root: ../`, `update` then resolved `source_root` to `my-project-context/` (which holds only the YAML) but `index_dir` to the real index one level up (`../.java-codebase-rag`). cocoindex, told the source had no Java but aimed at a full index, began deleting every vector.
This is the identical bug class #316 fixed for the MCP server; its own docstring warns that a non-`None` `source_root` skips the YAML field. `run_update` (added in #290, before #316 existed) was the last production caller still passing a discovered dir.
Symptoms this explains
- `increment` = clean 6 s no-op, but `update`'s identical index phase hung 5 min+ at "Updating index (Lance + graph)…".
- `_deletions` grew monotonically (the mass row-deletions).
- After a ctrl+C left `cocoindex.db` mid-reconcile, the next `increment` (which resolves the root correctly) re-embedded nearly everything → 1000 s+.
The fix
One line: pass `source_root=None` so the YAML `source_root` is honored exactly like `increment`/`init`/`reprocess`.
The `discover_project_root(cwd)` no-config guard above the call is unchanged. `run_install` is unaffected (it passes the user-confirmed Java root). `run_update` is now the only production caller of `resolve_operator_config` besides the (already-fixed) MCP server and the CLI, and all honor the YAML field consistently.
Validation (TDD)
- `test_update_honors_yaml_source_root_for_nested_config_dir` mirrors the reported layout, captures the env handed to cocoindex, and asserts `JAVA_CODEBASE_RAG_SOURCE_ROOT` = the YAML root (not the config dir) and `JAVA_CODEBASE_RAG_INDEX_DIR` = the real index. Watched it fail RED (`SOURCE_ROOT` off by one level), then pass GREEN.
- `.venv/bin/ruff check .`: clean.
- `.venv/bin/python -m pytest tests -q` → 775 passed, 11 skipped (heavy-gated). No regressions.
User-visible behaviour changes
`java-codebase-rag update` now resolves `source_root` consistently with `increment`/`init`/`reprocess`/the MCP server. For the common case (config at the source root, or running from the source root) behaviour is unchanged. For a config living in a subdirectory of the Java tree (the `my-context/` + `source_root: ../` layout documented in #316, "fix(config): consistent index_dir/source_root resolution for CLI and MCP"), `update`'s index phase now operates on the correct source instead of mass-deleting the index.
No reindex required. No schema, ontology, embedding, or env-var change. Existing indexes remain valid.
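The difference between the old and new resolution can be illustrated with a small sketch. This is not the project's `resolve_operator_config`; the function name `resolve_source_root`, its signature, and the `/work/shop` paths are hypothetical, and the YAML field is passed in as a plain string so the example needs no YAML library. It only mirrors the rule described in this PR: an explicit non-`None` `source_root` wins and the YAML field is skipped, while `None` lets the YAML field (resolved relative to the config dir) take effect.

```python
import posixpath
from pathlib import PurePosixPath
from typing import Optional


def resolve_source_root(
    config_dir: PurePosixPath,
    yaml_source_root: Optional[str],
    source_root: Optional[PurePosixPath] = None,
) -> PurePosixPath:
    """Hypothetical simplification of the resolution rule described above."""
    if source_root is not None:
        # Explicit override branch: the YAML field is never consulted.
        return source_root
    if yaml_source_root is not None:
        # YAML field is relative to the directory holding the config file.
        joined = str(config_dir / yaml_source_root)
        return PurePosixPath(posixpath.normpath(joined))
    # No override and no YAML field: fall back to the config dir itself.
    return config_dir


# Nested layout: config lives in my-context/ with `source_root: ../`.
config_dir = PurePosixPath("/work/shop/my-context")

# Old update behaviour: the discovered config dir was passed explicitly.
before = resolve_source_root(config_dir, "../", source_root=config_dir)

# New behaviour: pass None so the YAML field is honored.
after = resolve_source_root(config_dir, "../", source_root=None)

print(before)  # /work/shop/my-context
print(after)   # /work/shop
```

With the explicit argument the resolved root is the config subdirectory, which contains no Java; with `None` it is the parent tree that the existing index was built from.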
Recovery for affected indexes
If an index was damaged by the bug (`update` was run from a nested config dir before this fix), run the recovery from the config dir. `increment`/`init`/`reprocess` resolve the root correctly, so they are safe to run even before applying this fix; only `update`'s index phase was affected.
🤖 Generated with Claude Code