HumanBean17 · Jun 17, 2026 · Jun 15, 2026
@@ -74,8 +74,8 @@ when needed.
 |------|------|
 | `server.py` | MCP stdio server. Every `@mcp.tool` lives here. |
 | `search_lancedb.py` | Vector / hybrid / graph-expanded search; ranking. |
-| `build_ast_graph.py` | Tree-sitter → Kuzu graph builder (full rebuild). Owns `pass1`–`pass6` (`pass5` emits `HTTP_CALLS` / `ASYNC_CALLS` caller edges; `pass6_match_edges` resolves cross-service / intra-service / ambiguous / phantom / unresolved match outcomes — ontology 7). |
-| `kuzu_queries.py` | Read-only Cypher helpers used by the server. Includes `meta()` decoder for the Kuzu MAP-as-STRING JSON-blob columns. |
+| `build_ast_graph.py` | Tree-sitter → LadybugDB graph builder (full rebuild). Owns `pass1`–`pass6` (`pass5` emits `HTTP_CALLS` / `ASYNC_CALLS` caller edges; `pass6_match_edges` resolves cross-service / intra-service / ambiguous / phantom / unresolved match outcomes — ontology 7). |
+| `ladybug_queries.py` | Read-only Cypher helpers used by the server. Includes `meta()` decoder for the LadybugDB MAP-as-STRING JSON-blob columns. |
 | `ast_java.py` | Tree-sitter Java parsing, role/capability inference, `_string_value_atoms` helper (shared by route/client/producer extractors), `_collect_outgoing_calls` for caller-side detection. |
 | `graph_enrich.py` | `module` / `microservice` resolution, `BrownfieldOverrides` (route + role + capability + http client + async producer), meta-annotation walk, `resolve_routes_for_method` / `resolve_http_client_for_method` / `resolve_async_producer_for_method`. |
 | `java_ontology.py` | Source of truth for `VALID_ROLES`, `VALID_CAPABILITIES`, `VALID_CLIENT_KINDS`, `VALID_HTTP_CALL_STRATEGIES`, `VALID_ASYNC_CALL_STRATEGIES`, `VALID_HTTP_CALL_MATCHES`. |
@@ -90,7 +90,7 @@ when needed.
 
 ## Test layout
 
-- `tests/conftest.py` — session-scoped Kuzu graph fixture.
+- `tests/conftest.py` — session-scoped LadybugDB graph fixture.
 - `tests/bank-chat-system/` — deterministic Java corpus (fixture, not production model).
 - `tests/fixtures/call_graph_smoke/` — mini Maven tree calibrated against the call-graph resolver.
 - `tests/fixtures/brownfield_route_stubs/` — `@CodebaseRoute` / `@CodebaseRoutes` source stubs (PR-A3).
@@ -188,7 +188,7 @@ template):
   `VALID_ASYNC_CALL_STRATEGIES`, `VALID_HTTP_CALL_MATCHES`,
   `VALID_ROUTE_FRAMEWORKS`, `VALID_ROUTE_KINDS`, `VALID_PRODUCER_KINDS`,
   `VALID_RESOLVE_REASONS`, `VALID_UNRESOLVED_CALL_REASONS`.
-- Schema changes that affect the Lance index or Kuzu graph need a
+- Schema changes that affect the Lance index or LadybugDB graph need a
   matching update to the README "Re-index required" callout. Bump
   `ontology_version` when enrichment semantics change (currently **17**).
 - Brownfield is a first-class surface: any new auto-detection (route,
@@ -199,10 +199,10 @@ template):
   union when any brownfield layer fires on a method (single network packet
   → single edge). See `plans/completed/PLAN-TIER1B-COMPLETION.md` §
   "Caller-side composition divergence".
-- Kuzu's Python binder rejects `dict` for `MAP` columns. Store all
+- LadybugDB's Python binder rejects `dict` for `MAP` columns. Store all
   map-shaped graph_meta data (`routes_by_framework`, `routes_by_layer`,
   `http_calls_by_strategy`, `async_calls_by_strategy`, etc.) as `STRING`
-  JSON blobs and decode in `kuzu_queries.meta()`.
+  JSON blobs and decode in `ladybug_queries.meta()`.
 - `server.py` is a stdio MCP server: anything reachable from a tool
   handler must not write to **stdout** (that's the JSON-RPC transport).
   Diagnostics go to stderr.
@@ -216,10 +216,10 @@ template):
   support. `BrownfieldOverrides` already holds route, role, capability,
   http client, and async producer dicts — extend it in place.
 
-## Kuzu Cypher pitfalls
+## LadybugDB Cypher pitfalls
 
-When adding or editing Cypher run against Kuzu (for example in
-`kuzu_queries.py`, `mcp_v2.py`, or any `KuzuGraph._rows` caller):
+When adding or editing Cypher run against LadybugDB (for example in
+`ladybug_queries.py`, `mcp_v2.py`, or any `LadybugGraph._rows` caller):
 
 - **Do not filter relationship types with** `label(e) IN $list` **or**
   `label(e) IN ["A","B"]` **in** `WHERE`. On supported versions this can
@@ -252,7 +252,7 @@ When adding or editing Cypher run against Kuzu (for example in
   ```bash
   rm -rf /tmp/check && .venv/bin/python build_ast_graph.py \
     --source-root tests/bank-chat-system \
-    --kuzu-path /tmp/check/code_graph.kuzu --verbose
+    --ladybug-path /tmp/check/code_graph.lbug --verbose
   ```
 
 ## Commit and PR
@@ -289,7 +289,7 @@ When adding or editing Cypher run against Kuzu (for example in
 ## Cursor Cloud specific instructions
 
 This is a self-contained Python project — no external services
-(no Postgres, Kafka, Docker) are needed. All storage (Kuzu, LanceDB,
+(no Postgres, Kafka, Docker) are needed. All storage (LadybugDB, LanceDB,
 CocoIndex state) is embedded/file-based.
 
 ### Environment
@@ -317,12 +317,12 @@ first run. They are not required for normal development.
 
 ### Hello-world verification
 
-Build the Kuzu graph from the test fixture and inspect it:
+Build the LadybugDB graph from the test fixture and inspect it:
 
 ```bash
 rm -rf /tmp/check && .venv/bin/python build_ast_graph.py \
   --source-root tests/bank-chat-system \
-  --kuzu-path /tmp/check/code_graph.kuzu --verbose
+  --ladybug-path /tmp/check/code_graph.lbug --verbose
 .venv/bin/java-codebase-rag meta \
   --source-root tests/bank-chat-system --index-dir /tmp/check
 ```

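The MAP-as-STRING rule in the hunk above (store map-shaped `graph_meta` values as `STRING` JSON blobs, decode on read) can be sketched as a round trip. This is a minimal illustration, not the project's `ladybug_queries.meta()`: the helper names `encode_meta_column` and `decode_meta_column` are hypothetical, and the sample counts are made up for the example.

```python
import json


def encode_meta_column(value: dict) -> str:
    """Serialize a map-shaped value to a JSON string for a STRING column.

    Hypothetical helper: sort_keys keeps the blob stable across rebuilds.
    """
    return json.dumps(value, sort_keys=True)


def decode_meta_column(raw) -> dict:
    """Decode a STRING JSON blob back into a dict.

    Assumption for this sketch: empty, missing, or non-object payloads
    decode to an empty dict rather than raising.
    """
    if not raw:
        return {}
    decoded = json.loads(raw)
    return decoded if isinstance(decoded, dict) else {}


# Illustrative payload only; the keys and counts are invented.
routes_by_framework = {"spring_mvc": 12, "jaxrs": 3}
blob = encode_meta_column(routes_by_framework)
restored = decode_meta_column(blob)
```

Keeping the encode and decode steps in one place means the binder only ever sees a plain `str`, and callers only ever see a `dict`.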
@@ -2,7 +2,7 @@
 
 A graph-native code intelligence layer for Java microservice estates, exposed to LLM agents via the **Model Context Protocol (MCP)**.
 
-The system extracts a deterministic property graph from Java source (tree-sitter), stores it in **Kuzu** (graph) alongside a **LanceDB** vector index (chunks), and exposes a deliberately small MCP surface — **five tools**: `search`, `find`, `describe`, `neighbors`, `resolve` — that collapse onto three primitive agent operations: **locate**, **inspect**, **walk**.
+The system extracts a deterministic property graph from Java source (tree-sitter), stores it in **LadybugDB** (graph) alongside a **LanceDB** vector index (chunks), and exposes a deliberately small MCP surface — **five tools**: `search`, `find`, `describe`, `neighbors`, `resolve` — that collapse onto three primitive agent operations: **locate**, **inspect**, **walk**.
 
 > **What this MCP is:** a **GPS for code navigation**, not a reasoning engine.
 > Agents use a simple loop:
@@ -21,9 +21,9 @@ For the design rationale, the GPS metaphor, and the full ontology, see [`docs/pa
 
 Generic code-search tools (grep, ctags, vector-only RAG) hit a ceiling on real Java microservice estates: they find files but lose the structure that makes a Spring/JAX-RS system navigable. This project is built around five choices that target that gap.
 
-- **Hybrid RAG + GraphRAG, not either-or.** Semantic recall (LanceDB chunk vectors) and structural navigation (Kuzu property graph) are composed in one surface. `search` finds candidate nodes by meaning; `neighbors` walks the exact edge you care about (`CALLS`, `IMPLEMENTS`, `INJECTS`, `DECLARES_ROUTE`, …). The agent picks the right primitive per step instead of being forced into pure-vector or pure-symbol search.
+- **Hybrid RAG + GraphRAG, not either-or.** Semantic recall (LanceDB chunk vectors) and structural navigation (LadybugDB property graph) are composed in one surface. `search` finds candidate nodes by meaning; `neighbors` walks the exact edge you care about (`CALLS`, `IMPLEMENTS`, `INJECTS`, `EXPOSES`, …). The agent picks the right primitive per step instead of being forced into pure-vector or pure-symbol search.
 
-- **A Java-tuned role model.** Symbols are labelled with stereotypes inferred from Spring and JAX-RS conventions — `CONTROLLER`, `SERVICE`, `REPOSITORY`, `CLIENT`, `PRODUCER`, `MAPPER`, `DTO`. Agents can ask "list controllers" or "who injects this repository" directly, instead of grep-ing for `@RestController` and hoping for the best. Roles drive both filtering (`find` with a `NodeFilter`) and ranking.
+- **A Java-tuned role model.** Symbols are labelled with stereotypes inferred from Spring and JAX-RS conventions — `CONTROLLER`, `SERVICE`, `REPOSITORY`, `COMPONENT`, `CONFIG`, `ENTITY`, `CLIENT`, `MAPPER`, `DTO`. Agents can ask "list controllers" or "who injects this repository" directly, instead of grep-ing for `@RestController` and hoping for the best. Roles drive both filtering (`find` with a `NodeFilter`) and ranking.
 
 - **Ranking specialized for Java codebases.** The composite ranker is aware of role, microservice, and FQN structure — not a generic BM25. A search for `"chat ingress"` surfaces controllers before utility classes; a search scoped to one microservice doesn't drown in matches from the other 19. Defaults are tuned on the bank-chat fixture and exposed in `docs/CONFIGURATION.md` for per-repo overrides.
 
@@ -71,7 +71,7 @@ All indexing lifecycle commands (`init`, `increment`, `reprocess`, `install`, `u
 
 If you prefer manual configuration, see [`docs/JAVA-CODEBASE-RAG-CLI.md`](./docs/JAVA-CODEBASE-RAG-CLI.md) for the full CLI reference.
 
-> **Stability disclaimer.** This package does **not** promise backward compatibility. MCP tool contracts, env vars, Lance/Kuzu schemas, config files, and Python APIs may change without a deprecation period. Track `main` and rebuild indexes when ontology or embedding settings change.
+> **Stability disclaimer.** This package does **not** promise backward compatibility. MCP tool contracts, env vars, Lance/LadybugDB schemas, config files, and Python APIs may change without a deprecation period. Track `main` and rebuild indexes when ontology or embedding settings change.
 
 ---
 
@@ -84,7 +84,7 @@ This repo ships a small multi-module Spring fixture under [`tests/bank-chat-syst
 git clone https://github.com/HumanBean17/java-codebase-rag
 cd java-codebase-rag
 
-# 2. Build the index (Lance vectors + Kuzu graph). First run downloads the
+# 2. Build the index (Lance vectors + LadybugDB graph). First run downloads the
 #    embedding model (~90 MB) and takes ~30-60s on the fixture.
 java-codebase-rag init --source-root tests/bank-chat-system --index-dir /tmp/bank-chat-index
 
@@ -99,7 +99,7 @@ Smoke-test the index with two checks (`search_lancedb` ships with the package):
 JAVA_CODEBASE_RAG_INDEX_DIR=/tmp/bank-chat-index \
   python -m search_lancedb "chat ingress controller" --table java --limit 3
 
-# Vector + graph expansion — proves Kuzu is wired in
+# Vector + graph expansion — proves LadybugDB is wired in
 JAVA_CODEBASE_RAG_INDEX_DIR=/tmp/bank-chat-index \
   python -m search_lancedb "chat ingress controller" --table java --limit 3 \
     --graph-expand --expand-depth 2
@@ -199,8 +199,8 @@ Run `java-codebase-rag --help` to list grouped subcommands. Operator playbook wi
 | Setup | `install` | Interactive setup wizard: config, MCP registration, skill/agent deployment, indexing. |
 | Setup | `update` | Refresh shipped artifacts (skill, agent, MCP entry) + incremental Lance/graph catch-up after pip upgrade. |
 | Lifecycle | `init` | First-time index. Refuses if artifacts already exist. |
-| Lifecycle | `increment` | CocoIndex catch-up + incremental Kuzu update. `--vectors-only` for Lance only. |
-| Lifecycle | `reprocess` | Full Lance + Kuzu rebuild. `--vectors-only` / `--graph-only` for a single phase. |
+| Lifecycle | `increment` | CocoIndex catch-up + incremental LadybugDB update. `--vectors-only` for Lance only. |
+| Lifecycle | `reprocess` | Full Lance + LadybugDB rebuild. `--vectors-only` / `--graph-only` for a single phase. |
 | Lifecycle | `erase` | Delete index artifacts. Requires `--yes` or TTY confirm. |
 | Introspection | `meta`, `tables`, `diagnose-ignore`, `unresolved-calls` | Health, table listing, ignore-layer diagnostics, receiver-failure call sites. |
 | Analysis | `analyze-pr` | Blast-radius / risk from a unified diff. |
@@ -235,7 +235,7 @@ python3 -m venv .venv
 
 The `cocoindex` package powers lifecycle commands that run the indexer (`init`, `increment`, `reprocess`, `erase`). Search and MCP navigation do not invoke it directly.
 
-The default embedding model is `sentence-transformers/all-MiniLM-L6-v2` (downloaded on first `init`). Override via the `EMBEDDING_MODEL` env var — see [`docs/CONFIGURATION.md` §1](./docs/CONFIGURATION.md#1-environment-variables).
+The default embedding model is `sentence-transformers/all-MiniLM-L6-v2` (downloaded on first `init`). Override via the `SBERT_MODEL` env var — see [`docs/CONFIGURATION.md` §1](./docs/CONFIGURATION.md#1-environment-variables).
 
 ---
 

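The README hunk above renames the embedding-model override to `SBERT_MODEL`. A minimal sketch of how such an override could resolve against the documented default follows. The function name `resolve_embedding_model` is hypothetical, and treating a blank value as unset is an assumption of this sketch, not confirmed project behavior.

```python
import os
from collections.abc import Mapping

# Default taken from the README text above.
DEFAULT_SBERT_MODEL = "sentence-transformers/all-MiniLM-L6-v2"


def resolve_embedding_model(env: Mapping[str, str] = os.environ) -> str:
    """Return the model named by SBERT_MODEL, else the documented default.

    Hypothetical helper. Assumption: a blank or whitespace-only value
    falls back to the default.
    """
    value = env.get("SBERT_MODEL", "").strip()
    return value or DEFAULT_SBERT_MODEL


# Passing an explicit mapping keeps the example independent of the real environment.
overridden = resolve_embedding_model({"SBERT_MODEL": "my-org/custom-model"})
fallback = resolve_embedding_model({})
```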
@@ -1565,62 +1565,20 @@ def _parse_codebase_http_route_inner_annotation(
     return out
 
 
-def _codebase_route_inner_annotation_nodes(container_ann: Node, src: bytes) -> list[Node]:
-    found: list[Node] = []
-
-    def visit(n: Node) -> None:
-        if n.type == "annotation":
-            name_node = n.child_by_field_name("name")
-            n_simple = _txt(name_node, src).rsplit(".", 1)[-1] if name_node is not None else ""
-            if n_simple == "CodebaseHttpRoute":
-                found.append(n)
-        for c in n.children:
-            visit(c)
-
-    visit(container_ann)
-    return found
-
-
-def _codebase_async_route_inner_annotation_nodes(container_ann: Node, src: bytes) -> list[Node]:
-    found: list[Node] = []
-
-    def visit(n: Node) -> None:
-        if n.type == "annotation":
-            name_node = n.child_by_field_name("name")
-            n_simple = _txt(name_node, src).rsplit(".", 1)[-1] if name_node is not None else ""
-            if n_simple == "CodebaseAsyncRoute":
-                found.append(n)
-        for c in n.children:
-            visit(c)
-
-    visit(container_ann)
-    return found
-
+def _inner_annotation_nodes(container_ann: Node, src: bytes, target_simple: str) -> list[Node]:
+    """Collect nested ``@<target_simple>`` annotations anywhere under ``container_ann``.
 
-def _codebase_http_client_inner_annotation_nodes(container_ann: Node, src: bytes) -> list[Node]:
-    found: list[Node] = []
-
-    def visit(n: Node) -> None:
-        if n.type == "annotation":
-            name_node = n.child_by_field_name("name")
-            n_simple = _txt(name_node, src).rsplit(".", 1)[-1] if name_node is not None else ""
-            if n_simple == "CodebaseHttpClient":
-                found.append(n)
-        for c in n.children:
-            visit(c)
-
-    visit(container_ann)
-    return found
-
-
-def _codebase_producer_inner_annotation_nodes(container_ann: Node, src: bytes) -> list[Node]:
+    Shared by the four brownfield container walkers — ``CodebaseHttpRoute``,
+    ``CodebaseAsyncRoute``, ``CodebaseHttpClient``, ``CodebaseProducer`` — which
+    differ only by the target annotation simple name.
+    """
     found: list[Node] = []
 
     def visit(n: Node) -> None:
         if n.type == "annotation":
             name_node = n.child_by_field_name("name")
             n_simple = _txt(name_node, src).rsplit(".", 1)[-1] if name_node is not None else ""
-            if n_simple == "CodebaseProducer":
+            if n_simple == target_simple:
                 found.append(n)
         for c in n.children:
             visit(c)
@@ -1842,7 +1800,7 @@ def _outgoing_calls_from_codebase_http_client_producer_annotations(
                 ),
             )
         elif simple == "CodebaseHttpClients":
-            for inner in _codebase_http_client_inner_annotation_nodes(ann, src):
+            for inner in _inner_annotation_nodes(ann, src, "CodebaseHttpClient"):
                 out.append(
                     _parse_codebase_http_client_annotation(
                         inner,
@@ -1869,7 +1827,7 @@ def _outgoing_calls_from_codebase_http_client_producer_annotations(
                 ),
             )
         elif simple == "CodebaseProducers":
-            for inner in _codebase_producer_inner_annotation_nodes(ann, src):
+            for inner in _inner_annotation_nodes(ann, src, "CodebaseProducer"):
                 out.append(
                     _parse_codebase_producer_annotation(
                         inner,
@@ -2343,7 +2301,7 @@ def _collect_routes(
                 ),
             )
         elif simple == "CodebaseHttpRoutes":
-            for inner in _codebase_route_inner_annotation_nodes(node, src):
+            for inner in _inner_annotation_nodes(node, src, "CodebaseHttpRoute"):
                 routes.extend(
                     _parse_codebase_http_route_inner_annotation(
                         inner,
@@ -2359,7 +2317,7 @@ def _collect_routes(
         elif simple in ("CodebaseAsyncRoute", "CodebaseAsyncRoutes"):
             nodes = [node]
             if simple == "CodebaseAsyncRoutes":
-                nodes = list(_codebase_async_route_inner_annotation_nodes(node, src))
+                nodes = list(_inner_annotation_nodes(node, src, "CodebaseAsyncRoute"))
             for ann in nodes:
                 pairs, _ = _annotation_kv_nodes(ann, src)
                 topic_node = pairs.get("topic")

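The consolidation above replaces four near-identical walkers with one function parameterized by the target simple name. The sketch below reproduces that traversal against a stand-in tree so it runs without tree-sitter. `FakeNode` is a hypothetical substitute for the real `Node`, which exposes the name through `child_by_field_name("name")` and source bytes rather than a plain attribute.

```python
from dataclasses import dataclass, field


@dataclass
class FakeNode:
    """Hypothetical stand-in for a tree-sitter node: type, name text, children."""

    type: str
    name: str = ""
    children: list["FakeNode"] = field(default_factory=list)


def inner_annotation_nodes(container: FakeNode, target_simple: str) -> list[FakeNode]:
    """Collect annotations under `container` whose simple name equals `target_simple`."""
    found: list[FakeNode] = []

    def visit(n: FakeNode) -> None:
        if n.type == "annotation":
            # Strip any package qualifier: "com.acme.CodebaseProducer" -> "CodebaseProducer".
            if n.name.rsplit(".", 1)[-1] == target_simple:
                found.append(n)
        for child in n.children:
            visit(child)

    visit(container)
    return found


# A container annotation holding two clients (one fully qualified) and one producer.
container = FakeNode(
    "annotation",
    "CodebaseHttpClients",
    [
        FakeNode(
            "element_value_array_initializer",
            children=[
                FakeNode("annotation", "CodebaseHttpClient"),
                FakeNode("annotation", "com.acme.CodebaseHttpClient"),
                FakeNode("annotation", "CodebaseProducer"),
            ],
        )
    ],
)

clients = inner_annotation_nodes(container, "CodebaseHttpClient")
producers = inner_annotation_nodes(container, "CodebaseProducer")
```

Because the match is an exact comparison on the simple name, the plural container `CodebaseHttpClients` is not collected when searching for `CodebaseHttpClient`, so each call site passes the singular form.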
@@ -2010,8 +2010,21 @@ def _producer_id(
     return f"p:{hashlib.sha1(key.encode()).hexdigest()[:16]}"
 
 
+# The four brownfield source layers — single source of truth. Consumed by the
+# client/producer source-layer classifiers, the *_from_brownfield_pct stats
+# (via brownfield_strategies), and the brownfield_only authoritativeness gate in
+# _is_brownfield_sourced. codebase_client/codebase_producer are caller-side
+# declaration strategies, not layers — they extend brownfield_strategies only.
+_BROWNFIELD_LAYERS = frozenset({
+    "layer_a_meta",
+    "layer_b_ann",
+    "layer_b_fqn",
+    "layer_c_source",
+})
+
+
 def _client_source_layer(strategy: str) -> str:
-    if strategy in {"layer_a_meta", "layer_b_ann", "layer_b_fqn", "layer_c_source"}:
+    if strategy in _BROWNFIELD_LAYERS:
         return strategy
     # Some caller extraction paths emit client kind as strategy; treat those
     # as builtin-source declarations instead of warning on every row.
@@ -2023,7 +2036,7 @@ def _client_source_layer(strategy: str) -> str:
 
 
 def _producer_source_layer(strategy: str) -> str:
-    if strategy in {"layer_a_meta", "layer_b_ann", "layer_b_fqn", "layer_c_source"}:
+    if strategy in _BROWNFIELD_LAYERS:
         return strategy
     if strategy in VALID_PRODUCER_KINDS:
         return "builtin"
@@ -2458,15 +2471,14 @@ def _phantom_async_route_id(call: OutgoingCallDecl) -> str:
         tables.producer_stats.producers_by_kind = defaultdict(int)
         for row in tables.producer_rows:
             tables.producer_stats.producers_by_kind[row.producer_kind] += 1
-        brownfield_strategies = frozenset(
-            (
-                "layer_b_ann",
-                "layer_a_meta",
-                "layer_c_source",
-                "layer_b_fqn",
-                "codebase_client",
-                "codebase_producer",
-            ),
+        # brownfield_strategies = the four brownfield layers plus the two
+        # caller-side declaration strategies (@CodebaseHttpClient /
+        # @CodebaseProducer). These extend _BROWNFIELD_LAYERS deliberately:
+        # the *_from_brownfield_pct stats count annotation-declared callers as
+        # brownfield-sourced even though they are not "layers" and so do not
+        # gate brownfield_only authoritativeness in _is_brownfield_sourced.
+        brownfield_strategies = _BROWNFIELD_LAYERS | frozenset(
+            {"codebase_client", "codebase_producer"},
         )
         if tables.call_edge_stats.http_calls_total:
             n_http = sum(
@@ -2568,14 +2580,6 @@ def _match_call_edge(
     return "cross_service", candidates
 
 
-_BROWNFIELD_LAYERS = frozenset({
-    "layer_c_source",
-    "layer_b_ann",
-    "layer_b_fqn",
-    "layer_a_meta",
-})
-
-
 def _is_brownfield_sourced(
     call_strategy: str,
     candidates: list[RouteRow],