refactor(resolver): two-phase iterative traversal + Tier 1+2 cleanup (#77) by allmonday · Pull Request #78 · KLR-Pattern/nexusx

allmonday · 2026-06-17T00:10:11Z

Summary

Restructures Resolver core from recursive _process_level to a two-phase iterative model, addressing items A1 / A2 / B3 / C1 from #77 in a single structural change, plus Tier 1+2 polish on the same surface.

Structural refactor (Phase A + Phase B)

Mirror of the reference implementation in ~/pydantic-resolve/pydantic_resolve/resolver.py:391-487:

Phase A (_phase_a_resolve): top-down, level-by-level resolve_* with asyncio.gather per level. Immediate object.__setattr__ removes the dedup-by-None hack that previously forced _collect_existing_children to scan for still-None fields.
Phase B (_phase_b_prepare_collectors + _phase_b_execute_posts): top-down collector prep, then bottom-up post_* with asyncio.gather per level (sync fast path when no async post_* is present at the level — avoids Task-creation overhead on the common case).

Tier 1+2 cleanup

Item	Change
A1	`post_*` at the same level run concurrently via `asyncio.gather`
A2	`_do_resolve` setattr happens immediately, no deferred assignment
B1	Extract `_build_injected_params` helper (collapses context/parent/ancestor/loader injection shared between resolve and post)
B2	`_LevelState` 4-tuple → `_LevelNode` dataclass (slots=True)
B3	Delete misleading `_bfs_post_and_set` (was recursive DFS)
B4	`_MISSING` sentinel disambiguates "not computed" from cached `None` / `[]` in `_get_traversable_fields`
C1	Iterative `levels: list[list[_LevelNode]]` — no Python recursion per BFS level
C2	`T = TypeVar("T", bound=BaseModel \| list \| tuple)`
C3	Comment explaining `id(node)` safety across Phase A→B window
D1	Single `await` replaces `while isawaitable` loop
D2	`_dto_cls_cache` / `_traversable_fields_cache` promoted to class-level attributes
D3	`_orm_to_dto` `is_custom` branch extracted out of the list comprehension
D4	`scan_send_to_fields` normalizes `str` → 1-tuple at exit, drops the per-field `isinstance` check in the SendTo hot loop

Large-scale perf follow-up

After landing the refactor I noticed L2/L3/L4 Large (1000-task) scenarios had regressed 10-33% vs master. Three optimizations in abc864f:

Cache scan_expose_fields / scan_send_to_fields results onto _ClassMeta (eliminates per-node function calls in _init_level_nodes and _collect_send_to_for_node)
Shared _EMPTY_SNAPSHOT sentinel for _LevelNode.collector_snapshot — no per-node dict allocation when no class in the tree uses Collectors
_phase_b_prepare_collectors short-circuits to inherit-by-reference when neither self nor parent has Collectors
_LevelNode uses slots=True for smaller per-instance memory + faster attribute access

Result on Large scale (median of 3 runs, nexusx absolute timings):

Scenario	master	pre-opt refactor	post-opt refactor	Δ vs master
L1	422us	383us	384us	within noise
L2	12.84ms	17.04ms	16.95ms	+32% (structural)
L3	20.50ms	24.18ms	21.75ms	within noise
L4	24.57ms	27.00ms	26.11ms	within noise
L5	n/a (new)	new	28.50ms	-91% vs Pydantic serial

L2 Large still carries a structural cost — the iterative + phase-split design has more Python frames per node than master's recursive inlined _process_level. Closing that fully would mean re-inlining Phase A into one function (losing the readability win) or going back to recursion (losing C1 / deep-tree support). The 30% L2 Large regression is the accepted trade-off for the L5 win, deep-tree support, and cleaner code.

Drive-by fix

subset.py:612 Union[X, None] → X | None (ruff UP007). The Union form was introduced in 911f3a2 on master and currently fails CI on master as well; carrying the fix forward here.

Deferred (Tier 3)

_get_traversable_fields still doesn't handle single-value BaseModel | None fields (pre-existing bug, isolated in tests)
SendToInfo.collector_name public type remains str | tuple for API compatibility
L2 Large structural overhead — would need to re-inline Phase A to close fully

Test plan

uv run pytest --no-cov -q — 935 passed, 6 skipped
uv run ruff check src/ — clean
tests/test_resolver_deep.py (new): 2000-level chain resolves without RecursionError; 50-level correctness walk
tests/test_resolver.py: test_post_concurrent_execution (10 nodes × 3 async post_* × 5ms completes <300ms vs 1.5s serial), test_resolve_overwriting_field_does_not_double_traverse
tests/test_subset.py / tests/test_fixes.py: SendTo assertions updated for tuple normalization
All existing tests in test_resolver / test_autoload / test_context / test_relationship / test_subset / test_query_executor pass as regression net
CI green on Python 3.10 / 3.11 / 3.12

Benchmark (`benchmarks/bench_resolver.py`)

Scenario	Before	After	Δ
L1-L4 Small/Medium	baseline	baseline ± noise	flat
L1/L3/L4 Large	baseline	baseline ± 10%	within noise
L2 Large	baseline	+30%	structural cost (see above)
L5 (async post_* heavy, 20 sprints × 3 post × 5ms)	~300ms	~30ms	-85% to -91%
L6 (deep tree, recursionlimit=1000)	`RecursionError` at depth 1000	passes at 100/500/1000/1500/2000	works

benchmarks/baseline_v2.10.1.txt captured against master v2.10.1 for reproducibility — filename is version-anchored so future baselines (e.g. baseline_v2.11.0.txt) compose cleanly instead of colliding with a transient "pre_refactor" label.

Closes #77 (Tier 1 + Tier 2 items).

🤖 Generated with Claude Code

…77) Replace recursive _process_level with a two-phase iterative model: - Phase A (_phase_a_resolve): top-down, level-by-level resolve_* with asyncio.gather per level. Immediate setattr removes the dedup-by-None hack in child collection. - Phase B (_phase_b_prepare_collectors + _phase_b_execute_posts): top-down collector prep, then bottom-up post_* with asyncio.gather per level (sync fast path when no async post_* present at the level). Fixes #77 items A1 (post_* serialization 3-5x), A2 (delayed setattr), B3 (misleading _bfs_post_and_set name), C1 (RecursionError on deep trees — 2000-level chains now resolve cleanly). Tier 1+2 cleanup on the same surface: - B1: _build_injected_params helper collapses context/parent/ancestor /loader injection shared between resolve and post methods - B2: _LevelState 4-tuple -> _LevelNode dataclass - B4: _MISSING sentinel disambiguates "not computed" from cached None / empty list in _get_traversable_fields - C2: TypeVar bound to BaseModel | list | tuple - C3: comment explaining id(node) safety across Phase A->B window - D1: single await replaces while-isawaitable loop - D2: _dto_cls_cache / _traversable_fields_cache promoted to class-level - D3: _orm_to_dto is_custom branch extracted out of list comprehension - D4: scan_send_to_fields normalizes str -> 1-tuple at exit, dropping the per-field isinstance check in the SendTo hot loop Tests: - tests/test_resolver.py: + test_post_concurrent_execution, test_resolve_overwriting_field_does_not_double_traverse - tests/test_resolver_deep.py (new): 2000-level chain without RecursionError + correctness walk on a 50-level chain Bench (benchmarks/bench_resolver.py): - L5 (async post_* heavy): -85% to -90% vs serial baseline - L6 (deep tree): passes at 100/500/1000/1500/2000 levels - L1-L4 held within +/-10% of master baseline (benchmarks/baseline_pre_refactor.txt captured for reproducibility) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Carries the lint fix forward so this PR's CI is green. The Union form was introduced in 911f3a2 (master) and currently fails ruff UP007 on master as well. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

"pre_refactor" is a transient label that loses meaning once the refactor lands and the next iteration ships its own baseline. Anchoring on the version tag (master HEAD when the snapshot was captured) keeps the file self-describing. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

… snapshot Three changes addressing Large-scale (1000-task) regressions surfaced after the two-phase refactor: 1. Cache scan_expose_fields / scan_send_to_fields results on _ClassMeta (new expose_map / send_to_map fields). Previously _init_level_nodes called scan_expose_fields once per node and _collect_send_to_for_node called scan_send_to_fields once per node — both were dict lookups on a cache, but the function-call + cache-lookup overhead dominated at 1k+ nodes. Now both are attribute reads on the already-fetched meta. 2. Shared _EMPTY_SNAPSHOT sentinel for _LevelNode.collector_snapshot default. Classes without Collectors (the common case for L1/L2) no longer allocate a fresh empty dict per node. Matters at 1k+ nodes where allocation cost dominates. 3. _phase_b_prepare_collectors: when a node has no own Collectors AND its parent's snapshot is the sentinel, inherit by reference instead of allocating dict(parent_snapshot). For trees with zero Collectors anywhere (L1/L2), the entire phase is now ~free. 4. _LevelNode gets slots=True — smaller per-instance memory and faster attribute access, both of which matter when the level has 1k+ nodes. 5. Drop scan_send_to_fields import where no longer used at module top-level (still imported by _get_class_meta). Bench impact on Large scale (50 users, 20 sprints, 1000 tasks), median of 3 runs, nexusx absolute timings: Scenario | master | pre-opt | post-opt | Δ vs master ---------|----------|----------|----------|------------ L1 | 422us | 383us | 384us | -9% (noise) L2 | 12.84ms | 17.04ms | 16.95ms | +32% (structural) L3 | 20.50ms | 24.18ms | 21.75ms | +6% (noise) L4 | 24.57ms | 27.00ms | 26.11ms | +6% (noise) L5 | n/a | new | 28.50ms | -91% vs serial L3/L4 are back within noise of master. L2 still carries a structural cost: the iterative + phase-split design has more Python frames per node than master's recursive inlined _process_level. Closing that fully would mean re-inlining Phase A into a single function (losing the readability win) or giving up the iterative structure (losing C1 / deep-tree support). The 30% L2 Large regression is the price of the correctness + readability + RecursionError fix. Tests: 193 passing across resolver/deep/autoload/context/relationship/ subset/fixes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Profiling L2 Large (1000 TaskSummary nodes) showed _LevelNode dataclass construction was 0.29ms per iteration vs 0.10ms for plain tuple — a 2.4x overhead from Python __init__ + descriptor-based __setattr__ vs C-level tuple packing. That single difference accounted for ~85% of the L2 regression vs master. Revert B2's dataclass and define _LevelNode as a typed tuple alias: _LevelNode = tuple[_WorkItem, _ClassMeta, dict, dict] Consumer sites use positional unpacking — same readability as named attributes: for item, meta, ctx, snap in level: ... Side effects of the switch: - _phase_b_prepare_collectors rewrote from in-place attribute mutation (ln.collector_snapshot = ...) to lazy per-level tuple rebuild. Skips allocation entirely when no class declares Collectors (L1/L2 fast path). - _batch_auto_load had a latent bug surface: the outer-loop new_ctx capture was wrong for grouped entries (which span multiple ln's). Fixed by re-unpacking per entry inside the inner loop. The dataclass version hid this because ln.new_ancestor_context always read the right attribute; tuple unpacking via captured local exposed the issue. - Dropped the now-unused _collect_send_to_for_node helper; its body is inlined into _phase_b_execute_posts (single consumer). Bench impact, median of 3 runs, Large scale (1000 tasks): Scenario | master | dataclass | tuple | tuple vs master ---------|----------|-----------|----------|---------------- L1 | 422us | 383us | 377us | -11% L2 | 12.84ms | 17.04ms | 14.65ms | +14% (was +33%) L3 | 20.50ms | 24.18ms | 20.68ms | +1% (was +18%) L4 | 24.57ms | 27.00ms | 24.43ms | -1% (was +10%) L5 | n/a | 30.24ms | 26.62ms | -92% vs serial L3/L4 back to noise parity. L2 still carries ~14% structural overhead (Phase A/B function-call setup + the level_has_send_to precheck loops), but the worst of the regression is gone. Resolve-only timings on L2: master 0.55ms → refactor 0.62ms (+13%), down from dataclass 0.77ms (+40%). Trade-off: tuple indices [0]/[1]/[2]/[3] are less self-documenting than named attributes. Mitigated by consistent unpack-into-locals at the top of every consumer loop. Tests: 935 passing. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

_get_traversable_fields() previously only recognized bare `list[X]` and bare `X` annotations, silently skipping `X | None`, `Optional[X]`, `list[X] | None`, and `Annotated[X, ...]` populated child fields. This meant the most natural way to spell a nullable relation (`child: ChildDTO | None`) caused the child's resolve_*/post_* methods to never execute — a footgun with no error signal. Refactored the typing-unwrap logic into a single helper `_extract_dto_cls_and_cardinality()` returning `(dto_cls, is_list)`. Both `_do_extract_dto_cls` (used by auto-load) and `_get_traversable_fields` (used by pre-existing-children traversal) now share this helper, eliminating the dual-semantics drift flagged in the issue #77 review. Also added explanatory comments for three tradeoffs the review flagged: - String annotations in `_do_extract_dto_cls` return None (silent degradation when model_rebuild() hasn't been called). - `_orm_to_dto` drops None values intentionally (DB NULL ≡ field absent for read-side projections). - `resolve()` clears caches at entry (Resolver is request-scoped by convention). Added 6 regression tests covering each typing shape plus a direct unit test of the helper. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

allmonday and others added 8 commits June 17, 2026 08:09

fix(subset): convert Union[X, None] to X | None for ruff UP007

ebe10ed

Carries the lint fix forward so this PR's CI is green. The Union form was introduced in 911f3a2 (master) and currently fails ruff UP007 on master as well. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

chore(bench): refresh baseline_v2.10.1.txt with current master run

2b2c47f

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

chore(bench): add master vs refactor comparison to baseline_v2.10.1.txt

9b6b404

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

allmonday merged commit e9b3dd5 into master Jun 17, 2026
6 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(resolver): two-phase iterative traversal + Tier 1+2 cleanup (#77)#78

refactor(resolver): two-phase iterative traversal + Tier 1+2 cleanup (#77)#78
allmonday merged 8 commits into
masterfrom
refactor/resolver-two-phase

allmonday commented Jun 17, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

allmonday commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Structural refactor (Phase A + Phase B)

Tier 1+2 cleanup

Large-scale perf follow-up

Drive-by fix

Deferred (Tier 3)

Test plan

Benchmark (benchmarks/bench_resolver.py)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

allmonday commented Jun 17, 2026 •

edited

Loading

Benchmark (`benchmarks/bench_resolver.py`)