refactor(resolver): two-phase iterative traversal + Tier 1+2 cleanup (#77)#78
Merged
Conversation
…77) Replace recursive _process_level with a two-phase iterative model: - Phase A (_phase_a_resolve): top-down, level-by-level resolve_* with asyncio.gather per level. Immediate setattr removes the dedup-by-None hack in child collection. - Phase B (_phase_b_prepare_collectors + _phase_b_execute_posts): top-down collector prep, then bottom-up post_* with asyncio.gather per level (sync fast path when no async post_* present at the level). Fixes #77 items A1 (post_* serialization 3-5x), A2 (delayed setattr), B3 (misleading _bfs_post_and_set name), C1 (RecursionError on deep trees — 2000-level chains now resolve cleanly). Tier 1+2 cleanup on the same surface: - B1: _build_injected_params helper collapses context/parent/ancestor /loader injection shared between resolve and post methods - B2: _LevelState 4-tuple -> _LevelNode dataclass - B4: _MISSING sentinel disambiguates "not computed" from cached None / empty list in _get_traversable_fields - C2: TypeVar bound to BaseModel | list | tuple - C3: comment explaining id(node) safety across Phase A->B window - D1: single await replaces while-isawaitable loop - D2: _dto_cls_cache / _traversable_fields_cache promoted to class-level - D3: _orm_to_dto is_custom branch extracted out of list comprehension - D4: scan_send_to_fields normalizes str -> 1-tuple at exit, dropping the per-field isinstance check in the SendTo hot loop Tests: - tests/test_resolver.py: + test_post_concurrent_execution, test_resolve_overwriting_field_does_not_double_traverse - tests/test_resolver_deep.py (new): 2000-level chain without RecursionError + correctness walk on a 50-level chain Bench (benchmarks/bench_resolver.py): - L5 (async post_* heavy): -85% to -90% vs serial baseline - L6 (deep tree): passes at 100/500/1000/1500/2000 levels - L1-L4 held within +/-10% of master baseline (benchmarks/baseline_pre_refactor.txt captured for reproducibility) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Carries the lint fix forward so this PR's CI is green. The Union form was introduced in 911f3a2 (master) and currently fails ruff UP007 on master as well. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
"pre_refactor" is a transient label that loses meaning once the refactor lands and the next iteration ships its own baseline. Anchoring on the version tag (master HEAD when the snapshot was captured) keeps the file self-describing. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… snapshot Three changes addressing Large-scale (1000-task) regressions surfaced after the two-phase refactor: 1. Cache scan_expose_fields / scan_send_to_fields results on _ClassMeta (new expose_map / send_to_map fields). Previously _init_level_nodes called scan_expose_fields once per node and _collect_send_to_for_node called scan_send_to_fields once per node — both were dict lookups on a cache, but the function-call + cache-lookup overhead dominated at 1k+ nodes. Now both are attribute reads on the already-fetched meta. 2. Shared _EMPTY_SNAPSHOT sentinel for _LevelNode.collector_snapshot default. Classes without Collectors (the common case for L1/L2) no longer allocate a fresh empty dict per node. Matters at 1k+ nodes where allocation cost dominates. 3. _phase_b_prepare_collectors: when a node has no own Collectors AND its parent's snapshot is the sentinel, inherit by reference instead of allocating dict(parent_snapshot). For trees with zero Collectors anywhere (L1/L2), the entire phase is now ~free. 4. _LevelNode gets slots=True — smaller per-instance memory and faster attribute access, both of which matter when the level has 1k+ nodes. 5. Drop scan_send_to_fields import where no longer used at module top-level (still imported by _get_class_meta). Bench impact on Large scale (50 users, 20 sprints, 1000 tasks), median of 3 runs, nexusx absolute timings: Scenario | master | pre-opt | post-opt | Δ vs master ---------|----------|----------|----------|------------ L1 | 422us | 383us | 384us | -9% (noise) L2 | 12.84ms | 17.04ms | 16.95ms | +32% (structural) L3 | 20.50ms | 24.18ms | 21.75ms | +6% (noise) L4 | 24.57ms | 27.00ms | 26.11ms | +6% (noise) L5 | n/a | new | 28.50ms | -91% vs serial L3/L4 are back within noise of master. L2 still carries a structural cost: the iterative + phase-split design has more Python frames per node than master's recursive inlined _process_level. Closing that fully would mean re-inlining Phase A into a single function (losing the readability win) or giving up the iterative structure (losing C1 / deep-tree support). The 30% L2 Large regression is the price of the correctness + readability + RecursionError fix. Tests: 193 passing across resolver/deep/autoload/context/relationship/ subset/fixes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Profiling L2 Large (1000 TaskSummary nodes) showed _LevelNode dataclass
construction was 0.29ms per iteration vs 0.10ms for plain tuple — a 2.4x
overhead from Python __init__ + descriptor-based __setattr__ vs C-level
tuple packing. That single difference accounted for ~85% of the L2
regression vs master.
Revert B2's dataclass and define _LevelNode as a typed tuple alias:
_LevelNode = tuple[_WorkItem, _ClassMeta, dict, dict]
Consumer sites use positional unpacking — same readability as named
attributes:
for item, meta, ctx, snap in level: ...
Side effects of the switch:
- _phase_b_prepare_collectors rewrote from in-place attribute mutation
(ln.collector_snapshot = ...) to lazy per-level tuple rebuild. Skips
allocation entirely when no class declares Collectors (L1/L2 fast path).
- _batch_auto_load had a latent bug surface: the outer-loop new_ctx
capture was wrong for grouped entries (which span multiple ln's).
Fixed by re-unpacking per entry inside the inner loop. The dataclass
version hid this because ln.new_ancestor_context always read the right
attribute; tuple unpacking via captured local exposed the issue.
- Dropped the now-unused _collect_send_to_for_node helper; its body is
inlined into _phase_b_execute_posts (single consumer).
Bench impact, median of 3 runs, Large scale (1000 tasks):
Scenario | master | dataclass | tuple | tuple vs master
---------|----------|-----------|----------|----------------
L1 | 422us | 383us | 377us | -11%
L2 | 12.84ms | 17.04ms | 14.65ms | +14% (was +33%)
L3 | 20.50ms | 24.18ms | 20.68ms | +1% (was +18%)
L4 | 24.57ms | 27.00ms | 24.43ms | -1% (was +10%)
L5 | n/a | 30.24ms | 26.62ms | -92% vs serial
L3/L4 back to noise parity. L2 still carries ~14% structural overhead
(Phase A/B function-call setup + the level_has_send_to precheck loops),
but the worst of the regression is gone. Resolve-only timings on L2:
master 0.55ms → refactor 0.62ms (+13%), down from dataclass 0.77ms (+40%).
Trade-off: tuple indices [0]/[1]/[2]/[3] are less self-documenting than
named attributes. Mitigated by consistent unpack-into-locals at the top
of every consumer loop.
Tests: 935 passing.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
_get_traversable_fields() previously only recognized bare `list[X]` and bare `X` annotations, silently skipping `X | None`, `Optional[X]`, `list[X] | None`, and `Annotated[X, ...]` populated child fields. This meant the most natural way to spell a nullable relation (`child: ChildDTO | None`) caused the child's resolve_*/post_* methods to never execute — a footgun with no error signal. Refactored the typing-unwrap logic into a single helper `_extract_dto_cls_and_cardinality()` returning `(dto_cls, is_list)`. Both `_do_extract_dto_cls` (used by auto-load) and `_get_traversable_fields` (used by pre-existing-children traversal) now share this helper, eliminating the dual-semantics drift flagged in the issue #77 review. Also added explanatory comments for three tradeoffs the review flagged: - String annotations in `_do_extract_dto_cls` return None (silent degradation when model_rebuild() hasn't been called). - `_orm_to_dto` drops None values intentionally (DB NULL ≡ field absent for read-side projections). - `resolve()` clears caches at entry (Resolver is request-scoped by convention). Added 6 regression tests covering each typing shape plus a direct unit test of the helper. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Restructures
Resolvercore from recursive_process_levelto a two-phase iterative model, addressing items A1 / A2 / B3 / C1 from #77 in a single structural change, plus Tier 1+2 polish on the same surface.Structural refactor (Phase A + Phase B)
Mirror of the reference implementation in
~/pydantic-resolve/pydantic_resolve/resolver.py:391-487:_phase_a_resolve): top-down, level-by-levelresolve_*withasyncio.gatherper level. Immediateobject.__setattr__removes the dedup-by-None hack that previously forced_collect_existing_childrento scan for still-None fields._phase_b_prepare_collectors+_phase_b_execute_posts): top-down collector prep, then bottom-uppost_*withasyncio.gatherper level (sync fast path when no asyncpost_*is present at the level — avoids Task-creation overhead on the common case).Tier 1+2 cleanup
post_*at the same level run concurrently viaasyncio.gather_do_resolvesetattr happens immediately, no deferred assignment_build_injected_paramshelper (collapses context/parent/ancestor/loader injection shared between resolve and post)_LevelState4-tuple →_LevelNodedataclass (slots=True)_bfs_post_and_set(was recursive DFS)_MISSINGsentinel disambiguates "not computed" from cachedNone/[]in_get_traversable_fieldslevels: list[list[_LevelNode]]— no Python recursion per BFS levelT = TypeVar("T", bound=BaseModel | list | tuple)id(node)safety across Phase A→B windowawaitreplaceswhile isawaitableloop_dto_cls_cache/_traversable_fields_cachepromoted to class-level attributes_orm_to_dtois_custombranch extracted out of the list comprehensionscan_send_to_fieldsnormalizesstr→ 1-tuple at exit, drops the per-fieldisinstancecheck in the SendTo hot loopLarge-scale perf follow-up
After landing the refactor I noticed L2/L3/L4 Large (1000-task) scenarios had regressed 10-33% vs master. Three optimizations in
abc864f:scan_expose_fields/scan_send_to_fieldsresults onto_ClassMeta(eliminates per-node function calls in_init_level_nodesand_collect_send_to_for_node)_EMPTY_SNAPSHOTsentinel for_LevelNode.collector_snapshot— no per-node dict allocation when no class in the tree uses Collectors_phase_b_prepare_collectorsshort-circuits to inherit-by-reference when neither self nor parent has Collectors_LevelNodeusesslots=Truefor smaller per-instance memory + faster attribute accessResult on Large scale (median of 3 runs, nexusx absolute timings):
L2 Large still carries a structural cost — the iterative + phase-split design has more Python frames per node than master's recursive inlined
_process_level. Closing that fully would mean re-inlining Phase A into one function (losing the readability win) or going back to recursion (losing C1 / deep-tree support). The 30% L2 Large regression is the accepted trade-off for the L5 win, deep-tree support, and cleaner code.Drive-by fix
subset.py:612Union[X, None]→X | None(ruff UP007). The Union form was introduced in 911f3a2 on master and currently fails CI on master as well; carrying the fix forward here.Deferred (Tier 3)
_get_traversable_fieldsstill doesn't handle single-valueBaseModel | Nonefields (pre-existing bug, isolated in tests)SendToInfo.collector_namepublic type remainsstr | tuplefor API compatibilityTest plan
uv run pytest --no-cov -q— 935 passed, 6 skippeduv run ruff check src/— cleantests/test_resolver_deep.py(new): 2000-level chain resolves withoutRecursionError; 50-level correctness walktests/test_resolver.py:test_post_concurrent_execution(10 nodes × 3 async post_* × 5ms completes <300ms vs 1.5s serial),test_resolve_overwriting_field_does_not_double_traversetests/test_subset.py/tests/test_fixes.py: SendTo assertions updated for tuple normalizationtest_resolver / test_autoload / test_context / test_relationship / test_subset / test_query_executorpass as regression netBenchmark (
benchmarks/bench_resolver.py)RecursionErrorat depth 1000benchmarks/baseline_v2.10.1.txtcaptured against masterv2.10.1for reproducibility — filename is version-anchored so future baselines (e.g.baseline_v2.11.0.txt) compose cleanly instead of colliding with a transient "pre_refactor" label.Closes #77 (Tier 1 + Tier 2 items).
🤖 Generated with Claude Code