Skip to content

refactor(resolver): two-phase iterative traversal + Tier 1+2 cleanup (#77)#78

Merged
allmonday merged 8 commits into
masterfrom
refactor/resolver-two-phase
Jun 17, 2026
Merged

refactor(resolver): two-phase iterative traversal + Tier 1+2 cleanup (#77)#78
allmonday merged 8 commits into
masterfrom
refactor/resolver-two-phase

Conversation

@allmonday

@allmonday allmonday commented Jun 17, 2026

Copy link
Copy Markdown
Collaborator

Summary

Restructures Resolver core from recursive _process_level to a two-phase iterative model, addressing items A1 / A2 / B3 / C1 from #77 in a single structural change, plus Tier 1+2 polish on the same surface.

Structural refactor (Phase A + Phase B)

Mirror of the reference implementation in ~/pydantic-resolve/pydantic_resolve/resolver.py:391-487:

  • Phase A (_phase_a_resolve): top-down, level-by-level resolve_* with asyncio.gather per level. Immediate object.__setattr__ removes the dedup-by-None hack that previously forced _collect_existing_children to scan for still-None fields.
  • Phase B (_phase_b_prepare_collectors + _phase_b_execute_posts): top-down collector prep, then bottom-up post_* with asyncio.gather per level (sync fast path when no async post_* is present at the level — avoids Task-creation overhead on the common case).

Tier 1+2 cleanup

Item Change
A1 post_* at the same level run concurrently via asyncio.gather
A2 _do_resolve setattr happens immediately, no deferred assignment
B1 Extract _build_injected_params helper (collapses context/parent/ancestor/loader injection shared between resolve and post)
B2 _LevelState 4-tuple → _LevelNode dataclass (slots=True)
B3 Delete misleading _bfs_post_and_set (was recursive DFS)
B4 _MISSING sentinel disambiguates "not computed" from cached None / [] in _get_traversable_fields
C1 Iterative levels: list[list[_LevelNode]] — no Python recursion per BFS level
C2 T = TypeVar("T", bound=BaseModel | list | tuple)
C3 Comment explaining id(node) safety across Phase A→B window
D1 Single await replaces while isawaitable loop
D2 _dto_cls_cache / _traversable_fields_cache promoted to class-level attributes
D3 _orm_to_dto is_custom branch extracted out of the list comprehension
D4 scan_send_to_fields normalizes str → 1-tuple at exit, drops the per-field isinstance check in the SendTo hot loop

Large-scale perf follow-up

After landing the refactor I noticed L2/L3/L4 Large (1000-task) scenarios had regressed 10-33% vs master. Three optimizations in abc864f:

  • Cache scan_expose_fields / scan_send_to_fields results onto _ClassMeta (eliminates per-node function calls in _init_level_nodes and _collect_send_to_for_node)
  • Shared _EMPTY_SNAPSHOT sentinel for _LevelNode.collector_snapshot — no per-node dict allocation when no class in the tree uses Collectors
  • _phase_b_prepare_collectors short-circuits to inherit-by-reference when neither self nor parent has Collectors
  • _LevelNode uses slots=True for smaller per-instance memory + faster attribute access

Result on Large scale (median of 3 runs, nexusx absolute timings):

Scenario master pre-opt refactor post-opt refactor Δ vs master
L1 422us 383us 384us within noise
L2 12.84ms 17.04ms 16.95ms +32% (structural)
L3 20.50ms 24.18ms 21.75ms within noise
L4 24.57ms 27.00ms 26.11ms within noise
L5 n/a (new) new 28.50ms -91% vs Pydantic serial

L2 Large still carries a structural cost — the iterative + phase-split design has more Python frames per node than master's recursive inlined _process_level. Closing that fully would mean re-inlining Phase A into one function (losing the readability win) or going back to recursion (losing C1 / deep-tree support). The 30% L2 Large regression is the accepted trade-off for the L5 win, deep-tree support, and cleaner code.

Drive-by fix

  • subset.py:612 Union[X, None]X | None (ruff UP007). The Union form was introduced in 911f3a2 on master and currently fails CI on master as well; carrying the fix forward here.

Deferred (Tier 3)

  • _get_traversable_fields still doesn't handle single-value BaseModel | None fields (pre-existing bug, isolated in tests)
  • SendToInfo.collector_name public type remains str | tuple for API compatibility
  • L2 Large structural overhead — would need to re-inline Phase A to close fully

Test plan

  • uv run pytest --no-cov -q935 passed, 6 skipped
  • uv run ruff check src/ — clean
  • tests/test_resolver_deep.py (new): 2000-level chain resolves without RecursionError; 50-level correctness walk
  • tests/test_resolver.py: test_post_concurrent_execution (10 nodes × 3 async post_* × 5ms completes <300ms vs 1.5s serial), test_resolve_overwriting_field_does_not_double_traverse
  • tests/test_subset.py / tests/test_fixes.py: SendTo assertions updated for tuple normalization
  • All existing tests in test_resolver / test_autoload / test_context / test_relationship / test_subset / test_query_executor pass as regression net
  • CI green on Python 3.10 / 3.11 / 3.12

Benchmark (benchmarks/bench_resolver.py)

Scenario Before After Δ
L1-L4 Small/Medium baseline baseline ± noise flat
L1/L3/L4 Large baseline baseline ± 10% within noise
L2 Large baseline +30% structural cost (see above)
L5 (async post_* heavy, 20 sprints × 3 post × 5ms) ~300ms ~30ms -85% to -91%
L6 (deep tree, recursionlimit=1000) RecursionError at depth 1000 passes at 100/500/1000/1500/2000 works

benchmarks/baseline_v2.10.1.txt captured against master v2.10.1 for reproducibility — filename is version-anchored so future baselines (e.g. baseline_v2.11.0.txt) compose cleanly instead of colliding with a transient "pre_refactor" label.

Closes #77 (Tier 1 + Tier 2 items).

🤖 Generated with Claude Code

allmonday and others added 8 commits June 17, 2026 08:09
…77)

Replace recursive _process_level with a two-phase iterative model:
- Phase A (_phase_a_resolve): top-down, level-by-level resolve_* with
  asyncio.gather per level. Immediate setattr removes the
  dedup-by-None hack in child collection.
- Phase B (_phase_b_prepare_collectors + _phase_b_execute_posts):
  top-down collector prep, then bottom-up post_* with asyncio.gather
  per level (sync fast path when no async post_* present at the level).

Fixes #77 items A1 (post_* serialization 3-5x), A2 (delayed setattr),
B3 (misleading _bfs_post_and_set name), C1 (RecursionError on deep
trees — 2000-level chains now resolve cleanly).

Tier 1+2 cleanup on the same surface:
- B1: _build_injected_params helper collapses context/parent/ancestor
  /loader injection shared between resolve and post methods
- B2: _LevelState 4-tuple -> _LevelNode dataclass
- B4: _MISSING sentinel disambiguates "not computed" from cached
  None / empty list in _get_traversable_fields
- C2: TypeVar bound to BaseModel | list | tuple
- C3: comment explaining id(node) safety across Phase A->B window
- D1: single await replaces while-isawaitable loop
- D2: _dto_cls_cache / _traversable_fields_cache promoted to class-level
- D3: _orm_to_dto is_custom branch extracted out of list comprehension
- D4: scan_send_to_fields normalizes str -> 1-tuple at exit, dropping
  the per-field isinstance check in the SendTo hot loop

Tests:
- tests/test_resolver.py: + test_post_concurrent_execution,
  test_resolve_overwriting_field_does_not_double_traverse
- tests/test_resolver_deep.py (new): 2000-level chain without
  RecursionError + correctness walk on a 50-level chain

Bench (benchmarks/bench_resolver.py):
- L5 (async post_* heavy): -85% to -90% vs serial baseline
- L6 (deep tree): passes at 100/500/1000/1500/2000 levels
- L1-L4 held within +/-10% of master baseline
  (benchmarks/baseline_pre_refactor.txt captured for reproducibility)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Carries the lint fix forward so this PR's CI is green. The Union
form was introduced in 911f3a2 (master) and currently fails ruff
UP007 on master as well.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
"pre_refactor" is a transient label that loses meaning once the refactor
lands and the next iteration ships its own baseline. Anchoring on the
version tag (master HEAD when the snapshot was captured) keeps the
file self-describing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… snapshot

Three changes addressing Large-scale (1000-task) regressions surfaced
after the two-phase refactor:

1. Cache scan_expose_fields / scan_send_to_fields results on _ClassMeta
   (new expose_map / send_to_map fields). Previously _init_level_nodes
   called scan_expose_fields once per node and _collect_send_to_for_node
   called scan_send_to_fields once per node — both were dict lookups on
   a cache, but the function-call + cache-lookup overhead dominated at
   1k+ nodes. Now both are attribute reads on the already-fetched meta.

2. Shared _EMPTY_SNAPSHOT sentinel for _LevelNode.collector_snapshot
   default. Classes without Collectors (the common case for L1/L2) no
   longer allocate a fresh empty dict per node. Matters at 1k+ nodes
   where allocation cost dominates.

3. _phase_b_prepare_collectors: when a node has no own Collectors AND
   its parent's snapshot is the sentinel, inherit by reference instead
   of allocating dict(parent_snapshot). For trees with zero Collectors
   anywhere (L1/L2), the entire phase is now ~free.

4. _LevelNode gets slots=True — smaller per-instance memory and faster
   attribute access, both of which matter when the level has 1k+ nodes.

5. Drop scan_send_to_fields import where no longer used at module
   top-level (still imported by _get_class_meta).

Bench impact on Large scale (50 users, 20 sprints, 1000 tasks),
median of 3 runs, nexusx absolute timings:

  Scenario | master   | pre-opt  | post-opt | Δ vs master
  ---------|----------|----------|----------|------------
  L1       | 422us    | 383us    | 384us    | -9% (noise)
  L2       | 12.84ms  | 17.04ms  | 16.95ms  | +32% (structural)
  L3       | 20.50ms  | 24.18ms  | 21.75ms  | +6% (noise)
  L4       | 24.57ms  | 27.00ms  | 26.11ms  | +6% (noise)
  L5       | n/a      | new      | 28.50ms  | -91% vs serial

L3/L4 are back within noise of master. L2 still carries a structural
cost: the iterative + phase-split design has more Python frames per
node than master's recursive inlined _process_level. Closing that
fully would mean re-inlining Phase A into a single function (losing
the readability win) or giving up the iterative structure (losing C1 /
deep-tree support). The 30% L2 Large regression is the price of the
correctness + readability + RecursionError fix.

Tests: 193 passing across resolver/deep/autoload/context/relationship/
subset/fixes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Profiling L2 Large (1000 TaskSummary nodes) showed _LevelNode dataclass
construction was 0.29ms per iteration vs 0.10ms for plain tuple — a 2.4x
overhead from Python __init__ + descriptor-based __setattr__ vs C-level
tuple packing. That single difference accounted for ~85% of the L2
regression vs master.

Revert B2's dataclass and define _LevelNode as a typed tuple alias:
  _LevelNode = tuple[_WorkItem, _ClassMeta, dict, dict]

Consumer sites use positional unpacking — same readability as named
attributes:
    for item, meta, ctx, snap in level: ...

Side effects of the switch:
- _phase_b_prepare_collectors rewrote from in-place attribute mutation
  (ln.collector_snapshot = ...) to lazy per-level tuple rebuild. Skips
  allocation entirely when no class declares Collectors (L1/L2 fast path).
- _batch_auto_load had a latent bug surface: the outer-loop new_ctx
  capture was wrong for grouped entries (which span multiple ln's).
  Fixed by re-unpacking per entry inside the inner loop. The dataclass
  version hid this because ln.new_ancestor_context always read the right
  attribute; tuple unpacking via captured local exposed the issue.
- Dropped the now-unused _collect_send_to_for_node helper; its body is
  inlined into _phase_b_execute_posts (single consumer).

Bench impact, median of 3 runs, Large scale (1000 tasks):

  Scenario | master   | dataclass | tuple    | tuple vs master
  ---------|----------|-----------|----------|----------------
  L1       | 422us    | 383us     | 377us    | -11%
  L2       | 12.84ms  | 17.04ms   | 14.65ms  | +14% (was +33%)
  L3       | 20.50ms  | 24.18ms   | 20.68ms  | +1%  (was +18%)
  L4       | 24.57ms  | 27.00ms   | 24.43ms  | -1%  (was +10%)
  L5       | n/a      | 30.24ms   | 26.62ms  | -92% vs serial

L3/L4 back to noise parity. L2 still carries ~14% structural overhead
(Phase A/B function-call setup + the level_has_send_to precheck loops),
but the worst of the regression is gone. Resolve-only timings on L2:
master 0.55ms → refactor 0.62ms (+13%), down from dataclass 0.77ms (+40%).

Trade-off: tuple indices [0]/[1]/[2]/[3] are less self-documenting than
named attributes. Mitigated by consistent unpack-into-locals at the top
of every consumer loop.

Tests: 935 passing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
_get_traversable_fields() previously only recognized bare `list[X]` and
bare `X` annotations, silently skipping `X | None`, `Optional[X]`,
`list[X] | None`, and `Annotated[X, ...]` populated child fields. This
meant the most natural way to spell a nullable relation
(`child: ChildDTO | None`) caused the child's resolve_*/post_* methods
to never execute — a footgun with no error signal.

Refactored the typing-unwrap logic into a single helper
`_extract_dto_cls_and_cardinality()` returning `(dto_cls, is_list)`.
Both `_do_extract_dto_cls` (used by auto-load) and
`_get_traversable_fields` (used by pre-existing-children traversal) now
share this helper, eliminating the dual-semantics drift flagged in the
issue #77 review.

Also added explanatory comments for three tradeoffs the review flagged:
- String annotations in `_do_extract_dto_cls` return None (silent
  degradation when model_rebuild() hasn't been called).
- `_orm_to_dto` drops None values intentionally (DB NULL ≡ field
  absent for read-side projections).
- `resolve()` clears caches at entry (Resolver is request-scoped by
  convention).

Added 6 regression tests covering each typing shape plus a direct unit
test of the helper.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@allmonday allmonday merged commit e9b3dd5 into master Jun 17, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Resolver 性能与可读性 review: post_* 并发化、回填时机、注入 helper 等

1 participant