Arch target derivations: component-sum (first of §C) #53
First slice of the Arch target derivations (manifest §C) that core's target layer still lacks.

Ports the legacy microplex_us.targets.arch component-sum faithfully as a country-agnostic op: generic algorithm + injected US config (component_sum_map, geo-level fn, source-normalization fn), operating on a representation-light ArchTargetRecord so it can wire onto whichever loaded record type the target layer settles on.

- component_sum_records / with_component_sum_records: synthesize composite AMOUNT targets (e.g. SALT = state_local_income_or_sales_tax + real_estate_taxes) by summing declared components at a shared cell key; emits only when all components are present, skips if the output already exists at the cell, and drops the group on a duplicate component (never double-counts).
- 9 unit tests covering the sum, skip-if-output-exists, incomplete-components, duplicate-component bail, cross-cell/period isolation, and non-AMOUNT skip.

Next derivations in this lane: latest carry-forward, state->national rollup (excl. PR fips 72), BEA employment_income_before_lsr, SOI count/amount aging.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
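The component-sum rule in this commit can be illustrated with a minimal sketch. `Rec` and `sketch_component_sum` are hypothetical stand-ins, not the PR's `ArchTargetRecord` or `component_sum_records`, and the `(geography, period)` cell key is an assumption made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rec:
    """Hypothetical stand-in for ArchTargetRecord (not the PR's class)."""
    variable: str
    cell: tuple  # assumed cell key: (geography, period)
    value: float
    target_type: str = "AMOUNT"


def sketch_component_sum(records, component_map):
    """Emit one composite per cell where every declared component is present."""
    by_cell = defaultdict(list)
    for rec in records:
        if rec.target_type == "AMOUNT":  # non-AMOUNT records are skipped
            by_cell[rec.cell].append(rec)

    out = []
    for cell, recs in by_cell.items():
        present = {r.variable for r in recs}
        for output, components in component_map.items():
            if output in present:
                continue  # output already exists at this cell
            parts = [r for r in recs if r.variable in components]
            names = [r.variable for r in parts]
            if len(names) != len(set(names)):
                continue  # duplicate component: drop the group, never double-count
            if set(names) != set(components):
                continue  # incomplete component set
            out.append(Rec(output, cell, sum(r.value for r in parts)))
    return out
```

With a map of `{"salt": ("state_local_income_or_sales_tax", "real_estate_taxes")}`, a cell holding both components yields one `salt` record, while a cell holding only one component, or two copies of the same component, yields nothing.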
Second Arch derivation: the SSA latest-carry-forward, ported from legacy microplex_us.targets.arch as a country-agnostic algorithm + injected pieces.

- latest_carry_forward(): keep the highest-ranked candidate per target cell (period not in the future), then remap stale kept records to target_year. is_candidate / cell_key / rank / carry_forward / sort_key are injected; the cell_key stays injected because it depends on the canonical target rep that the target layer is still settling.
- ssa_carry_forward_rank(): faithful default rank (latest period > annual statistical report table > any table > ssi_total_payments > target_id).
- is_ssa_carry_forward_candidate(): SSA source + declared carry-forward var + AMOUNT/COUNT, with the var set injected.
- 7 more unit tests (16 total): highest-rank-per-cell + stale carry-forward, target-year passthrough, future-period skip, candidate/None-cell exclusion, deterministic sort, SSA rank ordering, SSA candidate gating.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
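A rough sketch of the keep-best-then-carry-forward shape described in this commit. The record type, the default rank (period, then target_id), and the `source_period` stamp are assumptions for illustration; the PR's `latest_carry_forward()` takes its rank, cell key, and carry-forward function by injection, and its real signature is not shown here.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Rec:
    """Hypothetical record; field names are illustrative only."""
    target_id: str
    cell: str
    period: int
    value: float
    source_period: Optional[int] = None


def sketch_latest_carry_forward(records, target_year, rank=None):
    """Keep the highest-ranked non-future record per cell, remapped to target_year."""
    if rank is None:
        rank = lambda r: (r.period, r.target_id)  # assumed default: latest period wins

    best = {}
    for rec in records:
        if rec.period > target_year:
            continue  # never carry a future period backwards
        current = best.get(rec.cell)
        if current is None or rank(rec) > rank(current):
            best[rec.cell] = rec

    out = []
    for rec in best.values():
        if rec.period < target_year:
            # Stale record: move it to target_year and remember where it came from.
            rec = replace(rec, period=target_year, source_period=rec.period)
        out.append(rec)
    return sorted(out, key=lambda r: r.cell)  # deterministic output order
```

Records already at `target_year` pass through unchanged, which matches the "target-year passthrough" case listed in the tests.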
Third Arch derivation: state->national rollup, ported from legacy microplex_us.targets.arch as a country-agnostic algorithm + injected config.

- state_to_national_rollup(): group state-level records by an injected group_key and emit one national total per group that covers EVERY state in required_states exactly once; skip groups missing a state, carrying a duplicate state, or whose national total already exists. The US pack injects required_states (51-state set excl. PR fips 72), group_key (rollup-var filter + non-state cell fields), and state-fips/geo-level extractors.
- sum_state_records_to_national(): faithful default builder (sum, null geo, deterministic national id, merged lineage; injectable non_state_constraints).
- 7 more unit tests (23 total): complete-set sum, incomplete/duplicate skip, skip-if-national-exists, out-of-set (PR) exclusion, non-state ignore, constraint stripping.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
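The "every required state exactly once" rule can be sketched as follows. The function name and the `(group, state_fips, value)` tuple shape are assumptions for the example; the PR's `state_to_national_rollup()` works on records with injected extractors instead.

```python
from collections import defaultdict


def sketch_state_rollup(state_rows, required_states, existing_national=frozenset()):
    """Sum state rows to a national total per group, only for complete groups.

    state_rows: iterable of (group, state_fips, value) tuples (assumed shape).
    """
    required = set(required_states)
    groups = defaultdict(list)
    for group, fips, value in state_rows:
        if fips in required:  # out-of-set geographies (e.g. PR "72") are ignored
            groups[group].append((fips, value))

    totals = {}
    for group, rows in groups.items():
        if group in existing_national:
            continue  # a national total already exists for this group
        seen = [fips for fips, _ in rows]
        if len(seen) != len(set(seen)):
            continue  # duplicate state: skip rather than double-count
        if set(seen) != required:
            continue  # at least one required state is missing
        totals[group] = sum(value for _, value in rows)
    return totals
```

Skipping an incomplete group, rather than emitting a partial sum, keeps a missing state from silently understating the national target.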
Canonical-record decision for the blocking ask: use the ported
Concretely:
I also logged this in the shared journal as iter280. I will stay clear of
Fourth Arch derivation: SOI count/amount aging, ported from legacy microplex_us.targets.arch as a country-agnostic, *source-backed* algorithm.

- age_soi_records(): group records by source year, scale each by its target-type factor (count vs amount), stamp period/source_period/aging_factors; same-year records pass through. factors_for is injected (default below).
- soi_aging_factors() + soi_count_aging_factor() + soi_amount_aging_factor(): factors are RATIOS of source-backed reference series across years, not hardcoded growth -- counts scale by BLS labor force (CBO fallback, then SOI return-count), amounts by SOI AGI (exact or last-growth extrapolation), 1.0 carry-forward when no reference. Reference variables/sources + the total-scope predicate are injectable (US/eCPS defaults provided).
- reference_total / soi_total_for_year helpers; default_total_scope predicate.
- 13 more unit tests (35 total): factor-by-type application, same-year passthrough, BLS/CBO/SOI fallback chain, AGI exact + extrapolation, identity, not_required, total-scope.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
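The ratio-with-fallback idea for count factors can be sketched like this. The function names and the plain `{year: total}` dictionaries are assumptions for the example, and the numbers in the usage note are made up; the PR's `soi_count_aging_factor()` reads reference totals from records rather than from dictionaries.

```python
def ratio_factor(series, source_year, target_year):
    """Return series[target_year] / series[source_year], or None if unusable."""
    src = series.get(source_year)
    tgt = series.get(target_year)
    if src is None or tgt is None or src <= 0:
        return None
    return tgt / src


def sketch_count_aging_factor(source_year, target_year, reference_chain):
    """Try each reference series in order; fall back to 1.0 (carry-forward).

    reference_chain: list of {year: total} dicts in priority order, standing in
    for the BLS labor force, CBO, and SOI return-count references.
    """
    if source_year == target_year:
        return 1.0  # same-year records pass through unscaled
    for series in reference_chain:
        factor = ratio_factor(series, source_year, target_year)
        if factor is not None:
            return factor
    return 1.0  # no usable reference: carry the value forward unchanged
```

For example, with an illustrative first series of `{2021: 160.0, 2023: 168.0}` the factor from 2021 to 2023 is 168/160 = 1.05; if that series lacks 2023, the next series in the chain is used.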
Fifth Arch derivation: BEA residence-adjusted, nationally-reconciled state wage synthesis, ported from legacy microplex_us.targets.arch.

- bea_state_employment_income_before_lsr(): per state with the full wage component set (wages/supplements/contributions/residence_adjustment), allocate the residence adjustment to wages by wages/(wages+supplements+contributions), then scale every state so the residence-adjusted total equals the national BEA NIPA wages total. Requires all required_states with all four roles; bails on non-positive denominator/total. Component map, output variable, required states, and state-fips extractor are injected.
- bea_national_wages_record(): finds the national NIPA wages_and_salaries total (concept-based, US defaults injectable).
- _default_bea_state_record(): faithful synthetic-record builder (deterministic ids, SAINC5N lineage, scaled-to-NIPA notes); adds stratum_name/concept_evidence_url/legal_vintage fields to ArchTargetRecord.
- 5 more unit tests (40 total): residence-adjust+scale sums to national, missing-component/missing-state/zero-denominator bail, national-wages finder.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
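One reading of the arithmetic in this commit, as a sketch: the wage share of the residence adjustment is added to state wages, then all states are scaled by a single factor so they sum to the national total. The function name, the dictionary shape, and the exact form `wages + adjustment * share` are assumptions; the legacy code may differ in detail.

```python
def sketch_bea_state_wages(states, national_wages):
    """Residence-adjust each state's wages, then reconcile to the national total.

    states: {state_fips: {"wages", "supplements", "contributions",
                          "residence_adjustment"}} (assumed shape).
    Returns {} when any denominator or the adjusted total is non-positive.
    """
    adjusted = {}
    for fips, c in states.items():
        denom = c["wages"] + c["supplements"] + c["contributions"]
        if denom <= 0:
            return {}  # bail: cannot compute a wage share
        wage_share = c["wages"] / denom
        adjusted[fips] = c["wages"] + c["residence_adjustment"] * wage_share

    total = sum(adjusted.values())
    if total <= 0:
        return {}  # bail: nothing to scale against
    scale = national_wages / total
    return {fips: value * scale for fips, value in adjusted.items()}
```

Because every state is multiplied by the same `scale`, relative state shares after the residence adjustment are preserved while the sum matches the national figure.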
Sixth and final §C derivation: the skip/blocklist surface that shapes the clean target surface, ported from legacy microplex_us.targets.arch.

- should_skip_target_record(): drop unsupported ratio/component variables and national BEA regional inputs (the components the BEA derivation consumes).
- should_skip_fact_concept(): drop skip-listed Arch fact concepts.
- is_blocked_self_employment_binding(): broad business-income SE blocklist (marker intersection over variable/concept/source ids + constraints).
- is_bea_regional_country_record() / default_bea_regional_lineage() helpers. All blocklist sets are injected US config.
- 5 more unit tests (45 total).

Arch target-derivation logic (§C) is now complete: component-sum, latest carry-forward, state->national rollup, SOI aging, BEA, skip/blocklist. Next: the ArchTargetRecord<->TargetSpec adapters + ArchTargetProvider wiring.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rgetSet

The provider-boundary half of codex's iter280 plan: after derivations + skip filters, convert ArchTargetRecords to the calibration-facing TargetSpec/TargetSet.

- arch_target_record_to_target_spec(): COUNT -> count target (no measure), else sum over the variable; constraints -> TargetFilters; injected PE entity; Arch lineage (ids/concept/source/geography) preserved in metadata.
- arch_records_to_target_set(): convert a derived record sequence to a TargetSet with injected entity_of, optional skip filter, and measure override.
- default_arch_target_name(): deterministic cell-unique target name.
- 7 unit tests.

This makes the 6 §C derivations usable end-to-end: derive ArchTargetRecords -> filter -> convert -> TargetSet for the calibrator. Next: the derivation pipeline orchestrator + ArchTargetProvider (load -> derive -> convert).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
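A sketch of the COUNT-versus-sum mapping this commit describes, using plain dictionaries. The dictionary keys, the name format, and `sketch_record_to_spec` itself are hypothetical; they do not reproduce the repository's `TargetSpec`, `TargetFilter`, or `default_arch_target_name()`.

```python
def sketch_record_to_spec(record, entity):
    """Map a record dict to a spec dict: COUNT -> count target, else sum."""
    is_count = record["target_type"] == "COUNT"
    return {
        # Assumed name format: variable, geography, period joined by ":".
        "name": f'{record["variable"]}:{record["geography"]}:{record["period"]}',
        "entity": entity,  # injected by the caller
        "aggregation": "count" if is_count else "sum",
        "measure": None if is_count else record["variable"],
        "filters": [tuple(c) for c in record.get("constraints", [])],
        "value": record["value"],
        "metadata": {"arch_target_id": record["target_id"]},  # lineage preserved
    }
```

Note that the `else` branch sends every non-COUNT target type to a sum, which is the behaviour as the commit message states it.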
Completes the Arch target lane end-to-end (derive -> convert -> TargetSet):

- run_arch_derivation_pipeline(): composes the 6 §C derivations in the legacy order -- BEA augment -> (non-SOI current + latest carry-forward + latest/aged SOI) -> component sum -> state->national rollup -> skip filter. Each step runs only when its config is present; reference_records supplies SOI aging refs.
- ArchPipelineConfig: the declarative config the US pack supplies (component map, rollup/BEA/carry-forward/SOI/skip params + geo/source/state-fips fns).
- latest_soi_records_by_composition / arch_record_composition_key: faithful SOI composition dedup (latest period per cell).
- ArchTargetProvider: a TargetProvider that runs the pipeline over pre-loaded ArchTargetRecords and converts to a TargetSet, applying the query.
- 11 more tests (56 total): composition dedup, component-sum-in-pipeline, skip filter, provider -> TargetSet + query filtering.

Remaining for a live surface: the DB/JSONL -> ArchTargetRecord load adapter (needs the real Arch artifact) + the US config in packs/us.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
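The "each step runs only when its config is present" composition can be sketched with optional callables. `SketchPipelineConfig` and its field names are hypothetical and much simpler than the PR's `ArchPipelineConfig`, which carries declarative parameters rather than ready-made step functions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Step = Callable[[List], List]


@dataclass
class SketchPipelineConfig:
    """Hypothetical config: each field is an optional records -> records step."""
    bea_augment: Optional[Step] = None
    select_current: Optional[Step] = None  # carry-forward + SOI latest/aging
    component_sum: Optional[Step] = None
    state_rollup: Optional[Step] = None
    skip_filter: Optional[Step] = None


def run_sketch_pipeline(records, config):
    """Apply configured steps in the fixed order; unset steps are skipped."""
    ordered_steps = (
        config.bea_augment,
        config.select_current,
        config.component_sum,
        config.state_rollup,
        config.skip_filter,
    )
    current = list(records)
    for step in ordered_steps:
        if step is not None:
            current = list(step(current))
    return current
```

An empty config returns the input unchanged, so a pack can enable derivations one at a time.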
Review blocker before this becomes a live Arch target surface:
Related fail-closed issue: the converter treats every non-
Focused checks I ran locally from a clean PR-head worktree:
I would hold merge until provider-boundary conversion has tests for at least: state geography -> target filter, RATE rejected/skipped, and COUNT targets with positive/count-domain filters where applicable.
First slice of the Arch target derivations (variable-manifest §C) that core's target layer still lacks (targets/arch.py is loaders only; no derivations). Claimed in the coordination journal iter275.

What this adds

microplex/targets/arch_derivations.py: the component-sum derivation, ported faithfully from legacy microplex_us.targets.arch (_component_sum_records et al.) as a country-agnostic op + injected config:

- component_sum_records / with_component_sum_records: synthesize composite AMOUNT targets (e.g. SALT = state_local_income_or_sales_tax + real_estate_taxes) by summing declared components sharing a cell key. Emits a composite only when all declared components are present, skips if the output already exists at the cell, and drops the group on a duplicate component (never double-counts).
- The US config (component_sum_map, geography-level fn, source-normalization fn) is injected (sensible defaults provided), so the engine stays country-agnostic and the US pack declares data.
- Operates on a representation-light ArchTargetRecord so it can wire onto whichever loaded-record type the target layer settles on.

Tests

9 unit tests (tests/targets/test_arch_derivations.py): the sum (2- and 3-way), skip-if-output-exists, incomplete components, duplicate-component bail, cross-cell and cross-period isolation, non-AMOUNT skip. ruff check/format clean.

For review (codex)

- These ops use a representation-light ArchTargetRecord to stay clear of the churning target layer (ArchConsumerFact / database.Target / RACVariable). When you settle the canonical loaded-record type, we wire a thin adapter; flag which representation you want these to consume.
- Next derivations in this lane: latest carry-forward, state->national rollup (excl. PR fips 72), BEA employment_income_before_lsr (residence-adjust + national reconciliation), SOI count/amount aging.

🤖 Generated with Claude Code