expressions: mixed T x ToT products in arbitrary expression trees (Phase F) #564

Merged
evaleev merged 6 commits into evaleev/feature/general-product-expr from evaleev/feature/mixed-t-tot-trees
Jun 12, 2026

@evaleev (Member) commented Jun 12, 2026

Stacked on #563 (which stacks on #562). Completes mixed plain-tensor x tensor-of-tensors support in the expression layer for arbitrary expression trees, plus native support for no-external general products.

What

  • ScalMultEngine general products + tree deduction: the Phase E child-demand down-pass moves from MultEngine into BinaryEngine::init_children_indices (shared), and ScalMultEngine adopts it together with the full general-product routing (init_struct_general, init_distribution_general, make_trange_general, make_dist_eval_general, inner-product classification) — replacing its "use einsum() instead" exception. w("b,i,k;x") = 2.0 * (a("b,i,j") * c("b,j,k;x")) now evaluates.
  • Identity-tolerant inner-perm gate: the general-product ToT gate fired on a non-null but identity inner permutation (the bipartite perm is constructed whole when only outer modes are re-permuted by #563's streaming wrapper); it now requires a genuinely non-identity inner perm. This unblocks mixed T x ToT general products at inner tree nodes, e.g. w("i,j;x") = (g("b,i") * c("b,j;x")) * h("b").
  • Scalar prefactor in inner-Scale ops: the mixed T x ToT element ops never carried the expression-level scalar factor — invisible while only MultEngine (factor == 1) reached them. The fallback op now absorbs factor_; the factor-free fused arena ops are gated to factor == 1 (scaled products take the fallback).
  • Native no-external general products: a general product whose every outer index is fused or contracted (e.g. C("i,j;a,b") = A("x,i,j;a") * B("x,i,j;b")) folds to a GEMM with no free modes, i.e. rank-0 tensors, which the tile kernels do not support (this shape segfaulted through wild stride reads). It is now evaluated with a SYNTHETIC UNIT left-external mode: the folded product becomes (1,K) x (K) -> (1), the exact shape of the already-supported one-sided neB == 0 case. The unit mode lives only in the tile op's GemmHelper; tranges, shapes and tiles carry the true (external-free) ranks, and BatchedContractReduce / SparseShape::gemm_batched detect the synthetic mode from the one-rank mismatch and pad their folded views with a unit extent.
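The scalar-prefactor rule above (fused ops only when factor == 1, otherwise a fallback that absorbs the factor) can be illustrated with a minimal sketch. It assumes an inner ToT tile is just a std::vector<double> scaled by one plain-tensor element; the names `fused_arena_op`, `fallback_op` and `scale_inner` are hypothetical stand-ins, not TiledArray's actual element ops.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one inner tile of a tensor-of-tensors (ToT).
using Inner = std::vector<double>;

// Factor-free "fused" path: computes t * inner and has no slot for an
// expression-level prefactor (mirrors the fused arena ops being factor-free).
Inner fused_arena_op(double t, const Inner& inner) {
  Inner out(inner.size());
  for (std::size_t k = 0; k < inner.size(); ++k) out[k] = t * inner[k];
  return out;
}

// Fallback path that absorbs the expression-level scalar factor.
Inner fallback_op(double factor, double t, const Inner& inner) {
  Inner out(inner.size());
  for (std::size_t k = 0; k < inner.size(); ++k)
    out[k] = factor * t * inner[k];
  return out;
}

// Dispatch: the fused op is only legal when factor == 1; scaled products
// (e.g. those reaching the op through ScalMultEngine) take the fallback.
Inner scale_inner(double factor, double t, const Inner& inner) {
  if (factor == 1.0) return fused_arena_op(t, inner);
  return fallback_op(factor, t, inner);
}
```

The exact `factor == 1.0` comparison is intentional in this sketch: it is a correctness gate on which kernel may run, not a numerical tolerance check.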

Notable non-findings

  • Mixed T/ToT contraction chains at depth ≥ 2 — (s("i,j") * t("j,m")) * c("m,k;x") and s("i,j") * (t("j,m") * c("m,k;x")) — already worked unchanged through the Phase E deduction (the empty-inner-demand convention for plain subtrees composes correctly).
  • Sums nested under products, e.g. f("i,j") = a("x,i") * (b("x,k") * c("x,k,j") + d("x,j")) with a general product as a summand, work by construction: an Add's available_indices() is the leaf-union of its summands, and the parent's demand intersection prunes summand-internal contraction indices automatically.
  • Block expressions compose: block operands in general products, block leaves under inner general nodes, and general products (including re-permuted, non-canonical-target ones) assigned into block views of the result.
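The demand convention behind these non-findings can be sketched as a small top-down pass. This is an illustration under stated assumptions, not the code of BinaryEngine::init_children_indices: the `Node` type and the `available`, `push_demand`, `leaf` and `node` helpers are hypothetical, only outer index labels are tracked, every node's available indices are taken to be the leaf union below it, and a product child is assumed to keep exactly those of its available indices that the parent demands or the sibling also carries.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Outer index labels, e.g. {"x", "i"} for a("x,i").
using Indices = std::set<std::string>;

// Hypothetical expression-tree node: a tensor leaf, a product, or a sum.
struct Node {
  enum Kind { Leaf, Mult, Add } kind;
  Indices leaf_indices;                     // leaves only
  std::vector<std::shared_ptr<Node>> kids;  // two children for Mult/Add
  Indices demand;                           // filled in by push_demand()
};

std::shared_ptr<Node> leaf(Indices idx) {
  return std::make_shared<Node>(Node{Node::Leaf, std::move(idx), {}, {}});
}

std::shared_ptr<Node> node(Node::Kind k, std::shared_ptr<Node> l,
                           std::shared_ptr<Node> r) {
  return std::make_shared<Node>(Node{k, {}, {std::move(l), std::move(r)}, {}});
}

// Leaf union of every index appearing below n.
Indices available(const Node& n) {
  if (n.kind == Node::Leaf) return n.leaf_indices;
  Indices out;
  for (const auto& k : n.kids) {
    Indices a = available(*k);
    out.insert(a.begin(), a.end());
  }
  return out;
}

Indices intersect(const Indices& a, const Indices& b) {
  Indices out;
  for (const auto& i : a)
    if (b.count(i)) out.insert(i);
  return out;
}

// Top-down pass: each node records the indices its parent needs from it.
void push_demand(Node& n, const Indices& demand) {
  n.demand = demand;
  if (n.kind == Node::Mult) {
    for (int c = 0; c < 2; ++c) {
      Node& child = *n.kids[c];
      Indices need = demand;                    // indices the result keeps
      Indices sib = available(*n.kids[1 - c]);  // indices to pair with sibling
      need.insert(sib.begin(), sib.end());
      push_demand(child, intersect(available(child), need));
    }
  } else if (n.kind == Node::Add) {
    // Summands of a sum all produce the sum's indices.
    for (auto& k : n.kids) push_demand(*k, demand);
  }
}
```

Tracing f("i,j") = a("x,i") * (b("x,k") * c("x,k,j") + d("x,j")) through this sketch, the sum is asked only for {x, j}, so k never leaves the inner b * c product and is contracted there.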

Tests

  • Mixed: expression_mixed_t_tot_depth2_chains (both nesting orders), expression_mixed_t_tot_inner_general, expression_mixed_t_tot_scaled.
  • Composition: expression_general_sum_under_product; expression_general_kitchen_sink (w("i,j,m;a,b") = 2.0 * ((g("x,i") * cv("x,j;a")) * dv("x,i,m;b"))), combining a THC-like batching index, a mixed T x ToT general product, a ToT x ToT general product with an inner outer-product, and a ScalMult prefactor.
  • Blocks: expression_general_product_block_operands, expression_general_product_into_block, expression_general_product_block_in_tree, expression_general_product_repermute_into_block.
  • No-external: dense ToT (incl. the no-external root fed by a general T x ToT inner node), plain dense (the Hadamard-reduction shape C("i") = A("i,j") * B("i,j")), and block-sparse (exercising the gemm_batched unit handling), all differential-tested against legacy einsum.
  • Full regression: general_product, einsum_*, sparse_shape, expressions{,_sparse} (modulo the two pre-existing assign_subblock_block_base1 failures), tot suites — green.
  • mpqc c6h14/cc-pVDZ PNO-CCSD energy unchanged (3e-11, run-to-run noise).
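The index vocabulary used in these tests (fused, contracted, external, no-external) can be illustrated with a small classifier over outer index labels. `Classified`, `classify` and `is_no_external` are hypothetical helpers, not TiledArray's classification code; the sketch assumes the usual rule that an index carried by both operands is fused if the result keeps it and contracted otherwise, while an index carried by one operand only is external when the result keeps it.

```cpp
#include <cassert>
#include <set>
#include <string>

using Indices = std::set<std::string>;

// Hypothetical classification of the outer indices of A * B -> C.
struct Classified {
  Indices fused;       // in A, B and C (Hadamard / batch index)
  Indices contracted;  // in A and B, not in C (summed over)
  Indices external;    // in exactly one of A, B and kept in C (free mode)
};

Classified classify(const Indices& a, const Indices& b, const Indices& c) {
  Classified out;
  Indices all = a;
  all.insert(b.begin(), b.end());
  for (const auto& i : all) {
    const bool in_a = a.count(i) > 0;
    const bool in_b = b.count(i) > 0;
    const bool in_c = c.count(i) > 0;
    if (in_a && in_b)
      (in_c ? out.fused : out.contracted).insert(i);
    else if (in_c)
      out.external.insert(i);
    // A one-operand index absent from C would be a single-operand
    // reduction; this sketch ignores that case.
  }
  return out;
}

// "No-external": every outer index is fused or contracted, so the folded
// GEMM has no free modes.
bool is_no_external(const Classified& k) { return k.external.empty(); }
```

For the kitchen-sink root product, ((g * cv) with outer indices x,i,j) times (dv with x,i,m) into i,j,m, this sketch reports i fused, x contracted and j, m external; for C("i,j;a,b") = A("x,i,j;a") * B("x,i,j;b") it reports no external outer index at all.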

Notes / still out of scope

  • einsum() is NOT cut over for no-external products: its !e regime ("hadamard-reduction-local", the arena kernel) handles them before the generalized-contraction dispatch and remains the right tool for distributed workloads. The engine's no-external path uses a degenerate 1x1 process grid (all result tiles on one rank), so it is correctness-first; unifying the einsum regime under the engine remains gated on a perf/distribution comparison (see the design doc's open decisions).
  • Inner-index (nested-dim) General products remain gated (also in ScalMultEngine, with a matching message).
  • ToT*ToT -> T inner reductions (DeNest) stay on the einsum path.
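To make the distribution caveat about the degenerate 1x1 process grid concrete, here is a sketch using a generic 2D block-cyclic tile-to-rank map. This mapping is an illustrative assumption; TiledArray's own process-grid and Pmap classes are not reproduced, and `owner` / `ranks_used` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Generic 2D block-cyclic owner of result tile (r, c) on a prows x pcols grid.
std::size_t owner(std::size_t r, std::size_t c, std::size_t prows,
                  std::size_t pcols) {
  return (r % prows) * pcols + (c % pcols);
}

// Number of distinct ranks owning at least one tile of an nrows x ncols
// tile grid.
std::size_t ranks_used(std::size_t nrows, std::size_t ncols,
                       std::size_t prows, std::size_t pcols) {
  std::set<std::size_t> owners;
  for (std::size_t r = 0; r < nrows; ++r)
    for (std::size_t c = 0; c < ncols; ++c)
      owners.insert(owner(r, c, prows, pcols));
  return owners.size();
}
```

Under this map a 4x4 tile grid spreads over all four ranks of a 2x2 grid, while a 1x1 grid puts every tile on rank 0 no matter how many ranks exist, which is why the engine's no-external path is described as correctness-first.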

Copilot AI left a comment

Pull request overview

Extends TiledArray’s expression engine support for mixed plain-tensor × tensor-of-tensors (T × ToT / ToT × T) products across arbitrary expression trees, including scaled (ScalMultEngine) routing and correctness fixes for no-external general products that previously could segfault.

Changes:

  • Centralizes Phase-E-style top-down child demand deduction into BinaryEngine::init_children_indices() and reuses it from MultEngine and ScalMultEngine.
  • Adds a synthetic unit left-external mode for “no-external” general products (rank-0 folded GEMM) and propagates that handling through ContEngine, BatchedContractReduce, and SparseShape::gemm_batched.
  • Fixes/generalizes ToT general-product gating and mixed inner-scale behavior (identity inner-perm handling; scalar prefactor propagation/gating for arena fused ops).

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Summary per file:

  • tests/general_product.cpp: Adds regression/coverage tests for mixed T/ToT depth-2 trees, inner-node general products, scaled mixed products, block composition, and no-external general products.
  • src/TiledArray/tile_op/batched_contract_reduce.h: Pads folded views with an optional unit extent when the synthetic no-external mode is in effect.
  • src/TiledArray/sparse_shape.h: Updates gemm_batched to detect/pad the synthetic unit mode and to build correct folded ranges/result structures without exposing the synthetic mode.
  • src/TiledArray/expressions/mult_engine.h: Switches MultEngine to shared init_children_indices; upgrades ScalMultEngine to fully route/evaluate general products (outer) and adds inner-general gating.
  • src/TiledArray/expressions/cont_engine.h: Implements synthetic-unit no-external handling in general products; refines inner-perm gate for ToT; ensures scalar prefactor handling for mixed inner-scale ops.
  • src/TiledArray/expressions/binary_engine.h: Introduces init_children_indices() as the shared top-down child-demand deduction pass.

evaleev added 6 commits June 12, 2026 10:03
…e-deduction down-pass

The Phase E child-demand deduction moves from MultEngine into
BinaryEngine::init_children_indices and ScalMultEngine adopts it, along
with the full MultEngine routing for general products
(inner_product_type_ classification + inner-General gate,
init_struct_general, init_distribution_general, make_trange_general,
make_dist_eval_general), replacing its use-einsum-instead exception.
…factor in inner-Scale ops

The general-product ToT gate fired on a non-null but IDENTITY inner
permutation (the bipartite perm is constructed whole when only the outer
modes are re-permuted by the streaming wrapper); require a genuinely
non-identity inner perm. The inner-Scale element ops (mixed T x ToT)
never carried the expression-level scalar prefactor -- invisible while
only MultEngine (factor == 1) reached them; the fallback op now absorbs
factor_ and the factor-free fused arena ops are gated to factor == 1.
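The gate change in this commit can be sketched as follows, assuming a permutation is stored as an index vector (p[k] is the image of position k) and an empty vector means "no permutation". `BipartitePerm`, `is_identity`, `old_gate_fires` and `new_gate_fires` are hypothetical names; the commit only states that the gate now requires a genuinely non-identity inner permutation rather than a merely non-null one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bipartite permutation: outer modes and inner (nested) modes.
struct BipartitePerm {
  std::vector<std::size_t> outer;  // empty => no permutation
  std::vector<std::size_t> inner;  // empty => no permutation
};

bool is_identity(const std::vector<std::size_t>& p) {
  for (std::size_t k = 0; k < p.size(); ++k)
    if (p[k] != k) return false;
  return true;  // an empty (null) permutation also counts as identity
}

// Old gate (too strict): any non-null inner perm blocked the product, even
// when the wrapper built the whole bipartite perm just to re-permute outer modes.
bool old_gate_fires(const BipartitePerm& p) { return !p.inner.empty(); }

// Revised gate: only a genuinely non-identity inner perm blocks it.
bool new_gate_fires(const BipartitePerm& p) { return !is_identity(p.inner); }
```

An outer-only re-permutation such as outer = {1, 0}, inner = {0, 1} trips the old gate but passes the new one, while a real inner swap is still rejected.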
…der products, kitchen-sink, blocks in trees

A ToT x ToT general product with no external (free) outer indices --
every outer index fused or contracted -- segfaulted in the folded GEMM;
gate it with an informative error (einsum() evaluates this shape
natively via its no-external regime).

New tests: a SUM nested under a product with a general summand (the
down-pass prunes summand-internal contraction indices from the sum's
demand by construction); the kitchen-sink expression combining a
THC-like batching index, a mixed T x ToT general product, a ToT x ToT
general product with an inner outer-product, and a ScalMult prefactor;
a block leaf under an inner general node; a re-permuted general product
assigned into a block view; the no-external gate.
… left-external mode

A general product whose every outer index is fused or contracted (e.g.
C("i,j;a,b") = A("x,i,j;a") * B("x,i,j;b")) folds to a GEMM with no
free modes, i.e. rank-0 tensors, which the tile kernels do not support
(this shape used to segfault through wild stride reads). Evaluate it
with a synthetic unit left-external mode instead: the folded product
becomes (1,K) x (K) -> (1), the exact shape of the already-supported
one-sided neB == 0 case. The unit mode lives only in the tile op's
GemmHelper; tranges, shapes and tiles carry the true (external-free)
ranks, and BatchedContractReduce / SparseShape::gemm_batched detect the
synthetic mode from the one-rank mismatch and pad their folded views
with a unit extent. Replaces the interim gate.

Tests: dense ToT (incl. the no-external root fed by a general T x ToT
inner node), plain dense (the Hadamard-reduction shape), and
block-sparse (exercising the gemm_batched unit handling), all
differential-tested against legacy einsum.
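The synthetic-unit idea in this commit can be sketched on the plain Hadamard-reduction shape C("i") = A("i,j") * B("i,j"): for each fused i the folded product has no free modes, and prepending a unit extent turns it into a (1,K) x (K) -> (1) matrix-vector product with K the extent of j. `pad_folded_extents`, `gemv` and `hadamard_reduce` are hypothetical helpers that illustrate the shape argument and the one-rank-mismatch detection; they are not TiledArray's GemmHelper or BatchedContractReduce code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// If the GEMM helper expects one more rank than the tile actually has, the
// missing mode is the synthetic unit mode: pad the folded view with a
// leading extent of 1. Otherwise the extents are returned unchanged.
std::vector<std::size_t> pad_folded_extents(std::vector<std::size_t> extents,
                                            std::size_t helper_rank) {
  if (helper_rank == extents.size() + 1) extents.insert(extents.begin(), 1);
  return extents;
}

// (M,K) x (K) -> (M) matrix-vector product with row-major A.
std::vector<double> gemv(std::size_t m, std::size_t k, const double* a,
                         const double* x) {
  std::vector<double> y(m, 0.0);
  for (std::size_t r = 0; r < m; ++r)
    for (std::size_t c = 0; c < k; ++c) y[r] += a[r * k + c] * x[c];
  return y;
}

// C(i) = sum_j A(i,j) * B(i,j): i is fused (a batch index), j is contracted,
// nothing is external. Each batch's rank-0 result is evaluated as a
// (1,K) x (K) -> (1) product through the synthetic unit mode.
std::vector<double> hadamard_reduce(std::size_t n, std::size_t k,
                                    const std::vector<double>& a,
                                    const std::vector<double>& b) {
  std::vector<double> c(n);
  for (std::size_t i = 0; i < n; ++i) {
    // True per-batch result rank is 0; the helper works with rank 1.
    std::vector<std::size_t> ext = pad_folded_extents({}, /*helper_rank=*/1);
    std::vector<double> y = gemv(ext[0], k, a.data() + i * k, b.data() + i * k);
    c[i] = y[0];
  }
  return c;
}
```

Keeping the unit extent inside the kernel-facing view only, as the commit describes, means the caller still sees a result of the true rank (here a plain vector indexed by i).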
evaleev force-pushed the evaleev/feature/mixed-t-tot-trees branch from 2ae292f to 3f00dae June 12, 2026 14:08
Base automatically changed from evaleev/feature/general-product-tree-deduction to evaleev/feature/general-product-expr June 12, 2026 15:06
evaleev merged commit d38b314 into evaleev/feature/general-product-expr Jun 12, 2026
9 checks passed
evaleev deleted the evaleev/feature/mixed-t-tot-trees branch June 12, 2026 15:06