expressions: mixed T x ToT products in arbitrary expression trees (Phase F) by evaleev · Pull Request #564 · ValeevGroup/tiledarray

evaleev · 2026-06-12T02:04:32Z

Stacked on #563 (which stacks on #562). Completes mixed plain-tensor x tensor-of-tensors support in the expression layer for arbitrary expression trees, plus native support for no-external general products.

What

ScalMultEngine general products + tree deduction: the Phase E child-demand down-pass moves from MultEngine into BinaryEngine::init_children_indices (shared), and ScalMultEngine adopts it together with the full general-product routing (init_struct_general, init_distribution_general, make_trange_general, make_dist_eval_general, inner-product classification) — replacing its "use einsum() instead" exception. w("b,i,k;x") = 2.0 * (a("b,i,j") * c("b,j,k;x")) now evaluates.
Identity-tolerant inner-perm gate: the general-product ToT gate fired on a non-null but identity inner permutation (the bipartite perm is constructed whole when only outer modes are re-permuted by #563's streaming wrapper); it now requires a genuinely non-identity inner perm. This unblocks mixed T x ToT general products at inner tree nodes, e.g. w("i,j;x") = (g("b,i") * c("b,j;x")) * h("b").
Scalar prefactor in inner-Scale ops: the mixed T x ToT element ops never carried the expression-level scalar factor — invisible while only MultEngine (factor == 1) reached them. The fallback op now absorbs factor_; the factor-free fused arena ops are gated to factor == 1 (scaled products take the fallback).
Native no-external general products: a general product whose every outer index is fused or contracted (e.g. C("i,j;a,b") = A("x,i,j;a") * B("x,i,j;b")) folds to a GEMM with no free modes, i.e. rank-0 tensors, which the tile kernels do not support (this shape segfaulted through wild stride reads). It is now evaluated with a SYNTHETIC UNIT left-external mode: the folded product becomes (1,K) x (K) -> (1), the exact shape of the already-supported one-sided neB == 0 case. The unit mode lives only in the tile op's GemmHelper; tranges, shapes and tiles carry the true (external-free) ranks, and BatchedContractReduce / SparseShape::gemm_batched detect the synthetic mode from the one-rank mismatch and pad their folded views with a unit extent.
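To make the (1,K) x (K) -> (1) folding concrete, here is a minimal standalone sketch for the plain dense Hadamard-reduction shape C("i") = A("i,j") * B("i,j"). It does not use TiledArray; the names gemm and no_external_product are hypothetical, and the row-major layout with one small GEMM per fused (batch) index is an assumption made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain row-major GEMM: c(M,N) += a(M,K) * b(K,N).
inline void gemm(std::size_t M, std::size_t N, std::size_t K,
                 const double* a, const double* b, double* c) {
  for (std::size_t m = 0; m < M; ++m)
    for (std::size_t n = 0; n < N; ++n) {
      double acc = 0.0;
      for (std::size_t k = 0; k < K; ++k) acc += a[m * K + k] * b[k * N + n];
      c[m * N + n] += acc;
    }
}

// C(i) = sum_j A(i,j) * B(i,j): "i" is fused, "j" is contracted, so neither
// operand has a free (external) mode. Rather than issuing a rank-0 GEMM, the
// left operand gets a synthetic unit external extent (M = 1).
inline std::vector<double> no_external_product(const std::vector<double>& A,
                                               const std::vector<double>& B,
                                               std::size_t nbatch,
                                               std::size_t K) {
  assert(A.size() == nbatch * K && B.size() == nbatch * K);
  std::vector<double> C(nbatch, 0.0);
  const std::size_t M = 1;  // synthetic unit left-external extent
  const std::size_t N = 1;  // right operand has no external modes
  for (std::size_t i = 0; i < nbatch; ++i)
    gemm(M, N, K, A.data() + i * K, B.data() + i * K, C.data() + i);
  return C;
}
```

The result keeps its true rank (one entry per fused index); the unit extent exists only in the GEMM call, mirroring the statement that tranges, shapes and tiles carry the external-free ranks.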

Notable non-findings

Mixed T/ToT contraction chains at depth ≥ 2 — (s("i,j") * t("j,m")) * c("m,k;x") and s("i,j") * (t("j,m") * c("m,k;x")) — already worked unchanged through the Phase E deduction (the empty-inner-demand convention for plain subtrees composes correctly).
Sums nested under products — f("i,j") = a("x,i") * (b("x,k") * c("x,k,j") + d("x,j")), with a general product as a summand — work by construction: an Add's available_indices() is the leaf-union of its summands and the parent's demand intersection prunes summand-internal contraction indices automatically.
Block expressions compose: block operands in general products, block leaves under inner general nodes, and general products (including re-permuted, non-canonical-target ones) assigned into block views of the result.
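The sum-under-product case can be illustrated with plain index sets. The sketch below is not the engine code: IndexSet, unite, intersect and demand_on_child are hypothetical names, and the demand rule used here (a child is asked for those of its available indices that appear in the parent's target or in its sibling) is an assumption chosen to match the description above.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>

using IndexSet = std::set<std::string>;

inline IndexSet unite(const IndexSet& a, const IndexSet& b) {
  IndexSet r;
  std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                 std::inserter(r, r.end()));
  return r;
}

inline IndexSet intersect(const IndexSet& a, const IndexSet& b) {
  IndexSet r;
  std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                        std::inserter(r, r.end()));
  return r;
}

// Assumed rule: the parent demands from a child only those available indices
// that are needed by the parent's target or shared with the sibling.
inline IndexSet demand_on_child(const IndexSet& child_available,
                                const IndexSet& parent_target,
                                const IndexSet& sibling_available) {
  return intersect(child_available, unite(parent_target, sibling_available));
}

// f("i,j") = a("x,i") * (b("x,k") * c("x,k,j") + d("x,j"))
inline IndexSet sum_demand_example() {
  const IndexSet b{"x", "k"}, c{"x", "k", "j"}, d{"x", "j"};
  // An Add's available indices: the leaf-union of its summands.
  const IndexSet sum_available = unite(unite(b, c), d);
  const IndexSet target{"i", "j"}, a{"x", "i"};
  return demand_on_child(sum_available, target, a);
}
```

Under these assumptions the sum is asked for {"j", "x"}: the summand-internal contraction index k drops out of the demand without any special handling for Add nodes.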

Tests

Mixed: expression_mixed_t_tot_depth2_chains (both nesting orders), expression_mixed_t_tot_inner_general, expression_mixed_t_tot_scaled.
Composition: expression_general_sum_under_product; expression_general_kitchen_sink — w("i,j,m;a,b") = 2.0 * ((g("x,i") * cv("x,j;a")) * dv("x,i,m;b")), combining a THC-like batching index, a mixed T x ToT general product, a ToT x ToT general product with an inner outer-product, and a ScalMult prefactor.
Blocks: expression_general_product_block_operands, expression_general_product_into_block, expression_general_product_block_in_tree, expression_general_product_repermute_into_block.
No-external: dense ToT (incl. the no-external root fed by a general T x ToT inner node), plain dense (the Hadamard-reduction shape C("i") = A("i,j") * B("i,j")), and block-sparse (exercising the gemm_batched unit handling), all differential-tested against legacy einsum.
Full regression: general_product, einsum_*, sparse_shape, expressions{,_sparse} (modulo the two pre-existing assign_subblock_block_base1 failures), tot suites — green.
mpqc c6h14/cc-pVDZ PNO-CCSD energy unchanged (3e-11, run-to-run noise).

Notes / still out of scope

einsum() is NOT cut over for no-external products: its !e regime ("hadamard-reduction-local", the arena kernel) handles them before the generalized-contraction dispatch and remains the right tool for distributed workloads — the engine's no-external path uses a degenerate 1x1 process grid (all result tiles on one rank), so it is correctness-first; unifying the einsum regime under the engine remains gated on a perf/distribution comparison (see the design doc's open decisions).
Inner-index (nested-dim) General products remain gated (also in ScalMultEngine, with a matching message).
ToT*ToT -> T inner reductions (DeNest) stay on the einsum path.

Copilot

Pull request overview

Extends TiledArray’s expression engine support for mixed plain-tensor × tensor-of-tensors (T × ToT / ToT × T) products across arbitrary expression trees, including scaled (ScalMultEngine) routing and correctness fixes for no-external general products that previously could segfault.

Changes:

Centralizes Phase-E-style top-down child demand deduction into BinaryEngine::init_children_indices() and reuses it from MultEngine and ScalMultEngine.
Adds a synthetic unit left-external mode for “no-external” general products (rank-0 folded GEMM) and propagates that handling through ContEngine, BatchedContractReduce, and SparseShape::gemm_batched.
Fixes/generalizes ToT general-product gating and mixed inner-scale behavior (identity inner-perm handling; scalar prefactor propagation/gating for arena fused ops).
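The scalar-prefactor gating can be sketched as a small dispatch. This is a standalone illustration, not the actual tile ops: Inner, fused_scale, fallback_scale and inner_scale are hypothetical names, and an inner tensor is modelled as a flat std::vector<double>.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Inner = std::vector<double>;  // stand-in for one inner tensor of a ToT

// Stand-in for a factor-free fused op: computes t * inner and nothing else.
inline Inner fused_scale(double t, const Inner& inner) {
  Inner r(inner.size());
  for (std::size_t n = 0; n < inner.size(); ++n) r[n] = t * inner[n];
  return r;
}

// Stand-in for the fallback op: absorbs the expression-level factor.
inline Inner fallback_scale(double t, const Inner& inner, double factor) {
  Inner r(inner.size());
  for (std::size_t n = 0; n < inner.size(); ++n) r[n] = factor * t * inner[n];
  return r;
}

// Element op for a mixed T x ToT product: plain element t times inner tensor.
// The fused path is taken only when factor == 1; scaled products fall back.
inline Inner inner_scale(double t, const Inner& inner, double factor,
                         bool* used_fused = nullptr) {
  const bool fused = (factor == 1.0);
  if (used_fused != nullptr) *used_fused = fused;
  return fused ? fused_scale(t, inner) : fallback_scale(t, inner, factor);
}
```

With factor == 2.0 the call takes the fallback and the result carries the prefactor; before the fix described above, a factor-free op on this path would have silently dropped it.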

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.


File	Description
tests/general_product.cpp	Adds regression/coverage tests for mixed T/ToT depth-2 trees, inner-node general products, scaled mixed products, block composition, and no-external general products.
src/TiledArray/tile_op/batched_contract_reduce.h	Pads folded views with an optional unit extent when the synthetic no-external mode is in effect.
src/TiledArray/sparse_shape.h	Updates `gemm_batched` to detect/pad the synthetic unit mode and to build correct folded ranges/result structures without exposing the synthetic mode.
src/TiledArray/expressions/mult_engine.h	Switches MultEngine to shared `init_children_indices`; upgrades ScalMultEngine to fully route/evaluate general products (outer) and adds inner-general gating.
src/TiledArray/expressions/cont_engine.h	Implements synthetic-unit no-external handling in general products; refines inner-perm gate for ToT; ensures scalar prefactor handling for mixed inner-scale ops.
src/TiledArray/expressions/binary_engine.h	Introduces `init_children_indices()` as the shared top-down child-demand deduction pass.


…e-deduction down-pass

The Phase E child-demand deduction moves from MultEngine into BinaryEngine::init_children_indices and ScalMultEngine adopts it, along with the full MultEngine routing for general products (inner_product_type_ classification + inner-General gate, init_struct_general, init_distribution_general, make_trange_general, make_dist_eval_general), replacing its use-einsum-instead exception.

…factor in inner-Scale ops

The general-product ToT gate fired on a non-null but IDENTITY inner permutation (the bipartite perm is constructed whole when only the outer modes are re-permuted by the streaming wrapper); require a genuinely non-identity inner perm. The inner-Scale element ops (mixed T x ToT) never carried the expression-level scalar prefactor -- invisible while only MultEngine (factor == 1) reached them; the fallback op now absorbs factor_ and the factor-free fused arena ops are gated to factor == 1.
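A minimal sketch of the identity-tolerant gate, assuming a permutation is represented as a vector of target positions with an empty vector standing for a null perm. The names Perm, is_identity and inner_perm_gate_fires are hypothetical and are not TiledArray's permutation API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Perm = std::vector<std::size_t>;  // empty vector == null permutation

// True when every position maps to itself.
inline bool is_identity(const Perm& p) {
  for (std::size_t n = 0; n < p.size(); ++n)
    if (p[n] != n) return false;
  return true;
}

// Old behaviour (as described): the gate fired on any non-null inner perm.
inline bool inner_perm_gate_fires_old(const Perm& inner) {
  return !inner.empty();
}

// New behaviour: the gate fires only on a genuinely non-identity inner perm.
inline bool inner_perm_gate_fires(const Perm& inner) {
  return !inner.empty() && !is_identity(inner);
}
```

A bipartite perm that re-permutes only the outer modes still carries a non-null inner part {0, 1, ...}; the old predicate rejects it while the new one lets it through.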

…al, scaled)

…nment into a block view)

…der products, kitchen-sink, blocks in trees

A ToT x ToT general product with no external (free) outer indices -- every outer index fused or contracted -- segfaulted in the folded GEMM; gate it with an informative error (einsum() evaluates this shape natively via its no-external regime). New tests: a SUM nested under a product with a general summand (the down-pass prunes summand-internal contraction indices from the sum's demand by construction); the kitchen-sink expression combining a THC-like batching index, a mixed T x ToT general product, a ToT x ToT general product with an inner outer-product, and a ScalMult prefactor; a block leaf under an inner general node; a re-permuted general product assigned into a block view; the no-external gate.

… left-external mode

A general product whose every outer index is fused or contracted (e.g. C("i,j;a,b") = A("x,i,j;a") * B("x,i,j;b")) folds to a GEMM with no free modes, i.e. rank-0 tensors, which the tile kernels do not support (this shape used to segfault through wild stride reads). Evaluate it with a synthetic unit left-external mode instead: the folded product becomes (1,K) x (K) -> (1), the exact shape of the already-supported one-sided neB == 0 case. The unit mode lives only in the tile op's GemmHelper; tranges, shapes and tiles carry the true (external-free) ranks, and BatchedContractReduce / SparseShape::gemm_batched detect the synthetic mode from the one-rank mismatch and pad their folded views with a unit extent. Replaces the interim gate. Tests: dense ToT (incl. the no-external root fed by a general T x ToT inner node), plain dense (the Hadamard-reduction shape), and block-sparse (exercising the gemm_batched unit handling), all differential-tested against legacy einsum.
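The one-rank-mismatch detection and unit padding might look roughly like the sketch below. It is illustrative only: has_synthetic_unit and pad_with_unit are hypothetical helpers, and both the (batch, external, contracted) ordering of the folded view and the position of the unit mode are assumptions, not taken from BatchedContractReduce or SparseShape::gemm_batched.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Extents = std::vector<std::size_t>;

// The helper's left rank counts the synthetic unit mode, the tile's does not,
// so a mismatch of exactly one signals the no-external case.
inline bool has_synthetic_unit(std::size_t helper_left_rank,
                               std::size_t tile_rank) {
  return helper_left_rank == tile_rank + 1;
}

// Insert a unit extent at position pos; the element count is unchanged, so
// the same data can be viewed through the padded extents.
inline Extents pad_with_unit(const Extents& folded, std::size_t pos) {
  assert(pos <= folded.size());
  Extents padded(folded);
  padded.insert(padded.begin() + static_cast<std::ptrdiff_t>(pos), 1);
  return padded;
}

// Total number of elements described by a list of extents.
inline std::size_t volume(const Extents& e) {
  std::size_t v = 1;
  for (std::size_t x : e) v *= x;
  return v;
}
```

For a folded left tile with extents {4, 3} (4 fused batches, K = 3) and a helper left rank of 3, the padded view is {4, 1, 3}: the same 12 elements, now shaped like the one-sided external case.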

evaleev requested a review from Copilot June 12, 2026 12:51

Copilot started reviewing on behalf of evaleev June 12, 2026 12:51

Copilot AI reviewed Jun 12, 2026


evaleev mentioned this pull request Jun 12, 2026

summa: 3-d (proc_h) process grid for batched general products #565

Open

evaleev added 6 commits June 12, 2026 10:03

tests: mixed T x ToT at inner tree nodes (depth-2 chains, inner gener…

e3f8e36

…al, scaled)

tests: general products with block expressions (block operands; assig…

07f5b97

…nment into a block view)

evaleev force-pushed the evaleev/feature/mixed-t-tot-trees branch from 2ae292f to 3f00dae Compare June 12, 2026 14:08

Base automatically changed from evaleev/feature/general-product-tree-deduction to evaleev/feature/general-product-expr June 12, 2026 15:06

evaleev merged commit d38b314 into evaleev/feature/general-product-expr Jun 12, 2026
9 checks passed

evaleev deleted the evaleev/feature/mixed-t-tot-trees branch June 12, 2026 15:06

evaleev merged 6 commits into evaleev/feature/general-product-expr from evaleev/feature/mixed-t-tot-trees