[AURON #2362] Support native bit_and / bit_or / bit_xor aggregate #2363

Open
zhuxiangyi wants to merge 1 commit into apache:master from zhuxiangyi:support-native-bitwise-aggregate

@zhuxiangyi

Contributor

Which issue does this PR close?

Closes #2362

Rationale for this change

Auron does not implement bit_and / bit_or / bit_xor natively, so they fall back to the generic UDAF path (a JNI call back into the JVM), losing vectorized acceleration. This PR adds native support.

What changes are included in this PR?

  • native (datafusion-ext-plans): add a generic AggBitwise<P> (agg/bitwise.rs) with AggBitAnd / AggBitOr / AggBitXor aliases. The accumulator is a single column of the input type; the first non-null value initializes the slot and each subsequent value is folded in with the operator. The operators are associative and commutative, so the result is order-independent, and null inputs are skipped (an all-null group yields null). Only integral inputs are supported (Int8/Int16/Int32/Int64). The new functions are wired through the AggFunction enum, create_agg, the protobuf contract (BIT_AND / BIT_OR / BIT_XOR), the protobuf::AggFunction -> AggFunction conversion, and the window-aggregate mapping.
  • spark-extension: add the BitAndAgg / BitOrAgg / BitXorAgg expression conversions in NativeConverters.convertAggregateExpr; declare the buffer schema in NativeAggBase.computeNativeAggBufferDataTypes (Seq(dataType)) so the partial -> shuffle -> final buffer schema matches the native side.
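For reviewers who want the accumulator semantics in one place, here is a minimal standalone sketch of the fold described above. It is not the code in agg/bitwise.rs: `BitOp` and `fold_group` are hypothetical names, it works on a plain slice of `Option<i64>` rather than Arrow arrays, and it handles a single group only.

```rust
// Illustrative only: hypothetical names, not the AggBitwise<P> implementation.
#[derive(Clone, Copy)]
enum BitOp {
    And,
    Or,
    Xor,
}

impl BitOp {
    // Combine the current slot value with the next input value.
    fn apply(self, acc: i64, v: i64) -> i64 {
        match self {
            BitOp::And => acc & v,
            BitOp::Or => acc | v,
            BitOp::Xor => acc ^ v,
        }
    }
}

/// Fold one group's inputs. Nulls (None) are skipped, the first non-null
/// value initializes the slot, and later values are folded in with `op`.
/// A group with no non-null input leaves the slot as None.
fn fold_group(op: BitOp, values: &[Option<i64>]) -> Option<i64> {
    let mut slot: Option<i64> = None;
    for v in values.iter().flatten() {
        slot = Some(match slot {
            None => *v,
            Some(acc) => op.apply(acc, *v),
        });
    }
    slot
}
```

For the inputs `[Some(12), None, Some(10)]` this gives `Some(8)` for AND, `Some(14)` for OR and `Some(6)` for XOR; an all-null group gives `None`.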

Note: proto numbers 12/13/14 are used here; 10/11 are left for the parallel LAST / LAST_IGNORES_NULL PR (#2359) so the two can merge in any order.

Are there any user-facing changes?

Yes. bit_and / bit_or / bit_xor are now executed natively (vectorized); previously they fell back to the UDAF path.

How was this patch tested?

  • Rust unit test agg_exec::test::test_agg_bitwise: partial -> final two-phase aggregation over a nullable integer column, verifying bit_and / bit_or / bit_xor including null skipping.
  • Scala end-to-end test in AuronDataFrameAggregateSuite ("native bit_and / bit_or / bit_xor aggregate", spark34 + spark35): a grouped aggregate exercising the full partial -> shuffle -> final native path (including an all-null group), asserting correct values and that the plan offloads to NativeAggBase.
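The property both tests rely on is that merging per-partition partial states with the same operator gives the same answer as one pass over all rows. A minimal sketch of that check follows, assuming each partial state is just an `Option<i32>` (null when the partition saw no non-null input). `fold` and `two_phase` are hypothetical helpers for illustration, not functions from agg_exec or the test suites.

```rust
// Illustrative only: hypothetical helpers, not code from this PR.

/// Fold non-null values with `op`; returns None if every input is null.
fn fold(op: fn(i32, i32) -> i32, values: &[Option<i32>]) -> Option<i32> {
    values.iter().flatten().copied().reduce(op)
}

/// Partial phase: fold each partition on its own.
/// Final phase: fold the partial states with the same operator,
/// skipping partitions whose state is still null.
fn two_phase(op: fn(i32, i32) -> i32, partitions: &[Vec<Option<i32>>]) -> Option<i32> {
    let partials: Vec<Option<i32>> = partitions.iter().map(|p| fold(op, p)).collect();
    fold(op, &partials)
}

/// Reference result: a single pass over all rows of all partitions.
fn single_pass(op: fn(i32, i32) -> i32, partitions: &[Vec<Option<i32>>]) -> Option<i32> {
    let all: Vec<Option<i32>> = partitions.iter().flatten().copied().collect();
    fold(op, &all)
}
```

Because the merge step reuses the input fold, the partial buffer needs only one column of the input type, which is consistent with the Seq(dataType) buffer schema declared on the Spark side.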

Implement native bit_and / bit_or / bit_xor aggregates:

- native: add a generic AggBitwise<P> (agg/bitwise.rs) with AggBitAnd /
  AggBitOr / AggBitXor aliases. The accumulator is a single column of the
  input type; the first non-null value initializes the slot and each
  subsequent value is folded in with the bitwise operator. The operators
  are associative and commutative, so the result is order-independent, and
  null inputs are skipped (an all-null group yields null). Integral inputs
  only (Int8/Int16/Int32/Int64). Wire through the AggFunction enum,
  create_agg, the protobuf contract (BIT_AND / BIT_OR / BIT_XOR), the
  protobuf->AggFunction conversion, and the window-agg mapping.
- spark-extension: add the BitAndAgg / BitOrAgg / BitXorAgg expression
  conversions in NativeConverters; declare the buffer schema in
  NativeAggBase.computeNativeAggBufferDataTypes (Seq(dataType)) so the
  partial -> shuffle -> final buffer schema matches the native side.

Note: proto numbers 12/13/14 are used; 10/11 are left for the parallel
LAST / LAST_IGNORES_NULL PR.

Tests:
- Rust unit test agg_exec::test::test_agg_bitwise (partial -> final, nulls).
- Scala e2e AuronDataFrameAggregateSuite "native bit_and / bit_or / bit_xor
  aggregate" (spark34 + spark35), covering the partial -> shuffle -> final
  native path (incl. all-null group) and asserting NativeAggBase offload.

