[AURON #2362] Support native bit_and / bit_or / bit_xor aggregate #2363
Open
zhuxiangyi wants to merge 1 commit into
Implement native bit_and / bit_or / bit_xor aggregates:
- native: add a generic AggBitwise<P> (agg/bitwise.rs) with AggBitAnd / AggBitOr / AggBitXor aliases. The accumulator is a single column of the input type; the first non-null value initializes the slot and each subsequent value is folded in with the bitwise operator. The operators are associative and commutative, so the result is order-independent, and null inputs are skipped (an all-null group yields null). Integral inputs only (Int8/Int16/Int32/Int64). Wire through the AggFunction enum, create_agg, the protobuf contract (BIT_AND / BIT_OR / BIT_XOR), the protobuf -> AggFunction conversion, and the window-agg mapping.
- spark-extension: add the BitAndAgg / BitOrAgg / BitXorAgg expression conversions in NativeConverters; declare the buffer schema in NativeAggBase.computeNativeAggBufferDataTypes (Seq(dataType)) so the partial -> shuffle -> final buffer schema matches the native side.
Note: proto numbers 12/13/14 are used; 10/11 are left for the parallel LAST / LAST_IGNORES_NULL PR.
Tests:
- Rust unit test agg_exec::test::test_agg_bitwise (partial -> final, nulls).
- Scala e2e AuronDataFrameAggregateSuite "native bit_and / bit_or / bit_xor aggregate" (spark34 + spark35), covering the partial -> shuffle -> final native path (incl. all-null group) and asserting NativeAggBase offload.
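The accumulator shape described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code in agg/bitwise.rs: the `BitwiseOp` trait, the `And` / `Or` / `Xor` marker types, and the `update` / `finish` methods are hypothetical names, and the slot is fixed to `i64` for brevity, whereas the PR keeps the accumulator in the input type (Int8 through Int64).

```rust
use std::marker::PhantomData;

/// Hypothetical operator trait standing in for the `P` parameter;
/// the real definition in the PR may look different.
trait BitwiseOp {
    fn apply(acc: i64, value: i64) -> i64;
}

struct And;
struct Or;
struct Xor;

impl BitwiseOp for And {
    fn apply(acc: i64, value: i64) -> i64 { acc & value }
}
impl BitwiseOp for Or {
    fn apply(acc: i64, value: i64) -> i64 { acc | value }
}
impl BitwiseOp for Xor {
    fn apply(acc: i64, value: i64) -> i64 { acc ^ value }
}

/// One accumulator slot for one group: `None` until the first
/// non-null input arrives.
struct AggBitwise<P: BitwiseOp> {
    slot: Option<i64>,
    _op: PhantomData<P>,
}

impl<P: BitwiseOp> AggBitwise<P> {
    fn new() -> Self {
        Self { slot: None, _op: PhantomData }
    }

    /// Null inputs (`None`) are skipped; the first non-null value
    /// initializes the slot, later values are folded in with `P`.
    fn update(&mut self, value: Option<i64>) {
        if let Some(v) = value {
            self.slot = Some(match self.slot {
                None => v,
                Some(acc) => P::apply(acc, v),
            });
        }
    }

    /// An all-null group finishes as `None` (null).
    fn finish(&self) -> Option<i64> {
        self.slot
    }
}

type AggBitAnd = AggBitwise<And>;
type AggBitOr = AggBitwise<Or>;
type AggBitXor = AggBitwise<Xor>;

fn main() {
    let inputs = [Some(0b1100), None, Some(0b1010)];
    let (mut and, mut or, mut xor) = (AggBitAnd::new(), AggBitOr::new(), AggBitXor::new());
    for v in inputs {
        and.update(v);
        or.update(v);
        xor.update(v);
    }
    assert_eq!(and.finish(), Some(0b1000));
    assert_eq!(or.finish(), Some(0b1110));
    assert_eq!(xor.finish(), Some(0b0110));
}
```

Initializing the slot from the first non-null value, rather than from an identity element such as all ones for AND, is what lets an all-null group stay null without a separate validity flag in this sketch.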
Which issue does this PR close?
Closes #2362
Rationale for this change
Auron does not implement bit_and / bit_or / bit_xor natively, so they fall back to the generic UDAF path (a JNI call back into the JVM), losing vectorized acceleration. This PR adds native support.
What changes are included in this PR?
- Native (datafusion-ext-plans): add a generic AggBitwise<P> (agg/bitwise.rs) with AggBitAnd / AggBitOr / AggBitXor aliases. The accumulator is a single column of the input type; the first non-null value initializes the slot and each subsequent value is folded in with the bitwise operator. The operators are associative and commutative, so the result is order-independent, and null inputs are skipped (an all-null group yields null). Integral inputs only (Int8/Int16/Int32/Int64). Wired through the AggFunction enum, create_agg, the protobuf contract (BIT_AND / BIT_OR / BIT_XOR), the protobuf::AggFunction -> AggFunction conversion, and the window-aggregate mapping.
- spark-extension: add the BitAndAgg / BitOrAgg / BitXorAgg expression conversions in NativeConverters.convertAggregateExpr; declare the buffer schema in NativeAggBase.computeNativeAggBufferDataTypes (Seq(dataType)) so the partial -> shuffle -> final buffer schema matches the native side.
Are there any user-facing changes?
Yes. bit_and / bit_or / bit_xor are now executed natively (vectorized); previously they fell back to the UDAF path.
How was this patch tested?
- Rust unit test agg_exec::test::test_agg_bitwise: partial -> final two-phase aggregation over a nullable integer column, verifying bit_and / bit_or / bit_xor including null skipping.
- Scala e2e AuronDataFrameAggregateSuite ("native bit_and / bit_or / bit_xor aggregate", spark34 + spark35): a grouped aggregate exercising the full partial -> shuffle -> final native path (including an all-null group), asserting correct values and that the plan offloads to NativeAggBase.
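The two-phase property these tests exercise can be illustrated with a small standalone sketch. It does not use Auron's or DataFusion's APIs: `partial`, `two_phase`, and the `bit_*` helpers are hypothetical names, and each inner `Vec` stands in for one partition's rows of a single group. Because the operators are associative and commutative, merging per-partition partial results with the same operator gives the same answer as a single pass over all rows.

```rust
fn bit_and(a: i32, b: i32) -> i32 { a & b }
fn bit_or(a: i32, b: i32) -> i32 { a | b }
fn bit_xor(a: i32, b: i32) -> i32 { a ^ b }

/// Partial phase for one partition: skip nulls (`None`) and fold the
/// remaining values; a partition with no non-null values yields `None`.
fn partial(values: &[Option<i32>], op: fn(i32, i32) -> i32) -> Option<i32> {
    values.iter().flatten().copied().reduce(op)
}

/// Final phase: the partial buffers have the same type as the input,
/// so they are merged with the very same fold.
fn two_phase(partitions: &[Vec<Option<i32>>], op: fn(i32, i32) -> i32) -> Option<i32> {
    let partials: Vec<Option<i32>> = partitions.iter().map(|p| partial(p, op)).collect();
    partial(&partials, op)
}

fn main() {
    // One group whose rows are spread over two partitions.
    let group: Vec<Vec<Option<i32>>> = vec![
        vec![Some(0b0110), None],
        vec![None, Some(0b0101), Some(0b0111)],
    ];
    assert_eq!(two_phase(&group, bit_and), Some(0b0100));
    assert_eq!(two_phase(&group, bit_or), Some(0b0111));
    assert_eq!(two_phase(&group, bit_xor), Some(0b0100));

    // Two-phase and single-pass results agree.
    let flat: Vec<Option<i32>> = group.concat();
    assert_eq!(two_phase(&group, bit_xor), partial(&flat, bit_xor));

    // An all-null group stays null through both phases.
    let all_null: Vec<Vec<Option<i32>>> = vec![vec![None, None], vec![None]];
    assert_eq!(two_phase(&all_null, bit_and), None);
}
```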