
[AURON #2343] Support native flatten for mixed child array nullability#2344

Open
weimingdiit wants to merge 1 commit into apache:master from weimingdiit:feat/native-flatten-nullability


@weimingdiit

Contributor

Which issue does this PR close?

Closes #2343

Rationale for this change

Auron currently excludes Spark's flatten function tests because native execution can fail when child arrays have different containsNull metadata.

flatten is a built-in Spark array expression, so supporting it in native execution improves Spark SQL compatibility and closes a coverage gap that would otherwise force these expressions to fall back from native execution.

What changes are included in this PR?

  • Adds a native Spark_ArrayFlatten function.
  • Converts Spark Flatten expressions to the new native function.
  • Preserves Spark semantics for:
    • null outer arrays
    • null child arrays
    • empty arrays
    • null elements inside child arrays
    • primitive and struct element arrays
  • Adds an Auron SQL test for mixed child array nullability.
  • Removes the flatten function excludes from Spark 3.1/3.2/3.4/3.5 test settings.
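The semantics listed above can be modeled at the row level with a small std-only Rust sketch. It assumes Spark's Flatten behavior, where the result is NULL if the outer array or any child array is NULL, while empty child arrays contribute nothing and NULL elements are kept. flatten_row is a hypothetical helper that uses nested Options in place of Arrow arrays; it is not the PR's Spark_ArrayFlatten implementation.

```rust
/// Row-level model of Spark's flatten (hypothetical helper, not the PR's code).
/// `None` stands for SQL NULL at every level:
/// - outer array NULL          -> result NULL
/// - any child array NULL      -> result NULL
/// - empty child array         -> contributes no elements
/// - NULL element inside child -> preserved in the output
/// `T` can be a primitive or a struct-like type, as long as it is `Clone`.
fn flatten_row<T: Clone>(outer: Option<&[Option<Vec<Option<T>>>]>) -> Option<Vec<Option<T>>> {
    // A NULL outer array short-circuits to NULL.
    let children = outer?;
    let mut out = Vec::new();
    for child in children {
        // A single NULL child array makes the whole result NULL.
        let elems = child.as_ref()?;
        // Copy elements in order, keeping NULL elements as `None`.
        out.extend(elems.iter().cloned());
    }
    Some(out)
}
```

For example, flatten(array(array(1, NULL), array(), array(3))) maps to flatten_row over [Some([Some(1), None]), Some([]), Some([Some(3)])], which yields [Some(1), None, Some(3)].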

Are there any user-facing changes?

No user-facing API changes. More flatten expressions, including those whose child arrays differ in containsNull metadata, can now be executed natively by Auron.

How was this patch tested?

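UT: a native unit test for array_flatten in spark_array.rs, a new flatten SQL test in AuronFunctionSuite, and the re-enabled Spark 3.1/3.2/3.4/3.5 flatten function tests.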

…ability

Signed-off-by: weimingdiit <weimingdiit@gmail.com>
weimingdiit force-pushed the feat/native-flatten-nullability branch from 21ea7d0 to e7be569 on June 24, 2026 11:24
weimingdiit marked this pull request as ready for review on June 24, 2026 12:51

Copilot AI left a comment

Contributor


Pull request overview

This PR adds native execution support for Spark SQL flatten(array<array<T>>) in Auron. It aims to remove the existing Spark expression test exclusions and improve Spark SQL compatibility, especially when nested arrays differ in containsNull metadata.

Changes:

  • Adds a new native scalar function Spark_ArrayFlatten and wires Spark Flatten expressions to it.
  • Improves native Spark_MakeArray handling for mixed child array nullability by widening list/struct element types.
  • Adds an Auron SQL regression test and removes flatten function exclusions from Spark 3.1/3.2/3.4/3.5 test settings.
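The widening mentioned for Spark_MakeArray can be illustrated with a simplified, std-only sketch. The PR's actual widening code is not shown on this page, so everything here is an assumption: SimpleType, widen, and common_element_type are hypothetical names, and the rule shown merges two element types structurally while OR-ing their nullability flags, so array<int> with containsNull=false and with containsNull=true widen to containsNull=true.

```rust
/// Hypothetical, simplified stand-in for an Arrow/Spark data type.
#[derive(Debug, Clone, PartialEq)]
enum SimpleType {
    Int,
    /// Element type plus whether elements may be NULL (Spark's containsNull).
    List(Box<SimpleType>, bool),
    /// Named fields with their own nullability flags.
    Struct(Vec<(String, SimpleType, bool)>),
}

/// Widen two types to a common type, or return None if they are incompatible.
/// Nullability is OR-ed at every level, so the result accepts values of both inputs.
fn widen(a: &SimpleType, b: &SimpleType) -> Option<SimpleType> {
    match (a, b) {
        (SimpleType::Int, SimpleType::Int) => Some(SimpleType::Int),
        (SimpleType::List(ea, na), SimpleType::List(eb, nb)) => {
            Some(SimpleType::List(Box::new(widen(ea, eb)?), *na || *nb))
        }
        (SimpleType::Struct(fa), SimpleType::Struct(fb)) if fa.len() == fb.len() => {
            let mut fields = Vec::with_capacity(fa.len());
            for ((name_a, ta, na), (name_b, tb, nb)) in fa.iter().zip(fb) {
                // Fields must line up by name to be merged.
                if name_a != name_b {
                    return None;
                }
                fields.push((name_a.clone(), widen(ta, tb)?, *na || *nb));
            }
            Some(SimpleType::Struct(fields))
        }
        _ => None,
    }
}

/// Fold `widen` over all child element types; None for empty or incompatible input.
fn common_element_type(types: &[SimpleType]) -> Option<SimpleType> {
    let (first, rest) = types.split_first()?;
    rest.iter().try_fold(first.clone(), |acc, t| widen(&acc, t))
}
```

Once a common type is found, each child array would be cast to it before the values are concatenated, which is the casting step the file summary describes for spark_make_array.rs.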

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| spark-extension/src/main/scala/org/apache/spark/sql/auron/NativeConverters.scala | Converts Spark Flatten to the new native Spark_ArrayFlatten function. |
| spark-extension-shims-spark/src/test/scala/org/apache/auron/AuronFunctionSuite.scala | Adds SQL coverage for flatten with mixed child array nullability and null/empty edge cases. |
| native-engine/datafusion-ext-functions/src/spark_make_array.rs | Widens list/struct element types and casts to a common element type for non-primitive array(...) creation. |
| native-engine/datafusion-ext-functions/src/spark_array.rs | Implements the array_flatten native function and adds a unit test. |
| native-engine/datafusion-ext-functions/src/lib.rs | Registers Spark_ArrayFlatten in the native function factory. |
| auron-spark-tests/spark35/src/test/scala/org/apache/auron/utils/AuronSparkTestSettings.scala | Removes the Spark 3.5 flatten function exclusion. |
| auron-spark-tests/spark34/src/test/scala/org/apache/auron/utils/AuronSparkTestSettings.scala | Removes the Spark 3.4 flatten function exclusion. |
| auron-spark-tests/spark32/src/test/scala/org/apache/auron/utils/AuronSparkTestSettings.scala | Removes the Spark 3.2 flatten function exclusion. |
| auron-spark-tests/spark31/src/test/scala/org/apache/auron/utils/AuronSparkTestSettings.scala | Removes the Spark 3.1 flatten function exclusion. |


Comment on lines +95 to +100
```rust
ColumnarValue::Array(array) if matches!(array.data_type(), DataType::Null) => {
    Ok(ListArray::new_null(
        Arc::new(Field::new_list_field(DataType::Null, true)),
        array.len(),
    ))
}
```
Comment on lines +158 to +165
```rust
let values = make_array(mutable.freeze());
let field = Arc::new(Field::new_list_field(values.data_type().clone(), true));
Ok(ColumnarValue::Array(Arc::new(ListArray::try_new(
    field,
    OffsetBuffer::new(ScalarBuffer::from(offsets)),
    values,
    Some(NullBuffer::from(valids)),
)?)))
```
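The excerpt above ends by assembling a ListArray from offsets, a validity buffer, and values frozen from a mutable builder. The bookkeeping that produces those pieces can be sketched without Arrow, assuming Arrow's list layout (row i spans offsets[i]..offsets[i + 1] of its child, with one validity bit per row) and Spark's rule that a NULL outer row or any NULL child array yields NULL. ListLayout, Flattened, and flatten_layout are hypothetical names, and copy_ranges only stands in for the value ranges a MutableArrayData-style builder would be extended with; this is not the PR's code.

```rust
/// Arrow-style list layout: row i spans offsets[i]..offsets[i + 1] of its child,
/// and validity[i] == false marks a NULL row.
struct ListLayout {
    offsets: Vec<usize>,
    validity: Vec<bool>,
}

/// Output of flattening: new offsets/validity over the innermost values, plus the
/// value ranges to copy, in order, into the flattened values array.
#[derive(Debug, PartialEq)]
struct Flattened {
    offsets: Vec<usize>,
    validity: Vec<bool>,
    copy_ranges: Vec<(usize, usize)>,
}

/// Flatten array<array<T>> given the outer and inner list layouts.
fn flatten_layout(outer: &ListLayout, inner: &ListLayout) -> Flattened {
    let num_rows = outer.validity.len();
    let mut offsets = Vec::with_capacity(num_rows + 1);
    let mut validity = Vec::with_capacity(num_rows);
    let mut copy_ranges = Vec::new();
    let mut len = 0;
    offsets.push(0);
    for row in 0..num_rows {
        let children = outer.offsets[row]..outer.offsets[row + 1];
        // NULL outer row, or any NULL child array, makes the output row NULL.
        let valid = outer.validity[row] && children.clone().all(|c| inner.validity[c]);
        if valid {
            for c in children {
                let (start, end) = (inner.offsets[c], inner.offsets[c + 1]);
                // Empty child arrays contribute nothing.
                if end > start {
                    copy_ranges.push((start, end));
                    len += end - start;
                }
            }
        }
        // NULL rows get a zero-length slot.
        validity.push(valid);
        offsets.push(len);
    }
    Flattened { offsets, validity, copy_ranges }
}
```

For rows [[1, 2], [3]], [[4], NULL], NULL, and [[], [5, NULL]], this produces offsets [0, 3, 3, 3, 5], validity [true, false, false, true], and copy ranges (0, 2), (2, 3), (4, 6); NULL elements such as the one after 5 are carried along by the copied value range.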
Comment on lines 112 to +116
```rust
// naive implementation with scalar values
let num_rows = args[0].len();
let data_type = common_array_element_data_type(args)?;
let args = args
    .iter()
```

Development

Successfully merging this pull request may close these issues.

Support native flatten with mixed child array nullability

3 participants