[AURON #1891] Implement randn() function by robreeves · Pull Request #1938 · apache/auron

robreeves · 2026-01-21T05:35:05Z

Which issue does this PR close?

Closes #1891

Rationale for this change

This improves function coverage in Auron by creating a native randn implementation.

What changes are included in this PR?

Adds a native randn implementation.

Are there any user-facing changes?

Yes, it adds the randn function.

How was this patch tested?

Added unit tests and manually tested in spark-shell.

import org.apache.spark.sql.functions.randn

val df = spark.range(5)
val outputPath = "/tmp/spark_range_output.parquet"
df.write.mode("overwrite").parquet(outputPath)

val readDf = spark.read.parquet(outputPath)
val resultDf = readDf.withColumn("random_normal", randn(18))
resultDf.collect

Output:

26/01/30 15:41:22 WARN NativeHelper: memory total: 1408.0 MiB, onheap: 1024.0 MiB, offheap: 384.0 MiB
26/01/30 15:41:24 WARN AuronCallNativeWrapper: Start executing native plan
26/01/30 15:41:24 WARN AuronCallNativeWrapper: Start executing native plan
26/01/30 15:41:24 WARN AuronCallNativeWrapper: Start executing native plan
26/01/30 15:41:24 WARN AuronCallNativeWrapper: Start executing native plan
26/01/30 15:41:24 WARN AuronCallNativeWrapper: Start executing native plan
26/01/30 15:41:24 WARN AuronCallNativeWrapper: Start executing native plan
------ initializing auron native environment ------
initializing logging with level: info
2026-01-30 15:41:24.368 (+0.000s) [INFO] [auron::exec:73] (stage: 0, partition: 0, tid: 0) - initializing JNI bridge
2026-01-30 15:41:24.369 (+0.001s) [INFO] [auron_jni_bridge::jni_bridge:473] (stage: 0, partition: 0, tid: 0) - Initializing JavaClasses...
2026-01-30 15:41:24.375 (+0.007s) [INFO] [auron_jni_bridge::jni_bridge:529] (stage: 0, partition: 0, tid: 0) - Initializing JavaClasses finished
2026-01-30 15:41:24.375 (+0.007s) [INFO] [auron::exec:77] (stage: 0, partition: 0, tid: 0) - initializing datafusion session
2026-01-30 15:41:24.375 (+0.007s) [INFO] [auron_memmgr:48] (stage: 0, partition: 0, tid: 0) - mem manager initialized with total memory: 230.4 MiB
2026-01-30 15:41:24.385 (+0.017s) [INFO] [auron::rt:146] (stage: 2, partition: 1, tid: 12) - start executing plan:
ProjectExec [#3@0 AS #3, Randn(seed=18, partition=1) AS #5], schema=[#3:Int64;N, #5:Float64]
  RenameColumnsExec: ["#3"], schema=[#3:Int64;N]
    ParquetExec: limit=None, file_group=[FileGroup { files: [], statistics: None }, FileGroup { files: [PartitionedFile { object_meta: ObjectMeta { location: Path { raw: "ZmlsZTovLy90bXAvc3BhcmtfcmFuZ2Vfb3V0cHV0LnBhcnF1ZXQvcGFydC0wMDAwMS04ZTkwNmRiYS0zZDg3LTRkZWMtYjM0NC1hYjdiZWUyODEwZWQtYzAwMC5zbmFwcHkucGFycXVldA" }, last_modified: 1970-01-01T00:00:00Z, size: 472, e_tag: None, version: None }, partition_values: [], range: Some(FileRange { start: 0, end: 472 }), statistics: None, extensions: None, metadata_size_hint: None }], statistics: Some(Statistics { num_rows: Exact(0), total_byte_size: Exact(0), column_statistics: [ColumnStatistics { null_count: Absent, max_value: Absent, min_value: Absent, sum_value: Absent, distinct_count: Absent }] }) }, FileGroup { files: [], statistics: None }, FileGroup { files: [], statistics: None }, FileGroup { files: [], statistics: None }, FileGroup { files: [], statistics: None }], predicate=Some(Literal { value: Boolean(true), field: Field { name: "lit", data_type: Boolean, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }), schema=[id:Int64;N]

2026-01-30 15:41:24.385 (+0.017s) [INFO] [auron::rt:146] (stage: 2, partition: 5, tid: 16) - start executing plan:
ProjectExec [#3@0 AS #3, Randn(seed=18, partition=5) AS #5], schema=[#3:Int64;N, #5:Float64]
  RenameColumnsExec: ["#3"], schema=[#3:Int64;N]
    ParquetExec: limit=None, file_group=[FileGroup { files: [], statistics: None }, FileGroup { files: [], statistics: None }, FileGroup { files: [], statistics: None }, FileGroup { files: [], statistics: None }, FileGroup { files: [], statistics: None }, FileGroup { files: [PartitionedFile { object_meta: ObjectMeta { location: Path { raw: "ZmlsZTovLy90bXAvc3BhcmtfcmFuZ2Vfb3V0cHV0LnBhcnF1ZXQvcGFydC0wMDAwMC04ZTkwNmRiYS0zZDg3LTRkZWMtYjM0NC1hYjdiZWUyODEwZWQtYzAwMC5zbmFwcHkucGFycXVldA" }, last_modified: 1970-01-01T00:00:00Z, size: 297, e_tag: None, version: None }, partition_values: [], range: Some(FileRange { start: 0, end: 297 }), statistics: None, extensions: None, metadata_size_hint: None }], statistics: Some(Statistics { num_rows: Exact(0), total_byte_size: Exact(0), column_statistics: [ColumnStatistics { null_count: Absent, max_value: Absent, min_value: Absent, sum_value: Absent, distinct_count: Absent }] }) }], predicate=Some(Literal { value: Boolean(true), field: Field { name: "lit", data_type: Boolean, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }), schema=[id:Int64;N]

2026-01-30 15:41:24.385 (+0.017s) [INFO] [auron::rt:146] (stage: 2, partition: 2, tid: 13) - start executing plan:
ProjectExec [#3@0 AS #3, Randn(seed=18, partition=2) AS #5], schema=[#3:Int64;N, #5:Float64]
  RenameColumnsExec: ["#3"], schema=[#3:Int64;N]
    ParquetExec: limit=None, file_group=[FileGroup { files: [], statistics: None }, FileGroup { files: [], statistics: None }, FileGroup { files: [PartitionedFile { object_meta: ObjectMeta { location: Path { raw: "ZmlsZTovLy90bXAvc3BhcmtfcmFuZ2Vfb3V0cHV0LnBhcnF1ZXQvcGFydC0wMDAwMy04ZTkwNmRiYS0zZDg3LTRkZWMtYjM0NC1hYjdiZWUyODEwZWQtYzAwMC5zbmFwcHkucGFycXVldA" }, last_modified: 1970-01-01T00:00:00Z, size: 472, e_tag: None, version: None }, partition_values: [], range: Some(FileRange { start: 0, end: 472 }), statistics: None, extensions: None, metadata_size_hint: None }], statistics: Some(Statistics { num_rows: Exact(0), total_byte_size: Exact(0), column_statistics: [ColumnStatistics { null_count: Absent, max_value: Absent, min_value: Absent, sum_value: Absent, distinct_count: Absent }] }) }, FileGroup { files: [], statistics: None }, FileGroup { files: [], statistics: None }, FileGroup { files: [], statistics: None }], predicate=Some(Literal { value: Boolean(true), field: Field { name: "lit", data_type: Boolean, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }), schema=[id:Int64;N]

2026-01-30 15:41:24.385 (+0.017s) [INFO] [auron::rt:146] (stage: 2, partition: 4, tid: 15) - start executing plan:
ProjectExec [#3@0 AS #3, Randn(seed=18, partition=4) AS #5], schema=[#3:Int64;N, #5:Float64]
  RenameColumnsExec: ["#3"], schema=[#3:Int64;N]
    ParquetExec: limit=None, file_group=[FileGroup { files: [], statistics: None }, FileGroup { files: [], statistics: None }, FileGroup { files: [], statistics: None }, FileGroup { files: [], statistics: None }, FileGroup { files: [PartitionedFile { object_meta: ObjectMeta { location: Path { raw: "ZmlsZTovLy90bXAvc3BhcmtfcmFuZ2Vfb3V0cHV0LnBhcnF1ZXQvcGFydC0wMDAwNS04ZTkwNmRiYS0zZDg3LTRkZWMtYjM0NC1hYjdiZWUyODEwZWQtYzAwMC5zbmFwcHkucGFycXVldA" }, last_modified: 1970-01-01T00:00:00Z, size: 471, e_tag: None, version: None }, partition_values: [], range: Some(FileRange { start: 0, end: 471 }), statistics: None, extensions: None, metadata_size_hint: None }], statistics: Some(Statistics { num_rows: Exact(0), total_byte_size: Exact(0), column_statistics: [ColumnStatistics { null_count: Absent, max_value: Absent, min_value: Absent, sum_value: Absent, distinct_count: Absent }] }) }, FileGroup { files: [], statistics: None }], predicate=Some(Literal { value: Boolean(true), field: Field { name: "lit", data_type: Boolean, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }), schema=[id:Int64;N]

2026-01-30 15:41:24.385 (+0.017s) [INFO] [auron::rt:146] (stage: 2, partition: 3, tid: 14) - start executing plan:
ProjectExec [#3@0 AS #3, Randn(seed=18, partition=3) AS #5], schema=[#3:Int64;N, #5:Float64]
  RenameColumnsExec: ["#3"], schema=[#3:Int64;N]
    ParquetExec: limit=None, file_group=[FileGroup { files: [], statistics: None }, FileGroup { files: [], statistics: None }, FileGroup { files: [], statistics: None }, FileGroup { files: [PartitionedFile { object_meta: ObjectMeta { location: Path { raw: "ZmlsZTovLy90bXAvc3BhcmtfcmFuZ2Vfb3V0cHV0LnBhcnF1ZXQvcGFydC0wMDAwOS04ZTkwNmRiYS0zZDg3LTRkZWMtYjM0NC1hYjdiZWUyODEwZWQtYzAwMC5zbmFwcHkucGFycXVldA" }, last_modified: 1970-01-01T00:00:00Z, size: 472, e_tag: None, version: None }, partition_values: [], range: Some(FileRange { start: 0, end: 472 }), statistics: None, extensions: None, metadata_size_hint: None }], statistics: Some(Statistics { num_rows: Exact(0), total_byte_size: Exact(0), column_statistics: [ColumnStatistics { null_count: Absent, max_value: Absent, min_value: Absent, sum_value: Absent, distinct_count: Absent }] }) }, FileGroup { files: [], statistics: None }, FileGroup { files: [], statistics: None }], predicate=Some(Literal { value: Boolean(true), field: Field { name: "lit", data_type: Boolean, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }), schema=[id:Int64;N]

2026-01-30 15:41:24.385 (+0.017s) [INFO] [auron::rt:146] (stage: 2, partition: 0, tid: 11) - start executing plan:
ProjectExec [#3@0 AS #3, Randn(seed=18, partition=0) AS #5], schema=[#3:Int64;N, #5:Float64]
  RenameColumnsExec: ["#3"], schema=[#3:Int64;N]
    ParquetExec: limit=None, file_group=[FileGroup { files: [PartitionedFile { object_meta: ObjectMeta { location: Path { raw: "ZmlsZTovLy90bXAvc3BhcmtfcmFuZ2Vfb3V0cHV0LnBhcnF1ZXQvcGFydC0wMDAwNy04ZTkwNmRiYS0zZDg3LTRkZWMtYjM0NC1hYjdiZWUyODEwZWQtYzAwMC5zbmFwcHkucGFycXVldA" }, last_modified: 1970-01-01T00:00:00Z, size: 472, e_tag: None, version: None }, partition_values: [], range: Some(FileRange { start: 0, end: 472 }), statistics: None, extensions: None, metadata_size_hint: None }], statistics: Some(Statistics { num_rows: Exact(0), total_byte_size: Exact(0), column_statistics: [ColumnStatistics { null_count: Absent, max_value: Absent, min_value: Absent, sum_value: Absent, distinct_count: Absent }] }) }, FileGroup { files: [], statistics: None }, FileGroup { files: [], statistics: None }, FileGroup { files: [], statistics: None }, FileGroup { files: [], statistics: None }, FileGroup { files: [], statistics: None }], predicate=Some(Literal { value: Boolean(true), field: Field { name: "lit", data_type: Boolean, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }), schema=[id:Int64;N]

2026-01-30 15:41:24.394 (+0.026s) [INFO] [datafusion_datasource_parquet::opener:421] (stage: 2, partition: 4, tid: 15) - executing parquet scan with adaptive batch size: 10000
2026-01-30 15:41:24.394 (+0.026s) [INFO] [datafusion_datasource_parquet::opener:421] (stage: 2, partition: 0, tid: 11) - executing parquet scan with adaptive batch size: 10000
2026-01-30 15:41:24.394 (+0.026s) [INFO] [datafusion_datasource_parquet::opener:421] (stage: 2, partition: 3, tid: 14) - executing parquet scan with adaptive batch size: 10000
2026-01-30 15:41:24.394 (+0.026s) [INFO] [datafusion_datasource_parquet::opener:421] (stage: 2, partition: 1, tid: 12) - executing parquet scan with adaptive batch size: 10000
2026-01-30 15:41:24.394 (+0.026s) [INFO] [datafusion_datasource_parquet::opener:421] (stage: 2, partition: 2, tid: 13) - executing parquet scan with adaptive batch size: 10000
2026-01-30 15:41:24.394 (+0.026s) [INFO] [datafusion_datasource_parquet::opener:421] (stage: 2, partition: 5, tid: 16) - executing parquet scan with adaptive batch size: 1
2026-01-30 15:41:24.488 (+0.120s) [INFO] [auron::rt:183] (stage: 2, partition: 5, tid: 16) - task finished
2026-01-30 15:41:24.488 (+0.120s) [INFO] [auron::rt:183] (stage: 2, partition: 0, tid: 11) - task finished
2026-01-30 15:41:24.488 (+0.120s) [INFO] [auron::rt:183] (stage: 2, partition: 4, tid: 15) - task finished
2026-01-30 15:41:24.488 (+0.120s) [INFO] [auron::rt:183] (stage: 2, partition: 3, tid: 14) - task finished
2026-01-30 15:41:24.488 (+0.120s) [INFO] [auron::rt:266] (stage: 0, partition: 0, tid: 0) - (partition=5) native execution finalizing
2026-01-30 15:41:24.488 (+0.120s) [INFO] [auron::rt:183] (stage: 2, partition: 2, tid: 13) - task finished
2026-01-30 15:41:24.488 (+0.120s) [INFO] [auron::rt:183] (stage: 2, partition: 1, tid: 12) - task finished
2026-01-30 15:41:24.488 (+0.120s) [INFO] [auron::rt:274] (stage: 0, partition: 0, tid: 0) - (partition=5) native execution finalized
2026-01-30 15:41:24.511 (+0.143s) [INFO] [auron::rt:266] (stage: 0, partition: 0, tid: 0) - (partition=3) native execution finalizing
2026-01-30 15:41:24.511 (+0.143s) [INFO] [auron::rt:266] (stage: 0, partition: 0, tid: 0) - (partition=4) native execution finalizing
2026-01-30 15:41:24.511 (+0.143s) [INFO] [auron::rt:266] (stage: 0, partition: 0, tid: 0) - (partition=0) native execution finalizing
2026-01-30 15:41:24.511 (+0.143s) [INFO] [auron::rt:266] (stage: 0, partition: 0, tid: 0) - (partition=2) native execution finalizing
2026-01-30 15:41:24.511 (+0.143s) [INFO] [auron::rt:266] (stage: 0, partition: 0, tid: 0) - (partition=1) native execution finalizing
2026-01-30 15:41:24.512 (+0.144s) [INFO] [auron::rt:274] (stage: 0, partition: 0, tid: 0) - (partition=0) native execution finalized
2026-01-30 15:41:24.512 (+0.144s) [INFO] [auron::rt:274] (stage: 0, partition: 0, tid: 0) - (partition=4) native execution finalized
2026-01-30 15:41:24.512 (+0.144s) [INFO] [auron::rt:274] (stage: 0, partition: 0, tid: 0) - (partition=1) native execution finalized
2026-01-30 15:41:24.512 (+0.144s) [INFO] [auron::rt:274] (stage: 0, partition: 0, tid: 0) - (partition=2) native execution finalized
2026-01-30 15:41:24.512 (+0.144s) [INFO] [auron::rt:274] (stage: 0, partition: 0, tid: 0) - (partition=3) native execution finalized
import org.apache.spark.sql.functions.randn
df: org.apache.spark.sql.Dataset[Long] = [id: bigint]
outputPath: String = /tmp/spark_range_output.parquet
readDf: org.apache.spark.sql.DataFrame = [id: bigint]
resultDf: org.apache.spark.sql.DataFrame = [id: bigint, random_normal: double]
res0: Array[org.apache.spark.sql.Row] = Array([3,1.4607292672705405], [0,-0.3268302897860617], [1,-0.09087682847007866], [4,-1.2271197538792842], [2,-0.546398027932835])

Copilot

Pull request overview

This PR implements the randn() function to improve Spark function coverage in Auron. The function generates random values from a standard normal distribution with optional seed support.

Changes:

Added Rust implementation of spark_randn function with seed handling
Registered the new function in the Scala converter and Rust function registry
Added rand_distr dependency for normal distribution sampling

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 3 comments.

File	Description
spark-extension/src/main/scala/org/apache/spark/sql/auron/NativeConverters.scala	Added case handler for Randn expression to route to native implementation
native-engine/datafusion-ext-functions/src/spark_randn.rs	New implementation of randn function with seed handling and unit tests
native-engine/datafusion-ext-functions/src/lib.rs	Registered Spark_Randn function in the extension function factory
native-engine/datafusion-ext-functions/Cargo.toml	Added rand and rand_distr dependencies
Cargo.toml	Added rand_distr workspace dependency
Cargo.lock	Updated lock file with rand_distr package metadata

Resolve conflicts between randn and spark_partition_id features:
- Proto: spark_partition_id_expr at 20101, randn_expr at 20102
- Planner: include both expression handlers

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

yew1eb · 2026-01-31T04:35:33Z

@robreeves Nice work! LGTM.
Could you add SQL unit tests (per AuronFunctionSuite) to align with Spark SQL's semantics? Thanks!

Resolved conflicts by assigning separate IDs to randn and monotonically_increasing_id:
- MonotonicIncreasingIdExprNode: ID 20102
- RandnExprNode: ID 20103

Both expressions are now supported in the proto definitions and planner.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Add test to AuronFunctionSuite to verify randn functionality with seeds. The test validates that Auron's native randn implementation produces the same reproducible results as Spark's baseline when using explicit seeds.

Test covers:
- randn with seed 42
- randn with seed 100
- Validates against Spark baseline using checkSparkAnswerAndOperator

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

robreeves · 2026-02-04T04:26:48Z

@robreeves Nice work! LGTM. Could you add SQL unit tests (per AuronFunctionSuite) to align with Spark SQL's semantics? Thanks!

I added an AuronFunctionSuite test. Thanks.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

robreeves · 2026-02-08T15:52:09Z

@richox can you run the PR checks again?

robreeves · 2026-02-12T18:59:38Z

@cxzl25 can you run the PR checks? Thanks

ShreyeshArangath

Changes LGTM, just one comment about the naming of this rust function.

…ntion Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

…nces Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

ShreyeshArangath · 2026-02-25T19:49:56Z

hey, @cxzl25, gentle ping on this

Copilot

Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

SteNicholas · 2026-06-15T12:41:21Z

@robreeves, could you resolve conflicts?

robreeves · 2026-06-18T15:33:11Z

@SteNicholas yes I'll handle the updates this weekend. Apologies for the delay on this one.

# Conflicts: # Cargo.lock # Cargo.toml # spark-extension-shims-spark/src/test/scala/org/apache/auron/AuronFunctionSuite.scala

Copilot

Pull request overview

Copilot reviewed 8 out of 9 changed files in this pull request and generated 1 comment.

robreeves · 2026-06-24T23:19:52Z

+use parking_lot::Mutex;
+use rand::{SeedableRng, rngs::StdRng};
+use rand_distr::{Distribution, StandardNormal};


The test does pass, but only because of a preexisting bug in the test suite: the vanilla Spark baseline actually runs with Auron enabled, so both sides of the assert always match.

I added this log:

@@ -55,6 +58,12 @@ abstract class AuronQueryTest
     var expected: Seq[Row] = null
     withSQLConf("spark.auron.enable" -> "false") {
+      // scalastyle:off println
+      println("[AURON_CONF_DEBUG] after setting spark.auron.enable=false -> " +
+        s"raw spark.auron.enable=${SQLConf.get.getConfString("spark.auron.enable", "<unset>")}, " +
+        s"raw spark.auron.enabled=${SQLConf.get.getConfString("spark.auron.enabled", "<unset>")}, " +
+        s"effective AURON_ENABLED.get()=${SparkAuronConfiguration.AURON_ENABLED.get()}")
+      // scalastyle:on println
       val dfSpark = dataframe()
       expected = dfSpark.collect()
     }

and it shows this:

[AURON_CONF_DEBUG] after setting spark.auron.enable=false -> raw spark.auron.enable=false, raw spark.auron.enabled=<unset>, effective AURON_ENABLED.get()=true

I also reproduced it with this test.

test("alt config key spark.auron.enable is silently ignored") {
  // Primary key is "spark.auron.enabled"; "spark.auron.enable" is only an alt key.
  // Setting the ALT key has no effect -> effective value stays at the default (true).
  withSQLConf("spark.auron.enable" -> "false") {
    assert(SparkAuronConfiguration.AURON_ENABLED.get() === true)
  }
  // Setting the PRIMARY key works -> effective value becomes false.
  withSQLConf("spark.auron.enabled" -> "false") {
    assert(SparkAuronConfiguration.AURON_ENABLED.get() === false)
  }
}

Auron stays enabled because spark.auron.enabled defaults to true here. Setting the alt key has no effect because alt keys are never registered with Spark, so ConfigEntry.findEntry("spark.auron.enable") does not return anything.

When I use the correct config key, spark.auron.enabled, three tests fail. Since one of them is unrelated to this PR, I will open a separate PR to fix the config issue first.

- acosh null propagation *** FAILED ***
- randn function with seed *** FAILED ***
- randn function with foldable seed expression *** FAILED ***

The config issue is fixed in #2361

Auron's ConfigOption alt keys (declared via addAltKey) were silently ignored: getFromSpark only consulted alt keys via ConfigEntry.findEntry (always null for Auron's unregistered options) and then synthesized a ConfigEntryWithDefaultFunction with an empty alternatives list, so only the primary key was ever read from SQLConf. As a result, e.g. setting spark.auron.enable (alt of spark.auron.enabled) had no effect.

Pass the spark-prefixed alt keys as the synthesized entry's alternatives so ConfigEntry#readString reads primary +: alternatives, with the primary key taking precedence. Also add a test asserting alt keys are honored.

Fixing this makes the test harness's spark.auron.enable=false baseline actually fall back to vanilla Spark, which exposed that acosh(0.0) yields NaN with a different (implementation-defined) bit pattern in each engine; QueryTest compares doubles via Double.doubleToRawLongBits, so update the acosh test to assert NaN-ness for the out-of-domain input rather than exact equality.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
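
To make the lookup order described in this commit message concrete, here is a minimal sketch in Rust. It is not the Auron code (which is Scala and goes through Spark's ConfigEntry#readString); the ConfigOption struct, its get method, and the auron_enabled helper are hypothetical names invented for illustration. It only models the intended behavior: read the primary key first, then each alt key, then fall back to the default.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a boolean config option that has one primary
/// key and any number of alternative (alt) keys.
struct ConfigOption {
    primary: &'static str,
    alt_keys: Vec<&'static str>,
    default: bool,
}

impl ConfigOption {
    /// Look up the primary key first, then each alt key in order.
    /// The first key present in `conf` wins; otherwise the default is used.
    fn get(&self, conf: &HashMap<String, String>) -> bool {
        std::iter::once(&self.primary)
            .chain(self.alt_keys.iter())
            .find_map(|key| conf.get(*key))
            .map(|raw| raw.trim().eq_ignore_ascii_case("true"))
            .unwrap_or(self.default)
    }
}

/// Assumed shape of the option discussed in this thread: primary key
/// "spark.auron.enabled", alt key "spark.auron.enable", default true.
fn auron_enabled() -> ConfigOption {
    ConfigOption {
        primary: "spark.auron.enabled",
        alt_keys: vec!["spark.auron.enable"],
        default: true,
    }
}
```

With an empty map, auron_enabled().get(&conf) returns the default (true). With only spark.auron.enable=false set, it returns false. If both keys are set, the primary key wins.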

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

QueryTest compares doubles via Double.doubleToRawLongBits, which is bit-exact. Vanilla Spark and the native engine can produce semantically equal NaNs with different (implementation-defined) bit patterns, so the comparison would spuriously fail. Canonicalize NaN on both sides before comparing. This lets the acosh null propagation test keep its original single-query form covering the out-of-domain (NaN) input. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
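
As a rough illustration of why a bit-exact comparison can fail on NaN, the following Rust sketch builds two NaN values with different bit patterns and compares them before and after canonicalization. The function names and the choice of 0x7ff8000000000000 as the canonical pattern are assumptions made for this example; this is not the QueryTest or Auron code.

```rust
/// Quiet-NaN bit pattern used as the canonical form in this sketch
/// (an arbitrary choice for illustration).
const CANONICAL_NAN_BITS: u64 = 0x7ff8_0000_0000_0000;

/// Map every NaN to a single bit pattern; leave all other values untouched.
fn canonicalize_nan(x: f64) -> f64 {
    if x.is_nan() {
        f64::from_bits(CANONICAL_NAN_BITS)
    } else {
        x
    }
}

/// Bit-exact comparison, analogous to comparing the results of
/// Double.doubleToRawLongBits on the JVM.
fn raw_bits_equal(a: f64, b: f64) -> bool {
    a.to_bits() == b.to_bits()
}

/// Bit-exact comparison applied after NaN canonicalization on both sides.
fn canonical_bits_equal(a: f64, b: f64) -> bool {
    raw_bits_equal(canonicalize_nan(a), canonicalize_nan(b))
}
```

Two NaNs that differ only in the sign bit, such as f64::from_bits(0x7ff8000000000000) and f64::from_bits(0xfff8000000000000), fail raw_bits_equal but pass canonical_bits_equal. Non-NaN values are compared exactly as before, so 0.0 and -0.0 still differ.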

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…cker Revert checkSparkAnswerAndOperator to plain checkAnswer and instead handle the NaN encoding difference locally in the acosh test. acosh of an out-of-domain input yields NaN, which vanilla Spark and the native engine may encode with different bits; checkAnswer/QueryTest compares doubles by raw bits. Split the test so in-domain/null values are compared numerically, and out-of-domain inputs are compared via the natively-supported isnan (a boolean) so no raw NaN bits are compared. This keeps the shared checker unchanged and avoids relaxing NaN comparison for all callers. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

With the config alt-key fix in place, the test harness's spark.auron.enable=false baseline now actually runs vanilla Spark, so checkSparkAnswerAndOperator compares Auron's randn against Spark's randn. These differ by design: the native engine uses StdRng/StandardNormal while Spark uses XORShiftRandom + nextGaussian, and randn is non-deterministic and not intended to be bit-compatible with Spark.

Rewrite the randn tests to verify the expression is executed natively, produces a non-null value per row, and is reproducible for a fixed seed (and that different seeds produce different values), instead of asserting exact equality with vanilla Spark.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
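
The properties the rewritten tests check (a value per row, reproducibility for a fixed seed, different values for different seeds) can be sketched without either engine. The Rust example below uses a toy generator built from SplitMix64 and the Box-Muller transform. This is an assumption made purely for illustration: it is neither Auron's StdRng/StandardNormal path nor Spark's XORShiftRandom, and ToyRandn, randn_column, and the way the partition index is mixed into the seed are hypothetical.

```rust
use std::f64::consts::PI;

/// Toy seeded standard-normal generator (SplitMix64 + Box-Muller).
/// NOT what Auron or Spark use; it only illustrates the test properties.
struct ToyRandn {
    state: u64,
}

impl ToyRandn {
    /// Mix the partition index into the seed so that each partition
    /// draws from its own stream (an assumption of this sketch).
    fn new(seed: i64, partition: u32) -> Self {
        ToyRandn {
            state: (seed as u64).wrapping_add(partition as u64),
        }
    }

    /// SplitMix64 step: returns the next 64 pseudo-random bits.
    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9e37_79b9_7f4a_7c15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xbf58_476d_1ce4_e5b9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94d0_49bb_1331_11eb);
        z ^ (z >> 31)
    }

    /// Uniform sample in (0, 1], so that ln() below never sees zero.
    fn next_unit(&mut self) -> f64 {
        ((self.next_u64() >> 11) + 1) as f64 / (1u64 << 53) as f64
    }

    /// Box-Muller transform: two uniforms in, one standard-normal sample out.
    fn next_gaussian(&mut self) -> f64 {
        let (u1, u2) = (self.next_unit(), self.next_unit());
        (-2.0 * u1.ln()).sqrt() * (2.0 * PI * u2).cos()
    }
}

/// Generate `rows` samples for one partition of a seeded randn column.
fn randn_column(seed: i64, partition: u32, rows: usize) -> Vec<f64> {
    let mut rng = ToyRandn::new(seed, partition);
    (0..rows).map(|_| rng.next_gaussian()).collect()
}
```

Calling randn_column(42, 0, 8) twice yields identical vectors, while randn_column(100, 0, 8) yields a different one. Checks of this kind hold for any correctly seeded generator, which is why they remain valid even though the two engines use different algorithms.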

robreeves · 2026-06-25T20:39:07Z

This now includes #2361 changes. #2361 should be merged first then I will rebase this.

robreeves added 4 commits January 18, 2026 10:37

randn

175fa00

reduce arg parsing complexity

5bb5ca5

remove extra method

3d0fa7d

use rand_dist

0f793c4

github-actions Bot added spark native build labels Jan 21, 2026

robreeves added 2 commits January 20, 2026 21:40

revert auto format change

0d041f3

revert auto format changes

0b93080

cxzl25 requested a review from Copilot January 21, 2026 05:48

Copilot started reviewing on behalf of cxzl25 January 21, 2026 05:49

Copilot AI reviewed Jan 21, 2026

Comment thread native-engine/datafusion-ext-functions/src/spark_randn.rs Outdated

Comment thread native-engine/datafusion-ext-functions/src/spark_randn.rs Outdated

Comment thread native-engine/datafusion-ext-functions/src/spark_randn.rs Outdated

robreeves and others added 8 commits January 21, 2026 21:40

autoformat

3b78e73

handle batches

4b2261b

new approach

81beb0b

forced format changes during build

a8183ae

Merge origin/master into randn

f591295

simplify evaluate

0ea5dc9

revert whitespace

f792b1b

revert whitespace

dd6c98b

robreeves marked this pull request as ready for review January 31, 2026 00:26

robreeves and others added 2 commits February 3, 2026 20:00

robreeves and others added 2 commits February 8, 2026 07:36

Fix rustfmt formatting in randn.rs

388d0a1

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Fix clippy uninlined_format_args in randn.rs

6e1f2d4

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

ShreyeshArangath reviewed Feb 15, 2026

Comment thread native-engine/datafusion-ext-exprs/src/lib.rs Outdated

robreeves and others added 2 commits February 15, 2026 21:24

Rename randn to spark_randn to align with Spark-specific naming conve…

048fdfb

…ntion Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Rename RandnExprNode to SparkRandnExprNode in proto and update refere…

8f554f8

…nces Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

ShreyeshArangath approved these changes Feb 16, 2026

ShreyeshArangath mentioned this pull request Mar 12, 2026

[EPIC] Support nondetermenistic expressions #1833

Open

5 tasks

cxzl25 requested a review from Copilot March 12, 2026 14:48

Copilot started reviewing on behalf of cxzl25 March 12, 2026 14:49

Copilot AI reviewed Mar 12, 2026

robreeves and others added 3 commits March 12, 2026 08:18

fix typo

7ea58fc

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Remove unused rand/rand_distr dependencies from datafusion-ext-functions

e329725

Handle foldable seed expressions in randn converter

fef5506

SteNicholas assigned richox and ShreyeshArangath Jun 15, 2026

Merge remote-tracking branch 'origin/master' into randn

96ed5b2

robreeves requested a review from Copilot June 24, 2026 18:42

Copilot started reviewing on behalf of robreeves June 24, 2026 18:42

Copilot AI reviewed Jun 24, 2026

robreeves and others added 7 commits June 25, 2026 00:47

Remove explanatory comment on synthesized ConfigEntry alternatives

b606195

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Reword NaN canonicalization comment in plainer language

19b8426

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Merge branch 'altkeys' into randn

0a012ee
