Skip to content

fix: classify unsupported format patterns as Unsupported in CometFromUnixTime#4660

Open
marvelshan wants to merge 3 commits into
apache:mainfrom
marvelshan:fix/from-unixtime-unsupported-format
Open

fix: classify unsupported format patterns as Unsupported in CometFromUnixTime#4660
marvelshan wants to merge 3 commits into
apache:mainfrom
marvelshan:fix/from-unixtime-unsupported-format

Conversation

@marvelshan

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Closes #4575

Rationale for this change

CometFromUnixTime incorrectly reports non-default datetime format patterns as Incompatible instead of Unsupported. This is misleading because:

What changes are included in this PR?

  • Split getIncompatibleReasons() to only contain the timestamp range difference (which is a true incompatibility — native can execute but results may differ at boundaries)
  • Added getUnsupportedReasons() to document format pattern limitations (native cannot execute these at all)
  • Updated getSupportLevel(expr) to dynamically return Unsupported for non-default format patterns and Incompatible for the default pattern, matching the actual behavior in convert()

How are these changes tested?

Existing tests cover the fallback behavior. The convert() method already returns None for non-default formats, and getSupportLevel() now aligns with that behavior. The GenerateDocs output will correctly classify format limitations under "Unsupported" rather than "Incompatible".

@marvelshan marvelshan force-pushed the fix/from-unixtime-unsupported-format branch from 0aa73b7 to 32f11be Compare June 16, 2026 07:04
val timeZone = exprToProtoInternal(Literal(expr.timeZoneId.orNull), inputs, binding)

if (expr.format != Literal(TimestampFormatter.defaultPattern)) {
if (expr.format != Literal(TimestampFormatter.defaultPattern())) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit : might be an unwanted change ?

if (expr.format != Literal(TimestampFormatter.defaultPattern())) {
Unsupported(Some("Only the default datetime pattern `yyyy-MM-dd HH:mm:ss` is supported"))
} else {
Incompatible(None)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might wnat to add some SQL tests to verify the fallback and make sure the plan is annotated

…UnixTime

CometFromUnixTime previously reported all limitations as Incompatible,
but non-default format patterns cannot be executed natively at all and
should be Unsupported. The DataFusion timestamp range difference is
the genuine Incompatible case.

- Split getIncompatibleReasons to only contain the timestamp range issue
- Add getUnsupportedReasons for the format pattern limitation
- Make getSupportLevel return Unsupported for non-default patterns,
  Incompatible for the default pattern
@marvelshan marvelshan force-pushed the fix/from-unixtime-unsupported-format branch 2 times, most recently from beeda26 to 1a8dea4 Compare June 18, 2026 07:20

override def getSupportLevel(expr: FromUnixTime): SupportLevel = {
if (expr.format != Literal(TimestampFormatter.defaultPattern())) {
Unsupported(Some("Only the default datetime pattern `yyyy-MM-dd HH:mm:ss` is supported"))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change may prevent the codegen dispatch path from working.

SELECT from_unixtime(t) FROM test_from_unix_time

query
query expect_fallback(Only the default datetime pattern `yyyy-MM-dd HH:mm:ss` is supported)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This query was previously working (presumably due to codegen dispatch). This seems like a regression.

…patterns

Verify that non-default format patterns fall back to Spark with the
expected reason, and that the default pattern runs natively when
allowIncompatible=true.
@marvelshan marvelshan force-pushed the fix/from-unixtime-unsupported-format branch from 1a8dea4 to 35ebc98 Compare June 18, 2026 09:31
@marvelshan marvelshan marked this pull request as draft June 19, 2026 02:01
@marvelshan marvelshan marked this pull request as ready for review June 22, 2026 04:21
@andygrove

Copy link
Copy Markdown
Member

Thanks @marvelshan. The instinct here is good. Separating the genuine timestamp-range incompatibility from the format-pattern caveat is the right idea, and pulling the format text out of getIncompatibleReasons() so that it only describes what allowIncompatible=true actually changes (the native to_char path) is a real improvement. The from_unix_time_enabled.sql cleanup is nice too, since the default pattern really does run natively under allowIncompatible=true and asserting that with query is more accurate.

A few things I would like to work through before this lands.

1. The code keeps this Incompatible, not Unsupported. The title and the description say getSupportLevel now returns Unsupported for non-default patterns, but both branches return Incompatible:

override def getSupportLevel(expr: FromUnixTime): SupportLevel = {
  if (expr.format != Literal(TimestampFormatter.defaultPattern())) {
    Incompatible(Some("Only the default datetime pattern ... is supported natively"))
  } else {
    Incompatible(None)
  }
}

I actually think keeping it Incompatible is correct. CometFromUnixTime is a CodegenDispatchFallback, so non-default formats are handled by the JVM codegen dispatcher rather than being truly unsupported. If this returned Unsupported, the framework would skip the dispatcher and force a full fall back to Spark, which would be a regression. So the code looks right, it is the title and the "What changes are included" section that need to be updated to match.

2. getUnsupportedReasons() will read as self-contradictory in the docs. GenerateDocs calls getUnsupportedReasons() unconditionally and never consults getSupportLevel, so this entry renders under the heading "The following cases are not supported by Comet:", while the bullet text itself says non-default patterns are handled by the codegen dispatcher. So the generated page says a case is not supported and handled in the same sentence, and that does not match runtime behavior where non-default formats run in Comet via the dispatcher by default. date_format has the same native-allow-list-plus-dispatcher shape and documents it in getCompatibleNotes() (datetime.scala:715). Could the format caveat move there instead? That keeps getIncompatibleReasons() as just the timestamp range, which is the part of this change I like, and avoids the misleading "Unsupported" heading.

3. The non-default test downgrade looks inconsistent. In from_unix_time.sql, from_unixtime(t, 'yyyy-MM-dd') moved from query to query spark_answer_only, but the default-pattern from_unixtime(t) stayed query. At allowIncompatible=false both take the same codegen-dispatch path, so if the default-pattern one passes as native, the non-default one should too. Could those stay query so they keep asserting the dispatcher holds them in Comet? If query actually fails for the non-default case, that would be worth surfacing, since it would mean the dispatcher is not covering it the way the new reason text claims.

4. Minor. getSupportLevel calls TimestampFormatter.defaultPattern() with parens while convert() just below calls defaultPattern without. The parens form matches the Spark definition, so it is fine, but the two call sites reading differently in the same object is worth tidying. Also a heads up that #4693 also edits this serde's getSupportLevel and getIncompatibleReasons for collation, so whichever of the two merges second will need a rebase.

The getIncompatibleReasons() split and the enabled-test fixes are solid. The main rework is routing the format caveat through getCompatibleNotes() rather than getUnsupportedReasons(), and squaring the title and description with the code.

Disclosure: this review was assisted by an AI tool (Anthropic Claude via Claude Code). I read the diff, the linked issues, and the related serde code before forming these comments, and I take responsibility for them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CometFromUnixTime reports unsupported format patterns as Incompatible instead of Unsupported

3 participants