
[AURON #2366] fix: Handle Paimon metadata columns in V2 native scan #2367

Open
lyne7-sc wants to merge 2 commits into apache:master from lyne7-sc:fix/paimon_meta

@lyne7-sc
Contributor

Which issue does this PR close?

Closes #2366

Rationale for this change

Paimon metadata columns (such as __paimon_file_path) are produced by the Paimon scan layer rather than stored as physical columns in data files. The Paimon V2 native scan passed these columns to the native Parquet/ORC reader as ordinary file columns, so the reader could return incorrect values for them.

For example:

```sql
create table paimon.db.t_metadata (id int, v string) using paimon;
insert into paimon.db.t_metadata values (1, 'a');
select id, __paimon_file_path from paimon.db.t_metadata;
```

The native path returned null for __paimon_file_path, while Spark/Paimon's scan path returns the actual file path.
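A minimal Python model of this failure mode follows. It is not Auron's code, and it assumes that the native reader tolerates schema differences by filling any requested column absent from the file schema with nulls; the PR only reports the null result, not the reader's internals. `read_file_columns` is a hypothetical name.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Optional[Any]]


def read_file_columns(file_rows: List[Row], requested: List[str]) -> List[Row]:
    """Project `requested` columns out of the rows stored in one data file.

    Assumption: a column that is not physically present in the file comes back
    as None. A metadata column such as __paimon_file_path, treated as an
    ordinary file column, therefore always reads as null.
    """
    return [{col: row.get(col) for col in requested} for row in file_rows]


# A data file written for t_metadata stores only the physical columns id and v.
stored = [{"id": 1, "v": "a"}]
print(read_file_columns(stored, ["id", "__paimon_file_path"]))
# prints [{'id': 1, '__paimon_file_path': None}]
```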

What changes are included in this PR?

  • Recognize Paimon metadata columns using PaimonMetadataColumn.
  • Materialize supported file-level metadata columns (__paimon_file_path, __paimon_bucket) as per-file constants.
  • Leave scans that request other (unsupported) Paimon metadata columns on Spark/Paimon's scan path instead of reading those columns from Parquet/ORC files.
  • Cover metadata columns on tables both with and without partition columns.
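The classification and per-file materialization described in the list above can be sketched in Python. This is a simplified model, not Auron's implementation: `DataFile`, `plan_columns`, and `scan_file` are hypothetical names, `metadata_columns` stands in for whatever set of names `PaimonMetadataColumn` defines, and partition values are assumed to be attached per file in the same way as the metadata constants.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Sequence, Set

# Metadata columns this sketch materializes as per-file constants
# (the two file-level columns named in the PR).
CONSTANT_METADATA_COLUMNS = {"__paimon_file_path", "__paimon_bucket"}


@dataclass
class DataFile:
    """Hypothetical descriptor for one data file handed to the native scan."""
    path: str
    bucket: int
    rows: List[Dict[str, Any]]
    partition_values: Dict[str, Any] = field(default_factory=dict)


def plan_columns(
    requested: Sequence[str],
    metadata_columns: Set[str],
    partition_columns: Sequence[str] = (),
) -> Optional[Dict[str, List[str]]]:
    """Split requested columns into file, partition, and constant metadata columns.

    Returns None if an unsupported metadata column is requested, which models
    keeping the scan on the Spark/Paimon path instead of the native reader.
    """
    plan: Dict[str, List[str]] = {"file": [], "partition": [], "constant": []}
    for col in requested:
        if col in CONSTANT_METADATA_COLUMNS:
            plan["constant"].append(col)
        elif col in metadata_columns:
            return None  # unsupported metadata column: do not convert to native
        elif col in partition_columns:
            plan["partition"].append(col)
        else:
            plan["file"].append(col)
    return plan


def scan_file(f: DataFile, plan: Dict[str, List[str]]) -> List[Dict[str, Any]]:
    """Read one file; partition and metadata columns come from per-file constants."""
    constants = {"__paimon_file_path": f.path, "__paimon_bucket": f.bucket}
    result = []
    for row in f.rows:
        out = {c: row.get(c) for c in plan["file"]}  # physical columns only
        out.update({c: f.partition_values[c] for c in plan["partition"]})
        out.update({c: constants[c] for c in plan["constant"]})
        result.append(out)
    return result


# __paimon_row_index is used here only as an assumed example of a metadata
# column outside the supported constant set.
meta = {"__paimon_file_path", "__paimon_bucket", "__paimon_row_index"}
plan = plan_columns(["id", "__paimon_file_path"], meta)
f = DataFile(path="/tmp/example/data-0.parquet", bucket=0, rows=[{"id": 1, "v": "a"}])
print(scan_file(f, plan))
# prints [{'id': 1, '__paimon_file_path': '/tmp/example/data-0.parquet'}]
```

Returning `None` from `plan_columns` models the fallback: if any requested metadata column has no per-file constant, the whole scan stays on Spark/Paimon's reader rather than risk returning a wrong value from the data file.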

Are there any user-facing changes?

No API changes. This is a correctness fix for Paimon V2 native scan.

How was this patch tested?

Added Paimon V2 integration tests.


Development: successfully merging this pull request may close the linked issue "Paimon V2 native scan does not handle metadata columns correctly".