Skip to content

[python] Add StartupMode enum and scan.mode option to CoreOptions#7900

Open
JunRuiLee wants to merge 2 commits into
apache:masterfrom
JunRuiLee:pypaimon-scan-mode-support
Open

[python] Add StartupMode enum and scan.mode option to CoreOptions#7900
JunRuiLee wants to merge 2 commits into
apache:masterfrom
JunRuiLee:pypaimon-scan-mode-support

Conversation

@JunRuiLee
Copy link
Copy Markdown
Contributor

Purpose

Introduce StartupMode enum (default, latest-full, latest, from-timestamp, from-snapshot, etc.) and the scan.mode config option, aligning with Java CoreOptions.SCAN_MODE for explicit scan startup mode control.

@JunRuiLee JunRuiLee marked this pull request as draft May 19, 2026 11:47
@JunRuiLee JunRuiLee changed the title [python] Add StartupMode enum and scan.mode option to CoreOptions [WIP][python] Add StartupMode enum and scan.mode option to CoreOptions May 19, 2026
Introduce StartupMode enum and scan.mode config option, aligning with
Java CoreOptions.SCAN_MODE for explicit scan startup mode control.

- Add StartupMode enum with all Java values including deprecated FULL
- Add scan.mode ConfigOption with DEFAULT as default value
- Add startup_mode() method that resolves effective mode from scan.mode
  plus companion options (matching Java CoreOptions.startupMode() logic)
- Maps deprecated FULL to LATEST_FULL
- Auto-detects mode from scan.timestamp*, scan.snapshot-id, etc. when
  scan.mode is DEFAULT
- Add _validate_scan_mode() in TableScan to reject invalid
  combinations: from-timestamp without timestamp options,
  from-snapshot without snapshot/tag/watermark, and streaming-only
  modes (latest, compacted-full) in batch reads
@JunRuiLee JunRuiLee force-pushed the pypaimon-scan-mode-support branch from 1bff72c to 8eae2f0 Compare May 23, 2026 16:20
@JunRuiLee JunRuiLee marked this pull request as ready for review May 23, 2026 16:21
@JunRuiLee JunRuiLee changed the title [WIP][python] Add StartupMode enum and scan.mode option to CoreOptions [python] Add StartupMode enum and scan.mode option to CoreOptions May 23, 2026
Copy link
Copy Markdown
Contributor

@JingsongLi JingsongLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good alignment with Java's CoreOptions.SCAN_MODE. A few comments:

  1. startup_mode() resolution logic: The fallback from DEFAULT to LATEST_FULL when no scan options are present matches Java behavior. However, the contains_key checks use raw string literals like "scan.timestamp-millis" and "scan.watermark" instead of CoreOptions constants. This makes it fragile if those option keys are ever refactored. Please use the constant-based approach (e.g., CoreOptions.SCAN_TIMESTAMP_MILLIS) if they exist, or define them as constants.

  2. Validation whitelist: The _validate_scan_mode() method in table_scan.py is well-structured, but it raises NotImplementedError for COMPACTED_FULL, FROM_CREATION_TIMESTAMP, FROM_FILE_CREATION_TIME. These should probably log a warning or raise a more specific exception type that doesn't suggest a code bug (since NotImplementedError typically means the method should have been overridden).

  3. Missing test coverage: I don't see test files in this PR. The validation logic has many branches — at minimum there should be tests for:

    • scan.mode=from-timestamp without scan.timestamp-millis → error
    • Conflicting options (e.g., scan.mode=latest + scan.snapshot-id) → error
    • scan.mode=default with scan.timestamp-millis → resolves to FROM_TIMESTAMP
  4. FULL deprecation: The mapping from FULL to LATEST_FULL is correct but should there be a deprecation warning emitted?

Please add unit tests covering the validation paths before merging.

…e, add tests

- Replace raw string literals with CoreOptions constants in
  _validate_scan_mode() for refactoring safety
- Change NotImplementedError to ValueError for unsupported scan modes
- Add unit tests covering validation paths: missing required options,
  conflicting options, DEFAULT auto-detection, unsupported modes, and
  deprecated FULL mapping
@JunRuiLee
Copy link
Copy Markdown
Contributor Author

Good alignment with Java's CoreOptions.SCAN_MODE. A few comments:

  1. startup_mode() resolution logic: The fallback from DEFAULT to LATEST_FULL when no scan options are present matches Java behavior. However, the contains_key checks use raw string literals like "scan.timestamp-millis" and "scan.watermark" instead of CoreOptions constants. This makes it fragile if those option keys are ever refactored. Please use the constant-based approach (e.g., CoreOptions.SCAN_TIMESTAMP_MILLIS) if they exist, or define them as constants.

  2. Validation whitelist: The _validate_scan_mode() method in table_scan.py is well-structured, but it raises NotImplementedError for COMPACTED_FULL, FROM_CREATION_TIMESTAMP, FROM_FILE_CREATION_TIME. These should probably log a warning or raise a more specific exception type that doesn't suggest a code bug (since NotImplementedError typically means the method should have been overridden).

  3. Missing test coverage: I don't see test files in this PR. The validation logic has many branches — at minimum there should be tests for:

    • scan.mode=from-timestamp without scan.timestamp-millis → error
    • Conflicting options (e.g., scan.mode=latest + scan.snapshot-id) → error
    • scan.mode=default with scan.timestamp-millis → resolves to FROM_TIMESTAMP
  4. FULL deprecation: The mapping from FULL to LATEST_FULL is correct but should there be a deprecation warning emitted?

Please add unit tests covering the validation paths before merging.

Thanks for the detailed review. I addressed the comments in the latest update. PTAL.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants