[python] Add StartupMode enum and scan.mode option to CoreOptions#7900
[python] Add StartupMode enum and scan.mode option to CoreOptions#7900JunRuiLee wants to merge 2 commits into
Conversation
Introduce StartupMode enum and scan.mode config option, aligning with Java CoreOptions.SCAN_MODE for explicit scan startup mode control. - Add StartupMode enum with all Java values including deprecated FULL - Add scan.mode ConfigOption with DEFAULT as default value - Add startup_mode() method that resolves effective mode from scan.mode plus companion options (matching Java CoreOptions.startupMode() logic) - Maps deprecated FULL to LATEST_FULL - Auto-detects mode from scan.timestamp*, scan.snapshot-id, etc. when scan.mode is DEFAULT - Add _validate_scan_mode() in TableScan to reject invalid combinations: from-timestamp without timestamp options, from-snapshot without snapshot/tag/watermark, and streaming-only modes (latest, compacted-full) in batch reads
1bff72c to
8eae2f0
Compare
JingsongLi
left a comment
There was a problem hiding this comment.
Good alignment with Java's CoreOptions.SCAN_MODE. A few comments:
-
startup_mode()resolution logic: The fallback fromDEFAULTtoLATEST_FULLwhen no scan options are present matches Java behavior. However, thecontains_keychecks use raw string literals like"scan.timestamp-millis"and"scan.watermark"instead ofCoreOptionsconstants. This makes it fragile if those option keys are ever refactored. Please use the constant-based approach (e.g.,CoreOptions.SCAN_TIMESTAMP_MILLIS) if they exist, or define them as constants. -
Validation whitelist: The
_validate_scan_mode()method intable_scan.pyis well-structured, but it raisesNotImplementedErrorforCOMPACTED_FULL,FROM_CREATION_TIMESTAMP,FROM_FILE_CREATION_TIME. These should probably log a warning or raise a more specific exception type that doesn't suggest a code bug (sinceNotImplementedErrortypically means the method should have been overridden). -
Missing test coverage: I don't see test files in this PR. The validation logic has many branches — at minimum there should be tests for:
scan.mode=from-timestampwithoutscan.timestamp-millis→ error- Conflicting options (e.g.,
scan.mode=latest+scan.snapshot-id) → error scan.mode=defaultwithscan.timestamp-millis→ resolves toFROM_TIMESTAMP
-
FULLdeprecation: The mapping fromFULLtoLATEST_FULLis correct but should there be a deprecation warning emitted?
Please add unit tests covering the validation paths before merging.
…e, add tests - Replace raw string literals with CoreOptions constants in _validate_scan_mode() for refactoring safety - Change NotImplementedError to ValueError for unsupported scan modes - Add unit tests covering validation paths: missing required options, conflicting options, DEFAULT auto-detection, unsupported modes, and deprecated FULL mapping
Thanks for the detailed review. I addressed the comments in the latest update. PTAL. |
Purpose
Introduce StartupMode enum (default, latest-full, latest, from-timestamp, from-snapshot, etc.) and the scan.mode config option, aligning with Java CoreOptions.SCAN_MODE for explicit scan startup mode control.