Skip to content

[core] Fix silent data corruption due to overflow in FieldProductAgg#7906

Open
ArnavBalyan wants to merge 1 commit into
apache:masterfrom
ArnavBalyan:arnavb/fix-product-overflow
Open

[core] Fix silent data corruption due to overflow in FieldProductAgg#7906
ArnavBalyan wants to merge 1 commit into
apache:masterfrom
ArnavBalyan:arnavb/fix-product-overflow

Conversation

@ArnavBalyan
Copy link
Copy Markdown
Member

@ArnavBalyan ArnavBalyan commented May 19, 2026

Purpose

  • FieldProductAgg multiplies integers without overflow detection.
  • Accumulator silently wraps corrupting downstream product aggregations.
  • Throw on overflow using Math multiplyExact and range checks.
  • Previously the data corruption was silently ignored. However with this change, an overflow throws a hard exception. Downstream consumers should be aware.

Tests

  • UT

@ArnavBalyan
Copy link
Copy Markdown
Member Author

cc @JingsongLi thanks! :)

@ArnavBalyan
Copy link
Copy Markdown
Member Author

cc @leaves12138 @JingsongLi can you pls review thanks!

Copy link
Copy Markdown
Contributor

@JingsongLi JingsongLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review of FieldProductAgg overflow detection

The intent is good — silent integer overflow during aggregation is indeed a correctness bug that corrupts data. The agg() fix for INTEGER/BIGINT using Math.multiplyExact is clean. The toExactByte/toExactShort helpers are correct since Java promotes byte * byte and short * short to int before the multiplication, so the int result can be range-checked without itself overflowing.

Issues

1. retract() method has the same overflow bug but is not fixed

paimon-core/src/main/java/org/apache/paimon/mergetree/compact/aggregate/FieldProductAgg.java

The retract method performs division, which can also overflow silently in the same cases:

  • (byte) ((byte) Byte.MIN_VALUE / (byte) -1) → result is 128, which wraps to -128 when cast to byte
  • (short) ((short) Short.MIN_VALUE / (short) -1) → result is 32768, wraps to -32768
  • (int) Integer.MIN_VALUE / (int) -1 → wraps to Integer.MIN_VALUE in Java
  • (long) Long.MIN_VALUE / (long) -1 → wraps to Long.MIN_VALUE in Java

If the goal of this PR is to eliminate silent data corruption from overflow, the retract() path should get the same treatment for consistency. At minimum, the TINYINT and SMALLINT cases need the same range-check pattern since the int promotion makes the overflow detectable. For INTEGER and BIGINT, you would need explicit checks (if (accumulator == Integer.MIN_VALUE && inputField == -1) etc.) since Java's / operator does not throw.

2. Exception message could include operand values for debuggability

When a compaction task fails with ArithmeticException("byte overflow"), the operator has no idea which values caused it. Consider including the operands:

throw new ArithmeticException(
    String.format("byte overflow: %d * %d = %d", (byte) accumulator, (byte) inputField, value));

This is especially important because this exception will surface during compaction, potentially long after the data was written, making reproduction difficult.

3. Consider the failure mode during compaction

An unchecked ArithmeticException thrown during compaction will fail the compaction task. Depending on the retry/error-handling policy, this could block compaction indefinitely for a partition that has overflowing values. This is arguably better than silent corruption, but it would be worth documenting this behavioral change (e.g., in the PR description or release notes) so users understand that previously-silent overflows will now cause task failures.

Minor notes

  • The test cases are well chosen (boundary values, both positive and negative overflow). Good coverage.
  • FLOAT/DOUBLE overflow to infinity is not addressed, which is fine — that is standard IEEE 754 semantics and expected for floating-point types.
  • FieldSumAgg has the same class of overflow bugs, but that is out of scope for this PR.

Overall this is a useful correctness fix. The main request is to apply the same protection to retract() for completeness, since a half-fix leaves an inconsistency where agg() detects overflow but retract() on the same type silently corrupts.

@ArnavBalyan ArnavBalyan force-pushed the arnavb/fix-product-overflow branch from 53057d1 to 0885945 Compare May 23, 2026 15:00
@ArnavBalyan
Copy link
Copy Markdown
Member Author

ArnavBalyan commented May 23, 2026

Thanks for the review! Have addressed all comments:

  1. retract() method has the same overflow bug but is not fixed

    • Covered all cases, division and multiplication are now guarded and will not silently corrupt data.
  2. Exception message could include operand values for debuggability

    • Made the exception more clear and descriptive.
  3. Consider the failure mode during compaction

    • Added more documentation to this PR for posterity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants