Fix overflow and precision issues in total parts calculation in presigned URL multipart download#7026
Open
jencymaryjoseph wants to merge 1 commit into
Conversation
0a3ec57 to
1acedb8
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation and Context
The S3 presigned URL multipart download feature uses range-based GETs to download large objects in chunks. The number of parts is calculated client-side from the total object size and the configured part size.
Unlike the standard multipart download (
MultipartDownloaderSubscriber), which readstotalPartsfrom thex-amz-mp-parts-countresponse header (always ≤ 10,000 since S3 caps multipart uploads at 10,000 parts), the presigned URL multipart download calculatestotalPartsclient-side fromcontentLength / partSize.This calculation has no S3-enforced upper bound — the only constraint is the configuredminimumPartSizeInBytes.The current implementation has two issues:
Issue 1: Floating-point precision loss Casting long → double loses precision for values > 2^53. With small partSize, the division loses precision and the result can be off.
Issue 2: Silent integer saturation When contentLength / partSize > Integer.MAX_VALUE, the (int) cast on the Math.ceil result silently saturates to Integer.MAX_VALUE rather than failing. The downloader uses this incorrect totalParts, the completion gate (completedParts == totalParts) fires before all data is downloaded, and the customer receives a truncated download reported as success
When the bug is reachable: The SDK does not enforce a minimum on
MultipartConfiguration.minimumPartSizeInBytes(), so a customer can set it as low as 1L. With sub-KB part sizes against multi-GB objects, contentLength / partSize exceeds Integer.MAX_VALUE and the bug manifests.Modifications
Replaced the floating-point ceiling division with exact integer arithmetic and
Changed calculateTotalParts return type from int to long
Changed totalParts field from Integer to Long in both subscribers
Testing
MultipartDownloadUtilsTest):Existing cases continue to pass (last-part smaller, exact boundary, contentLength == partSize, contentLength == 0, contentLength < partSize).
Added 5 GiB / 1 byte → returns 5_368_709_120 (exceeds Integer.MAX_VALUE, previously saturated).
Added Long.MAX_VALUE - 1 with partSize = 1 → returns Long.MAX_VALUE - 1 (verifies no overflow at the upper bound).
PresignedUrlDownloadHelperTest):All validatePartResponse call sites updated from 2 to 2L to match the new Long parameter type.
Screenshots (if appropriate)
Types of changes
Checklist
mvn installsucceedsscripts/new-changescript and following the instructions. Commit the new file created by the script in.changes/next-releasewith your changes.License