[SYNSD-2567] Fix httpx.ResponseNotRead crash in download chunk retry path by BryanFauble · Pull Request #1413 · Sage-Bionetworks/synapsePythonClient

BryanFauble · 2026-06-22T21:16:30Z

Problem

A user reported a crash while downloading an 8 MiB+ S3 file (multi-threaded download path). A chunk range request hit a retryable HTTP status, and the retry machinery then crashed trying to read the body of a streaming response that was never read:

download_async.py:477  _stream_and_write_chunk
  -> retry.py:498       with_retry_time_based
  -> retry.py:576       _is_retryable        response_message = response_message or _get_message(response)
  -> retry.py:661       _get_message         return response.text
  -> httpx _models.py   raise ResponseNotRead()

The download eventually succeeded because the outer count-based retry in download_by_file_handle() (60 retries / ~30 min) re-ran the whole download. So this is a resiliency bug, not data loss: a transient retryable status (e.g. 503/429 from S3) that should be silently retried at the chunk level instead raised an unexpected ResponseNotRead, aborted the file, deleted it, and forced a full-file restart — wasting the work of every other in-flight chunk.

Root cause

The streaming download deliberately tells the retry layer not to read response bodies, because the body is an unconsumed stream:

download_async.py calls with_retry_time_based(..., read_response_content=False).
_execute_stream_and_write_chunk opens with session.stream(...) as response: and calls _raise_for_status_httpx(response, read_response_content=False), which raises SynapseHTTPError without reading the body and attaches the unread, stream-closed response as ex.response.

read_response_content was only forwarded to _log_for_retry, not to _is_retryable. So _is_retryable unconditionally called _get_message(response) → response.text, raising httpx.ResponseNotRead. Its except (AttributeError, ValueError) did not catch it — httpx.ResponseNotRead is a RuntimeError subclass (ResponseNotRead -> StreamError -> RuntimeError) — so the exception escaped the retry loop it was meant to drive.

This affected both with_retry_time_based and with_retry_time_based_async.

Download lifecycle (REST API as source of truth)

S3 file handle, multi-threaded path. Range requests must return 206 Partial Content per the Synapse/S3 REST contract; transient throttling/availability errors come back as 429/503 etc. and are meant to be retried.

sequenceDiagram
    autonumber
    participant DBH as download_by_file_handle()<br/>(outer retry: 60x / ~30min)
    participant MT as download_from_url_multi_threaded()
    participant DL as _MultithreadedDownloader.download_file()
    participant EX as _execute_download_tasks()
    participant CW as _stream_and_write_chunk()<br/>(with_retry_time_based)
    participant ST as _execute_stream_and_write_chunk()
    participant S3 as S3 (presigned URL)

    DBH->>MT: download (file > 8MiB, S3_FILE_HANDLE)
    MT->>DL: download_file(request)
    DL->>S3: GET (stream) Content-Length  [retry 403, read_response_content=False]
    S3-->>DL: 200 + headers
    DL->>DL: _prep_file() (truncate to empty)
    DL->>EX: create N chunk tasks (8MiB ranges)

    loop each chunk (concurrent, thread pool)
        EX->>CW: _stream_and_write_chunk_wrapper -> run_in_executor
        CW->>ST: with_retry_time_based(expected=206, read_response_content=False)
        ST->>S3: GET Range: bytes=start-end  (session.stream)
        alt 206 Partial Content (happy path)
            S3-->>ST: 206 + body stream
            ST->>ST: response.read(), _write_chunk(seek+write), record span
            ST-->>CW: end byte
        else transient 429/503 (THE BUG)
            S3-->>ST: 503 (body NOT read)
            ST->>ST: _raise_for_status_httpx(read_response_content=False)<br/>raises SynapseHTTPError(response=unread stream)
            ST-->>CW: raises
            Note over CW: _is_retryable() ignores read_response_content,<br/>calls _get_message()->response.text<br/>=> httpx.ResponseNotRead  ❌ escapes retry
        end
    end

    EX->>EX: completed_task.result() re-raises ResponseNotRead
    EX->>EX: set _aborted, log "Failed downloading", os.remove(path)
    EX-->>DL: raise
    DL-->>MT: raise
    MT-->>DBH: raise
    DBH->>DBH: is_retryable_download_error? -> restart ENTIRE file (wasteful)

…path The multi-threaded chunk download passes read_response_content=False to the retry layer because the chunk's HTTP body is an unconsumed httpx stream. That flag was forwarded to _log_for_retry but not to _is_retryable, so on a retryable status (e.g. 503/429 from S3) _is_retryable called _get_message() -> response.text on the unread stream, raising httpx.ResponseNotRead. Because that is a RuntimeError subclass, _get_message's except did not catch it and the exception escaped the retry loop, aborting the file and forcing a wasteful full-file restart via the outer retry. - Thread read_response_content into _is_retryable and guard both _get_message call sites with it (status-code retry decisions need no body) - Defensively widen _get_message's except to include httpx.StreamError (parent of httpx.ResponseNotRead) - Add sync + async regression tests using a genuinely unread httpx streaming response

Copilot

Pull request overview

Fixes a resiliency bug in the retry layer where handling a retryable HTTP status for an unread streaming httpx.Response could crash the retry loop with httpx.ResponseNotRead (notably in the multi-threaded download chunk retry path). This ensures chunk-level retries work as intended without aborting and restarting the entire file download.

Changes:

Thread read_response_content into _is_retryable() and propagate it from both sync/async time-based retry helpers.
Make _get_message() resilient to unread streaming responses by catching httpx.StreamError (covers ResponseNotRead).
Add sync + async regression tests that reproduce the unread-streaming-response scenario and assert the retry loop does not crash.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
`synapseclient/core/retry.py`	Propagates `read_response_content` into retry decision logic and hardens message extraction for streaming responses.
`tests/unit/synapseclient/core/unit_test_retry.py`	Adds regression tests covering unread `httpx` streaming responses for both sync and async time-based retries.

Comments suppressed due to low confidence (1)

synapseclient/core/retry.py:607

_is_retryable() can crash when _get_message(response) returns None (e.g., truncated body or unread/streaming body). In that case response_message.lower() raises AttributeError, which will escape the retry loop. This becomes more likely now that _get_message catches httpx.StreamError and returns None. Guard against a None message before calling .lower() / substring checks.

        elif read_response_content and response.status_code not in range(200, 299):
            # For all other non 200 messages look for retryable errors in the body or reason field
            response_message = response_message or _get_message(response)
            if (
                any([msg.lower() in response_message.lower() for msg in retry_errors])
                # special case for message throttling
                or response_message
                and (
                    "Please slow down.  You may send a maximum of 10 message"
                    in response_message
                )
            ):

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

andrewelamb

I found a possible bug unrelated to the actual ticket. LGTM!

andrewelamb · 2026-06-23T15:35:30Z

            # For all other non 200 messages look for retryable errors in the body or reason field
            response_message = response_message or _get_message(response)
            if (
                any([msg.lower() in response_message.lower() for msg in retry_errors])


This is out of scope for this ticket, but response_message could be None at this point and calling lower on it would cause an exception.

thomasyu888

🔥 LGTM! Thanks for doing this, the diagram was really helpful in stepping through where read_response_content is False, and the comment on httpx.ResponseNotRead being a subclass of httpx.StreamError is helpful.

linglp

Thanks @BryanFauble ! I spent some time understanding the problem and the fixes. It seems like that the real fix is the guard in _is_retryable to avoid reading response content, and the fix in _get_message is a good defensive addition.

BryanFauble requested a review from a team as a code owner June 22, 2026 21:16

Copilot AI review requested due to automatic review settings June 22, 2026 21:16

Copilot started reviewing on behalf of BryanFauble June 22, 2026 21:17 View session

Copilot AI reviewed Jun 22, 2026

View reviewed changes

thomasyu888 requested review from andrewelamb and linglp June 22, 2026 23:02

andrewelamb approved these changes Jun 23, 2026

View reviewed changes

thomasyu888 approved these changes Jun 23, 2026

View reviewed changes

linglp approved these changes Jun 24, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[SYNSD-2567] Fix httpx.ResponseNotRead crash in download chunk retry path#1413

[SYNSD-2567] Fix httpx.ResponseNotRead crash in download chunk retry path#1413
BryanFauble wants to merge 1 commit into
developfrom
synsd-2567-download-retry

BryanFauble commented Jun 22, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

andrewelamb left a comment

Uh oh!

andrewelamb Jun 23, 2026

Uh oh!

thomasyu888 left a comment

Uh oh!

linglp left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Uh oh!

Conversation

BryanFauble commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Root cause

Download lifecycle (REST API as source of truth)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

andrewelamb left a comment

Choose a reason for hiding this comment

Uh oh!

andrewelamb Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

thomasyu888 left a comment

Choose a reason for hiding this comment

Uh oh!

linglp left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

BryanFauble commented Jun 22, 2026 •

edited

Loading