Control sharding codec read coalescing with ArrayConfig and runtime config options #3987

Open
aldenks wants to merge 10 commits into zarr-developers:main from aldenks:sharding-coalesce-config-options

@aldenks
Contributor

@aldenks aldenks commented May 20, 2026

Follows up on #3004 by adding ArrayConfig and runtime configuration options for the thresholds that control how requests are coalesced when reading in the sharding codec.

Two new fields on ArrayConfig control how the sharding codec coalesces partial-shard reads: sharding_coalesce_max_gap_bytes (default 1 MiB) and sharding_coalesce_max_bytes (default 16 MiB). When reading multiple chunks from the same shard, nearby byte ranges are merged into a single request to the store if separated by no more than sharding_coalesce_max_gap_bytes and the merged read stays within sharding_coalesce_max_bytes. Defaults are seeded from the matching array.sharding_coalesce_max_gap_bytes / array.sharding_coalesce_max_bytes keys in zarr.config at array-creation time, and can be overridden per array by passing config={...} to zarr.create_array.
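
The merge rule described above can be sketched in plain Python. This is only an illustration of the two thresholds, not the code in this PR: the function name `coalesce_ranges` and the `(start, end)` tuple representation (end exclusive) are assumptions made for the example, and the defaults simply mirror the 1 MiB and 16 MiB values stated above.

```python
from __future__ import annotations

MIB = 1024 * 1024


def coalesce_ranges(
    ranges: list[tuple[int, int]],
    max_gap_bytes: int = 1 * MIB,
    max_bytes: int = 16 * MIB,
) -> list[tuple[int, int]]:
    """Merge (start, end) byte ranges (end exclusive) into fewer requests.

    Hypothetical helper: two neighbouring ranges are merged when the gap
    between them is at most max_gap_bytes and the merged read would span
    at most max_bytes.
    """
    merged: list[tuple[int, int]] = []
    for start, end in sorted(ranges):
        if merged:
            prev_start, prev_end = merged[-1]
            gap = start - prev_end  # negative when the ranges overlap
            new_end = max(prev_end, end)
            if gap <= max_gap_bytes and new_end - prev_start <= max_bytes:
                # Close enough and still small enough: extend the previous read.
                merged[-1] = (prev_start, new_end)
                continue
        # Too far away, or the merged read would be too large: new request.
        merged.append((start, end))
    return merged
```

With small thresholds for readability, `coalesce_ranges([(0, 100), (150, 200), (5000, 5100)], max_gap_bytes=100, max_bytes=1000)` merges the first two ranges and leaves the third as its own request. A larger gap threshold trades extra bytes transferred for fewer round trips to the store.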

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.md
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@github-actions github-actions Bot added the "needs release notes" label (automatically applied to PRs which haven't added release notes) May 20, 2026
@github-actions github-actions Bot removed the "needs release notes" label (automatically applied to PRs which haven't added release notes) May 20, 2026
@codecov

codecov Bot commented May 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.49%. Comparing base (1907ad6) to head (d8acf47).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3987   +/-   ##
=======================================
  Coverage   93.49%   93.49%           
=======================================
  Files          88       88           
  Lines       11861    11873   +12     
=======================================
+ Hits        11089    11101   +12     
  Misses        772      772           
Files with missing lines Coverage Δ
src/zarr/codecs/sharding.py 92.13% <ø> (ø)
src/zarr/core/array_spec.py 100.00% <100.00%> (ø)
src/zarr/core/common.py 92.94% <100.00%> (+0.17%) ⬆️
src/zarr/core/config.py 100.00% <ø> (ø)

@d-v-b
Contributor

d-v-b commented May 20, 2026

disclaimer: I'm not a big fan of our global config object, so I'd like to explore some alternative ways for the sharding reads to access this configuration.

A few options:

  • new attributes on the sharding codec
    I'm not a big fan of this, because declaring the sharding codec explicitly from create_array is tedious, and also because we want to move away from the codecs knowing too much about IO operations.
  • new fields on ArrayConfig
    probably the best option. we still use the global config for setting the defaults, but the sharding codec gets these parameters from the array config object, which is tied to the array, not a mutable global.

@aldenks
Contributor Author

aldenks commented May 20, 2026

@d-v-b I also like new fields on ArrayConfig. Thinking that through:

  • This allows you to set coalesce options differently per array
  • If you only use global configs, whatever global setting you have at the time of array open is what is used for the life of that array.
  • The global config field becomes something like array.sharding_coalesce_max_gap_bytes and array.sharding_coalesce_max_bytes, matching the ArrayConfig convention of pulling from a singly nested field under the array. namespace.
  • For the time being, if you're interacting with zarr via xarray then you still can only set these via the global config, but that's a pre-existing inability to specify the ArrayConfig when opening via xarray.

That sound alright?
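
The "snapshot at open time" behaviour in the list above can be illustrated with a toy model. Everything here is hypothetical: `GLOBAL_CONFIG` is a plain dict standing in for the global config, and `ToyArrayConfig` is not zarr's `ArrayConfig`; the sketch only shows how a frozen per-array object seeded from global defaults stops tracking later global changes.

```python
from dataclasses import dataclass, fields

MIB = 1024 * 1024

# Hypothetical stand-in for the global runtime config (not zarr.config).
GLOBAL_CONFIG = {
    "array.sharding_coalesce_max_gap_bytes": 1 * MIB,
    "array.sharding_coalesce_max_bytes": 16 * MIB,
}


@dataclass(frozen=True)
class ToyArrayConfig:
    """Toy per-array config; field names follow the ones discussed here."""

    sharding_coalesce_max_gap_bytes: int
    sharding_coalesce_max_bytes: int

    @classmethod
    def from_global(cls, **overrides: int) -> "ToyArrayConfig":
        # Seed every field from the matching "array.<field>" global key,
        # then apply any explicit per-array overrides.
        values = {f.name: GLOBAL_CONFIG[f"array.{f.name}"] for f in fields(cls)}
        values.update(overrides)
        return cls(**values)


# Config captured when the first "array" is opened.
first = ToyArrayConfig.from_global()

# A later change to the global default does not affect the first config...
GLOBAL_CONFIG["array.sharding_coalesce_max_gap_bytes"] = 0

# ...but it is picked up by configs created afterwards, which can also
# override individual fields per array.
second = ToyArrayConfig.from_global(sharding_coalesce_max_bytes=4 * MIB)
```

Here `first` keeps the 1 MiB gap it was created with, while `second` sees the new global gap of 0 and its own 4 MiB cap.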

@d-v-b
Contributor

d-v-b commented May 21, 2026

> @d-v-b I also like new fields on ArrayConfig. Thinking that through:
>
> * This allows you to set coalesce options differently per array
> * If you only use global configs, whatever global setting you have _at the time of array open_ is what is used for the life of that array.
> * The global config field becomes something like `array.sharding_coalesce_max_gap_bytes` and `array.sharding_coalesce_max_bytes`, matching the ArrayConfig convention of pulling from a singly nested field under `array.`.
> * For the time being, if you're interacting with zarr via xarray then you still can only set these via the global config, but that's a pre-existing inability to specify the ArrayConfig when opening via xarray.
>
> That sound alright?

yeah, that sounds right. the array config object is designed to make it easy to get a cheap copy of an array with a new config, using the with_config method. Unfortunately, xarray makes it very hard to use this, because xarray doesn't give access to the base zarr array. So until xarray adds a zarr array-config-aware API, the global config is the only knob xarray users have, without re-creating the dataarray entirely.
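
As a rough illustration of the "cheap copy with a new config" idea, here is a toy sketch. `ToyArray` and `ToyConfig` are hypothetical classes, and the keyword-argument signature of this `with_config` is an assumption for the example; it is not taken from zarr's actual method. The point is only that the copy shares the underlying store and swaps the config.

```python
from dataclasses import dataclass, replace

MIB = 1024 * 1024


@dataclass(frozen=True)
class ToyConfig:
    """Hypothetical per-array config with the two thresholds from this PR."""

    sharding_coalesce_max_gap_bytes: int = 1 * MIB
    sharding_coalesce_max_bytes: int = 16 * MIB


@dataclass(frozen=True)
class ToyArray:
    """Hypothetical array handle: a reference to a store plus a config."""

    store: dict
    config: ToyConfig

    def with_config(self, **changes: int) -> "ToyArray":
        # Build a new handle that points at the same store object; only the
        # small config object is replaced, so no chunk data is copied.
        return replace(self, config=replace(self.config, **changes))


store = {"c/0": b"\x00" * 8}
base = ToyArray(store=store, config=ToyConfig())
tuned = base.with_config(sharding_coalesce_max_gap_bytes=0)
```

`tuned` reads from the same store as `base` but with coalescing across gaps disabled, and `base` itself is left unchanged.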

@aldenks aldenks changed the title from "Add sharding.read.* coalescing runtime config options" to "Control sharding codec read coalescing with ArrayConfig and runtime config options" May 22, 2026
@aldenks
Contributor Author

aldenks commented May 22, 2026

ArrayConfig options implemented, pulling from the global config if not set explicitly. This is ready for review!
