Batch updates by key groups by suddendust · Pull Request #303 · hypertrace/document-store

suddendust · 2026-04-30T09:44:36Z

Batch updates by key groups for `bulkUpdate`

Problem

The previous bulkUpdate implementation in FlatPostgresCollection executed a separate SQL UPDATE per key, even when multiple keys shared identical update operations (same columns, operators, and paths). For bulk updates with N keys, this meant N individual database round-trips — each preparing and executing its own PreparedStatement.

Key 1: UPDATE t SET price=?, tags=... WHERE pk = ?   -- round-trip 1
Key 5: UPDATE t SET price=?, tags=... WHERE pk = ?   -- round-trip 2  (same SQL shape!)
Key 8: UPDATE t SET price=?, tags=... WHERE pk = ?   -- round-trip 3  (same SQL shape!)

This is inefficient when many keys receive the same type of update, which is a common pattern in practice.

Solution

Keys are now grouped by their "update shape" — a canonical key derived from the sorted combination of column:operator:path — and each group is executed as a single JDBC batch using PreparedStatement.addBatch() / executeBatch().
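A minimal sketch of the JDBC batching pattern named above, assuming a simplified template in which every column is a plain `col = ?` assignment (operations such as APPEND_TO_LIST would need operator-specific SQL). `JdbcBatchSketch`, `updateTemplate`, `executeBatch`, and the row layout are hypothetical names for illustration, not code from this PR.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;
import java.util.stream.Collectors;

final class JdbcBatchSketch {

  // Assumption: each column is a simple "col = ?" assignment.
  static String updateTemplate(String table, List<String> columns, String pkColumn) {
    String setClause =
        columns.stream().map(c -> c + " = ?").collect(Collectors.joining(", "));
    return "UPDATE " + table + " SET " + setClause + " WHERE " + pkColumn + " = ?";
  }

  // Each row holds the column values in template order, followed by the primary key.
  // One PreparedStatement is prepared once and reused for every row in the group.
  static int[] executeBatch(Connection conn, String sql, List<List<Object>> rows)
      throws SQLException {
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
      for (List<Object> row : rows) {
        for (int i = 0; i < row.size(); i++) {
          ps.setObject(i + 1, row.get(i)); // JDBC parameters are 1-indexed
        }
        ps.addBatch();
      }
      return ps.executeBatch(); // one entry per batched row
    }
  }
}
```

How the batched statements travel over the wire depends on the JDBC driver, so the round-trip savings are best confirmed by measurement.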

New components

- `groupKeysByUpdateShape()`: Iterates over all (Key, Collection<SubDocumentUpdate>) entries, validates and resolves columns, then buckets keys into groups that share the same shape key.
- `computeUpdateShapeKey()`: Builds a deterministic string signature by sorting a key's updates by path and appending one `column:operator:path;` segment per update. Keys with identical signatures share a SQL template.
- `KeyUpdateGroup` (inner class): Holds the resolved columns, the list of keys, and the per-key update values for a single group.
- `executeBatchUpdate()`: Builds one PreparedStatement from the group's SQL template, then loops over all keys in the group: binds the per-key parameter values (including the lastUpdatedTs column), calls addBatch() for each key, and finally calls executeBatch() once, so the whole group goes out in a single round-trip.
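The shape-key idea can be sketched as follows. This is an illustration under assumptions, not the PR's code: `UpdateShapeSketch`, the `Update` record, and the string keys are hypothetical stand-ins for the library's `Key` and `SubDocumentUpdate` types, and the column and path names in `main` are made up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class UpdateShapeSketch {

  // Hypothetical stand-in for a resolved sub-document update.
  record Update(String column, String operator, String path, Object value) {}

  // Sort by path, then emit one "column:operator:path;" segment per update.
  // Values are deliberately excluded: only the SQL shape matters for grouping.
  static String shapeKey(List<Update> updates) {
    return updates.stream()
        .sorted(Comparator.comparing(Update::path))
        .map(u -> u.column() + ":" + u.operator() + ":" + u.path() + ";")
        .collect(Collectors.joining());
  }

  // Bucket keys by shape; LinkedHashMap keeps groups in first-seen order.
  static Map<String, List<String>> groupKeysByShape(Map<String, List<Update>> updatesByKey) {
    Map<String, List<String>> groups = new LinkedHashMap<>();
    updatesByKey.forEach(
        (key, updates) ->
            groups.computeIfAbsent(shapeKey(updates), k -> new ArrayList<>()).add(key));
    return groups;
  }

  public static void main(String[] args) {
    Map<String, List<Update>> input = new LinkedHashMap<>();
    input.put("1", List.of(new Update("price", "SET", "price", 10),
        new Update("tags", "APPEND_TO_LIST", "tags", "new")));
    input.put("2", List.of(new Update("stock", "ADD", "stock", 5)));
    // Same shape as key "1": different values and declaration order do not matter.
    input.put("5", List.of(new Update("tags", "APPEND_TO_LIST", "tags", "sale"),
        new Update("price", "SET", "price", 20)));
    groupKeysByShape(input).forEach((shape, keys) -> System.out.println(shape + " -> " + keys));
  }
}
```

Sorting before concatenation is what makes the signature independent of the order in which a caller lists the updates for a key.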

What changed in the existing flow

| Before | After |
| --- | --- |
| `bulkUpdate` → loop per key → `updateSingleKey()` → `executeKeyUpdate()` | `bulkUpdate` → `groupKeysByUpdateShape()` → loop per group → `executeBatchUpdate()` |
| N keys = N round-trips | N keys in G groups = G round-trips (G ≤ N) |
| Tracked updated keys via `Set<Key>` | Tracks total updated count via `int` from `executeBatch()` results |
| `updateSingleKey()` method | Removed; grouping + batch replaces it |

Files changed

FlatPostgresCollection.java (+189 / −41) — Core implementation: replaced per-key loop with grouping and JDBC batching. Added groupKeysByUpdateShape(), computeUpdateShapeKey(), executeBatchUpdate(), and KeyUpdateGroup inner class. Removed updateSingleKey().
FlatCollectionWriteTest.java (+118) — New integration test testBulkUpdateMultipleGroupsComplexOperations that exercises 3 distinct update groups across 7 keys:
- Group 1 (keys 1, 5, 8): SET on primitive field + APPEND_TO_LIST on array
- Group 2 (keys 3, 7): SET on nested JSONB fields
- Group 3 (keys 2, 6): ADD on numeric field + REMOVE_ALL_FROM_LIST on array

Performance Gains

For around 4k QPM, we had the following before these changes:


Percentile | Latency
-- | --
Avg | 2,391 ms
p50 | 2,691 ms
p90 | 3,397 ms
p95 | 3,650 ms
Max | 7,358 ms

For the same QPM, these were the results after the changes:


Percentile | Latency
-- | --
Avg | 221.96 ms
p50 | 248.32 ms
p90 | 281.17 ms
p95 | 294.49 ms
p99 | N/A
Max | 3,651 ms (single spike)
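For a quick read of the two tables, the before/after ratio per percentile can be computed directly from the numbers reported above. The snippet below is only that arithmetic; `LatencySpeedup` is a hypothetical helper name, and the Max row compares a typical pre-change maximum against a single post-change spike, so it is the least representative figure.

```java
final class LatencySpeedup {

  // Ratio of the latency before the change to the latency after it.
  static double speedup(double beforeMs, double afterMs) {
    return beforeMs / afterMs;
  }

  public static void main(String[] args) {
    String[] labels = {"Avg", "p50", "p90", "p95", "Max"};
    double[] beforeMs = {2391, 2691, 3397, 3650, 7358};
    double[] afterMs = {221.96, 248.32, 281.17, 294.49, 3651};
    for (int i = 0; i < labels.length; i++) {
      System.out.printf("%-3s %.1fx%n", labels[i], speedup(beforeMs[i], afterMs[i]));
    }
  }
}
```

On these figures the average and the p50 through p95 latencies drop by roughly an order of magnitude, while the maximum roughly halves.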

codecov · 2026-04-30T09:46:10Z

Codecov Report

❌ Patch coverage is 83.75000% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.43%. Comparing base (f25bebd) to head (3dec2d5).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...documentstore/postgres/FlatPostgresCollection.java | 83.75% | 11 Missing and 2 partials ⚠️ |

Additional details and impacted files

@@             Coverage Diff              @@
##               main     #303      +/-   ##
============================================
- Coverage     81.63%   81.43%   -0.21%     
  Complexity     1549     1549              
============================================
  Files           242      242              
  Lines          7450     7514      +64     
  Branches        720      726       +6     
============================================
+ Hits           6082     6119      +37     
- Misses          916      943      +27     
  Partials        452      452

| Flag | Coverage Δ | |
| --- | --- | --- |
| integration | `81.43% <83.75%> (-0.21%)` | ⬇️ |
| unit | `55.50% <0.00%> (-0.48%)` | ⬇️ |


github-actions · 2026-04-30T09:47:12Z

Test Results

124 files ±0 124 suites ±0 37s ⏱️ ±0s
840 tests +2 839 ✅ +2 1 💤 ±0 0 ❌ ±0
1 176 runs +2 1 175 ✅ +2 1 💤 ±0 0 ❌ ±0

Results for commit 3dec2d5. ± Comparison against base commit f25bebd.

♻️ This comment has been updated with latest results.

puneet-traceable · 2026-06-01T05:49:35Z

A couple of questions before I review this:

What kind of updates are possible here? ID-based only, or other conditions as well?
How do we handle failures here? How are they handled for a batch? How does this differ from the Mongo-based implementation?

suddendust · 2026-06-01T09:48:11Z

@puneet-traceable

This API is for key-based updates only: it accepts a map from each key to its corresponding sub-doc updates. The query template is:

UPDATE <table> SET ... WHERE <pk> = ?.

This API is not actually implemented for Mongo. Mongo uses the deprecated method:

  @Deprecated(forRemoval = true)
  BulkUpdateResult bulkUpdateSubDocs(Map<Key, Map<String, Document>> var1) throws Exception;

We did not want to implement deprecated APIs for PG, and PG also had no way to support per-key sub-doc updates, so we introduced this new API:

BulkUpdateResult bulkUpdate(
      Map<Key, Collection<SubDocumentUpdate>> updates, UpdateOptions updateOptions)

Since this API is not implemented for Mongo currently, clients are essentially using two different APIs for Mongo and PG respectively, to achieve the same result (the northstar target would be to implement this new API for Mongo as well but that was deprioritised).

Now, both of these methods handle failures differently:

| Aspect | Mongo bulkUpdateSubDocs | Flat Postgres bulkUpdate(Map) (this PR) |
| --- | --- | --- |
| Ordering | ordered(true) - sequential, fail-fast | Grouped by shape, continues past a failing group |
| On error | Throws MongoBulkWriteException, stops remaining ops | Swallows (logs warn), proceeds with other groups |
| Atomicity | None (no txn); ops before failure persist | None (autocommit); succeeded groups persist |
| Which ops persist on failure | Only those before the first error | Any group that succeeds (incl. ones after a failed group) |
| Retry | None | None |
| Result on failure | Exception to caller | Partial success count, no exception |

The important part is that neither method supports all-or-nothing behaviour, so they are similar in that respect. The other difference is in what the caller gets back: the new method does not surface the exception and returns only the number of keys updated. That's fine for now because our client does not use the result of either method at its call sites.
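To make the Postgres column of the table concrete, here is a hedged sketch of "continue past a failing group and return a partial count". It simulates groups as callables instead of talking to a database; `GroupFailureSketch`, `countUpdated`, and `runGroups` are hypothetical names, and skipping the `Statement.SUCCESS_NO_INFO` and `Statement.EXECUTE_FAILED` markers when summing is an assumption about how the counts might be aggregated, not a description of the PR's code.

```java
import java.sql.Statement;
import java.util.List;
import java.util.concurrent.Callable;

final class GroupFailureSketch {

  // Sum the per-statement counts returned by executeBatch().
  // Assumption: the JDBC marker values carry no row count and are skipped.
  static int countUpdated(int[] batchResults) {
    int total = 0;
    for (int result : batchResults) {
      if (result == Statement.SUCCESS_NO_INFO || result == Statement.EXECUTE_FAILED) {
        continue;
      }
      total += result;
    }
    return total;
  }

  // Run every group; a failing group is logged and skipped, later groups still run.
  static int runGroups(List<Callable<int[]>> groups) {
    int updated = 0;
    for (Callable<int[]> group : groups) {
      try {
        updated += countUpdated(group.call());
      } catch (Exception e) {
        System.err.println("WARN: group failed, continuing: " + e.getMessage());
      }
    }
    return updated; // partial success count, no exception reaches the caller
  }
}
```

With three groups where the second throws, the caller would see only the sum from the first and third groups, which matches the "Partial success count, no exception" row.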

puneet-traceable · 2026-06-01T18:01:16Z

> The important part if the none of these two methods support all-or-none behaviour, so they're similar in that. The other difference is the return type - The new method does not return the exception, but only the no of keys updated. That's fine at this moment because our client does not care about the result for either of those at call-sites.

Within a group, can one failure cause the whole group to fail?

Batch updated by key groups

d869af5

suddendust requested review from avinashkolluru, kotharironak, puneet-traceable, skjindal93 and suresh-prakash as code owners April 30, 2026 09:44

suddendust and others added 3 commits May 18, 2026 10:14

Merge branch 'main' into feat/292

fb9132c

WIP

5db4adb

Merge branch 'main' into feat/292

9e532a2

suddendust changed the title ~~[Draft] Batch updated by key groups~~ [Draft] Batch updates by key groups May 27, 2026

suddendust added 3 commits May 27, 2026 11:02

Merge branch 'main' of github.com:hypertrace/document-store into feat…

1038a4c

…/292

Spotless

87b8a54

Merge remote-tracking branch 'origin/feat/292' into feat/292

3dec2d5

suddendust changed the title ~~[Draft] Batch updates by key groups~~ Batch updates by key groups May 29, 2026

suddendust wants to merge 7 commits into main from feat/292

2 participants
