Batch updates by key groups #303
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #303 +/- ##
============================================
- Coverage 81.63% 81.43% -0.21%
Complexity 1549 1549
============================================
Files 242 242
Lines 7450 7514 +64
Branches 720 726 +6
============================================
+ Hits 6082 6119 +37
- Misses 916 943 +27
Partials 452 452
Couple of questions before I review this?
We did not want to implement deprecated APIs for PG, and at the same time did not have the capability to support per-key sub-doc updates in PG, which is why we introduced this new API:
Since this API is not currently implemented for Mongo, clients are essentially using two different APIs for Mongo and PG respectively to achieve the same result (the north-star target would be to implement this new API for Mongo as well, but that was deprioritised). Now, both of these methods handle failures differently:
The important part is that neither of these two methods supports all-or-none behaviour, so they are similar in that respect. The other difference is the return type: the new method does not return the exception, only the number of keys updated. That is fine at the moment because our client does not use the result of either method at its call sites.
Within a group, can one failure lead to the whole group failing?
Batch updates by key groups for bulkUpdate
Problem
The previous bulkUpdate implementation in FlatPostgresCollection executed a separate SQL UPDATE per key, even when multiple keys shared identical update operations (same columns, operators, and paths). For bulk updates with N keys, this meant N individual database round-trips, each preparing and executing its own PreparedStatement.
This is inefficient when many keys receive the same type of update, which is a common pattern in practice.
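To make the overlap concrete, here is a minimal sketch of how keys that share the same columns, operators, and paths collapse into a handful of distinct shapes. It follows the column:operator:path; signature format used in this PR description, but the class UpdateShapeSketch, the Update holder, the String keys, and both method names are hypothetical stand-ins for illustration, not the code in FlatPostgresCollection.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative only: these names are assumptions, not the types used in the PR.
final class UpdateShapeSketch {

  /** One sub-document update: target column, operator name, path, and the value to apply. */
  static final class Update {
    final String column;
    final String operator;
    final String path;
    final Object value;

    Update(String column, String operator, String path, Object value) {
      this.column = column;
      this.operator = operator;
      this.path = path;
      this.value = value;
    }
  }

  /** Builds a signature that ignores values and the order in which updates were supplied. */
  static String shapeKey(List<Update> updates) {
    return updates.stream()
        .sorted(Comparator.comparing((Update u) -> u.path))
        .map(u -> u.column + ":" + u.operator + ":" + u.path + ";")
        .collect(Collectors.joining());
  }

  /** Buckets keys by signature; every key in a bucket could reuse one SQL template. */
  static Map<String, List<String>> groupByShape(Map<String, List<Update>> updatesByKey) {
    Map<String, List<String>> groups = new LinkedHashMap<>();
    for (Map.Entry<String, List<Update>> entry : updatesByKey.entrySet()) {
      groups
          .computeIfAbsent(shapeKey(entry.getValue()), k -> new ArrayList<>())
          .add(entry.getKey());
    }
    return groups;
  }
}
```

With three keys where two apply the same operators to the same paths with different values, groupByShape returns two buckets, so only two statements would need to be prepared instead of three.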
Solution
Keys are now grouped by their "update shape", a canonical key derived from the sorted combination of column:operator:path, and each group is executed as a single JDBC batch using PreparedStatement.addBatch() / executeBatch().
New components
- groupKeysByUpdateShape(): Iterates over all (Key, Collection<SubDocumentUpdate>) entries, validates and resolves columns, then buckets keys into groups that share the same shape key.
- computeUpdateShapeKey(): Builds a deterministic string signature by sorting updates by path and concatenating column:operator:path; for each. Keys with identical signatures share a SQL template.
- KeyUpdateGroup (inner class): Holds the resolved columns, list of keys, and per-key update values for a single group.
- executeBatchUpdate(): Builds one PreparedStatement from the group's SQL template, then loops over all keys in the group: binds per-key parameter values (including the lastUpdatedTs column), calls addBatch(), and finally calls executeBatch() in a single round-trip.
What changed in the existing flow
- Before: bulkUpdate → loop per key → updateSingleKey() → executeKeyUpdate()
- After: bulkUpdate → groupKeysByUpdateShape() → loop per group → executeBatchUpdate()
- Set<Key> → int from executeBatch() results
- updateSingleKey() method
Files changed
- FlatPostgresCollection.java (+189 / −41): Core implementation: replaced per-key loop with grouping and JDBC batching. Added groupKeysByUpdateShape(), computeUpdateShapeKey(), executeBatchUpdate(), and the KeyUpdateGroup inner class. Removed updateSingleKey().
- FlatCollectionWriteTest.java (+118): New integration test testBulkUpdateMultipleGroupsComplexOperations that exercises 3 distinct update groups across 7 keys:
  - SET on primitive field + APPEND_TO_LIST on array
  - SET on nested JSONB fields
  - ADD on numeric field + REMOVE_ALL_FROM_LIST on array
Performance Gains
For around 4k QPM, we had the following before these changes:
For the same QPM, these were the results after the changes: