You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Normally, an export task is comitted by the thread that uploads the last part. At the same time, there is a cleanup thread that runs periodically that might try to commit tasks that are marked as PENDING and already uploaded all parts. Those tasks are considered to be in commit state, and if this happens, there may be a race condition.
Introducing a commit lock to prevent that from happening
Changelog category (leave one):
Improvement
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Prevent commit race conditions by adding a per-task commit_lock
@arthurpassos I am definitely not the best person to review this, but some questions/suggestions
Let's mention 'Export partition' in PR title.
Is replica_name definitely exists and is meaningful? Cannot it be e.g. empty string?
A bit strange that both successful and unsuccessful lock attempts logs has same severity, but i don't think that WARNING makes sense here.
What is actually happening if commit lock is already acquired by another replica?
Why not introduce a test?
Done
AFAIK, you can't create a replicated* table without either providing the replica name or having {replica} defined.
hm.. I am not very good at deciding log severities, but failing to acquire a lock is not an error by default. It just means another replica grabbed it first. So all I need is logging to keep track of it and understand what is going on internally if I have to debug it.
It means the other replica is trying to commit it, and this replica can skip it.
Testing race conditions is always hard, not sure how to cause this specific scenario. At the same time, I have 100+ export partition tests for happy and unhappy cases. They are passing
Approach looks solid
— ephemeral commit_lock + RAII release is the correct primitive.
However, One thing before merge: commit_lock is created as a child of entry_path, and the holder lives until function scope exits. If the successful-commit path does a non-recursive zk->remove(entry_path) while the lock is still held, ZK will reject it with ZNOTEMPTY. Can you confirm the entry isn't torn down inside the lock's lifetime? If it is, release before teardown or remove recursively.
(Minor: failed-acquire LOG_INFO will fire every poll for tasks another replica owns — DEBUG may fit better.)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Note for the reviewer:
Normally, an export task is comitted by the thread that uploads the last part. At the same time, there is a cleanup thread that runs periodically that might try to commit tasks that are marked as PENDING and already uploaded all parts. Those tasks are considered to be in commit state, and if this happens, there may be a race condition.
Introducing a commit lock to prevent that from happening
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Prevent commit race conditions by adding a per-task
commit_lockDocumentation entry for user-facing changes
...
CI/CD Options
Exclude tests:
Regression jobs to run: