Updates clusteroperator_controller progressing#473
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository: openshift/coderabbit/.coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (3)
WalkthroughThe controller's ChangesOperand Convergence and Status Progression
🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 12 | ❌ 3❌ Failed checks (2 warnings, 1 inconclusive)
✅ Passed checks (12 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
pkg/controllers/clusteroperator_controller_test.go (1)
1144-1265: ⚡ Quick winAdd edge-case tests for stale generations and daemonset availability.
The new suite doesn’t guard against two important false-positive cases: stale
ObservedGenerationandNumberUnavailable > 0. Adding these would protect the convergence contract from regressing.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/controllers/clusteroperator_controller_test.go` around lines 1144 - 1265, Add two edge-case tests to protect against false-positive rollout completion checks. In the Deployment context, add a test case that verifies isDeploymentRolloutComplete returns false when ObservedGeneration in the status is stale (less than the deployment's generation in spec), even if the Progressing condition indicates success. In the DaemonSet context, add a test case that verifies isDaemonSetRolloutComplete returns false when NumberUnavailable is greater than 0, even when UpdatedNumberScheduled matches or exceeds DesiredNumberScheduled and NumberMisscheduled is 0.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/controllers/clusteroperator_controller.go`:
- Around line 168-174: When operands have not yet converged (the `!converged`
condition in the clusteroperator_controller.go around lines 168-174), the
function returns true without updating the ClusterOperator status. This leaves
the Progressing condition as False even though operands are actively rolling
out. Update the ClusterOperator status to set Progressing=True before returning
when converged is false, ensuring the status accurately reflects that operands
are still in the process of rolling out.
- Around line 204-218: The isDeploymentRolloutComplete function must be enhanced
to verify the controller has observed the current generation and that all
replicas are actually updated and available. In addition to checking the
Progressing condition, add validations that deploy.Status.ObservedGeneration >=
deploy.Metadata.Generation, deploy.Status.UpdatedReplicas >=
deploy.Spec.Replicas, and deploy.Status.AvailableReplicas >=
deploy.Spec.Replicas. Similarly, the isDaemonSetRolloutComplete function needs
to replace the >= operator with == when comparing UpdatedNumberScheduled to
DesiredNumberScheduled to ensure all nodes are updated, and add two additional
checks: verify ds.Status.ObservedGeneration >= ds.Metadata.Generation and
confirm ds.Status.NumberAvailable >= ds.Status.DesiredNumberScheduled to ensure
all desired daemon pods are actually running.
---
Nitpick comments:
In `@pkg/controllers/clusteroperator_controller_test.go`:
- Around line 1144-1265: Add two edge-case tests to protect against
false-positive rollout completion checks. In the Deployment context, add a test
case that verifies isDeploymentRolloutComplete returns false when
ObservedGeneration in the status is stale (less than the deployment's generation
in spec), even if the Progressing condition indicates success. In the DaemonSet
context, add a test case that verifies isDaemonSetRolloutComplete returns false
when NumberUnavailable is greater than 0, even when UpdatedNumberScheduled
matches or exceeds DesiredNumberScheduled and NumberMisscheduled is 0.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 6c6ccf8e-002e-4339-96f9-4289dbde175a
📒 Files selected for processing (2)
pkg/controllers/clusteroperator_controller.gopkg/controllers/clusteroperator_controller_test.go
620ea7f to
a71d6a0
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
pkg/controllers/clusteroperator_controller_test.go (1)
644-659: ⚡ Quick winExtract duplicated deployment rollout-completion patch into a shared test helper.
The same status-patch block appears in three tests. Extracting it into one helper will reduce drift when rollout-completion criteria change.
♻️ Proposed refactor
+func markDeploymentRolloutComplete(ctx context.Context, c client.Client, dep *appsv1.Deployment) { + live := &appsv1.Deployment{} + Expect(c.Get(ctx, client.ObjectKeyFromObject(dep), live)).To(Succeed()) + live.Status.ObservedGeneration = live.Generation + live.Status.Conditions = []appsv1.DeploymentCondition{ + { + Type: appsv1.DeploymentProgressing, + Status: corev1.ConditionTrue, + Reason: "NewReplicaSetAvailable", + }, + } + Expect(c.Status().Update(ctx, live)).To(Succeed()) +}Then replace each repeated inline patch block with:
- live := &appsv1.Deployment{} - Expect(cl.Get(context.Background(), client.ObjectKeyFromObject(dep), live)).To(Succeed()) - live.Status.ObservedGeneration = live.Generation - live.Status.Conditions = []appsv1.DeploymentCondition{ - { - Type: appsv1.DeploymentProgressing, - Status: corev1.ConditionTrue, - Reason: "NewReplicaSetAvailable", - }, - } - Expect(cl.Status().Update(context.Background(), live)).To(Succeed()) + markDeploymentRolloutComplete(context.Background(), cl, dep)Also applies to: 698-713, 1112-1127
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/controllers/clusteroperator_controller_test.go` around lines 644 - 659, The deployment status patching logic is duplicated across multiple test cases. Create a shared test helper function (e.g., patchDeploymentRolloutComplete) that encapsulates the logic of fetching a Deployment, setting its ObservedGeneration to match Generation, adding the DeploymentProgressing condition with status True and reason NewReplicaSetAvailable, and updating the status via the test client. Then replace all three inline patch blocks with calls to this new helper function, passing the client and deployment resource as parameters. This will eliminate drift and make future changes to rollout-completion criteria easier to maintain.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/controllers/clusteroperator_controller.go`:
- Line 254: The error message in the fmt.Errorf call violates the ST1005 linting
rule by starting with an uppercase letter. In the clusteroperator_controller.go
file where the error is returned for the ProgressDeadlineExceeded condition,
change the error message to begin with a lowercase letter instead of uppercase,
so "Deployment" becomes "deployment" to comply with Go error message
conventions.
In `@pkg/controllers/status.go`:
- Around line 113-123: The early return condition in ensureStatusProgressing at
lines 119-121 is too narrow and skips calling setStatusProgressing when the
Progressing condition is already True, which can leave stale
OperatorUpgradeable=False conditions from previous degraded states. Remove or
modify the conditional check that returns early (the if statement checking
cond.Status == configv1.ConditionTrue) to ensure setStatusProgressing is always
called so that all status conditions including Upgradeable are properly
reconciled during healthy-progressing rollouts.
---
Nitpick comments:
In `@pkg/controllers/clusteroperator_controller_test.go`:
- Around line 644-659: The deployment status patching logic is duplicated across
multiple test cases. Create a shared test helper function (e.g.,
patchDeploymentRolloutComplete) that encapsulates the logic of fetching a
Deployment, setting its ObservedGeneration to match Generation, adding the
DeploymentProgressing condition with status True and reason
NewReplicaSetAvailable, and updating the status via the test client. Then
replace all three inline patch blocks with calls to this new helper function,
passing the client and deployment resource as parameters. This will eliminate
drift and make future changes to rollout-completion criteria easier to maintain.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 5ab410a5-a9d1-4a24-bbea-c971b554f9cf
📒 Files selected for processing (3)
pkg/controllers/clusteroperator_controller.gopkg/controllers/clusteroperator_controller_test.gopkg/controllers/status.go
Updates the logic in the clusteroperator_controller to handle progressing more robustly. The previous fix showed a jump from 14 -> 85% in sippy, but was still possible to set progressing=True then immediately unset on next reconcile. This isn't that useful to a cluster admin trying to watch the progress of the rollout / upgrade. This change keeps Progressing=True until we see the deployment/daemonset actually stabilise.
a71d6a0 to
e05d774
Compare
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: nrb The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@theobarberbany: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Updates the logic in the clusteroperator_controller to handle progressing more robustly.
The previous fix showed a jump from 14 -> 85% in sippy, but was still possible to set progressing=True then immediately unset on next reconcile.
This isn't that useful to a cluster admin trying to watch the progress of the rollout / upgrade.
This change keeps Progressing=True until we see the deployment/daemonset actually stabilise.
Summary by CodeRabbit