fix(alerts): Remove duplicate information in the recommend command#2279
fix(alerts): Remove duplicate information in the recommend command#2279nbottari9 wants to merge 4 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughThe alerts function now detects when the ClusterUpdateAcceptRisks feature gate is enabled and hypershift is not in use; under those conditions, it returns early to skip client-side alert evaluation. Two helper functions inspect cluster FeatureGate and Infrastructure state to determine these conditions, and new unit tests validate the helper behavior. ChangesConditional alert checking based on feature gates and hypershift
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 13 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (13 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: nbottari9 The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
| func shouldSkipClientAlertChecking(cv *configv1.ClusterVersion) bool { | ||
| for _, update := range cv.Status.ConditionalUpdates { | ||
| for _, condition := range update.Conditions { | ||
| if condition.Type == "Recommended" && condition.Status == metav1.ConditionFalse { |
There was a problem hiding this comment.
There could be reasons for an update to be Recommended=False that aren't about the alert checks. And we want to be exhaustive in checking, because a cluster-admin should be able to say "oh, I'm ok with those issues" and be able to confidently update anyway. Maybe we can look in conditionalUpdateRisks for risk names that start with Alert? Or maybe the presence of conditionalUpdateRisks at all is enough to mark the relevant feature as enabled, and we don't need to build heuristics based on risk name? Either of those might double-check a cluster where the CVO checked and thought the cluster was clean, and then the client re-checked (and hopefully agreed the cluster was clean), but that seems ok during the transitional period, before we can drop the client-side checking entirely.
There was a problem hiding this comment.
(Sorry for late comments)
CVO works on the risks with the above guard, including the ones from alerts.
Could we do similar guard in oc to CVO, like this?
It checks the feature gate and HyperShift: oc has the tools to receive both.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/cli/admin/upgrade/recommend/recommend.go`:
- Around line 187-190: The call to o.Client.ConfigV1().FeatureGates().Get(ctx,
"cluster", metav1.GetOptions{}) is ignoring its error return; capture the error
(e.g., err) instead of using `_` and log it with klog.V(4).Infof() (including
context like that the FeatureGate fetch failed) while keeping the existing nil
fallback behavior for featureGate; update the block that references featureGate
and o.Client to handle the captured error and log it appropriately.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 260e5e72-5faa-490c-859c-5290a98e1dc7
📒 Files selected for processing (1)
pkg/cli/admin/upgrade/recommend/recommend.go
| // if CVO is present in the cluster, | ||
| // and Hypershift is not enabled, | ||
| // we don't need to check for alerts here | ||
| if !isAcceptRisksEnabled(featureGate, cv.Status.Desired.Version) && !isHypershiftEnabled(infrastructure) { |
There was a problem hiding this comment.
In my mind, the guard will be inside the alert checking:
Something like if the gate is enabled and not Hypershift, then the function returns nil, nil like there were no alerts to check (because they are checked by CVO already and CVO will generate the Recommend condition if needed).
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
pkg/cli/admin/upgrade/recommend/recommend_test.go (1)
34-67: ⚡ Quick winExpand test coverage with table-driven tests.
These tests only cover the positive cases. Critical edge cases are missing: nil inputs, version mismatches, and feature-not-enabled scenarios. Table-driven tests would improve coverage and align with guidelines.
As per coding guidelines: "Unit tests should use co-located
*_test.gofiles with table-driven tests."♻️ Proposed table-driven tests
-func TestIsAcceptRisksEnabled(t *testing.T) { - featureGate := &configv1.FeatureGate{ - Status: configv1.FeatureGateStatus{ - FeatureGates: []configv1.FeatureGateDetails{ - { - Version: "4.22.0", - Enabled: []configv1.FeatureGateAttributes{ - { - Name: features.FeatureGateClusterUpdateAcceptRisks, - }, - }, - }, - }, - }, - } - - result := isAcceptRisksEnabled(featureGate, "4.22.0") - if result != true { - t.Errorf("ClusterUpdateAcceptRisks feature gate should report as enabled") - } -} - -func TestIsHypershiftEnabled(t *testing.T) { - infra := &configv1.Infrastructure{ - Status: configv1.InfrastructureStatus{ - ControlPlaneTopology: configv1.ExternalTopologyMode, - }, - } - - result := isHypershiftEnabled(infra) - if result != true { - t.Errorf("Hypershift should report as enabled") - } -} +func TestIsAcceptRisksEnabled(t *testing.T) { + tests := []struct { + name string + featureGate *configv1.FeatureGate + clusterVersion string + want bool + }{ + { + name: "nil feature gate", + featureGate: nil, + clusterVersion: "4.22.0", + want: false, + }, + { + name: "feature gate enabled for matching version", + featureGate: &configv1.FeatureGate{ + Status: configv1.FeatureGateStatus{ + FeatureGates: []configv1.FeatureGateDetails{ + { + Version: "4.22.0", + Enabled: []configv1.FeatureGateAttributes{ + {Name: features.FeatureGateClusterUpdateAcceptRisks}, + }, + }, + }, + }, + }, + clusterVersion: "4.22.0", + want: true, + }, + { + name: "feature gate enabled for different version", + featureGate: &configv1.FeatureGate{ + Status: configv1.FeatureGateStatus{ + FeatureGates: []configv1.FeatureGateDetails{ + { + Version: "4.21.0", + Enabled: []configv1.FeatureGateAttributes{ + {Name: features.FeatureGateClusterUpdateAcceptRisks}, + }, + }, + }, + }, + }, + clusterVersion: "4.22.0", + want: false, + }, + { + name: "feature gate not in enabled list", + featureGate: &configv1.FeatureGate{ + Status: configv1.FeatureGateStatus{ + FeatureGates: []configv1.FeatureGateDetails{ + { + Version: "4.22.0", + Enabled: []configv1.FeatureGateAttributes{ + {Name: "SomeOtherFeature"}, + }, + }, + }, + }, + }, + clusterVersion: "4.22.0", + want: false, + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + got := isAcceptRisksEnabled(tt.featureGate, tt.clusterVersion) + if got != tt.want { + t.Errorf("isAcceptRisksEnabled() = %v, want %v", got, tt.want) + } + }) + } +} + +func TestIsHypershiftEnabled(t *testing.T) { + tests := []struct { + name string + infra *configv1.Infrastructure + want bool + }{ + { + name: "nil infrastructure", + infra: nil, + want: false, + }, + { + name: "external topology mode (hypershift)", + infra: &configv1.Infrastructure{ + Status: configv1.InfrastructureStatus{ + ControlPlaneTopology: configv1.ExternalTopologyMode, + }, + }, + want: true, + }, + { + name: "high availability mode", + infra: &configv1.Infrastructure{ + Status: configv1.InfrastructureStatus{ + ControlPlaneTopology: configv1.HighlyAvailableTopologyMode, + }, + }, + want: false, + }, + { + name: "single replica mode", + infra: &configv1.Infrastructure{ + Status: configv1.InfrastructureStatus{ + ControlPlaneTopology: configv1.SingleReplicaTopologyMode, + }, + }, + want: false, + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + got := isHypershiftEnabled(tt.infra) + if got != tt.want { + t.Errorf("isHypershiftEnabled() = %v, want %v", got, tt.want) + } + }) + } +}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/cli/admin/upgrade/recommend/recommend_test.go` around lines 34 - 67, Refactor the TestIsAcceptRisksEnabled and TestIsHypershiftEnabled test functions to use table-driven tests instead of single-case tests. The current tests only cover positive scenarios and miss critical edge cases. For TestIsAcceptRisksEnabled, add test cases for nil featureGate input, version mismatches, disabled features, and empty feature gates. For TestIsHypershiftEnabled, add test cases for nil infra input, different control plane topology modes, and other edge conditions. Use a table structure with test case names, input parameters, and expected results, then iterate through the table to run each scenario.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/cli/admin/upgrade/recommend/alerts.go`:
- Around line 33-35: The condition checking cv == nil then attempting to
dereference cv.Status.Desired.Version will cause a nil pointer panic because cv
cannot be dereferenced when it is nil. Change the condition from cv == nil to cv
!= nil in the if statement at line 33 to ensure the ClusterVersion object is not
nil before accessing its Status.Desired.Version field in the
isAcceptRisksEnabled function call.
- Around line 27-29: The three API calls FeatureGates().Get(),
Infrastructures().Get(), and ClusterVersions().Get() are silently discarding
their error returns using the blank identifier. Instead of ignoring errors with
_, capture the error return values from each of these Get() calls and handle
them appropriately. Check each error and either log it, return it, or handle the
failure case explicitly so that network issues, RBAC denials, and other failures
are properly distinguished from successful operations rather than being silently
ignored.
---
Nitpick comments:
In `@pkg/cli/admin/upgrade/recommend/recommend_test.go`:
- Around line 34-67: Refactor the TestIsAcceptRisksEnabled and
TestIsHypershiftEnabled test functions to use table-driven tests instead of
single-case tests. The current tests only cover positive scenarios and miss
critical edge cases. For TestIsAcceptRisksEnabled, add test cases for nil
featureGate input, version mismatches, disabled features, and empty feature
gates. For TestIsHypershiftEnabled, add test cases for nil infra input,
different control plane topology modes, and other edge conditions. Use a table
structure with test case names, input parameters, and expected results, then
iterate through the table to run each scenario.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: a2529d24-e031-4cd3-ad50-33dc89d0cd7f
📒 Files selected for processing (3)
pkg/cli/admin/upgrade/recommend/alerts.gopkg/cli/admin/upgrade/recommend/recommend.gopkg/cli/admin/upgrade/recommend/recommend_test.go
5e4859e to
8f28014
Compare
…t test for new functions Signed-off-by: Nick Bottari <nbottari9@gmail.com> fix(alerts): prevent duplication of warnings by disabling client-side alert checking if the server is already doing it. Signed-off-by: Nick Bottari <nbottari@nbottari-thinkpadp1gen4i.boston.csb> fix(adm-upgrade): changed method of detecting if CVO is already checking alerts Signed-off-by: Nick Bottari <nbottari9@gmail.com> fix(adm-upgrade): add check for hypershift and improve error checking Signed-off-by: Nick Bottari <nbottari9@gmail.com> fix(adm-upgrade): switched to checking for CVO in alerts(). Wrote unit test for new functions Signed-off-by: Nick Bottari <nbottari9@gmail.com> cv != nil Signed-off-by: Nick Bottari <nbottari9@gmail.com>
d9a7f2a to
9cb58aa
Compare
| featureGates, _ := o.Client.ConfigV1().FeatureGates().Get(ctx, "cluster", metav1.GetOptions{}) | ||
| infrastructure, _ := o.Client.ConfigV1().Infrastructures().Get(ctx, "cluster", metav1.GetOptions{}) |
There was a problem hiding this comment.
featureGates and infrastructure become inputs. So they should be integrated into mockData:
oc/pkg/cli/admin/upgrade/recommend/mockresources.go
Lines 14 to 17 in e11b84d
Then you can write up tests based on some new mocks.
…unction, added comprehensive tests for the 2 pure functions Signed-off-by: Nick Bottari <nbottari9@gmail.com>
…Gates's and infrastructure's Signed-off-by: Nick Bottari <nbottari9@gmail.com> fix(adm-upgrade-recommend-alerts): add test for different featureGate version Signed-off-by: Nick Bottari <nbottari9@gmail.com> add comma Signed-off-by: Nick Bottari <nbottari9@gmail.com>
b6b3fb2 to
05b6aaa
Compare
hongkailiu
left a comment
There was a problem hiding this comment.
I believe it is pretty close.
Only a few of code refactoring for simplicity.
We are ready to take mock data from a cluster (HowTo is discussed over slack) and add them into examples and see if we do the output correctly.
| skip, err := o.alertsEvaluatedByCVO(ctx) | ||
| if err != nil { | ||
| klog.Warningf("An error occured while determining if the CVO is evaluating alerts, so the client will check. %v", err) | ||
| } | ||
| if skip { | ||
| return nil, nil | ||
| } |
There was a problem hiding this comment.
nit: check skip only if err==nil.
If err!=nil (i.e., fail to determine if CVO does it already), oc does it anyway.
| skip, err := o.alertsEvaluatedByCVO(ctx) | |
| if err != nil { | |
| klog.Warningf("An error occured while determining if the CVO is evaluating alerts, so the client will check. %v", err) | |
| } | |
| if skip { | |
| return nil, nil | |
| } | |
| if skip, err := o.alertsEvaluatedByCVO(ctx); err != nil { | |
| klog.Warningf("An error occured while determining if the CVO is evaluating alerts, so the client will check. %v", err) | |
| } else if skip { | |
| return nil, nil | |
| } |
| var featureGates *configv1.FeatureGate | ||
| var infrastructure *configv1.Infrastructure | ||
| var cv *configv1.ClusterVersion | ||
|
|
||
| if o.mockData.cvPath != "" { | ||
| featureGates = o.mockData.featureGate | ||
| infrastructure = o.mockData.infrastructure | ||
| cv = o.mockData.clusterVersion | ||
| } else { |
There was a problem hiding this comment.
nit: it is a little simpler this way.
| var featureGates *configv1.FeatureGate | |
| var infrastructure *configv1.Infrastructure | |
| var cv *configv1.ClusterVersion | |
| if o.mockData.cvPath != "" { | |
| featureGates = o.mockData.featureGate | |
| infrastructure = o.mockData.infrastructure | |
| cv = o.mockData.clusterVersion | |
| } else { | |
| featureGates := o.mockData.featureGate | |
| infrastructure := o.mockData.infrastructure | |
| cv := o.mockData.clusterVersion | |
| if cv == nil { |
| } | ||
| } | ||
|
|
||
| // if the AcceptRisks feature gate AND hypershift is not enabled, |
There was a problem hiding this comment.
Ref. https://hypershift.pages.dev/getting-started/quick-setup/
| // if the AcceptRisks feature gate AND hypershift is not enabled, | |
| // if the AcceptRisks feature gate enabled AND the oc is not running against a hosted cluster, |
|
|
||
| // if the AcceptRisks feature gate AND hypershift is not enabled, | ||
| // the CVO is handling alerts and will generate the Recommended condition if needed | ||
| if isAcceptRisksEnabled(featureGates, cv.Status.Desired.Version) && !isHypershiftEnabled(infrastructure) { |
There was a problem hiding this comment.
| if isAcceptRisksEnabled(featureGates, cv.Status.Desired.Version) && !isHypershiftEnabled(infrastructure) { | |
| if isAcceptRisksEnabled(featureGates, cv.Status.Desired.Version) && !hostedCluster(infrastructure) { |
| if i == nil { | ||
| return false | ||
| } | ||
|
|
||
| return i.Status.ControlPlaneTopology == configv1.ExternalTopologyMode |
There was a problem hiding this comment.
| if i == nil { | |
| return false | |
| } | |
| return i.Status.ControlPlaneTopology == configv1.ExternalTopologyMode | |
| return i != nil && i.Status.ControlPlaneTopology == configv1.ExternalTopologyMode |
| // the CVO is handling alerts and will generate the Recommended condition if needed | ||
| if isAcceptRisksEnabled(featureGates, cv.Status.Desired.Version) && !isHypershiftEnabled(infrastructure) { | ||
| return true, nil | ||
| } | ||
|
|
||
| // if we get to this point, check on the client anyway to be safe | ||
| return false, fmt.Errorf("Failed to detect presence of CVO and/or if Hypershift is enabled") |
There was a problem hiding this comment.
| // the CVO is handling alerts and will generate the Recommended condition if needed | |
| if isAcceptRisksEnabled(featureGates, cv.Status.Desired.Version) && !isHypershiftEnabled(infrastructure) { | |
| return true, nil | |
| } | |
| // if we get to this point, check on the client anyway to be safe | |
| return false, fmt.Errorf("Failed to detect presence of CVO and/or if Hypershift is enabled") | |
| // the CVO is handling alerts and will generate the Recommended condition if needed | |
| return isAcceptRisksEnabled(featureGates, cv.Status.Desired.Version) && !isHypershiftEnabled(infrastructure), nil |
| }{ | ||
| { | ||
| name: "no infrastructure", | ||
| expected: false, |
There was a problem hiding this comment.
nit: no need to set up zero value explicitly. It applies to other cases too.
| expected: false, |
Signed-off-by: Nick Bottari <nbottari9@gmail.com>
|
@nbottari9: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Context
If the
ClusterUpdateAcceptRisksfeature gate is enabled on the cluster, the CVO will include alerts in the conditional update risks. The client also checks for alerts, which can cause warnings or info messages to be printed twice.Description
We can detect if the server (CVO) is already including alerts by checking if
Recommended=Falsein theClusterVersionconditional updates. If this is set, we know that the feature gate is enabled, therefore the server is checking alerts and we don't need to on the client.Changes
ifstatement before callingprecheck()to check if the server is including alertsshouldSkipClientAlertChecking()- loops over theConditionalUpdatesand checks to see if any of them haveRecommended=False. If so, return true to indicate the server IS checking for alerts and we don't need to on the client.TestServerSideAlertRiskstest suite - Added new tests to test new functionalitySummary by CodeRabbit
Bug Fixes
Tests