Optimize unknown-tests query with NOT EXISTS and partition pruning #3640

Open
mstaeble wants to merge 1 commit into openshift:main from mstaeble:optimize-unknown-tests-query
@mstaeble mstaeble commented Jun 16, 2026

Summary

Queries against the heavily partitioned prow_job_run_tests table (~3,500 partitions) were scanning every partition because their WHERE clauses lacked the partition keys needed for pruning. Two changes address this:

  • Both paths: Load tests separately from the job run with explicit prow_job_run_release and prow_job_run_timestamp in the WHERE clause, enabling PostgreSQL to target a single partition instead of scanning ~3,500. Benchmarked against staging with enable_partitionwise_join = on:
    • Failed-tests path (unknownTests=false): ~432ms → ~0.1ms per call
    • New-tests path (unknownTests=true): ~3,300ms → ~0.2ms per call
  • New-tests path: Use NOT EXISTS instead of NOT IN for the test_ownerships anti-join and combine it with a merged-PR check in a single SQL query. This eliminates the N+1 per-test IsNewTest queries, the NewTestFilter interface, pgNewTestFilter struct, and notNewTests cache — Postgres handles all filtering in one pass.

Test plan

  • Unit tests pass (TestUnit_getNewTestsForJobRun, TestRiskScenarios)
  • Full project compiles with go build ./...
  • go vet clean
  • CI passes

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Refactor
    • Improved efficiency of job run data retrieval with optimized database queries and partitioned test loading
    • Enhanced test identification performance by refactoring to batch-based filtering instead of per-test operations
    • Streamlined job run selection and test processing workflows for better overall system responsiveness

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 16, 2026
openshift-ci Bot commented Jun 16, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-merge-bot

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

coderabbitai Bot commented Jun 16, 2026

Walkthrough

FetchJobRun is refactored to split test loading: it fetches the base job run first, then loads tests in a separate query keyed by release and timestamp for partition pruning, applying NOT EXISTS filters for unknown tests or failure-status filters as needed. The NewTestFilter interface is changed from a per-test IsNewTest predicate to a batch FilterNewTests function, and getNewTestsForJobRun is simplified to map test results directly without per-test filtering logic. Test helpers are refactored to use wrapped fetchJobRun functions, and test expectations are updated to match the new batch filtering flow.

Changes

New test detection query optimization and batch filtering

| Layer | File(s) | Summary |
| --- | --- | --- |
| FetchJobRun partition-pruned test loading | pkg/api/job_runs.go | Splits test loading into a separate query with prow_job_run_release and prow_job_run_timestamp partition keys in the WHERE clause. The unknownTests path uses NOT EXISTS subqueries to exclude test ownerships and merged-PR tests; the failures-only path filters by failure status. Base jobRun is fetched first, then tests are preloaded and assigned directly. |
| NewTestFilter batch interface and worker refactoring | pkg/sippyserver/pr_new_tests_worker.go | NewTestFilter interface changed from IsNewTest(test) bool to FilterNewTests(tests []ProwJobRunTest) returning a subset. NewTestsWorker no longer stores newTestFilter; StandardNewTestsWorker is simplified to initialize only DB, job-run filter, and fetch function. getNewTestsForJobRun calls FilterNewTests once per batch and directly maps results to NewTest with status-derived success/failure fields. |
| Test file updates for batch filtering | pkg/sippyserver/pr_new_tests_worker_test.go | Imports updated to remove Kubernetes sets package. TestAssessJobRisks and TestAssessCrossJobRisks refactored to wrap fetchJobRun instead of using in-file filter helpers. TestUnit_getNewTestsForJobRun table structure adjusted to remove per-case filter injection; helper filter types and TestIsNewTest unit test removed; TestFunc_getNewTestsForJobRun updated to use StandardNewTestsWorker directly. |
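
The move from a per-test predicate to a batch call, as described for the reviewed revision, can be sketched as follows. This is an illustration only: the ProwJobRunTest struct here is a cut-down stand-in, knownSetFilter is a hypothetical in-memory implementation, and the real FilterNewTests signature may differ (for example by returning an error).

```go
package main

import "fmt"

// ProwJobRunTest is a stand-in for the real model with only the fields
// this sketch needs.
type ProwJobRunTest struct {
	TestID uint
	Name   string
}

// NewTestFilter follows the batch shape named in the walkthrough:
// one call per job run instead of one call per test.
type NewTestFilter interface {
	FilterNewTests(tests []ProwJobRunTest) []ProwJobRunTest
}

// knownSetFilter is a hypothetical implementation that treats any test ID
// present in its set as already known.
type knownSetFilter struct {
	known map[uint]struct{}
}

// FilterNewTests returns the subset of tests whose IDs are not in the
// known set, preserving input order.
func (f knownSetFilter) FilterNewTests(tests []ProwJobRunTest) []ProwJobRunTest {
	out := make([]ProwJobRunTest, 0, len(tests))
	for _, t := range tests {
		if _, ok := f.known[t.TestID]; !ok {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	var filter NewTestFilter = knownSetFilter{known: map[uint]struct{}{1: {}}}
	tests := []ProwJobRunTest{{1, "old test"}, {2, "new test A"}, {3, "new test B"}}
	for _, t := range filter.FilterNewTests(tests) {
		fmt.Println(t.TestID, t.Name)
	}
}
```

A database-backed implementation could answer the whole batch with a single query, which is the property that removes the N+1 pattern.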

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

ok-to-test

Suggested reviewers

  • smg247
  • xueqzhan
🚥 Pre-merge checks | ✅ 18 | ❌ 3

❌ Failed checks (3 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 37.50% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Go Error Handling | ⚠️ Warning | Errors returned without context wrapping. FetchJobRun returns GORM errors directly (lines 205, 233). getNewTestsForJobRun returns errors without fmt.Errorf wrapping (lines 295, 303). | Wrap errors with context using fmt.Errorf: e.g., return nil, fmt.Errorf("fetching tests: %w", err) for database errors and query errors. |
| Test Coverage For New Features | ⚠️ Warning | FetchJobRun function (pkg/api/job_runs.go) was significantly modified with optimizations but lacks direct unit tests. While getNewTestsForJobRun has unit tests, the critical partition-pruning and N... | Add unit tests for FetchJobRun with unknownTests=true/false paths, covering partition pruning keys and NOT EXISTS anti-join logic against test_ownerships. |
✅ Passed checks (18 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main optimization: replacing NOT IN with NOT EXISTS and adding partition pruning keys for the unknown-tests query. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Sql Injection Prevention | ✅ Passed | All SQL queries in the PR use proper parameterization via GORM's Where() method with ? placeholders. The FetchJobRun optimization uses hardcoded SQL fragments with no user input, and the baseSubque... |
| Excessive Css In React Should Use Styles | ✅ Passed | This PR contains only Go backend code (database queries and worker logic) with no React/JavaScript files. The custom check for React inline CSS is not applicable to Go backend changes. |
| Single Responsibility And Clear Naming | ✅ Passed | PR improves single responsibility by moving database filtering to FetchJobRun, leaving NewTestsWorker focused on risk analysis. All naming is clear and action-oriented with no generic patterns. |
| Feature Documentation | ✅ Passed | No feature documentation updates are needed. The PR optimizes internal database queries in the PR new-tests worker but makes no user-facing changes. Only one feature doc exists (job-analysis-sympto... |
| Stable And Deterministic Test Names | ✅ Passed | No Ginkgo tests found in repository. This PR modifies standard Go tests (testing.T) with stable, descriptive test names lacking dynamic content. Check is not applicable. |
| Test Structure And Quality | ✅ Passed | The custom check targets Ginkgo test code (It blocks, BeforeEach/AfterEach, Describe/Context), but the modified test file uses standard Go testing with *testing.T functions. The check is not applic... |
| Microshift Test Compatibility | ✅ Passed | No Ginkgo e2e tests are added in this PR. Changes are limited to database query optimization (job_runs.go), worker refactoring (pr_new_tests_worker.go), and standard Go unit test updates (pr_new_te... |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | This PR does not add any Ginkgo e2e tests. It modifies backend database query optimization code in pkg/api/job_runs.go and test worker logic in pkg/sippyserver/ using standard Go testing (func Test... |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR modifies only backend database query and worker process code with no deployment manifests, operator code, controllers, or Kubernetes scheduling constraints introduced. |
| Ote Binary Stdout Contract | ✅ Passed | The PR modifies only library package files (pkg/api/job_runs.go, pkg/sippyserver/pr_new_tests_worker.go, pkg/sippyserver/pr_new_tests_worker_test.go) with no main() or init() functions, no stdout w... |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | This PR does not add any Ginkgo e2e tests. The modified files (pkg/api/job_runs.go, pkg/sippyserver/pr_new_tests_worker.go, pkg/sippyserver/pr_new_tests_worker_test.go) contain standard Go unit tes... |
| No-Weak-Crypto | ✅ Passed | No weak crypto algorithms, custom crypto implementations, or non-constant-time secret comparisons found in the PR code. Changes are database query optimizations with no cryptographic operations. |
| Container-Privileges | ✅ Passed | PR contains only Go source code changes (pkg/api/job_runs.go, pr_new_tests_worker.go/.test.go); no container/K8s manifests modified. Container-privileges check is not applicable. |
| No-Sensitive-Data-In-Logs | ✅ Passed | No sensitive data (passwords, tokens, API keys, PII, session IDs, hostnames, or customer data) found in logging statements across modified files. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

openshift-ci Bot commented Jun 16, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mstaeble
Once this PR has been reviewed and has the lgtm label, please assign deepsm007 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/sippyserver/pr_new_tests_worker.go`:
- Around line 363-370: The cache for "not new" tests in ntf.notNewTests is keyed
only by test_id, but the database query that determines whether tests are "not
new" is scoped by prow_job_run_release, causing cache entries to leak across
releases. Modify the cache key to include the release information alongside the
test_id in all three affected locations: where the cache is checked with
ntf.notNewTests.Has() around line 367, where the query is executed with release
filtering around line 386, and where cache results are stored around line
395-397. This ensures the cache properly respects release boundaries and
prevents genuinely new tests from being incorrectly suppressed.
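
For the revision this comment refers to, a release-scoped cache key could look like the sketch below. It uses a plain map rather than the set type in the original code, and the names notNewKey, notNewCache, Has, and Insert are hypothetical. The thread was later marked outdated, and the PR description states that the notNewTests cache is removed entirely, so this only illustrates the suggestion.

```go
package main

import "fmt"

// notNewKey scopes a cached "not new" verdict to one release, so an entry
// recorded for one release cannot suppress the same test ID in another.
type notNewKey struct {
	Release string
	TestID  uint
}

// notNewCache is a hypothetical replacement for a cache keyed by test ID only.
type notNewCache struct {
	seen map[notNewKey]struct{}
}

func newNotNewCache() *notNewCache {
	return &notNewCache{seen: make(map[notNewKey]struct{})}
}

// Has reports whether (release, testID) was recorded as not new.
func (c *notNewCache) Has(release string, testID uint) bool {
	_, ok := c.seen[notNewKey{Release: release, TestID: testID}]
	return ok
}

// Insert records (release, testID) as not new.
func (c *notNewCache) Insert(release string, testID uint) {
	c.seen[notNewKey{Release: release, TestID: testID}] = struct{}{}
}

func main() {
	cache := newNotNewCache()
	cache.Insert("4.19", 7)
	fmt.Println(cache.Has("4.19", 7)) // true
	fmt.Println(cache.Has("4.20", 7)) // false
}
```
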
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 9c12e448-3333-4265-b129-228b9d4a5358

📥 Commits

Reviewing files that changed from the base of the PR and between d47db25 and c35c7fe.

📒 Files selected for processing (3)
  • pkg/api/job_runs.go
  • pkg/sippyserver/pr_new_tests_worker.go
  • pkg/sippyserver/pr_new_tests_worker_test.go

Comment thread pkg/sippyserver/pr_new_tests_worker.go Outdated
@openshift-ci openshift-ci Bot added the ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review label Jun 16, 2026
@mstaeble mstaeble force-pushed the optimize-unknown-tests-query branch from 2e02da7 to b3a38b8 Compare June 16, 2026 22:00
@mstaeble mstaeble marked this pull request as ready for review June 16, 2026 22:11
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 16, 2026
@openshift-ci openshift-ci Bot requested review from deepsm007 and petr-muller June 16, 2026 22:11
@mstaeble mstaeble force-pushed the optimize-unknown-tests-query branch 4 times, most recently from 74b880e to 5657332 Compare June 16, 2026 22:38
Queries against the heavily partitioned prow_job_run_tests table
(~3,500 partitions) were scanning all partitions due to missing
partition pruning keys. Two changes address this:

1. Load tests separately from the job run with explicit
   prow_job_run_release and prow_job_run_timestamp in the WHERE clause,
   enabling PostgreSQL to target a single partition. This applies to
   both the failed-tests path (onlyNewTests=false, ~432ms -> ~0.1ms)
   and the new-tests path (onlyNewTests=true, ~3,300ms -> ~0.2ms).

2. For the new-tests path, use NOT EXISTS instead of NOT IN for the
   test_ownerships anti-join and combine it with a merged-PR check in
   a single query. This eliminates the N+1 per-test IsNewTest queries,
   the NewTestFilter interface, pgNewTestFilter struct, and notNewTests
   cache — Postgres handles all filtering in one pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mstaeble mstaeble force-pushed the optimize-unknown-tests-query branch from 5657332 to f0d895c Compare June 16, 2026 22:57
@openshift-merge-bot

Scheduling required tests:
/test e2e

openshift-ci Bot commented Jun 16, 2026

@mstaeble: all tests passed!
