Adapt green-context SM split tests to topology#2098

Merged
rwgk merged 2 commits into NVIDIA:main from rwgk:test_green_context_adapt_SM_split_tests_to_topology on May 22, 2026

@rwgk (Contributor) commented May 16, 2026

Summary

  • Stop assuming Hopper+ devices always expose min_partition_size and coscheduled_alignment values of 8.
  • Probe for supported explicit SM split sizes in the tests instead of assuming min_partition_size is always a valid request for multi-group splits.
  • Keep backfill coverage, but skip the backfill-specific case when the device does not expose a backfill-only two-group split.
  • Interactively confirmed to resolve nvbug 6097301 on a Tegra Thor system.
  • Agent used: Cursor GPT-5.4 Extra High Fast

Probe for supported explicit SM split sizes instead of assuming Hopper+ devices always expose 8-SM partitions, so Thor-like topologies pass without masking real driver errors.
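The probing idea in the summary can be sketched in plain Python. This is a minimal, hypothetical illustration, not the cuda.core test code: `probe_split_sizes`, `try_split`, `UnsupportedSplit`, and the fake 20-SM device are assumptions invented for this example, and stepping candidates by `coscheduled_alignment` is likewise an assumption. The sketch starts at `min_partition_size`, attempts each two-group split, and records the sizes the device accepts. Only the specific "unsupported split" error is swallowed; any other exception propagates, so genuine driver failures still fail the test.

```python
from typing import Callable, List


class UnsupportedSplit(Exception):
    """Hypothetical error type: the device rejected this split size as unsupported."""


def probe_split_sizes(
    total_sms: int,
    min_partition_size: int,
    alignment: int,
    try_split: Callable[[int], None],
) -> List[int]:
    """Return the per-group SM counts that try_split accepts for a two-group split.

    Candidates start at min_partition_size and advance by alignment, stopping
    once two groups of that size would exceed total_sms.
    """
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    supported: List[int] = []
    size = min_partition_size
    while 2 * size <= total_sms:
        try:
            try_split(size)
        except UnsupportedSplit:
            # The topology cannot satisfy this size: skip it and keep probing.
            pass
        else:
            supported.append(size)
        size += alignment
    # Any exception other than UnsupportedSplit propagates to the caller,
    # so real driver errors are not masked.
    return supported


def fake_try_split(size: int) -> None:
    """Hypothetical device that only accepts group sizes divisible by 4."""
    if size % 4:
        raise UnsupportedSplit(f"{size} SMs per group not supported")


if __name__ == "__main__":
    # Probes 2, 4, 6, 8, 10; only multiples of 4 are accepted.
    print(probe_split_sizes(20, 2, 2, fake_try_split))  # [4, 8]
```

A test built on a helper like this could call `pytest.skip` when the returned list is empty instead of failing, which keeps coverage on devices that do support a split without hard-coding 8-SM partitions.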
@rwgk rwgk added this to the cuda.core next milestone May 16, 2026
@rwgk rwgk self-assigned this May 16, 2026
@rwgk rwgk added bug Something isn't working P0 High priority - Must do! test Improvements or additions to tests cuda.core Everything related to the cuda.core module labels May 16, 2026
@copy-pr-bot (Contributor, bot) commented May 16, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@rwgk (Contributor, Author) commented May 16, 2026

/ok to test


@rwgk rwgk marked this pull request as ready for review May 16, 2026 20:20
@leofang (Member) commented May 22, 2026

Let's re-run the CI with the latest driver!

@rwgk (Contributor, Author) commented May 22, 2026

Thanks @leofang!

@rwgk rwgk enabled auto-merge (squash) May 22, 2026 01:09
@rwgk rwgk merged commit c007c85 into NVIDIA:main May 22, 2026
96 checks passed
@github-actions

Doc Preview CI
Preview removed because the pull request was closed or merged.

@rwgk rwgk deleted the test_green_context_adapt_SM_split_tests_to_topology branch May 22, 2026 02:55