
feat: Backport VRAM management patches for dmem cgroup #1889

Open
deepin-wm wants to merge 3 commits into deepin-community:linux-6.18.y from deepin-wm:vram-mgmt-backport


@deepin-wm


Summary

Backport VRAM management patches from pixelcluster's dmemcg-aggressive-protect branch to improve VRAM allocation for low-end GPUs.

These patches fix AMDGPU's VRAM management so that, when an application protected by dmem cgroup limits (dmem.low/dmem.min) allocates VRAM, TTM evicts unprotected buffers more aggressively. This prevents the protected application's buffers from being forced into GTT (system RAM) while its usage is still within its protection limits.

Changes

Patch 1: cgroup/dmem: Add queries for protection values

Add dmem_cgroup_below_min() and dmem_cgroup_below_low() helpers, counterparts to memcg's mem_cgroup_below_{min,low}. Callers can use these to be more aggressive in making space for allocations of a protected cgroup.

Patch 2: cgroup,cgroup/dmem: Add (dmem_)cgroup_common_ancestor helper

Add a helper to find the common ancestor of two cgroup pool states. This is needed to determine the correct subtree when making eviction decisions about protected buffers.

Patches 3-6 (combined into a single commit and adapted for 6.18.y):

  • drm/ttm: Extract code for attempting allocation in a place - Introduce struct ttm_bo_alloc_state and ttm_bo_alloc_at_place() for better allocation logic organization.
  • drm/ttm: Split cgroup charge and resource allocation - Separate cgroup charging from resource allocation via ttm_resource_try_charge() to fix race conditions when charge succeeds but allocation fails.
  • drm/ttm: Be more aggressive when allocating below protection limit - When the cgroup's memory usage is below low/min limit and allocation fails, try evicting unprotected buffers to make space.
  • drm/ttm: Use common ancestor of evictor and evictee as limit pool - Use the common ancestor cgroup for correct protection calculation when sibling cgroups compete for memory.

Source

Patches from: https://pixelcluster.github.io/VRAM-Mgmt-fixed/
Original commits by Natalie Vock <natalie.vock@gmx.de>

Notes

  • Patches 1-2 applied cleanly from upstream
  • Patches 3-6 were adapted for the 6.18.y code structure (minor differences in TTM allocation loop)
  • Targeting linux-6.18.y branch as it already has the dmem cgroup controller infrastructure (the linux-6.6.y branch does not)
  • Userspace utilities (dmemcg-booster, plasma-foreground-booster) are also needed for full functionality but are separate packages

pixelcluster and others added 3 commits June 18, 2026 19:01
Callers can use this feedback to be more aggressive in making space for
allocations of a cgroup if they know it is protected.

These are counterparts to memcg's mem_cgroup_below_{min,low}.

Signed-off-by: Natalie Vock <natalie.vock@gmx.de>

This helps to find a common subtree of two resources, which is important
when determining whether it's helpful to evict one resource in favor of
another.

To facilitate this, add a common helper that finds the common ancestor
of two cgroups using each cgroup's ancestor array.

Signed-off-by: Natalie Vock <natalie.vock@gmx.de>

Backport the following patches from pixelcluster's dmemcg-aggressive-protect
branch, adapted for kernel 6.18.y:

- drm/ttm: Extract code for attempting allocation in a place
  Introduce ttm_bo_alloc_state and ttm_bo_alloc_at_place() to better
  organize allocation logic. Move limit_pool from ttm_bo_evict_walk
  to the new alloc_state structure.

- drm/ttm: Split cgroup charge and resource allocation
  Separate cgroup charging from resource allocation to fix race
  conditions when charge succeeds but allocation fails. Add
  ttm_resource_try_charge() for pre-charging cgroups before
  resource allocation attempts.

- drm/ttm: Be more aggressive when allocating below protection limit
  When the cgroup's memory usage is below the low/min limit and
  allocation fails, try evicting unprotected buffers to make space.
  This prevents application buffers from being forced into GTT even
  though usage is below the protection limit.

- drm/ttm: Use common ancestor of evictor and evictee as limit pool
  When checking whether to skip protected buffers, use the common
  ancestor of evictor and evictee cgroups as the limit pool. This
  ensures correct protection calculation for sibling cgroups.

Original patches by Natalie Vock <natalie.vock@gmx.de>
Adapted for deepin-community/kernel linux-6.18.y branch.


@deepin-ci-robot


Hi @deepin-wm. Thanks for your PR.

I'm waiting for a deepin-community member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.


@deepin-ci-robot


[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign avenger-285714 for approval. For more information see the Code Review Process.


Needs approval from an approver in each of these files:

