feat: Backport VRAM management patches for dmem cgroup #1889
Callers can use this feedback to be more aggressive in making space for
allocations of a cgroup if they know it is protected.
These are counterparts to memcg's mem_cgroup_below_{min,low}.
Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
This helps to find a common subtree of two resources, which is important when determining whether it's helpful to evict one resource in favor of another. To facilitate this, add a common helper to find the common ancestor of two cgroups using each cgroup's ancestor array.
Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
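The ancestor-array lookup can be sketched in userspace as below. Kernel cgroups store their depth in level and keep ancestors[] indexed by level, with ancestors[0] being the root and ancestors[level] the cgroup itself; struct toy_cgroup copies that layout, but its name, toy_cgroup_init(), and the fixed TOY_MAX_DEPTH bound are hypothetical. Because two cgroups that agree at some level also agree at every level above it, scanning downward from the shallower cgroup's level returns the deepest common ancestor.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_DEPTH 8 /* hypothetical bound; the kernel sizes ancestors[] per cgroup */

/* Hypothetical cgroup node using the kernel's ancestor-array layout. */
struct toy_cgroup {
	int level;                                   /* 0 for the root */
	struct toy_cgroup *ancestors[TOY_MAX_DEPTH]; /* ancestors[level] == self */
};

/* Initialise @cg as a child of @parent, or as a root when @parent is NULL. */
static void toy_cgroup_init(struct toy_cgroup *cg, struct toy_cgroup *parent)
{
	int i;

	cg->level = parent ? parent->level + 1 : 0;
	assert(cg->level < TOY_MAX_DEPTH);
	for (i = 0; i < cg->level; i++)
		cg->ancestors[i] = parent->ancestors[i];
	cg->ancestors[cg->level] = cg;
}

/* Return the deepest cgroup that is an ancestor of (or equal to) both. */
static struct toy_cgroup *toy_common_ancestor(struct toy_cgroup *a,
					      struct toy_cgroup *b)
{
	int level = a->level < b->level ? a->level : b->level;

	/* Ancestor chains share a prefix, so the first match is the deepest. */
	for (; level >= 0; level--)
		if (a->ancestors[level] == b->ancestors[level])
			return a->ancestors[level];
	return NULL; /* the cgroups belong to different hierarchies */
}
```

The lookup costs O(depth) pointer comparisons and never walks parent pointers, since every cgroup already carries its full ancestor chain.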
Backport the following patches from pixelcluster's dmemcg-aggressive-protect branch, adapted for kernel 6.18.y:

- drm/ttm: Extract code for attempting allocation in a place
  Introduce ttm_bo_alloc_state and ttm_bo_alloc_at_place() to better organize allocation logic. Move limit_pool from ttm_bo_evict_walk to the new alloc_state structure.
- drm/ttm: Split cgroup charge and resource allocation
  Separate cgroup charging from resource allocation to fix race conditions when charge succeeds but allocation fails. Add ttm_resource_try_charge() for pre-charging cgroups before resource allocation attempts.
- drm/ttm: Be more aggressive when allocating below protection limit
  When the cgroup's memory usage is below the low/min limit and allocation fails, try evicting unprotected buffers to make space. This prevents application buffers from being forced into GTT even though usage is below the protection limit.
- drm/ttm: Use common ancestor of evictor and evictee as limit pool
  When checking whether to skip protected buffers, use the common ancestor of evictor and evictee cgroups as the limit pool. This ensures correct protection calculation for sibling cgroups.

Original patches by Natalie Vock <natalie.vock@gmx.de>
Adapted for deepin-community/kernel linux-6.18.y branch.
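The "Split cgroup charge and resource allocation" patch can be read as a charge-then-allocate pattern: charge the cgroup first, attempt the allocation, and roll the charge back if the allocation fails, so a failed allocation never leaves a stale charge behind. The sketch below models that with plain counters; toy_try_charge(), toy_uncharge(), toy_alloc_in_place() and the toy_* types are hypothetical and are not ttm_resource_try_charge() or any real TTM/dmem entry point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-cgroup usage counter with a hard limit. */
struct toy_charge {
	uint64_t usage;
	uint64_t max;
};

/* Hypothetical VRAM manager with a fixed amount of free space. */
struct toy_vram {
	uint64_t free;
};

enum toy_result {
	TOY_OK,
	TOY_OVER_CGROUP_LIMIT, /* charge refused: cgroup would exceed max */
	TOY_NO_SPACE,          /* charge succeeded, but the manager is full */
};

static bool toy_try_charge(struct toy_charge *cg, uint64_t size)
{
	if (cg->usage + size > cg->max)
		return false;
	cg->usage += size;
	return true;
}

static void toy_uncharge(struct toy_charge *cg, uint64_t size)
{
	assert(cg->usage >= size);
	cg->usage -= size;
}

/* Charge first, then allocate; undo the charge if the allocation fails. */
static enum toy_result toy_alloc_in_place(struct toy_charge *cg,
					  struct toy_vram *vram, uint64_t size)
{
	if (!toy_try_charge(cg, size))
		return TOY_OVER_CGROUP_LIMIT;

	if (vram->free < size) {
		toy_uncharge(cg, size); /* leave no stale charge behind */
		return TOY_NO_SPACE;
	}
	vram->free -= size;
	return TOY_OK;
}
```

Keeping the two failure modes distinct lets a caller tell a cgroup that is over its own limit apart from a VRAM manager that is simply full, and react to each differently.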
Hi @deepin-wm. Thanks for your PR. I'm waiting for a deepin-community member to verify that this patch is reasonable to test.
[APPROVALNOTIFIER] This PR is NOT APPROVED.
Summary
Backport VRAM management patches from pixelcluster's dmemcg-aggressive-protect branch to improve VRAM allocation for low-end GPUs.
These patches fix AMDGPU's VRAM management so that allocations from applications still within their dmem cgroup protection (dmem.low/dmem.min) evict unprotected buffers more aggressively. This keeps protected application buffers from being forced into GTT (system RAM) while their usage is still below the protection limit.
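The intended policy can be sketched as a single placement decision. This is a simplified model under stated assumptions, not TTM's eviction walk: each buffer carries a hypothetical is_protected flag standing in for the dmem.low/dmem.min check (which the series evaluates against the common ancestor of evictor and evictee), and the default path is modelled as falling straight through to GTT. When the allocating cgroup is below its protection, unprotected buffers are evicted first, but only if doing so can actually make the request fit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical VRAM-resident buffer owned by some cgroup. */
struct toy_bo {
	uint64_t size;
	bool is_protected; /* owner is within its dmem.low/dmem.min protection */
	bool evicted;
};

enum toy_placement { TOY_PLACED_VRAM, TOY_PLACED_GTT };

/* Place @size bytes given @vram_free bytes of free VRAM. */
static enum toy_placement toy_place(uint64_t *vram_free, struct toy_bo *bos,
				    size_t nr_bos, uint64_t size,
				    bool allocator_below_protection)
{
	uint64_t evictable = 0;
	size_t i;

	assert(bos != NULL || nr_bos == 0);
	if (*vram_free >= size)
		goto place_vram;

	/* Simplified default path: use the next placement instead of evicting. */
	if (!allocator_below_protection)
		return TOY_PLACED_GTT;

	/* Only unprotected, still-resident buffers may be evicted. */
	for (i = 0; i < nr_bos; i++)
		if (!bos[i].evicted && !bos[i].is_protected)
			evictable += bos[i].size;
	if (*vram_free + evictable < size)
		return TOY_PLACED_GTT; /* eviction cannot make it fit */

	for (i = 0; i < nr_bos && *vram_free < size; i++) {
		if (bos[i].evicted || bos[i].is_protected)
			continue;
		bos[i].evicted = true;
		*vram_free += bos[i].size;
	}

place_vram:
	*vram_free -= size;
	return TOY_PLACED_VRAM;
}
```

Checking the evictable total up front avoids evicting buffers for a request that would end up in GTT anyway.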
Changes
Patch 1: cgroup/dmem: Add queries for protection values
Add dmem_cgroup_below_min() and dmem_cgroup_below_low() helpers, counterparts to memcg's mem_cgroup_below_{min,low}. Callers can use these to be more aggressive in making space for allocations of a protected cgroup.
Patch 2: cgroup,cgroup/dmem: Add (dmem_)cgroup_common_ancestor helper
Add a helper to find the common ancestor of two cgroup pool states. This is needed to determine the correct subtree when making eviction decisions about protected buffers.
Patches 3-6 (adapted for 6.18.y):
- Introduce struct ttm_bo_alloc_state and ttm_bo_alloc_at_place() for better allocation logic organization.
- Add ttm_resource_try_charge() to fix race conditions when charge succeeds but allocation fails.
Source
Patches from: https://pixelcluster.github.io/VRAM-Mgmt-fixed/
Original commits by Natalie Vock <natalie.vock@gmx.de>
Notes