Skip to content

fix(permission): clean up leftover access in SpiceDB when a permission is deleted#1685

Draft
whoAbhishekSah wants to merge 1 commit into
mainfrom
fix/permission-delete-cascade
Draft

fix(permission): clean up leftover access in SpiceDB when a permission is deleted#1685
whoAbhishekSah wants to merge 1 commit into
mainfrom
fix/permission-delete-cascade

Conversation

@whoAbhishekSah

@whoAbhishekSah whoAbhishekSah commented Jun 8, 2026

Copy link
Copy Markdown
Member

Background: how permissions are stored

A permission lives in three places:

  • permissions table (Postgres) — one row per permission.
  • roles table (Postgres) — each role keeps a list of the permission slugs it grants.
  • SpiceDB — the grant tuples that checks actually read, one per principal type:
app/role:4f6e2b8a-1c9d-4b2e-9a77-2b0f5c1e8a31#compute_order_delete@app/user:*
app/role:4f6e2b8a-1c9d-4b2e-9a77-2b0f5c1e8a31#compute_order_delete@app/serviceuser:*
app/role:4f6e2b8a-1c9d-4b2e-9a77-2b0f5c1e8a31#compute_order_delete@app/pat:*

(format: app/role:<role-id>#<permission-slug>@<principal>:*)

Permission rows are written by exactly one function — bootstrap's AppendSchema (via migrateServiceDefinitionToDB). It's triggered from two entry points: at boot (MigrateSchema, from the base schema + config) and by the CreatePermission API.

The problems

  1. Delete didn't revoke access. permission.Delete only removed the Postgres row and left the SpiceDB grant tuples behind. As long as those tuples exist, a permission check keeps returning true — the deleted permission still grants access.
  2. Roles kept listing the deleted permission. Nothing removed the slug from the roles.permissions list (there's no FK between the tables — it's a denormalized jsonb array), so roles ended up referencing a permission that no longer exists.
  3. Delete wasn't even callable. The DeletePermission admin RPC was hardwired to return "function not available."
  4. No protection for built-in permissions. Permissions from the base schema / config are recreated on every boot. Deleting one only revokes access until the next restart, then it silently comes back — a confusing footgun.

The fix

  • Revoke on delete: permission.Delete removes every app/role:*#<slug> grant tuple (across all roles and principal types) before deleting the row, so a check returns false immediately after delete.
  • Prune role definitions: it then strips the slug from every role's permission list via a new role.Service.RemovePermissionFromRoles. This goes through the role service (not its repo) to respect domain boundaries; the role service uses a single targeted DB update and deliberately does not funnel through role.Update (which would rebuild each role's SpiceDB tuples and emit a per-role audit — pointless here, since the tuples are already gone). permission.Service reaches it via a setter-injected interface, the same SetMembershipService pattern org/project use for mutual service deps.
  • Make delete usable: implemented the DeletePermission admin handler, gated on superuser (was unavailable).
  • Protect built-in permissions: the handler rejects deleting any permission that bootstrap recreates — one defined by the base schema or the configured service definitions — with FailedPrecondition. Only API-created permissions outside that set (the "stray" ones) are deletable, and for those, deletion now sticks (bootstrap won't re-add them).

Known limitation (follow-up, not in this PR)

  • The now-unused relation stays in the compiled SpiceDB schema until the next bootstrap recompiles it.

Tests

TestPermissionDeleteCascade (e2e, real Postgres + SpiceDB):

  • create a stray permission, build a role that grants it, delete the permission → assert no app/role:*#<slug> tuple remains and the role no longer lists the slug;
  • attempt to delete a built-in permission (app/organization:get) → assert it's rejected with FailedPrecondition.

Plus unit tests for the relation sweep + role prune (core/permission) and the role-list strip (core/role).

Addresses gap (6) of #1661.

@vercel

vercel Bot commented Jun 8, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
frontier Ready Ready Preview, Comment Jun 9, 2026 9:50am

@coderabbitai

coderabbitai Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 59ffdb66-83c4-4bb4-906c-93bdc2a98133

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR implements cascading deletion for permissions. When a permission is deleted, all role→permission relation tuples referencing that permission are also removed via RelationService. The permission service is wired with the relation service, a DeletePermission API handler is exposed, superuser authorization is applied, and end-to-end tests validate the cascade behavior.

Changes

Permission Cascade Deletion

Layer / File(s) Summary
Permission service with relation cleanup
core/permission/service.go, core/permission/mocks/relation_service.go, core/permission/service_test.go
RelationService interface added to permission Service, constructor accepts and stores relation service dependency, Delete method performs cascading cleanup by fetching permission, sweeping role→permission relations by slug, and deleting permission. Tests updated to mock RelationService and verify cascade flow including error paths for permission resolution and relation sweep failures.
API handler and service interface
internal/api/v1beta1connect/interfaces.go, internal/api/v1beta1connect/mocks/permission_service.go, internal/api/v1beta1connect/permission.go
PermissionService interface extended with Delete method signature, mock updated with Delete call tracking and expectation helpers, and DeletePermission handler added to accept request, call service, map not-found errors to HTTP NotFound, other errors to Internal, and return empty response on success.
Authorization and dependency wiring
pkg/server/connect_interceptors/authorization.go, cmd/serve.go
Authorization rule for AdminService/DeletePermission changed from ErrNotAvailable to IsSuperUser check, and relation service PostgreSQL repository is constructed and passed to permission NewService alongside permission repository.
End-to-end cascade regression test
test/e2e/regression/service_registration_test.go
TestPermissionDeleteCascade creates permission, assigns to role, verifies role→permission relation exists, deletes permission via API, and confirms relation is removed.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Suggested reviewers

  • rohilsurana
🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
core/permission/service_test.go (2)

219-220: ⚡ Quick win

Assert the exact cascade delete relation payload.

Line 219 and Line 252 use mock.Anything for the relation argument, so tests won’t catch a broken namespace/relation-name contract.

Proposed test assertion tightening
+ // add imports:
+ // "github.com/raystack/frontier/core/relation"
+ // "github.com/raystack/frontier/internal/bootstrap/schema"

+ expectedRel := relation.Relation{
+   Object: relation.Object{
+     Namespace: schema.RoleNamespace,
+   },
+   RelationName: perm.Slug,
+ }

- mockRelation.On("Delete", mock.Anything, mock.Anything).Return(nil).Once()
+ mockRelation.On("Delete", mock.Anything, expectedRel).Return(nil).Once()

- mockRelation.On("Delete", mock.Anything, mock.Anything).Return(expectedErr).Once()
+ mockRelation.On("Delete", mock.Anything, expectedRel).Return(expectedErr).Once()

Also applies to: 252-253


242-258: ⚡ Quick win

Add a delete test for ignored relation.ErrNotExist.

Service.Delete intentionally tolerates relation.ErrNotExist; add a subtest to lock that behavior and ensure repo delete still executes.

Suggested subtest
+ t.Run("should continue deleting permission when relation sweep returns ErrNotExist", func(t *testing.T) {
+   mockRepo := mocks.NewRepository(t)
+   mockRelation := mocks.NewRelationService(t)
+   svc := permission.NewService(mockRepo, mockRelation)
+
+   permissionID := uuid.New().String()
+   perm := permission.Permission{ID: permissionID, Name: "delete", Slug: "app_organization_delete"}
+
+   mockRepo.On("Get", mock.Anything, permissionID).Return(perm, nil).Once()
+   mockRelation.On("Delete", mock.Anything, mock.Anything).Return(relation.ErrNotExist).Once()
+   mockRepo.On("Delete", mock.Anything, permissionID).Return(nil).Once()
+
+   err := svc.Delete(context.Background(), permissionID)
+   assert.NoError(t, err)
+ })
test/e2e/regression/service_registration_test.go (1)

170-175: ⚡ Quick win

Derive permSlug via schema helper instead of hardcoding.

Line 170 hardcodes slug formatting; generating it from namespace/name keeps this test stable if slug rules change.

Proposed tweak
- const permSlug = "permcascade_res_act"
+ const permNamespace = "permcascade/res"
+ const permName = "act"
+ permSlug := schema.FQPermissionNameFromNamespace(permNamespace, permName)

  createPermResp, err := s.testBench.AdminClient.CreatePermission(ctx, connect.NewRequest(&frontierv1beta1.CreatePermissionRequest{
    Bodies: []*frontierv1beta1.PermissionRequestBody{
-     {Name: "act", Namespace: "permcascade/res"},
+     {Name: permName, Namespace: permNamespace},
    },
  }))

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2dca1fc8-e791-43e9-a9f1-c6f82b8eba39

📥 Commits

Reviewing files that changed from the base of the PR and between 00f567b and 27696db.

📒 Files selected for processing (9)
  • cmd/serve.go
  • core/permission/mocks/relation_service.go
  • core/permission/service.go
  • core/permission/service_test.go
  • internal/api/v1beta1connect/interfaces.go
  • internal/api/v1beta1connect/mocks/permission_service.go
  • internal/api/v1beta1connect/permission.go
  • pkg/server/connect_interceptors/authorization.go
  • test/e2e/regression/service_registration_test.go

@coveralls

coveralls commented Jun 8, 2026

Copy link
Copy Markdown

Coverage Report for CI Build 27198032949

Coverage decreased (-0.06%) to 43.209%

Details

  • Coverage decreased (-0.06%) from the base build.
  • Patch coverage: 81 uncovered changes across 6 files (33 of 114 lines covered, 28.95%).
  • 4 coverage regressions across 1 file.

Uncovered Changes

File Changed Covered %
internal/api/v1beta1connect/permission.go 42 8 19.05%
internal/bootstrap/service.go 20 0 0.0%
internal/store/postgres/role_repository.go 16 0 0.0%
cmd/serve.go 6 0 0.0%
core/permission/service.go 26 22 84.62%
pkg/server/connect_interceptors/authorization.go 1 0 0.0%
Total (7 files) 114 33 28.95%

Coverage Regressions

4 previously-covered lines in 1 file lost coverage.

File Lines Losing Coverage Coverage
internal/api/v1beta1connect/permission.go 4 64.97%

Coverage Stats

Coverage Status
Relevant Lines: 37115
Covered Lines: 16037
Line Coverage: 43.21%
Coverage Strength: 12.28 hits per line

💛 - Coveralls

@whoAbhishekSah whoAbhishekSah force-pushed the fix/permission-delete-cascade branch from 27696db to 8c37eb5 Compare June 8, 2026 05:35
@whoAbhishekSah whoAbhishekSah marked this pull request as draft June 8, 2026 06:03
@whoAbhishekSah whoAbhishekSah force-pushed the fix/permission-delete-cascade branch from 8c37eb5 to 0d86850 Compare June 8, 2026 07:55
@whoAbhishekSah whoAbhishekSah force-pushed the fix/permission-delete-cascade branch from 0d86850 to 632e276 Compare June 9, 2026 06:27
@whoAbhishekSah whoAbhishekSah changed the title fix(permission): cascade role→permission tuple cleanup on delete fix(permission): clean up leftover access in SpiceDB when a permission is deleted Jun 9, 2026
@whoAbhishekSah whoAbhishekSah force-pushed the fix/permission-delete-cascade branch from 632e276 to 391819b Compare June 9, 2026 08:42
@whoAbhishekSah

whoAbhishekSah commented Jun 9, 2026

Copy link
Copy Markdown
Member Author

Manual verification ✅

Tested against a local build of this branch (live ConnectRPC + SpiceDB) as a super admin. The config (app.resources_config_path) points at a sample_kitchen.yml that defines kitchen/recipe:read|write, so we have permissions from all three sources to test: base zed schema, config file, and the CreatePermission API.


Part 1 — Which permissions can be deleted?

The rule: a permission is deletable only if it is not recreated by bootstrap — i.e. it does not come from the base schema or the config file. Built-ins are rejected because the next restart would just bring them back.

Setup: picked one permission from each source and called DeletePermission on it.

Permission Comes from Result
app/organization:get base zed schema ❌ Rejected — FailedPrecondition, "cannot delete a built-in permission…"
kitchen.recipe.read config file ❌ Rejected — same guard (it matches a config-defined permission)
garage.car.wash created via API, in neither set ✅ Deleted successfully

So only the API-created permission could be deleted; the base-schema and config-defined ones were protected. Note the config check reads the config file live on each request, so it caught kitchen/recipe even though this server had booted before that config existed.


Part 2 — What happens to a role that was using a deleted permission, and does an access check still pass?

Setup:

  1. Created a permission compute.machine.scrub via the API.
  2. Created a role that grants it, and bound a service account to that role on a project.
  3. Created a compute/machine resource.
  4. Ran CheckResourcePermission(service account, compute/machine:<id>, "scrub") at each stage.

Result:

Stage Check result
Before binding the role ❌ denied
After binding the role allowed — the role grants access
After deleting the permission denied (not_found)

Two things happen to the role on delete:

  • Access is actually revoked. The SpiceDB grant tuples behind the role (app/role:<role>#compute_machine_scrub@app/user:*, plus the serviceuser and pat variants) go from 3 → 0. This is the core fix — before this change those tuples lingered and kept granting access.
  • The role's own permission list is left untouched (a known follow-up): ListRoles still shows permissions: ["compute_machine_scrub"]. It's a harmless dangling label — it grants nothing, since both the permission row and the SpiceDB tuples are gone.

One detail: after deletion the check returns not_found rather than a plain "false". That's because the check resolves the permission name first and the row no longer exists (confirmed: checking any unregistered verb behaves the same). Either way the caller cannot get an "allowed" — access is genuinely gone.

permission.Delete only removed the DB row. Every role granting the
permission keeps app/role:<role>#<slug>@<*> tuples (one per principal
type), so deleting a permission left those tuples dangling on a relation no
longer backed by any permission row. The method was also unreachable: the
DeletePermission RPC was hardwired to "function not available".

Changes:
- permission.Service gains a relation dependency and, on Delete, sweeps the
  role->permission tuples by object-namespace (app/role) + relation-name
  (the permission slug), clearing them across all roles and principal types
  before removing the row. Tolerates ErrNotExist for unused permissions.
- Implement the DeletePermission admin handler and gate it on superuser
  (previously returned CodeUnavailable), making the guard reachable and
  giving the data-cleanup effort a way to remove stray permissions.

Adds an e2e regression test: create a permission, build a role on it,
delete the permission, assert no role->permission tuple remains. Verified
it fails without the sweep (tuple lingers) and passes with it.

Refs #1661

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants