Skip to content

fix(sandbox): trust exact declared private endpoints#1560

Open
mjamiv wants to merge 1 commit into
NVIDIA:mainfrom
mjamiv:fix/declared-endpoint-private-ip
Open

fix(sandbox): trust exact declared private endpoints#1560
mjamiv wants to merge 1 commit into
NVIDIA:mainfrom
mjamiv:fix/declared-endpoint-private-ip

Conversation

@mjamiv
Copy link
Copy Markdown
Contributor

@mjamiv mjamiv commented May 25, 2026

Summary

  • allow exact user-declared hostname endpoints to resolve to private RFC1918/internal addresses without duplicating allowed_ips
  • keep loopback/link-local/unspecified and blocked control-plane ports denied
  • preserve policy-advisor two-step SSRF flow for advisor-generated proposals

Fixes #1555.

Testing

  • cargo fmt --all -- --check
  • git diff --check
  • cargo test -p openshell-policy
  • cargo test -p openshell-sandbox --lib

@mjamiv mjamiv requested review from a team, derekwaynecarr, maxamillion and mrunalp as code owners May 25, 2026 21:44
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 25, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@johntmyers
Copy link
Copy Markdown
Collaborator

[P1] Advisor provenance can be lost during policy merge, which reopens the private-endpoint SSRF bypass this PR is trying to close.

The new trust check relies on exact_declared_endpoint_host seeing a non-advisor_proposed binary before allowing an exact private endpoint. But advisor provenance is tracked on NetworkBinary, while merge_rules merges endpoints and binaries at the rule level. An agent-authored policy.local proposal can reuse an existing user rule name, causing the advisor-proposed endpoint to be appended into a rule that already has a user-declared binary. After that, the Rego check treats the private endpoint as user-declared and the proxy skips RFC1918 blocking.

Please preserve advisor/user provenance at the endpoint, rule, or merge chunk level, or reject/segregate advisor proposals that target existing user-declared rules. This also needs a regression test where an advisor proposal for an internal exact hostname is merged into a rule that already contains the same binary.

@johntmyers
Copy link
Copy Markdown
Collaborator

/ok to test 1d37a4c

@johntmyers
Copy link
Copy Markdown
Collaborator

Follow-up with a more concrete explanation and suggested fix for the provenance issue.

The risky path here is a two-step interaction between approval and merge:

  1. Approval correctly moves the advisor proposal into active policy.
  2. During merge_rules, that approved advisor proposal can be merged into an existing user-owned rule in a way that loses the advisor provenance for the new endpoint.

For example, suppose active user policy already has a rule like:

network_policies:
  app-api:
    binaries:
      - path: /usr/bin/python
    endpoints:
      - host: api.example.com
        port: 443

Then the advisor proposes, and the user approves, a chunk with the same rule/binary but a private/internal endpoint:

network_policies:
  app-api:
    binaries:
      - path: /usr/bin/python
        advisor_proposed: true
    endpoints:
      - host: internal-admin.local
        port: 443

merge_rules appends the new endpoint, but append_unique_binaries dedupes by binary path. Since /usr/bin/python already exists, the advisor-marked binary is dropped/preserved as the existing user binary. The merged active rule can then effectively look like:

network_policies:
  app-api:
    binaries:
      - path: /usr/bin/python
    endpoints:
      - host: api.example.com
        port: 443
      - host: internal-admin.local
        port: 443

At that point the new exact_declared_endpoint_host path sees a user-declared binary plus an exact endpoint match, so the proxy can skip the default RFC1918/private-IP SSRF block for internal-admin.local. But that endpoint originated from the advisor path, which this PR’s docs and tests imply should still require a separate explicit developer action such as allowed_ips.

I think the durable fix is to make provenance live at the same granularity as the trust decision: the endpoint, not only the binary. Concretely:

  • Add internal/proto provenance to NetworkEndpoint, for example advisor_proposed.
  • When policy.local / advisor-generated chunks create a proposed rule, mark every proposed endpoint as advisor-proposed.
  • Update the Rego exact-host trust check to require both a user-declared binary and a user-declared exact endpoint, e.g. not object.get(ep, "advisor_proposed", false) in _policy_has_exact_declared_endpoint or equivalent.
  • Preserve endpoint provenance through merge. When appending an incoming endpoint, keep its advisor marker. When merging into an existing matching endpoint, use conservative provenance: existing.advisor_proposed = existing.advisor_proposed || incoming.advisor_proposed unless there is a distinct explicit user-acceptance path that converts it.
  • Add a regression test where an approved advisor proposal for an internal exact hostname is merged into a rule that already contains the same binary, and assert the exact-declared-host SSRF bypass does not apply.

An alternate fix would be to prevent advisor proposals from merging into existing user-owned rules for private/internal exact endpoints, but endpoint-level provenance seems less brittle because it keeps the authorization fact attached to the object that is actually trusted.

@mjamiv mjamiv force-pushed the fix/declared-endpoint-private-ip branch from 1d37a4c to 19844d5 Compare May 26, 2026 23:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: auto-populate allowed_ips for explicitly declared policy endpoints resolving to private addresses

2 participants