Skip to content

slurm sglang-disagg: allow co-located proxy (xP+yD==nnodes)#148

Open
mkuznet1 wants to merge 1 commit into
developfrom
fix/slurm-sglang-disagg-co-located-proxy
Open

slurm sglang-disagg: allow co-located proxy (xP+yD==nnodes)#148
mkuznet1 wants to merge 1 commit into
developfrom
fix/slurm-sglang-disagg-co-located-proxy

Conversation

@mkuznet1

Copy link
Copy Markdown
Contributor

This pull request updates the validation logic for the SGLang Disaggregated deployment command in slurm.py to support both co-located and dedicated proxy node topologies. The new logic allows for more flexible node allocation, mirroring the layout used by vllm-disagg.

Validation logic improvements:

  • Updated the custom split validation to accept either a dedicated proxy node (prefill_nodes + decode_nodes + 1 == nnodes) or a co-located proxy/router on the first prefill node (prefill_nodes + decode_nodes == nnodes). This change enables both topologies to be valid, reflecting how the proxy is actually launched by the model script, not the launcher.

  • Improved error messaging to clearly describe the two valid configurations for node allocation, aiding in debugging and user understanding.outer on the first prefill node (prefill+decode==nnodes). xP/yD and the exported topology env are unchanged; the proxy is started by the model run.sh, so both layouts are valid.

The sglang-disagg launcher hard-required a dedicated proxy node (prefill+decode+1==nnodes), unlike the vllm launcher which leaves topology to the model script. Relax the custom-split validation to also accept a co-located proxy/router on the first prefill node (prefill+decode==nnodes). xP/yD and the exported topology env are unchanged; the proxy is started by the model run.sh, so both layouts are valid.
Copilot AI review requested due to automatic review settings June 24, 2026 16:38
@mkuznet1 mkuznet1 self-assigned this Jun 24, 2026
@mkuznet1 mkuznet1 added the enhancement New feature or request label Jun 24, 2026

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the SLURM launcher’s SGLang-disaggregated custom-split validation to accept both (a) a dedicated proxy node and (b) a co-located proxy topology, aligning with the vLLM-disagg-style layout flexibility.

Changes:

  • Relax custom split validation to allow either prefill+decode==nnodes (co-located proxy) or prefill+decode+1==nnodes (dedicated proxy node).
  • Update the validation failure message to describe both valid configurations.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +883 to +887
if (prefill_nodes + decode_nodes != nnodes
and prefill_nodes + decode_nodes + 1 != nnodes):
raise ValueError(
f"Custom split validation failed: "
f"prefill_nodes ({prefill_nodes}) + decode_nodes ({decode_nodes}) + 1 proxy "
f"must equal nnodes ({nnodes}), but got {prefill_nodes + decode_nodes + 1}"
f"Custom split validation failed: prefill_nodes ({prefill_nodes}) + "
f"decode_nodes ({decode_nodes}) must equal nnodes ({nnodes}) for a "
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants