Skip to content

fix: recover mTLS auth after settings races#976

Open
EhabY wants to merge 1 commit into
mainfrom
fix/mtls-recovery-race
Open

fix: recover mTLS auth after settings races#976
EhabY wants to merge 1 commit into
mainfrom
fix/mtls-recovery-race

Conversation

@EhabY
Copy link
Copy Markdown
Collaborator

@EhabY EhabY commented May 22, 2026

  • track auth-affecting settings and silently retry 401s once when those settings changed after a request started
  • recover suspended deployments after debounced coder.headerCommand / coder.tls* changes settle
  • debounce remote settings reload prompts

Closes #973

@EhabY EhabY self-assigned this May 22, 2026
@EhabY EhabY force-pushed the fix/mtls-recovery-race branch 3 times, most recently from 4afb9d2 to 0e9e93b Compare May 22, 2026 20:18
@EhabY EhabY force-pushed the fix/mtls-recovery-race branch from 0e9e93b to 0487124 Compare May 22, 2026 20:22
Comment on lines +65 to +67
if (this.shouldRetryAfterAuthConfigChange(error)) {
return this.retryAfterAuthConfigChange(error);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These feel like they would be natural to be located within recoverFromUnauthorized, so we could see all the possible 401 recovery paths in one place.

That would let us avoid duplicating the _retryAttempted check as well.

return false;
}

return this.client.hasAuthConfigChangedSince(config.authConfigVersion);
Copy link
Copy Markdown
Member

@code-asher code-asher May 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense but I wonder if it would be possible to compare the relevant settings instead? If any have changed, then we re-run the request.

Or I suppose the tracker could hash the values and use that as the version.

Hmm but although that might technically cover more cases, this is probably close enough, worst case is we run it once more with values that just failed (like if a user edits and then reverts the edit).

return false;
}

const config = error.config as RequestConfigWithMeta & {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a blocker, but there must be a way we can convince Typescript these properties exist. I know in code-server we do something similar with Express to pass around things on the request

declare global {
  // eslint-disable-next-line @typescript-eslint/no-namespace
  namespace Express {
    export interface Request {
      args: DefaultedArgs
      heart: Heart
      settings: SettingsProvider<CoderSettings>
      updater: UpdateProvider
      cookieSessionName: string
    }
  }
}

this.logger.debug(
"Authentication settings changed after session suspended, recovering",
);
await this.setDeploymentIfValid(
Copy link
Copy Markdown
Member

@code-asher code-asher May 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we need some renaming because just looking at this function it kinda looks like we get the current deployment, then set the deployment to the current deployment, which looks like it should be a no-op.

I know as a side effect this might un-suspend the deployment, but just hard to tell from here.

Comment thread src/api/coderApi.ts
Comment on lines +534 to +536
for (const key of configWithAuthMeta.headerCommandKeys ?? []) {
config.headers.delete(key);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make sure I understand what is happening, since this could be a retry, we are deleting headers from the previous run of the header command? Since the next run may not output the same headers.

I think this property could use a bit of documentation.

// Token from other window was invalid, ignore and keep waiting
// (or user can click Login/Cancel in the dialog)
} finally {
client.dispose();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooo good catch there are probably at least a few clients that we were not disposing.

I wish we had Go's defer

Comment thread src/configWatcher.ts
* reconnect, reload prompts). Keeps rapid edits in settings.json from
* flushing each side-effect per keystroke.
*/
export const CONFIG_CHANGE_DEBOUNCE_MS = 250;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kinda wild to me that VS Code is not already debouncing setting updates 💀

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They do but it's too short, also Netflix manually modifies the settings one by one so this coalesces them into one batch

Comment thread src/configWatcher.ts
Comment on lines +23 to +25
* opens a fixed collection window; subsequent events during the window are
* coalesced. This bounds latency even when events arrive faster than the
* window length (a reset-style debounce would starve).
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand it, a debounce means every time the function is called, it resets the timer.

So this is not actually a debounce? A throttle then? It fires at most once per debounceMs (the window)?

What does it mean to bind latency? Is this just saying we use a throttle to avoid the case where a user keeps editing settings within 250ms over and over, keeping the function from running? Is this really a problem? If the user keeps editing, we really probably do want to avoid using the now-stale values.

Comment thread src/configWatcher.ts
if (windowTimer) {
return; // already collecting in the open window
}
windowTimer = setTimeout(() => {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this looks like a throttle

I do feel like a debounce would make more sense though

Copy link
Copy Markdown
Collaborator Author

@EhabY EhabY May 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had it at denounce first but I worried about denouncing for a long time since this can be triggered by editing the file OR the setting through the UI.

I can bring back the denounce but should we limit this to some max time or am I being overly cautious? Do keep in mind that this is used for the CoderApi/Remote settings watch/deployment watch... Etc

Perhaps denounce but reduce it from 250ms?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

mTLS users signed out on remote connect when companion settings extension activates after Coder

2 participants