Conversation
Code Review Completed! 🔥The code review was successfully completed based on your current configurations. Kody Guide: Usage and ConfigurationInteracting with Kody
Current Kody ConfigurationReview OptionsThe following review options are enabled or disabled:
|
📝 WalkthroughWalkthroughAdds LiveKit-mode retry/backoff and telemetry plumbing to relay services, hard-disconnect handling in the voice modes, new resilience tests, and Linux systemd deployment updates with SIGTERM-driven console shutdown. ChangesLiveKit resilience and shutdown
Sequence Diagram(s)sequenceDiagram
participant RelayServiceBase
participant IRelayModeTelemetry
participant DispatchVoiceMode
participant session
RelayServiceBase->>IRelayModeTelemetry: ModeStarting(mode)
RelayServiceBase->>DispatchVoiceMode: ExecuteAsync(cancellationToken)
session-->>DispatchVoiceMode: ConnectionChanged(reconnecting)
DispatchVoiceMode->>DispatchVoiceMode: status.LiveKit = Degraded
session-->>DispatchVoiceMode: ConnectionChanged(hard disconnect)
DispatchVoiceMode-->>RelayServiceBase: InvalidOperationException
RelayServiceBase->>IRelayModeTelemetry: ModeRetrying(mode, ex, attempt, nextDelay)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (1)
Resgrid.Relay.Engine/Services/RelayServiceBase.cs (1)
28-31: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueVirtual member
IsLiveKitModeis invoked from the base constructor.Reading
IsLiveKitMode(Line 30) during base construction runs the derived override before the derived constructor body executes. It's safe today since all overrides return a constant, but it's a fragile pattern (CA2214) if an override later depends on instance state.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Resgrid.Relay.Engine/Services/RelayServiceBase.cs` around lines 28 - 31, The base constructor in RelayServiceBase is calling the virtual IsLiveKitMode property, which can invoke derived overrides before the derived constructor has finished. Remove the virtual dispatch from the constructor by moving the decision into a non-virtual initializer, a constructor parameter, or a static/helper method used by RelayServiceBase, so the telemetry selection no longer depends on override behavior during construction.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@Resgrid.Relay.Engine/Telemetry/RelayModeTelemetry.cs`:
- Around line 13-16: The Create method in RelayModeTelemetry currently returns a
new SentryRelayModeTelemetry per service, which causes each instance to own
SentrySdk.Init and disposal. Refactor SentryRelayModeTelemetry so Sentry SDK
initialization, flush, and disposal are managed by a single shared process-wide
instance instead of per-service ownership, and update Create to reuse that
shared telemetry instance for all LiveKit services.
In `@Resgrid.Relay.Engine/Voice/DispatchVoiceMode.cs`:
- Around line 56-72: `DispatchVoiceMode` leaves `status.LiveKit` stuck at
Degraded after a transient reconnect because the `onConnectionChanged` handler
returns immediately when `change.Connected` is true. Update the
`ConnectionChanged` logic so the same handler that sets Degraded on the
"reconnecting" reason also restores `status.LiveKit` to Connected once the
session reports a healthy connected state, using the existing `disconnect`,
`status`, and `onConnectionChanged` flow.
In `@Resgrid.Relay.Engine/Voice/RadioMode.cs`:
- Around line 86-99: The connection-state handler in RadioMode’s
onConnectionChanged leaves status.LiveKit stuck at Degraded after an
auto-reconnect because the change.Connected path returns early. Update the
reconnect flow so that when the SDK reports a successful reconnect, the status
is restored to Connected (or equivalent recovery state) before returning, and
keep the degraded transition only for the reconnecting case; use the
onConnectionChanged handler and the status.LiveKit updates as the key
touchpoints.
In `@Resgrid.Relay.Engine/Voice/RecordMode.cs`:
- Around line 55-68: `RecordMode` leaves `status.LiveKit` stuck in `Degraded`
after an SDK reconnect because the `VoiceConnectionStateChange` handler returns
immediately on `change.Connected == true` and never restores the state. Update
the reconnect recovery path in the `handler` inside `RecordMode` so that when
the connection becomes connected again it sets `status.LiveKit` back to
`ConnectionState.Connected` before returning, while still preserving the
existing `reconnecting` degraded handling and hard-disconnect restart behavior.
- Around line 84-86: The disconnect path in RecordMode should not throw during a
concurrent shutdown; mirror the RadioMode cancellation guard. In the logic that
awaits Task.WhenAny for VoiceModeRuntime.WaitForCancellationAsync and
disconnect.Task, only throw the InvalidOperationException when
cancellationToken.IsCancellationRequested is false, so a Ctrl+C/SIGTERM shutdown
can complete cleanly without a spurious fault.
---
Nitpick comments:
In `@Resgrid.Relay.Engine/Services/RelayServiceBase.cs`:
- Around line 28-31: The base constructor in RelayServiceBase is calling the
virtual IsLiveKitMode property, which can invoke derived overrides before the
derived constructor has finished. Remove the virtual dispatch from the
constructor by moving the decision into a non-virtual initializer, a constructor
parameter, or a static/helper method used by RelayServiceBase, so the telemetry
selection no longer depends on override behavior during construction.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 802dd7a2-6b94-4576-a289-3fe7aede0078
📒 Files selected for processing (19)
Resgrid.Audio.Relay.Console/Program.csResgrid.Audio.Voice.Tests/RelayResilienceTests.csResgrid.Audio.Voice.Tests/Resgrid.Audio.Voice.Tests.csprojResgrid.Relay.Engine/Configuration/RelayHostOptions.csResgrid.Relay.Engine/Properties/AssemblyInfo.csResgrid.Relay.Engine/Services/DispatchRelayService.csResgrid.Relay.Engine/Services/RadioRelayService.csResgrid.Relay.Engine/Services/RecordRelayService.csResgrid.Relay.Engine/Services/RelayServiceBase.csResgrid.Relay.Engine/Telemetry/IRelayModeTelemetry.csResgrid.Relay.Engine/Telemetry/NullRelayModeTelemetry.csResgrid.Relay.Engine/Telemetry/RelayModeTelemetry.csResgrid.Relay.Engine/Telemetry/SentryRelayModeTelemetry.csResgrid.Relay.Engine/Voice/DispatchVoiceMode.csResgrid.Relay.Engine/Voice/RadioMode.csResgrid.Relay.Engine/Voice/RecordMode.csdeploy/systemd/README.mddeploy/systemd/relay.env.exampledeploy/systemd/resgrid-relay.service
| public static IRelayModeTelemetry Create(RelayTelemetryOptions telemetry, ILogger logger) => | ||
| string.IsNullOrWhiteSpace(telemetry?.Sentry?.Dsn) | ||
| ? NullRelayModeTelemetry.Instance | ||
| : new SentryRelayModeTelemetry(telemetry.Sentry, telemetry.Environment, logger); |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major
🧩 Analysis chain
🌐 Web query:
In Sentry .NET 6.4.1, is SentrySdk.Init(...)process-global, and what happens if it is called multiple times and each returnedIDisposable is later disposed independently in the same process?
💡 Result:
In Sentry.NET 6.4.1, SentrySdk.Init is effectively process-global [1][2]. The Sentry.NET SDK is designed to be initialized only once during the lifetime of an application [2]. When SentrySdk.Init is called multiple times, the following behaviors occur: 1. Initialization Conflict: Calling Init multiple times is not officially supported and can lead to runtime errors, silent failures, or unexpected behavior [3][4][5]. For example, issues have been reported where double-initialization triggers exceptions (e.g., ArgumentException regarding duplicate dictionary keys in middleware) or IO errors if features like caching are enabled [4][5]. 2. Disposable Behavior: The IDisposable returned by SentrySdk.Init is intended to be used to shut down the SDK and flush queued events [6][7][8]. If you call Init multiple times, each call typically returns a new IDisposable. Disposing of these independently in the same process can cause the SDK to be "disabled" or "shut down" prematurely [9][10]. Once the underlying Hub is disabled by a disposal, subsequent attempts to capture data using the SDK may fail or throw ObjectDisposedException [10]. In most modern.NET applications, you do not need to manually dispose of the IDisposable returned by SentrySdk.Init, as the SDK includes integrations (like AppDomainProcessExitIntegration) that handle flushing and shutdown automatically when the application exits [6][7][9]. Manual disposal is recommended only in specific edge cases, such as when those automatic integrations are disabled, or you need to perform unmonitored work after shutting down Sentry [6][9].
Citations:
- 1: https://forum.sentry.io/t/accessing-and-ensuring-global-shared-scope-in-net/11709
- 2: Add SentrySdk.Init() only once for the entire application lifetime getsentry/sentry-dotnet#446
- 3: Duplicate invocations of Sentry SDK initialization conflict and fail silently getsentry/sentry-dotnet#832
- 4: bug: on 3.35.0 or newer, .NET 6 Web api breaks if sentry is initialised twice getsentry/sentry-dotnet#3042
- 5: Errors when initializing Sentry twice with caching enabled getsentry/sentry-dotnet#2033
- 6: https://github.com/getsentry/sentry-dotnet/blob/main/src/Sentry/SentrySdk.cs
- 7: https://docs.sentry.io/platforms/dotnet/configuration/draining/
- 8: https://docs.sentry.io/platforms/dotnet/migration/
- 9: Capture open transactions on disabled hubs getsentry/sentry-dotnet#2319
- 10: SentrySdk.Init should not return a
IDisposablegetsentry/sentry-dotnet#1023
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Map the relevant files first.
git ls-files 'Resgrid.Relay.Engine/Telemetry/*' 'Resgrid.Relay.Engine/Services/*'
echo '--- outline: RelayModeTelemetry.cs ---'
ast-grep outline Resgrid.Relay.Engine/Telemetry/RelayModeTelemetry.cs --view expanded || true
echo '--- outline: SentryRelayModeTelemetry.cs ---'
ast-grep outline Resgrid.Relay.Engine/Telemetry/SentryRelayModeTelemetry.cs --view expanded || true
echo '--- outline: RelayServiceBase.cs ---'
ast-grep outline Resgrid.Relay.Engine/Services/RelayServiceBase.cs --view expanded || true
echo '--- relevant search ---'
rg -n "SentrySdk\.Init|Dispose\(|new SentryRelayModeTelemetry|RelayModeTelemetry\.Create|class SentryRelayModeTelemetry|class NullRelayModeTelemetry" Resgrid.Relay.Engine/Telemetry Resgrid.Relay.Engine/ServicesRepository: Resgrid/Relay
Length of output: 1987
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Read only the small relevant slices.
sed -n '1,220p' Resgrid.Relay.Engine/Telemetry/SentryRelayModeTelemetry.cs
printf '\n---\n'
sed -n '1,180p' Resgrid.Relay.Engine/Telemetry/RelayModeTelemetry.cs
printf '\n---\n'
sed -n '1,220p' Resgrid.Relay.Engine/Services/RelayServiceBase.csRepository: Resgrid/Relay
Length of output: 13140
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Read only the small relevant slices.
sed -n '1,220p' Resgrid.Relay.Engine/Telemetry/SentryRelayModeTelemetry.cs
printf '\n---\n'
sed -n '1,180p' Resgrid.Relay.Engine/Telemetry/RelayModeTelemetry.cs
printf '\n---\n'
sed -n '1,220p' Resgrid.Relay.Engine/Services/RelayServiceBase.csRepository: Resgrid/Relay
Length of output: 13140
Share one Sentry SDK lifecycle for LiveKit services. The LiveKit path creates a new SentryRelayModeTelemetry per service, but that type calls SentrySdk.Init and later disposes the returned handle. Since Sentry’s SDK is process-wide, one service can shut down telemetry for another. Move init/flush/dispose to a single shared instance instead of per-service ownership.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Resgrid.Relay.Engine/Telemetry/RelayModeTelemetry.cs` around lines 13 - 16,
The Create method in RelayModeTelemetry currently returns a new
SentryRelayModeTelemetry per service, which causes each instance to own
SentrySdk.Init and disposal. Refactor SentryRelayModeTelemetry so Sentry SDK
initialization, flush, and disposal are managed by a single shared process-wide
instance instead of per-service ownership, and update Create to reuse that
shared telemetry instance for all LiveKit services.
| // Signals a hard LiveKit disconnect so the run throws and the resilience layer | ||
| // rejoins; a transient SDK "reconnecting" only degrades status. | ||
| var disconnect = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously); | ||
| EventHandler<VoiceConnectionStateChange> onConnectionChanged = (_, change) => | ||
| { | ||
| if (change.Connected) | ||
| return; | ||
| if (string.Equals(change.Reason, "reconnecting", StringComparison.OrdinalIgnoreCase)) | ||
| { | ||
| if (status != null) | ||
| status.LiveKit = ConnectionState.Degraded; | ||
| return; | ||
| } | ||
| disconnect.TrySetResult(); | ||
| }; | ||
| session.ConnectionChanged += onConnectionChanged; | ||
|
|
There was a problem hiding this comment.
🩺 Stability & Availability | 🟡 Minor
🧩 Analysis chain
🌐 Web query:
LiveKit .NET SDK ConnectionChanged event Reconnected state fires with Connected true
💡 Result:
In the LiveKit ecosystem, the LiveKit.NET SDK (often used via the Livekit.Rtc package for real-time media) uses a ConnectionState enumeration to track the status of a room connection [1][2]. Regarding your observation that the connection appears to transition back to a "Connected" state (or indicates Connected = true) during a reconnection event, this is standard behavior for LiveKit SDKs [3]. 1. ConnectionState Enumeration: The SDK typically reports states such as Disconnected, Connecting, Reconnecting, and Connected [3][4][5]. 2. Reconnection Logic: When a connection is interrupted (e.g., due to network instability), the SDK enters a Reconnecting state [6][3]. If the SDK successfully performs a "quick reconnect" to resume the existing session, it transitions the state back to Connected [3]. 3. Why this happens: The Reconnected state you are referring to is often not a separate connection state but a conceptual outcome where the client successfully resumed the existing session without requiring a full re-join [6][3]. Because the session was successfully resumed, the official ConnectionState is Connected [3]. If you are monitoring ConnectionStateChanged events, you should expect to see the state transition from Connected to Reconnecting when the link drops, and then back to Connected once the connection is restored [3][5]. This indicates that the client has successfully regained its active session [6][3].
Citations:
- 1: https://github.com/pabloFuente/livekit-server-sdk-dotnet/blob/main/LivekitRtc/README.md
- 2: https://github.com/pabloFuente/livekit-server-sdk-dotnet
- 3: https://livekit-client-sdk-swift.mintlify.app/guides/reconnection
- 4: https://docs.livekit.io/reference/client-sdk-js/enums/ConnectionState.html
- 5: https://context7.com/livekit/client-sdk-unity/llms.txt
- 6: https://docs.livekit.io/reference/client-sdk-android/livekit-android-sdk/io.livekit.android.events/-room-event/index.html
🏁 Script executed:
# Inspect the target file and related LiveKit status updates
set -e
git ls-files 'Resgrid.Relay.Engine/Voice/DispatchVoiceMode.cs' 'Resgrid.Relay.Engine/Voice/*.cs' | sed -n '1,40p'
echo '---'
wc -l Resgrid.Relay.Engine/Voice/DispatchVoiceMode.cs
echo '---'
cat -n Resgrid.Relay.Engine/Voice/DispatchVoiceMode.cs | sed -n '1,220p'
echo '---'
rg -n "LiveKit\s*=\s*ConnectionState\.(Connected|Degraded|Disconnected|Connecting)|ConnectionChanged|reconnecting|Publish|status\." Resgrid.Relay.Engine/Voice -nRepository: Resgrid/Relay
Length of output: 14040
🏁 Script executed:
# Search for broader status transitions and any reconnect handling elsewhere
set -e
rg -n "ConnectionState\.Connected|ConnectionState\.Degraded|ConnectionState\.Disconnected|ConnectionState\.Connecting|reconnecting|ConnectionChanged" Resgrid.Relay.Engine -g '*.cs'Repository: Resgrid/Relay
Length of output: 4823
🏁 Script executed:
# Inspect the target file and related LiveKit status updates
set -e
git ls-files 'Resgrid.Relay.Engine/Voice/DispatchVoiceMode.cs' 'Resgrid.Relay.Engine/Voice/*.cs' | sed -n '1,40p'
echo '---'
wc -l Resgrid.Relay.Engine/Voice/DispatchVoiceMode.cs
echo '---'
cat -n Resgrid.Relay.Engine/Voice/DispatchVoiceMode.cs | sed -n '1,220p'
echo '---'
rg -n "LiveKit\s*=\s*ConnectionState\.(Connected|Degraded|Disconnected|Connecting)|ConnectionChanged|reconnecting|Publish|status\." Resgrid.Relay.Engine/Voice -nRepository: Resgrid/Relay
Length of output: 14040
🏁 Script executed:
# Search for broader status transitions and any reconnect handling elsewhere
set -e
rg -n "ConnectionState\.Connected|ConnectionState\.Degraded|ConnectionState\.Disconnected|ConnectionState\.Connecting|reconnecting|ConnectionChanged" Resgrid.Relay.Engine -g '*.cs'Repository: Resgrid/Relay
Length of output: 4823
Restore status.LiveKit on reconnect.
reconnecting correctly sets the pill to Degraded, but the early return on change.Connected never clears it. If no new calls are published after auto-reconnect, the status stays degraded even though LiveKit is healthy.
♻️ Reset to Connected on recovery
if (change.Connected)
- return;
+ {
+ if (status != null)
+ status.LiveKit = ConnectionState.Connected;
+ return;
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Signals a hard LiveKit disconnect so the run throws and the resilience layer | |
| // rejoins; a transient SDK "reconnecting" only degrades status. | |
| var disconnect = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously); | |
| EventHandler<VoiceConnectionStateChange> onConnectionChanged = (_, change) => | |
| { | |
| if (change.Connected) | |
| return; | |
| if (string.Equals(change.Reason, "reconnecting", StringComparison.OrdinalIgnoreCase)) | |
| { | |
| if (status != null) | |
| status.LiveKit = ConnectionState.Degraded; | |
| return; | |
| } | |
| disconnect.TrySetResult(); | |
| }; | |
| session.ConnectionChanged += onConnectionChanged; | |
| // Signals a hard LiveKit disconnect so the run throws and the resilience layer | |
| // rejoins; a transient SDK "reconnecting" only degrades status. | |
| var disconnect = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously); | |
| EventHandler<VoiceConnectionStateChange> onConnectionChanged = (_, change) => | |
| { | |
| if (change.Connected) | |
| { | |
| if (status != null) | |
| status.LiveKit = ConnectionState.Connected; | |
| return; | |
| } | |
| if (string.Equals(change.Reason, "reconnecting", StringComparison.OrdinalIgnoreCase)) | |
| { | |
| if (status != null) | |
| status.LiveKit = ConnectionState.Degraded; | |
| return; | |
| } | |
| disconnect.TrySetResult(); | |
| }; | |
| session.ConnectionChanged += onConnectionChanged; |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Resgrid.Relay.Engine/Voice/DispatchVoiceMode.cs` around lines 56 - 72,
`DispatchVoiceMode` leaves `status.LiveKit` stuck at Degraded after a transient
reconnect because the `onConnectionChanged` handler returns immediately when
`change.Connected` is true. Update the `ConnectionChanged` logic so the same
handler that sets Degraded on the "reconnecting" reason also restores
`status.LiveKit` to Connected once the session reports a healthy connected
state, using the existing `disconnect`, `status`, and `onConnectionChanged`
flow.
| EventHandler<VoiceConnectionStateChange> onConnectionChanged = (_, change) => | ||
| { | ||
| if (change.Connected) | ||
| return; | ||
| if (string.Equals(change.Reason, "reconnecting", StringComparison.OrdinalIgnoreCase)) | ||
| { | ||
| if (status != null) | ||
| status.LiveKit = ConnectionState.Degraded; | ||
| return; | ||
| } | ||
| disconnected = true; | ||
| try { disconnectCts.Cancel(); } | ||
| catch (ObjectDisposedException) { } | ||
| }; |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win
status.LiveKit stays Degraded after a successful auto-reconnect.
On reconnecting the handler degrades status, but when the SDK reconnects (change.Connected == true) it returns early (Line 88-89) without restoring Connected. Unlike dispatch mode there's no later success path that resets it, so the radio bridge will report Degraded permanently after the first transient reconnect.
♻️ Reset to Connected on recovery
if (change.Connected)
- return;
+ {
+ if (status != null)
+ status.LiveKit = ConnectionState.Connected;
+ return;
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| EventHandler<VoiceConnectionStateChange> onConnectionChanged = (_, change) => | |
| { | |
| if (change.Connected) | |
| return; | |
| if (string.Equals(change.Reason, "reconnecting", StringComparison.OrdinalIgnoreCase)) | |
| { | |
| if (status != null) | |
| status.LiveKit = ConnectionState.Degraded; | |
| return; | |
| } | |
| disconnected = true; | |
| try { disconnectCts.Cancel(); } | |
| catch (ObjectDisposedException) { } | |
| }; | |
| EventHandler<VoiceConnectionStateChange> onConnectionChanged = (_, change) => | |
| { | |
| if (change.Connected) | |
| { | |
| if (status != null) | |
| status.LiveKit = ConnectionState.Connected; | |
| return; | |
| } | |
| if (string.Equals(change.Reason, "reconnecting", StringComparison.OrdinalIgnoreCase)) | |
| { | |
| if (status != null) | |
| status.LiveKit = ConnectionState.Degraded; | |
| return; | |
| } | |
| disconnected = true; | |
| try { disconnectCts.Cancel(); } | |
| catch (ObjectDisposedException) { } | |
| }; |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Resgrid.Relay.Engine/Voice/RadioMode.cs` around lines 86 - 99, The
connection-state handler in RadioMode’s onConnectionChanged leaves
status.LiveKit stuck at Degraded after an auto-reconnect because the
change.Connected path returns early. Update the reconnect flow so that when the
SDK reports a successful reconnect, the status is restored to Connected (or
equivalent recovery state) before returning, and keep the degraded transition
only for the reconnecting case; use the onConnectionChanged handler and the
status.LiveKit updates as the key touchpoints.
| EventHandler<VoiceConnectionStateChange> handler = (_, change) => | ||
| { | ||
| if (change.Connected) | ||
| return; | ||
| // "reconnecting" is the SDK auto-recovering — degrade but keep running. | ||
| if (string.Equals(change.Reason, "reconnecting", StringComparison.OrdinalIgnoreCase)) | ||
| { | ||
| if (status != null) | ||
| status.LiveKit = ConnectionState.Degraded; | ||
| return; | ||
| } | ||
| // Any other Connected=false is a hard disconnect — restart the recorder. | ||
| disconnect.TrySetResult(change.Reason); | ||
| }; |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win
status.LiveKit not restored to Connected after reconnect.
Same pattern as the other voice modes: a reconnecting transition sets Degraded (Line 62-63) but the change.Connected == true recovery path returns early (Line 57-58) without resetting, and record mode has no later success path to clear it.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Resgrid.Relay.Engine/Voice/RecordMode.cs` around lines 55 - 68, `RecordMode`
leaves `status.LiveKit` stuck in `Degraded` after an SDK reconnect because the
`VoiceConnectionStateChange` handler returns immediately on `change.Connected ==
true` and never restores the state. Update the reconnect recovery path in the
`handler` inside `RecordMode` so that when the connection becomes connected
again it sets `status.LiveKit` back to `ConnectionState.Connected` before
returning, while still preserving the existing `reconnecting` degraded handling
and hard-disconnect restart behavior.
| var completed = await Task.WhenAny(VoiceModeRuntime.WaitForCancellationAsync(cancellationToken), disconnect.Task).ConfigureAwait(false); | ||
| if (completed == disconnect.Task) | ||
| throw new InvalidOperationException($"LiveKit session disconnected ({disconnect.Task.Result}); restarting recorder"); |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win
Guard the disconnect throw against concurrent user shutdown.
RadioMode only throws when !cancellationToken.IsCancellationRequested, but here a server-side disconnect that races with a Ctrl+C/SIGTERM shutdown can make WhenAny resolve to disconnect.Task, throwing InvalidOperationException during a clean shutdown and surfacing a spurious fault/Degraded status in the resilience layer. Mirror the radio-mode guard.
♻️ Prefer cancellation over disconnect
var completed = await Task.WhenAny(VoiceModeRuntime.WaitForCancellationAsync(cancellationToken), disconnect.Task).ConfigureAwait(false);
- if (completed == disconnect.Task)
+ if (completed == disconnect.Task && !cancellationToken.IsCancellationRequested)
throw new InvalidOperationException($"LiveKit session disconnected ({disconnect.Task.Result}); restarting recorder");📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| var completed = await Task.WhenAny(VoiceModeRuntime.WaitForCancellationAsync(cancellationToken), disconnect.Task).ConfigureAwait(false); | |
| if (completed == disconnect.Task) | |
| throw new InvalidOperationException($"LiveKit session disconnected ({disconnect.Task.Result}); restarting recorder"); | |
| var completed = await Task.WhenAny(VoiceModeRuntime.WaitForCancellationAsync(cancellationToken), disconnect.Task).ConfigureAwait(false); | |
| if (completed == disconnect.Task && !cancellationToken.IsCancellationRequested) | |
| throw new InvalidOperationException($"LiveKit session disconnected ({disconnect.Task.Result}); restarting recorder"); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Resgrid.Relay.Engine/Voice/RecordMode.cs` around lines 84 - 86, The
disconnect path in RecordMode should not throw during a concurrent shutdown;
mirror the RadioMode cancellation guard. In the logic that awaits Task.WhenAny
for VoiceModeRuntime.WaitForCancellationAsync and disconnect.Task, only throw
the InvalidOperationException when cancellationToken.IsCancellationRequested is
false, so a Ctrl+C/SIGTERM shutdown can complete cleanly without a spurious
fault.
| using var sigTermRegistration = PosixSignalRegistration.Create(PosixSignal.SIGTERM, context => | ||
| { | ||
| context.Cancel = true; | ||
| cancellationTokenSource.Cancel(); | ||
| }); |
There was a problem hiding this comment.
PlatformNotSupportedException thrown at startup on Windows because PosixSignalRegistration.Create(PosixSignal.SIGTERM) runs without a platform guard and sits outside the try block in Main, causing an unhandled exception since the Engine targets net10.0-windows. Guard the registration with OperatingSystem.IsLinux() or OperatingSystem.IsMacOS() so the SIGTERM handler only registers on supported platforms.
PosixSignalRegistration sigTermRegistration = null;
if (OperatingSystem.IsLinux() || OperatingSystem.IsMacOS())
{
sigTermRegistration = PosixSignalRegistration.Create(PosixSignal.SIGTERM, context =>
{
context.Cancel = true;
cancellationTokenSource.Cancel();
});
}Prompt for LLM
File Resgrid.Audio.Relay.Console/Program.cs:
Line 93 to 97:
WHAT: PosixSignalRegistration.Create(PosixSignal.SIGTERM) is called without a platform guard and sits outside any try/catch, but SIGTERM is not a supported PosixSignal on Windows — .NET throws PlatformNotSupportedException. WHY: The Engine targets net10.0-windows, so running the CLI on Windows now crashes at startup (the throw is before the try block, so Main receives an unhandled exception). HOW: Guard the registration with OperatingSystem.IsLinux() (or wrap in try/catch PlatformNotSupportedException) so the SIGTERM handler only registers on platforms that support it.
Suggested Code:
PosixSignalRegistration sigTermRegistration = null;
if (OperatingSystem.IsLinux() || OperatingSystem.IsMacOS())
{
sigTermRegistration = PosixSignalRegistration.Create(PosixSignal.SIGTERM, context =>
{
context.Cancel = true;
cancellationTokenSource.Cancel();
});
}
Talk to Kody by mentioning @kody
Was this suggestion helpful? React with 👍 or 👎 to help Kody learn from this interaction.
| session.ConnectionChanged += onConnectionChanged; | ||
|
|
||
| using var tts = new ResgridTtsClient(options.Tts, logger); | ||
|
|
||
| // TTS reachability is unverified until the first synthesis actually reaches the service: |
There was a problem hiding this comment.
Event handler leak on the error path: the ConnectionChanged subscription at line 71 sits outside the try block starting at line 92, so an exception during setup (new ResgridTtsClient or callsApi.GetActiveCallsAsync) bypasses the finally at line 177, leaving the handler attached and publisher.DisposeAsync() uncalled. Move the session.ConnectionChanged += onConnectionChanged subscription inside the existing try block so the finally always detaches and disposes.
using var tts = new ResgridTtsClient(options.Tts, logger);
// ...
var pollSeconds = Math.Max(5, options.DispatchVoice.PollSeconds);
logger.Information(...);
try
{
session.ConnectionChanged += onConnectionChanged;
while (!cancellationToken.IsCancellationRequested)
{Prompt for LLM
File Resgrid.Relay.Engine/Voice/DispatchVoiceMode.cs:
Line 71 to 75:
WHAT: The `ConnectionChanged` handler is subscribed at line 71 (outside the `try` block that starts at line 92), so an exception during setup (lines 73-90 — `new ResgridTtsClient` or `callsApi.GetActiveCallsAsync`, a network call) skips the `finally` at line 177, leaving the handler attached and the publisher undisposed. WHY: on a transient API failure during backlog prime, the event handler leaks onto the session and `publisher.DisposeAsync()` is never called on that error path. HOW: move the `session.ConnectionChanged += onConnectionChanged` subscription inside the existing `try` block, or extend the `try` to start immediately after the subscription so the `finally` always detaches and disposes.
Suggested Code:
using var tts = new ResgridTtsClient(options.Tts, logger);
// ...
var pollSeconds = Math.Max(5, options.DispatchVoice.PollSeconds);
logger.Information(...);
try
{
session.ConnectionChanged += onConnectionChanged;
while (!cancellationToken.IsCancellationRequested)
{
Talk to Kody by mentioning @kody
Was this suggestion helpful? React with 👍 or 👎 to help Kody learn from this interaction.
Summary
This PR hardens the relay's long-lived LiveKit voice modes (radio, record, dispatch) and adds proper systemd support for the CLI.
Graceful shutdown under systemd / Docker
SIGTERMhandling in the console host, sosystemctl stopanddocker stoptrigger the same clean teardown path as Ctrl+C instead of killing the process mid-work.Resilience layer for LiveKit voice modes
RelayServiceBasethat wraps the run for LiveKit modes. Faulted runs are automatically restarted after a jittered, capped delay; after a configurable number of consecutive quick failures the circuit opens and the service faults.ResilienceOptions.Hard disconnect detection and reconnection
Sentry telemetry for mode lifecycle
IRelayModeTelemetry) with Sentry-backed and no-op implementations. Mode start, retry, fault, and stop events are reported to Sentry when a DSN is configured.Tests