feat: add AIConfigTracker, metrics & resumption tokens (AIC-2664) #174
ctawiah wants to merge 1 commit into
 * Metrics a caller extracts from an AI run, supplied to
 * {@link com.launchdarkly.sdk.server.ai.LDAIConfigTracker#trackMetricsOf}.
 */
public static final class Metrics {
This should be AIMetrics, to be consistent with the other SDKs and the spec.
try {
  metrics = metricsExtractor.apply(result);
} catch (RuntimeException e) {
  trackError();
This one is tricky and @mattrmc1 might have more to say here. In the other SDKs I did not throw the exception; I logged a warning that metrics could not be tracked and moved on. In general we try to avoid throwing exceptions whenever possible. Matt called out that since this is a user-provided function it makes sense to throw, which I can see being valid as well.
One thing, however: we should not call trackError here, since that indicates the AI call failed, not that metric extraction for that call failed.
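To make the distinction above concrete, here is a minimal sketch of the log-and-move-on variant. It is not the SDK's implementation: `MetricsOfSketch`, its `events` list, and the generic signature are hypothetical stand-ins chosen for illustration. An operation failure records an error and rethrows; an extractor failure only logs a warning and records nothing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.logging.Logger;

/** Hypothetical stand-in for the tracker; not the SDK's LDAIConfigTrackerImpl. */
final class MetricsOfSketch {
  private static final Logger LOG = Logger.getLogger(MetricsOfSketch.class.getName());

  /** Records what would have been emitted, so the behavior can be inspected. */
  final List<String> events = new ArrayList<>();

  void trackError() {
    events.add("error");
  }

  <T, M> T trackMetricsOf(Supplier<T> operation, Function<T, M> metricsExtractor) {
    T result;
    try {
      result = operation.get();
    } catch (RuntimeException e) {
      // The AI call itself failed: a genuine generation error.
      trackError();
      throw e;
    }

    M metrics;
    try {
      metrics = metricsExtractor.apply(result);
    } catch (RuntimeException e) {
      // The AI call succeeded; only the user-supplied extractor failed.
      // Warn and skip metric tracking, but do not record a generation error.
      LOG.warning("Could not extract AI metrics; skipping metric tracking: " + e);
      return result;
    }

    events.add("metrics:" + metrics);
    return result;
  }
}
```

The rethrow variant discussed in this thread would replace `return result;` in the second catch block with `throw e;`, still without calling `trackError()`.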
agent.getJudgeConfiguration(),
agent.getTools(),
TRACKER_FACTORY);
trackerFactory(key, null, null, agent.getModel(), agent.getProvider(), context));
We still pass the key in to trackers for defaults since that is the key you requested. It just won't have a variation.
3ace063 to dfc1386
…IC-2664)
Implements the AITRACK surface on LDAIConfigTracker: per-run UUID runId and track data, the full set of track methods (duration, time-to-first-token, success/error, feedback, tokens, tool calls, judge result) plus trackDurationOf and trackMetricsOf wrappers, and a metric summary. Record-once metrics use atomic claim-before-emit guards so exactly one event is produced under concurrency; tool-call and judge-result events are not once-only. Negative durations and token counts are clamped, and a null judge score is distinct from a legitimate 0.0.
Resumption tokens are URL-safe Base64 of canonical JSON in fixed key order (runId, configKey, variationKey, version, graphKey); variationKey is always emitted for cross-SDK parity and modelName/providerName are not carried. Decode strictly type-validates each field and rejects malformed/oversized tokens.
LDAIClientImpl now wires createTracker() on the config types to the real tracker and adds createTracker(token, context) to reconstruct a run across processes.
Co-authored-by: Cursor <cursoragent@cursor.com>
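The token format described in this commit message can be sketched roughly as follows. This is an illustration, not the PR's ResumptionTokens class: `ResumptionTokenSketch` and its hand-rolled JSON writer are hypothetical, and it assumes that `version` is an integer, that a missing `variationKey` is emitted as an empty string, and that `graphKey` is omitted when null. None of those three details is confirmed by the text above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical encoder sketch; field handling is assumed, not taken from the PR. */
final class ResumptionTokenSketch {

  static String encode(String runId, String configKey, String variationKey,
                       int version, String graphKey) {
    // Keys are written in one fixed order so every encoder yields identical bytes.
    StringBuilder json = new StringBuilder("{");
    json.append("\"runId\":").append(quote(runId));
    json.append(",\"configKey\":").append(quote(configKey));
    // Assumption: variationKey is always present, empty when there is no variation.
    json.append(",\"variationKey\":").append(quote(variationKey == null ? "" : variationKey));
    json.append(",\"version\":").append(version);
    // Assumption: graphKey is omitted rather than written as null.
    if (graphKey != null) {
      json.append(",\"graphKey\":").append(quote(graphKey));
    }
    json.append("}");
    byte[] utf8 = json.toString().getBytes(StandardCharsets.UTF_8);
    // URL-safe alphabet ('-' and '_'), with the trailing '=' padding dropped.
    return Base64.getUrlEncoder().withoutPadding().encodeToString(utf8);
  }

  /** Minimal JSON string escaping: quotes, backslashes, and control characters. */
  static String quote(String s) {
    StringBuilder out = new StringBuilder("\"");
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c == '"' || c == '\\') {
        out.append('\\').append(c);
      } else if (c < 0x20) {
        out.append(String.format("\\u%04x", (int) c));
      } else {
        out.append(c);
      }
    }
    return out.append('"').toString();
  }
}
```

Writing the JSON by hand keeps key order and whitespace under the encoder's control, which is what byte-for-byte parity across SDKs depends on; a general-purpose JSON serializer would need to be configured to give the same guarantee.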
19d0f4f to 2ca9fc8
Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 2ca9fc8.
  metrics = metricsExtractor.apply(result);
} catch (RuntimeException e) {
  trackError();
  throw e;
Extractor failure misreports generation error
Medium Severity
When trackMetricsOf’s user-supplied metrics extractor throws after the AI operation completes, the implementation calls trackError() and emits $ld:ai:generation:error. That event means the generation failed, but the model call already succeeded—only parsing or metric extraction failed—so dashboards can show false generation failures and the shared outcome guard blocks a later correct trackSuccess().
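The blocking effect of the shared outcome guard can be illustrated with a small sketch. `OutcomeGuardSketch` is hypothetical, not the PR's LDAIConfigTrackerImpl; the boolean return values are added only so the outcome is observable, and the success event name is assumed by analogy with `$ld:ai:generation:error`.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical model of a success/error pair that shares one record-once guard. */
final class OutcomeGuardSketch {
  // One flag for both outcomes: whichever call claims it first wins.
  private final AtomicBoolean outcomeRecorded = new AtomicBoolean(false);

  /** Event names that were actually emitted, in order. */
  final List<String> emitted = new CopyOnWriteArrayList<>();

  boolean trackSuccess() {
    // Assumption: success event name mirrors the error event name.
    return emitOnce("$ld:ai:generation:success");
  }

  boolean trackError() {
    return emitOnce("$ld:ai:generation:error");
  }

  private boolean emitOnce(String eventName) {
    // Claim before emit: only the caller that flips false -> true may emit.
    if (!outcomeRecorded.compareAndSet(false, true)) {
      return false;
    }
    emitted.add(eventName);
    return true;
  }
}
```

With this shape, a `trackError()` triggered by an extractor failure claims the guard, so a subsequent `trackSuccess()` for the same run returns without emitting anything.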


Requirements
Related issues
AIC-2664 — Step 4: AITRACK. Stacked on #173 (Step 3).
Describe the solution you've provided
Implements the AITRACK surface to spec, thread-safe by construction:
- LDAIConfigTracker gains the full method set: trackDuration, trackTimeToFirstToken, trackSuccess/trackError, trackFeedback, trackTokens, trackToolCall(s), trackJudgeResult, plus the trackDurationOf and trackMetricsOf wrappers, getTrackData(), getResumptionToken(), and getSummary(). Event names match the spec and the JS/Python SDKs.
- LDAIConfigTrackerImpl (internal): per-run UUID runId; record-once metrics use atomic claim-before-emit (AtomicBoolean) so exactly one event is emitted under concurrency (trackSuccess/trackError share one guard). Tool-call and judge-result events are not once-only; tool calls accumulate in a CopyOnWriteArrayList and getSummary().getToolCalls() returns an immutable snapshot.
- Negative durations and token counts are clamped, and a null judge score is distinct from a legitimate 0.0. trackMetricsOf records an error and rethrows on both operation and extractor failures.
- Resumption tokens (ResumptionTokens): URL-safe Base64 (no padding) of canonical JSON in fixed key order runId, configKey, variationKey, version, graphKey. variationKey is always emitted for cross-SDK parity; modelName/providerName are not carried (restored trackers report ""). Decoding strictly type-validates each field and rejects malformed or oversized (>4 KB) tokens.
- LDAIClientImpl now produces real trackers from createTracker() (a fresh runId per call) and adds LDAIClient.createTracker(token, context) to reconstruct a run across process boundaries. The placeholder NoOpAIConfigTracker is removed.
- Public value types live in LDAITrackingTypes (mirroring the LDAIConfigTypes pattern).
Tests
- trackMetricsOf rethrow + error, summary, and concurrency (N threads → exactly one once-only event, intact tool-call list).
- runId, each createTracker() starts a new run.
Out of scope (per ticket)
No Judge/Evaluator (Step 5), no AIGRAPH createGraphTracker, no provider-specific trackOpenAIMetrics/trackBedrockMetrics (post-1.0).
Made with Cursor
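The concurrency property listed under Tests (N threads, exactly one once-only event, intact tool-call list) can be exercised along these lines. This is a sketch against a hypothetical miniature tracker, `OnceOnlyRaceSketch`, not the PR's actual test or tracker; it only mirrors the AtomicBoolean and CopyOnWriteArrayList choices named in the description.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical miniature tracker plus a race harness; not the PR's test code. */
final class OnceOnlyRaceSketch {
  private final AtomicBoolean durationClaimed = new AtomicBoolean(false);
  final AtomicInteger durationEvents = new AtomicInteger();
  private final List<String> toolCalls = new CopyOnWriteArrayList<>();

  /** Once-only metric: only the thread that claims the flag emits. */
  void trackDuration(long millis) {
    if (durationClaimed.compareAndSet(false, true)) {
      durationEvents.incrementAndGet();
    }
  }

  /** Not once-only: every call is kept. */
  void trackToolCall(String name) {
    toolCalls.add(name);
  }

  /** Immutable copy, so callers cannot mutate the tracker's state. */
  List<String> toolCallSnapshot() {
    return List.copyOf(toolCalls);
  }

  /** Releases all workers at once so they contend on the same guard. */
  static OnceOnlyRaceSketch race(int threads) {
    OnceOnlyRaceSketch tracker = new OnceOnlyRaceSketch();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    CountDownLatch start = new CountDownLatch(1);
    for (int i = 0; i < threads; i++) {
      final int id = i;
      pool.execute(() -> {
        try {
          start.await();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        tracker.trackDuration(id);
        tracker.trackToolCall("tool-" + id);
      });
    }
    start.countDown();
    pool.shutdown();
    try {
      if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
        throw new IllegalStateException("workers did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for workers", e);
    }
    return tracker;
  }
}
```

A test would then assert that `durationEvents` is exactly 1 and that the snapshot holds one entry per thread.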
Note
Medium Risk
New public API surface and telemetry behavior on every AI config evaluation; resumption token parsing handles untrusted input with size limits and strict validation.
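For the untrusted-input handling mentioned here, the guards that run before any JSON parsing might look like the sketch below. `TokenGuardSketch` is hypothetical; it assumes the 4 KB limit applies to the encoded token string, which the PR text does not specify, and it stops short of the per-field type validation the PR describes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Optional;

/** Hypothetical pre-parse guards for a resumption token; not the PR's decoder. */
final class TokenGuardSketch {
  // Assumption: the size cap is measured on the encoded token, in characters.
  static final int MAX_TOKEN_CHARS = 4 * 1024;

  /** Returns the decoded JSON text, or empty if the token is rejected. */
  static Optional<String> decodeToJson(String token) {
    if (token == null || token.isEmpty() || token.length() > MAX_TOKEN_CHARS) {
      return Optional.empty();
    }
    byte[] raw;
    try {
      // The URL-safe decoder rejects '+' and '/' and accepts unpadded input.
      raw = Base64.getUrlDecoder().decode(token);
    } catch (IllegalArgumentException e) {
      return Optional.empty();
    }
    String json = new String(raw, StandardCharsets.UTF_8);
    // Cheap shape check; real field-level validation would follow JSON parsing.
    if (!json.startsWith("{") || !json.endsWith("}")) {
      return Optional.empty();
    }
    return Optional.of(json);
  }
}
```

Checking the length before decoding bounds the work done on hostile input, and returning an empty Optional rather than throwing keeps rejection on the normal control path.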
Overview
Replaces the placeholder no-op LDAIConfigTracker with a full AITRACK implementation that emits AI run metrics through LDClient.trackMetric, keyed by a per-run runId.
- LDAIConfigTracker is expanded from a stub to the full API: duration, time-to-first-token, success/error, feedback, tokens, tool calls, judge results, trackDurationOf/trackMetricsOf, getTrackData(), getResumptionToken(), and getSummary().
- LDAITrackingTypes adds the public value types (TokenUsage, Metrics, JudgeResult, etc.).
- LDAIConfigTrackerImpl sends spec-aligned $ld:ai:* events with shared correlation fields; record-once metrics use atomic guards for thread safety.
- ResumptionTokens encodes/decodes cross-process resumption (cross-SDK byte fixtures in tests).
- LDAIClientImpl wires real trackers (new UUID per createTracker() on configs) and adds LDAIClient.createTracker(token, context) for deferred events; NoOpAIConfigTracker is removed.
Reviewed by Cursor Bugbot for commit 19d0f4f. Bugbot is set up for automated code reviews on this repo.