2024.9.x fix workflow error surfacing#2320
Open
Toocky wants to merge 19 commits into
Open
Conversation
When the resolution loop stalls, report every unresolved step individually with its kind, ref, ID, and the specific unresolved child/parent step IDs it is waiting for, instead of a single generic 'failed to resolve workflow step dependencies' error pointing at an arbitrary step.
Issue cortezaproject#1905: error step now supports optional title, body and severity arguments alongside the legacy message argument. Fields are attached to the returned KindAutomation error via the errors package meta bag under the workflow.error.* namespace. The errors package http.go is updated to preserve full error details (message + meta) for KindAutomation errors regardless of mask state, since workflow error steps are user-authored and explicitly safe to surface to the client. Backwards compatible: workflows configured with only 'message' produce identical wire output. Frontend rendering changes land separately.
Issue cortezaproject#1905: expose the new rich error step arguments (title, body, severity) alongside the legacy message field in the error step configurator. The canvas node label now prefers title > body > message when building its preview. Config.arguments is normalised on load so all four targets are always present (with empty exprs), keeping v-model bindings stable even for legacy workflows that only have message.
Issue cortezaproject#1905: detect the namespaced workflow.error.{title,body,severity} meta keys attached by the backend error step and render a rich toast with title/body/variant when present. The detection probes a handful of error-shape variants (direct reject, axios wrap, nested response.data.error) and returns null for anything that does not look like a workflow error step payload. When null, or when the rich rendering branch throws, the handler falls through to the existing legacy plain-toast path - so non-workflow errors and legacy workflow errors render exactly as they do today.
Issue cortezaproject#1725: workflow validation errors previously omitted which argument or result caused the failure, making them hard to debug. - parseExpressionsAs now wraps each parser/type/test failure with the category (argument vs result), the target name, the failing expression (truncated to 80 chars), and the lifecycle stage. - requiredArg includes the accepted type(s) when reporting a missing argument so designers know what to put in the field. - checkArg includes the list of accepted types when the value type is wrong, instead of just naming the unexpected type. - workflowStepDefConv wrapper messages now lead with the step kind and ref so the offending step is identifiable at a glance. All existing test fixtures continue to work since the wrapped errors remain *errors.Internal under the hood.
Issue cortezaproject#687: the workflow save handler previously crashed with 'Cannot read property workflowID of undefined' when the backend was unreachable, and masked trigger-update failures behind a misleading 'Make sure all trigger steps are properly configured' message. The handler now: - Catches trigger create/update/delete failures separately and surfaces the underlying error rather than the configure-triggers red herring. - Wraps each API stage so a network failure short-circuits with a clear 'Unable to reach the server' toast. - Defensively guards against an unexpected workflowCreate / workflowUpdate response shape so saved.workflowID never throws. - Distinguishes network errors (axios timeouts, ENOTFOUND, etc.) from backend-returned errors and renders the appropriate message.
Backend - errors/http.go bypass is now narrowed to KindAutomation errors that carry an explicit workflow.error.safe = true meta flag (set only by convErrorStep). Generic automation errors from corredor etc. keep going through the normal masking path. Flag constant lives in errors.MetaWorkflowErrorSafe so both sides share one source of truth. - exprExcerpt now truncates on rune boundaries so non-ASCII identifiers and literals can never split mid-codepoint in error messages. - Dead parseExpressions wrapper removed; only parseExpressionsAs remains. Frontend - Workflow save handler: isNetworkError now requires *positive* evidence (axios err.code, err.request && !err.response, or an explicit Network Error message). Plain JS errors thrown from inside .then() handlers no longer get misreported as "server unreachable". processingSave is cleared in finally so an unexpected sync throw can never leave the spinner stuck. - Error step configurator: severity dropdown only renders when the severity expression is a literal we can round-trip; otherwise a raw text input is shown with a "reset to literal" affordance so custom expressions (vars.level etc.) are never clobbered on re-render. Node label preview strips literal quotes for a cleaner canvas label. - toast mixin: added a comment explaining that body/title are workflow-author controlled and that BootstrapVue toasts render them as plain text — switching to HTML rendering here without sanitising would introduce a stored-XSS surface. - Removed dead notification:configure-triggers locale key.
State.results was doing double duty: it held both the previous step's output values AND the error handler's variable-name mapping (error -> myErr, errorMessage -> myMsg, errorStepID -> myID). The error-handler case merged the mapping into results, but the normal-step case blew results away with a plain assignment — so the first non-error step after an error handler registration clobbered the mapping, and any later error left errorMessage / error / errorStepID undefined in the downstream scope. This matched the reported "works for a while then stops working" and "Log Message can't find errorMessage" symptoms, and explains why the bug clustered around subworkflows and iterators (both of which have multi-step paths after the error handler where a normal step is guaranteed to execute before an error surfaces). Fix: split the two concerns onto separate State fields. - Add State.errHandlerResults holding only the name mapping. - Carry it across states in State.Next() alongside errHandler, same lifecycle (set together, cleared together). - setErrorHandlerResultsToScope now reads errHandlerResults. - Normal step path leaves the mapping untouched. - errHandlerResults is cleared alongside errHandler when a handler activates, so a subsequent handler cannot inherit stale mappings.
The original design split the error step body from the legacy message argument so authors could theoretically write a terse flat fallback and a longer rich toast body. In practice nobody would ever write two different strings and the extra field just added confusing UI surface for zero real benefit. message is now the single source of truth: required, used as both the flat error string and the rich toast content. title and severity remain as optional rich-toast decorations. Backwards compatible with legacy workflows that only configured message, since the flat string is still what it always was. - Backend convErrorStep drops body/metaBody entirely; flat message is now just the evaluated message arg. - Backend verifyStep drops body from the allowed args. - Frontend configurator drops the Body form-group and bodyArg. - toast.js mixin no longer reads workflow.error.body; the flat err.message is the body. - Locale keys for error-step.body removed and message/title labels updated to reflect the simplified semantics.
In makeGraph(), the iterator step requires both its next and exit child steps to already exist in the graph, and the join gateway requires all its parent steps to already exist. When an iterator's exit path leads to a join gateway with multiple parents, neither can resolve first — the resolution loop stalls and reports a generic 'failed to resolve workflow step dependencies' error. Fix: - Add an exported PathAdder interface and AddPath method on joinGateway so parents can be registered after construction (mutex-guarded, deduped). - Change convGateway 'join' case to create the gateway eagerly with whatever parents are already resolved, instead of returning nil, nil. This breaks the cycle: the join exists, the iterator finds its exit step, the iterator resolves, the iterator branch resolves, and the missing parents become available. - Add a backfill pass after the main resolution loop that walks every join gateway and registers any parents that resolved later via PathAdder.AddPath. The runtime check at joinGateway.Exec (paths.Contains(r.Parent)) still guards against unknown parents — the backfill pass runs synchronously to completion before any session executes so the graph is always fully wired before runtime.
joinGateway.Exec previously started its merge from paths[0]'s scope only and merged in every parent's immediate step results on top. Any variable set mid-branch in paths[1..N] lived in that branch's scope but was never in the final step's results, so it was silently dropped at the join. Whichever branch happened to be paths[0] worked; the others lost their intermediate variables. This matched the reported "variables don't always pass through the joined flow" symptom and explained why authors worked around it by moving Base64-style steps out of parallel forks entirely. Fix: merge every parent's scope (not just paths[0]'s), then merge every parent's results on top, and do both in reverse path order so that paths[0] wins conflicts deterministically. This preserves the intra-branch rule that step results overwrite earlier scope values and gives a clear precedence hierarchy: paths[0].results > ... > paths[N].results > paths[0].scope > ... > paths[N].scope No behaviour change for workflows that previously relied on paths[0] winning (that's now the explicit rule). Variables that used to be silently dropped from non-paths[0] branches are now preserved.
When two parallel branches produce conflicting values for the same variable, the merge still proceeds with paths[0] winning but the colliding branches are now reported as non-fatal warnings. - Frame gains a Warnings []string field alongside the existing Action/Error so the editor test panel and admin session log can render them in their existing per-step display. - State gains a matching warnings slice that MakeFrame copies into the emitted frame. - A new responseWithWarnings wrapper response lets a step return its normal *expr.Vars payload together with a list of warnings. The session unwraps it on dispatch, attaches warnings to the current State (for frame emission) and also mirrors each warning through zap.Warn so ops can grep the server log independently of the editor. - joinGateway.Exec runs a detectConflicts pass before merging. For every key seen across multiple branches it compares stringified values; mismatches produce a warning naming the key, the contributing branches/origins (scope vs results), and the clipped values. The merge then proceeds with paths[0] winning as before. Workflows with no conflicts get zero behaviour change — the warning path only kicks in when detectConflicts returns a non-empty slice.
Introduces a design-time analyzer that walks the workflow graph and warns when two or more parallel branches joining at the same gateway can write the same variable name. Runs during validateWorkflow alongside the existing graph conversion so warnings are populated on every save without requiring a test run. Output lives in a new Workflow.Warnings field (parallel to Issues, same WorkflowIssueSet shape) so the frontend can render warnings with the existing per-step culprit mapping and badge logic. What the analyzer considers a "branch write": - Function/Iterator/ExecWorkflow step: its declared Results targets. - Expressions step: its Arguments targets (the assignment LHS). - ErrorHandler step: its Results targets (author-chosen error var names). - Everything else (Error, Termination, Break, Continue, Debug, Delay, Prompt, Visual, plain Gateway): no scope writes. Subworkflow walking is intentionally NOT recursive. A subworkflow's effect on the calling scope is exactly the ExecWorkflow step's own declared results (see convExecWorkflowStep which Evals s.Results against the subworkflow return value). The static declaration is the full contract, so there's nothing to gain from walking inside. Dynamic writes (vars[computed] = ...) are not detected. Documented as a known limitation in the analyzer source. The analyzer is non-fatal by construction — it never adds to Issues and never blocks save/run. Pair with the runtime detectConflicts warning in joinGateway.Exec to catch what static analysis missed.
Mirrors the existing issues pattern for the new workflow.warnings payload (populated by the backend analyzer) and for the runtime Frame.Warnings emitted by joinGateway.Exec during a test run. - New this.warnings map populated from workflow.warnings, keyed by stepID, identical shape and assembly to this.issues. - hasWarnings computed + yellow header badge next to the existing red issues badge when wf.warnings is non-empty. - Yellow warning triangle icon (new public/icons/warning.svg, same silhouette as issue.svg with a warning-yellow fill) rendered on the canvas node when a step has warnings but no issues. Issues take precedence on the badge position since they are more severe. - Clicking the warning icon opens the shared issues modal which now has two sections (Issues in red, Warnings in yellow). The modal only renders a section if that list is non-empty. - Locale keys editor:detected-warnings and editor:warnings added.
When the step resolution loop stalls, a step may have no unresolved child or parent neighbours. In that case it is not actually waiting on anything — its own conversion is failing (bad ref, missing required argument, unknown function, etc.) and verifyStep or the converter error path has already recorded the specific reason in wfii. Appending a "waiting for [] and/or []" entry on top adds noise that actively misleads the user into looking for a dependency problem that does not exist. The stall reporter now skips any unresolved step whose unresolved neighbour lists are both empty; the pre-existing issues on that step are left to speak for themselves.
Review C2: replace fmt.Sprintf("%v", tv.Get()) as the detectConflicts
fingerprint with encoding/json marshalling. %v prints maps in random
iteration order and pointers as addresses, producing both false
positives (same map compared unequal because keys swapped) and false
negatives (different pointers compared equal because both printed
their address only). encoding/json sorts object keys on output so
structurally equal values serialise identically. Failed marshals
fall back to %v, still consistent within a single run.
Review C3: collectBranchWrites now stops at any join gateway other
than the home join being analyzed. In nested parallels the outer
branch's upstream chain passes through the inner join -> the inner
fork -> the outer branch; walking through the inner join used to
bleed outer-branch writes into every inner branch and flagged
spurious conflicts. The inner join's own analysis handles its
incoming branches independently so we must not cross it.
Review C4: drop iterator from stepWrites. Iterator loop vars (item,
counter, isFirst, ...) are injected into the body scope per
iteration and never survive the iterator's exit path, so two parallel
iterators sharing the idiomatic "item" target are not actually in
conflict at the downstream join. Steps inside the iterator body
still get walked normally via parent edges.
Review M1: warning messages no longer refer to the paths[i]
abstraction that authors cannot correlate with the canvas. They now
name the winning branch by its parent step ID and list each
contributing branch by its parent step ID, origin (scope/results)
and value fingerprint.
Review m5: truncateForWarning is now rune-aware so non-ASCII values
cannot be split mid-codepoint.
Review m11: isNetworkError recognises EAI_AGAIN (DNS transient) and
axios >= 1.x ERR_NETWORK generic network codes.
Review m13: drop the redundant .Wrap(err) on errors.Internal calls
in workflowStepDefConv. parseExpressionsAs already wraps its inner
error through fmt.Errorf("%w") so the outer message already contains
the full chain; the extra .Wrap doubled the inner error in both the
formatted text and the wrap chain.
Review m14: verifyStep now validates severity as a save-time error
when the expression is a simple quoted literal ("error", "warning",
"info"). Non-literal expressions (e.g. vars.level) fall through and
are normalised at runtime as before. Catches authors typing e.g.
"danger" without running the workflow first.
Review M3: Error.vue computed getters (messageArg/titleArg/severityArg) no longer mutate item.config.arguments. They were previously calling findOrCreateArg, which pushed a new arg onto the array and invoked this.\$set — a side effect inside a computed getter that happened to work today only because created() pre-populated every target and the create branch was never hit at runtime. Any future filter on the arguments array would re-enter the mutating getter during reactivity graph evaluation. The getters are now pure findArg + makeArg fallback reads; mutation is hoisted into a new ensureArgs() method called from created() and implicitly by findOrCreateArg for write-path callers. Review M8: when workflowCreate/workflowUpdate returns an unexpected shape without a workflowID, the save handler now console.warns the raw payload before toasting. The toast still shows a user-facing message but the malformed payload is no longer silently discarded, so the path is actually diagnosable from the browser console. Review m2: document the concurrency model of State.warnings instead of guarding it. Each *State is owned by a single step-execution goroutine for its entire lifecycle — Session.exec and the corresponding eventHandler run on the same goroutine — and sibling parallel branches use distinct *State values. No cross-state sharing means no mutex is required, just the documentation of the invariant so future code doesn't accidentally cross-post. Review m9 (no change, verified): the title-only severity default allegedly mismatches between Go default and frontend variant. Verified this is not a real mismatch. Backend attaches workflow.error.severity meta whenever title or severity is non-default; when only title is set, severity falls through as "error", which the frontend maps to variant "danger". When neither is set, no meta attaches and the legacy plain-toast path runs. All three cases are consistent.
Review M6: add Test_writeHttpJSON_workflowErrorSafeBypass to lock the
four-cell matrix for KindAutomation error masking so the opt-in
MetaWorkflowErrorSafe bypass semantics cannot be silently changed by
future refactors. The test exercises:
1. automation + safe flag + masked → meta preserved
2. automation + no flag + masked → meta stripped
3. non-automation + safe flag + masked → meta stripped
(flag is a no-op on non-automation kinds — defensive
regression check against a future author bypassing the
kind gate)
4. automation + safe flag + unmasked → meta preserved
All four pass. The flat message is always visible in all cases,
matching the pre-existing contract that Error.Error() is always
sent to the client even under mask.
Review M2: add a prominent doc block on State.errHandlerResults
explaining the intentional non-propagation to the running scope.
An Expressions step downstream of an error handler that references
a to-be-populated error variable name will not see it until the
error actually fires and the injection path runs. This is by
design — silently seeding author-chosen variable names into scope
would pollute the namespace and make debugging harder (a nil from
a reference cannot be distinguished from a typo). Documenting the
contract so future readers do not "fix" it into scope pollution.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The following changes are implemented
Changes in the user interface:
Extended workflow Error Step

More meaning full errors:

Checklist when submitting a final (!draft) PR