2024.9.x fix workflow error surfacing by Toocky · Pull Request #2320 · cortezaproject/corteza

Toocky · 2026-04-11T23:55:38Z

The following changes are implemented

Improvement - Workflow errors #2319 Placing an iterator inside a parallel fork/join produced a bogus failed to resolve workflow step dependencies error and the workflow couldn't be saved or run. Iterator conversion required its next and exit children in the graph first; join gateway required all parents in the graph first. When an iterator's exit path led to a join gateway, they formed a circular dependency and the resolution loop stalled.

Fix: Join gateway is now created eagerly with whatever parents are already resolved; a new PathAdder interface + joinGateway.AddPath method lets the converter backfill missing parents in a second pass after the main resolution loop finishes. Breaks the cycle without affecting runtime correctness.

Improvement - Workflow errors #2319 After an error handler was registered, author-chosen variables like errorMessage / errStep became unavailable to downstream steps the moment any non-erroring step ran between the handler registration and the actual error. Manifested as "works initially, stops working after some time/uses", clustered in subworkflows and inside exclusive/inclusive gateways.

Fix: Split the two concerns. Added a new State.errHandlerResults field holding only the name mapping, carried through State.Next() alongside errHandler with the same lifecycle (set together, cleared together). results goes back to its proper role of only holding the previous step's output.

Improvement - Workflow errors #2319 Any variable set by a step inside a parallel branch could silently vanish at the join if that branch wasn't paths[0].

Fix: Merge every parent's full scope (not just paths[0]'s), then merge every parent's results on top, both in reverse path order so paths[0] deterministically wins conflicts. Gives a clean precedence hierarchy: paths[0].results > … > paths[N].results > paths[0].scope > … > paths[N].scope.

Improvement - Workflow errors #2319 Previously, when makeGraph stalled it emitted a single "failed to resolve workflow step dependencies" error attached to whatever step happened to be iterated last. The user couldn't tell which step was stuck or why

Fix: When the loop stalls, every unresolved step is now reported individually with its kind, ref, ID, and the specific unresolved child/parent step IDs it is waiting for.

Show appropriate error from workflow editor when backend is unaccessible #687 Saving a workflow when the backend was unreachable produced Cannot read property 'workflowID' of undefined in the browser console. And when the real cause was a trigger update failure, the user saw the misleading "Make sure all trigger steps are properly configured" message even if the actual error was totally unrelated.

Fix: Wrapped each save stage in its own try/catch. Network errors produce "Unable to reach the server. Check your connection and try again." with positive-evidence detection (axios err.code matching ECONN|ETIMEDOUT|ENETUNREACH|ENOTFOUND, or err.request && !err.response, or explicit message regex). Trigger update failures surface the actual underlying error instead of the generic configure-triggers red herring. Defensive guard against unexpected saved.workflowID shapes. processingSave = false moved into finally.

Add more information to workflow errors by including which parameter causes the error #1725 Expression parse failures returned bare parser output like "syntax error near X" with no indication of which argument or result failed. Type mismatches named the wrong type but not what was accepted. Missing required args didn't say what type was expected.

Fix: New parseExpressionsAs wrapper takes a category ("argument" / "result") and enriches parse failures with the target name, the failing expression (rune-aware truncated to 80 chars), and which lifecycle stage produced the error (parse / type / test parse). requiredArg now says "expected type: X or Y" when reporting a missing argument. checkArg lists the accepted types when the value type is wrong. workflowStepDefConv wrapper messages now lead with the step kind and ref so the offending step is identifiable at a glance.

Improve error message display in workflows #1905 The error step could only carry a single flat message. End users saw generic red toasts with no title, no severity distinction, no structure.

Fix: Added optional title and severity (error/warning/info) arguments to the error step alongside the existing required message. Runtime attaches them to the resulting KindAutomation error via the errors package's meta bag under a namespaced workflow.error.* key, gated by a specific workflow.error.safe = true meta flag so only user-authored workflow errors (not every KindAutomation error) bypass production masking. Frontend's shared toastErrorHandler mixin detects the meta keys, renders rich toasts with title/variant/body, and falls back to the legacy plain-toast path for old workflows and non-workflow errors. Configurator gets dedicated Title/Message/Severity fields; severity dropdown falls back to a raw text input with a reset button when the severity expression is not a literal, so custom expressions (vars.level) aren't clobbered.

Changes in the user interface:

Extended workflow Error Step

More meaning full errors:

Checklist when submitting a final (!draft) PR

Commits are tidied up, squashed if needed and follow guidelines in CONTRIBUTING.md
Code builds
All existing tests pass
All new critical code is covered by tests
PR is linked to the relevant issue(s)
Rebased with the target branch

When the resolution loop stalls, report every unresolved step individually with its kind, ref, ID, and the specific unresolved child/parent step IDs it is waiting for, instead of a single generic 'failed to resolve workflow step dependencies' error pointing at an arbitrary step.

Issue cortezaproject#1905: error step now supports optional title, body and severity arguments alongside the legacy message argument. Fields are attached to the returned KindAutomation error via the errors package meta bag under the workflow.error.* namespace. The errors package http.go is updated to preserve full error details (message + meta) for KindAutomation errors regardless of mask state, since workflow error steps are user-authored and explicitly safe to surface to the client. Backwards compatible: workflows configured with only 'message' produce identical wire output. Frontend rendering changes land separately.

Issue cortezaproject#1905: expose the new rich error step arguments (title, body, severity) alongside the legacy message field in the error step configurator. The canvas node label now prefers title > body > message when building its preview. Config.arguments is normalised on load so all four targets are always present (with empty exprs), keeping v-model bindings stable even for legacy workflows that only have message.

Issue cortezaproject#1905: detect the namespaced workflow.error.{title,body,severity} meta keys attached by the backend error step and render a rich toast with title/body/variant when present. The detection probes a handful of error-shape variants (direct reject, axios wrap, nested response.data.error) and returns null for anything that does not look like a workflow error step payload. When null, or when the rich rendering branch throws, the handler falls through to the existing legacy plain-toast path - so non-workflow errors and legacy workflow errors render exactly as they do today.

Issue cortezaproject#1725: workflow validation errors previously omitted which argument or result caused the failure, making them hard to debug. - parseExpressionsAs now wraps each parser/type/test failure with the category (argument vs result), the target name, the failing expression (truncated to 80 chars), and the lifecycle stage. - requiredArg includes the accepted type(s) when reporting a missing argument so designers know what to put in the field. - checkArg includes the list of accepted types when the value type is wrong, instead of just naming the unexpected type. - workflowStepDefConv wrapper messages now lead with the step kind and ref so the offending step is identifiable at a glance. All existing test fixtures continue to work since the wrapped errors remain *errors.Internal under the hood.

Issue cortezaproject#687: the workflow save handler previously crashed with 'Cannot read property workflowID of undefined' when the backend was unreachable, and masked trigger-update failures behind a misleading 'Make sure all trigger steps are properly configured' message. The handler now: - Catches trigger create/update/delete failures separately and surfaces the underlying error rather than the configure-triggers red herring. - Wraps each API stage so a network failure short-circuits with a clear 'Unable to reach the server' toast. - Defensively guards against an unexpected workflowCreate / workflowUpdate response shape so saved.workflowID never throws. - Distinguishes network errors (axios timeouts, ENOTFOUND, etc.) from backend-returned errors and renders the appropriate message.

Backend - errors/http.go bypass is now narrowed to KindAutomation errors that carry an explicit workflow.error.safe = true meta flag (set only by convErrorStep). Generic automation errors from corredor etc. keep going through the normal masking path. Flag constant lives in errors.MetaWorkflowErrorSafe so both sides share one source of truth. - exprExcerpt now truncates on rune boundaries so non-ASCII identifiers and literals can never split mid-codepoint in error messages. - Dead parseExpressions wrapper removed; only parseExpressionsAs remains. Frontend - Workflow save handler: isNetworkError now requires *positive* evidence (axios err.code, err.request && !err.response, or an explicit Network Error message). Plain JS errors thrown from inside .then() handlers no longer get misreported as "server unreachable". processingSave is cleared in finally so an unexpected sync throw can never leave the spinner stuck. - Error step configurator: severity dropdown only renders when the severity expression is a literal we can round-trip; otherwise a raw text input is shown with a "reset to literal" affordance so custom expressions (vars.level etc.) are never clobbered on re-render. Node label preview strips literal quotes for a cleaner canvas label. - toast mixin: added a comment explaining that body/title are workflow-author controlled and that BootstrapVue toasts render them as plain text — switching to HTML rendering here without sanitising would introduce a stored-XSS surface. - Removed dead notification:configure-triggers locale key.

State.results was doing double duty: it held both the previous step's output values AND the error handler's variable-name mapping (error -> myErr, errorMessage -> myMsg, errorStepID -> myID). The error-handler case merged the mapping into results, but the normal-step case blew results away with a plain assignment — so the first non-error step after an error handler registration clobbered the mapping, and any later error left errorMessage / error / errorStepID undefined in the downstream scope. This matched the reported "works for a while then stops working" and "Log Message can't find errorMessage" symptoms, and explains why the bug clustered around subworkflows and iterators (both of which have multi-step paths after the error handler where a normal step is guaranteed to execute before an error surfaces). Fix: split the two concerns onto separate State fields. - Add State.errHandlerResults holding only the name mapping. - Carry it across states in State.Next() alongside errHandler, same lifecycle (set together, cleared together). - setErrorHandlerResultsToScope now reads errHandlerResults. - Normal step path leaves the mapping untouched. - errHandlerResults is cleared alongside errHandler when a handler activates, so a subsequent handler cannot inherit stale mappings.

The original design split the error step body from the legacy message argument so authors could theoretically write a terse flat fallback and a longer rich toast body. In practice nobody would ever write two different strings and the extra field just added confusing UI surface for zero real benefit. message is now the single source of truth: required, used as both the flat error string and the rich toast content. title and severity remain as optional rich-toast decorations. Backwards compatible with legacy workflows that only configured message, since the flat string is still what it always was. - Backend convErrorStep drops body/metaBody entirely; flat message is now just the evaluated message arg. - Backend verifyStep drops body from the allowed args. - Frontend configurator drops the Body form-group and bodyArg. - toast.js mixin no longer reads workflow.error.body; the flat err.message is the body. - Locale keys for error-step.body removed and message/title labels updated to reflect the simplified semantics.

In makeGraph(), the iterator step requires both its next and exit child steps to already exist in the graph, and the join gateway requires all its parent steps to already exist. When an iterator's exit path leads to a join gateway with multiple parents, neither can resolve first — the resolution loop stalls and reports a generic 'failed to resolve workflow step dependencies' error. Fix: - Add an exported PathAdder interface and AddPath method on joinGateway so parents can be registered after construction (mutex-guarded, deduped). - Change convGateway 'join' case to create the gateway eagerly with whatever parents are already resolved, instead of returning nil, nil. This breaks the cycle: the join exists, the iterator finds its exit step, the iterator resolves, the iterator branch resolves, and the missing parents become available. - Add a backfill pass after the main resolution loop that walks every join gateway and registers any parents that resolved later via PathAdder.AddPath. The runtime check at joinGateway.Exec (paths.Contains(r.Parent)) still guards against unknown parents — the backfill pass runs synchronously to completion before any session executes so the graph is always fully wired before runtime.

joinGateway.Exec previously started its merge from paths[0]'s scope only and merged in every parent's immediate step results on top. Any variable set mid-branch in paths[1..N] lived in that branch's scope but was never in the final step's results, so it was silently dropped at the join. Whichever branch happened to be paths[0] worked; the others lost their intermediate variables. This matched the reported "variables don't always pass through the joined flow" symptom and explained why authors worked around it by moving Base64-style steps out of parallel forks entirely. Fix: merge every parent's scope (not just paths[0]'s), then merge every parent's results on top, and do both in reverse path order so that paths[0] wins conflicts deterministically. This preserves the intra-branch rule that step results overwrite earlier scope values and gives a clear precedence hierarchy: paths[0].results > ... > paths[N].results > paths[0].scope > ... > paths[N].scope No behaviour change for workflows that previously relied on paths[0] winning (that's now the explicit rule). Variables that used to be silently dropped from non-paths[0] branches are now preserved.

When two parallel branches produce conflicting values for the same variable, the merge still proceeds with paths[0] winning but the colliding branches are now reported as non-fatal warnings. - Frame gains a Warnings []string field alongside the existing Action/Error so the editor test panel and admin session log can render them in their existing per-step display. - State gains a matching warnings slice that MakeFrame copies into the emitted frame. - A new responseWithWarnings wrapper response lets a step return its normal *expr.Vars payload together with a list of warnings. The session unwraps it on dispatch, attaches warnings to the current State (for frame emission) and also mirrors each warning through zap.Warn so ops can grep the server log independently of the editor. - joinGateway.Exec runs a detectConflicts pass before merging. For every key seen across multiple branches it compares stringified values; mismatches produce a warning naming the key, the contributing branches/origins (scope vs results), and the clipped values. The merge then proceeds with paths[0] winning as before. Workflows with no conflicts get zero behaviour change — the warning path only kicks in when detectConflicts returns a non-empty slice.

Introduces a design-time analyzer that walks the workflow graph and warns when two or more parallel branches joining at the same gateway can write the same variable name. Runs during validateWorkflow alongside the existing graph conversion so warnings are populated on every save without requiring a test run. Output lives in a new Workflow.Warnings field (parallel to Issues, same WorkflowIssueSet shape) so the frontend can render warnings with the existing per-step culprit mapping and badge logic. What the analyzer considers a "branch write": - Function/Iterator/ExecWorkflow step: its declared Results targets. - Expressions step: its Arguments targets (the assignment LHS). - ErrorHandler step: its Results targets (author-chosen error var names). - Everything else (Error, Termination, Break, Continue, Debug, Delay, Prompt, Visual, plain Gateway): no scope writes. Subworkflow walking is intentionally NOT recursive. A subworkflow's effect on the calling scope is exactly the ExecWorkflow step's own declared results (see convExecWorkflowStep which Evals s.Results against the subworkflow return value). The static declaration is the full contract, so there's nothing to gain from walking inside. Dynamic writes (vars[computed] = ...) are not detected. Documented as a known limitation in the analyzer source. The analyzer is non-fatal by construction — it never adds to Issues and never blocks save/run. Pair with the runtime detectConflicts warning in joinGateway.Exec to catch what static analysis missed.

Mirrors the existing issues pattern for the new workflow.warnings payload (populated by the backend analyzer) and for the runtime Frame.Warnings emitted by joinGateway.Exec during a test run. - New this.warnings map populated from workflow.warnings, keyed by stepID, identical shape and assembly to this.issues. - hasWarnings computed + yellow header badge next to the existing red issues badge when wf.warnings is non-empty. - Yellow warning triangle icon (new public/icons/warning.svg, same silhouette as issue.svg with a warning-yellow fill) rendered on the canvas node when a step has warnings but no issues. Issues take precedence on the badge position since they are more severe. - Clicking the warning icon opens the shared issues modal which now has two sections (Issues in red, Warnings in yellow). The modal only renders a section if that list is non-empty. - Locale keys editor:detected-warnings and editor:warnings added.

When the step resolution loop stalls, a step may have no unresolved child or parent neighbours. In that case it is not actually waiting on anything — its own conversion is failing (bad ref, missing required argument, unknown function, etc.) and verifyStep or the converter error path has already recorded the specific reason in wfii. Appending a "waiting for [] and/or []" entry on top adds noise that actively misleads the user into looking for a dependency problem that does not exist. The stall reporter now skips any unresolved step whose unresolved neighbour lists are both empty; the pre-existing issues on that step are left to speak for themselves.

Review C2: replace fmt.Sprintf("%v", tv.Get()) as the detectConflicts fingerprint with encoding/json marshalling. %v prints maps in random iteration order and pointers as addresses, producing both false positives (same map compared unequal because keys swapped) and false negatives (different pointers compared equal because both printed their address only). encoding/json sorts object keys on output so structurally equal values serialise identically. Failed marshals fall back to %v, still consistent within a single run. Review C3: collectBranchWrites now stops at any join gateway other than the home join being analyzed. In nested parallels the outer branch's upstream chain passes through the inner join -> the inner fork -> the outer branch; walking through the inner join used to bleed outer-branch writes into every inner branch and flagged spurious conflicts. The inner join's own analysis handles its incoming branches independently so we must not cross it. Review C4: drop iterator from stepWrites. Iterator loop vars (item, counter, isFirst, ...) are injected into the body scope per iteration and never survive the iterator's exit path, so two parallel iterators sharing the idiomatic "item" target are not actually in conflict at the downstream join. Steps inside the iterator body still get walked normally via parent edges. Review M1: warning messages no longer refer to the paths[i] abstraction that authors cannot correlate with the canvas. They now name the winning branch by its parent step ID and list each contributing branch by its parent step ID, origin (scope/results) and value fingerprint. Review m5: truncateForWarning is now rune-aware so non-ASCII values cannot be split mid-codepoint. Review m11: isNetworkError recognises EAI_AGAIN (DNS transient) and axios >= 1.x ERR_NETWORK generic network codes. Review m13: drop the redundant .Wrap(err) on errors.Internal calls in workflowStepDefConv. parseExpressionsAs already wraps its inner error through fmt.Errorf("%w") so the outer message already contains the full chain; the extra .Wrap doubled the inner error in both the formatted text and the wrap chain. Review m14: verifyStep now validates severity as a save-time error when the expression is a simple quoted literal ("error", "warning", "info"). Non-literal expressions (e.g. vars.level) fall through and are normalised at runtime as before. Catches authors typing e.g. "danger" without running the workflow first.

Review M3: Error.vue computed getters (messageArg/titleArg/severityArg) no longer mutate item.config.arguments. They were previously calling findOrCreateArg, which pushed a new arg onto the array and invoked this.\$set — a side effect inside a computed getter that happened to work today only because created() pre-populated every target and the create branch was never hit at runtime. Any future filter on the arguments array would re-enter the mutating getter during reactivity graph evaluation. The getters are now pure findArg + makeArg fallback reads; mutation is hoisted into a new ensureArgs() method called from created() and implicitly by findOrCreateArg for write-path callers. Review M8: when workflowCreate/workflowUpdate returns an unexpected shape without a workflowID, the save handler now console.warns the raw payload before toasting. The toast still shows a user-facing message but the malformed payload is no longer silently discarded, so the path is actually diagnosable from the browser console. Review m2: document the concurrency model of State.warnings instead of guarding it. Each *State is owned by a single step-execution goroutine for its entire lifecycle — Session.exec and the corresponding eventHandler run on the same goroutine — and sibling parallel branches use distinct *State values. No cross-state sharing means no mutex is required, just the documentation of the invariant so future code doesn't accidentally cross-post. Review m9 (no change, verified): the title-only severity default allegedly mismatches between Go default and frontend variant. Verified this is not a real mismatch. Backend attaches workflow.error.severity meta whenever title or severity is non-default; when only title is set, severity falls through as "error", which the frontend maps to variant "danger". When neither is set, no meta attaches and the legacy plain-toast path runs. All three cases are consistent.

Review M6: add Test_writeHttpJSON_workflowErrorSafeBypass to lock the four-cell matrix for KindAutomation error masking so the opt-in MetaWorkflowErrorSafe bypass semantics cannot be silently changed by future refactors. The test exercises: 1. automation + safe flag + masked → meta preserved 2. automation + no flag + masked → meta stripped 3. non-automation + safe flag + masked → meta stripped (flag is a no-op on non-automation kinds — defensive regression check against a future author bypassing the kind gate) 4. automation + safe flag + unmasked → meta preserved All four pass. The flat message is always visible in all cases, matching the pre-existing contract that Error.Error() is always sent to the client even under mask. Review M2: add a prominent doc block on State.errHandlerResults explaining the intentional non-propagation to the running scope. An Expressions step downstream of an error handler that references a to-be-populated error variable name will not see it until the error actually fires and the injection path runs. This is by design — silently seeding author-chosen variable names into scope would pollute the namespace and make debugging harder (a nil from a reference cannot be distinguished from a typo). Documenting the contract so future readers do not "fix" it into scope pollution.

Toocky added 19 commits April 11, 2026 10:28

update gitignore

19c7a63

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

2024.9.x fix workflow error surfacing#2320

2024.9.x fix workflow error surfacing#2320
Toocky wants to merge 19 commits into
cortezaproject:2024.9.xfrom
Toocky:2024.9.x-fix-workflow-error-surfacing

Toocky commented Apr 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Toocky commented Apr 11, 2026

The following changes are implemented

Changes in the user interface:

Checklist when submitting a final (!draft) PR

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant