fix(logs-cleanup): listing active workspaces into mem + download time streaming lims by icecrasher321 · Pull Request #4692 · simstudioai/sim

icecrasher321 · 2026-05-21T03:46:40Z

Summary

WIP

Fixes #(issue)

Type of Change

Testing

How has this been tested? What should reviewers focus on?

Checklist

Code follows project style guidelines
Self-reviewed my changes
Tests added/updated and passing
No new warnings introduced
I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

Screenshots/Videos

… streaming lims

vercel · 2026-05-21T03:46:45Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
docs	Ready	Preview, Comment	May 21, 2026 3:50am

cursor · 2026-05-21T03:46:49Z

PR Summary

Medium Risk
Touches multiple upload/download and tool proxy routes to enforce size limits and new 411/413 behaviors; incorrect thresholds or limit propagation could break legitimate large-file workflows. Also changes retention job dispatch chunking/concurrency which could affect cleanup coverage or scheduling if paging logic is wrong.

Overview
Adds a shared stream-limits utility (PayloadSizeLimitError, content-length preflight, streamed read-to-buffer/text/json helpers) and wires it through many routes to cap downloads/uploads and fail fast (often 411 when Content-Length is missing and 413 on oversized payloads).

/api/files/parse now validates file-reference shape (rejects data:/inline/binary-ish strings), limits multi-file requests (schema caps array to 10), enforces per-file download limits with a combined 5MB parsed-output cap across multi-file parsing, and propagates maxBytes into StorageService.downloadFile/external fetch streaming reads.

Multipart upload and CSV import endpoints (/api/files/upload, /api/v1/files, /api/table/*import*) now enforce request-body size via content-length, read File bodies with limits before materializing, and standardize oversized responses to 413 (CSV max reduced to 25MB). Tool proxies (image, TTS, video, DocuSign) similarly add bounded response parsing/buffering and return 413 on oversized upstream payloads; DocuSign document download can now optionally store the downloaded document as an execution file and return a file output.

Background/infra changes: retention cleanup dispatch now pages active workspaces to avoid loading all into memory, adds per-job concurrencyKey, and supports inline chunk execution; log cleanup’s retained-reference query is now scoped by workspace IDs. Execution logs are compacted before DB write by summarizing oversized trace IO/output to stay under a 500KB cap, with new metadata fields describing truncation.

^{Reviewed by Cursor Bugbot for commit f79c50f. Configure here.}

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit f79c50f. Configure here.}

cursor · 2026-05-21T03:53:59Z

+      for (const [idx, ws] of planChunks.entries()) {
        chunks.push({
-          plan: housekeepingPlan,
-          workspaceIds: [],


Cleanup chunk labels collide across pagination pages

Low Severity

The paginated buildCleanupChunks generates chunk labels independently per page, so the same plan appearing across multiple pages produces duplicate labels. For example, if page 1 yields one "free" chunk and page 2 also yields one "free" chunk, both get label: 'free'. The old non-paginated code processed all workspaces at once, ensuring unique labels. This causes duplicate inline job IDs (inline:${jobType}:${payload.label}) and ambiguous log entries.

^{Reviewed by Cursor Bugbot for commit f79c50f. Configure here.}

cursor · 2026-05-21T03:53:59Z

+        runGlobalHousekeeping: true,
+      })
+    }
+  }


Housekeeping chunk always created separately, never piggybacks on existing

Low Severity

The refactored housekeeping logic always creates a new synthetic chunk with empty workspaceIds and runGlobalHousekeeping: true. The old code first checked if a chunk for the housekeeping plan already existed and would mark that existing chunk instead, avoiding a redundant job. Now an extra housekeeping-only chunk is always dispatched even when plan chunks already exist, creating unnecessary job overhead.

^{Reviewed by Cursor Bugbot for commit f79c50f. Configure here.}

greptile-apps · 2026-05-21T03:55:59Z

Greptile Summary

This PR introduces two broad improvements: (1) a stream-limits utility that enforces byte caps on all HTTP response and storage downloads throughout the codebase, and (2) a paginated workspace loading strategy in the cleanup dispatcher to avoid loading all active workspaces into memory at once.

Stream size enforcement: A new readStreamToBufferWithLimit / readResponseToBufferWithLimit family replaces bare response.arrayBuffer() calls across file parsing, tool execution, DocuSign, Google Slides, S3/Blob downloads, and HTTP tool requests. Multi-file parse output is also capped at 5 MB with a sequential loop replacing the previous parallel Promise.all.
Execution log compaction: A three-tier compaction strategy (summarize trace I/O → strip I/O → metadata-only) is applied before writing to workflow_execution_logs, capped at 500 KB, and the DB update now runs inside a transaction with explicit statement/lock/idle timeouts.
Cleanup dispatcher refactor: listActiveWorkspaceCleanupScopeRows is replaced by a cursor-paginated loop (WORKSPACE_SCOPE_PAGE_SIZE = 500), and the inline runner path is promoted to run before Trigger.dev.

Confidence Score: 3/5

The stream-limit and log-compaction changes are safe, but the cleanup dispatcher has behavioral regressions in its scheduled dispatch path.

The refactored dispatcher inverts execution priority so in-process inline runs before Trigger.dev, changes job-queue enqueuing from parallel to sequential, and makes chunk labels non-unique across pagination pages — all in scheduled code that directly controls how log retention jobs are distributed.

apps/sim/lib/billing/cleanup-dispatcher.ts requires the most attention: the inline-before-Trigger.dev ordering, non-unique page-scoped chunk labels, and sequential enqueue loop all need review before this ships to production.

Important Files Changed

Filename	Overview
apps/sim/lib/billing/cleanup-dispatcher.ts	Refactored to paginate workspace loading with cursor-based iteration. Introduces precedence inversion (inline runner now runs before Trigger.dev), non-unique chunk labels across pages, sequential job-queue enqueuing, and always-separate housekeeping chunk.
apps/sim/background/cleanup-logs.ts	Now selects workspaceId in the batch query and passes a deduplicated workspaceIds set to the reference check to scope the SQL scan to relevant rows only. workspaceId is NOT NULL in the schema, so the array is always non-empty. Logic looks correct.
apps/sim/lib/core/utils/stream-limits.ts	New utility providing size-bounded stream reading for both Web Streams and Node.js streams. Has a minor double-consume risk in readResponseToBufferWithLimit when body is null and arrayBuffer() returns empty, then text() is called.
apps/sim/lib/logs/execution/logger.ts	Adds multi-tier execution data compaction capped at 500 KB, wraps the DB update in a transaction with statement/lock/idle timeouts, and redacts workflowInput before storage.
apps/sim/app/api/files/parse/route.ts	Switches multi-file parsing from parallel Promise.all to sequential loop with aggregate output-size budget (5 MB), adds file reference shape validation, and threads download/parse size limits through all handlers.
apps/sim/tools/index.ts	Adds 10 MB response body size limit, introduces shouldRetryWithoutReadingBody to skip consuming bodies that will be retried, and refactors error response parsing.
apps/sim/lib/uploads/providers/blob/client.ts	Replaces the private streamToBuffer helper with readNodeStreamToBufferWithLimit, adding optional maxBytes enforcement both on the Content-Length header and during streaming.
apps/sim/lib/uploads/providers/s3/client.ts	Adds maxBytes parameter to downloadFromS3, checks ContentLength early, streams with readNodeStreamToBufferWithLimit.
apps/sim/app/api/tools/docusign/route.ts	Adds size limits (25 MB document, 2 MB JSON), structured error extraction, upload-to-storage path for downloaded documents when execution context is present.
apps/sim/tools/google_slides/export_presentation.ts	Exports are now stored as execution files when context is available (up to 10 MB); falls back to legacy base64 inline for smaller exports when context is absent.
apps/sim/lib/core/security/input-validation.server.ts	Replaces the generic Error thrown when Content-Length exceeds maxResponseBytes with PayloadSizeLimitError, and returns a null-body response for retryable statuses instead of rejecting.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[dispatchCleanupJobs] --> B[buildCleanupChunks paginated loop]
    B --> C{more pages?}
    C -- yes --> D[listActiveWorkspaceCleanupScopeRowsPage afterCursor]
    D --> E[resolvePlanTypes for page rows]
    E --> F[push non-enterprise chunks per page]
    F --> G[push enterprise chunks per page]
    G --> C
    C -- no --> H[append housekeeping empty chunk]
    H --> I{shouldExecuteInline?}
    I -- yes NEW --> J[run inline sequentially return early]
    I -- no --> K{isTriggerAvailable?}
    K -- yes --> L[batchTrigger with concurrencyKey]
    K -- no --> M[jobQueue sequential loop no runner]

Comments Outside Diff (4)

apps/sim/lib/billing/cleanup-dispatcher.ts, line 181-193 (link)

Chunk labels are not globally unique across pages

When there are more active workspaces than WORKSPACE_SCOPE_PAGE_SIZE (500), each page independently computes planChunks and assigns indices starting at 1. Two pages that each produce multiple free-plan chunks will both emit labels like free/1 and free/2. In the inline path, these duplicate labels are pushed as inline:${jobType}:free/1, making jobIds non-unique and log correlation ambiguous. The old code computed indices over the full workspace list and produced globally unique labels.
apps/sim/lib/billing/cleanup-dispatcher.ts, line 291-308 (link)

Fallback job-queue enqueuing changed from parallel to sequential

The old code used Promise.allSettled(chunks.map(...enqueue...)) to fire all enqueue requests concurrently. The new code iterates sequentially with await inside a for loop. For deployments with many cleanup chunks — e.g. thousands of enterprise workspaces — this multiplies dispatch latency by the number of chunks and holds the caller for the full duration. Restoring parallelism (or at least batching) would avoid this regression for large deployments.
apps/sim/lib/billing/cleanup-dispatcher.ts, line 210-226 (link)

Global housekeeping now unconditionally creates an extra empty chunk

The previous logic searched for an existing plan chunk and set runGlobalHousekeeping = true on it, only synthesizing a new empty chunk when none existed. The refactored code always appends a separate empty synthetic chunk with runGlobalHousekeeping = true, even when real workspace chunks for the same plan already exist. For Trigger.dev dispatch this means an extra job is triggered on every run. While the correctness is preserved, the waste is worth confirming as intentional.
apps/sim/lib/core/utils/stream-limits.ts, line 529-545 (link)

arrayBuffer() then text() on the same response object

When !response.body && response.arrayBuffer is true and arrayBuffer() returns a zero-length buffer, the code falls through and calls response.text(). For real Response objects, calling arrayBuffer() marks bodyUsed = true; a subsequent text() call throws TypeError: body stream already read. This path is reachable on responses with a null body stream (e.g., 204, 304) in environments that implement the Fetch spec strictly. If the intent is just an empty-body fallback, returning the empty buffer directly (or using the preferTextFallback path) is safer.

_{Reviews (1): Last reviewed commit: "fix merge conflict" | Re-trigger Greptile}

greptile-apps · 2026-05-21T03:56:02Z

+  const inlineRunner = shouldExecuteInline() ? await buildCleanupRunner(jobType) : undefined
+  if (inlineRunner) {
+    let succeeded = 0
+    let failed = 0
+
+    for (const payload of chunks) {
+      try {
+        await inlineRunner(payload, new AbortController().signal)
+        jobIds.push(`inline:${jobType}:${payload.label}`)
+        succeeded++
+      } catch (error) {
+        failed++
+        logger.error(`[${jobType}] Inline cleanup chunk failed:`, {
+          plan: payload.plan,
+          label: payload.label,
+          error,
+        })
+      }
+    }
+
+    logger.info(`[${jobType}] Inline cleanup chunks: ${succeeded} succeeded, ${failed} failed`)
+    return { jobIds, jobCount: jobIds.length, chunkCount: chunks.length, workspaceCount }
+  }
+
  if (isTriggerAvailable()) {


Inline runner now takes precedence over Trigger.dev

The execution order was reversed: inline execution is now checked and returned from before isTriggerAvailable(), whereas the old code checked Trigger.dev first and only fell back to the inline/job-queue path when Trigger.dev was unavailable. In any environment where both shouldExecuteInline() and isTriggerAvailable() return true simultaneously (e.g., database async backend with TRIGGER_SECRET_KEY set), cleanup jobs will now run synchronously in-process rather than through Trigger.dev, losing distributed concurrency control, visibility, and retry semantics.

icecrasher321 added 2 commits May 20, 2026 20:39

fix(logs-cleanup): listing active workspaces into mem + download time…

9c91cd8

… streaming lims

fix merge conflict

f79c50f

vercel Bot deployed to Preview May 21, 2026 03:50 View deployment

cursor Bot reviewed May 21, 2026

View reviewed changes

greptile-apps Bot reviewed May 21, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(logs-cleanup): listing active workspaces into mem + download time streaming lims#4692

fix(logs-cleanup): listing active workspaces into mem + download time streaming lims#4692
icecrasher321 wants to merge 2 commits into
stagingfrom
improvement/cleanup-logs-mem-reads

icecrasher321 commented May 21, 2026

Uh oh!

vercel Bot commented May 21, 2026 •

edited

Loading

Uh oh!

cursor Bot commented May 21, 2026 •

edited

Loading

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot May 21, 2026

Uh oh!

cursor Bot May 21, 2026

Uh oh!

greptile-apps Bot commented May 21, 2026 •

edited

Loading

Comments Outside Diff (4)

Uh oh!

greptile-apps Bot May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

icecrasher321 commented May 21, 2026

Summary

Type of Change

Testing

Checklist

Screenshots/Videos

Uh oh!

vercel Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cursor Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Summary

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot May 21, 2026

Choose a reason for hiding this comment

Cleanup chunk labels collide across pagination pages

Uh oh!

cursor Bot May 21, 2026

Choose a reason for hiding this comment

Housekeeping chunk always created separately, never piggybacks on existing

Uh oh!

greptile-apps Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Flowchart

Comments Outside Diff (4)

Uh oh!

greptile-apps Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vercel Bot commented May 21, 2026 •

edited

Loading

cursor Bot commented May 21, 2026 •

edited

Loading

greptile-apps Bot commented May 21, 2026 •

edited

Loading