Skip to content

fix(logs-cleanup): listing active workspaces into mem + download time streaming lims#4692

Open
icecrasher321 wants to merge 2 commits into
stagingfrom
improvement/cleanup-logs-mem-reads
Open

fix(logs-cleanup): listing active workspaces into mem + download time streaming lims#4692
icecrasher321 wants to merge 2 commits into
stagingfrom
improvement/cleanup-logs-mem-reads

Conversation

@icecrasher321
Copy link
Copy Markdown
Collaborator

Summary

WIP

Fixes #(issue)

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation
  • Other: ___________

Testing

How has this been tested? What should reviewers focus on?

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

Screenshots/Videos

@vercel
Copy link
Copy Markdown

vercel Bot commented May 21, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
docs Ready Ready Preview, Comment May 21, 2026 3:50am

Request Review

@cursor
Copy link
Copy Markdown

cursor Bot commented May 21, 2026

PR Summary

Medium Risk
Touches multiple upload/download and tool proxy routes to enforce size limits and new 411/413 behaviors; incorrect thresholds or limit propagation could break legitimate large-file workflows. Also changes retention job dispatch chunking/concurrency which could affect cleanup coverage or scheduling if paging logic is wrong.

Overview
Adds a shared stream-limits utility (PayloadSizeLimitError, content-length preflight, streamed read-to-buffer/text/json helpers) and wires it through many routes to cap downloads/uploads and fail fast (often 411 when Content-Length is missing and 413 on oversized payloads).

/api/files/parse now validates file-reference shape (rejects data:/inline/binary-ish strings), limits multi-file requests (schema caps array to 10), enforces per-file download limits with a combined 5MB parsed-output cap across multi-file parsing, and propagates maxBytes into StorageService.downloadFile/external fetch streaming reads.

Multipart upload and CSV import endpoints (/api/files/upload, /api/v1/files, /api/table/*import*) now enforce request-body size via content-length, read File bodies with limits before materializing, and standardize oversized responses to 413 (CSV max reduced to 25MB). Tool proxies (image, TTS, video, DocuSign) similarly add bounded response parsing/buffering and return 413 on oversized upstream payloads; DocuSign document download can now optionally store the downloaded document as an execution file and return a file output.

Background/infra changes: retention cleanup dispatch now pages active workspaces to avoid loading all into memory, adds per-job concurrencyKey, and supports inline chunk execution; log cleanup’s retained-reference query is now scoped by workspace IDs. Execution logs are compacted before DB write by summarizing oversized trace IO/output to stay under a 500KB cap, with new metadata fields describing truncation.

Reviewed by Cursor Bugbot for commit f79c50f. Configure here.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit f79c50f. Configure here.

for (const [idx, ws] of planChunks.entries()) {
chunks.push({
plan: housekeepingPlan,
workspaceIds: [],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleanup chunk labels collide across pagination pages

Low Severity

The paginated buildCleanupChunks generates chunk labels independently per page, so the same plan appearing across multiple pages produces duplicate labels. For example, if page 1 yields one "free" chunk and page 2 also yields one "free" chunk, both get label: 'free'. The old non-paginated code processed all workspaces at once, ensuring unique labels. This causes duplicate inline job IDs (inline:${jobType}:${payload.label}) and ambiguous log entries.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit f79c50f. Configure here.

runGlobalHousekeeping: true,
})
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Housekeeping chunk always created separately, never piggybacks on existing

Low Severity

The refactored housekeeping logic always creates a new synthetic chunk with empty workspaceIds and runGlobalHousekeeping: true. The old code first checked if a chunk for the housekeeping plan already existed and would mark that existing chunk instead, avoiding a redundant job. Now an extra housekeeping-only chunk is always dispatched even when plan chunks already exist, creating unnecessary job overhead.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit f79c50f. Configure here.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 21, 2026

Greptile Summary

This PR introduces two broad improvements: (1) a stream-limits utility that enforces byte caps on all HTTP response and storage downloads throughout the codebase, and (2) a paginated workspace loading strategy in the cleanup dispatcher to avoid loading all active workspaces into memory at once.

  • Stream size enforcement: A new readStreamToBufferWithLimit / readResponseToBufferWithLimit family replaces bare response.arrayBuffer() calls across file parsing, tool execution, DocuSign, Google Slides, S3/Blob downloads, and HTTP tool requests. Multi-file parse output is also capped at 5 MB with a sequential loop replacing the previous parallel Promise.all.
  • Execution log compaction: A three-tier compaction strategy (summarize trace I/O → strip I/O → metadata-only) is applied before writing to workflow_execution_logs, capped at 500 KB, and the DB update now runs inside a transaction with explicit statement/lock/idle timeouts.
  • Cleanup dispatcher refactor: listActiveWorkspaceCleanupScopeRows is replaced by a cursor-paginated loop (WORKSPACE_SCOPE_PAGE_SIZE = 500), and the inline runner path is promoted to run before Trigger.dev.

Confidence Score: 3/5

The stream-limit and log-compaction changes are safe, but the cleanup dispatcher has behavioral regressions in its scheduled dispatch path.

The refactored dispatcher inverts execution priority so in-process inline runs before Trigger.dev, changes job-queue enqueuing from parallel to sequential, and makes chunk labels non-unique across pagination pages — all in scheduled code that directly controls how log retention jobs are distributed.

apps/sim/lib/billing/cleanup-dispatcher.ts requires the most attention: the inline-before-Trigger.dev ordering, non-unique page-scoped chunk labels, and sequential enqueue loop all need review before this ships to production.

Important Files Changed

Filename Overview
apps/sim/lib/billing/cleanup-dispatcher.ts Refactored to paginate workspace loading with cursor-based iteration. Introduces precedence inversion (inline runner now runs before Trigger.dev), non-unique chunk labels across pages, sequential job-queue enqueuing, and always-separate housekeeping chunk.
apps/sim/background/cleanup-logs.ts Now selects workspaceId in the batch query and passes a deduplicated workspaceIds set to the reference check to scope the SQL scan to relevant rows only. workspaceId is NOT NULL in the schema, so the array is always non-empty. Logic looks correct.
apps/sim/lib/core/utils/stream-limits.ts New utility providing size-bounded stream reading for both Web Streams and Node.js streams. Has a minor double-consume risk in readResponseToBufferWithLimit when body is null and arrayBuffer() returns empty, then text() is called.
apps/sim/lib/logs/execution/logger.ts Adds multi-tier execution data compaction capped at 500 KB, wraps the DB update in a transaction with statement/lock/idle timeouts, and redacts workflowInput before storage.
apps/sim/app/api/files/parse/route.ts Switches multi-file parsing from parallel Promise.all to sequential loop with aggregate output-size budget (5 MB), adds file reference shape validation, and threads download/parse size limits through all handlers.
apps/sim/tools/index.ts Adds 10 MB response body size limit, introduces shouldRetryWithoutReadingBody to skip consuming bodies that will be retried, and refactors error response parsing.
apps/sim/lib/uploads/providers/blob/client.ts Replaces the private streamToBuffer helper with readNodeStreamToBufferWithLimit, adding optional maxBytes enforcement both on the Content-Length header and during streaming.
apps/sim/lib/uploads/providers/s3/client.ts Adds maxBytes parameter to downloadFromS3, checks ContentLength early, streams with readNodeStreamToBufferWithLimit.
apps/sim/app/api/tools/docusign/route.ts Adds size limits (25 MB document, 2 MB JSON), structured error extraction, upload-to-storage path for downloaded documents when execution context is present.
apps/sim/tools/google_slides/export_presentation.ts Exports are now stored as execution files when context is available (up to 10 MB); falls back to legacy base64 inline for smaller exports when context is absent.
apps/sim/lib/core/security/input-validation.server.ts Replaces the generic Error thrown when Content-Length exceeds maxResponseBytes with PayloadSizeLimitError, and returns a null-body response for retryable statuses instead of rejecting.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[dispatchCleanupJobs] --> B[buildCleanupChunks paginated loop]
    B --> C{more pages?}
    C -- yes --> D[listActiveWorkspaceCleanupScopeRowsPage afterCursor]
    D --> E[resolvePlanTypes for page rows]
    E --> F[push non-enterprise chunks per page]
    F --> G[push enterprise chunks per page]
    G --> C
    C -- no --> H[append housekeeping empty chunk]
    H --> I{shouldExecuteInline?}
    I -- yes NEW --> J[run inline sequentially return early]
    I -- no --> K{isTriggerAvailable?}
    K -- yes --> L[batchTrigger with concurrencyKey]
    K -- no --> M[jobQueue sequential loop no runner]
Loading

Comments Outside Diff (4)

  1. apps/sim/lib/billing/cleanup-dispatcher.ts, line 181-193 (link)

    P2 Chunk labels are not globally unique across pages

    When there are more active workspaces than WORKSPACE_SCOPE_PAGE_SIZE (500), each page independently computes planChunks and assigns indices starting at 1. Two pages that each produce multiple free-plan chunks will both emit labels like free/1 and free/2. In the inline path, these duplicate labels are pushed as inline:${jobType}:free/1, making jobIds non-unique and log correlation ambiguous. The old code computed indices over the full workspace list and produced globally unique labels.

  2. apps/sim/lib/billing/cleanup-dispatcher.ts, line 291-308 (link)

    P2 Fallback job-queue enqueuing changed from parallel to sequential

    The old code used Promise.allSettled(chunks.map(...enqueue...)) to fire all enqueue requests concurrently. The new code iterates sequentially with await inside a for loop. For deployments with many cleanup chunks — e.g. thousands of enterprise workspaces — this multiplies dispatch latency by the number of chunks and holds the caller for the full duration. Restoring parallelism (or at least batching) would avoid this regression for large deployments.

  3. apps/sim/lib/billing/cleanup-dispatcher.ts, line 210-226 (link)

    P2 Global housekeeping now unconditionally creates an extra empty chunk

    The previous logic searched for an existing plan chunk and set runGlobalHousekeeping = true on it, only synthesizing a new empty chunk when none existed. The refactored code always appends a separate empty synthetic chunk with runGlobalHousekeeping = true, even when real workspace chunks for the same plan already exist. For Trigger.dev dispatch this means an extra job is triggered on every run. While the correctness is preserved, the waste is worth confirming as intentional.

  4. apps/sim/lib/core/utils/stream-limits.ts, line 529-545 (link)

    P2 arrayBuffer() then text() on the same response object

    When !response.body && response.arrayBuffer is true and arrayBuffer() returns a zero-length buffer, the code falls through and calls response.text(). For real Response objects, calling arrayBuffer() marks bodyUsed = true; a subsequent text() call throws TypeError: body stream already read. This path is reachable on responses with a null body stream (e.g., 204, 304) in environments that implement the Fetch spec strictly. If the intent is just an empty-body fallback, returning the empty buffer directly (or using the preferTextFallback path) is safer.

Reviews (1): Last reviewed commit: "fix merge conflict" | Re-trigger Greptile

Comment on lines +308 to 332
const inlineRunner = shouldExecuteInline() ? await buildCleanupRunner(jobType) : undefined
if (inlineRunner) {
let succeeded = 0
let failed = 0

for (const payload of chunks) {
try {
await inlineRunner(payload, new AbortController().signal)
jobIds.push(`inline:${jobType}:${payload.label}`)
succeeded++
} catch (error) {
failed++
logger.error(`[${jobType}] Inline cleanup chunk failed:`, {
plan: payload.plan,
label: payload.label,
error,
})
}
}

logger.info(`[${jobType}] Inline cleanup chunks: ${succeeded} succeeded, ${failed} failed`)
return { jobIds, jobCount: jobIds.length, chunkCount: chunks.length, workspaceCount }
}

if (isTriggerAvailable()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Inline runner now takes precedence over Trigger.dev

The execution order was reversed: inline execution is now checked and returned from before isTriggerAvailable(), whereas the old code checked Trigger.dev first and only fell back to the inline/job-queue path when Trigger.dev was unavailable. In any environment where both shouldExecuteInline() and isTriggerAvailable() return true simultaneously (e.g., database async backend with TRIGGER_SECRET_KEY set), cleanup jobs will now run synchronously in-process rather than through Trigger.dev, losing distributed concurrency control, visibility, and retry semantics.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant