Skip to content

fix: reset decoder state after decode errors#3

Merged
nic-6443 merged 2 commits into
mainfrom
codex/fix-decode-error-state
May 28, 2026
Merged

fix: reset decoder state after decode errors#3
nic-6443 merged 2 commits into
mainfrom
codex/fix-decode-error-state

Conversation

@jarvis9443
Copy link
Copy Markdown

@jarvis9443 jarvis9443 commented May 27, 2026

This fixes a decoder state leak after simdjson on-demand parsing errors.

When simdjson_ffi_next() hit an error while walking nested arrays or objects, it returned the error but left state.frames intact. Reusing the same parser for a later valid JSON body could then continue from stale iterator frames and eventually report simdjson: error: trailing content found.

The fix resets decoder state before starting a new parse and also resets it in the parse/next error paths. This keeps parser reuse safe after failed decodes, which matches the documented usage pattern of reusing a parser instance across multiple decode() calls.

A regression case was added for the failure sequence:

  1. decode a nested string error: {"model":["\\uD800"]}
  2. decode valid OpenAI-compatible JSON
  3. decode another valid OpenAI-compatible JSON

Before this fix, the later valid decode failed with trailing content found.

Validation:

PATH=/usr/local/openresty/nginx/sbin:$PATH prove -v t/02-decode.t
PATH=/usr/local/openresty/nginx/sbin:$PATH make test

Summary by CodeRabbit

  • Bug Fixes

    • Parser now fully resets its internal decode state after encountering a decoding error, preventing leftover state from affecting subsequent parses.
  • Tests

    • Added a test to verify recovery from a nested string decoding failure and successful reuse of the same parser for later valid payloads.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 27, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 62c59a94-958a-49c2-a844-f104c03b81ad

📥 Commits

Reviewing files that changed from the base of the PR and between d5439cc and 979e314.

📒 Files selected for processing (1)
  • t/02-decode.t

📝 Walkthrough

Walkthrough

New decoder-state reset helper clears opcode counter and frame stack; the helper is called before parse initialization and in parse/next error paths. A test is added to verify the parser recovers after a nested string decode error and can decode subsequent valid payloads.

Changes

Decoder State Recovery on Parse Errors

Layer / File(s) Summary
Decoder state reset helper
src/simdjson_ffi.cpp
New static helper simdjson_ffi_reset_decoder_state() zeroes the opcode batch counter and clears the decoder frames stack.
Parser integration of state reset
src/simdjson_ffi.cpp
Helper called before state->document initialization in simdjson_ffi_parse(), and added to error handlers in simdjson_ffi_parse() and simdjson_ffi_next() to clear state on failure.
Test coverage for parser recovery
t/02-decode.t
Adds TEST 12: parser recovers after nested string decode error that asserts an initial decode error returns nil and STRING_ERROR, then reuses the same parser to decode valid payloads; also updates nearby test numbering.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • nic-6443
🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix: reset decoder state after decode errors' accurately and concisely describes the main change across both files: implementing decoder state reset logic in error paths and before parsing.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
E2e Test Quality Review ✅ Passed Real E2E test using Nginx/Lua server and simdjson library validates parser recovery after nested errors; all assertions clear, no dependencies, perfect PR alignment.
Security Check ✅ Passed PR is a JSON parser bugfix (state reset) with no logging/credential/auth/TLS/DB code. All 7 security categories show no vulnerabilities.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/fix-decode-error-state

Comment @coderabbitai help to get the list of available commands and usage tips.

@nic-6443 nic-6443 requested a review from membphis May 28, 2026 10:04
Copy link
Copy Markdown

@membphis membphis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment thread t/02-decode.t Outdated
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0

@nic-6443 nic-6443 merged commit 4c52029 into main May 28, 2026
7 checks passed
@nic-6443 nic-6443 deleted the codex/fix-decode-error-state branch May 28, 2026 10:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants