Fix: Extended Query Protocol Parse packets corrupted by interceptor by nik-localstack · Pull Request #18 · localstack/postgresql-proxy

nik-localstack · 2026-05-22T13:20:03Z

Motivation

psycopg v3 (and asyncpg) use the Extended Query Protocol, sending a Parse packet whose body ends with binary uint32 parameter-type OIDs.
The interceptor sliced the body with data[1:-2], leaking those OID bytes into a UTF-8 decode.
The jsonb OID (3802 = 0x0EDA) contains 0xDA which is invalid UTF-8 and as a result crashing the interceptor and leaving the connection in a broken state.

Fix

Use find(b"\x00") to locate the true null-terminators of the statement name and query text, so the binary OID suffix is never fed to the text decoder.

Tests

A new e2e regression test that connects via psycopg v3 through the proxy and executes a parameterized INSERT with jsonb and text arguments.

…y OID suffix The Parse packet body (type 'P') has the format: statement_name\x00 + query\x00 + int16(param_count) + uint32[] OIDs The old handler used data[1:-2] / data[-2:] which treated only the last 2 bytes as the param-count field, leaking binary OID bytes into the query slice fed to _intercept_query(). When a parameter type OID contains a byte >= 0x80 (e.g. jsonb OID 3802 = 0x00000EDA), the subsequent UTF-8 decode raised UnicodeDecodeError, causing the connection to drop or hang. Fix: use find(b'\x00') to locate the true boundaries of statement name and query text, so the binary suffix is never touched by the text decoder. Adds a regression test using psycopg v3 (Extended Query Protocol) with a jsonb+text parameterized INSERT through the proxy. Co-authored-by: GitHub Copilot <copilot@github.com>

bentsku

Ah, that's interesting! Nice find!

I looked a bit in the PG docs to find more info about the structure of the Parse message to understand it.

I mostly have a comment about the nested find() usage, and about the testing, if we could add some Interceptors in our tests to be able to assert what is intercepted so we could make sure we're properly understanding the requests.

Would that make sense?

bentsku · 2026-05-26T11:27:45Z

+                # Parse packet body:
+                #   statement_name\x00 + query\x00 + int16(param_count) + uint32[]
+                # Keep the binary suffix untouched (count + OID array).
+                statement_end = data.find(b"\x00")


I wonder if find() is the best tool here when we need to find multiple times? split might be easier but I can see the null byte being part of the query being an expectation of _intercept_query, so this isn't great as it removes it...

split_data = data.split(b"\x00") if len(split_data) == 3: statement, raw_query, params = split_data query = self._intercept_query(raw_query, ic_queries) # I would have preferred to use `b"\x00".join` but `_intercept_query` adds the null byte at the end and is patched in LocalStack, so better not data = statement + b"\x00" + query + params

And in _intercept_query, we could do this:

# Remove null byte at the end - query = query[:-1].decode(codec) + query = query.rstrip(b"\x00").decode(codec)

Or if we keep the find() one, I would maybe simplify it a little:

statement_end = data.find(b"\x00") query_end = data.rfind(b"\x00") if statement_end != -1 and query_end != -1: query_start = statement_end + 1 # _intercept_query expects the null byte separator ending the query part query = self._intercept_query( data[query_start : query_end + 1], ic_queries ) # return statement + rewritten query + parameters data = data[:query_start] + query + data[query_end + 1 :]

bentsku · 2026-05-26T11:28:57Z

-                params = data[-2:]
-                data = statement + query + params
+                # Parse packet body:
+                #   statement_name\x00 + query\x00 + int16(param_count) + uint32[]


I found this for the Parse message type:

Parse (F)

Byte1('P')
Identifies the message as a Parse command.

Int32
Length of message contents in bytes, including self.

String
The name of the destination prepared statement (an empty string selects the unnamed prepared statement).

String
The query string to be parsed.

Int16
The number of parameter data types specified (may be zero). Note that this is not an indication of the number of parameters that might appear in the query string, only the number that the frontend wants to prespecify types for.

Then, for each parameter, there is the following:

Int32
Specifies the object ID of the parameter data type. Placing a zero here is equivalent to leaving the type unspecified.

And we can see String is separated by null byte here:

String(s)
A null-terminated string (C-style string). There is no specific length limitation on strings. If s is specified it is the exact value that will appear, otherwise the value is variable. Eg. String, String("user").

bentsku · 2026-05-26T11:50:02Z

related to the issue, I wonder if we should add a dummy QueryInterceptor here in the proxy and store what we intercept, and assert on the intercepter data, that we have exactly what we are expecting? This way, we could really assert that the interceptor is working as expected, if that's easy enough to add

bentsku · 2026-05-26T12:16:38Z

                # Query, ends with b'\x00'
                data = self._intercept_query(data, ic_queries)
            elif packet_type == b"P":
-                # Statement that needs parsing.


might be worth keeping/updating the comment that this is a Parse message with statements?

nik-localstack self-assigned this May 22, 2026

nik-localstack force-pushed the fix/parametrized-upsert branch from 2964192 to a143e8b Compare May 23, 2026 10:10

nik-localstack mentioned this pull request May 23, 2026

Fix query encoding #19

Open

nik-localstack marked this pull request as ready for review May 23, 2026 10:27

nik-localstack requested review from bentsku and cloutierMat May 25, 2026 07:25

bentsku reviewed May 26, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix: Extended Query Protocol Parse packets corrupted by interceptor#18

Fix: Extended Query Protocol Parse packets corrupted by interceptor#18
nik-localstack wants to merge 1 commit into
masterfrom
fix/parametrized-upsert

nik-localstack commented May 22, 2026 •

edited

Loading

Uh oh!

bentsku left a comment

Uh oh!

bentsku May 26, 2026

Uh oh!

bentsku May 26, 2026

Uh oh!

bentsku May 26, 2026

Uh oh!

bentsku May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

nik-localstack commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Fix

Tests

Uh oh!

bentsku left a comment

Choose a reason for hiding this comment

Uh oh!

bentsku May 26, 2026

Choose a reason for hiding this comment

Uh oh!

bentsku May 26, 2026

Choose a reason for hiding this comment

Uh oh!

bentsku May 26, 2026

Choose a reason for hiding this comment

Uh oh!

bentsku May 26, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

nik-localstack commented May 22, 2026 •

edited

Loading