Implement zero-dependency Wasm binary parser#754

Open
bbyalcinkaya wants to merge 18 commits into master from binary-parser

@bbyalcinkaya bbyalcinkaya commented Apr 3, 2026

This PR adds a new zero-dependency binary parser directly in pykwasm that parses Wasm bytecode and produces KAst terms of wasm-semantics. Previously, our parser relied on a py-wasm fork, as described in issue #524. That dependency has been unmaintained for years and limited our ability to implement fixes, add new features, and stay up-to-date with the Wasm specification. This new parser eliminates the py-wasm dependency.

Changes

  • Implement the Wasm binary parser
  • Enable all binary parsing integration tests for the new parser
  • Add unit tests for individual parsing components
  • Remove the unused BlockMetaData field, which was previously used for coverage tracking
  • Implement differential testing to compare the old py-wasm-based parser with the new parser
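
As a rough illustration of the differential-testing bullet, the idea is to feed the same bytes to both parsers and record every input on which they disagree. This is only a sketch and not the harness in this PR: `differential_check`, `old_parse`, and `new_parse` are hypothetical names, and any two callables taking `bytes` can be plugged in.

```python
from typing import Any, Callable, Iterable, List, Tuple


def _outcome(parse: Callable[[bytes], Any], data: bytes) -> Tuple[str, Any]:
    """Run one parser and normalise the result so that failures are comparable."""
    try:
        return ("ok", parse(data))
    except Exception as exc:  # a parse failure is also an observable outcome
        return ("error", type(exc).__name__)


def differential_check(
    old_parse: Callable[[bytes], Any],
    new_parse: Callable[[bytes], Any],
    inputs: Iterable[bytes],
) -> List[bytes]:
    """Return the inputs on which the two parsers produce different outcomes."""
    mismatches = []
    for data in inputs:
        if _outcome(old_parse, data) != _outcome(new_parse, data):
            mismatches.append(data)
    return mismatches
```

Comparing normalised outcomes rather than raw return values means that "both parsers reject this input" counts as agreement, which is usually what a parser migration wants.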

Spec Divergences in the Binary Parser

The parser follows the current WebAssembly spec (WASM 3.0) for decoding, but the K semantics predates several additions. In the places below, bytes are decoded correctly but information is discarded because the semantics has no representation for it.

Blocktype type-index form (instructions.py:47) — A blocktype can be a type index encoded as i33. Only the empty and single-valtype cases are handled; the i33 case raises a parse error. The K semantics block types cannot reference the type section.
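
A minimal sketch of the blocktype behaviour described above, assuming a byte-stream interface. The names `parse_blocktype`, `WasmParseError`, and `VALTYPES` are illustrative and are not taken from instructions.py; only the four numeric value types are listed for brevity.

```python
import io


class WasmParseError(Exception):
    """Hypothetical error type used only in this sketch."""


# Single-byte value types from the binary format (numeric types only, for brevity).
VALTYPES = {0x7F: "i32", 0x7E: "i64", 0x7D: "f32", 0x7C: "f64"}


def parse_blocktype(stream: io.BufferedIOBase):
    """Return None for the empty blocktype or the name of a single value type."""
    raw = stream.read(1)
    if not raw:
        raise WasmParseError("unexpected end of input in blocktype")
    byte = raw[0]
    if byte == 0x40:  # empty blocktype
        return None
    if byte in VALTYPES:  # single result type
        return VALTYPES[byte]
    # Anything else would start an s33-encoded type index, which is rejected here.
    raise WasmParseError(f"type-index blocktype not supported (byte 0x{byte:02x})")
```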

Typed select (instructions.py:83) — The typed select variant (0x1C) carries a value-type vector. It is parsed and discarded; the untyped SELECT node is emitted. The K semantics has a single untyped select.
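
The parse-and-discard pattern for typed select might look roughly like this. It is a simplified sketch: `parse_select` and `read_uleb128` are hypothetical helpers, the string `"SELECT"` stands in for the real KAst node, and each value type is assumed to occupy one byte, which does not hold for the multi-byte reference types in newer spec versions.

```python
import io


class WasmParseError(Exception):
    """Hypothetical error type used only in this sketch."""


def read_uleb128(stream: io.BufferedIOBase) -> int:
    """Decode an unsigned LEB128 integer from the stream."""
    result, shift = 0, 0
    while True:
        raw = stream.read(1)
        if not raw:
            raise WasmParseError("unexpected end of input in LEB128")
        result |= (raw[0] & 0x7F) << shift
        if raw[0] & 0x80 == 0:
            return result
        shift += 7


def parse_select(opcode: int, stream: io.BufferedIOBase) -> str:
    """Both select forms yield the same untyped node; 0x1C consumes its type vector."""
    if opcode == 0x1B:  # untyped select, no immediates
        return "SELECT"
    if opcode == 0x1C:  # typed select: vec(valtype)
        count = read_uleb128(stream)
        types = stream.read(count)  # assumption: one byte per value type
        if len(types) != count:
            raise WasmParseError("unexpected end of input in select types")
        return "SELECT"  # the type annotation is dropped
    raise WasmParseError(f"not a select opcode: 0x{opcode:02x}")
```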

Memory load/store memarg (instructions.py:170) — memarg encodes an alignment hint and an optional memory index alongside the offset. Only the offset is forwarded to the AST; alignment and memory index are discarded. Affects all 23 load/store opcodes. The K semantics models a single memory with offset-only instructions.
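
To make the memarg layout concrete, here is a small sketch of decoding it while keeping only the offset. The function names are hypothetical and not taken from instructions.py. It assumes the multi-memory encoding in which bit 6 of the alignment field signals that a memory index follows, and it omits range validation of the alignment value.

```python
import io


class WasmParseError(Exception):
    """Hypothetical error type used only in this sketch."""


def read_uleb128(stream: io.BufferedIOBase) -> int:
    """Decode an unsigned LEB128 integer from the stream."""
    result, shift = 0, 0
    while True:
        raw = stream.read(1)
        if not raw:
            raise WasmParseError("unexpected end of input in LEB128")
        result |= (raw[0] & 0x7F) << shift
        if raw[0] & 0x80 == 0:
            return result
        shift += 7


def parse_memarg_offset(stream: io.BufferedIOBase) -> int:
    """Consume a full memarg but return only the offset."""
    align = read_uleb128(stream)
    if align & 0x40:  # bit 6 set: an explicit memory index follows
        _memidx = read_uleb128(stream)  # parsed, then discarded
    offset = read_uleb128(stream)
    return offset  # the alignment hint is discarded as well
```

For example, the bytes `02 10` (alignment 2, offset 16) and `42 01 10` (alignment 2 with the memory-index bit set, memory index 1, offset 16) both decode to the same offset under this scheme.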

memory.size / memory.grow memory index (instructions.py:241) — Both instructions encode a memory index with the multi-memory proposal. It is parsed and discarded; the K semantics has a single implicit memory.
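
A short sketch of the same parse-and-discard approach for these two opcodes (0x3F and 0x40). The name `parse_memory_instr` and the returned strings are placeholders for illustration, not the parser's actual output.

```python
import io


class WasmParseError(Exception):
    """Hypothetical error type used only in this sketch."""


def read_uleb128(stream: io.BufferedIOBase) -> int:
    """Decode an unsigned LEB128 integer from the stream."""
    result, shift = 0, 0
    while True:
        raw = stream.read(1)
        if not raw:
            raise WasmParseError("unexpected end of input in LEB128")
        result |= (raw[0] & 0x7F) << shift
        if raw[0] & 0x80 == 0:
            return result
        shift += 7


MEMORY_INSTRS = {0x3F: "memory.size", 0x40: "memory.grow"}


def parse_memory_instr(stream: io.BufferedIOBase) -> str:
    """Read the opcode and its memory-index immediate; keep only the instruction."""
    raw = stream.read(1)
    if not raw or raw[0] not in MEMORY_INSTRS:
        raise WasmParseError("expected memory.size or memory.grow")
    _memidx = read_uleb128(stream)  # parsed, then discarded
    return MEMORY_INSTRS[raw[0]]
```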

Recursive types (module.py:78) — The spec wraps all type definitions in rectype groups that can be mutually recursive. We only accept singleton groups; actual recursion raises a parse error. The K semantics has no recursive type construct.

Table inline initializer (module.py:119) — The spec allows an optional inline initializer for tables (prefix 0x40 0x00). Encountering it raises a parse error. No counterpart exists in the K semantics.

Address type in limits (types.py:117) — Limit encodings 0x02/0x03 signal 64-bit addressing. The address type is parsed but discarded by every caller (tablesec, memsec, externtype_as_import_desc). The K semantics assumes 32-bit addressing throughout.
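
The limits handling can be sketched as below, assuming the four flag bytes 0x00 to 0x03, where bit 0 indicates a maximum and bit 1 indicates 64-bit addressing. `parse_limits` and `parse_memtype` are illustrative names rather than the functions in types.py, and the caller shows how the address type is dropped.

```python
import io


class WasmParseError(Exception):
    """Hypothetical error type used only in this sketch."""


def read_uleb128(stream: io.BufferedIOBase) -> int:
    """Decode an unsigned LEB128 integer from the stream."""
    result, shift = 0, 0
    while True:
        raw = stream.read(1)
        if not raw:
            raise WasmParseError("unexpected end of input in LEB128")
        result |= (raw[0] & 0x7F) << shift
        if raw[0] & 0x80 == 0:
            return result
        shift += 7


def parse_limits(stream: io.BufferedIOBase):
    """Return (address type, minimum, maximum or None)."""
    raw = stream.read(1)
    if not raw or raw[0] > 0x03:
        raise WasmParseError("invalid limits flag")
    flag = raw[0]
    addrtype = "i64" if flag & 0x02 else "i32"
    minimum = read_uleb128(stream)
    maximum = read_uleb128(stream) if flag & 0x01 else None
    return addrtype, minimum, maximum


def parse_memtype(stream: io.BufferedIOBase):
    """Caller that decodes the address type but forwards only the bounds."""
    _addrtype, minimum, maximum = parse_limits(stream)
    return minimum, maximum
```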

Import section externtype (types.py:174) — Import descriptors are encoded as WASM 3.0 externtype but returned as WASM 1.0 ImportDesc nodes. The tag type variant (0x04) raises a parse error. The K ImportDefn sort still matches the WASM 1.0 structure.
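
As a sketch of the tag dispatch described above, the leading byte of an import descriptor selects the kind, and the tag variant is rejected. `import_kind` and the returned strings are hypothetical; the real parser would go on to decode the rest of the descriptor and build an ImportDesc node.

```python
import io


class WasmParseError(Exception):
    """Hypothetical error type used only in this sketch."""


# Leading bytes of the externtype encoding that have a Wasm 1.0 counterpart.
IMPORT_KINDS = {0x00: "func", 0x01: "table", 0x02: "memory", 0x03: "global"}


def import_kind(stream: io.BufferedIOBase) -> str:
    """Classify an import descriptor by its leading byte."""
    raw = stream.read(1)
    if not raw:
        raise WasmParseError("unexpected end of input in import descriptor")
    kind_byte = raw[0]
    if kind_byte == 0x04:  # tag type: no counterpart in the K ImportDefn sort
        raise WasmParseError("tag imports are not supported")
    if kind_byte not in IMPORT_KINDS:
        raise WasmParseError(f"unknown import descriptor 0x{kind_byte:02x}")
    return IMPORT_KINDS[kind_byte]
```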

@bbyalcinkaya bbyalcinkaya marked this pull request as ready for review June 25, 2026 10:46