Implement zero-dependency Wasm binary parser #754
Open
bbyalcinkaya wants to merge 18 commits into
This PR adds a new zero-dependency binary parser directly in pykwasm that parses Wasm bytecode and produces KAst terms of wasm-semantics. Previously, our parser relied on a py-wasm fork, as described in issue #524. That dependency has been unmaintained for years and limited our ability to implement fixes, add new features, and stay up-to-date with the Wasm specification. This new parser eliminates the py-wasm dependency.
Changes
- BlockMetaData field, which was previously used for coverage tracking
- Implement differential testing to compare the old py-wasm-based parser with the new parser

Spec Divergences in the Binary Parser
The parser follows the current WebAssembly spec (WASM 3.0) for decoding, but the K semantics predates several additions. In the places below, bytes are decoded correctly but information is discarded because the semantics has no representation for it.
- Blocktype type-index form (instructions.py:47): A blocktype can be a type index encoded as i33. Only the empty and single-valtype cases are handled; the i33 case raises a parse error. The K semantics block types cannot reference the type section.
- Typed select (instructions.py:83): The typed select variant (0x1C) carries a value-type vector. It is parsed and discarded; the untyped SELECT node is emitted. The K semantics has a single untyped select.
- Memory load/store memarg (instructions.py:170): memarg encodes an alignment hint and an optional memory index alongside the offset. Only the offset is forwarded to the AST; alignment and memory index are discarded. Affects all 23 load/store opcodes. The K semantics models a single memory with offset-only instructions.
- memory.size/memory.grow memory index (instructions.py:241): Both instructions encode a memory index with the multi-memory proposal. It is parsed and discarded; the K semantics has a single implicit memory.
- Recursive types (module.py:78): The spec wraps all type definitions in rectype groups that can be mutually recursive. We only accept singleton groups; actual recursion raises a parse error. The K semantics has no recursive type construct.
- Table inline initializer (module.py:119): The spec allows an optional inline initializer for tables (prefix 0x40 0x00). Encountering it raises a parse error. No counterpart exists in the K semantics.
- Address type in limits (types.py:117): Limit encodings 0x02/0x03 signal 64-bit addressing. The address type is parsed but discarded by every caller (tablesec, memsec, externtype_as_import_desc). The K semantics assumes 32-bit addressing throughout.
- Import section externtype (types.py:174): Import descriptors are encoded as WASM 3.0 externtype but returned as WASM 1.0 ImportDesc nodes. The tag type variant (0x04) raises a parse error. The K ImportDefn sort still matches the WASM 1.0 structure.
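To make the "decode everything, forward only what the semantics can represent" pattern concrete, here is a minimal sketch of two of the cases above: the blocktype and the memarg. It is not the code from this PR. The names ParseError, read_u, read_blocktype, and read_memarg_offset are hypothetical, the valtype table is cut down to the four numeric types, and the memarg layout (bit 6 of the alignment field flags a following memory index) is assumed from the multi-memory encoding.

```python
from typing import Optional, Tuple


class ParseError(Exception):
    """Hypothetical error type; the parser's real exception class may differ."""


def read_u(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 integer and return (value, next position)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ParseError("unexpected end of input")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


# Simplification for this sketch: numeric value types only.
VALTYPES = {0x7F: "i32", 0x7E: "i64", 0x7D: "f32", 0x7C: "f64"}


def read_blocktype(data: bytes, pos: int) -> Tuple[Optional[str], int]:
    """Accept the empty and single-valtype forms; reject the type-index form."""
    if pos >= len(data):
        raise ParseError("unexpected end of input")
    byte = data[pos]
    if byte == 0x40:
        return None, pos + 1  # empty block type
    if byte in VALTYPES:
        return VALTYPES[byte], pos + 1  # single result type
    # Anything else would be a type index, which this sketch does not support.
    raise ParseError("blocktype type index is not supported")


def read_memarg_offset(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode a full memarg but return only the offset."""
    align, pos = read_u(data, pos)
    if align >= 0x80:
        raise ParseError("malformed memarg alignment field")
    if align & 0x40:
        # Assumed encoding: bit 6 set means a memory index follows.
        _memidx, pos = read_u(data, pos)  # decoded, then discarded
    offset, pos = read_u(data, pos)
    return offset, pos  # alignment hint is discarded as well
```

For example, under these assumptions read_memarg_offset(bytes([0x42, 0x01, 0x10]), 0) consumes all three bytes (alignment with the memory-index flag, memory index 1, offset 16) and returns only the offset together with the next read position. The bytes are always consumed in full so that the position stays in sync with the instruction stream, even when the decoded fields are thrown away.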