
validate utf-16 surrogate halves in decodeUnicodeCodePoint #1698

Open
SABITHSAHEB wants to merge 1 commit into open-source-parsers:master from SABITHSAHEB:validate-surrogate-halves


@SABITHSAHEB

Two problems in decodeUnicodeCodePoint:

  1. After a high surrogate (D800-DBFF), the next \u escape is consumed and combined without checking that it is a low surrogate (DC00-DFFF), so "\uD801\u0041" or "\uD801\uD801" parses to a wrong astral code point and the second escape's real value is lost.
  2. A low surrogate that appears on its own (e.g. "\uDC00") is passed straight through and written out as the invalid UTF-8 bytes ED B0 80.

The fix validates the low-surrogate range when completing a pair and rejects an unpaired low surrogate, in both Reader and OurReader. Reader tests are added for the two new rejection paths; existing valid-pair cases are unaffected.
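
For reviewers, the two checks can be sketched in isolation. This is a minimal standalone illustration, not the diff in this PR and not jsoncpp's API: `readUnicodeEscape` and `decodeCodePointStrict` are hypothetical names, and the input is assumed to begin at the first `\u` escape.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of jsoncpp): reads one "\uXXXX" escape
// starting at `pos` and stores its 16-bit value in `unit`.
static bool readUnicodeEscape(const std::string& text, std::size_t pos,
                              unsigned int& unit) {
  if (pos + 6 > text.size() || text[pos] != '\\' || text[pos + 1] != 'u')
    return false;
  unit = 0;
  for (std::size_t i = pos + 2; i < pos + 6; ++i) {
    const char c = text[i];
    unit <<= 4;
    if (c >= '0' && c <= '9')
      unit += static_cast<unsigned int>(c - '0');
    else if (c >= 'a' && c <= 'f')
      unit += static_cast<unsigned int>(c - 'a' + 10);
    else if (c >= 'A' && c <= 'F')
      unit += static_cast<unsigned int>(c - 'A' + 10);
    else
      return false;  // not a hex digit
  }
  return true;
}

// Hypothetical strict decoder: returns false for an unpaired low surrogate
// and for a high surrogate that is not followed by a low surrogate.
bool decodeCodePointStrict(const std::string& text, unsigned int& codePoint) {
  unsigned int first = 0;
  if (!readUnicodeEscape(text, 0, first))
    return false;

  // Check 2: a low surrogate may only appear as the second half of a pair.
  if (first >= 0xDC00 && first <= 0xDFFF)
    return false;

  if (first >= 0xD800 && first <= 0xDBFF) {
    unsigned int second = 0;
    if (!readUnicodeEscape(text, 6, second))
      return false;
    // Check 1: the second escape must lie in the low-surrogate range.
    if (second < 0xDC00 || second > 0xDFFF)
      return false;
    codePoint = 0x10000 + ((first & 0x3FF) << 10) + (second & 0x3FF);
    return true;
  }

  codePoint = first;  // ordinary BMP code point
  return true;
}
```

Under these assumptions, "\uD801\uDC00" decodes to U+10400, while "\uD801\u0041", "\uD801\uD801" and "\uDC00" are all rejected.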
