fix(parser): #283 collect errors from all erroneous statements in multi-statement input #470

Merged
Cythia828 merged 2 commits into DTStack:next from liuxy0551:fix_283 on May 20, 2026
Collaborator

@liuxy0551 liuxy0551 commented May 12, 2026

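  1. Not all errors are caught when some statements fail #283

Overview

Root cause

When the input is SELEC * from table1; SELEC * from table2;:

  1. SELEC is not a valid keyword, so the lexer classifies it as an ID token (token type 889)
  2. The while loop checks LA(1) (the current token) and finds that ID (889) is not in the bitmask
  3. The loop body never executes, and control jumps straight to match(EOF)
  4. match(EOF) finds that the current token is not EOF and throws an exception
  5. The catch block catches the exception and calls reportError() to report a single error
  6. recover() is then called and consumes all remaining tokens up to EOF
  7. The method returns, and the error in the second SELEC statement is swallowed entirely

In short: when a SQL statement begins with a misspelled keyword, the parser makes no attempt to parse the statements that follow and skips everything after it, so only one error is reported.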
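
The control flow described above can be reproduced with a toy model. This is only a sketch: the token types, `tokenize`, and `parseProgram` are hypothetical names invented for illustration, not the generated ANTLR parser from dt-sql-parser, and the "statement" rule here only recognises SELECT.

```typescript
// Toy model of the entry rule. All names here are hypothetical.
type Tok = { type: 'SELECT' | 'FROM' | 'STAR' | 'SEMI' | 'ID' | 'EOF'; text: string };

function tokenize(sql: string): Tok[] {
  const toks: Tok[] = [];
  const re = /[A-Za-z_][A-Za-z_0-9]*|\*|;/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(sql)) !== null) {
    const text = m[0];
    const upper = text.toUpperCase();
    if (text === '*') toks.push({ type: 'STAR', text });
    else if (text === ';') toks.push({ type: 'SEMI', text });
    else if (upper === 'SELECT') toks.push({ type: 'SELECT', text });
    else if (upper === 'FROM') toks.push({ type: 'FROM', text });
    else toks.push({ type: 'ID', text }); // a misspelled keyword ends up here
  }
  toks.push({ type: 'EOF', text: '<EOF>' });
  return toks;
}

function parseProgram(sql: string): string[] {
  const toks = tokenize(sql);
  const errors: string[] = [];
  let pos = 0;
  try {
    // The loop runs only while LA(1) can start a statement.
    // An ID token cannot, so for "SELEC ..." the body never executes.
    while (toks[pos].type === 'SELECT') {
      while (toks[pos].type !== 'SEMI' && toks[pos].type !== 'EOF') pos++;
      if (toks[pos].type === 'SEMI') pos++;
    }
    // Stand-in for match(EOF): throws when the current token is not EOF.
    if (toks[pos].type !== 'EOF') {
      throw new Error(`unexpected '${toks[pos].text}', expecting <EOF>`);
    }
  } catch (e) {
    errors.push((e as Error).message); // stand-in for reportError(): one error
    pos = toks.length - 1;             // stand-in for recover(): skip to EOF
  }
  return errors;
}

const swallowed = parseProgram('SELEC * from table1; SELEC * from table2;');
console.log(swallowed.length); // 1: the second SELEC is never examined
```

Because recovery jumps to EOF after the first failure, no later statement gets a chance to report its own error, which matches the behaviour described in the steps above.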

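Fix

A divide-and-conquer strategy was added to the validate() method:

  1. Parse the whole input the original way first and collect the initial error list
  2. If there are errors and the input contains multiple statements separated by ;, split the input on ;
  3. Create a separate parser for each statement fragment and validate it independently
  4. Use the split results only when split validation finds more errors than the original parse

The benefits of this approach:

  • Fixes the issue: SELEC * from t1; SELEC * from t2; now correctly reports 2 errors
  • Does not affect valid SQL: when the original parse succeeds (0 errors), the split logic is never triggered
  • Compatible with BEGIN...END blocks: syntax that spans semicolons, such as BEGIN STATEMENT SET; ... END;, is not split incorrectly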
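
The four steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the code merged in this PR: `ParseIssue`, `ValidateFn`, `splitBySemicolon`, `validateWithSplit`, and `toyValidate` are all hypothetical names, the splitter naively ignores semicolons inside string literals and comments, and remapping fragment offsets back to the full input is an added assumption.

```typescript
// Hypothetical shapes for illustration; not the dt-sql-parser API.
interface ParseIssue { offset: number; message: string; }
type ValidateFn = (sql: string) => ParseIssue[];

// Naive splitter: keeps each fragment's start offset so positions can be remapped.
function splitBySemicolon(sql: string): { text: string; start: number }[] {
  const parts: { text: string; start: number }[] = [];
  let start = 0;
  for (let i = 0; i < sql.length; i++) {
    if (sql[i] === ';') {
      parts.push({ text: sql.slice(start, i + 1), start });
      start = i + 1;
    }
  }
  if (sql.slice(start).trim() !== '') parts.push({ text: sql.slice(start), start });
  return parts;
}

function validateWithSplit(sql: string, validateOnce: ValidateFn): ParseIssue[] {
  const initial = validateOnce(sql);           // step 1: original whole-input parse
  if (initial.length === 0) return initial;    // valid SQL never reaches the split path
  const fragments = splitBySemicolon(sql);     // step 2: split on ';'
  if (fragments.length < 2) return initial;
  const split: ParseIssue[] = [];
  for (const f of fragments) {                 // step 3: validate each fragment alone
    for (const e of validateOnce(f.text)) {
      split.push({ offset: e.offset + f.start, message: e.message });
    }
  }
  // step 4: prefer the split result only when it surfaces more errors
  return split.length > initial.length ? split : initial;
}

// Toy validator that mimics the bug: it reports the first bad statement and stops.
const KNOWN = ['SELECT', 'INSERT', 'UPDATE', 'DELETE'];
const toyValidate: ValidateFn = (sql) => {
  const re = /(?:^|;)\s*([A-Za-z]+)/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(sql)) !== null) {
    const word = m[1];
    if (KNOWN.indexOf(word.toUpperCase()) === -1) {
      const offset = m.index + m[0].length - word.length;
      return [{ offset, message: `unexpected '${word}'` }];
    }
  }
  return [];
};

const issues = validateWithSplit('SELEC * from t1; SELEC * from t2;', toyValidate);
console.log(issues.length); // 2 with this toy validator
```

The "more errors than the original parse" guard in step 4 is what keeps constructs that legitimately span semicolons on the original result whenever splitting them does not surface additional errors.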

monaco-sql-languages can preview the effect by using dt-sql-parser@4.5.0-beta.3, so no separate PR or preview link is provided.

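Known limitation

Splitting relies on semicolons, so if the SQL statements are written without semicolons, only one error is still reported. Validating multiple statements without semicolons is low priority for now and is not handled.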

@liuxy0551 liuxy0551 requested a review from Cythia828 May 12, 2026 03:34
@Cythia828 Cythia828 added the 5.29 label May 19, 2026
@Cythia828 Cythia828 merged commit 0041ed2 into DTStack:next May 20, 2026
6 checks passed
Cythia828 added a commit that referenced this pull request May 20, 2026
* feat: support query result and derived table entity collecting (#434)

* feat: support queryResult and derived table entities collecting

* feat: support query result and derived table entity collecting

* test: enhance hive and spark entity collect test case

* fix: remove _ctx and add tokenIndex into position

* fix: rename declareType COMMON to LITERAL

* fix: optimize entity collector and update  grammar

* test: add derived table and query result entities test case

* fix: remove isCaretInDerivedTableStmt and set default isAccessible to null

* fix: update _caretStmt docs

* test: add isAccessible test case

* fix: skip _caretStmt ts check

* docs: update README to include additional entity information

* test: fix create view test case

* fix:  import from error sql module

* test: update entity collection tests

* fix: remove unused type

* chore: remove duplicate changelog in v4.4.1

* chore(release): 4.5.0-beta.0

* Next merge main (#468)

* fix(flink): #455 fix json functions' params problem in flink

* fix(flink): some grammar rules (#465)

* fix: #464 order by + expression

* fix: #464 EXTRACT function

* test: #464 flink JSON_VALUE RETURNING

* chore(release): 4.4.2

---------

Co-authored-by: zhaoge <>
Co-authored-by: JackWang032 <64318393+JackWang032@users.noreply.github.com>

* fix(parser): #283 collect errors from all erroneous statements in multi-statement input (#470)

* test(parser): #283 add multi-statement error validation tests for all dialects

* fix(parser): #283 collect errors from all erroneous statements in multi-statement input

* feat: add generic SQL language support (#469)

* fix(generic): fix INTERSECT/EXCEPT support, trim keywords to ~90

- Add INTERSECT and EXCEPT to queryNoWith rule for set operations
- Remove 173 unused KW_* lexer rules for removed features (views, indexes,
  grants, transactions, stored procedures, window functions, triggers, etc.)
- Trim nonReserved list to only keywords actually used in parser rules
- Remove unused UNICODE_STRING and DIGIT_IDENTIFIER lexer rules
- Keyword count reduced from 263 to 90 (close to ~100 target)
- All 197 test suites pass (5627 tests)

* fix(generic): reserve core structural keywords and add DIGIT_IDENTIFIER

- Remove core structural keywords from nonReserved so they cannot be
  used as identifiers: SELECT, FROM, WHERE, CREATE, TABLE, INSERT,
  UPDATE, DELETE, DROP, ALTER, SET, JOIN, GROUP, HAVING, ORDER, ON,
  UNION, INTERSECT, EXCEPT, INTO, NOT, AND, OR, IN, BETWEEN, LIKE,
  IS, EXISTS, CASE, WHEN, THEN, ELSE, END, CAST, AS, DISTINCT,
  PRIMARY, CONSTRAINT, REFERENCES, COLUMN, UNIQUE, CHECK, FOREIGN,
  RENAME, RECURSIVE, WITH, NULL, ESCAPE, NULLIF
- Add DIGIT_IDENTIFIER lexer token for identifiers starting with a
  digit (e.g. 123abc, 1st_column)
- Include DIGIT_IDENTIFIER in identifier rule alternatives

* fix(generic): add missing Listener/Visitor exports and diagnostics option

- Add GenericSqlListener and GenericSqlVisitor exports to src/index.ts
- Add GenericSQLOptions interface with configurable diagnostics flag
- Override validate() to return empty array when diagnostics disabled
- Export GenericSQLOptions type from src/index.ts

* fix(generic): add QUERY_RESULT and SELECT column entity collection

- Add exitQuerySpecification for QUERY_RESULT entity tracking
- Add exitSelectItem for column entity collection in SELECT clauses
- Track wildcard columns (ColumnDeclareType.ALL) for * and table.*
- Track expression columns with alias support (ColumnDeclareType.EXPRESSION)
- Stage previously untracked files (errorListener, splitListener, semanticContextCollector)

* test: add GenericSQL tests

* test: ensure all dialect tests pass with GenericSQL

* test(generic): add more sql test

- Add comprehensive syntax tests for all supported statement types
- Add context collect tests for entity and semantic collectors
- Add suggestion tests for token, syntax, and multi-statement scenarios
- Add error strategy, listener, visitor, and validation tests
- Fix entity collector to distinguish simple columns from expressions

* feat: match empty column when in entityCollecting context (#457) (#472)

* chore(release): 4.3.0

* fix(common): #424 allTokens slice when caretTokenIndex use tokenIndexOffset (#426)

* test: #424 syntax after comments

* fix(common): #424 allTokens slice when caretTokenIndex use tokenIndexOffset

* chore(release): 4.3.1

* fix(postgresql): #432 remove error rule

* test: #432 validate unComplete sql

* fix: #432 remove error rule

* feat: mark as entityCollecting in getAllEntities context to allow empty column

* chore: update jest.config.js to hide console.log

* fix(flink): #442 fix flink's insert values() can't support function problem

* feat: remove noReserved keywords in completions

* test: add filter keywords test case

* test: #438 sync suggestion no duplicate syntaxContextType

* fix: #438 syntaxContextType not duplicate

* chore(release): 4.4.0-beta.0

* chore(release): 4.4.0

* feat: support query result and derived table entity collecting (#434)

* feat: support queryResult and derived table entities collecting

* feat: support query result and derived table entity collecting

* test: enhance hive and spark entity collect test case

* fix: remove _ctx and add tokenIndex into position

* fix: rename declareType COMMON to LITERAL

* fix: optimize entity collector and update  grammar

* test: add derived table and query result entities test case

* fix: remove isCaretInDerivedTableStmt and set default isAccessible to null

* fix: update _caretStmt docs

* test: add isAccessible test case

* fix: skip _caretStmt ts check

* docs: update README to include additional entity information

* test: fix create view test case

* fix:  import from error sql module

* test: update entity collection tests

* fix: remove unused type

* feat: match empty column when in entityCollecting context

* feat: optimize collecting entity when match empty column in entityCollecting context (#467)

Co-authored-by: Cythia828 <942884029@qq.com>