djdevpro/doclith

doclith

TypeScript/JavaScript implementation of DocLang v0.6: lossless XML parsing, typed AST model, serialization, validation (XSD- + Schematron-equivalent) and CLI — plus converters Markdown / HTML / PDF → DocLang.

✅ Core MVP (core, xml, validator, cli) and converters (markdown, html, pdf) fully functional and tested (clean architecture, TDD).

Packages

| Package | Role |
| --- | --- |
| @doclith/core | Pure domain: lossless AST, constants, errors, traversal |
| @doclith/xml | XML parsing (anti-XXE) + serialization (round-trip) |
| @doclith/validator | XSD-equivalent + Schematron-equivalent validation (13 rules) |
| @doclith/cli | CLI `doclang validate` (text/json output, exit codes 0/1/2) |
| @doclith/markdown | Markdown (GFM) → DocLang |
| @doclith/html | HTML → DocLang (parse5) |
| @doclith/pdf | PDF → DocLang (pdfjs, text + structure, no OCR) |

API

import { parseDocLang, serializeDocLang } from "@doclith/xml";
import { validateDocLang } from "@doclith/validator";

const doc = parseDocLang(xml); // lossless AST (node order + mixed text preserved)
const result = validateDocLang(doc, { allowEmptyNamespace: false });
if (!result.valid) console.error(result.issues); // structured issues (code, source, path, …)

const out = serializeDocLang(doc, { pretty: true }); // lossless round-trip
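
As a minimal sketch of how the structured issues might be turned into a readable report: the snippet above only names `code`, `source`, and `path` as issue fields, so the `Issue` and `ValidationResult` interfaces below are hypothetical stand-ins, and the `message` field and the `"xsd" | "schematron"` values are assumptions for illustration, not the package's confirmed types.

```typescript
// Hypothetical shapes: only `code`, `source`, and `path` are named in the README.
// `message` and the two `source` values are assumptions made for this sketch.
interface Issue {
  code: string;
  source: "xsd" | "schematron";
  path: string;
  message: string;
}

interface ValidationResult {
  valid: boolean;
  issues: Issue[];
}

// Render one line per issue, or a short confirmation when the document is valid.
function formatReport(result: ValidationResult): string {
  if (result.valid) return "valid";
  return result.issues
    .map((i) => `[${i.source}] ${i.code} at ${i.path}: ${i.message}`)
    .join("\n");
}

// Count issues per validation layer, e.g. to show a summary line.
function countBySource(issues: Issue[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const issue of issues) {
    counts.set(issue.source, (counts.get(issue.source) ?? 0) + 1);
  }
  return counts;
}
```

If the real result object has a compatible shape, `formatReport(result)` could replace the bare `console.error(result.issues)` call; otherwise adapt the field names to the actual types exported by `@doclith/validator`.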

CLI

doclang validate document.dclg.xml            # ✓/✗ + issues, exit 0/1
doclang validate document.dclg.xml -f json    # machine-readable JSON output
doclang validate document.dclg.xml --xsd-only # structure only
doclang validate document.dclg.xml -n         # tolerate missing namespace

Exit codes: 0 valid · 1 invalid · 2 usage/file error.
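
For scripts that wrap the CLI, the three exit codes can be mapped to named outcomes. The sketch below is illustrative only: `Outcome`, `interpretExitCode`, and `shouldFailBuild` are hypothetical helper names, not part of `@doclith/cli`, and treating any other code as `"unknown"` is an assumption.

```typescript
// Hypothetical helper names; only the 0/1/2 meanings come from the README.
type Outcome = "valid" | "invalid" | "usage-or-file-error" | "unknown";

// Map a process exit status to a named outcome.
// `null` covers a process that was terminated by a signal and has no status.
function interpretExitCode(code: number | null): Outcome {
  switch (code) {
    case 0:
      return "valid";
    case 1:
      return "invalid";
    case 2:
      return "usage-or-file-error";
    default:
      return "unknown"; // assumption: anything else is unexpected
  }
}

// A CI step would typically fail on everything except a valid document.
function shouldFailBuild(outcome: Outcome): boolean {
  return outcome !== "valid";
}
```

In Node.js, the status could come from `spawnSync("doclang", ["validate", file]).status` in `node:child_process`, which is a number or `null`.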

Converters → DocLang

import { markdownToDocLangXml } from "@doclith/markdown";
import { htmlToDocLangXml } from "@doclith/html";
import { pdfToDocLang } from "@doclith/pdf"; // async — text + structure, no OCR

const xml1 = markdownToDocLangXml("# Title\n\n- a\n- b");
const xml2 = htmlToDocLangXml("<h1>Title</h1><ul><li>a</li></ul>");
const doc = await pdfToDocLang(new Uint8Array(pdfBytes));

Each converter produces a valid document (verified in tests against XSD + Schematron).
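
When input files of mixed types need converting, one possible approach is to pick the converter from the file extension. This is a sketch under assumptions: `ConverterKind`, `pickConverter`, `isAsyncConverter`, and the extension table are hypothetical and not provided by the packages; only the sync/async split (the PDF converter is async) comes from the snippet above.

```typescript
// Hypothetical dispatcher: none of these names are part of @doclith.
type ConverterKind = "markdown" | "html" | "pdf";

// Assumed extension table; extend it to match your own inputs.
const EXTENSIONS = new Map<string, ConverterKind>([
  ["md", "markdown"],
  ["markdown", "markdown"],
  ["html", "html"],
  ["htm", "html"],
  ["pdf", "pdf"],
]);

// Return the converter kind for a filename, or undefined when the
// extension is missing or not recognized.
function pickConverter(filename: string): ConverterKind | undefined {
  const dot = filename.lastIndexOf(".");
  if (dot < 0 || dot === filename.length - 1) return undefined;
  return EXTENSIONS.get(filename.slice(dot + 1).toLowerCase());
}

// Per the README, only the PDF converter is async, so callers must await it.
function isAsyncConverter(kind: ConverterKind): boolean {
  return kind === "pdf";
}
```

A caller could then switch on the returned kind to invoke `markdownToDocLangXml`, `htmlToDocLangXml`, or `await pdfToDocLang(...)`, keeping in mind that the first two return XML strings while the PDF converter's result is assigned to `doc` in the example above.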

Development

pnpm install
pnpm typecheck && pnpm lint && pnpm test && pnpm build

pnpm monorepo, strict TypeScript (ESM, NodeNext), Vitest. CI: lint + typecheck + build + test. License: Apache-2.0.
