feat: add multilingual batch scanner with parallel execution and LLM gap-fill #100

Open: WhereIs38 wants to merge 1 commit into NVIDIA:main from WhereIs38:feature/multilingual-batch-scanner

WhereIs38 commented Jun 18, 2026

Closes #98

Summary

Adds contrib/multilingual/ — a multilingual batch scanner that scans directories of AI agent skills in parallel, with automatic language detection and targeted LLM gap-fill for non-English skills.

Zero changes to src/skillspector/. All integration is via import-time patches that wrap upstream constructors without modifying any source file.
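
As a rough illustration of what wrapping a constructor at import time can look like, here is a minimal sketch. The `UpstreamClient` class and the `patch_constructor` helper are hypothetical stand-ins for illustration only; they are not the classes or helpers that `runner.py` actually patches.

```python
import functools


class UpstreamClient:
    """Hypothetical stand-in for an upstream class; not a real skillspector name."""

    def __init__(self, model, base_url=None):
        self.model = model
        self.base_url = base_url


def patch_constructor(cls, **defaults):
    """Wrap cls.__init__ so that missing keyword arguments receive the given defaults.

    Assumes callers pass the patched parameters by keyword, not positionally.
    """
    original_init = cls.__init__

    @functools.wraps(original_init)
    def patched_init(self, *args, **kwargs):
        for name, value in defaults.items():
            kwargs.setdefault(name, value)  # explicit caller values still win
        original_init(self, *args, **kwargs)

    cls.__init__ = patched_init
    return original_init  # returned so the patch can be reverted if needed


# Runs once when the contrib module is imported; no upstream file is edited.
patch_constructor(UpstreamClient, base_url="https://api.deepseek.com/v1")

client = UpstreamClient("deepseek-v4-flash")
override = UpstreamClient("deepseek-v4-flash", base_url="http://localhost:11434/v1")
```

Because the wrapper only fills in defaults, code that already passes its own `base_url` behaves as before, which is one way to keep such a patch non-invasive.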

Why this module exists

The upstream project scans one skill at a time — great for depth, but serial execution means LLM latency stacks linearly. I needed to scan many skills quickly, so this module avoids serial bottlenecks by design.

Scale. Each skill runs in its own thread via ThreadPoolExecutor. Because the work is dominated by LLM latency, adding workers shortens total scan time roughly in proportion, provided there are enough API keys to stay under rate limits: 23 skills finish in ~2 minutes at 8 workers, roughly one human-agent conversation round. The ceiling is the user's key count, not the code; by extrapolation, 100 keys scanning 2000 skills should still finish in minutes.
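
The per-skill threading described above can be sketched as follows. This is a simplified illustration, not the module's code: `scan_skill` is a hypothetical placeholder for the real per-skill scan (the `graph.invoke()` call), and the risk score it returns is made up.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def scan_skill(skill_name):
    """Hypothetical stand-in for one per-skill scan; the score is fake."""
    time.sleep(0.05)  # simulates waiting on an LLM response
    return {"skill": skill_name, "risk_score": len(skill_name) % 10}


def scan_all(skill_names, workers=4):
    """Run scan_skill for every skill in a thread pool and collect the results."""
    results, errors = [], []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(scan_skill, name): name for name in skill_names}
        for future in as_completed(futures):
            try:
                results.append(future.result())
            except Exception as exc:  # one failing skill should not abort the batch
                errors.append((futures[future], exc))
    # Highest risk first, matching a report sorted by risk score.
    results.sort(key=lambda r: r["risk_score"], reverse=True)
    return results, errors


skills = [f"skill_{i}" for i in range(8)]
results, errors = scan_all(skills, workers=8)
```

Threads fit here because each scan spends most of its time waiting on network I/O; the speedup flattens once workers outnumber skills or the provider starts rate-limiting.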

Cost. Parallel scanning means high token throughput, so I chose DeepSeek — the cheapest per-token option — for development and testing. The module itself is provider-agnostic: any OpenAI-compatible endpoint works. I couldn't test local models due to hardware constraints (Mac with limited RAM, a 4 GB VRAM Windows machine). That remains a known gap I hope someone with better hardware can fill.

Compatibility. The module is tested on macOS and Windows. runner.py applies a small set of import-time patches so DeepSeek works out of the box; the patches follow the standard OpenAI-compatible protocol, so Ollama and other endpoints should work as well, although only DeepSeek has been tested. All patches are non-invasive and self-contained within contrib/multilingual/.

In short: upstream provides the detection algorithms; this contrib provides the reach. If accepted, I'm interested in continuing to improve scalability and external provider compatibility upstream.

What It Does

  1. Discovery — recursively finds all SKILL.md directories under input root
  2. Language detection — Unicode script-ratio heuristic, extending support to Chinese, Japanese, and Korean
  3. Parallel scan — ThreadPoolExecutor runs graph.invoke() per skill, configurable --workers
  4. Gap-fill — targeted LLM pass for 8 rules with no semantic-analyzer equivalent (P5, P6-P8, MP1-MP3, RA1-RA2)
  5. Aggregated report — terminal / JSON / Markdown, sorted by risk score
  6. Multi-key API pool — rate-limit-aware scheduler with exponential backoff
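
To make the language-detection step in the list above more concrete, here is a minimal sketch of a Unicode script-ratio heuristic. The code-point ranges, the 10% threshold, the check order, and the fallback to `"en"` are all assumptions chosen for illustration; the module's actual rules may differ.

```python
def detect_language(text, threshold=0.1):
    """Guess zh / ja / ko / en from the share of letters in each script.

    Sketch only: the ranges, threshold, and "en" fallback are assumptions.
    """
    counts = {"zh": 0, "ja": 0, "ko": 0}
    letters = 0
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, whitespace
        letters += 1
        cp = ord(ch)
        if 0x3040 <= cp <= 0x30FF:  # Hiragana and Katakana
            counts["ja"] += 1
        elif 0xAC00 <= cp <= 0xD7A3 or 0x1100 <= cp <= 0x11FF:  # Hangul
            counts["ko"] += 1
        elif 0x4E00 <= cp <= 0x9FFF:  # CJK Unified Ideographs
            counts["zh"] += 1
    if letters == 0:
        return "en"
    # Kana is checked first: Japanese text mixes kana with Han characters,
    # so Han characters alone would otherwise be misread as Chinese.
    for lang in ("ja", "ko", "zh"):
        if counts[lang] / letters >= threshold:
            return lang
    return "en"


samples = {
    "en": "This skill reads a file.",
    "zh": "这个技能会读取文件并发送数据",
    "ja": "このスキルはファイルを読み取ります",
    "ko": "이 스킬은 파일을 읽습니다",
}
detected = {expected: detect_language(text) for expected, text in samples.items()}
```

A ratio threshold keeps a mostly English skill that quotes a few CJK characters from being reclassified, at the cost of missing short non-English passages inside a long English file.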

Evidence (23 built-in fixtures, 8 workers)

| Skill | --no-llm | LLM mode |
|---|---|---|
| ssd1_semantic_injection | 0/100 | 100/100 |
| ssd3_nl_exfiltration | 0/100 | 60/100 |
| ssd4_narrative_deception | 10/100 | 100/100 |
| sdi4_divergence | 13/100 | 100/100 |
| safe_skill | 0/100 | 0/100 ✓ |
| ssd_clean | 0/100 | 0/100 ✓ |

LLM semantic analyzers catch entire vulnerability categories invisible to static patterns. Clean skills remain clean — zero false-positive inflation.

How to verify

Prerequisites

Create .env in the repo root with 10 different DeepSeek API keys (the ApiKeyPool rotates across keys to avoid rate-limiting):

cp contrib/multilingual/.env.example .env

Edit .env and fill in:

SKILLSPECTOR_PROVIDER=openai
SKILLSPECTOR_MODEL=deepseek-v4-flash
OPENAI_BASE_URL=https://api.deepseek.com/v1

SKILLSPECTOR_API_KEYS="
  sk-or-xxx1|https://api.deepseek.com/v1|deepseek-v4-flash
  sk-or-xxx2|https://api.deepseek.com/v1|deepseek-v4-flash
  sk-or-xxx3|https://api.deepseek.com/v1|deepseek-v4-flash
  sk-or-xxx4|https://api.deepseek.com/v1|deepseek-v4-flash
  sk-or-xxx5|https://api.deepseek.com/v1|deepseek-v4-flash
  sk-or-xxx6|https://api.deepseek.com/v1|deepseek-v4-flash
  sk-or-xxx7|https://api.deepseek.com/v1|deepseek-v4-flash
  sk-or-xxx8|https://api.deepseek.com/v1|deepseek-v4-flash
  sk-or-xxx9|https://api.deepseek.com/v1|deepseek-v4-flash
  sk-or-xxx10|https://api.deepseek.com/v1|deepseek-v4-flash
"
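
For readers unfamiliar with the `key|base_url|model` layout above, the sketch below shows one way such a value could be parsed and rotated with exponential backoff. It is an illustration under stated assumptions, not the real ApiKeyPool: `KeyEntry`, `parse_key_pool`, `RateLimited`, and `call_with_rotation` are hypothetical names, and the retry policy (five attempts, 1 s base delay, small jitter) is invented for the example.

```python
import itertools
import random
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyEntry:
    api_key: str
    base_url: str
    model: str


def parse_key_pool(raw):
    """Parse one `key|base_url|model` entry per non-empty line."""
    entries = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        api_key, base_url, model = (part.strip() for part in line.split("|"))
        entries.append(KeyEntry(api_key, base_url, model))
    return entries


class RateLimited(Exception):
    """Hypothetical error raised by the caller when the endpoint returns HTTP 429."""


def call_with_rotation(entries, request, max_attempts=5, base_delay=1.0,
                       sleep=time.sleep):
    """Call request(entry), moving to the next key and backing off on RateLimited."""
    rotation = itertools.cycle(entries)
    for attempt in range(max_attempts):
        entry = next(rotation)
        try:
            return request(entry)
        except RateLimited:
            if attempt < max_attempts - 1:
                # 1 s, 2 s, 4 s, ... plus jitter so workers do not retry in lockstep
                sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
    raise RuntimeError(f"still rate-limited after {max_attempts} attempts")


raw = """
  sk-or-xxx1|https://api.deepseek.com/v1|deepseek-v4-flash
  sk-or-xxx2|https://api.deepseek.com/v1|deepseek-v4-flash
"""
pool = parse_key_pool(raw)
```

Passing `sleep` as a parameter keeps the backoff testable without real delays; a production scheduler would also need per-key cooldown tracking shared across threads, which this sketch omits.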

Activation

source .venv/bin/activate

Unit tests (no API keys needed, < 2s)

pytest contrib/multilingual/tests/ -v

Test 1 — Static mode (no LLM required, ~0.7s, default 4 workers)

python -m contrib.multilingual.batch_scan tests/fixtures/ --no-llm -f terminal

Expected: 23/23 skills, ~0.7 s, 8 CRITICAL / HIGH findings.

Test 2 — LLM parallel mode (requires API keys, ~2 min)

python -m contrib.multilingual.batch_scan tests/fixtures/ -f terminal --workers 8

Expected: 23/23 skills, ~2 min, 15 CRITICAL / HIGH findings (LLM catches semantic injection, narrative deception, and other vulnerabilities that static patterns miss).

Test 3 — Single-worker mode (for free-tier API keys)

python -m contrib.multilingual.batch_scan tests/fixtures/ -f terminal --workers 1

Testing

18 unit tests in contrib/multilingual/tests/ cover discovery, language detection, JSON / Markdown report formatting, and an end-to-end --no-llm scan, so the deterministic components are covered. LLM-dependent output is inherently non-deterministic and requires live API keys; for those paths, the static-vs-LLM comparison in the README provides more meaningful evidence than a mock-based test could. make lint passes on the upstream codebase.


🤖 Generated with Claude Code

Signed-off-by: WhereIs38 <CinderellaDoyle@icloud.com>

README.md
DESIGN.md
CONTRIBUTING.md

WhereIs38 force-pushed the feature/multilingual-batch-scanner branch 2 times, most recently from e4ecae7 to a32aa67 (June 19, 2026 08:06)
…gap-fill

- Parallel scan via ThreadPoolExecutor, configurable --workers
- Unicode-based language detection (zh, ja, ko)
- LLM gap-fill for 8 rules with no semantic-analyzer equivalent
- Aggregated terminal / JSON / Markdown reports
- Multi-key API pool with exponential backoff
- Zero changes to src/skillspector/
- Cross-platform: macOS + Windows

Signed-off-by: WhereIs38 <CinderellaDoyle@icloud.com>

WhereIs38 force-pushed the feature/multilingual-batch-scanner branch from a32aa67 to 22de8d6 (June 19, 2026 08:18)