feat: add multilingual batch scanner with parallel execution and LLM gap-fill #100
Open
WhereIs38 wants to merge 1 commit into
Force-pushed from e4ecae7 to a32aa67.
…gap-fill

- Parallel scan via ThreadPoolExecutor, configurable --workers
- Unicode-based language detection (zh, ja, ko)
- LLM gap-fill for 8 rules with no semantic-analyzer equivalent
- Aggregated terminal / JSON / Markdown reports
- Multi-key API pool with exponential backoff
- Zero changes to src/skillspector/
- Cross-platform: macOS + Windows

Signed-off-by: WhereIs38 <CinderellaDoyle@icloud.com>
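The detection code itself is not visible in this conversation view. As a rough illustration of what "Unicode-based language detection (zh, ja, ko)" can mean, here is a minimal sketch that classifies text by Unicode block. The function name `detect_language`, the fallback to `"en"`, and the precedence order are assumptions for illustration, not the PR's implementation.

```python
def detect_language(text: str) -> str:
    """Return 'ja', 'ko', 'zh', or 'en' based on Unicode block counts (heuristic)."""
    kana = hangul = han = 0
    for ch in text:
        cp = ord(ch)
        if 0x3040 <= cp <= 0x30FF:  # Hiragana and Katakana blocks
            kana += 1
        elif 0xAC00 <= cp <= 0xD7A3 or 0x1100 <= cp <= 0x11FF:  # Hangul syllables, Jamo
            hangul += 1
        elif 0x4E00 <= cp <= 0x9FFF:  # CJK Unified Ideographs
            han += 1
    # Han characters appear in Chinese and Japanese text alike, so kana is
    # checked first: its presence is what distinguishes Japanese.
    if kana:
        return "ja"
    if hangul:
        return "ko"
    if han:
        return "zh"
    return "en"  # assumed fallback for text with no CJK characters
```

Checking kana before Han is the key design choice: a Japanese skill description typically mixes both, while Chinese text contains Han characters only.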
Force-pushed from a32aa67 to 22de8d6.
Closes #98
Summary
Adds `contrib/multilingual/`, a multilingual batch scanner that scans directories of AI agent skills in parallel, with automatic language detection and targeted LLM gap-fill for non-English skills.

Zero changes to `src/skillspector/`. All integration is via import-time patches that wrap upstream constructors without modifying any source file.

Why this module exists
The upstream project scans one skill at a time — great for depth, but serial execution means LLM latency stacks linearly. I needed to scan many skills quickly, so this module avoids serial bottlenecks by design.
Scale. Each skill runs in its own thread via `ThreadPoolExecutor`. With enough API keys, adding workers cuts total scan time proportionally (23 skills finish in ~2 minutes at 8 workers, roughly one human-agent conversation round). The ceiling is the user's key count, not the code: 100 keys scanning 2000 skills would still finish in minutes.

Cost. Parallel scanning means high token throughput, so I chose DeepSeek, the cheapest per-token option, for development and testing. The module itself is provider-agnostic: any OpenAI-compatible endpoint works. I couldn't test local models due to hardware constraints (a Mac with limited RAM and a Windows machine with 4 GB of VRAM). That remains a known gap I hope someone with better hardware can fill.
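The per-skill thread pool described under Scale can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `scan_skill` is a hypothetical stand-in for the real per-skill scan (which the PR says calls `graph.invoke()`), and `scan_all` and the result dictionaries are assumed names and shapes.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def scan_skill(skill_dir: Path) -> dict:
    """Hypothetical stand-in for the per-skill scan (graph.invoke() in the PR)."""
    return {"skill": skill_dir.name, "findings": []}


def scan_all(skill_dirs: list[Path], workers: int = 4) -> list[dict]:
    """Scan every skill on a thread pool; one failing skill does not abort the batch."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(scan_skill, d): d for d in skill_dirs}
        for future in as_completed(futures):
            skill_dir = futures[future]
            try:
                results.append(future.result())
            except Exception as exc:  # record the error and keep scanning
                results.append({"skill": skill_dir.name, "error": str(exc)})
    # as_completed yields in finish order, so sort for a stable report
    return sorted(results, key=lambda r: r["skill"])
```

Threads suit this workload because each scan spends most of its time waiting on LLM responses over the network, so the pool size maps directly onto the number of requests in flight.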
Compatibility. The module is tested on macOS and Windows. `runner.py` applies a small set of import-time patches so DeepSeek works out of the box; the patches follow the standard OpenAI-compatible protocol, so Ollama and other endpoints should work as well. All patches are non-invasive and self-contained within `contrib/multilingual/`.

In short: upstream provides the detection algorithms; this contrib provides the reach. If accepted, I'm interested in continuing to improve scalability and external provider compatibility upstream.
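For readers unfamiliar with the technique, an import-time patch that wraps a constructor without editing its source file can look like the sketch below. Everything here is illustrative: `UpstreamClient`, `patch_constructor`, and both URLs are hypothetical and do not correspond to actual SkillSpector classes or to the patches in `runner.py`.

```python
import functools


class UpstreamClient:
    """Hypothetical stand-in for an upstream class; not a SkillSpector API."""

    def __init__(self, model: str, base_url: str = "https://default.example.com/v1"):
        self.model = model
        self.base_url = base_url


def patch_constructor(cls, **overrides):
    """Wrap cls.__init__ so the given keyword defaults apply unless the caller sets them.

    Assumes callers pass the overridden parameters by keyword, not positionally.
    """
    original_init = cls.__init__

    @functools.wraps(original_init)
    def patched_init(self, *args, **kwargs):
        for key, value in overrides.items():
            kwargs.setdefault(key, value)  # explicit caller arguments win
        original_init(self, *args, **kwargs)

    cls.__init__ = patched_init
    return original_init  # keep a handle so the patch can be undone


# Applied once, e.g. when the contrib module is imported.
original = patch_constructor(UpstreamClient, base_url="https://llm.example.com/v1")
client = UpstreamClient("some-model")
```

Returning the original `__init__` makes the patch reversible with `UpstreamClient.__init__ = original`, which helps keep tests from leaking state into each other.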
What It Does
- `SKILL.md` directories under input root
- `ThreadPoolExecutor` runs `graph.invoke()` per skill, configurable `--workers`

Evidence (23 built-in fixtures, 8 workers)

[Results table not recoverable from this view. Surviving column header: `--no-llm`. Surviving fixture names: `ssd1_semantic_injection`, `ssd3_nl_exfiltration`, `ssd4_narrative_deception`, `sdi4_divergence`, `safe_skill`, `ssd_clean`.]

LLM semantic analyzers catch entire vulnerability categories invisible to static patterns. Clean skills remain clean: zero false-positive inflation.
How to verify
Prerequisites
Create `.env` in the repo root with 10 different DeepSeek API keys (the `ApiKeyPool` rotates across keys to avoid rate-limiting):

Edit `.env` and fill in:

Activation

`source .venv/bin/activate`

Unit tests (no API keys needed, < 2s)
Test 1 — Static mode (no LLM required, ~0.7s, default 4 workers)
Expected: 23/23 skills, ~0.7 s, 8 CRITICAL / HIGH findings.
Test 2 — LLM parallel mode (requires API keys, ~2 min)
Expected: 23/23 skills, ~2 min, 15 CRITICAL / HIGH findings (LLM catches semantic injection, narrative deception, and other vulnerabilities that static patterns miss).
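LLM mode depends on the key rotation mentioned under Prerequisites and on the exponential backoff named in the commit message. A minimal sketch of that combination follows, under stated assumptions: `KeyPool` and `call_with_backoff` are hypothetical names, `RuntimeError` stands in for whatever rate-limit error the provider client raises, and the PR's actual `ApiKeyPool` may work differently.

```python
import itertools
import threading
import time


class KeyPool:
    """Hypothetical round-robin key pool; the PR's ApiKeyPool may differ."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("at least one API key is required")
        self._cycle = itertools.cycle(keys)
        self._lock = threading.Lock()

    def next_key(self):
        with self._lock:  # worker threads share one pool
            return next(self._cycle)


def call_with_backoff(pool, request, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call request(key); on failure wait base_delay * 2**attempt, then retry with the next key."""
    for attempt in range(retries):
        key = pool.next_key()
        try:
            return request(key)
        except RuntimeError:  # stand-in for a provider rate-limit error
            if attempt == retries - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so the retry schedule can be checked in a unit test without actually waiting.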
Test 3 — Single-worker mode (for free-tier API keys)
Testing
18 unit tests in `contrib/multilingual/tests/` cover discovery, language detection, JSON / Markdown report formatting, and an end-to-end `--no-llm` scan. Deterministic components are fully covered. LLM-dependent output is inherently non-deterministic and requires live API keys, so the static-vs-LLM comparison in the README provides more meaningful evidence for those paths than any mock-based test could.

`make lint` passes on the upstream codebase.

🤖 Generated with Claude Code

Signed-off-by: WhereIs38 <CinderellaDoyle@icloud.com>
README.md
DESIGN.md
CONTRIBUTING.md