feat(scanner): derive skill/plugin source from install manifests#90
Open
SherlockSalvatore wants to merge 5 commits into
Open
feat(scanner): derive skill/plugin source from install manifests#90SherlockSalvatore wants to merge 5 commits into
SherlockSalvatore wants to merge 5 commits into
Conversation
… repo A skill installed into the shared ~/.agents/skills and symlinked into an agent home (e.g. ~/.claude/skills) was mis-attributed when that home sits inside a dotfiles git repo: detect_source walked the symlink's textual parents, hit ~/.claude/.git, and recorded the dotfiles repo as the source. The git-source backfill then stamped install_type='git' + pack, forking one on-disk skill into two Extension rows (marketplace vs dotfiles). - scanner: canonicalize the skill path before detect_source so a symlinked skill is attributed to its real content's source; plain skills resolve to the same tree and are unchanged. - store: self-heal existing DBs by clearing the bogus git install_meta (and the pack derived from it) for skill rows whose on-disk entry is a symlink and whose freshly-scanned source is non-git; real git installs are plain files, never symlinks, so they are untouched. Run before backfill_packs so cleared rows do not re-acquire a pack. Fixes RealZST#87
HarnessKit inferred an extension's source by walking up to the nearest enclosing .git, mis-attributing everything under a dotfiles-managed agent home (e.g. ~/.claude) to that backup repo. Read each tool's own install manifest instead, falling back to .git detection only when absent. - Skills: override detect_source with <root>/.skill-lock.json (the skills CLI lockfile), matched by on-disk folder name, cached per lockfile. - Plugins: PluginEntry gains source_url; the Claude adapter fills it from plugins/known_marketplaces.json (github marketplaces -> owner/repo); scan_plugins prefers it over the .git walk. Other adapters unchanged. Regression tests: lockfile beats a populated enclosing repo (and a sibling skill absent from the lock falls back); a plugin is attributed to its marketplace repo. Refs RealZST#89. Builds on RealZST#88 canonicalize in scan_skill_dir.
The scanner now resolves the real source from install manifests, but the git-source backfill only writes install_meta when install_type IS NULL, so a row stamped in an earlier sync (e.g. a plugin first attributed to the enclosing dotfiles repo) keeps its stale install_url. deriveExtensionUrl prefers install_url, so the corrected source_json.url stayed shadowed and the extension lingered in the wrong group. Add refresh_stale_git_install_meta: for install_type='git' skill/plugin rows whose install_url owner/repo differs from the authoritative source_json.url owner/repo, realign install_url/revision and clear the now-stale branch/subpath + pack (re-derived by backfill_packs). Compare by pack, not raw URL string, so a legitimate install recorded as ".../repo" is not churned against the scanner's ".../repo.git" remote every sync (which would wipe its pinned revision and check state). Runs in both sync paths after the symlink heal, before backfill_packs. This is the store half of the manifest source-resolution fix (parallel to the RealZST#88 self-heal).
61c6665 to
d6d3b8d
Compare
….md skills Two review findings on the manifest source-resolution change: - refresh_stale_git_install_meta compared pack owner/repo case-sensitively, so a stored `Owner/Repo` vs a scanned `owner/repo` (GitHub is case-insensitive) would churn the row every sync, wiping its pinned revision/check state. Compare lowercased. Add a same-repo case-only regression row that must stay intact. - scan_skill_dir's lockfile key used path.file_name() unconditionally, so a standalone `<name>.md` skill would key on `name.md` and miss its lock entry. Key dir skills by folder name (file_name, dot-safe) and standalone .md by stem.
The web and desktop `update_extension` endpoints validated install_meta and install_type but not kind. The update path clones a repo and deploys it as a skill, so passing a non-skill id ran the skill-update logic on it. This became reachable once manifest source-resolution stamps plugins with install_type='git'. Add the same `is_update_eligible` gate the bulk update path already uses (rejects non-skill kinds), so a plugin id can no longer enter the skill clone+deploy flow.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #89.
Problem
scanner::detect_sourceinfers an extension's source by walking up to the nearest enclosing.git. When an agent home (e.g.~/.claude) is kept under the user's own dotfiles repo, everything beneath it — plugins cached in~/.claude/plugins/cache/<marketplace>/..., skills, etc. — is mis-attributed to that backup remote, collapsing many unrelated sources into one bogus group. PR #88 fixed only the symlinked-skill slice; this addresses the root cause for skills and plugins.Fix
Read each tool's own install manifest as the authoritative source; fall back to
detect_sourceonly when there is no manifest entry.<root>/.skill-lock.json(the skills CLI lockfile), matched by the on-disk folder name (how the CLI keys it), parsed once per lockfile and cached. A real git checkout's commit hash is preserved.PluginEntrygainssource_url; the Claude adapter fills it fromplugins/known_marketplaces.json(github marketplaces →owner/repo);scan_pluginsprefers it over the.gitwalk. Non-github marketplaces are skipped so we never fabricate a wrong URL. Other adapters passNone(behaviour unchanged).Testing
cargo test -p hk-core— 531 passed, incl. two new regression tests:test_skill_lock_overrides_enclosing_git_source— the lockfile beats a populated enclosing repo; a sibling skill absent from the lock still falls back; the override matches by folder name even when frontmatternamediffers.test_scan_plugins_attributes_to_marketplace_repo— a plugin is attributed to its marketplace's upstream repo.cargo clippy -p hk-core— clean for the touched files.Notes
canonicalizeinscan_skill_dir). Until fix(skills): attribute symlinked skills to real source, not enclosing repo #88 merges this PR's range includes fix(skills): attribute symlinked skills to real source, not enclosing repo #88's commit; I'll rebase ontomainonce fix(skills): attribute symlinked skills to real source, not enclosing repo #88 lands.source_urlresolution later; once manifest resolution is in, the store's git-source backfill can be narrowed.Store self-heal (existing DBs)
The scanner change fixes attribution going forward, but on an existing DB the git-source backfill had already stamped
install_url/packfrom the old (wrong) source, and the backfill only runs oninstall_type IS NULLso it never refreshes them — andderiveExtensionUrlprefersinstall_url, so the correctedsource_json.urlstays shadowed.refresh_stale_git_install_meta(parallel to #88's self-heal) realignsinstall_type='git'skill/plugin rows whose recorded owner/repo differs from the now-authoritativesource_json.urlowner/repo, clearing the stale branch/subpath + pack (re-derived bybackfill_packs). It compares by pack, not raw URL string, so a legitimate install recorded as…/repois not churned against the scanner's…/repo.gitremote every sync (which would wipe its pinned revision/check state). Verified on a real polluted DB: 13 plugins re-attributed to their marketplaces, same-repo and self-authored skills left untouched.Known properties / limitations
refresh_stale_git_install_metapersists the realignment, so reverting the code does not roll the data back (the old backfill only fires oninstall_type IS NULL). In practice the only values it overwrites are ones whose owner/repo genuinely changed — i.e. stale pollution — and pack-gating leaves same-repo rows (incl..git-suffix and case-only variants) untouched, so a legitimately pinned revision is not lost.owner/repo(e.g. gitlab→github) is treated as the same repo and not realigned; the stored URL keeps the old host. This is intentional (it's what avoids churning.git-suffix/case/scheme variants) and harmless for grouping, which keys on owner/repo.detect_source(no fabricated URL); plugins from such markets under a git-managed home stay mis-attributed. Tracked for follow-up in Derive extension source from the tool's install manifest, not the enclosing .git #89, alongside Codex's per-plugin manifest.Known interactions
skills-CLI-installed skill resolves to its real upstream, it becomes update-eligible, and the update path clones that repo anddeploy_skills it over every same-named installed copy — including the canonical~/.agents/skills/<name>that the externalskillsCLI manages (and any agent home symlinked to it). The files get updated but the CLI's.skill-lock.jsonhash/commit is not, so the two can drift. This is pre-existing behaviour (the marketplace-attributed sibling copies already triggered the same clone+deploy), widened here because the symlinked copy is now also an entry point. Flagged for maintainer awareness; not changed in this PR.Decision for maintainer
This PR's blast radius is intentionally bounded: it changes attribution (and a one-time, idempotent, pack-gated DB realignment), and the one potentially destructive interaction — updating an externally-managed skill — is user-triggered (nothing happens until someone clicks Update) and pre-existing (marketplace-attributed copies already cloned + deployed over the same files). So it is safe to ship as-is.
What's left is a policy choice that should be yours, not silently encoded here: should HarnessKit offer Update/Delete on skills that an external tool (the
skillsCLI, tracked in~/.agents/.skill-lock.json) manages? Now that those skills resolve to a real upstream, they become update-eligible, anddeploy_skillwrites through the agent-home symlink onto the CLI-managed~/.agents/skills/<name>, leaving the CLI's lockfile hash stale.Two directions, for you to pick before any further change:
skillsCLI" notice. Minimal, safe, hands-off. (My recommendation.).skill-lock.json(or shells out to theskillsCLI) after an update. Complete, but couples HarnessKit to the external tool's format and lifecycle.This PR deliberately stays at attribution only. If you favour a direction, I'll do it as a separate follow-up (alongside a small, safe hardening: matching
find_skill_in_repoby folder as well as name, so a skill whose frontmatter name differs from its repo folder still updates). Tracked with the manifest follow-ups in #89.