fix(supply-chain): relative edit distance for SC6 typosquat detection #104

Open
yonatangross wants to merge 1 commit into NVIDIA:main from yonatangross:fix/sc6-typosquat-relative-distance

@yonatangross

What

Fixes one of the false positives reported in #103: SC6 typosquat detection
uses an absolute edit-distance threshold of 2, which misfires on short,
legitimate package names. "task" is a real package and is edit-distance 2
from "flask", so it was flagged as a "possible typosquat".

Fix

Add a relative-distance guard to _is_typosquat: a genuine typosquat perturbs
only a small fraction of the name, so require dist / shorter_len <= 1/3.
When the shorter name has fewer than six characters, at most one edit is
tolerated, while longer names may still differ by two characters.

| candidate → popular | len | dist | before | after |
|---|---|---|---|---|
| reqeusts → requests | 8 | 2 | flag | flag (unchanged) |
| expreess → express | 7/8 | 1 | flag | flag (unchanged) |
| task → flask | 4/5 | 2 | flag | not flagged (FP fixed) |
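
For reviewers who want to check the arithmetic, here is a minimal sketch of the guard described above. It is not the code in this PR: `levenshtein` and `is_typosquat_sketch` are hypothetical stand-ins, and it assumes the scanner uses plain Levenshtein distance (a transposition counts as 2) and that `shorter_len` is the length of the shorter of the two names.

```python
MAX_ABS_DIST = 2  # absolute threshold described in this PR


def levenshtein(a: str, b: str) -> int:
    """Edit distance with insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),   # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def is_typosquat_sketch(candidate: str, popular: str) -> bool:
    """Hypothetical stand-in for _is_typosquat; the real signature may differ."""
    if candidate == popular:
        return False  # an exact match is the popular package itself
    dist = levenshtein(candidate, popular)
    if dist > MAX_ABS_DIST:
        return False  # existing absolute guard
    shorter_len = min(len(candidate), len(popular))
    # Relative guard: dist / shorter_len <= 1/3, written with integers
    # so the boundary case (e.g. 2/6) is not subject to float rounding.
    return dist * 3 <= shorter_len


for cand, pop in [("reqeusts", "requests"), ("expreess", "express"), ("task", "flask")]:
    print(cand, pop, levenshtein(cand, pop), is_typosquat_sketch(cand, pop))
```

The integer comparison is a design choice of the sketch only; a float comparison against 1/3 would also work for these three cases.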

Test plan

  • Added a regression test (test_is_typosquat_short_distinct_name_not_flagged)
    asserting task/flask is not flagged while reqeusts/requests still is.
  • pytest tests/unit/test_patterns_new.py → 177 passed (all existing
    SC4/SC5/SC6 assertions preserved).
  • End-to-end: scanning a package.json with task + expreess deps now
    reports SC6 for expreess only.
  • ruff check src/ tests/ clean.

Signed off per DCO. Refs #103.

…etection

Absolute edit-distance <= 2 produces false positives on short, legitimate
package names: "task" is a real package and is edit-distance 2 from "flask",
yet was flagged as a possible typosquat (one of the cases reported in NVIDIA#103).

Add a relative-distance guard (dist/shorter_len <= 1/3) so short names need an
all-but-one-character match while longer names may still differ by two
characters. Existing behaviour is preserved:
  - "reqeusts" -> "requests" (len 8, dist 2) still flagged
  - "expreess" -> "express"  (dist 1)         still flagged
  - "task"     -> "flask"    (len 4, dist 2)  no longer flagged

Adds a regression test; all existing SC4/SC5/SC6 unit tests still pass and
ruff is clean.

Refs NVIDIA#103

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Yonatan Gross <yonatan2gross@gmail.com>