Skip to content

fix(catalogs): validate extension and preset catalog payload shape#2621

Open
Quratulain-bilal wants to merge 2 commits into
github:mainfrom
Quratulain-bilal:fix/catalog-fetch-validates-payload-type
Open

fix(catalogs): validate extension and preset catalog payload shape#2621
Quratulain-bilal wants to merge 2 commits into
github:mainfrom
Quratulain-bilal:fix/catalog-fetch-validates-payload-type

Conversation

@Quratulain-bilal
Copy link
Copy Markdown
Contributor

What

ExtensionCatalog._fetch_single_catalog and PresetCatalog._fetch_single_catalog only check that the extensions /
presets key is present in the parsed catalog JSON. They don't check that the value is a JSON object, and they don't
check that the root is a JSON object at all. A malformed (or compromised) upstream catalog returning:

{"schema_version": "1.0", "extensions": []}

passes both the "extensions" not in catalog_data check and the JSON parse, gets cached on disk, and then crashes deep
inside _get_merged_extensions (resp. _get_merged_packs) with:

AttributeError: 'list' object has no attribute 'items'

instead of the existing user-facing ExtensionError("Invalid catalog format from <url>") / PresetError("Invalid preset catalog format") that the surrounding code is clearly trying to produce.

Reproducer

from specify_cli.extensions import ExtensionCatalog
# point an ExtensionCatalog at a URL that returns:
#   {"schema_version": "1.0", "extensions": []}
catalog._get_merged_extensions(force_refresh=True)
# AttributeError: 'list' object has no attribute 'items'

Why this matters

  • The error surfaces as a raw Python traceback (AttributeError) instead of the existing ExtensionError /
    PresetError. Users see a Python stack trace and assume the CLI itself is broken, not their catalog.
  • A multi-catalog stack (SPECKIT_CATALOG_URL env var, project-level extension-catalogs.yml) can include
    community-hosted catalogs. A single misedited entry on any catalog in the stack crashes the whole resolve path.
  • The sibling integration-catalog reader at src/specify_cli/integrations/catalog.py:175,186 already validates both the
    root object and the nested mapping — the three catalog fetchers had drifted out of sync.

The change

  • src/specify_cli/extensions.py — add isinstance(catalog_data, dict) and isinstance(catalog_data.get("extensions"), dict) guards in _fetch_single_catalog before iteration, raising the existing ExtensionError("Invalid catalog format from ...") with a more specific suffix when the type is wrong.
  • src/specify_cli/presets.py — same guards in _fetch_single_catalog, raising PresetError.
  • Both call sites mirror the existing integrations/catalog.py pattern exactly, with a short comment explaining the
    precedent so the rationale is visible to the next reader.

Tests

Parametrized regression tests in both tests/test_extensions.py::TestExtensionCatalog and
tests/test_presets.py::TestPresetCatalog cover:

  • root payload is not a JSON object: [], "oops", 42, null
  • root is a dict but extensions / presets value is the wrong type: [], "oops", null, 42

All 8 bad-payload shapes per file (16 total) now raise the expected catalog error.

$ python -m pytest tests/test_extensions.py::TestExtensionCatalog tests/test_extensions.py::TestCatalogStack
tests/test_presets.py::TestPresetCatalog -q
83 passed in 4.63s
$ python -m ruff check src/
All checks passed!

All checks passed!


## Scope

Intentionally narrow: no behaviour change for any valid catalog (real `dict` payloads continue to work). Only malformed
payloads that previously crashed with `AttributeError` now raise the existing `ExtensionError` / `PresetError` that the
surrounding code is already designed for.

`ExtensionCatalog._fetch_single_catalog` and
`PresetCatalog._fetch_single_catalog` only check that the `extensions` /
`presets` key is *present* in the parsed catalog JSON. They don't check
that the value is a JSON object, and they don't check that the root is
a JSON object at all. A malformed (or compromised) upstream catalog
returning:

    {"schema_version": "1.0", "extensions": []}

passes both `"extensions" not in catalog_data` and the subsequent
`response.read()` JSON parse, gets cached on disk, and then crashes
deep inside `_get_merged_extensions` (resp. `_get_merged_packs`) with:

    AttributeError: 'list' object has no attribute 'items'

instead of the existing user-facing
`ExtensionError("Invalid catalog format from <url>")` /
`PresetError("Invalid preset catalog format")` that the surrounding
code is clearly trying to produce.

The sibling integration-catalog reader already validates this — see
`src/specify_cli/integrations/catalog.py` where the fetch path
explicitly checks both `isinstance(catalog_data, dict)` and
`isinstance(catalog_data.get("integrations"), dict)` before returning.
This change mirrors that pattern in the extension and preset readers so
the three catalog fetchers stay consistent and a malformed upstream
surfaces as the user-facing error instead of a raw Python traceback.

Adds parametrized regression tests covering:
- root payload is not a JSON object (list, str, int, null)
- root is a dict but `extensions` / `presets` value is the wrong type
  (list, str, null, int)

All eight bad-payload shapes now raise the expected catalog error.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Hardens the extension and preset catalog fetchers so malformed upstream JSON payloads fail fast with the intended user-facing ExtensionError / PresetError instead of crashing later with raw Python exceptions during merge/iteration.

Changes:

  • Add root-object and nested extensions / presets type validation in _fetch_single_catalog() for extensions and presets.
  • Add regression tests covering non-dict roots and wrong-type extensions / presets values to ensure the expected catalog errors are raised.
  • Align error behavior more closely with the existing integration catalog reader’s payload-shape validation approach.
Show a summary per file
File Description
src/specify_cli/extensions.py Adds JSON payload shape validation for extension catalogs before caching/merging.
src/specify_cli/presets.py Adds JSON payload shape validation for preset catalogs before caching/merging.
tests/test_extensions.py Adds parametrized regression tests for malformed extension catalog payload shapes.
tests/test_presets.py Adds parametrized regression tests for malformed preset catalog payload shapes.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 4/4 changed files
  • Comments generated: 2

raise ExtensionError(
f"Invalid catalog format from {entry.url}: "
"'extensions' must be a JSON object"
)
Comment on lines +2048 to +2072
# Validate payload shape before iteration. Checking only key
# presence would let a payload like ``{"presets": []}`` or
# ``{"presets": null}`` slip through here and then crash with
# ``AttributeError: 'list' object has no attribute 'items'`` deep
# inside ``_get_merged_packs``. The sibling integration catalog
# reader already guards both the root object and the nested
# mapping (see ``integrations/catalog.py``); the preset catalog
# must stay consistent so a malformed upstream surfaces as the
# user-facing ``Invalid preset catalog format`` error instead of
# a raw Python traceback.
if not isinstance(catalog_data, dict):
raise PresetError(
f"Invalid preset catalog format from {entry.url}: "
"expected a JSON object"
)
if (
"schema_version" not in catalog_data
or "presets" not in catalog_data
):
raise PresetError("Invalid preset catalog format")
if not isinstance(catalog_data.get("presets"), dict):
raise PresetError(
f"Invalid preset catalog format from {entry.url}: "
"'presets' must be a JSON object"
)
…erge

Addresses Copilot review feedback on this PR.

`_fetch_single_catalog` now validates that the ``extensions`` / ``presets``
value is a mapping, but it doesn't (and shouldn't) validate every entry
inside that mapping. A payload like:

    {"schema_version": "1.0", "extensions": {"good": {...}, "bad": []}}

passes the fetch-level guard, then later crashes inside
``_get_merged_extensions`` (resp. ``_get_merged_packs``) at
``{**ext_data, ...}`` with ``TypeError: 'list' object is not a mapping``.

The sibling integration-catalog reader at
``src/specify_cli/integrations/catalog.py:245`` handles this with a
per-entry ``isinstance(integ_data, dict)`` skip during merge, so one
malformed entry doesn't poison an otherwise valid catalog. This change
mirrors that pattern in the extension and preset mergers and adds
regression tests asserting that valid entries continue to merge while
malformed siblings are silently dropped.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants