feat(waterdata): add get_cql for generalized CQL2 queries#284

Merged
thodson-usgs merged 1 commit into DOI-USGS:main from thodson-usgs:worktree-get-waterdata-cql
May 30, 2026

Collaborator

@thodson-usgs thodson-usgs commented May 19, 2026

Summary

Adds get_cql(service, cql, ...) — a single public entry point for querying any Water Data OGC API collection with an arbitrary CQL2 filter, for predicates the typed getters (get_daily, get_continuous, …) can't express: a top-level or, like with % wildcards, comparison operators (</>/between), nested boolean trees, and geometry predicates beyond a bounding box.

The CQL2 body (a Python dict or a pre-serialized JSON str) is POSTed verbatim against the collection; the result is shaped exactly like the typed getters — wire id renamed to the service's id column, columns ordered/sorted, dtypes coerced — and returned as (DataFrame, BaseMetadata).

Like get_stats_data (and unlike the chunked typed getters), this is a single request: the CQL body is opaque, so there's nothing to chunk; server-side CQL errors surface as the module's standard typed errors. CQL2 grammar: https://api.waterdata.usgs.gov/docs/ogcapi/complex-queries/
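
The str-or-dict handling described above can be sketched roughly as follows. This is an illustration only, not the module's code: `serialize_cql_body` is a hypothetical helper, and it assumes a dict is JSON-encoded once while a pre-serialized string is passed through untouched.

```python
import json


def serialize_cql_body(cql):
    """Return the request body for a CQL2 filter given as a dict or a JSON str.

    Hypothetical helper for illustration; not part of dataretrieval.
    """
    if isinstance(cql, dict):
        # A dict is encoded to JSON exactly once.
        return json.dumps(cql)
    if isinstance(cql, str):
        # A pre-serialized string is sent as-is (verbatim).
        return cql
    raise TypeError(f"cql must be a str or dict, got {type(cql).__name__}")


like_filter = {"op": "like", "args": [{"property": "hydrologic_unit_code"}, "02070010%"]}
body_from_dict = serialize_cql_body(like_filter)
body_from_str = serialize_cql_body(body_from_dict)
```

Under these assumptions both call styles yield the same body, which is the property the live str/dict equivalence test checks against the real API.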

API

def get_cql(
    service: WATERDATA_SERVICES,
    cql: str | dict,
    *,
    properties: str | Iterable[str] | None = None,
    bbox: list[float] | None = None,
    limit: int | None = None,
    skip_geometry: bool | None = None,
    convert_type: bool = True,
) -> tuple[pd.DataFrame, BaseMetadata]: ...

from dataretrieval import waterdata

# Monitoring locations whose HUC starts with "02070010" — LIKE with a % wildcard.
df, md = waterdata.get_cql(
    service="monitoring-locations",
    cql={"op": "like", "args": [{"property": "hydrologic_unit_code"}, "02070010%"]},
)

service is validated against WATERDATA_SERVICES (the 11 OGC time-series collections); properties honors the same "id" → service id-column rewrite as the typed getters.
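
For the nested cases the summary mentions (a top-level or, between, comparison operators), the dict form can get verbose. Below is a small sketch of composing such a CQL2 JSON tree in plain Python. The `prop` and `op` helpers are hypothetical conveniences, and the `drainage_area` and `altitude` property names are assumptions chosen for illustration; check the collection's queryables before relying on them.

```python
import json


def prop(name):
    """CQL2 JSON property reference."""
    return {"property": name}


def op(name, *args):
    """CQL2 JSON operator node: {"op": <name>, "args": [...]}."""
    return {"op": name, "args": list(args)}


# HUC starts with 02070010, OR (drainage area in [10, 100] AND altitude > 500).
# Property names other than hydrologic_unit_code are assumed, not confirmed.
cql = op(
    "or",
    op("like", prop("hydrologic_unit_code"), "02070010%"),
    op(
        "and",
        op("between", prop("drainage_area"), 10, 100),
        op(">", prop("altitude"), 500),
    ),
)

print(json.dumps(cql, indent=2))
```

The resulting dict could then be passed as the cql argument of get_cql, as in the example above.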

Implementation

Reuses existing module machinery rather than duplicating it:

  • _construct_cql_request (utils): POST/CQL2 builder for a verbatim body; shares the skipGeometry/limit/bbox/properties URL block with _construct_api_requests via the extracted _ogc_query_params.
  • Fetch goes through the same non-chunked anyio-portal path as get_stats_data, extracted as _run_sync; result shaping reuses the existing _finalize_ogc hook.
  • _OUTPUT_ID_BY_SERVICE / WATERDATA_SERVICES: the 11 OGC collections and their id columns, kept in sync (guarded by a test).
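
As a rough picture of what a shared query-parameter builder like _ogc_query_params might do, here is a standalone sketch. It is not the module's implementation: the function name `ogc_query_params_sketch` is hypothetical, and the value formatting (comma-joined lists, lowercase booleans) is an assumption. Only the parameter names and the "None omits the param" behavior come from the description above.

```python
def ogc_query_params_sketch(properties=None, bbox=None, limit=None, skip_geometry=None):
    """Build the URL query parameters shared by the GET and POST/CQL2 paths.

    Hypothetical re-creation for illustration; None means "leave the param out".
    """
    params = {}
    if skip_geometry is not None:
        # Assumed encoding: lowercase true/false.
        params["skipGeometry"] = "true" if skip_geometry else "false"
    if limit is not None:
        params["limit"] = str(limit)
    if bbox is not None:
        # Assumed encoding: comma-separated coordinates.
        params["bbox"] = ",".join(str(v) for v in bbox)
    if properties is not None:
        if isinstance(properties, str):
            properties = [properties]
        params["properties"] = ",".join(properties)
    return params


params = ogc_query_params_sketch(
    properties=["monitoring_location_id", "state_name"],
    bbox=[-77.5, 38.5, -76.5, 39.5],
    limit=100,
)
```

Building the block in one function is what lets both request constructors stay consistent without parallel maintenance.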

Also in this PR: adopt _OUTPUT_ID_BY_SERVICE across the typed getters

The 11 typed getters each hardcoded their output_id (e.g. daily → "daily_id") and passed it to get_ogc_data, duplicating the service→id map. get_ogc_data now derives output_id from service via the map (single source of truth); output_id becomes an optional override that get_reference_table (metadata collections, not in the map) still passes. Drops the 11 hardcoded assignments.
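
The lookup-with-override behavior can be sketched as below. This is an assumption-laden illustration, not the PR's code: `resolve_output_id` and the two-entry map are hypothetical, and only the daily entry is taken from the text above.

```python
# Illustrative subset of a service -> id-column map.
OUTPUT_ID_BY_SERVICE_SKETCH = {
    "daily": "daily_id",            # from the PR description
    "continuous": "continuous_id",  # assumed for illustration
}


def resolve_output_id(service, output_id=None):
    """Pick the id column: explicit override first, else the map entry."""
    if output_id is not None:
        # Callers for collections outside the map pass the column explicitly.
        return output_id
    try:
        return OUTPUT_ID_BY_SERVICE_SKETCH[service]
    except KeyError:
        raise ValueError(f"unknown service: {service!r}") from None


daily_id_column = resolve_output_id("daily")
override_column = resolve_output_id("some-reference-table", output_id="some_id")
```

With this shape, a typed getter only names its service, and the id column follows from the map.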

Tests

  • Unit: _construct_cql_request builds the right POST/URL/verbatim body; skip_geometry=None omits the param; an unknown service raises; the WATERDATA_SERVICES/_OUTPUT_ID_BY_SERVICE sync invariant.
  • Live: compound AND-of-INs, str/dict body equivalence, "id" → output_id translation, LIKE wildcard, plus all typed getters re-verified after the refactor.
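
A sync-invariant check of the kind listed above might look like this sketch. The names and the three-entry subset are hypothetical; the real Literal and map cover all 11 collections, and the id-column values here are assumptions apart from daily.

```python
from typing import Literal, get_args

# Illustrative stand-ins for the real Literal and map.
ServicesSketch = Literal["daily", "continuous", "monitoring-locations"]
OUTPUT_ID_SKETCH = {
    "daily": "daily_id",
    "continuous": "continuous_id",                     # assumed
    "monitoring-locations": "monitoring_location_id",  # assumed
}


def out_of_sync(literal_type, id_map):
    """Return the service names present in one source but not the other."""
    return set(get_args(literal_type)) ^ set(id_map)


def test_services_literal_matches_map():
    # An empty symmetric difference means the Literal and the map agree.
    assert out_of_sync(ServicesSketch, OUTPUT_ID_SKETCH) == set()


test_services_literal_matches_map()
```

Returning the symmetric difference rather than a bare boolean makes a failing test report exactly which names drifted.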

ruff clean; verified against the live API.

Addresses #198

🤖 Generated with Claude Code

thodson-usgs added a commit to thodson-usgs/dataretrieval-python that referenced this pull request May 19, 2026
Code-review pass on PR DOI-USGS#284.

- Lift ``WATERDATA_SERVICES`` Literal into ``types.py``. Use it as
  the ``service`` arg type of ``get_waterdata`` so editors offer
  completion and type-checkers catch typos. The runtime source of
  truth (``_OUTPUT_ID_BY_SERVICE`` in utils.py) is unchanged; the
  Literal is kept in sync by hand and a comment notes that.

- Extract ``_ogc_query_params(properties, bbox, limit, skip_geometry)``
  in utils.py. The same ``skipGeometry``/``limit``/``bbox``/``properties``
  block previously appeared twice — once in ``_construct_api_requests``
  and once in the new ``_construct_cql_request`` — and is now built
  in one place.

- Extract ``_finalize_ogc_frame(df, response, properties, service,
  output_id, convert_type)`` for the post-processing tail
  (``_deal_with_empty`` -> ``_type_cols`` -> ``_arrange_cols`` ->
  ``_sort_rows`` -> ``BaseMetadata``). Both ``get_ogc_data`` and
  ``get_waterdata`` route through it now, so the typed-kwargs and
  raw-CQL2 paths produce identically-shaped DataFrames by
  construction rather than by parallel maintenance.

- Drop the ``client`` kwarg from ``get_waterdata``. None of the
  other public ``get_*`` getters expose it, and the rationale (HTTP
  session reuse) applies to all of them or none. If we want to
  expose session reuse, that's a separate PR that touches the whole
  family.

- Collapse the ``properties`` normalization block to None-first
  ordering so the common case (no properties) reads first.

- Drop the docstring breadcrumb to ``utils._OUTPUT_ID_BY_SERVICE``;
  point readers at ``types.WATERDATA_SERVICES`` (the user-facing
  Literal) instead.

All 148 unit tests pass; ``_construct_api_requests`` and
``_construct_cql_request`` produce byte-identical requests to before.
@thodson-usgs
Collaborator Author

Do we need to shield this against string comparisons, as we do in filters.py?

@thodson-usgs thodson-usgs added the enhancement New feature or request label May 25, 2026
@thodson-usgs thodson-usgs force-pushed the worktree-get-waterdata-cql branch 5 times, most recently from e778508 to a795fa7 Compare May 30, 2026 17:39
@thodson-usgs thodson-usgs changed the title feat(waterdata): add get_waterdata for generalized CQL2 queries feat(waterdata): add get_cql for generalized CQL2 queries May 30, 2026
Adds get_cql(service, cql, ...), a single public entry point for querying
any Water Data OGC API collection with an arbitrary CQL2 filter — for
predicates the typed getters (get_daily, get_continuous, …) can't express:
a top-level or, like with % wildcards, comparison operators, nested boolean
trees, and geometry predicates beyond a bounding box. The CQL2 body (str or
dict) is POSTed verbatim; the result is shaped like the typed getters (wire
id renamed, columns ordered/sorted, dtypes coerced) and returned as
(DataFrame, BaseMetadata).

Like get_stats_data, it's a single request (the CQL body is opaque, so
nothing to chunk); server-side CQL errors surface as the module's standard
typed errors. Reuses existing machinery rather than duplicating it:
_construct_cql_request shares the skipGeometry/limit/bbox/properties URL
block with _construct_api_requests via _ogc_query_params; the non-chunked
anyio-portal fetch path (_run_sync) is shared with get_stats_data; result
shaping goes through the existing _finalize_ogc hook. WATERDATA_SERVICES
enumerates the 11 OGC collections, kept in sync with _OUTPUT_ID_BY_SERVICE
(guarded by a test).

Also adopts _OUTPUT_ID_BY_SERVICE across the typed getters: get_ogc_data
derives output_id from service via the map (single source of truth) instead
of each getter hardcoding it; output_id becomes an optional override that
get_reference_table still passes for its metadata collections.

Tests: unit (request construction, skip_geometry omission, service
validation, the WATERDATA_SERVICES/_OUTPUT_ID_BY_SERVICE sync invariant) +
live (compound AND/IN, str/dict equivalence, id translation, LIKE wildcard);
all typed getters re-verified against the live API after the refactor.

Addresses DOI-USGS#198

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@thodson-usgs thodson-usgs marked this pull request as ready for review May 30, 2026 17:57
@thodson-usgs thodson-usgs force-pushed the worktree-get-waterdata-cql branch from a795fa7 to c0be417 Compare May 30, 2026 17:57
@thodson-usgs thodson-usgs merged commit ee653e5 into DOI-USGS:main May 30, 2026
8 checks passed
@thodson-usgs thodson-usgs deleted the worktree-get-waterdata-cql branch May 30, 2026 18:00