feat(waterdata): add get_cql for generalized CQL2 queries #284
Merged
thodson-usgs merged 1 commit on May 30, 2026
thodson-usgs added a commit to thodson-usgs/dataretrieval-python that referenced this pull request on May 19, 2026
Code-review pass on PR DOI-USGS#284.

- Lift ``WATERDATA_SERVICES`` Literal into ``types.py``. Use it as the ``service`` arg type of ``get_waterdata`` so editors offer completion and type-checkers catch typos. The runtime source of truth (``_OUTPUT_ID_BY_SERVICE`` in utils.py) is unchanged; the Literal is kept in sync by hand, and a comment notes that.
- Extract ``_ogc_query_params(properties, bbox, limit, skip_geometry)`` in utils.py. The same ``skipGeometry``/``limit``/``bbox``/``properties`` block previously appeared twice, once in ``_construct_api_requests`` and once in the new ``_construct_cql_request``, and is now built in one place.
- Extract ``_finalize_ogc_frame(df, response, properties, service, output_id, convert_type)`` for the post-processing tail (``_deal_with_empty`` -> ``_type_cols`` -> ``_arrange_cols`` -> ``_sort_rows`` -> ``BaseMetadata``). Both ``get_ogc_data`` and ``get_waterdata`` now route through it, so the typed-kwargs and raw-CQL2 paths produce identically shaped DataFrames by construction rather than by parallel maintenance.
- Drop the ``client`` kwarg from ``get_waterdata``. None of the other public ``get_*`` getters expose it, and the rationale (HTTP session reuse) applies to all of them or none. If we want to expose session reuse, that's a separate PR that touches the whole family.
- Collapse the ``properties`` normalization block to None-first ordering so the common case (no properties) reads first.
- Drop the docstring breadcrumb to ``utils._OUTPUT_ID_BY_SERVICE``; point readers at ``types.WATERDATA_SERVICES`` (the user-facing Literal) instead.

All 148 unit tests pass; ``_construct_api_requests`` and ``_construct_cql_request`` produce byte-identical requests to before.
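A minimal sketch of the extracted shared parameter block, assuming comma-joined `bbox`/`properties` values and lowercase `"true"`/`"false"` strings for `skipGeometry`. Those encodings, the function name `ogc_query_params`, and the example property names are assumptions for illustration, not the module's actual `_ogc_query_params`. It shows the two points above: one helper feeds both request builders, and `properties` is normalized None-first.

```python
from typing import Optional, Sequence, Union


def ogc_query_params(
    properties: Optional[Union[str, Sequence[str]]],
    bbox: Optional[Sequence[float]],
    limit: Optional[int],
    skip_geometry: Optional[bool],
) -> dict:
    """Build the URL parameters shared by both request builders.

    Hypothetical stand-in for utils._ogc_query_params: any argument left
    as None is omitted so the server default applies.
    """
    params = {}
    # None-first: the common case (no properties requested) reads first.
    if properties is None:
        pass
    elif isinstance(properties, str):
        params["properties"] = properties
    else:
        params["properties"] = ",".join(properties)
    if bbox is not None:
        # Assumed encoding: comma-separated minx,miny,maxx,maxy.
        params["bbox"] = ",".join(str(c) for c in bbox)
    if limit is not None:
        params["limit"] = str(limit)
    if skip_geometry is not None:
        # Assumed encoding: lowercase JSON-style booleans.
        params["skipGeometry"] = "true" if skip_geometry else "false"
    return params


# Both builders call the same helper instead of repeating the block.
typed_params = ogc_query_params(["value", "time"], None, 100, True)
cql_params = ogc_query_params(None, None, None, None)
```

Because every argument defaults to omission, `skip_geometry=None` simply leaves `skipGeometry` out of the URL, which is the behavior the unit tests check.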
Author (Collaborator):
Do we need to shield this against string comparisons, as we do in filters.py?
Force-pushed from e778508 to a795fa7
Adds get_cql(service, cql, ...), a single public entry point for querying any Water Data OGC API collection with an arbitrary CQL2 filter, for predicates the typed getters (get_daily, get_continuous, …) can't express: a top-level or, like with % wildcards, comparison operators, nested boolean trees, and geometry predicates beyond a bounding box.

The CQL2 body (str or dict) is POSTed verbatim; the result is shaped like the typed getters (wire id renamed, columns ordered/sorted, dtypes coerced) and returned as (DataFrame, BaseMetadata). Like get_stats_data, it's a single request (the CQL body is opaque, so there is nothing to chunk); server-side CQL errors surface as the module's standard typed errors.

Reuses existing machinery rather than duplicating it: _construct_cql_request shares the skipGeometry/limit/bbox/properties URL block with _construct_api_requests via _ogc_query_params; the non-chunked anyio-portal fetch path (_run_sync) is shared with get_stats_data; result shaping goes through the existing _finalize_ogc hook. WATERDATA_SERVICES enumerates the 11 OGC collections, kept in sync with _OUTPUT_ID_BY_SERVICE (guarded by a test).

Also adopts _OUTPUT_ID_BY_SERVICE across the typed getters: get_ogc_data derives output_id from service via the map (single source of truth) instead of each getter hardcoding it; output_id becomes an optional override that get_reference_table still passes for its metadata collections.

Tests: unit (request construction, skip_geometry omission, service validation, the WATERDATA_SERVICES/_OUTPUT_ID_BY_SERVICE sync invariant) + live (compound AND/IN, str/dict equivalence, id translation, LIKE wildcard); all typed getters re-verified against the live API after the refactor.

Addresses DOI-USGS#198

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
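A minimal, stdlib-only sketch of three ideas from this commit: the `WATERDATA_SERVICES`/`_OUTPUT_ID_BY_SERVICE` sync invariant, deriving `output_id` from `service` with an optional override, and building a single POST whose CQL2 body (`str` or `dict`) is passed through verbatim. The two-service subset, the `"continuous_id"` column, the `BASE_URL` path, the `application/query-cql-json` media type, the `monitoring_location_id` property, and every function name here (`services_in_sync`, `resolve_output_id`, `build_cql_request`, `rename_wire_id`) are hypothetical. The real module's fetch (`_run_sync`) and DataFrame shaping (`_finalize_ogc`) are omitted.

```python
import json
from typing import Literal, Optional, Union, get_args

# Illustrative two-entry subset; the PR's Literal enumerates 11 collections.
WATERDATA_SERVICES = Literal["daily", "continuous"]

# Hypothetical stand-in for utils._OUTPUT_ID_BY_SERVICE ("continuous_id" is assumed).
_OUTPUT_ID_BY_SERVICE = {"daily": "daily_id", "continuous": "continuous_id"}

# Assumed endpoint layout and CQL2-JSON media type; confirm against the API docs.
BASE_URL = "https://api.waterdata.usgs.gov/ogcapi/v0/collections"
CQL_CONTENT_TYPE = "application/query-cql-json"


def services_in_sync() -> bool:
    """The invariant guarded by a test: Literal and map name the same services."""
    return set(get_args(WATERDATA_SERVICES)) == set(_OUTPUT_ID_BY_SERVICE)


def resolve_output_id(service: str, output_id: Optional[str] = None) -> str:
    """Derive the id column from the map unless an explicit override is given."""
    if output_id is not None:
        return output_id  # e.g. a metadata collection that is not in the map
    if service not in _OUTPUT_ID_BY_SERVICE:
        raise ValueError(f"unknown service {service!r}")
    return _OUTPUT_ID_BY_SERVICE[service]


def build_cql_request(service: str, cql: Union[str, dict]) -> tuple:
    """Return (url, headers, body) for a single POST; the body is never parsed."""
    if service not in get_args(WATERDATA_SERVICES):
        raise ValueError(f"service must be one of {get_args(WATERDATA_SERVICES)}")
    # A str is sent verbatim; a dict is serialized to JSON first.
    body = cql if isinstance(cql, str) else json.dumps(cql)
    return f"{BASE_URL}/{service}/items", {"Content-Type": CQL_CONTENT_TYPE}, body


def rename_wire_id(records: list, service: str) -> list:
    """Rename each record's wire-level "id" key to the service's id column."""
    out_id = resolve_output_id(service)
    return [{(out_id if k == "id" else k): v for k, v in r.items()} for r in records]


# A top-level OR of two LIKE predicates, which a typed getter cannot express.
query = {
    "op": "or",
    "args": [
        {"op": "like", "args": [{"property": "monitoring_location_id"}, "USGS-0164%"]},
        {"op": "like", "args": [{"property": "monitoring_location_id"}, "USGS-0165%"]},
    ],
}
url, headers, body = build_cql_request("daily", query)
```

Because the body is treated as opaque, nothing here inspects the CQL2 tree, which is why the request is sent once rather than chunked the way the typed getters' requests are.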
Force-pushed from a795fa7 to c0be417
Summary
Adds `get_cql(service, cql, ...)`, a single public entry point for querying any Water Data OGC API collection with an arbitrary CQL2 filter, for predicates the typed getters (`get_daily`, `get_continuous`, …) can't express: a top-level `or`, `like` with `%` wildcards, comparison operators (`<`/`>`/`between`), nested boolean trees, and geometry predicates beyond a bounding box.

The CQL2 body (a Python `dict` or a pre-serialized JSON `str`) is POSTed verbatim against the collection; the result is shaped exactly like the typed getters (wire `id` renamed to the service's id column, columns ordered/sorted, dtypes coerced) and returned as `(DataFrame, BaseMetadata)`.

Like `get_stats_data` (and unlike the chunked typed getters), this is a single request: the CQL body is opaque, so there's nothing to chunk. Server-side CQL errors surface as the module's standard typed errors. CQL2 grammar: https://api.waterdata.usgs.gov/docs/ogcapi/complex-queries/

API
`service` is validated against `WATERDATA_SERVICES` (the 11 OGC time-series collections); `properties` honors the same `"id"` → service id-column rewrite as the typed getters.

Implementation
Reuses existing module machinery rather than duplicating it:
- `_construct_cql_request` (utils): POST/CQL2 builder for a verbatim body; shares the `skipGeometry`/`limit`/`bbox`/`properties` URL block with `_construct_api_requests` via the extracted `_ogc_query_params`.
- The same non-chunked `anyio`-portal fetch path as `get_stats_data`, extracted as `_run_sync`; result shaping reuses the existing `_finalize_ogc` hook.
- `_OUTPUT_ID_BY_SERVICE` / `WATERDATA_SERVICES`: the 11 OGC collections and their id columns, kept in sync (guarded by a test).

Also in this PR: adopt `_OUTPUT_ID_BY_SERVICE` across the typed getters

The 11 typed getters each hardcoded their `output_id` (e.g. `daily` → `"daily_id"`) and passed it to `get_ogc_data`, duplicating the service→id map. `get_ogc_data` now derives `output_id` from `service` via the map (single source of truth); `output_id` becomes an optional override that `get_reference_table` (metadata collections, not in the map) still passes. Drops the 11 hardcoded assignments.

Tests
- Unit: `_construct_cql_request` builds the right POST/URL/verbatim body; `skip_geometry=None` omits the param; an unknown service raises; the `WATERDATA_SERVICES` ↔ `_OUTPUT_ID_BY_SERVICE` sync invariant.
- Live: compound `AND`/`IN`, `str`/`dict` body equivalence, `"id"` → `output_id` translation, LIKE wildcard, plus all typed getters re-verified after the refactor.
- `ruff` clean; verified against the live API.

Addresses #198
🤖 Generated with Claude Code