feat: add spcs_pat for Snowflake SPCS gateway auth (DM-3656) #27
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -42,6 +42,13 @@ Authentication is performed on the first request if ``authenticate()`` is not ca | |
| and is automatically retried once on a 401 response. | ||
| ``client.healthcheck()`` is available as a lightweight readiness probe that does not consume credentials. | ||
|
|
||
| For a DataMasque instance hosted behind Snowflake SPCS (Snowpark Container Services) app ingress | ||
| (a ``*.snowflakecomputing.app`` ``base_url``), | ||
| pass a Snowflake Programmatic Access Token as ``spcs_pat`` on ``DataMasqueInstanceConfig``; | ||
| the client sends it on the ``X-SF-SPCS-Authorization`` header to clear the Snowflake gateway, | ||
|
Collaborator
more implementation detail that doesn't need to be in the README, if even present at all
||
| which strips it before forwarding so your DataMasque auth is unaffected. | ||
| See the `usage docs <https://datamasque-python.readthedocs.io/en/latest/usage.html>`_ for details. | ||
|
|
||
| Error handling | ||
| ============== | ||
|
|
||
|
|
@@ -60,6 +67,9 @@ All methods raise subclasses of ``DataMasqueException`` on failure: | |
| raised by ``start_masking_run`` when the server rejects the run. | ||
| - ``DataMasqueUserError`` — | ||
| raised by user-management methods when the input is invalid. | ||
| - ``SpcsGatewayAuthError`` — | ||
|
Collaborator
Not to say the Snowflake stuff isn't valuable, but is this "worthy" (mainly, frequently used) enough to be included in the top-level README?
||
| raised when a Snowflake SPCS app gateway rejects the configured ``spcs_pat`` | ||
| before the request reaches DataMasque. | ||
|
|
||
| Documentation | ||
| ============= | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -84,6 +84,22 @@ class IfmAuthError(DataMasqueIfmError): | |
| """Raised when the IFM client cannot obtain or refresh a JWT (e.g. invalid credentials, missing scope).""" | ||
|
|
||
|
|
||
| class SpcsGatewayAuthError(DataMasqueException): | ||
| """ | ||
| Raised when a Snowflake SPCS app gateway rejects the configured ``spcs_pat``. | ||
|
Collaborator
Sembr. Good comment explaining why this doesn't inherit from DataMasqueApiError.
||
|
|
||
| Only relevant when the client is configured with ``spcs_pat`` for an | ||
| instance behind Snowflake SPCS app ingress. The message includes the | ||
| Snowflake-provided detail, request id, and a hint at the likely cause | ||
| (e.g. an expired PAT or a network policy that excludes your IP). | ||
|
|
||
| Deliberately a direct subclass of `DataMasqueException` rather than | ||
| `DataMasqueApiError`: the client's 401 re-authenticate-and-retry path keys | ||
| off `DataMasqueApiError`/HTTP status, so keeping this outside that subtree | ||
| ensures a gateway rejection aborts immediately instead of looping. | ||
| """ | ||
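To illustrate why the placement in the hierarchy matters, here is a minimal sketch. It assumes a retry wrapper that keys off the exception class and HTTP status, as the docstring describes. `BaseClientError`, `ApiError`, `GatewayAuthError`, and `call_with_reauth` are hypothetical stand-ins, not the client's real classes or retry code.

```python
class BaseClientError(Exception):
    """Hypothetical stand-in for DataMasqueException."""


class ApiError(BaseClientError):
    """Hypothetical stand-in for DataMasqueApiError; carries the HTTP status."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


class GatewayAuthError(BaseClientError):
    """Hypothetical stand-in for SpcsGatewayAuthError; deliberately not an ApiError."""


def call_with_reauth(request, reauthenticate):
    """Call request(); on an ApiError with status 401, re-authenticate and retry once."""
    try:
        return request()
    except ApiError as exc:
        if exc.status_code != 401:
            raise
        reauthenticate()
        return request()


def run_demo():
    """Exercise both paths and report what happened."""
    attempts = {"api": 0, "gateway": 0}
    reauths = []

    def expired_session():
        # Fails with a 401 on the first call only, then succeeds.
        attempts["api"] += 1
        if attempts["api"] == 1:
            raise ApiError(401)
        return "ok"

    def rejected_by_gateway():
        # Always fails outside the ApiError subtree.
        attempts["gateway"] += 1
        raise GatewayAuthError("gateway rejected the PAT")

    result = call_with_reauth(expired_session, lambda: reauths.append("reauth"))
    try:
        call_with_reauth(rejected_by_gateway, lambda: reauths.append("reauth"))
        gateway_outcome = "no error"
    except GatewayAuthError:
        gateway_outcome = "raised"
    return result, attempts, reauths, gateway_outcome
```

In this sketch the gateway error is never caught by the `except ApiError` clause, so it is attempted exactly once and no re-authentication is triggered for it.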
|
|
||
|
|
||
| class RunNotCancellableError(DataMasqueUserError): | ||
| """ | ||
| Raised when `cancel_run` is called against a run that is no longer eligible for cancellation. | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -20,6 +20,14 @@ class DataMasqueInstanceConfig(BaseModel): | |
| the client prepends it with `Token ` when sending the `Authorization` header. | ||
| The client calls `token_source` on each authentication attempt, | ||
| so the callable is free to fetch and refresh tokens out-of-band (e.g. from a secrets manager). | ||
|
|
||
| `spcs_pat` is an optional Snowflake Programmatic Access Token for reaching a | ||
|
Collaborator
tell your claude to look at CONTRIBUTING.rst, where you'll see guidelines for semantic breaking. Do we need to talk about the implementation details?
||
| DataMasque instance hosted behind Snowflake SPCS (Snowpark Container Services) | ||
| app ingress, where `base_url` ends in `.snowflakecomputing.app`. It is sent on | ||
| every request via the `X-SF-SPCS-Authorization` header to clear the Snowflake | ||
| gateway, which strips it before forwarding — so it is independent of, and | ||
| layers underneath, whichever DataMasque auth method (`password` or | ||
| `token_source`) you choose. | ||
| """ | ||
|
|
||
| model_config = ConfigDict(arbitrary_types_allowed=True) | ||
|
|
@@ -29,6 +37,16 @@ class DataMasqueInstanceConfig(BaseModel): | |
| password: Optional[str] = None | ||
| verify_ssl: bool = True | ||
| token_source: Optional[Callable[[], str]] = None | ||
| spcs_pat: Optional[str] = None | ||
| """Snowflake Programmatic Access Token for a DataMasque instance hosted behind | ||
|
Collaborator
D213 and sembr. Ruff should have yelled at you. Same thoughts re implementation details may be unnecessary.
|
||
| Snowflake SPCS app ingress (a ``*.snowflakecomputing.app`` ``base_url``). | ||
|
|
||
| Mint the PAT in Snowsight (User profile → Programmatic access tokens) for an | ||
| account that can reach the SPCS app. The client sends it on the | ||
| ``X-SF-SPCS-Authorization`` header so the Snowflake gateway lets the request | ||
| through to DataMasque; the gateway strips the header before forwarding, leaving | ||
| DataMasque's own ``Authorization`` flow untouched. Leave it unset for | ||
| instances that are not behind an SPCS gateway.""" | ||
|
|
||
| @model_validator(mode="after") | ||
| def _validate_auth_source(self) -> "DataMasqueInstanceConfig": | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,176 @@ | ||
| """ | ||
| Snowflake SPCS app gateway authentication for :class:`DataMasqueClient`. | ||
|
Collaborator
Guess what two things I'm going to say again.
||
|
|
||
| When a DataMasque instance is hosted behind Snowflake SPCS (Snowpark Container | ||
| Services) app ingress (``*.snowflakecomputing.app``), every request must first | ||
| clear the Snowflake gateway. We authenticate to the gateway with a Programmatic | ||
| Access Token (PAT) sent on ``X-SF-SPCS-Authorization: Snowflake Token="<PAT>"``. | ||
| The gateway accepts the PAT on this alternate header and strips it before | ||
| forwarding to the container, so DataMasque's own ``Authorization: Token <key>`` | ||
| flow rides through untouched. | ||
|
|
||
| :func:`install_spcs_gateway_auth` attaches this behaviour to a client's | ||
| ``requests.Session``: it sets the header on the session (so it is sent on every | ||
| request, including the unauthenticated login) and registers a response hook that | ||
| turns a gateway-originated rejection into a clear :class:`SpcsGatewayAuthError`. | ||
| """ | ||
|
|
||
| import re | ||
| from typing import Any, Optional | ||
|
|
||
| import requests | ||
|
|
||
| from datamasque.client.exceptions import SpcsGatewayAuthError | ||
|
|
||
| SPCS_GATEWAY_AUTH_HEADER = "X-SF-SPCS-Authorization" | ||
|
|
||
| # Body-shape discriminators for SPCS gateway error responses. | ||
| # The gateway emits JSON with `responseType` (ERROR_<UPPER_SNAKE>), `requestId` | ||
| # (canonical UUID), and `detail` (free text). All three must be present and | ||
| # match these patterns for the body to count as gateway-originated. | ||
| _GATEWAY_RESPONSE_TYPE_RE = re.compile(r"^ERROR_[A-Z][A-Z0-9_]+$") | ||
| _UUID_RE = re.compile( | ||
| r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", | ||
| re.IGNORECASE, | ||
| ) | ||
|
|
||
| # Header-shape discriminators for "this response transited a Snowflake SPCS | ||
| # gateway". The Server header and the `sfc-ss-` cookie name prefix both appear | ||
| # on every gateway-handled response (success and error alike) and aren't | ||
| # plausible to spoof by accident. | ||
| _SPCS_GATEWAY_SERVER_VALUE = "_" | ||
| _SPCS_COOKIE_PREFIX = "sfc-ss-" | ||
|
|
||
|
|
||
| def _has_spcs_gateway_header_signature(response: requests.Response) -> bool: | ||
| """ | ||
| True if at least one header-level Snowflake gateway marker is present. | ||
|
|
||
| Looks for either ``Server: _`` (the gateway's literal Server header value) | ||
|
Collaborator
sembr again... and the comment I'm not going to comment any more individual instances, have your claude fix them all up please
||
| or any ``Set-Cookie`` carrying the ``sfc-ss-`` cookie name prefix. Either is | ||
| sufficient — both indicate the response was emitted by, or transited, | ||
| Snowflake's SPCS ingress. | ||
| """ | ||
| if response.headers.get("server", "").strip() == _SPCS_GATEWAY_SERVER_VALUE: | ||
| return True | ||
| # `Set-Cookie` may appear multiple times; `requests` flattens duplicates | ||
| # via a comma-separated value in `.headers`, but our prefix substring | ||
| # check is order- and count-insensitive. | ||
| if _SPCS_COOKIE_PREFIX in response.headers.get("set-cookie", ""): | ||
| return True | ||
| return False | ||
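The header check above can be sketched without `requests`, assuming the headers arrive as a plain dict with lower-cased keys and any repeated `Set-Cookie` values already joined into one string. `has_gateway_signature` and the sample cookie values are hypothetical, chosen only for illustration.

```python
GATEWAY_SERVER_VALUE = "_"
COOKIE_PREFIX = "sfc-ss-"


def has_gateway_signature(headers: dict) -> bool:
    """Return True if either gateway marker appears in the lower-cased headers."""
    # Marker 1: the literal Server header value, ignoring surrounding whitespace.
    if headers.get("server", "").strip() == GATEWAY_SERVER_VALUE:
        return True
    # Marker 2: the cookie-name prefix anywhere in the joined Set-Cookie value.
    return COOKIE_PREFIX in headers.get("set-cookie", "")
```

One consequence of the substring approach is that the prefix would also match if it appeared inside a cookie value rather than a cookie name, which is one reason a second, body-level check is useful alongside it.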
|
|
||
|
|
||
| def _is_spcs_gateway_error_body(response: requests.Response) -> Optional[dict]: | ||
| """ | ||
| Return the parsed body iff it is a structurally-valid gateway error. | ||
|
|
||
| All four conditions must hold: | ||
| 1. The body parses as JSON and is a dict. | ||
| 2. Keys ``responseType``, ``requestId``, ``detail`` are all present and string-typed. | ||
| 3. ``responseType`` matches ``^ERROR_<UPPER_SNAKE>$``. | ||
| 4. ``requestId`` is a canonical 8-4-4-4-12 UUID. | ||
|
|
||
| Returns the parsed dict (truthy) on match, ``None`` on miss. | ||
| """ | ||
| try: | ||
| data = response.json() | ||
| except ValueError: | ||
| return None | ||
| if not isinstance(data, dict): | ||
| return None | ||
| response_type = data.get("responseType") | ||
| request_id = data.get("requestId") | ||
| detail = data.get("detail") | ||
| if not (isinstance(response_type, str) and isinstance(request_id, str) and isinstance(detail, str)): | ||
| return None | ||
| if not _GATEWAY_RESPONSE_TYPE_RE.match(response_type): | ||
| return None | ||
| if not _UUID_RE.match(request_id): | ||
| return None | ||
|
Comment on lines +85 to +90
Collaborator
Maybe combine into a single
||
| return data | ||
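A standalone sketch of the same four body conditions, operating on raw response text so it runs without `requests`. The regexes are copied from the diff. `parse_gateway_error` and the sample payloads (including the `ERROR_EXAMPLE` type and the UUID) are made up for illustration and are not real Snowflake output.

```python
import json
import re
from typing import Optional

RESPONSE_TYPE_RE = re.compile(r"^ERROR_[A-Z][A-Z0-9_]+$")
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)


def parse_gateway_error(body_text: str) -> Optional[dict]:
    """Return the parsed body if it has the gateway error shape, else None."""
    # Condition 1: the body parses as JSON and is a dict.
    try:
        data = json.loads(body_text)
    except ValueError:
        return None
    if not isinstance(data, dict):
        return None
    # Condition 2: all three keys are present and string-typed.
    fields = [data.get(key) for key in ("responseType", "requestId", "detail")]
    if not all(isinstance(value, str) for value in fields):
        return None
    response_type, request_id, _detail = fields
    # Conditions 3 and 4: format constraints on responseType and requestId.
    if RESPONSE_TYPE_RE.match(response_type) and UUID_RE.match(request_id):
        return data
    return None
```

A DRF-style body such as `{"detail": "..."}` fails condition 2 here, since it lacks `responseType` and `requestId`.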
|
|
||
|
|
||
| def _hint_for_gateway_detail(detail: str) -> str: | ||
| """Map common Snowflake gateway ``detail`` strings to a one-line cause hint.""" | ||
| d = (detail or "").lower() | ||
| if "network policy" in d: | ||
| return ( | ||
| "PAT requires a network policy attached to the user (or account) " | ||
| "that permits your current public IP. Run `CREATE NETWORK POLICY " | ||
| "... ALLOWED_IP_LIST = ('<your.ip>/32')` and `ALTER USER <pat-user> " | ||
| "SET NETWORK_POLICY = <policy>`." | ||
| ) | ||
| if "invalid" in d and "token" in d: | ||
| return ( | ||
| "PAT is malformed, expired, or revoked. Re-mint a PAT in Snowsight " | ||
| "(User profile -> Programmatic access tokens) and update `spcs_pat`." | ||
| ) | ||
| if "expired" in d: | ||
| return "PAT has expired. Mint a fresh one in Snowsight and update `spcs_pat`." | ||
| if "authentication" in d or "unauthorized" in d: | ||
| return ( | ||
| "Generic auth rejection. Verify the PAT was minted by a user that " | ||
| "has access to this SPCS app, and that any account-level network " | ||
| "policy includes your current public IP." | ||
| ) | ||
| return "Unknown gateway rejection — see the Snowflake `detail` string above and the Snowflake PAT docs." | ||
|
|
||
|
|
||
| def _check_spcs_gateway_response(response: requests.Response) -> None: | ||
| """ | ||
| Raise :class:`SpcsGatewayAuthError` iff ``response`` is a gateway-originated rejection. | ||
|
|
||
| Two-layer discriminator — both must hold: | ||
| * **Body originated at the gateway**: strict shape match on the JSON body | ||
| (multiple fields, typed, with format constraints) via | ||
| :func:`_is_spcs_gateway_error_body`. | ||
| * **Response transited an SPCS gateway**: header signature confirms via | ||
| :func:`_has_spcs_gateway_header_signature`. | ||
|
|
||
| Either layer alone could in principle false-positive on an unrelated | ||
| upstream that happened to emit one of those signals; the conjunction is what | ||
| makes the check robust. Legitimate DataMasque 401s (DRF ``{"detail": "..."}``) | ||
| have the gateway header signature but fail the body shape — so they correctly | ||
| flow through to the client's normal re-auth-and-retry path untouched. | ||
| """ | ||
| if response.status_code not in (401, 403): | ||
| return | ||
| if not _has_spcs_gateway_header_signature(response): | ||
| return | ||
| data = _is_spcs_gateway_error_body(response) | ||
| if data is None: | ||
| return | ||
|
|
||
| response_type = data["responseType"] | ||
| request_id = data["requestId"] | ||
| detail = data["detail"] | ||
| hint = _hint_for_gateway_detail(detail) | ||
| raise SpcsGatewayAuthError( | ||
| f"SPCS gateway rejected the PAT (HTTP {response.status_code}, " | ||
| f"{response_type}). The request never reached DataMasque.\n" | ||
| f" Snowflake said: {detail!r}\n" | ||
| f" Snowflake reqId: {request_id}\n" | ||
| f" Likely cause: {hint}\n" | ||
| f" Fix in Snowsight on the account hosting this SPCS app, then retry." | ||
| ) | ||
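The conjunction described in the docstring can be sketched as follows. This is a simplified model, not the PR's code: `FakeResponse`, `GatewayRejected`, and both predicates are hypothetical, the body check only tests key presence, and the header check only looks at `Server`.

```python
from dataclasses import dataclass, field
from typing import Optional


class GatewayRejected(Exception):
    """Hypothetical stand-in for SpcsGatewayAuthError."""


@dataclass
class FakeResponse:
    """Minimal response model: status, lower-cased headers, parsed JSON body."""

    status_code: int
    headers: dict = field(default_factory=dict)
    body: Optional[dict] = None


def looks_like_gateway_body(body: Optional[dict]) -> bool:
    # Simplified: the real check also validates types and formats.
    required = ("responseType", "requestId", "detail")
    return isinstance(body, dict) and all(key in body for key in required)


def transited_gateway(headers: dict) -> bool:
    # Simplified: the real check also accepts the cookie-name prefix.
    return headers.get("server") == "_"


def check(response: FakeResponse) -> None:
    """Raise only when status, header signature, and body shape all agree."""
    if response.status_code not in (401, 403):
        return
    if transited_gateway(response.headers) and looks_like_gateway_body(response.body):
        raise GatewayRejected(response.body["detail"])
```

Under these assumptions, a DRF-style 401 that passed through the gateway has the header signature but not the body shape, so `check` returns without raising and the normal re-auth path can handle it.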
|
|
||
|
|
||
| def _spcs_gateway_response_hook(response: requests.Response, *args: Any, **kwargs: Any) -> None: | ||
| """``requests`` response hook: raise on a gateway-originated auth rejection.""" | ||
| _check_spcs_gateway_response(response) | ||
|
|
||
|
|
||
| def install_spcs_gateway_auth(session: requests.Session, pat: str) -> None: | ||
| """ | ||
| Configure ``session`` to authenticate to a Snowflake SPCS app gateway. | ||
|
|
||
| Sets the ``X-SF-SPCS-Authorization`` header on the session (so it rides on | ||
| every request, including the unauthenticated login) and registers a response | ||
| hook that raises :class:`SpcsGatewayAuthError` on a gateway rejection. | ||
|
|
||
| Scoping is automatic: the client's session only ever talks to its own | ||
| ``base_url``, so there is no need to match per-request hosts. | ||
| """ | ||
| session.headers[SPCS_GATEWAY_AUTH_HEADER] = f'Snowflake Token="{pat}"' | ||
| session.hooks["response"].append(_spcs_gateway_response_hook) | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -19,3 +19,31 @@ To use DataMasque Python in a project: | |
|
|
||
| for connection in client.list_connections(): | ||
| print(connection.name) | ||
|
|
||
| Connecting to an SPCS-hosted instance | ||
| ===================================== | ||
|
|
||
| When DataMasque is hosted behind Snowflake SPCS (Snowpark Container Services) | ||
|
Collaborator
yep, sembr applies to docs as well
||
| app ingress, its ``base_url`` ends in ``.snowflakecomputing.app`` and every | ||
| request must first clear the Snowflake gateway. Pass a Snowflake Programmatic | ||
| Access Token (PAT) as ``spcs_pat`` and the client sends it on the | ||
| ``X-SF-SPCS-Authorization`` header automatically; the gateway strips that header | ||
| before forwarding, so your DataMasque ``username``/``password`` (or | ||
| ``token_source``) auth is unaffected. | ||
|
|
||
| .. code-block:: python | ||
|
|
||
| config = DataMasqueInstanceConfig( | ||
| base_url="https://my-app.snowflakecomputing.app", | ||
| username="api_user", | ||
| password="api_password", | ||
| spcs_pat="<snowflake-programmatic-access-token>", | ||
| ) | ||
| client = DataMasqueClient(config) | ||
| client.authenticate() | ||
|
|
||
| Mint the PAT in Snowsight (User profile → Programmatic access tokens) for an | ||
|
Collaborator
s/Mint/Create/
||
| account that can reach the SPCS app. If the gateway rejects the PAT (for example | ||
| it has expired, or a network policy excludes your IP), the client raises | ||
| ``SpcsGatewayAuthError`` with the Snowflake-provided detail and a hint at the | ||
| likely cause. | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,6 +1,6 @@ | ||
| [project] | ||
| name = "datamasque-python" | ||
| -version = "1.1.1" | ||
| +version = "1.2.0" | ||
|
Collaborator
use
||
| description = "Official Python client for the DataMasque data-masking API." | ||
| authors = [ | ||
| { name = "DataMasque Ltd" }, | ||
|
|
||
Follow existing style - nowhere near as verbose. And, you guessed it, sembr.
Could also structure as a "Added support for DataMasque deployments on Snowpark Container Services (SPCS)" heading with the three bullets nested below it.