From d415a48032f46fcb002cbbb52ed788a198b1de46 Mon Sep 17 00:00:00 2001
From: Dan Draper
Date: Thu, 2 Jul 2026 18:55:57 +1000
Subject: [PATCH 1/6] feat(v2): EQL v3 reference section (CIP-3326)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Seven pages replacing the v2-era EQL reference, written against the
eql_v3 branch of cipherstash/encrypt-query-language (3.0.0):

- index: what EQL is, the v3 domain-variant model, install (single SQL
  script, idempotent), dbdev, Docker, migration/runtime permission
  split, managed-Postgres rationale
- types: 10 scalar families × variants matrix; bool storage-only;
  _ord/_ord_ore twins; index terms per variant
- operators: per-variant support matrix, typed-operand rule, no-LIKE,
  fail-loud blockers, query shapes, function-form equivalents
- indexes: functional indexes on term extractors, engagement
  requirements, sort-key form for index-streamed ORDER BY, EXPLAIN
  checklist, large-table build guidance
- json: ste_vec model, per-node-type terms (hm XOR oc), containment +
  GIN, field access, path queries, blocked native jsonb operators
- functions: comparisons, extractors, min/max only (no SUM/AVG),
  version()
- payload-format: v/i/c envelope (wire version still v:2), hm/ob/bf
  term keys, sv document shape, annotated examples (absorbs the legacy
  CipherCell page)

Cross-page consistency verified against the shipped SQL: equality on
_ord variants compares ORE terms (no hm in _ord payloads), and bare
ORDER BY is correct but extractor-form sort keys stream from the index.
Claude-Session: https://claude.ai/code/session_01ACPpFPHvKtrV48nbEYuv7P
---
 IA.md                                         |  14 +-
 content/docs/reference/eql/functions.mdx      | 112 +++++++++
 content/docs/reference/eql/index.mdx          | 135 ++++++++++-
 content/docs/reference/eql/indexes.mdx        | 182 ++++++++++++++
 content/docs/reference/eql/json.mdx           | 228 ++++++++++++++++++
 content/docs/reference/eql/meta.json          |   9 +-
 content/docs/reference/eql/operators.mdx      | 153 ++++++++++++
 content/docs/reference/eql/payload-format.mdx | 123 ++++++++++
 content/docs/reference/eql/types.mdx          |  95 ++++++++
 9 files changed, 1040 insertions(+), 11 deletions(-)
 create mode 100644 content/docs/reference/eql/functions.mdx
 create mode 100644 content/docs/reference/eql/indexes.mdx
 create mode 100644 content/docs/reference/eql/json.mdx
 create mode 100644 content/docs/reference/eql/operators.mdx
 create mode 100644 content/docs/reference/eql/payload-format.mdx
 create mode 100644 content/docs/reference/eql/types.mdx

diff --git a/IA.md b/IA.md
index f9c6f9e..b51956c 100644
--- a/IA.md
+++ b/IA.md
@@ -131,13 +131,13 @@ live at `/docs/errors/` — permanent, never restructured (CIP-3338).
 
 - [x] Section scaffold 🚧 (eql, stack, auth, cli, proxy, workspace)
 - **EQL (v3 rewrite — CIP-3326):**
-- [ ] `/reference/eql` — overview + install (single SQL file, permissions split, dbdev, Docker)
-- [ ] `/reference/eql/types` — 10 scalar families × variants + `eql_v3.json`
-- [ ] `/reference/eql/operators` — per-variant matrix incl. what RAISES; typed-operand rule
-- [ ] `/reference/eql/indexes` — functional indexes on extractors; Supabase-compatible
-- [ ] `/reference/eql/json` — ste_vec, path queries
-- [ ] `/reference/eql/functions` — incl. aggregates (min/max only)
-- [ ] `/reference/eql/payload-format` — v/i/c envelope, hm/ob/bf (absorbs cipher-cell)
+- [x] `/reference/eql` — overview + install (single SQL file, permissions split, dbdev, Docker)
+- [x] `/reference/eql/types` — 10 scalar families × variants + `eql_v3.json`
+- [x] `/reference/eql/operators` — per-variant matrix incl. what RAISES; typed-operand rule
+- [x] `/reference/eql/indexes` — functional indexes on extractors; Supabase-compatible
+- [x] `/reference/eql/json` — ste_vec, path queries
+- [x] `/reference/eql/functions` — incl. aggregates (min/max only)
+- [x] `/reference/eql/payload-format` — v/i/c envelope, hm/ob/bf (absorbs cipher-cell)
 - **Stack SDK:**
 - [ ] `/reference/stack` — client + configuration (port encryption/* pages)
 - [ ] `/reference/stack/schema`
diff --git a/content/docs/reference/eql/functions.mdx b/content/docs/reference/eql/functions.mdx
new file mode 100644
index 0000000..210ca31
--- /dev/null
+++ b/content/docs/reference/eql/functions.mdx
@@ -0,0 +1,112 @@
+---
+title: Functions
+description: "The eql_v3 function surface: comparison functions, index-term extractors, MIN/MAX aggregates, JSON functions, and version reporting."
+type: reference
+components: [eql]
+verifiedAgainst:
+  eql: "3.0.0"
+---
+
+Everything EQL exposes lives in the `eql_v3` schema. Most functions are generated per [domain variant](/reference/eql/types), so PostgreSQL's overload resolution picks the right implementation from the argument type. As with operators, arguments must be typed — see [the typed-operand rule](/reference/eql/operators).
+
+## Comparison functions
+
+Function forms of the comparison operators, for platforms that disallow custom operators. Each is generated per capable domain variant, with overloads accepting the domain on either side and `jsonb` on the other:
+
+```sql
+eql_v3.eq(a, b) RETURNS boolean -- = on _eq / _ord / _ord_ore / text_search
+eql_v3.neq(a, b) RETURNS boolean -- <>
+eql_v3.lt(a, b) RETURNS boolean -- < on _ord / _ord_ore / text_search
+eql_v3.lte(a, b) RETURNS boolean -- <=
+eql_v3.gt(a, b) RETURNS boolean -- >
+eql_v3.gte(a, b) RETURNS boolean -- >=
+eql_v3.contains(a, b) RETURNS boolean -- @> on text_match / text_search / eql_v3.json
+eql_v3.contained_by(a, b) RETURNS boolean -- <@
+```
+
+```sql
+SELECT * FROM users WHERE eql_v3.eq(email, $1::eql_v3.text_eq);
+SELECT * FROM users WHERE eql_v3.lt(created_at, $1::eql_v3.timestamp_ord);
+```
+
+Calling a comparison function a variant doesn't support resolves to a blocker that raises `operator … is not supported` — the same [fail-loud behavior](/reference/eql/operators) as the operators. There are no `like` / `ilike` functions: text matching is `eql_v3.contains` on a `text_match` value.
+
+## Index-term extractors
+
+These extract the encrypted index term from a domain value. They're generated per eq-, ord-, and match-capable variant of every scalar type, and they return the self-contained `eql_v3` index-term types:
+
+```sql
+-- Equality term (hm)
+eql_v3.eq_term(a eql_v3._eq) RETURNS eql_v3.hmac_256
+
+-- Ordering term (ob)
+eql_v3.ord_term(a eql_v3._ord) RETURNS eql_v3.ore_block_256
+eql_v3.ord_term(a eql_v3._ord_ore) RETURNS eql_v3.ore_block_256
+
+-- Text-match term (bf)
+eql_v3.match_term(a eql_v3.text_match) RETURNS eql_v3.bloom_filter
+```
+
+`eql_v3.text_search` carries all three terms, so all three extractors work on it.
+
+The extractors exist for **indexing**: EQL indexes through a functional index on the extractor, never an operator class on the column. The extractors are inlinable, so bare-form predicates (`WHERE email = $1`) engage the index without rewriting. Sort keys are the exception — see [Range and ORDER BY](/reference/eql/indexes#range-and-order-by):
+
+```sql
+CREATE INDEX users_email_eq ON users USING hash (eql_v3.eq_term(email));
+CREATE INDEX users_salary_ord ON users USING btree (eql_v3.ord_term(salary));
+CREATE INDEX users_name_match ON users USING gin (eql_v3.match_term(name));
+```
+
+See [Indexes](/reference/eql/indexes) for the full recipes and performance guidance.
+
+## Aggregates: `eql_v3.min` and `eql_v3.max`
+
+`MIN` / `MAX` over encrypted values, defined per ord-capable variant of every scalar type. The input type selects the aggregate; the return type matches the input:
+
+```sql
+eql_v3.min(eql_v3._ord) RETURNS eql_v3._ord
+eql_v3.max(eql_v3._ord) RETURNS eql_v3._ord
+eql_v3.min(eql_v3._ord_ore) RETURNS eql_v3._ord_ore
+eql_v3.max(eql_v3._ord_ore) RETURNS eql_v3._ord_ore
+```
+
+Comparison routes through the variant's `<` / `>` operator, which uses the ORE block term — no decryption happens in the database. `NULL` inputs are skipped, and an all-`NULL` input set returns `NULL`.
+
+```sql
+SELECT eql_v3.min(salary) FROM users;
+SELECT eql_v3.max(salary) FROM users WHERE department = 'engineering';
+
+-- On a generic jsonb column, cast to the right domain at the call site
+SELECT eql_v3.min(salary_jsonb::eql_v3.int8_ord) FROM users;
+```
+
+
+**`SUM`, `AVG`, and other arithmetic aggregates are not supported** on encrypted columns — they would require homomorphic encryption. `MIN` / `MAX` work because they only need comparison. For sums and averages, decrypt at the application boundary and aggregate client-side.
+
+
+## JSON functions
+
+The encrypted-JSON document type `eql_v3.json` has its own function surface:
+
+- `eql_v3.jsonb_path_query(doc, selector)` — set-returning path query yielding encrypted entries; also `jsonb_path_query_first` and `jsonb_path_exists`
+- `eql_v3.jsonb_array_length` / `jsonb_array_elements` / `jsonb_array_elements_text` — array helpers
+- `eql_v3.to_ste_vec_query(doc)` — builds the GIN-indexable containment query form
+- Entry-level term extractors: `eql_v3.eq_term(eql_v3.ste_vec_entry)` and `eql_v3.ore_cllw(eql_v3.ste_vec_entry)`
+
+These are documented with worked examples in [JSON support](/reference/eql/json).
+
+## `eql_v3.version()`
+
+Returns the installed EQL version string, baked in at build time:
+
+```sql
+SELECT eql_v3.version();
+-- '3.0.0'
+```
+
+The same version string is mirrored as a comment on the `eql_v3` schema, so you can read it without calling a function:
+
+```sql
+SELECT obj_description('eql_v3'::regnamespace);
+-- '3.0.0'
+```
diff --git a/content/docs/reference/eql/index.mdx b/content/docs/reference/eql/index.mdx
index 551c51e..68b95c2 100644
--- a/content/docs/reference/eql/index.mdx
+++ b/content/docs/reference/eql/index.mdx
@@ -1,8 +1,137 @@
 ---
 title: EQL
-description: "EQL documentation — being built as part of the docs V2 overhaul."
+description: "Encrypt Query Language (EQL) installs encrypted column types and operators into Postgres as plain SQL — encryption itself happens in your client."
+type: reference
+components: [eql]
+verifiedAgainst:
+  eql: "3.0.0"
 ---
 
-This section is being built as part of the docs V2 overhaul ([CIP-3307](https://linear.app/cipherstash/issue/CIP-3307)). Track progress in [IA.md](https://github.com/cipherstash/docs/blob/v2/IA.md).
+Encrypt Query Language (EQL) is a set of types, operators, and functions for storing and querying encrypted data in PostgreSQL. It installs as a single plain-SQL script — no extension packaging, no superuser, no operator classes — so it runs on Supabase, RDS, Cloud SQL, and self-hosted Postgres alike.
 
-Until it lands, current documentation lives in the [existing docs](/stack).
+EQL itself never encrypts anything. Encryption and decryption happen in the client, using the [Stack SDK](/reference/stack) or [CipherStash Proxy](/reference/proxy). EQL provides the database-side surface those clients query against: encrypted column types, the operators that compare them, and the term-extractor functions that make indexes work.
+
+## The v3 model
+
+Every encrypted column is a `jsonb`-backed **domain type** in the `eql_v3` schema. The domain variant you choose declares the column's searchable capability: `eql_v3.text_eq` supports equality (`=` / `<>`), `eql_v3.text_match` supports encrypted text containment (`@>` / `<@`), `eql_v3.int4_ord` adds range comparisons, `ORDER BY`, and `MIN` / `MAX`. Each domain carries a `CHECK` constraint that validates the encrypted payload on insert, so a malformed or wrong-version value is rejected at write time rather than surfacing at query time.
+
+There is no database-side configuration table. Earlier EQL versions tracked encryption config in the database (`config_add_table`, `config_add_column`, and friends) — those are gone in v3. The searchable surface of a column is fixed by the domain variant you type it as, and which index terms travel in a value's payload is decided by the encryption client. Operators that a variant doesn't support raise an "operator not supported" error rather than silently falling through to native `jsonb` semantics — and `LIKE` / `ILIKE` are blocked on every encrypted column.
+
+```sql
+CREATE TABLE users (
+  id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
+  email eql_v3.text_eq, -- equality only
+  salary eql_v3.int4_ord, -- equality + range + ORDER BY
+  created_at eql_v3.timestamp_ord
+);
+```
+
+## Install
+
+
+
+
+### Download the install script
+
+Each [GitHub release](https://github.com/cipherstash/encrypt-query-language/releases) publishes a versioned `cipherstash-encrypt.sql`:
+
+```sh
+curl -sLo cipherstash-encrypt.sql https://github.com/cipherstash/encrypt-query-language/releases/latest/download/cipherstash-encrypt.sql
+```
+
+
+
+### Run it against each database
+
+```sh
+psql -f cipherstash-encrypt.sql
+```
+
+The script installs the `eql_v3` schema with all domain types, operators, functions, and aggregates. It is idempotent: re-running it upgrades the `eql_v3` surface in place and won't remove anything you've built on top of it. To upgrade, download the latest script and run it again.
+
+
+
+### Verify
+
+```sql
+SELECT eql_v3.version();
+-- '3.0.0'
+```
+
+
+
+
+`DROP SCHEMA eql_v3 CASCADE` drops every column typed as an `eql_v3` domain. The domain types live in the schema, and your columns depend on them.
+
+
+### dbdev
+
+EQL is also published to [dbdev](https://database.dev/cipherstash/eql). The dbdev release can lag behind GitHub releases, so prefer the install script when you need the latest version.
+
+### Docker for local development
+
+Run a Postgres image with EQL pre-installed:
+
+```sh
+docker run --rm -p 5432:5432 -e POSTGRES_PASSWORD=postgres \
+  ghcr.io/cipherstash/postgres-eql:17
+```
+
+EQL installs automatically on first boot. Images are available for PostgreSQL 14–17 (`:14` through `:17`), and you can pin a specific EQL version with a suffixed tag (for example `:17-3.0.0`).
+
+## Permissions
+
+Installing EQL and running queries against it need different privileges. A common production pattern splits them across two users.
+
+**Migration user** — installs EQL and adds encrypted columns during migrations:
+
+```sql
+GRANT CREATE ON DATABASE your_database TO your_migration_user;
+GRANT CREATE ON SCHEMA public TO your_migration_user;
+ALTER TABLE your_table OWNER TO your_migration_user; -- PostgreSQL has no grantable ALTER privilege; ALTER TABLE requires ownership
+```
+
+`CREATE ON DATABASE` creates the `eql_v3` schema and its types; `CREATE ON SCHEMA` and ownership of the target tables are needed to add encrypted columns (typed as `eql_v3` domains, with their `CHECK` constraints) to your tables.
+
+**Runtime user** — the application's day-to-day access:
+
+```sql
+-- EQL schema usage (resolves the encrypted operators / extractors)
+GRANT USAGE ON SCHEMA eql_v3 TO your_app_user;
+GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA eql_v3 TO your_app_user;
+
+-- User table access (normal application permissions)
+GRANT SELECT, INSERT, UPDATE, DELETE ON TABLE your_tables TO your_app_user;
+```
+
+Schema changes — adding or removing encrypted columns — always go through the migration user.
+
+## Managed Postgres and Supabase
+
+EQL v3 is designed to install without superuser. There are no custom operator classes (which managed platforms typically block), no `postgresql.conf` changes, and no separate Supabase build — the single install script is the same artefact everywhere. Indexing works through ordinary functional indexes over EQL's term-extractor functions, which any user who can `CREATE INDEX` can build. See the [Supabase integration](/integrations/supabase) for platform-specific setup.
+
+## In this section
+
+
+
+    The encrypted domain type families and the capability each variant carries.
+
+
+    Which SQL operators resolve on which variant, and what raises.
+
+
+    Functional-index recipes for equality, range, and text match.
+
+
+    Encrypted JSON documents: containment, field access, and GIN indexing.
+
+
+    The function equivalents of every operator, extractors, and aggregates.
+
+
+    The encrypted payload envelope and index terms.
+
+
diff --git a/content/docs/reference/eql/indexes.mdx b/content/docs/reference/eql/indexes.mdx
new file mode 100644
index 0000000..83f354e
--- /dev/null
+++ b/content/docs/reference/eql/indexes.mdx
@@ -0,0 +1,182 @@
+---
+title: Indexes
+description: "Create Postgres indexes on encrypted columns using functional indexes over EQL's term-extractor functions."
+type: reference
+components: [eql]
+verifiedAgainst:
+  eql: "3.0.0"
+---
+
+EQL indexes are ordinary PostgreSQL functional indexes over **term-extractor functions** — never an index or operator class on the column itself. Each extractor returns a small per-row index term whose return type already carries a default operator class:
+
+| Extractor | Index method | Term | Capability |
+| --- | --- | --- | --- |
+| `eql_v3.eq_term(col)` | `hash` (or `btree`) | `hm` (HMAC-256) | equality |
+| `eql_v3.ord_term(col)` | `btree` | `ob` (ORE block) | range, `ORDER BY`, `MIN` / `MAX` |
+| `eql_v3.match_term(col)` | `gin` | `bf` (bloom filter) | text containment |
+
+The extractors are inlinable SQL functions, so the planner rewrites a bare-form predicate into the same expression the index was built on. You don't rewrite queries to use the index:
+
+```sql
+SELECT * FROM users WHERE email = $1::eql_v3.text_eq;
+-- planner inlines `=` to: eql_v3.eq_term(email) = eql_v3.eq_term($1)
+-- Index Cond on USING hash (eql_v3.eq_term(email))
+```
+
+
+EQL v3 deliberately ships no operator class for encrypted columns. Operators resolve against the domain's `jsonb` base type, so an opclass on the column would bypass the encrypted surface. Always index through the extractor.
+
+
+## Index recipes
+
+Type the column as the domain variant that carries the term ([Types](/reference/eql/types)), then index the matching extractor:
+
+```sql
+-- Equality: hash index on eq_term
+-- (columns typed eql_v3._eq or text_search; equality on _ord columns
+-- compares ORE terms, so the btree on ord_term below serves it)
+CREATE INDEX users_email_eq
+  ON users USING hash (eql_v3.eq_term(email));
+
+-- Range / ordering: btree index on ord_term
+-- (columns typed eql_v3._ord or _ord_ore)
+CREATE INDEX users_created_at_ord
+  ON users USING btree (eql_v3.ord_term(created_at));
+
+-- Text match: GIN index on match_term
+-- (columns typed eql_v3.text_match or text_search)
+CREATE INDEX users_name_match
+  ON users USING gin (eql_v3.match_term(name));
+
+ANALYZE users;
+```
+
+Run `ANALYZE` after every index build. `CREATE INDEX` on an expression gathers no statistics for that expression — without `ANALYZE`, the planner has no histogram for `eql_v3.eq_term(email)` and can misjudge the index it just built.
+
+Create indexes when the table has a significant number of rows (typically more than 1,000) and you query the column with the matching operator. Drop indexes for capabilities you no longer query — duplicate indexes compete for cache and slow writes.
+
+## Requirements for an index to engage
+
+All three must hold:
+
+1. **The value carries the required term.** Equality needs `hm`, range needs `ob`, containment needs `bf`. Which terms travel in a value's payload is decided by the encryption client — a value with only a bloom term will not drive an equality index.
+2. **The index was built after the data carried the term.** If you change which terms a column's values carry, recreate the index.
+3. **The query operand is typed.** A typed parameter (`$1`, which CipherStash Proxy supplies) or an explicit cast resolves the encrypted operator; a bare `jsonb` literal falls through to native `jsonb` semantics and skips the index entirely:
+
+```sql
+-- ✓ resolves the encrypted operator → uses the index
+WHERE email = $1;
+WHERE email = $1::eql_v3.text_eq;
+
+-- ✗ falls through to native jsonb semantics
+WHERE email = '{"hm":"abc"}'::jsonb;
+```
+
+## Query shapes
+
+### Equality
+
+```sql
+SELECT * FROM users WHERE email = $1;
+-- Index Scan using users_email_eq
+-- Index Cond: (eql_v3.eq_term(email) = eql_v3.eq_term($1))
+```
+
+### Range and ORDER BY
+
+The `<`, `<=`, `>`, `>=` operators inline to comparisons on `eql_v3.ord_term`, so natural-form range predicates match the btree:
+
+```sql
+SELECT * FROM users WHERE created_at < $1;
+```
+
+`ORDER BY` needs care. The planner inlines operators in *predicates* but does not rewrite *sort keys*: `ORDER BY created_at` uses the index for the `WHERE` clause but still adds a `Sort` node, which scales linearly with the rows passing the filter. To stream rows out of the btree already ordered, write the sort key in extractor form:
+
+```sql
+SELECT * FROM users
+  WHERE created_at < $1
+  ORDER BY eql_v3.ord_term(created_at) DESC
+  LIMIT 10;
+```
+
+ORE terms are order-preserving, so this sorts identically to the natural form — it just lets the index do the ordering. At large row counts this is the difference between seconds and milliseconds.
+
+
+If you `SELECT col::jsonb ... ORDER BY col`, Postgres folds the cast into the scan and uses `(col)::jsonb` as the sort key — which matches no index. Project the column raw, or write the sort key as `eql_v3.ord_term(col)`, which sidesteps this entirely.
+
+
+### GROUP BY and DISTINCT
+
+Group on the extractor, not the raw column:
+
+```sql
+SELECT eql_v3.eq_term(email), count(*)
+  FROM users
+  GROUP BY eql_v3.eq_term(email);
+```
+
+`GROUP BY email` uses the entire encrypted payload (1–2 KB per row) as the hash key; Postgres estimates a hash table far larger than the default `work_mem` and falls back to a disk-spilling `GroupAggregate`. The extractor key is a small deterministic term, so the hash table fits in `work_mem` and the planner picks `HashAggregate` reliably. If an ORM forces the raw-column form, raising `work_mem` is the rescue knob — but the extractor form is the design.
+
+## Encrypted JSON
+
+Containment (`@>` / `<@`) on `eql_v3.json` document columns uses a GIN index over `eql_v3.to_ste_vec_query(col)::jsonb`, and field-level equality and ordering have their own extractor recipes. See [JSON](/reference/eql/json).
+
+## Verify with EXPLAIN
+
+The first move on a slow query is `EXPLAIN (COSTS OFF)`:
+
+- **`Index Scan using `** — the functional index is engaged.
+- **`Index Cond:` referencing the extractor** (`eql_v3.eq_term(...)`, `eql_v3.ord_term(...)`) — the inlined predicate matched the index.
+- **`Seq Scan`** — no index used. Check the three requirements above.
+- **`Filter:` showing the raw operator** — inlining did not happen. Usual causes: a pinned `search_path` on a customised function, or the planner judging another plan cheaper.
+- **`Sort` node above an Index Scan** — natural-form `ORDER BY`. Switch the sort key to `eql_v3.ord_term(col)` to eliminate it.
+
+Once the plan looks right, repeat with `EXPLAIN ANALYZE` to measure actual timings. For a full diagnosis walkthrough, see [query performance troubleshooting](/guides/troubleshooting/query-performance).
+
+## Building indexes on large tables
+
+Index *build* time is a separate axis from query time — a functional index that queries in a millisecond can take hours to `CREATE` on a large table.
+
+**Raise `maintenance_work_mem`.** `CREATE INDEX` draws on `maintenance_work_mem` (default 64 MB — far too small for a multi-million-row build). It's the single highest-leverage knob:
+
+```sql
+SET maintenance_work_mem = '2GB';
+CREATE INDEX users_email_eq ON users USING btree (eql_v3.eq_term(email));
+```
+
+**Prefer `btree` over `hash` for equality on large tables.** A btree build sorts then bulk-loads with sequential writes and can parallelise; a hash build scatters rows to random buckets and degrades to random I/O once the index outgrows cache — it cannot parallelise. A btree on `eql_v3.eq_term(col)` serves `=` exactly as well as a hash index, with no query-side cost. Hash is fine up to mid-six-figure row counts.
+
+**Expect a de-TOAST floor.** A functional index over a large encrypted column de-TOASTs the whole stored value once per row to evaluate the extractor. This cost is identical across access methods and sets the build's floor rate. Index builds are also I/O-heavy in a way queries are not — containerised Postgres on a virtualised filesystem (Docker Desktop on macOS, notably) pays a steep penalty, so run large builds on native storage.
+
+**Watch the build.** From a second session while `CREATE INDEX` runs:
+
+```sql
+SELECT phase, tuples_done, tuples_total,
+       round(100.0 * tuples_done / nullif(tuples_total, 0), 1) AS pct
+FROM pg_stat_progress_create_index;
+```
+
+A steady `tuples_done` rate is healthy. A rate that decays over time is the cache/memory wall — raise `maintenance_work_mem`, and if it's a hash index, rebuild it as a btree.
+
+## Why this works on managed Postgres
+
+Everything above is a functional index over an `IMMUTABLE` SQL function — no operator class on a column, no superuser, no `postgresql.conf` changes. Managed platforms that block custom operator classes (Supabase among them) run these recipes unchanged, so the indexing model is identical on Supabase, RDS, Cloud SQL, and self-hosted Postgres. See the [Supabase integration](/integrations/supabase).
+
+## Troubleshooting
+
+**Index not being used:**
+
+1. Verify the value carries the term:
+
+   ```sql
+   SELECT email::jsonb ? 'hm' AS has_hmac,
+          email::jsonb ? 'ob' AS has_ore_block,
+          email::jsonb ? 'bf' AS has_bloom
+   FROM users LIMIT 1;
+   ```
+
+2. Verify the operand is typed (`$1::eql_v3.text_eq`, not `$1::jsonb`).
+3. Recreate the index if the column's terms changed after it was built.
+4. Run `ANALYZE`. Very small tables may still choose a sequential scan — that's correct.
+
+**`=` returns zero rows on a populated column:** equality requires the term its variant compares — `hm` on `_eq` / `text_search`, `ob` on `_ord` variants. Type the column as an equality-capable variant and confirm the encryption client is emitting that term.
diff --git a/content/docs/reference/eql/json.mdx b/content/docs/reference/eql/json.mdx
new file mode 100644
index 0000000..6ef2e60
--- /dev/null
+++ b/content/docs/reference/eql/json.mdx
@@ -0,0 +1,228 @@
+---
+title: Encrypted JSON
+description: "Store and query encrypted JSON documents with eql_v3.json — containment, field access, and path queries over ciphertext, with the native jsonb operators that don't apply blocked outright."
+type: reference
+components: [eql]
+verifiedAgainst:
+  eql: "3.0.0"
+---
+
+`eql_v3.json` is EQL's encrypted JSON document type, built on structured encryption (**ste_vec**). The document is encrypted as a vector of encrypted entries — one entry per path inside the document — and every path is queryable without decryption: containment, field and array access, and equality or range comparisons on extracted leaves.
+
+Like every EQL type, `eql_v3.json` holds ciphertext the database can't read. Encryption, decryption, and selector generation happen in the client — the [Stack SDK](/reference/stack) or [CipherStash Proxy](/reference/proxy). See [Searchable encryption](/concepts/searchable-encryption) for how querying ciphertext works at all.
+
+## The types
+
+Three `jsonb`-backed domains make up the encrypted JSON surface:
+
+| Type | What it is |
+| --- | --- |
+| `eql_v3.json` | The column type. An encrypted document envelope carrying an `sv` array — one encrypted entry per path in the document. |
+| `eql_v3.ste_vec_entry` | A single entry from the vector: a selector, a ciphertext, and exactly one index term. This is what `->` returns. |
+| `eql_v3.ste_vec_query` | A containment needle: entries with selectors and index terms but **no ciphertext**. This is what you cast a `@>` operand to. |
+
+The full wire shape of each is documented in [Payload format](/reference/eql/payload-format).
+
+## Storing encrypted JSON
+
+Type the column as `eql_v3.json`:
+
+```sql
+CREATE TABLE orders (
+  id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
+  metadata eql_v3.json
+);
+```
+
+There is no database-side configuration step. Which index terms a document carries is decided by the encryption client; typing the column as `eql_v3.json` is what makes the encrypted operators and functions resolve. The domain's `CHECK` constraint validates the payload shape on insert, so malformed values are rejected at write time.
+
+Insert and read through the Stack SDK or Proxy, which encrypt the document into the ste_vec payload on write and decrypt it on read.
+
+## What each node type supports
+
+During encryption, the client flattens the document: each unique path gets a deterministic **selector** hash, and each node gets an entry in the `sv` vector carrying index terms for its JSON type:
+
+| JSON node type | Index term | Equality (`=`, `<>`, `GROUP BY`) | Ordering (`<` … `>=`, `MIN`/`MAX`) |
+| --- | --- | --- | --- |
+| Object | `hm` (HMAC-256) | Yes | No |
+| Array | `hm` (HMAC-256) | Yes | No |
+| Boolean / JSON `null` | `hm` (HMAC-256) | Yes | No |
+| String | `oc` (CLLW ORE, string domain) | Yes | Yes |
+| Number | `oc` (CLLW ORE, numeric domain) | Yes | Yes |
+
+Each entry carries exactly one of `hm` or `oc` — the domain `CHECK` enforces the exclusivity. `hm` is a deterministic hash, so it supports equality only. `oc` is a CLLW ORE term that reveals ordering and, being deterministic, collapses to equality on matching selectors — `eql_v3.eq_term` reads whichever term an entry carries, so equality works uniformly across all node types. Earlier payload versions split the ORE term into `ocf` (fixed-width, numeric) and `ocv` (variable-width, string); current payloads emit a single `oc` whose leading domain-tag byte carries the numeric/string distinction.
+
+JSON `null` here means a `null` literal *inside* the document. A SQL `NULL` column value is not encrypted at all.
+
+## Blocked native jsonb operators
+
+These native PostgreSQL `jsonb` operators are **blocked** on `eql_v3.json`. They raise an error rather than silently running plaintext-jsonb semantics against the encrypted payload:
+
+- Key/path existence: `?`, `?|`, `?&`, `@?`, `@@`
+- Path extraction: `#>`, `#>>`
+- Mutation: `-`, `#-`, `||`
+- Root-document comparison: `=`, `<>`, `<`, `<=`, `>`, `>=`
+
+Use containment (`@>` / `<@`), field access (`->` / `->>`), or the `eql_v3.jsonb_path_*` functions instead. There is no server-side mutation of an encrypted document — updates re-encrypt in the client.
+
+
+**Type your operands.** `eql_v3.json` is a domain over `jsonb`, and PostgreSQL resolves `domain OP untyped_literal` to the **native** `jsonb` operator — bypassing both the encrypted operator and the blockers. `WHERE doc -> 'email'` silently runs native `jsonb ->` and returns `NULL`; `WHERE doc -> 'email'::text` resolves the encrypted operator. This is the same rule as the [scalar operators](/reference/eql/operators). Queries through CipherStash Proxy always bind typed parameters, so this only bites hand-written ad-hoc SQL.
+
+
+## Containment: `@>` and `<@`
+
+`@>` tests whether the encrypted document contains a structure; `<@` is the reverse. Build the needle with the client and cast it to `eql_v3.ste_vec_query` (a typed `eql_v3.json` or `eql_v3.ste_vec_entry` operand also works):
+
+```sql
+SELECT * FROM orders
+WHERE metadata @> $1::eql_v3.ste_vec_query;
+```
+
+This is the encrypted equivalent of the plaintext `metadata @> '{"customer": {"tier": "premium"}}'`: containment checks that every encrypted term in the needle exists in the document's `sv` vector. `eql_v3.to_ste_vec_query(doc)` converts a stored document into the needle shape, and `eql_v3.ste_vec_contains(a, b)` is the function form backing `@>`.
+
+For large tables, back containment with a GIN index. The typed `@>` overload inlines to a native `jsonb @>` over `eql_v3.to_ste_vec_query(col)::jsonb`, so a GIN index on that same expression engages:
+
+```sql
+CREATE INDEX orders_metadata_gin
+  ON orders USING gin (eql_v3.to_ste_vec_query(metadata)::jsonb jsonb_path_ops);
+ANALYZE orders;
+```
+
+See [Indexes](/reference/eql/indexes) for the full recipes.
+
+## Field access: `->` and `->>`
+
+Fields are addressed by **selector hash** — the deterministic identifier the client emits for a JSON path during encryption — not a plaintext path string like `$.customer.tier`.
+ +```sql +-- Field access by selector (returns eql_v3.ste_vec_entry) +SELECT metadata -> 'selector_hash'::text FROM orders; + +-- The entry serialized as text (ciphertext JSON, not decrypted plaintext) +SELECT metadata ->> 'selector_hash'::text FROM orders; + +-- Array element by 0-based index +SELECT metadata -> 0 FROM orders; +``` + +The extracted `eql_v3.ste_vec_entry` is itself comparable: + +- `=` / `<>` resolve via `eql_v3.eq_term` โ€” works on every node type +- `<` / `<=` / `>` / `>=` resolve via `eql_v3.ore_cllw` โ€” String and Number leaves only +- `MIN` / `MAX` over an extracted ordered leaf use the `eql_v3.min` / `eql_v3.max` aggregates + +```sql +-- Equality on an extracted leaf +SELECT * FROM orders +WHERE metadata -> 'email_selector'::text = $1::eql_v3.ste_vec_entry; + +-- Group by an extracted leaf's equality term +SELECT eql_v3.eq_term(metadata -> 'region_selector'::text) AS region, COUNT(*) +FROM orders +GROUP BY eql_v3.eq_term(metadata -> 'region_selector'::text); +``` + +A hash index on `eql_v3.eq_term(col -> ''::text)` engages the equality lookup; a btree on `eql_v3.ore_cllw(...)` engages range and `ORDER BY`. See [Indexes](/reference/eql/indexes). + +## Path queries and array helpers + +The function forms take the same selector hashes: + +```sql +-- All entries matching a selector +SELECT eql_v3.jsonb_path_query(metadata, 'selector_hash') FROM orders; + +-- First match only +SELECT eql_v3.jsonb_path_query_first(metadata, 'selector_hash') FROM orders; + +-- Does the selector exist in this document? 
+SELECT eql_v3.jsonb_path_exists(metadata, 'selector_hash') FROM orders; +``` + +For encrypted array nodes: + +```sql +SELECT eql_v3.jsonb_array_length(metadata -> 'items_selector'::text) FROM orders; +SELECT eql_v3.jsonb_array_elements(metadata -> 'items_selector'::text) FROM orders; +SELECT eql_v3.jsonb_array_elements_text(metadata -> 'items_selector'::text) FROM orders; +``` + +`jsonb_array_elements` yields encrypted entries; `jsonb_array_elements_text` yields each element as ciphertext text. + +## Worked example + +An `orders` table with an encrypted `metadata` document. The plaintext your application works with: + +```json +{ + "customer": { + "tier": "premium", + "region": "apac" + }, + "items": ["sku-1042", "sku-2210"] +} +``` + +The client encrypts this into a ste_vec payload with selectors for `$`, `$.customer`, `$.customer.tier`, `$.customer.region`, `$.items`, and each array element โ€” every path becomes queryable. + + + + +### Create the table and insert + +```sql +CREATE TABLE orders ( + id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, + metadata eql_v3.json +); + +INSERT INTO orders (metadata) VALUES ($1); +-- $1 is the encrypted ste_vec payload produced by the Stack SDK or Proxy +``` + + + + +### Query by containment + +Find premium orders. The client encrypts the needle `{"customer": {"tier": "premium"}}` into a `ste_vec_query`: + +```sql +SELECT id FROM orders +WHERE metadata @> $1::eql_v3.ste_vec_query; +``` + +Add the GIN index from above once the table grows. + + + + +### Query by path + +Count orders per region, grouping on the encrypted leaf โ€” the database never sees `"apac"`: + +```sql +SELECT eql_v3.eq_term(metadata -> 'region_selector'::text) AS region, COUNT(*) +FROM orders +WHERE eql_v3.jsonb_path_exists(metadata, 'region_selector') +GROUP BY 1; +``` + +The rows come back as ciphertext; decrypt them in the client. + + + + +## In this section + + + + The wire shape of the ste_vec envelope and its entries. 
+ + + GIN containment and field-level functional index recipes. + + + The full operator surface, including the typed-operand rule. + + diff --git a/content/docs/reference/eql/meta.json b/content/docs/reference/eql/meta.json index d58bc8f..48fe2e7 100644 --- a/content/docs/reference/eql/meta.json +++ b/content/docs/reference/eql/meta.json @@ -1,4 +1,11 @@ { "title": "EQL", - "pages": ["..."] + "pages": [ + "types", + "operators", + "indexes", + "json", + "functions", + "payload-format" + ] } diff --git a/content/docs/reference/eql/operators.mdx b/content/docs/reference/eql/operators.mdx new file mode 100644 index 0000000..60a5fff --- /dev/null +++ b/content/docs/reference/eql/operators.mdx @@ -0,0 +1,153 @@ +--- +title: Operators +description: "Which SQL operators work on each eql_v3 encrypted-domain variant, how unsupported operators fail, and why operands must be typed." +type: reference +components: [eql] +verifiedAgainst: + eql: "3.0.0" +--- + +EQL overloads standard PostgreSQL operators on the [encrypted-domain types](/reference/eql/types). Type the column as the variant that carries the right index term and the operator resolves โ€” and engages a matching [functional index](/reference/eql/indexes). + + +**Operands must be typed.** The `eql_v3` domains are backed by `jsonb`. When an operand has no known type โ€” a bare string literal, an untyped parameter โ€” PostgreSQL reduces the domain to its `jsonb` base type and resolves the **native `jsonb` operator** instead of the encrypted one. The query doesn't fail; it silently returns native `jsonb` semantics, which are meaningless for encrypted payloads. + +Always type the operand: a typed parameter (`$1::eql_v3.text_eq`) or an explicit cast (`'โ€ฆ'::eql_v3.int4_ord`). The [Stack SDK](/reference/stack) and [CipherStash Proxy](/reference/proxy) type bound parameters automatically โ€” raw SQL must do it by hand. 
+
+
+## Operator support by variant
+
+A ✅ means the operator resolves on a column typed as that variant. A ❌ means it is blocked: it raises, it does not return wrong rows.
+
+| SQL operator | Meaning | `eql_v3.` | `_eq` | `_ord` / `_ord_ore` | `text_match` | `text_search` |
+| --- | --- | :---: | :---: | :---: | :---: | :---: |
+| `=` | Equality | ❌ | ✅ | ✅ | ❌ | ✅ |
+| `<>` / `!=` | Inequality | ❌ | ✅ | ✅ | ❌ | ✅ |
+| `<` `<=` `>` `>=` | Ordered comparison | ❌ | ❌ | ✅ | ❌ | ✅ |
+| `@>` / `<@` | Bloom-filter token containment | ❌ | ❌ | ❌ | ✅ | ✅ |
+| `LIKE` / `ILIKE` (`~~` / `~~*`) | SQL pattern match | ❌ | ❌ | ❌ | ❌ | ❌ |
+| `IS NULL` / `IS NOT NULL` | Null check | ✅ | ✅ | ✅ | ✅ | ✅ |
+
+A SQL `NULL` column value is not encrypted, so `IS NULL` / `IS NOT NULL` always work regardless of variant.
+
+## There is no `LIKE`
+
+`LIKE` and `ILIKE` (`~~` / `~~*`) raise on **every** encrypted-domain variant. SQL pattern matching is meaningless on ciphertext. Encrypted text matching is bloom-filter token containment, using `@>` on a `text_match` or `text_search` column:
+
+```sql
+-- ❌ Raises: operator not supported
+SELECT * FROM users WHERE email LIKE '%alice%';
+
+-- ✅ Encrypted free-text match
+SELECT * FROM users WHERE email @> $1::eql_v3.text_match;
+```
+
+`@>` / `<@` here is **probabilistic ngram-bloom containment**: it tests whether the encrypted text contains the (encrypted) search terms. It is not JSONB containment and not `LIKE`. The client encrypts the search term into a bloom-filter query value; false positives are possible, false negatives are not.
+
+## Unsupported operators fail loudly
+
+Unsupported operators are not silent no-ops. Every operator that a variant doesn't support is still *defined*: it routes to a blocker function that raises an `operator … is not supported` exception.
A mis-typed query fails loudly instead of silently returning wrong results: + +```sql +-- salary is eql_v3.int8_eq (equality only) +SELECT * FROM users WHERE salary > $1::eql_v3.int8_eq; +-- ERROR: operator > is not supported for eql_v3.int8_eq +``` + +A `NULL` operand still raises โ€” the blockers are deliberately not `STRICT`, so PostgreSQL can't skip the check. + +## Query shapes + +### Equality: `=` and `<>` + +Works on `_eq`, `_ord` / `_ord_ore`, and `text_search`. On `_eq` and `text_search`, equality compares the HMAC (`hm`) term; on `_ord` variants it compares the ORE (`ob`) term, which collapses to equality โ€” so `_ord` columns get equality without carrying an `hm` term: + +```sql +SELECT * FROM users WHERE email = $1::eql_v3.text_eq; +SELECT * FROM users WHERE email <> $1::eql_v3.text_eq; +``` + +### Comparison, `BETWEEN`, and `ORDER BY` + +Works on `_ord` / `_ord_ore` and `text_search` (variants carrying an `ob` ORE term): + +```sql +SELECT * FROM users WHERE salary >= $1::eql_v3.int8_ord; + +-- BETWEEN desugars to >= and <= +SELECT * FROM users +WHERE created_at BETWEEN $1::eql_v3.timestamp_ord AND $2::eql_v3.timestamp_ord; + +-- ORDER BY is meaningful only with an ORE term +SELECT * FROM users ORDER BY salary DESC; +``` + +`ORDER BY` on a variant without an `ob` term won't produce a meaningful order โ€” type the column as an `_ord` variant when ordering matters. + +Bare `ORDER BY col` sorts correctly, but the planner doesn't rewrite sort keys, so it adds a `Sort` node even when a btree index exists. To stream rows out of the index already ordered, write the sort key in extractor form (`ORDER BY eql_v3.ord_term(col)`) โ€” see [Range and ORDER BY](/reference/eql/indexes#range-and-order-by). 
+ +### Text containment: `@>` and `<@` + +Works on `text_match` and `text_search` only: + +```sql +SELECT * FROM users WHERE email @> $1::eql_v3.text_match; +``` + +### `IN` + +Desugars to `=`, so it needs an equality-capable variant (`_eq`, `_ord`, `text_search`): + +```sql +SELECT * FROM users +WHERE email IN ($1::eql_v3.text_eq, $2::eql_v3.text_eq); +``` + +### `GROUP BY` and `DISTINCT` + +Need an equality term (`_eq`, `_ord`, `text_search`): + +```sql +SELECT email, COUNT(*) FROM logins GROUP BY email; +SELECT DISTINCT email FROM logins; +``` + +Plain `COUNT(col)` needs no term and works on any variant; `COUNT(DISTINCT col)` needs an equality term. + +### Joins + +Equijoins work on equality-capable variants, with one extra constraint: **both sides must have been encrypted with the same keyset and typed as a matching variant** โ€” otherwise the equality terms can never match: + +```sql +SELECT u.*, o.total +FROM users u +JOIN orders o ON u.email = o.customer_email; -- both eql_v3.text_eq, same keyset +``` + +The same rule applies to `IN (subquery)` and set-operation deduplication. + +## Function-form equivalents + +Some managed platforms disallow custom operators. 
Every operator has a function form, generated per domain variant, taking the same domain types: + +| Function | Operator | Available on | +| --- | --- | --- | +| `eql_v3.eq(a, b)` | `=` | `_eq`, `_ord` / `_ord_ore`, `text_search` | +| `eql_v3.neq(a, b)` | `<>` | `_eq`, `_ord` / `_ord_ore`, `text_search` | +| `eql_v3.lt(a, b)` | `<` | `_ord` / `_ord_ore`, `text_search` | +| `eql_v3.lte(a, b)` | `<=` | `_ord` / `_ord_ore`, `text_search` | +| `eql_v3.gt(a, b)` | `>` | `_ord` / `_ord_ore`, `text_search` | +| `eql_v3.gte(a, b)` | `>=` | `_ord` / `_ord_ore`, `text_search` | +| `eql_v3.contains(a, b)` | `@>` | `text_match`, `text_search`, `eql_v3.json` | +| `eql_v3.contained_by(a, b)` | `<@` | `text_match`, `text_search`, `eql_v3.json` | + +```sql +SELECT * FROM users WHERE eql_v3.eq(email, $1::eql_v3.text_eq); +SELECT * FROM users WHERE eql_v3.lt(created_at, $1::eql_v3.timestamp_ord); +``` + +There are no `like` / `ilike` function forms โ€” text matching is `eql_v3.contains` on a `text_match` value. See [Functions](/reference/eql/functions) for the full function surface, including `MIN` / `MAX`. + +## JSON operators + +`eql_v3.json` has its own operator surface โ€” document containment (`@>` / `<@`), field access (`->` / `->>`), and comparisons on extracted leaves โ€” and its own set of blocked native JSONB operators. See [JSON support](/reference/eql/json). diff --git a/content/docs/reference/eql/payload-format.mdx b/content/docs/reference/eql/payload-format.mdx new file mode 100644 index 0000000..24af439 --- /dev/null +++ b/content/docs/reference/eql/payload-format.mdx @@ -0,0 +1,123 @@ +--- +title: Payload format +description: "The wire format of every EQL encrypted value: the v/i/c envelope, the index-term keys, and the ste_vec document shape." +type: reference +components: [eql] +verifiedAgainst: + eql: "3.0.0" +--- + +Every EQL encrypted value is a `jsonb` payload with a shared envelope plus the index terms that make it queryable. 
This page defines that wire format. Earlier CipherStash docs called this format the **CipherCell** โ€” this page is the current definition of the same structure. + +Payloads are produced by the encryption clients โ€” the [Stack SDK](/reference/stack) and [CipherStash Proxy](/reference/proxy) โ€” and consumed by EQL's operators and functions inside Postgres. EQL never sees plaintext: it validates, stores, and compares these payloads; it cannot produce or decrypt them. + +## The envelope + +Every payload carries three envelope keys. Each `eql_v3` domain's `CHECK` constraint requires them, so a value missing any of these is rejected at write time: + +| Key | Contents | Notes | +| --- | --- | --- | +| `v` | Payload version | Always exactly `2` on the wire. The domain `CHECK`s assert it and raise on any other value. | +| `i` | Ident: `{"t": "", "c": ""}` | Binds the ciphertext to the table and column it was encrypted for. Both keys required. | +| `c` | Ciphertext | The opaque, non-deterministic encrypted blob (mp_base85-encoded). Never used in comparisons. | + + +`eql_v3` names the **SQL schema generation**, not the payload version. The JSON envelope version is still `v: 2` โ€” the wire field names are unchanged from EQL v2, and the domain `CHECK`s assert `v = 2`. + + +A `k` discriminator (`"ct"` for a scalar ciphertext, `"sv"` for a JSON document) also appears on payloads emitted by the clients, distinguishing the two top-level shapes. + +## Index-term keys + +Alongside the envelope, a payload carries the index terms for its column's capability. On the wire, a payload is discriminated by *which term key is present* โ€” the SQL domain name carries the rest. 
Each key is backed by a SEM (searchable encrypted metadata) type in the `eql_v3` schema:
+
+| Key | SEM type | Wire shape | Enables | Reveals |
+| --- | --- | --- | --- | --- |
+| `hm` | `eql_v3.hmac_256` (domain over `text`) | Hex string (HMAC-SHA-256) | `=`, `<>` on `_eq` and `text_search` domains | Whether two values are equal, and nothing else |
+| `ob` | `eql_v3.ore_block_256` (composite: array of `bytea` block terms) | Array of hex-encoded ORE blocks | `<`, `<=`, `>`, `>=`, `ORDER BY` on `_ord` / `_ord_ore` and `text_search` domains, plus `=` / `<>` on `_ord` / `_ord_ore`, since ORE comparison collapses to equality | The relative order of two values |
+| `bf` | `eql_v3.bloom_filter` (domain over `smallint[]`) | Array of set bit positions (**signed** 16-bit) | `@>` / `<@` token containment on `text_match` and `text_search` domains | Probabilistic token overlap between values |
+
+Notes on the wire shapes:
+
+- **`ob` is width-agnostic**: the block count varies by scalar (8 blocks for the int scalars, 12 for timestamp, 14 for numeric); wider types just carry more block strings in the array.
+- **`bf` positions are signed**: EQL stores the filter as PostgreSQL `smallint[]`, and filters sized above 32768 emit upper-half bit positions as *negative* signed values. Consumers must use a signed 16-bit integer type.
+
+The capability is encoded as **required keys**: the payload for an `eql_v3.text_eq` column must carry `hm`; an `eql_v3.int4_ord` payload must carry `ob` (and only `ob`: equality on `_ord` domains compares ORE terms, so no `hm` is needed); a `text_match` payload must carry `bf`; a `text_search` payload carries all three. A payload missing its term key fails the domain `CHECK` and fails to deserialize in the client bindings. See [Types](/reference/eql/types) for the domain-to-capability mapping, and [Searchable encryption](/concepts/searchable-encryption) for what these terms do and don't leak.
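The required-keys rule and the signed `bf` positions can both be sketched in a few lines of client-side code. This is a hedged illustration, not the domain `CHECK` or the `eql-bindings` deserializer: the capability labels (`"eq"`, `"ord"`, and so on) and both function names are hypothetical, and the mapping of negative positions to the upper half of the filter assumes ordinary two's-complement wraparound, which the text above implies but does not state.

```python
from typing import Any

# Required term keys per capability, as listed in the paragraph above.
# The capability labels are illustrative, not EQL identifiers.
REQUIRED_TERMS = {
    "eq": {"hm"},
    "ord": {"ob"},
    "ord_ore": {"ob"},
    "text_match": {"bf"},
    "text_search": {"hm", "ob", "bf"},
}


def missing_keys(payload: dict[str, Any], capability: str) -> set[str]:
    """Return the envelope and term keys a scalar payload lacks for a capability."""
    required = {"v", "i", "c"} | REQUIRED_TERMS[capability]
    return required - set(payload)


def bloom_bit_index(position: int) -> int:
    """Map a signed 16-bit `bf` position to a non-negative bit index.

    Assumption: negative values are the two's-complement image of the
    upper half (32768..65535); the source only says they are negative.
    """
    if not -32768 <= position <= 32767:
        raise ValueError("bf positions must fit a signed 16-bit integer")
    return position & 0xFFFF
```

The sketch only checks that required keys are present. Whether an extra term key (for example `hm` on an `_ord` payload) is rejected is left to the real domain `CHECK`.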
+
+## JSON documents: the `sv` vector
+
+An [encrypted JSON document](/reference/eql/json) uses a different payload shape: no root ciphertext, and an `sv` array with one encrypted entry per path in the document. Each entry carries:
+
+| Key | Contents |
+| --- | --- |
+| `s` | Selector: a deterministic hash of the JSON path. Required; entry matching compares selectors first. |
+| `c` | Ciphertext for the node at that path. |
+| `hm` **or** `oc` | Exactly one, never both; the domain `CHECK` enforces the exclusivity. `hm` (HMAC-256) on Boolean/`null` leaves and Object/Array roots; `oc` (CLLW ORE, backed by `eql_v3.ore_cllw`) on String/Number leaves. |
+| `a` | Optional array marker: `true` when the selector points at an array context. |
+
+The decoded `oc` value starts with a domain-tag byte (`0x00` numeric, `0x01` string) followed by the CLLW ciphertext, so numeric and string values in one column keep a consistent total order. Earlier payload versions split this into two fields, `ocf` (fixed-width, numeric) and `ocv` (variable-width, string), which were consolidated into the single `oc` key; the tag byte now carries the distinction.
+
+A containment **query** payload (`eql_v3.ste_vec_query`) has the same `sv` shape but its entries carry no `c`: containment matches selectors and index terms, never ciphertexts.
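As a rough illustration of the entry rules in this section, the following sketch classifies a single `sv` entry by the term it carries. It is not the domain `CHECK`; `classify_entry` is a hypothetical helper, and it assumes the `oc` value is hex-encoded so that the domain tag is the first byte of the decoded string.

```python
from typing import Any


def classify_entry(entry: dict[str, Any]) -> str:
    """Classify one `sv` entry as 'hm', 'oc-numeric' or 'oc-string'."""
    if "s" not in entry:
        raise ValueError("entry has no selector `s`")
    has_hm, has_oc = "hm" in entry, "oc" in entry
    if has_hm == has_oc:
        # Either both terms are present or neither is: both are invalid.
        raise ValueError("entry must carry exactly one of `hm` or `oc`")
    if has_hm:
        return "hm"
    # Assumption: `oc` is hex-encoded, so the domain tag is its first byte.
    tag = bytes.fromhex(entry["oc"][:2])[0]
    if tag == 0x00:
        return "oc-numeric"
    if tag == 0x01:
        return "oc-string"
    raise ValueError(f"unknown oc domain tag: {tag:#04x}")
```

The check does not require `c`, so the same function accepts entries from a containment query payload, which carry a selector and a term but no ciphertext.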
+ +## Example payloads + +A scalar payload for an `eql_v3.text_search` column (lookup + ordering + free-text match, so all three terms are required): + +```json +{ + "v": 2, + "i": { "t": "users", "c": "email" }, + "c": "mBbKmsMM%bK#QQOx1yLDBHyD...", + "hm": "9c8ec1d2f9932b979b1bf3f09f8a4e2f6a41f8de2f0c8b7a52e1f5c3d4b6a790", + "ob": ["7a1fd0c2...", "d24c9be1...", "03fa66b8..."], + "bf": [42, 1290, -8113, 30201] +} +``` + +- `v`, `i`, `c` โ€” the envelope +- `hm` โ€” equality term: `WHERE email = $1` compares this +- `ob` โ€” ordering term: `ORDER BY` and range comparisons walk these blocks +- `bf` โ€” bloom-filter term: `@>` token containment tests these bit positions + +A JSON document payload for an `eql_v3.json` column: + +```json +{ + "v": 2, + "k": "sv", + "i": { "t": "orders", "c": "metadata" }, + "sv": [ + { "s": "2517068c0d1f9d4d41d2c666211f785e", "c": "mBbKmM...", "hm": "b0e0..." }, + { "s": "f510853a4ab9d4f75f51a533ac264c5d", "c": "mBbKmQ...", "oc": "01a3f2..." }, + { "s": "33743aed3ae636f6bf05cff11ac4b519", "c": "mBbKmR...", "oc": "004e19..." } + ] +} +``` + +- First entry: an object root โ€” `hm` only, equality/containment +- Second entry: a string leaf โ€” `oc` starting with tag `01` +- Third entry: a numeric leaf โ€” `oc` starting with tag `00` + +And the containment needle the client builds for a `@>` query โ€” index terms, no ciphertexts: + +```json +{ + "sv": [ + { "s": "f510853a4ab9d4f75f51a533ac264c5d", "oc": "01a3f2..." } + ] +} +``` + +## Machine-readable schemas + +The [EQL repository](https://github.com/cipherstash/encrypt-query-language) publishes the format as JSON Schema in two places: + +- **`crates/eql-bindings/schema/`** โ€” one schema per scalar domain (`$id`s under `https://schemas.cipherstash.com/eql/v3/`), generated from the canonical Rust wire types in the `eql-bindings` crate. TypeScript bindings are generated from the same definitions, so every producer and consumer shares one source of truth. 
+- **`docs/reference/schema/`** โ€” full-payload schemas covering both the scalar and `sv` document shapes. These files are currently named for the v2.x payload releases (`eql-payload-v2.2.schema.json`, `eql-payload-v2.3.schema.json`) and reference `eql_v2` function names, even though the current SQL surface is `eql_v3` โ€” the v2.3 schema is the applicable document-shape definition, matching the still-`v: 2` envelope. + +## Who produces and consumes this + +- **Produce:** the Stack SDK and CipherStash Proxy encrypt plaintext into these payloads โ€” ciphertext, index terms, selectors โ€” using keys the database never holds. +- **Consume:** EQL's domain `CHECK`s validate the shape on write, and its operators and extractor functions ([Operators](/reference/eql/operators), [Indexes](/reference/eql/indexes)) compare the term keys at query time. + +The division is strict: EQL never sees plaintext, and the clients never rely on the database for key material. diff --git a/content/docs/reference/eql/types.mdx b/content/docs/reference/eql/types.mdx new file mode 100644 index 0000000..e4eb005 --- /dev/null +++ b/content/docs/reference/eql/types.mdx @@ -0,0 +1,95 @@ +--- +title: Encrypted types +description: "The eql_v3 encrypted-domain type families: which domain variant to declare for each scalar type, and what each variant lets you query." +type: reference +components: [eql] +verifiedAgainst: + eql: "3.0.0" +--- + +EQL ships its searchable-encryption surface as PostgreSQL **domains in the `eql_v3` schema**. There are two kinds: + +- **Per-scalar encrypted-domain types** โ€” `eql_v3.int4`, `eql_v3.text`, `eql_v3.timestamp`, and so on. One family of domain *variants* per scalar type. +- **An encrypted-JSON document type** โ€” `eql_v3.json` โ€” for structured encryption of whole JSONB documents. See [JSON support](/reference/eql/json). + +A column's query capability is fixed by the **domain variant you type it as**. 
There is no database-side configuration step: which index terms travel in a value's payload is decided by the encryption client (the [Stack SDK](/reference/stack) or [CipherStash Proxy](/reference/proxy)), and the column's domain variant is what makes the matching operators resolve. + +## The family model + +Every scalar type `` generates a storage-only variant plus the query variants its capabilities allow. All variants are `jsonb`-backed domains. + +| Domain variant | Capability | Index term carried | +| --- | --- | --- | +| `eql_v3.` | Storage and decryption only. Every comparison operator is blocked โ€” only `IS NULL` / `IS NOT NULL` work. | none | +| `eql_v3._eq` | Equality: `=` and `<>` (plus `IN`, `GROUP BY`, `DISTINCT`, equijoins). | `hm` (`eql_v3.hmac_256`) | +| `eql_v3._ord` / `eql_v3._ord_ore` | Full comparison surface: `=` `<>` `<` `<=` `>` `>=`, `BETWEEN`, `ORDER BY`, and the `eql_v3.min` / `eql_v3.max` aggregates. | `ob` (`eql_v3.ore_block_256`) | +| `eql_v3.text_match` (text only) | Encrypted free-text token containment via `@>` / `<@`. No equality, no ordering. | `bf` (`eql_v3.bloom_filter`) | +| `eql_v3.text_search` (text only) | Everything: equality, ordering, and token containment combined. | `hm` + `ob` + `bf` | + +Two things worth calling out: + +- **The bare variant blocks everything.** `eql_v3.` carries no index term. Querying it with any comparison operator raises an "operator not supported" exception. Use it for columns you only ever store and decrypt. If you later need to query, type the column as a query variant โ€” or cast at the call site (`col::eql_v3.int4_ord`) if the payload already carries the term. +- **`_ord` and `_ord_ore` are twins.** They are byte-identical surfaces backed by the same ORE block term. Pick the name that documents intent โ€” "ordered" versus "ordered via ORE block". Both support the full ordered surface and `MIN` / `MAX`. 
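The family model above amounts to a lookup from variant suffix to operator set. The sketch below restates that table as data, purely as a reading aid; `split_domain`, `resolves`, and the suffix labels are hypothetical and not part of EQL. It does not model `IS NULL` / `IS NOT NULL` (which always work), and it does not check which variants actually exist for a given scalar.

```python
EQUALITY = {"=", "<>"}
ORDERING = {"<", "<=", ">", ">="}
CONTAINMENT = {"@>", "<@"}

# Operators per variant suffix; "" is the bare storage-only variant.
VARIANT_OPERATORS = {
    "": set(),
    "eq": EQUALITY,
    "ord": EQUALITY | ORDERING,
    "ord_ore": EQUALITY | ORDERING,
    "match": CONTAINMENT,
    "search": EQUALITY | ORDERING | CONTAINMENT,
}


def split_domain(domain: str) -> tuple[str, str]:
    """Split e.g. 'eql_v3.int8_ord' into ('int8', 'ord')."""
    name = domain.removeprefix("eql_v3.")
    for suffix in ("ord_ore", "ord", "eq", "match", "search"):
        if name.endswith("_" + suffix):
            return name[: -len(suffix) - 1], suffix
    return name, ""


def resolves(domain: str, operator: str) -> bool:
    """True if the family-model table lists the operator for this variant."""
    _scalar, suffix = split_domain(domain)
    return operator in VARIANT_OPERATORS[suffix]
```

A `False` result here corresponds to an operator that raises on that variant, not one that quietly returns no rows.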
+
+## Type matrix
+
+The scalar tokens that ship in EQL 3.0.0 are `int2`, `int4`, `int8`, `numeric`, `float4`, `float8`, `date`, `timestamp`, `text`, and `bool`.
+
+| Scalar | `eql_v3.` | `_eq` | `_ord` | `_ord_ore` | `text_match` | `text_search` |
+| --- | :---: | :---: | :---: | :---: | :---: | :---: |
+| `int2` | ✅ | ✅ | ✅ | ✅ | - | - |
+| `int4` | ✅ | ✅ | ✅ | ✅ | - | - |
+| `int8` | ✅ | ✅ | ✅ | ✅ | - | - |
+| `float4` | ✅ | ✅ | ✅ | ✅ | - | - |
+| `float8` | ✅ | ✅ | ✅ | ✅ | - | - |
+| `numeric` | ✅ | ✅ | ✅ | ✅ | - | - |
+| `date` | ✅ | ✅ | ✅ | ✅ | - | - |
+| `timestamp` | ✅ | ✅ | ✅ | ✅ | - | - |
+| `text` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| `bool` | ✅ | ❌ | ❌ | ❌ | - | - |
+
+
+**`bool` is storage-only by design.** A two-value column has too little cardinality for any searchable index to be safe: an equality index over `true`/`false` would leak the value distribution outright. EQL ships only `eql_v3.bool`, with no `_eq` or `_ord` variants. Store and decrypt boolean columns; filter on them client-side.
+
+
+## Index terms
+
+Each query variant stores one or more encrypted index terms alongside the ciphertext:
+
+- **`hm`**: an HMAC-256 term (`eql_v3.hmac_256`). Supports exact equality.
+- **`ob`**: an ORE block term (`eql_v3.ore_block_256`). Order-revealing: supports comparison and sorting.
+- **`bf`**: a bloom filter term (`eql_v3.bloom_filter`). Supports probabilistic ngram token containment.
+
+The payload structure (envelope keys plus per-variant term keys) is documented in [Payload format](/reference/eql/payload-format). What each term mathematically reveals about the plaintext (and why you should only carry the terms you need) is covered in [Searchable encryption](/concepts/searchable-encryption).
+ +## Encrypted JSON: `eql_v3.json` + +`eql_v3.json` is the encrypted-JSON document domain, built on the structured-encryption ("ste_vec") model: a JSONB document is encrypted into a searchable vector of terms, one per path inside the document, supporting containment (`@>`), field access (`->` / `->>`), and path queries. It has its own operator and function surface โ€” see [JSON support](/reference/eql/json). + +## Choosing a variant + +Declare only the capabilities you query on. Every index term a value carries is extra material stored in the database, and each term class reveals different structure to an observer โ€” equality terms reveal value repetition, ORE terms reveal ordering, bloom terms reveal token overlap (see [Searchable encryption](/concepts/searchable-encryption)): + +- Never queried, only decrypted โ†’ bare `eql_v3.` +- Exact lookup, `IN`, joins, `GROUP BY` โ†’ `_eq` +- Ranges, `ORDER BY`, `MIN`/`MAX` โ†’ `_ord` +- Free-text matching on text โ†’ `text_match` +- Text you need to look up, sort, *and* search โ†’ `text_search` + +The variant you declare must match the terms the client is configured to emit for that column โ€” the domain makes the operator resolve, but the term in the payload is what makes it answer. + +## Example + +```sql +CREATE TABLE users ( + id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY, + email eql_v3.text_search, -- lookup, sort, and free-text match + name eql_v3.text_match, -- free-text match only + tax_id eql_v3.text_eq, -- exact lookup only + salary eql_v3.int8_ord, -- range queries, ORDER BY, MIN/MAX + is_active eql_v3.bool, -- storage only (by design) + created_at eql_v3.timestamp_ord +); +``` + +Once the table exists, add functional indexes on the term extractors so queries engage an index โ€” see [Indexes](/reference/eql/indexes). The operators each variant supports are listed in [Operators](/reference/eql/operators). 
From dd2a8d68b431b40abad7e4aac811029a2b826586 Mon Sep 17 00:00:00 2001 From: Dan Draper Date: Thu, 2 Jul 2026 19:56:24 +1000 Subject: [PATCH 2/6] refactor(v2): restructure EQL reference Tailwind-style (CIP-3326) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit EQL is an abstraction over SQL the way Tailwind is over CSS โ€” the docs now follow the same shape: Install โ†’ Core concepts โ†’ type categories โ†’ Indexes โ†’ query patterns, increasing in complexity. Each type-category page is the complete reference for its types (variants, payload shape, operators/functions, example queries on one page). - index: trimmed to the Install page - core-concepts (new): the canonical home for shared mechanics โ€” variant model, payload anatomy (v/i/c envelope + hm/ob/bf terms, absorbs payload-format/CipherCell), typed-operand rule, fail-loud blockers, ORE-equality on _ord, term-leakage pointer - numbers-and-dates, text, booleans (new) + json (reworked): category pages; text owns the no-LIKE treatment; json absorbs the sv payload shape; booleans framed as "every type has a storage-only variant โ€” for bool it's the only one" - filtering, sorting, grouping-and-aggregates, joins (new): cross-type query patterns; joins headlines the same-keyset constraint - deleted: types.mdx, operators.mdx, functions.mdx, payload-format.mdx (content redistributed; URLs never shipped publicly, no redirect debt) - Anti-drift rule recorded in IA.md: mechanics live ONLY in core-concepts; category/query pages link, never restate - meta.json: flat URLs with ---Types---/---Indexes---/---Queries--- sidebar separators; legacy redirect map retargeted (queries โ†’ filtering, cipher-cell โ†’ core-concepts) Claude-Session: https://claude.ai/code/session_01ACPpFPHvKtrV48nbEYuv7P --- IA.md | 22 ++- content/docs/reference/eql/booleans.mdx | 62 +++++++ content/docs/reference/eql/core-concepts.mdx | 142 ++++++++++++++++ content/docs/reference/eql/filtering.mdx | 124 
++++++++++++++ content/docs/reference/eql/functions.mdx | 112 ------------- .../reference/eql/grouping-and-aggregates.mdx | 104 ++++++++++++ content/docs/reference/eql/index.mdx | 54 +++--- content/docs/reference/eql/indexes.mdx | 4 +- content/docs/reference/eql/joins.mdx | 112 +++++++++++++ content/docs/reference/eql/json.mdx | 60 +++++-- content/docs/reference/eql/meta.json | 17 +- .../docs/reference/eql/numbers-and-dates.mdx | 140 ++++++++++++++++ content/docs/reference/eql/operators.mdx | 153 ----------------- content/docs/reference/eql/payload-format.mdx | 123 -------------- content/docs/reference/eql/sorting.mdx | 96 +++++++++++ content/docs/reference/eql/text.mdx | 157 ++++++++++++++++++ content/docs/reference/eql/types.mdx | 95 ----------- v2-redirects.mjs | 4 +- 18 files changed, 1047 insertions(+), 534 deletions(-) create mode 100644 content/docs/reference/eql/booleans.mdx create mode 100644 content/docs/reference/eql/core-concepts.mdx create mode 100644 content/docs/reference/eql/filtering.mdx delete mode 100644 content/docs/reference/eql/functions.mdx create mode 100644 content/docs/reference/eql/grouping-and-aggregates.mdx create mode 100644 content/docs/reference/eql/joins.mdx create mode 100644 content/docs/reference/eql/numbers-and-dates.mdx delete mode 100644 content/docs/reference/eql/operators.mdx delete mode 100644 content/docs/reference/eql/payload-format.mdx create mode 100644 content/docs/reference/eql/sorting.mdx create mode 100644 content/docs/reference/eql/text.mdx delete mode 100644 content/docs/reference/eql/types.mdx diff --git a/IA.md b/IA.md index b51956c..6db1dee 100644 --- a/IA.md +++ b/IA.md @@ -130,14 +130,22 @@ live at `/docs/errors/` โ€” permanent, never restructured (CIP-3338). 
## Reference - [x] Section scaffold ๐Ÿšง (eql, stack, auth, cli, proxy, workspace) -- **EQL (v3 rewrite โ€” CIP-3326):** -- [x] `/reference/eql` โ€” overview + install (single SQL file, permissions split, dbdev, Docker) -- [x] `/reference/eql/types` โ€” 10 scalar families ร— variants + `eql_v3.json` -- [x] `/reference/eql/operators` โ€” per-variant matrix incl. what RAISES; typed-operand rule +- **EQL (v3 rewrite โ€” CIP-3326; Tailwind-shaped: install โ†’ core concepts โ†’ type + categories โ†’ indexes โ†’ query patterns). Anti-drift rule: shared mechanics + (typed operands, blockers, envelope, variant model, ORE-equality) live ONLY in + core-concepts โ€” category/query pages link, never restate:** +- [x] `/reference/eql` โ€” install (single SQL file, permissions split, dbdev, Docker) +- [x] `/reference/eql/core-concepts` โ€” variant model, payload anatomy (absorbs + cipher-cell), typed-operand rule, fail-loud blockers, term leakage pointer +- [x] `/reference/eql/numbers-and-dates` โ€” int*/float*/numeric/date/timestamp +- [x] `/reference/eql/text` โ€” all six text variants; owns the no-LIKE treatment +- [x] `/reference/eql/json` โ€” ste_vec + sv payload shape + containment/path queries +- [x] `/reference/eql/booleans` โ€” storage-only variants (bool has only that one) - [x] `/reference/eql/indexes` โ€” functional indexes on extractors; Supabase-compatible -- [x] `/reference/eql/json` โ€” ste_vec, path queries -- [x] `/reference/eql/functions` โ€” incl. 
aggregates (min/max only) -- [x] `/reference/eql/payload-format` โ€” v/i/c envelope, hm/ob/bf (absorbs cipher-cell) +- [x] `/reference/eql/filtering` โ€” =, IN, ranges, token match, containment +- [x] `/reference/eql/sorting` โ€” ORDER BY, extractor sort-key form, pagination +- [x] `/reference/eql/grouping-and-aggregates` โ€” GROUP BY/DISTINCT, min/max, no SUM/AVG +- [x] `/reference/eql/joins` โ€” equijoins, the same-keyset constraint - **Stack SDK:** - [ ] `/reference/stack` โ€” client + configuration (port encryption/* pages) - [ ] `/reference/stack/schema` diff --git a/content/docs/reference/eql/booleans.mdx b/content/docs/reference/eql/booleans.mdx new file mode 100644 index 0000000..390dc97 --- /dev/null +++ b/content/docs/reference/eql/booleans.mdx @@ -0,0 +1,62 @@ +--- +title: Booleans +description: "Encrypted booleans are storage-only by design: eql_v3.bool stores and decrypts, carries no index terms, and blocks every comparison." +type: reference +components: [eql] +verifiedAgainst: + eql: "3.0.0" +--- + +Every scalar type has a storage-only variant โ€” for `bool` it's the only one. EQL ships `eql_v3.bool` and nothing else: there is no `bool_eq` and no `bool_ord`. An encrypted boolean column can be stored, decrypted, and null-checked; it cannot be filtered, sorted, grouped, or joined on. + +## Why there are no query variants + +A two-value column has too little cardinality for any searchable index to be safe. An equality term over `true` / `false` would partition the table into two visible buckets โ€” leaking the value distribution (and, with any outside knowledge, the values themselves) outright. Rather than ship an index term that can't keep its promise, EQL omits the query variants entirely. See [Searchable encryption](/concepts/searchable-encryption) for the general analysis of what index terms reveal. 
+ +## What works, what raises + +`eql_v3.bool` follows the bare-variant contract described in [Core concepts](/reference/eql/core-concepts#variants-declare-capability): it carries no index terms, so `IS NULL` / `IS NOT NULL` are the only predicates that work. Every comparison operator routes to a blocker and raises - the [fail-loud behavior](/reference/eql/core-concepts#unsupported-operations-fail-loudly) shared by all encrypted variants: + +```sql +-- ❌ Raises: operator = is not supported for eql_v3.bool +SELECT * FROM users WHERE is_active = $1::eql_v3.bool; + +-- ✅ Works: NULL columns are not encrypted +SELECT * FROM users WHERE is_active IS NOT NULL; +``` + +## Filter client-side + +Query on other columns, decrypt the boolean in your application, and filter there: + +```sql +CREATE TABLE users ( + id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY, + email eql_v3.text_eq, -- exact lookup + created_at eql_v3.timestamp_ord, -- range queries, ORDER BY + is_active eql_v3.bool -- storage only (by design) +); +``` + +```sql +-- Narrow the result set with the columns that do carry index terms… +SELECT id, email, is_active FROM users +WHERE created_at >= $1::eql_v3.timestamp_ord; +-- …then decrypt is_active in the client and filter on the plaintext. +``` + +The [Stack SDK](/reference/stack) and [CipherStash Proxy](/reference/proxy) decrypt the payload back to a plain boolean on read, so the client-side filter is an ordinary `if`. + +If a boolean genuinely needs to be a server-side predicate, that is a data-modelling signal: consider whether the flag is actually sensitive. A non-sensitive flag can stay a plain PostgreSQL `boolean` column alongside your encrypted columns. + +## Storing without querying + +`bool` is the forced case of a pattern available to every scalar type: the bare variant `eql_v3.` (for example `eql_v3.int4`, `eql_v3.text`, `eql_v3.timestamp`) is storage-and-decryption only.
It carries no index terms, and every comparison operator raises - use it for columns you only ever store and decrypt, so the database holds no searchable material for them at all. + +For every type other than `bool`, storage-only is a choice you can walk back. If you later need to query, retype the column as a query variant - or, if the payloads already carry the needed term (the client decides which terms travel in the payload), cast at the call site: + +```sql +SELECT * FROM readings WHERE value::eql_v3.int4_ord > $1::eql_v3.int4_ord; +``` + +The variant families and what each one enables are covered in [Core concepts](/reference/eql/core-concepts); the per-type specifics live in [Numbers and dates](/reference/eql/numbers-and-dates) and [Text](/reference/eql/text). diff --git a/content/docs/reference/eql/core-concepts.mdx b/content/docs/reference/eql/core-concepts.mdx new file mode 100644 index 0000000..a262fe8 --- /dev/null +++ b/content/docs/reference/eql/core-concepts.mdx @@ -0,0 +1,142 @@ +--- +title: Core concepts +description: "The model behind every EQL page: domain variants that declare capability, the encrypted payload envelope, the typed-operand rule, and fail-loud blockers." +type: reference +components: [eql] +verifiedAgainst: + eql: "3.0.0" +--- + +Everything in the EQL reference builds on four ideas: columns are typed as **domain variants** that declare what they can do, every value is a **`jsonb` payload** carrying encrypted index terms, **operands must be typed** for the encrypted operators to resolve, and anything a column can't do **fails loudly** instead of returning wrong rows. This page is the canonical home for all four - the per-type and per-query pages link back here rather than restating them. + +## Variants declare capability + +EQL ships its searchable-encryption surface as PostgreSQL **domains in the `eql_v3` schema**, all backed by `jsonb`.
Each scalar type generates a *family* of domain variants, and the variant you type a column as fixes its query capability. Each domain carries a `CHECK` constraint that validates the encrypted payload on insert, so a malformed or wrong-version value is rejected at write time rather than surfacing at query time. + +There is no database-side configuration table. Earlier EQL versions tracked encryption config in the database (`config_add_table`, `config_add_column`, and friends) - those are gone in v3. The searchable surface of a column is fixed by the domain variant you type it as, and which index terms travel in a value's payload is decided by the encryption client (the [Stack SDK](/reference/stack) or [CipherStash Proxy](/reference/proxy)). The domain makes the matching operators resolve; the term in the payload is what makes them answer. + +For any scalar type ``, the family looks like this: + +| Domain variant | Capability | Index term carried | +| --- | --- | --- | +| `eql_v3.` | Storage and decryption only. Every comparison operator is blocked - only `IS NULL` / `IS NOT NULL` work. | none | +| `eql_v3._eq` | Equality: `=` and `<>` (plus `IN`, `GROUP BY`, `DISTINCT`, equijoins). | `hm` (`eql_v3.hmac_256`) | +| `eql_v3._ord` / `eql_v3._ord_ore` | Full comparison surface: `=` `<>` `<` `<=` `>` `>=`, `BETWEEN`, `ORDER BY`, and the `eql_v3.min` / `eql_v3.max` aggregates. | `ob` (`eql_v3.ore_block_256`) | +| `eql_v3.text_match` (text only) | Encrypted free-text token containment via `@>` / `<@`. No equality, no ordering. | `bf` (`eql_v3.bloom_filter`) | +| `eql_v3.text_search` (text only) | Everything: equality, ordering, and token containment combined. | `hm` + `ob` + `bf` | + +Two things worth calling out: + +- **The bare variant blocks everything.** `eql_v3.` carries no index term. Querying it with any comparison operator raises an "operator not supported" exception.
Use it for columns you only ever store and decrypt - [Booleans](/reference/eql/booleans) covers this pattern in full. +- **`_ord` and `_ord_ore` are twins.** They are byte-identical surfaces backed by the same ORE block term. Pick the name that documents intent - "ordered" versus "ordered via ORE block". Both support the full ordered surface and `MIN` / `MAX`. + +Declaring a table is just typing each column as the variant it needs: + +```sql +CREATE TABLE users ( + id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, + email eql_v3.text_eq, -- equality only + salary eql_v3.int4_ord, -- equality + range + ORDER BY + created_at eql_v3.timestamp_ord +); +``` + +Every scalar type - `int2`, `int4`, `int8`, `numeric`, `float4`, `float8`, `date`, `timestamp`, `text`, and `bool` in EQL 3.0.0 - ships some subset of this family. The per-category pages list exactly which variants each type has and how to choose between them: [Numbers and dates](/reference/eql/numbers-and-dates), [Text](/reference/eql/text), and [Booleans](/reference/eql/booleans). Encrypted JSON documents use a separate domain, `eql_v3.json`, with its own operator surface - see [JSON](/reference/eql/json). + +## Anatomy of an encrypted value + +Every EQL encrypted value is a `jsonb` payload with a shared envelope plus the index terms that make it queryable. Earlier CipherStash docs called this format the **CipherCell** - this section is the current definition of the same structure. + +Payloads are **produced** by the encryption clients - the [Stack SDK](/reference/stack) and [CipherStash Proxy](/reference/proxy) - and **consumed** by EQL's operators and functions inside Postgres. EQL never sees plaintext: it validates, stores, and compares these payloads; it cannot produce or decrypt them. The division is strict: the clients never rely on the database for key material. + +### The envelope + +Every payload carries three envelope keys.
Each `eql_v3` domain's `CHECK` constraint requires them, so a value missing any of these is rejected at write time: + +| Key | Contents | Notes | +| --- | --- | --- | +| `v` | Payload version | Always exactly `2` on the wire. The domain `CHECK`s assert it and raise on any other value. | +| `i` | Ident: `{"t": "
", "c": ""}` | Binds the ciphertext to the table and column it was encrypted for. Both keys required. | +| `c` | Ciphertext | The opaque, non-deterministic encrypted blob (mp_base85-encoded). Never used in comparisons. | + + +`eql_v3` names the **SQL schema generation**, not the payload version. The JSON envelope version is still `v: 2` - the wire field names are unchanged from EQL v2, and the domain `CHECK`s assert `v = 2`. + + +A `k` discriminator (`"ct"` for a scalar ciphertext, `"sv"` for a JSON document) also appears on payloads emitted by the clients, distinguishing the two top-level shapes. + +### Index-term keys + +Alongside the envelope, a payload carries the index terms for its column's capability. Each key is backed by a SEM (searchable encrypted metadata) type in the `eql_v3` schema: + +| Key | SEM type | Wire shape | Enables | Reveals | +| --- | --- | --- | --- | --- | +| `hm` | `eql_v3.hmac_256` (domain over `text`) | Hex string (HMAC-SHA-256) | `=`, `<>` on `_eq` and `text_search` domains | Whether two values are equal - nothing else | +| `ob` | `eql_v3.ore_block_256` (composite: array of `bytea` block terms) | Array of hex-encoded ORE blocks (block count varies by scalar width) | `<`, `<=`, `>`, `>=`, `ORDER BY` on `_ord` / `_ord_ore` domains - and `=` / `<>`, since ORE comparison collapses to equality | The relative order of two values | +| `bf` | `eql_v3.bloom_filter` (domain over `smallint[]`) | Array of set bit positions (**signed** 16-bit - large filters emit negative positions) | `@>` / `<@` token containment on `_match` domains | Probabilistic token overlap between values | + +The capability is encoded as **required keys**: the payload for an `eql_v3.text_eq` column must carry `hm`; an `eql_v3.int4_ord` payload must carry `ob` (and only `ob`); a `text_match` payload must carry `bf`; a `text_search` payload carries all three. A payload missing its term key fails the domain `CHECK` - and fails to deserialize in the client bindings.
+ +A scalar payload for an `eql_v3.text_search` column (lookup + ordering + free-text match, so all three terms are required): + +```json +{ + "v": 2, + "i": { "t": "users", "c": "email" }, + "c": "mBbKmsMM%bK#QQOx1yLDBHyD...", + "hm": "9c8ec1d2f9932b979b1bf3f09f8a4e2f6a41f8de2f0c8b7a52e1f5c3d4b6a790", + "ob": ["7a1fd0c2...", "d24c9be1...", "03fa66b8..."], + "bf": [42, 1290, -8113, 30201] +} +``` + +- `v`, `i`, `c` - the envelope +- `hm` - equality term: `WHERE email = $1` compares this +- `ob` - ordering term: `ORDER BY` and range comparisons walk these blocks +- `bf` - bloom-filter term: `@>` token containment tests these bit positions + +Encrypted JSON documents use a different payload shape - an `sv` array with one encrypted entry per path in the document instead of a root ciphertext - defined in [JSON](/reference/eql/json). + +### Machine-readable schemas + +The [EQL repository](https://github.com/cipherstash/encrypt-query-language) publishes the format as JSON Schema in two places: + +- **`crates/eql-bindings/schema/`** - one schema per scalar domain (`$id`s under `https://schemas.cipherstash.com/eql/v3/`), generated from the canonical Rust wire types in the `eql-bindings` crate. TypeScript bindings are generated from the same definitions, so every producer and consumer shares one source of truth. +- **`docs/reference/schema/`** - full-payload schemas covering both the scalar and `sv` document shapes. These files are currently named for the v2.x payload releases (`eql-payload-v2.2.schema.json`, `eql-payload-v2.3.schema.json`) and reference `eql_v2` function names, even though the current SQL surface is `eql_v3` - the v2.3 schema is the applicable document-shape definition, matching the still-`v: 2` envelope. + +## The typed-operand rule + +The `eql_v3` domains are backed by `jsonb`.
When an operand has no known type - a bare string literal, an untyped parameter - PostgreSQL reduces the domain to its `jsonb` base type and resolves the **native `jsonb` operator** instead of the encrypted one. The query doesn't fail; it silently returns native `jsonb` semantics, which are meaningless for encrypted payloads. + +```sql +-- ❌ Wrong: untyped parameter. PostgreSQL falls back to the native jsonb `=`, +-- which compares raw payloads - syntactically valid, semantically meaningless. +SELECT * FROM users WHERE email = $1; + +-- ✅ Right: typed operand - the encrypted `=` resolves. +SELECT * FROM users WHERE email = $1::eql_v3.text_eq; +``` + +Always type the operand: a typed parameter (`$1::eql_v3.text_eq`) or an explicit cast (`'…'::eql_v3.int4_ord`). The [Stack SDK](/reference/stack) and [CipherStash Proxy](/reference/proxy) type bound parameters automatically - raw SQL must do it by hand. + +This is the one place where a mistake is *silent*. Everything else fails loudly: + +## Unsupported operations fail loudly + +Unsupported operators are not silent no-ops. Every operator that a variant doesn't support is still *defined* - it routes to a blocker function that raises an `operator … is not supported` exception. A mis-typed query fails loudly instead of silently returning wrong results: + +```sql +-- salary is eql_v3.int8_eq (equality only) +SELECT * FROM users WHERE salary > $1::eql_v3.int8_eq; +-- ERROR: operator > is not supported for eql_v3.int8_eq +``` + +A `NULL` operand still raises - the blockers are deliberately not `STRICT`, so PostgreSQL can't skip the check. (A SQL `NULL` column value is not encrypted, so `IS NULL` / `IS NOT NULL` themselves always work, on every variant.) + +`LIKE` and `ILIKE` are blocked on **every** encrypted variant - pattern matching is meaningless on ciphertext. Encrypted text matching is bloom-filter token containment instead; [Text](/reference/eql/text) covers it.
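
The blocker behaviour can be pictured as a lookup over the variant table. The Python below is only a sketch of that mental model, under the assumption that the scalar variant suffixes map to operators exactly as the table in "Variants declare capability" states; `supported_operators` and `check_operator` are hypothetical names, and in EQL the blocking is done by operator functions inside PostgreSQL, not by client code.

```python
# Illustrative sketch only: a lookup version of the scalar variant table.
# `supported_operators` and `check_operator` are hypothetical names; in EQL
# the blocking is done by operator functions inside PostgreSQL.

EQUALITY = {"=", "<>"}
ORDERING = {"<", "<=", ">", ">="}
CONTAINMENT = {"@>", "<@"}


def supported_operators(domain):
    """Return the comparison operators a scalar variant resolves."""
    name = domain.split(".", 1)[-1]  # drop the "eql_v3." schema prefix
    if name == "text_search":
        return EQUALITY | ORDERING | CONTAINMENT
    if name == "text_match":
        return set(CONTAINMENT)
    if name.endswith("_ord") or name.endswith("_ord_ore"):
        return EQUALITY | ORDERING
    if name.endswith("_eq"):
        return set(EQUALITY)
    return set()  # bare variant: every comparison is blocked


def check_operator(domain, operator):
    """Raise, rather than return a wrong answer, for unsupported operators."""
    if operator not in supported_operators(domain):
        raise ValueError(f"operator {operator} is not supported for {domain}")


check_operator("eql_v3.int8_ord", ">")  # passes silently
try:
    check_operator("eql_v3.int8_eq", ">")
except ValueError as error:
    print(error)
```

Because `LIKE` and `ILIKE` appear in none of the sets, `check_operator` rejects them for every variant, which matches the rule above.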
+ +One equality subtlety follows from the term table above: on `_ord` / `_ord_ore` columns, `=` and `<>` compare the **ORE (`ob`) term** - ORE comparison collapses to equality - so `_ord` payloads carry no `hm` term at all. On `_eq` and `text_search` columns, equality compares the HMAC (`hm`) term. + +## What the terms reveal + +Every index term a value carries is extra material stored in the database, and each term class reveals defined structure to an observer who can read the stored payloads: equality terms reveal *value repetition* (which rows share a value), ORE terms reveal *ordering* (which of two values is larger), and bloom terms reveal *probabilistic token overlap*. None of them reveal the plaintext - but you should only carry the terms you actually query on. The full analysis of what each term does and doesn't leak is in [Searchable encryption](/concepts/searchable-encryption). diff --git a/content/docs/reference/eql/filtering.mdx b/content/docs/reference/eql/filtering.mdx new file mode 100644 index 0000000..6f1b779 --- /dev/null +++ b/content/docs/reference/eql/filtering.mdx @@ -0,0 +1,124 @@ +--- +title: Filtering +description: "WHERE-clause patterns on encrypted columns: equality, IN lists, ranges and BETWEEN, text token matching, JSON containment, and combining encrypted and plaintext predicates." +type: reference +components: [eql] +verifiedAgainst: + eql: "3.0.0" +--- + +Every filter below is ordinary SQL - the encrypted operators resolve from the column's domain variant, and a functional index on the matching term extractor serves the predicate. One rule applies throughout: **operands must be typed** (`$1::eql_v3.text_eq`, not a bare literal), or PostgreSQL falls through to native `jsonb` semantics. See [Core concepts](/reference/eql/core-concepts) for the typed-operand rule and how unsupported operators fail loudly instead of returning wrong rows.
+ +## Equality: `=` and `<>` + +Works on `_eq` and `_ord` / `_ord_ore` variants of every scalar, and on `text_search`: + +```sql +SELECT * FROM users WHERE email = $1::eql_v3.text_eq; +SELECT * FROM users WHERE tax_id <> $1::eql_v3.text_eq; +``` + +On `_eq` and `text_search` columns equality compares the HMAC (`hm`) term. On `_ord` variants there is no `hm` - equality compares the ORE (`ob`) term, which collapses to equality, so `_ord` columns get `=` and `<>` for free. See [Core concepts](/reference/eql/core-concepts) for the mechanism. + +```sql +-- salary is eql_v3.int8_ord: equality works without an hm term +SELECT * FROM users WHERE salary = $1::eql_v3.int8_ord; +``` + +Bare storage-only variants (`eql_v3.text`, `eql_v3.int4`, …) block every comparison - see the type pages for what each variant supports: [Numbers & dates](/reference/eql/numbers-and-dates), [Text](/reference/eql/text), [Booleans](/reference/eql/booleans). + +## `IN` lists + +`IN` desugars to `=`, so it needs the same equality-capable variants. Each list element is a separately encrypted, typed operand: + +```sql +SELECT * FROM users +WHERE email IN ($1::eql_v3.text_eq, $2::eql_v3.text_eq, $3::eql_v3.text_eq); +``` + +There is no way to encrypt a list as one value - the client encrypts each element and binds it as its own parameter. `IN (subquery)` also works, subject to the same-keyset rule covered in [Joins](/reference/eql/joins).
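
When raw SQL is assembled by hand, the typed placeholder list is the part that is easy to get wrong. The Python below is one possible sketch of building it for a variable-length `IN` list; `typed_placeholders` is a hypothetical helper, not part of EQL, the Stack SDK, or CipherStash Proxy, and it assumes PostgreSQL-style `$n` positional parameters with each encrypted element bound separately by the caller.

```python
import re

# Illustrative sketch only: `typed_placeholders` is a hypothetical helper,
# not part of EQL, the Stack SDK, or CipherStash Proxy.

# The domain name is interpolated into SQL text, so accept only plain
# eql_v3 identifiers and never pass user input here.
_DOMAIN = re.compile(r"eql_v3\.[a-z0-9_]+")


def typed_placeholders(domain, count, start=1):
    """Build "$1::domain, $2::domain, ..." for an IN list of `count` values."""
    if not _DOMAIN.fullmatch(domain):
        raise ValueError(f"not an eql_v3 domain name: {domain!r}")
    if count < 1:
        raise ValueError("an IN list needs at least one element")
    return ", ".join(f"${n}::{domain}" for n in range(start, start + count))


placeholders = typed_placeholders("eql_v3.text_eq", 3)
query = f"SELECT * FROM users WHERE email IN ({placeholders})"
print(query)
```

The `start` argument lets the list begin after earlier parameters in the same statement, for example `typed_placeholders("eql_v3.int8_ord", 2, start=4)`.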
+ +## Ranges and `BETWEEN` + +`<`, `<=`, `>`, `>=` work on `_ord` / `_ord_ore` variants and `text_search` - the variants carrying an ORE (`ob`) term: + +```sql +SELECT * FROM users WHERE salary >= $1::eql_v3.int8_ord; + +-- BETWEEN desugars to >= and <= +SELECT * FROM users +WHERE created_at BETWEEN $1::eql_v3.timestamp_ord AND $2::eql_v3.timestamp_ord; +``` + +Half-open ranges compose the same way: + +```sql +SELECT * FROM events +WHERE occurred_at >= $1::eql_v3.timestamp_ord + AND occurred_at < $2::eql_v3.timestamp_ord; +``` + +## Text token matching: `@>` + +There is no `LIKE` on encrypted columns - encrypted free-text matching is bloom-filter token containment via `@>` on a `text_match` or `text_search` column: + +```sql +SELECT * FROM users WHERE name @> $1::eql_v3.text_match; +``` + +The client encrypts the search term into a bloom-filter query value; matching is probabilistic (false positives possible, false negatives not). For the full no-`LIKE` story and match-term tuning, see [Text](/reference/eql/text). + +## JSON containment and path filters + +Encrypted JSON documents (`eql_v3.json`) filter by containment and path existence: + +```sql +-- Does the document contain this (encrypted) structure? +SELECT * FROM orders WHERE metadata @> $1::eql_v3.ste_vec_query; + +-- Does this path exist in the document? +SELECT * FROM orders WHERE eql_v3.jsonb_path_exists(metadata, 'region_selector'); + +-- Equality on an extracted leaf +SELECT * FROM orders +WHERE metadata -> 'email_selector'::text = $1::eql_v3.ste_vec_entry; +``` + +Field access is by selector hash, not plaintext path. The full JSON surface - containment, field access, path queries, and range filters on extracted leaves - is in [JSON](/reference/eql/json).
+ +## Combining predicates + +Encrypted predicates compose with `AND`, `OR`, `NOT`, and parentheses like any other predicate - and plaintext columns filter normally alongside encrypted ones in the same `WHERE` clause: + +```sql +SELECT * FROM users +WHERE status = 'active' -- plaintext column, native operator + AND created_at >= $1::eql_v3.timestamp_ord -- encrypted range + AND (email = $2::eql_v3.text_eq -- encrypted equality + OR name @> $3::eql_v3.text_match); -- encrypted token match +``` + +The planner treats each encrypted predicate independently, so it can combine an index on a plaintext column with a functional index on an encrypted one (bitmap-AND, or whichever plan is cheapest). + +## `IS NULL` and `IS NOT NULL` + +A SQL `NULL` column value is never encrypted - there is no payload to encrypt - so null checks work on **every** variant, including storage-only ones: + +```sql +SELECT * FROM users WHERE tax_id IS NULL; +SELECT * FROM users WHERE tax_id IS NOT NULL; +``` + +Don't confuse this with a JSON `null` *inside* an encrypted document, which is an encrypted value like any other - see [JSON](/reference/eql/json). + +## Shape summary + +| Filter shape | Operators | Works on | Index | +| --- | --- | --- | --- | +| Equality | `=` `<>` `IN` | `_eq`, `_ord` / `_ord_ore`, `text_search` | hash (or btree) on `eql_v3.eq_term` - btree on `eql_v3.ord_term` for `_ord` | +| Range | `<` `<=` `>` `>=` `BETWEEN` | `_ord` / `_ord_ore`, `text_search` | btree on `eql_v3.ord_term` | +| Text token match | `@>` `<@` | `text_match`, `text_search` | GIN on `eql_v3.match_term` | +| JSON containment | `@>` `<@` | `eql_v3.json` | GIN on `eql_v3.to_ste_vec_query(col)::jsonb` | +| Null check | `IS NULL` / `IS NOT NULL` | every variant | - | + +Every one of these has a full index recipe - which method, which extractor, and how to confirm the index engages with `EXPLAIN` - in [Indexes](/reference/eql/indexes).
diff --git a/content/docs/reference/eql/functions.mdx b/content/docs/reference/eql/functions.mdx deleted file mode 100644 index 210ca31..0000000 --- a/content/docs/reference/eql/functions.mdx +++ /dev/null @@ -1,112 +0,0 @@ ---- -title: Functions -description: "The eql_v3 function surface: comparison functions, index-term extractors, MIN/MAX aggregates, JSON functions, and version reporting." -type: reference -components: [eql] -verifiedAgainst: - eql: "3.0.0" ---- - -Everything EQL exposes lives in the `eql_v3` schema. Most functions are generated per [domain variant](/reference/eql/types), so PostgreSQL's overload resolution picks the right implementation from the argument type. As with operators, arguments must be typed - see [the typed-operand rule](/reference/eql/operators). - -## Comparison functions - -Function forms of the comparison operators, for platforms that disallow custom operators. Each is generated per capable domain variant, with overloads accepting the domain on either side and `jsonb` on the other: - -```sql -eql_v3.eq(a, b) RETURNS boolean -- = on _eq / _ord / _ord_ore / text_search -eql_v3.neq(a, b) RETURNS boolean -- <> -eql_v3.lt(a, b) RETURNS boolean -- < on _ord / _ord_ore / text_search -eql_v3.lte(a, b) RETURNS boolean -- <= -eql_v3.gt(a, b) RETURNS boolean -- > -eql_v3.gte(a, b) RETURNS boolean -- >= -eql_v3.contains(a, b) RETURNS boolean -- @> on text_match / text_search / eql_v3.json -eql_v3.contained_by(a, b) RETURNS boolean -- <@ -``` - -```sql -SELECT * FROM users WHERE eql_v3.eq(email, $1::eql_v3.text_eq); -SELECT * FROM users WHERE eql_v3.lt(created_at, $1::eql_v3.timestamp_ord); -``` - -Calling a comparison function a variant doesn't support resolves to a blocker that raises `operator … is not supported` - the same [fail-loud behavior](/reference/eql/operators) as the operators. There are no `like` / `ilike` functions: text matching is `eql_v3.contains` on a `text_match` value.
- -## Index-term extractors - -These extract the encrypted index term from a domain value. They're generated per eq-, ord-, and match-capable variant of every scalar type, and they return the self-contained `eql_v3` index-term types: - -```sql --- Equality term (hm) -eql_v3.eq_term(a eql_v3._eq) RETURNS eql_v3.hmac_256 - --- Ordering term (ob) -eql_v3.ord_term(a eql_v3._ord) RETURNS eql_v3.ore_block_256 -eql_v3.ord_term(a eql_v3._ord_ore) RETURNS eql_v3.ore_block_256 - --- Text-match term (bf) -eql_v3.match_term(a eql_v3.text_match) RETURNS eql_v3.bloom_filter -``` - -`eql_v3.text_search` carries all three terms, so all three extractors work on it. - -The extractors exist for **indexing**: EQL indexes through a functional index on the extractor, never an operator class on the column. The extractors are inlinable, so bare-form predicates (`WHERE email = $1`) engage the index without rewriting. Sort keys are the exception - see [Range and ORDER BY](/reference/eql/indexes#range-and-order-by): - -```sql -CREATE INDEX users_email_eq ON users USING hash (eql_v3.eq_term(email)); -CREATE INDEX users_salary_ord ON users USING btree (eql_v3.ord_term(salary)); -CREATE INDEX users_name_match ON users USING gin (eql_v3.match_term(name)); -``` - -See [Indexes](/reference/eql/indexes) for the full recipes and performance guidance. - -## Aggregates: `eql_v3.min` and `eql_v3.max` - -`MIN` / `MAX` over encrypted values, defined per ord-capable variant of every scalar type. The input type selects the aggregate; the return type matches the input: - -```sql -eql_v3.min(eql_v3._ord) RETURNS eql_v3._ord -eql_v3.max(eql_v3._ord) RETURNS eql_v3._ord -eql_v3.min(eql_v3._ord_ore) RETURNS eql_v3._ord_ore -eql_v3.max(eql_v3._ord_ore) RETURNS eql_v3._ord_ore -``` - -Comparison routes through the variant's `<` / `>` operator, which uses the ORE block term - no decryption happens in the database. `NULL` inputs are skipped, and an all-`NULL` input set returns `NULL`.
- -```sql -SELECT eql_v3.min(salary) FROM users; -SELECT eql_v3.max(salary) FROM users WHERE department = 'engineering'; - --- On a generic jsonb column, cast to the right domain at the call site -SELECT eql_v3.min(salary_jsonb::eql_v3.int8_ord) FROM users; -``` - - -**`SUM`, `AVG`, and other arithmetic aggregates are not supported** on encrypted columns - they would require homomorphic encryption. `MIN` / `MAX` work because they only need comparison. For sums and averages, decrypt at the application boundary and aggregate client-side. - - -## JSON functions - -The encrypted-JSON document type `eql_v3.json` has its own function surface: - -- `eql_v3.jsonb_path_query(doc, selector)` - set-returning path query yielding encrypted entries; also `jsonb_path_query_first` and `jsonb_path_exists` -- `eql_v3.jsonb_array_length` / `jsonb_array_elements` / `jsonb_array_elements_text` - array helpers -- `eql_v3.to_ste_vec_query(doc)` - builds the GIN-indexable containment query form -- Entry-level term extractors: `eql_v3.eq_term(eql_v3.ste_vec_entry)` and `eql_v3.ore_cllw(eql_v3.ste_vec_entry)` - -These are documented with worked examples in [JSON support](/reference/eql/json). - -## `eql_v3.version()` - -Returns the installed EQL version string, baked in at build time: - -```sql -SELECT eql_v3.version(); --- '3.0.0' -``` - -The same version string is mirrored as a comment on the `eql_v3` schema, so you can read it without calling a function: - -```sql -SELECT obj_description('eql_v3'::regnamespace); --- '3.0.0' -``` diff --git a/content/docs/reference/eql/grouping-and-aggregates.mdx b/content/docs/reference/eql/grouping-and-aggregates.mdx new file mode 100644 index 0000000..544a91c --- /dev/null +++ b/content/docs/reference/eql/grouping-and-aggregates.mdx @@ -0,0 +1,104 @@ +--- +title: Grouping & aggregates +description: "GROUP BY, DISTINCT, COUNT, and eql_v3.min/max on encrypted columns - why to group on the extractor, and why SUM and AVG stay client-side."
+type: reference +components: [eql] +verifiedAgainst: + eql: "3.0.0" +--- + +Grouping and deduplication need an equality term, so they work on the same variants as `=`: `_eq`, `_ord` / `_ord_ore`, and `text_search`. `MIN` / `MAX` need an ordering term (`_ord` / `_ord_ore`, `text_search`). Arithmetic aggregates don't work at all - that's covered under "No `SUM`, no `AVG`" below. As everywhere, operands and call-site casts must be typed; see [Core concepts](/reference/eql/core-concepts). + +## `GROUP BY` and `DISTINCT` + +Both work in natural form on equality-capable variants: + +```sql +SELECT email, COUNT(*) FROM logins GROUP BY email; +SELECT DISTINCT email FROM logins; +``` + +Grouping compares equality terms, so rows encrypting the same plaintext land in the same group - but the group key that comes back is ciphertext. Decrypt it in the client if you need to display it. + +## Group on the extractor + +For anything beyond small tables, group on the equality-term extractor instead of the raw column: + +```sql +SELECT eql_v3.eq_term(email) AS email_term, COUNT(*) + FROM logins + GROUP BY eql_v3.eq_term(email); +``` + +The reason is planner economics. `GROUP BY email` uses the entire encrypted payload - 1–2 KB per row - as the hash key. Postgres estimates a hash table far larger than the default `work_mem` and falls back to a disk-spilling `GroupAggregate`. The extractor key is a small deterministic term: the hash table fits in `work_mem` and the planner picks `HashAggregate` reliably. If an ORM forces the raw-column form, raising `work_mem` is the rescue knob - but the extractor form is the design. The same reasoning, from the index-tuning angle, is in [Indexes](/reference/eql/indexes). + +Note the trade-off: grouping on `eq_term` returns the *term*, not the encrypted value - fine for counting, but the term itself can't be decrypted.
If you need the group key's plaintext, join the grouped result back to the table on the term to recover a representative encrypted value, then decrypt that in the client. + +## `COUNT` and `COUNT(DISTINCT)` + +Plain `COUNT(col)` counts non-`NULL` rows - it never compares values, so it works on **any** variant, including storage-only ones: + +```sql +SELECT COUNT(tax_id) FROM users; -- works even on bare eql_v3.text +``` + +`COUNT(DISTINCT col)` deduplicates, so it needs an equality-capable variant - and the same extractor advice applies: + +```sql +SELECT COUNT(DISTINCT eql_v3.eq_term(email)) FROM logins; +``` + +## `MIN` and `MAX`: `eql_v3.min` / `eql_v3.max` + +EQL ships `min` / `max` aggregates per ord-capable variant of every scalar type. The input type selects the aggregate, and the return type matches the input: + +```sql +eql_v3.min(eql_v3._ord) RETURNS eql_v3._ord +eql_v3.max(eql_v3._ord) RETURNS eql_v3._ord +eql_v3.min(eql_v3._ord_ore) RETURNS eql_v3._ord_ore +eql_v3.max(eql_v3._ord_ore) RETURNS eql_v3._ord_ore +``` + +Comparison routes through the variant's `<` / `>` operator on the ORE term - no decryption happens in the database, and the result is an encrypted value the client decrypts. `NULL` inputs are skipped; an all-`NULL` input set returns `NULL`, matching native aggregate semantics. + +```sql +SELECT eql_v3.min(salary) FROM users; +SELECT eql_v3.max(salary) FROM users WHERE department = 'engineering'; + +-- Combined with grouping +SELECT eql_v3.eq_term(department_code) AS dept, eql_v3.max(salary) + FROM users + GROUP BY eql_v3.eq_term(department_code); +``` + +If the column is generic `jsonb` rather than a domain, cast to the right variant at the call site so overload resolution can pick the aggregate: + +```sql +SELECT eql_v3.min(salary_jsonb::eql_v3.int8_ord) FROM users; +``` + +A btree on `eql_v3.ord_term(col)` serves `MIN` / `MAX` - the [Indexes](/reference/eql/indexes) page has the recipe.
+ +## No `SUM`, no `AVG` + + +**`SUM`, `AVG`, and every other arithmetic aggregate are unsupported** on encrypted columns - they would require homomorphic encryption, which EQL does not do. `MIN` / `MAX` work because they only need *comparison*, which the ORE term provides. For sums and averages, select the rows (or `MIN`/`MAX`/`COUNT` server-side to narrow them) and aggregate client-side after decryption. + + +## Grouping on extracted JSON leaves + +Leaves inside an encrypted JSON document group the same way - extract the entry by selector, then group on its equality term: + +```sql +SELECT eql_v3.eq_term(metadata -> 'region_selector'::text) AS region, COUNT(*) + FROM orders + GROUP BY eql_v3.eq_term(metadata -> 'region_selector'::text); +``` + +`eql_v3.eq_term` reads whichever term the entry carries, so this works on every JSON node type. String and Number leaves also support `eql_v3.min` / `eql_v3.max` via their CLLW ORE term. Selectors and node capabilities are in [JSON](/reference/eql/json). + +## Where to go next + +- [Indexes](/reference/eql/indexes) - the hash/btree recipes that back these shapes, and the full `work_mem` / `HashAggregate` story. +- [Joins](/reference/eql/joins) - equality terms across tables, and the same-keyset rule. +- [Filtering](/reference/eql/filtering) - the `WHERE` shapes that feed these aggregates. diff --git a/content/docs/reference/eql/index.mdx b/content/docs/reference/eql/index.mdx index 68b95c2..e3d7f0f 100644 --- a/content/docs/reference/eql/index.mdx +++ b/content/docs/reference/eql/index.mdx @@ -11,20 +11,7 @@ Encrypt Query Language (EQL) is a set of types, operators, and functions for sto EQL itself never encrypts anything. Encryption and decryption happen in the client, using the [Stack SDK](/reference/stack) or [CipherStash Proxy](/reference/proxy).
EQL provides the database-side surface those clients query against: encrypted column types, the operators that compare them, and the term-extractor functions that make indexes work. -## The v3 model - -Every encrypted column is a `jsonb`-backed **domain type** in the `eql_v3` schema. The domain variant you choose declares the column's searchable capability: `eql_v3.text_eq` supports equality (`=` / `<>`), `eql_v3.text_match` supports encrypted text containment (`@>` / `<@`), `eql_v3.int4_ord` adds range comparisons, `ORDER BY`, and `MIN` / `MAX`. Each domain carries a `CHECK` constraint that validates the encrypted payload on insert, so a malformed or wrong-version value is rejected at write time rather than surfacing at query time. - -There is no database-side configuration table. Earlier EQL versions tracked encryption config in the database (`config_add_table`, `config_add_column`, and friends) โ€” those are gone in v3. The searchable surface of a column is fixed by the domain variant you type it as, and which index terms travel in a value's payload is decided by the encryption client. Operators that a variant doesn't support raise an "operator not supported" error rather than silently falling through to native `jsonb` semantics โ€” and `LIKE` / `ILIKE` are blocked on every encrypted column. - -```sql -CREATE TABLE users ( - id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, - email eql_v3.text_eq, -- equality only - salary eql_v3.int4_ord, -- equality + range + ORDER BY - created_at eql_v3.timestamp_ord -); -``` +Every encrypted column is a `jsonb`-backed domain type in the `eql_v3` schema, and the domain variant you choose declares what the column can do โ€” the full model is in [Core concepts](/reference/eql/core-concepts). ## Install @@ -113,25 +100,42 @@ Schema changes โ€” adding or removing encrypted columns โ€” always go through th EQL v3 is designed to install without superuser. 
There are no custom operator classes (which managed platforms typically block), no `postgresql.conf` changes, and no separate Supabase build โ€” the single install script is the same artefact everywhere. Indexing works through ordinary functional indexes over EQL's term-extractor functions, which any user who can `CREATE INDEX` can build. See the [Supabase integration](/integrations/supabase) for platform-specific setup. -## In this section +## Understand - - The encrypted domain type families and the capability each variant carries. + + Domain variants, the encrypted payload, typed operands, and fail-loud blockers โ€” the model every other page assumes. - - Which SQL operators resolve on which variant, and what raises. + + Encrypted integers, floats, numerics, dates, and timestamps. - - Functional-index recipes for equality, range, and text match. + + Encrypted text: equality, ordering, and free-text token matching โ€” and why there is no `LIKE`. Encrypted JSON documents: containment, field access, and GIN indexing. - - The function equivalents of every operator, extractors, and aggregates. + + Storage-only by design: why encrypted booleans carry no index terms. + + + Functional-index recipes over the term extractors, and what it takes for an index to engage. + + + +## Use + + + + `WHERE` clauses on encrypted columns: equality, ranges, and text containment. + + + `ORDER BY` on encrypted columns, and how to keep the sort in the index. + + + `GROUP BY`, `DISTINCT`, `COUNT`, and the `MIN` / `MAX` aggregates. - - The encrypted payload envelope and index terms. + + Equijoins on encrypted columns and the same-keyset rule. diff --git a/content/docs/reference/eql/indexes.mdx b/content/docs/reference/eql/indexes.mdx index 83f354e..3d1df3e 100644 --- a/content/docs/reference/eql/indexes.mdx +++ b/content/docs/reference/eql/indexes.mdx @@ -29,7 +29,7 @@ EQL v3 deliberately ships no operator class for encrypted columns. 
Operators res ## Index recipes -Type the column as the domain variant that carries the term ([Types](/reference/eql/types)), then index the matching extractor: +Type the column as the domain variant that carries the term (see [Core concepts](/reference/eql/core-concepts) for the variant model, and the per-type pages for specifics), then index the matching extractor: ```sql -- Equality: hash index on eq_term @@ -65,8 +65,8 @@ All three must hold: ```sql -- โœ“ resolves the encrypted operator โ†’ uses the index -WHERE email = $1; WHERE email = $1::eql_v3.text_eq; +WHERE email = $1; -- only when the client (Stack SDK / Proxy) binds $1 typed -- โœ— falls through to native jsonb semantics WHERE email = '{"hm":"abc"}'::jsonb; diff --git a/content/docs/reference/eql/joins.mdx b/content/docs/reference/eql/joins.mdx new file mode 100644 index 0000000..fdcc0e4 --- /dev/null +++ b/content/docs/reference/eql/joins.mdx @@ -0,0 +1,112 @@ +--- +title: Joins +description: "Equijoins on encrypted columns: the same-keyset and matching-variant constraint, IN (subquery) and set operations, a worked example, and how to diagnose a join that returns nothing." +type: reference +components: [eql] +verifiedAgainst: + eql: "3.0.0" +--- + +Equijoins work on equality-capable variants (`_eq`, `_ord` / `_ord_ore`, `text_search`) โ€” the join condition is just encrypted equality. But there is one constraint that has no plaintext equivalent, and it is the single thing to internalize on this page: + + +**Both sides of the join must be encrypted with the same keyset and typed as a matching variant.** Encrypted equality compares deterministic index terms, and those terms are derived from the encryption keys. Two columns encrypted under different keysets produce different terms for the *same plaintext* โ€” their terms can **never** match, and the join returns no rows. This is not an error the database can detect: the query is valid, the plan is fine, the result is simply empty. 
+
+
+"Matching variant" means both sides compare the same term kind: `_eq` with `_eq` (or `text_search`, which carries an `hm` term too) compares HMAC terms; `_ord` with `_ord` compares ORE terms. An `_eq` column can't join an `_ord` column: one side has no `hm`, the other no `ob`, and the equality operator between mismatched variants doesn't resolve. See [Core concepts](/reference/eql/core-concepts) for the term model.
+
+## Equijoin
+
+```sql
+SELECT u.*, o.total
+FROM users u
+JOIN orders o ON u.email = o.customer_email;
+-- both columns eql_v3.text_eq, encrypted with the same keyset
+```
+
+No typed-operand cast is needed here: both operands are encrypted columns, so their domain types resolve the encrypted operator directly. All join types (`INNER`, `LEFT`, `RIGHT`, `FULL`) work; `LEFT JOIN` null-extension behaves normally because SQL `NULL`s are not encrypted.
+
+Index both sides for anything beyond small tables, with a hash (or btree) index on `eql_v3.eq_term(col)` on each column. Recipes are in [Indexes](/reference/eql/indexes).
+
+## `IN (subquery)` and set operations
+
+Both follow the same rule, because both compare equality terms across two column sources:
+
+```sql
+-- IN (subquery): users.email and orders.customer_email must share a keyset
+SELECT * FROM users
+WHERE email IN (SELECT customer_email FROM orders WHERE flagged);
+
+-- Set-operation dedup: UNION / INTERSECT / EXCEPT dedupe by equality term
+SELECT email FROM users
+UNION
+SELECT customer_email FROM orders;
+```
+
+If the two columns are under different keysets, `IN (subquery)` matches nothing, `INTERSECT` is empty, `EXCEPT` returns everything, and `UNION` never merges duplicates, all silently.
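+
+Because none of these shapes raise, a quick overlap probe can help before relying on them. The sketch below is only an illustration, assuming the `users.email` and `orders.customer_email` columns from the snippet above. A result of `false` is a hint rather than proof, since the two tables may simply share no values:
+
+```sql
+-- Does at least one equality term appear in both columns?
+-- This compares the extracted terms directly, not the encrypted columns.
+SELECT EXISTS (
+  SELECT 1
+  FROM users u
+  JOIN orders o
+    ON eql_v3.eq_term(u.email) = eql_v3.eq_term(o.customer_email)
+) AS any_shared_terms;
+```
+
+If this returns `false` on data you expect to overlap, the keyset and variant of both columns are the first things to check.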
+
+## Worked example
+
+Two tables sharing an encrypted customer identifier, both columns typed `eql_v3.text_eq` and encrypted by the same client configuration (same keyset):
+
+```sql
+CREATE TABLE users (
+  id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
+  email eql_v3.text_eq
+);
+
+CREATE TABLE orders (
+  id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
+  customer_email eql_v3.text_eq,
+  total BIGINT NOT NULL
+);
+
+CREATE INDEX users_email_eq ON users USING hash (eql_v3.eq_term(email));
+CREATE INDEX orders_cust_eq ON orders USING hash (eql_v3.eq_term(customer_email));
+ANALYZE users; ANALYZE orders;
+```
+
+Orders per user, filtered by an encrypted lookup on one side:
+
+```sql
+SELECT u.id, COUNT(o.id) AS order_count
+FROM users u
+LEFT JOIN orders o ON u.email = o.customer_email
+WHERE u.email = $1::eql_v3.text_eq
+GROUP BY u.id;
+```
+
+The `WHERE` engages the hash index on `users`; the join condition engages the one on `orders`. The grouping key here is a plaintext `id`, so no extractor is needed. Grouping on encrypted columns is covered in [Grouping & aggregates](/reference/eql/grouping-and-aggregates).
+
+## Anti-pattern: joining across keysets
+
+The failure mode is quiet. A join across keysets doesn't raise, doesn't warn, and produces a plan that looks healthy. The terms just never match, so it behaves exactly like a join where no rows happen to correlate:
+
+```sql
+-- users encrypted by service A's keyset, partners by service B's:
+SELECT * FROM users u JOIN partners p ON u.email = p.contact_email;
+-- 0 rows. Always. Even when the plaintext emails overlap.
+```
+
+To diagnose a join that returns fewer rows than expected (or none):
+
+1. **Check the variants.** Both columns must be equality-capable and compare the same term kind. A blocked operator raises loudly, so if the query *runs*, the variants at least resolve, but confirm they compare the same term (`hm` vs `ob`).
+2. 
**Compare terms for a known-matching pair.** Take one row from each table that you know holds the same plaintext and compare their equality terms: + + ```sql + SELECT eql_v3.eq_term(u.email) = eql_v3.eq_term(p.contact_email) AS terms_match + FROM users u, partners p + WHERE u.id = 42 AND p.id = 7; -- rows known to share a plaintext value + ``` + + `false` for plaintext-identical values means the terms were derived under different keysets (or different client configurations) โ€” no SQL will make them join. +3. **Fix it at the encryption layer.** Configure both columns under the same keyset in the [Stack SDK](/reference/stack) or [CipherStash Proxy](/reference/proxy) and re-encrypt one side. Cross-keyset correlation otherwise has to happen in the client, after decryption. + +Treat shared keysets as part of your schema design: columns you intend to join are a unit, the same way a foreign key pair is. + +## Where to go next + +- [Filtering](/reference/eql/filtering) โ€” the equality and `IN` shapes joins are built from. +- [Grouping & aggregates](/reference/eql/grouping-and-aggregates) โ€” grouping joined results on encrypted keys. +- [Indexes](/reference/eql/indexes) โ€” equality index recipes for both sides of a join. +- [Core concepts](/reference/eql/core-concepts) โ€” index terms, variants, and why determinism makes joins possible at all. diff --git a/content/docs/reference/eql/json.mdx b/content/docs/reference/eql/json.mdx index 6ef2e60..4204ca8 100644 --- a/content/docs/reference/eql/json.mdx +++ b/content/docs/reference/eql/json.mdx @@ -1,6 +1,6 @@ --- -title: Encrypted JSON -description: "Store and query encrypted JSON documents with eql_v3.json โ€” containment, field access, and path queries over ciphertext, with the native jsonb operators that don't apply blocked outright." 
+title: JSON +description: "The complete reference for encrypted JSON documents with eql_v3.json โ€” the ste_vec payload shape, containment, field access, and path queries over ciphertext, with the native jsonb operators that don't apply blocked outright." type: reference components: [eql] verifiedAgainst: @@ -21,7 +21,47 @@ Three `jsonb`-backed domains make up the encrypted JSON surface: | `eql_v3.ste_vec_entry` | A single entry from the vector: a selector, a ciphertext, and exactly one index term. This is what `->` returns. | | `eql_v3.ste_vec_query` | A containment needle: entries with selectors and index terms but **no ciphertext**. This is what you cast a `@>` operand to. | -The full wire shape of each is documented in [Payload format](/reference/eql/payload-format). +## Payload shape + +An encrypted JSON document uses a different payload shape from the scalar types: the standard envelope keys are present (`v`, `i`, plus the `k: "sv"` discriminator โ€” envelope anatomy is covered in [Core concepts](/reference/eql/core-concepts)), but there is no root ciphertext. Instead, an `sv` array carries one encrypted entry per path in the document. Each entry has: + +| Key | Contents | +| --- | --- | +| `s` | Selector โ€” a deterministic hash of the JSON path. Required; entry matching compares selectors first. | +| `c` | Ciphertext for the node at that path. | +| `hm` **or** `oc` | Exactly one, never both โ€” the domain `CHECK` enforces the exclusivity. `hm` (HMAC-256) on Boolean/`null` leaves and Object/Array roots; `oc` (CLLW ORE, backed by `eql_v3.ore_cllw`) on String/Number leaves. | +| `a` | Optional array marker โ€” `true` when the selector points at an array context. | + +The decoded `oc` value starts with a domain-tag byte (`0x00` numeric, `0x01` string) followed by the CLLW ciphertext, so numeric and string values in one column keep a consistent total order. 
Earlier payload versions split this into two fields โ€” `ocf` (fixed-width, numeric) and `ocv` (variable-width, string) โ€” which consolidated into the single `oc` key; the tag byte now carries the distinction. + +A document payload for an `eql_v3.json` column: + +```json +{ + "v": 2, + "k": "sv", + "i": { "t": "orders", "c": "metadata" }, + "sv": [ + { "s": "2517068c0d1f9d4d41d2c666211f785e", "c": "mBbKmM...", "hm": "b0e0..." }, + { "s": "f510853a4ab9d4f75f51a533ac264c5d", "c": "mBbKmQ...", "oc": "01a3f2..." }, + { "s": "33743aed3ae636f6bf05cff11ac4b519", "c": "mBbKmR...", "oc": "004e19..." } + ] +} +``` + +- First entry: an object root โ€” `hm` only, equality/containment +- Second entry: a string leaf โ€” `oc` starting with tag `01` +- Third entry: a numeric leaf โ€” `oc` starting with tag `00` + +A containment **query** payload (`eql_v3.ste_vec_query`) has the same `sv` shape but its entries carry no `c` โ€” containment matches selectors and index terms, never ciphertexts. This is the needle the client builds for a `@>` query: + +```json +{ + "sv": [ + { "s": "f510853a4ab9d4f75f51a533ac264c5d", "oc": "01a3f2..." } + ] +} +``` ## Storing encrypted JSON @@ -50,7 +90,7 @@ During encryption, the client flattens the document: each unique path gets a det | String | `oc` (CLLW ORE, string domain) | Yes | Yes | | Number | `oc` (CLLW ORE, numeric domain) | Yes | Yes | -Each entry carries exactly one of `hm` or `oc` โ€” the domain `CHECK` enforces the exclusivity. `hm` is a deterministic hash, so it supports equality only. `oc` is a CLLW ORE term that reveals ordering and, being deterministic, collapses to equality on matching selectors โ€” `eql_v3.eq_term` reads whichever term an entry carries, so equality works uniformly across all node types. Earlier payload versions split the ORE term into `ocf` (fixed-width, numeric) and `ocv` (variable-width, string); current payloads emit a single `oc` whose leading domain-tag byte carries the numeric/string distinction. 
+Each entry carries exactly one of `hm` or `oc` โ€” the domain `CHECK` enforces the exclusivity. `hm` is a deterministic hash, so it supports equality only. `oc` is a CLLW ORE term that reveals ordering and, being deterministic, collapses to equality on matching selectors โ€” `eql_v3.eq_term` reads whichever term an entry carries, so equality works uniformly across all node types. JSON `null` here means a `null` literal *inside* the document. A SQL `NULL` column value is not encrypted at all. @@ -66,7 +106,7 @@ These native PostgreSQL `jsonb` operators are **blocked** on `eql_v3.json`. They Use containment (`@>` / `<@`), field access (`->` / `->>`), or the `eql_v3.jsonb_path_*` functions instead. There is no server-side mutation of an encrypted document โ€” updates re-encrypt in the client. -**Type your operands.** `eql_v3.json` is a domain over `jsonb`, and PostgreSQL resolves `domain OP untyped_literal` to the **native** `jsonb` operator โ€” bypassing both the encrypted operator and the blockers. `WHERE doc -> 'email'` silently runs native `jsonb ->` and returns `NULL`; `WHERE doc -> 'email'::text` resolves the encrypted operator. This is the same rule as the [scalar operators](/reference/eql/operators). Queries through CipherStash Proxy always bind typed parameters, so this only bites hand-written ad-hoc SQL. +**Operands must be typed** (`doc -> 'email'::text`, not `doc -> 'email'`) โ€” an untyped operand resolves the native `jsonb` operator, bypassing both the encrypted operator and the blockers. See [Core concepts](/reference/eql/core-concepts). ## Containment: `@>` and `<@` @@ -213,16 +253,16 @@ The rows come back as ciphertext; decrypt them in the client. -## In this section +## Where to next - - The wire shape of the ste_vec envelope and its entries. + + The envelope anatomy, typed-operand rule, and fail-loud behavior shared by every EQL type. GIN containment and field-level functional index recipes. 
- - The full operator surface, including the typed-operand rule. + + WHERE-clause patterns across all encrypted types. diff --git a/content/docs/reference/eql/meta.json b/content/docs/reference/eql/meta.json index 48fe2e7..3f4469e 100644 --- a/content/docs/reference/eql/meta.json +++ b/content/docs/reference/eql/meta.json @@ -1,11 +1,18 @@ { "title": "EQL", "pages": [ - "types", - "operators", - "indexes", + "core-concepts", + "---Types---", + "numbers-and-dates", + "text", "json", - "functions", - "payload-format" + "booleans", + "---Indexes---", + "indexes", + "---Queries---", + "filtering", + "sorting", + "grouping-and-aggregates", + "joins" ] } diff --git a/content/docs/reference/eql/numbers-and-dates.mdx b/content/docs/reference/eql/numbers-and-dates.mdx new file mode 100644 index 0000000..1d01dec --- /dev/null +++ b/content/docs/reference/eql/numbers-and-dates.mdx @@ -0,0 +1,140 @@ +--- +title: Numbers & dates +description: "The complete reference for encrypted numeric and date/time columns: the int, float, numeric, date, and timestamp domain variants, the ORE-backed payload they carry, and range, ORDER BY, and MIN/MAX queries." +type: reference +components: [eql] +verifiedAgainst: + eql: "3.0.0" +--- + +Eight scalar types share one identical query surface: `int2`, `int4`, `int8`, `float4`, `float8`, `numeric`, `date`, and `timestamp`. These are the columns you filter by range, sort newest-first, and take a `MIN` / `MAX` over โ€” salaries, totals, rates, hire dates, timestamps. Everything on this page applies to all eight; only the domain name changes. + +There is no free-text matching for these types โ€” `_match` and `_search` are [text-only variants](/reference/eql/text). Boolean columns are a separate, storage-only story โ€” see [Booleans](/reference/eql/booleans). 
+ +## Variants + +Each of the eight scalar types generates the same four `jsonb`-backed domain variants: + +| Domain variant | Capability | Index term carried | +| --- | --- | --- | +| `eql_v3.` | Storage and decryption only. Every comparison operator is blocked โ€” only `IS NULL` / `IS NOT NULL` work. | none | +| `eql_v3._eq` | Equality: `=` and `<>` (plus `IN`, `GROUP BY`, `DISTINCT`, equijoins). | `hm` (HMAC-256) | +| `eql_v3._ord` | Full comparison surface: `=` `<>` `<` `<=` `>` `>=`, `BETWEEN`, `ORDER BY`, and `MIN` / `MAX`. | `ob` (ORE block) | +| `eql_v3._ord_ore` | Identical to `_ord` โ€” a twin name that documents intent. | `ob` (ORE block) | + +Declare only the capability you query on โ€” each index term class reveals different structure to an observer (see [Searchable encryption](/concepts/searchable-encryption)), and the variant model itself is covered in [Core concepts](/reference/eql/core-concepts): + +```sql +CREATE TABLE employees ( + id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY, + salary eql_v3.int8_ord, -- range queries, ORDER BY, MIN/MAX + tax_rate eql_v3.numeric_eq, -- exact lookup only + net_worth eql_v3.numeric, -- store and decrypt only, never queried + hired_on eql_v3.date_ord, + created_at eql_v3.timestamp_ord +); +``` + +## Payload + +A value for an `_ord` column carries the shared envelope keys (`v`, `i`, `c` โ€” see [Core concepts](/reference/eql/core-concepts)) plus the `ob` ordering term. Here is a payload for the `eql_v3.int8_ord` `salary` column: + +```json +{ + "v": 2, + "i": { "t": "employees", "c": "salary" }, + "c": "mBbKmsMM%bK#QQOx1yLDBHyD...", + "ob": [ + "7a1fd0c2...", "d24c9be1...", "03fa66b8...", "91b7e04d...", + "5c28aa19...", "e6f3071c...", "48d92ab5...", "0b64cf37..." + ] +} +``` + +- **`ob` is the only index term.** An `_ord` payload carries no `hm`: equality on `_ord` variants compares ORE terms, which collapse to equality โ€” see [Core concepts](/reference/eql/core-concepts). 
Only `_eq` payloads carry `hm` (a single hex HMAC-SHA-256 string) instead of `ob`.
+- **The `ob` block count varies with the plaintext width**: 8 blocks for the int scalars, 12 for `timestamp`, 14 for `numeric`; the array just carries more block strings.
+
+## Operators and functions
+
+The function forms exist for managed platforms that disallow custom operators; they take the same typed arguments and resolve identically.
+
+| SQL operator | Function form | `eql_v3.` | `_eq` | `_ord` / `_ord_ore` |
+| --- | --- | :---: | :---: | :---: |
+| `=` / `<>` | `eql_v3.eq(a, b)` / `eql_v3.neq(a, b)` | ❌ | ✅ | ✅ |
+| `<` `<=` `>` `>=` | `eql_v3.lt` / `lte` / `gt` / `gte` | ❌ | ❌ | ✅ |
+| `BETWEEN` | desugars to `>=` and `<=` | ❌ | ❌ | ✅ |
+| `IN` | desugars to `=` | ❌ | ✅ | ✅ |
+| `GROUP BY` / `DISTINCT` | none (needs an equality term) | ❌ | ✅ | ✅ |
+| `ORDER BY` | sort key: `eql_v3.ord_term(col)` | ❌ | ❌ | ✅ |
+| `MIN` / `MAX` | `eql_v3.min(col)` / `eql_v3.max(col)` | ❌ | ❌ | ✅ |
+| `IS NULL` / `IS NOT NULL` | none | ✅ | ✅ | ✅ |
+
+Blocked cells raise an `operator … is not supported` exception; they never silently return wrong rows. Operands must be typed (`$1::eql_v3.int8_ord`), or PostgreSQL resolves the native `jsonb` operator instead of the encrypted one. Both rules are covered in [Core concepts](/reference/eql/core-concepts).
+
+**`SUM`, `AVG`, and other arithmetic aggregates are not supported** on encrypted columns, because they would require homomorphic encryption. `MIN` / `MAX` work because they only need comparison; for sums and averages, decrypt at the application boundary and aggregate client-side.
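+
+A minimal sketch of the server-side half of that pattern, assuming the `employees` table from the Variants section. The summing or averaging itself happens in the application after the client decrypts each returned value, which SQL cannot express here:
+
+```sql
+-- Narrow the rows with supported operators, then return the encrypted values.
+-- SUM / AVG are computed by the application once the client has decrypted them.
+SELECT id, salary
+FROM employees
+WHERE salary BETWEEN $1::eql_v3.int8_ord AND $2::eql_v3.int8_ord;
+```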
+ +## Example queries + +### Range filter + +```sql +SELECT * FROM employees +WHERE salary >= $1::eql_v3.int8_ord; + +SELECT * FROM employees +WHERE salary BETWEEN $1::eql_v3.int8_ord AND $2::eql_v3.int8_ord; +``` + +### Date window + +`BETWEEN` works the same on `date` and `timestamp` columns: + +```sql +SELECT * FROM employees +WHERE hired_on BETWEEN $1::eql_v3.date_ord AND $2::eql_v3.date_ord; +``` + +### Newest-first listing + +Bare `ORDER BY created_at` sorts correctly, but the planner doesn't rewrite sort keys, so it adds a `Sort` node even when a btree index exists. Write the sort key in extractor form to stream rows out of the index already ordered โ€” at large row counts this is the difference between seconds and milliseconds (see [Sorting](/reference/eql/sorting)): + +```sql +SELECT * FROM employees +WHERE created_at >= $1::eql_v3.timestamp_ord +ORDER BY eql_v3.ord_term(created_at) DESC +LIMIT 10; +``` + +### MIN and MAX + +`eql_v3.min` / `eql_v3.max` compare ORE terms โ€” no decryption happens in the database, and the encrypted result decrypts in the client. `NULL` inputs are skipped; an all-`NULL` input set returns `NULL`: + +```sql +SELECT eql_v3.min(salary) FROM employees; +SELECT eql_v3.max(created_at) FROM employees; +``` + +### Cast at the call site + +On a generic `jsonb` column whose payloads already carry the `ob` term, cast to the right domain in the query: + +```sql +SELECT eql_v3.min(salary_jsonb::eql_v3.int8_ord) FROM employees; +``` + +## Where to next + + + + Btree recipes on `eql_v3.ord_term` for range, ORDER BY, and MIN/MAX. + + + WHERE-clause patterns across all encrypted types. + + + Why the extractor-form sort key matters, and how to verify with EXPLAIN. + + + GROUP BY, DISTINCT, and the aggregate surface on encrypted columns. 
+ + diff --git a/content/docs/reference/eql/operators.mdx b/content/docs/reference/eql/operators.mdx deleted file mode 100644 index 60a5fff..0000000 --- a/content/docs/reference/eql/operators.mdx +++ /dev/null @@ -1,153 +0,0 @@ ---- -title: Operators -description: "Which SQL operators work on each eql_v3 encrypted-domain variant, how unsupported operators fail, and why operands must be typed." -type: reference -components: [eql] -verifiedAgainst: - eql: "3.0.0" ---- - -EQL overloads standard PostgreSQL operators on the [encrypted-domain types](/reference/eql/types). Type the column as the variant that carries the right index term and the operator resolves โ€” and engages a matching [functional index](/reference/eql/indexes). - - -**Operands must be typed.** The `eql_v3` domains are backed by `jsonb`. When an operand has no known type โ€” a bare string literal, an untyped parameter โ€” PostgreSQL reduces the domain to its `jsonb` base type and resolves the **native `jsonb` operator** instead of the encrypted one. The query doesn't fail; it silently returns native `jsonb` semantics, which are meaningless for encrypted payloads. - -Always type the operand: a typed parameter (`$1::eql_v3.text_eq`) or an explicit cast (`'โ€ฆ'::eql_v3.int4_ord`). The [Stack SDK](/reference/stack) and [CipherStash Proxy](/reference/proxy) type bound parameters automatically โ€” raw SQL must do it by hand. - - -## Operator support by variant - -A โœ… means the operator resolves on a column typed as that variant. A โŒ means it is blocked โ€” it raises, it does not return wrong rows. 
- -| SQL operator | Meaning | `eql_v3.` | `_eq` | `_ord` / `_ord_ore` | `text_match` | `text_search` | -| --- | --- | :---: | :---: | :---: | :---: | :---: | -| `=` | Equality | โŒ | โœ… | โœ… | โŒ | โœ… | -| `<>` / `!=` | Inequality | โŒ | โœ… | โœ… | โŒ | โœ… | -| `<` `<=` `>` `>=` | Ordered comparison | โŒ | โŒ | โœ… | โŒ | โœ… | -| `@>` / `<@` | Bloom-filter token containment | โŒ | โŒ | โŒ | โœ… | โœ… | -| `LIKE` / `ILIKE` (`~~` / `~~*`) | SQL pattern match | โŒ | โŒ | โŒ | โŒ | โŒ | -| `IS NULL` / `IS NOT NULL` | Null check | โœ… | โœ… | โœ… | โœ… | โœ… | - -A SQL `NULL` column value is not encrypted, so `IS NULL` / `IS NOT NULL` always work regardless of variant. - -## There is no `LIKE` - -`LIKE` and `ILIKE` (`~~` / `~~*`) raise on **every** encrypted-domain variant. SQL pattern matching is meaningless on ciphertext. Encrypted text matching is bloom-filter token containment โ€” `@>` on a `text_match` or `text_search` column: - -```sql --- โŒ Raises: operator not supported -SELECT * FROM users WHERE email LIKE '%alice%'; - --- โœ… Encrypted free-text match -SELECT * FROM users WHERE email @> $1::eql_v3.text_match; -``` - -`@>` / `<@` here is **probabilistic ngram-bloom containment** โ€” it tests whether the encrypted text contains the (encrypted) search terms. It is not JSONB containment and not `LIKE`. The client encrypts the search term into a bloom-filter query value; false positives are possible, false negatives are not. - -## Unsupported operators fail loudly - -Unsupported operators are not silent no-ops. Every operator that a variant doesn't support is still *defined* โ€” it routes to a blocker function that raises an `operator โ€ฆ is not supported` exception. 
A mis-typed query fails loudly instead of silently returning wrong results: - -```sql --- salary is eql_v3.int8_eq (equality only) -SELECT * FROM users WHERE salary > $1::eql_v3.int8_eq; --- ERROR: operator > is not supported for eql_v3.int8_eq -``` - -A `NULL` operand still raises โ€” the blockers are deliberately not `STRICT`, so PostgreSQL can't skip the check. - -## Query shapes - -### Equality: `=` and `<>` - -Works on `_eq`, `_ord` / `_ord_ore`, and `text_search`. On `_eq` and `text_search`, equality compares the HMAC (`hm`) term; on `_ord` variants it compares the ORE (`ob`) term, which collapses to equality โ€” so `_ord` columns get equality without carrying an `hm` term: - -```sql -SELECT * FROM users WHERE email = $1::eql_v3.text_eq; -SELECT * FROM users WHERE email <> $1::eql_v3.text_eq; -``` - -### Comparison, `BETWEEN`, and `ORDER BY` - -Works on `_ord` / `_ord_ore` and `text_search` (variants carrying an `ob` ORE term): - -```sql -SELECT * FROM users WHERE salary >= $1::eql_v3.int8_ord; - --- BETWEEN desugars to >= and <= -SELECT * FROM users -WHERE created_at BETWEEN $1::eql_v3.timestamp_ord AND $2::eql_v3.timestamp_ord; - --- ORDER BY is meaningful only with an ORE term -SELECT * FROM users ORDER BY salary DESC; -``` - -`ORDER BY` on a variant without an `ob` term won't produce a meaningful order โ€” type the column as an `_ord` variant when ordering matters. - -Bare `ORDER BY col` sorts correctly, but the planner doesn't rewrite sort keys, so it adds a `Sort` node even when a btree index exists. To stream rows out of the index already ordered, write the sort key in extractor form (`ORDER BY eql_v3.ord_term(col)`) โ€” see [Range and ORDER BY](/reference/eql/indexes#range-and-order-by). 
-
-### Text containment: `@>` and `<@`
-
-Works on `text_match` and `text_search` only:
-
-```sql
-SELECT * FROM users WHERE email @> $1::eql_v3.text_match;
-```
-
-### `IN`
-
-Desugars to `=`, so it needs an equality-capable variant (`_eq`, `_ord`, `text_search`):
-
-```sql
-SELECT * FROM users
-WHERE email IN ($1::eql_v3.text_eq, $2::eql_v3.text_eq);
-```
-
-### `GROUP BY` and `DISTINCT`
-
-Need an equality term (`_eq`, `_ord`, `text_search`):
-
-```sql
-SELECT email, COUNT(*) FROM logins GROUP BY email;
-SELECT DISTINCT email FROM logins;
-```
-
-Plain `COUNT(col)` needs no term and works on any variant; `COUNT(DISTINCT col)` needs an equality term.
-
-### Joins
-
-Equijoins work on equality-capable variants, with one extra constraint: **both sides must have been encrypted with the same keyset and typed as a matching variant** - otherwise the equality terms can never match:
-
-```sql
-SELECT u.*, o.total
-FROM users u
-JOIN orders o ON u.email = o.customer_email; -- both eql_v3.text_eq, same keyset
-```
-
-The same rule applies to `IN (subquery)` and set-operation deduplication.
-
-## Function-form equivalents
-
-Some managed platforms disallow custom operators. Every operator has a function form, generated per domain variant, taking the same domain types:
-
-| Function | Operator | Available on |
-| --- | --- | --- |
-| `eql_v3.eq(a, b)` | `=` | `_eq`, `_ord` / `_ord_ore`, `text_search` |
-| `eql_v3.neq(a, b)` | `<>` | `_eq`, `_ord` / `_ord_ore`, `text_search` |
-| `eql_v3.lt(a, b)` | `<` | `_ord` / `_ord_ore`, `text_search` |
-| `eql_v3.lte(a, b)` | `<=` | `_ord` / `_ord_ore`, `text_search` |
-| `eql_v3.gt(a, b)` | `>` | `_ord` / `_ord_ore`, `text_search` |
-| `eql_v3.gte(a, b)` | `>=` | `_ord` / `_ord_ore`, `text_search` |
-| `eql_v3.contains(a, b)` | `@>` | `text_match`, `text_search`, `eql_v3.json` |
-| `eql_v3.contained_by(a, b)` | `<@` | `text_match`, `text_search`, `eql_v3.json` |
-
-```sql
-SELECT * FROM users WHERE eql_v3.eq(email, $1::eql_v3.text_eq);
-SELECT * FROM users WHERE eql_v3.lt(created_at, $1::eql_v3.timestamp_ord);
-```
-
-There are no `like` / `ilike` function forms - text matching is `eql_v3.contains` on a `text_match` value. See [Functions](/reference/eql/functions) for the full function surface, including `MIN` / `MAX`.
-
-## JSON operators
-
-`eql_v3.json` has its own operator surface - document containment (`@>` / `<@`), field access (`->` / `->>`), and comparisons on extracted leaves - and its own set of blocked native JSONB operators. See [JSON support](/reference/eql/json).
diff --git a/content/docs/reference/eql/payload-format.mdx b/content/docs/reference/eql/payload-format.mdx
deleted file mode 100644
index 24af439..0000000
--- a/content/docs/reference/eql/payload-format.mdx
+++ /dev/null
@@ -1,123 +0,0 @@
----
-title: Payload format
-description: "The wire format of every EQL encrypted value: the v/i/c envelope, the index-term keys, and the ste_vec document shape."
-type: reference
-components: [eql]
-verifiedAgainst:
-  eql: "3.0.0"
----
-
-Every EQL encrypted value is a `jsonb` payload with a shared envelope plus the index terms that make it queryable. This page defines that wire format. Earlier CipherStash docs called this format the **CipherCell** - this page is the current definition of the same structure.
-
-Payloads are produced by the encryption clients - the [Stack SDK](/reference/stack) and [CipherStash Proxy](/reference/proxy) - and consumed by EQL's operators and functions inside Postgres. EQL never sees plaintext: it validates, stores, and compares these payloads; it cannot produce or decrypt them.
-
-## The envelope
-
-Every payload carries three envelope keys. Each `eql_v3` domain's `CHECK` constraint requires them, so a value missing any of these is rejected at write time:
-
-| Key | Contents | Notes |
-| --- | --- | --- |
-| `v` | Payload version | Always exactly `2` on the wire. The domain `CHECK`s assert it and raise on any other value. |
-| `i` | Ident: `{"t": "", "c": ""}` | Binds the ciphertext to the table and column it was encrypted for. Both keys required. |
-| `c` | Ciphertext | The opaque, non-deterministic encrypted blob (mp_base85-encoded). Never used in comparisons. |
-
-
-`eql_v3` names the **SQL schema generation**, not the payload version. The JSON envelope version is still `v: 2` - the wire field names are unchanged from EQL v2, and the domain `CHECK`s assert `v = 2`.
-
-
-A `k` discriminator (`"ct"` for a scalar ciphertext, `"sv"` for a JSON document) also appears on payloads emitted by the clients, distinguishing the two top-level shapes.
-
-## Index-term keys
-
-Alongside the envelope, a payload carries the index terms for its column's capability. On the wire, a payload is discriminated by *which term key is present* - the SQL domain name carries the rest. Each key is backed by a SEM (searchable encrypted metadata) type in the `eql_v3` schema:
-
-| Key | SEM type | Wire shape | Enables | Reveals |
-| --- | --- | --- | --- | --- |
-| `hm` | `eql_v3.hmac_256` (domain over `text`) | Hex string (HMAC-SHA-256) | `=`, `<>` on `_eq` and `text_search` domains | Whether two values are equal - nothing else |
-| `ob` | `eql_v3.ore_block_256` (composite: array of `bytea` block terms) | Array of hex-encoded ORE blocks | `<`, `<=`, `>`, `>=`, `ORDER BY` on `_ord` / `_ord_ore` domains - and `=` / `<>`, since ORE comparison collapses to equality | The relative order of two values |
-| `bf` | `eql_v3.bloom_filter` (domain over `smallint[]`) | Array of set bit positions (**signed** 16-bit) | `@>` / `<@` token containment on `_match` domains | Probabilistic token overlap between values |
-
-Notes on the wire shapes:
-
-- **`ob` block count is width-agnostic**: 8 blocks for the int scalars, 12 for timestamp, 14 for numeric - the array just carries more block strings.
-- **`bf` positions are signed**: EQL stores the filter as PostgreSQL `smallint[]`, and filters sized above 32768 emit upper-half bit positions as *negative* signed values. Consumers must use a signed 16-bit integer type.
-
-The capability is encoded as **required keys**: the payload for an `eql_v3.text_eq` column must carry `hm`; an `eql_v3.int4_ord` payload must carry `ob` (and only `ob` - equality on `_ord` domains compares ORE terms, so no `hm` is needed); a `text_match` payload must carry `bf`; a `text_search` payload carries all three. A payload missing its term key fails the domain `CHECK` - and fails to deserialize in the client bindings. See [Types](/reference/eql/types) for the domain-to-capability mapping, and [Searchable encryption](/concepts/searchable-encryption) for what these terms do and don't leak.
-
-## JSON documents: the `sv` vector
-
-An [encrypted JSON document](/reference/eql/json) uses a different payload shape: no root ciphertext, and an `sv` array with one encrypted entry per path in the document. Each entry carries:
-
-| Key | Contents |
-| --- | --- |
-| `s` | Selector - a deterministic hash of the JSON path. Required; entry matching compares selectors first. |
-| `c` | Ciphertext for the node at that path. |
-| `hm` **or** `oc` | Exactly one, never both - the domain `CHECK` enforces the exclusivity. `hm` (HMAC-256) on Boolean/`null` leaves and Object/Array roots; `oc` (CLLW ORE, backed by `eql_v3.ore_cllw`) on String/Number leaves. |
-| `a` | Optional array marker - `true` when the selector points at an array context. |
-
-The decoded `oc` value starts with a domain-tag byte (`0x00` numeric, `0x01` string) followed by the CLLW ciphertext, so numeric and string values in one column keep a consistent total order. Earlier payload versions split this into two fields - `ocf` (fixed-width, numeric) and `ocv` (variable-width, string) - which consolidated into the single `oc` key; the tag byte now carries the distinction.
-
-A containment **query** payload (`eql_v3.ste_vec_query`) has the same `sv` shape but its entries carry no `c` - containment matches selectors and index terms, never ciphertexts.
-
-## Example payloads
-
-A scalar payload for an `eql_v3.text_search` column (lookup + ordering + free-text match, so all three terms are required):
-
-```json
-{
-  "v": 2,
-  "i": { "t": "users", "c": "email" },
-  "c": "mBbKmsMM%bK#QQOx1yLDBHyD...",
-  "hm": "9c8ec1d2f9932b979b1bf3f09f8a4e2f6a41f8de2f0c8b7a52e1f5c3d4b6a790",
-  "ob": ["7a1fd0c2...", "d24c9be1...", "03fa66b8..."],
-  "bf": [42, 1290, -8113, 30201]
-}
-```
-
-- `v`, `i`, `c` - the envelope
-- `hm` - equality term: `WHERE email = $1` compares this
-- `ob` - ordering term: `ORDER BY` and range comparisons walk these blocks
-- `bf` - bloom-filter term: `@>` token containment tests these bit positions
-
-A JSON document payload for an `eql_v3.json` column:
-
-```json
-{
-  "v": 2,
-  "k": "sv",
-  "i": { "t": "orders", "c": "metadata" },
-  "sv": [
-    { "s": "2517068c0d1f9d4d41d2c666211f785e", "c": "mBbKmM...", "hm": "b0e0..." },
-    { "s": "f510853a4ab9d4f75f51a533ac264c5d", "c": "mBbKmQ...", "oc": "01a3f2..." },
-    { "s": "33743aed3ae636f6bf05cff11ac4b519", "c": "mBbKmR...", "oc": "004e19..." }
-  ]
-}
-```
-
-- First entry: an object root - `hm` only, equality/containment
-- Second entry: a string leaf - `oc` starting with tag `01`
-- Third entry: a numeric leaf - `oc` starting with tag `00`
-
-And the containment needle the client builds for a `@>` query - index terms, no ciphertexts:
-
-```json
-{
-  "sv": [
-    { "s": "f510853a4ab9d4f75f51a533ac264c5d", "oc": "01a3f2..." }
-  ]
-}
-```
-
-## Machine-readable schemas
-
-The [EQL repository](https://github.com/cipherstash/encrypt-query-language) publishes the format as JSON Schema in two places:
-
-- **`crates/eql-bindings/schema/`** - one schema per scalar domain (`$id`s under `https://schemas.cipherstash.com/eql/v3/`), generated from the canonical Rust wire types in the `eql-bindings` crate. TypeScript bindings are generated from the same definitions, so every producer and consumer shares one source of truth.
-- **`docs/reference/schema/`** - full-payload schemas covering both the scalar and `sv` document shapes. These files are currently named for the v2.x payload releases (`eql-payload-v2.2.schema.json`, `eql-payload-v2.3.schema.json`) and reference `eql_v2` function names, even though the current SQL surface is `eql_v3` - the v2.3 schema is the applicable document-shape definition, matching the still-`v: 2` envelope.
-
-## Who produces and consumes this
-
-- **Produce:** the Stack SDK and CipherStash Proxy encrypt plaintext into these payloads - ciphertext, index terms, selectors - using keys the database never holds.
-- **Consume:** EQL's domain `CHECK`s validate the shape on write, and its operators and extractor functions ([Operators](/reference/eql/operators), [Indexes](/reference/eql/indexes)) compare the term keys at query time.
-
-The division is strict: EQL never sees plaintext, and the clients never rely on the database for key material.
diff --git a/content/docs/reference/eql/sorting.mdx b/content/docs/reference/eql/sorting.mdx
new file mode 100644
index 0000000..b684f3e
--- /dev/null
+++ b/content/docs/reference/eql/sorting.mdx
@@ -0,0 +1,96 @@
+---
+title: Sorting
+description: "ORDER BY on encrypted columns: which variants sort, when to write the sort key in extractor form, keyset pagination, and the ::jsonb projection trap."
+type: reference
+components: [eql]
+verifiedAgainst:
+  eql: "3.0.0"
+---
+
+`ORDER BY` on an encrypted column needs an ORE ordering term: it works on `_ord` / `_ord_ore` variants of every scalar and on `text_search`. ORE terms are order-preserving, so the database sorts ciphertext in exactly the order the plaintext would sort - without decrypting anything. Which variants carry the term is covered in [Numbers & dates](/reference/eql/numbers-and-dates) and [Text](/reference/eql/text); the variant model itself is in [Core concepts](/reference/eql/core-concepts).
+
+Sorting a variant *without* an ORE term (`_eq`, `text_match`, bare storage variants) won't raise - but the order is meaningless. Type the column as an `_ord` variant when ordering matters.
+
+## Bare form vs extractor form
+
+Both of these sort correctly:
+
+```sql
+-- Bare form
+SELECT * FROM users ORDER BY created_at DESC;
+
+-- Extractor form
+SELECT * FROM users ORDER BY eql_v3.ord_term(created_at) DESC;
+```
+
+The difference is the plan. The planner inlines encrypted operators in *predicates*, so a `WHERE created_at < $1` matches a btree on `eql_v3.ord_term(created_at)` without rewriting - but it does **not** rewrite *sort keys*. Bare `ORDER BY created_at` therefore adds a `Sort` node above the scan, and that sort's cost scales linearly with the rows passing the filter.
+
+Writing the sort key in extractor form makes it textually match the index expression, so rows stream out of the btree already ordered - no `Sort` node at all:
+
+```sql
+CREATE INDEX users_created_at_ord
+  ON users USING btree (eql_v3.ord_term(created_at));
+ANALYZE users;
+
+SELECT * FROM users
+  WHERE created_at < $1::eql_v3.timestamp_ord
+  ORDER BY eql_v3.ord_term(created_at) DESC
+  LIMIT 10;
+-- Index Scan Backward using users_created_at_ord - no Sort node
+```
+
+At large row counts this is the difference between seconds and milliseconds, and it matters most for `LIMIT` queries: with a `Sort` node, Postgres must sort *every* matching row before it can return the top 10; streaming from the index, it stops after 10.
+
+Rule of thumb: bare form is fine for small result sets or when no ordering index exists; any hot query with `ORDER BY ... LIMIT` should use the extractor form. Confirm with `EXPLAIN (COSTS OFF)` - a `Sort` node above an `Index Scan` means the sort key didn't match the index. Full plan-reading guidance is in [Indexes](/reference/eql/indexes).
+
+## `ASC`, `DESC`, and `NULLS`
+
+`ASC` / `DESC` behave normally - a btree serves both directions (backward scans handle `DESC`). SQL `NULL` column values are not encrypted, so `NULLS FIRST` / `NULLS LAST` also behave normally:
+
+```sql
+SELECT * FROM users
+ORDER BY eql_v3.ord_term(last_login) DESC NULLS LAST;
+```
+
+## Keyset pagination
+
+`OFFSET` pagination degrades on encrypted columns the same way it does on plaintext ones - every page re-sorts and discards the rows before the offset. Keyset (cursor) pagination composes an encrypted range filter with an extractor-form sort:
+
+```sql
+-- Page 1
+SELECT id, email, created_at FROM users
+  ORDER BY eql_v3.ord_term(created_at) DESC
+  LIMIT 20;
+
+-- Next page: pass the last row's created_at back, re-encrypted as the cursor
+SELECT id, email, created_at FROM users
+  WHERE created_at < $1::eql_v3.timestamp_ord
+  ORDER BY eql_v3.ord_term(created_at) DESC
+  LIMIT 20;
+```
+
+Both the filter and the sort ride the same btree on `eql_v3.ord_term(created_at)`, so every page is an index scan that stops after 20 rows. The client re-encrypts the cursor value for the next request - the database only ever sees ciphertext.
+
+## The `::jsonb` projection trap
+
+
+If you project the column with a cast and sort on it - `SELECT col::jsonb ... ORDER BY col` - Postgres folds the cast into the scan and uses `(col)::jsonb` as the sort key, which matches no index. Project the column raw and let the client decode it, or write the sort key as `eql_v3.ord_term(col)`, which sidesteps the problem entirely.
+
+
+## Sorting extracted JSON leaves
+
+String and Number leaves inside an encrypted JSON document carry a CLLW ORE term, so they sort too - the extractor is `eql_v3.ore_cllw` on the extracted entry:
+
+```sql
+SELECT * FROM orders
+ORDER BY eql_v3.ore_cllw(metadata -> 'total_selector'::text) DESC
+LIMIT 10;
+```
+
+A btree on the same `eql_v3.ore_cllw(...)` expression streams this ordered, exactly like `ord_term` on a scalar column. Selectors, node types, and which leaves are orderable are covered in [JSON](/reference/eql/json).
+
+## Where to go next
+
+- [Indexes](/reference/eql/indexes) - the btree recipe behind every sort on this page, plus `EXPLAIN` verification and large-table build guidance.
+- [Filtering](/reference/eql/filtering) - the range predicates that pair with these sorts.
+- [Grouping & aggregates](/reference/eql/grouping-and-aggregates) - `MIN` / `MAX`, which use the same ordering term.
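The keyset pattern described in the Sorting page can also be viewed from the client side. The following is a minimal JavaScript sketch, assuming the caller already holds the last row's `created_at` re-encrypted as an `eql_v3.timestamp_ord` payload. `keysetPageQuery` is a hypothetical helper name, not part of EQL or the Stack SDK; it only assembles the SQL text from the page's own example, and passing the payload as a JSON string parameter is an assumption about the driver.

```javascript
// Hypothetical helper (not an EQL or SDK API): assembles the keyset page query
// from the Sorting page's example. `encryptedCursor` is assumed to be an
// already-encrypted payload object, or null when requesting the first page.
function keysetPageQuery(encryptedCursor, pageSize = 20) {
  const lines = ["SELECT id, email, created_at FROM users"];
  const values = [];
  if (encryptedCursor !== null) {
    // The cursor stays ciphertext; it is only serialized, never inspected.
    values.push(JSON.stringify(encryptedCursor));
    lines.push(`  WHERE created_at < $${values.length}::eql_v3.timestamp_ord`);
  }
  // Extractor-form sort key, so it can match a btree on the same expression.
  lines.push("  ORDER BY eql_v3.ord_term(created_at) DESC");
  values.push(pageSize);
  lines.push(`  LIMIT $${values.length}`);
  return { text: lines.join("\n"), values };
}
```

Keeping the sort key identical on every page is the point of the design: only the presence of the range filter changes between the first page and the later ones.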
diff --git a/content/docs/reference/eql/text.mdx b/content/docs/reference/eql/text.mdx
new file mode 100644
index 0000000..c70ee13
--- /dev/null
+++ b/content/docs/reference/eql/text.mdx
@@ -0,0 +1,157 @@
+---
+title: Text
+description: "The complete reference for encrypted text columns: all six text domain variants, the multi-term payload, why LIKE is gone everywhere, and bloom-filter token containment as the encrypted free-text match."
+type: reference
+components: [eql]
+verifiedAgainst:
+  eql: "3.0.0"
+---
+
+Text is the richest encrypted scalar. Beyond the four variants every scalar type gets, `text` adds two of its own: `text_match` for encrypted free-text matching, and `text_search` for columns you need to look up, sort, *and* search. Emails, names, tax IDs, addresses - this page is the full surface for all of them.
+
+## Variants
+
+All six are `jsonb`-backed domains. Which one you declare fixes the column's query capability - the variant model itself is covered in [Core concepts](/reference/eql/core-concepts):
+
+| Domain variant | Capability | Index terms carried |
+| --- | --- | --- |
+| `eql_v3.text` | Storage and decryption only. Every comparison operator is blocked - only `IS NULL` / `IS NOT NULL` work. | none |
+| `eql_v3.text_eq` | Equality: `=` and `<>` (plus `IN`, `GROUP BY`, `DISTINCT`, equijoins). | `hm` (HMAC-256) |
+| `eql_v3.text_ord` | Full comparison surface: `=` `<>` `<` `<=` `>` `>=`, `BETWEEN`, `ORDER BY`, `MIN` / `MAX`. | `ob` (ORE block) |
+| `eql_v3.text_ord_ore` | Identical to `text_ord` - a twin name that documents intent. | `ob` (ORE block) |
+| `eql_v3.text_match` | Encrypted free-text token containment via `@>` / `<@`. No equality, no ordering. | `bf` (bloom filter) |
+| `eql_v3.text_search` | Everything: equality, ordering, and token containment combined. | `hm` + `ob` + `bf` |
+
+Declare only the capabilities you query on - each term class reveals different structure to an observer: equality terms reveal value repetition, ORE terms reveal ordering, bloom terms reveal token overlap (see [Searchable encryption](/concepts/searchable-encryption)):
+
+```sql
+CREATE TABLE users (
+  id     bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
+  email  eql_v3.text_search,  -- lookup, sort, and free-text match
+  name   eql_v3.text_match,   -- free-text match only
+  tax_id eql_v3.text_eq,      -- exact lookup only
+  notes  eql_v3.text          -- store and decrypt only
+);
+```
+
+## Payload
+
+A value for a `text_search` column carries the shared envelope keys (`v`, `i`, `c` - see [Core concepts](/reference/eql/core-concepts)) plus all three index terms:
+
+```json
+{
+  "v": 2,
+  "i": { "t": "users", "c": "email" },
+  "c": "mBbKmsMM%bK#QQOx1yLDBHyD...",
+  "hm": "9c8ec1d2f9932b979b1bf3f09f8a4e2f6a41f8de2f0c8b7a52e1f5c3d4b6a790",
+  "ob": ["7a1fd0c2...", "d24c9be1...", "03fa66b8..."],
+  "bf": [42, 1290, -8113, 30201]
+}
+```
+
+- `hm` - equality term: `WHERE email = $1` compares this
+- `ob` - ordering term: `ORDER BY` and range comparisons walk these blocks
+- `bf` - bloom-filter term: `@>` token containment tests these bit positions
+
+The narrower variants carry only their own term: a `text_eq` payload carries `hm` only, `text_match` carries `bf` only, and `text_ord` / `text_ord_ore` carry `ob` only (no `hm` - equality on `_ord` variants compares ORE terms, see [Core concepts](/reference/eql/core-concepts)). A payload missing its variant's required term fails the domain `CHECK` at write time.
+
+**`bf` positions are signed**: EQL stores the filter as PostgreSQL `smallint[]`, and filters sized above 32768 emit upper-half bit positions as *negative* signed values. Consumers must use a signed 16-bit integer type.
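The required-term rule and the signed `bf` positions described above can be illustrated with a small consumer-side sketch in JavaScript. All names here (`REQUIRED_TERMS`, `missingKeys`, `toSignedPosition`, `toUnsignedPosition`) are hypothetical and not part of EQL or its bindings. The two's-complement wrap used for the position conversion is an assumption about how upper-half positions become negative; the source only states that they do.

```javascript
// Hypothetical sketch of a pre-flight check a payload consumer might run.
// The variant-to-term mapping follows the Variants table for text columns.
const ENVELOPE_KEYS = ["v", "i", "c"];
const REQUIRED_TERMS = {
  "eql_v3.text": [],
  "eql_v3.text_eq": ["hm"],
  "eql_v3.text_ord": ["ob"],
  "eql_v3.text_ord_ore": ["ob"],
  "eql_v3.text_match": ["bf"],
  "eql_v3.text_search": ["hm", "ob", "bf"],
};

// Returns the envelope and term keys the payload lacks for the given variant.
function missingKeys(variant, payload) {
  const terms = REQUIRED_TERMS[variant];
  if (terms === undefined) throw new Error(`unknown variant: ${variant}`);
  return ENVELOPE_KEYS.concat(terms).filter((key) => !(key in payload));
}

// Assumption: positions wrap as 16-bit two's complement, so an unsigned
// position of 32768 or more is stored as (position - 65536).
function toSignedPosition(bit) {
  return bit >= 32768 ? bit - 65536 : bit;
}

function toUnsignedPosition(stored) {
  return stored < 0 ? stored + 65536 : stored;
}
```

A check like this only mirrors what the domain `CHECK` enforces at write time; it may be useful for failing earlier on the client, not as a replacement for the database constraint.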
+
+## Operators and functions
+
+The function forms exist for managed platforms that disallow custom operators - they take the same typed arguments and resolve identically.
+
+| SQL operator | Function form | `eql_v3.text` | `text_eq` | `text_ord` / `text_ord_ore` | `text_match` | `text_search` |
+| --- | --- | :---: | :---: | :---: | :---: | :---: |
+| `=` / `<>` | `eql_v3.eq(a, b)` / `eql_v3.neq(a, b)` | ❌ | ✅ | ✅ | ❌ | ✅ |
+| `<` `<=` `>` `>=` | `eql_v3.lt` / `lte` / `gt` / `gte` | ❌ | ❌ | ✅ | ❌ | ✅ |
+| `@>` / `<@` | `eql_v3.contains(a, b)` / `eql_v3.contained_by(a, b)` | ❌ | ❌ | ❌ | ✅ | ✅ |
+| `LIKE` / `ILIKE` (`~~` / `~~*`) | none | ❌ | ❌ | ❌ | ❌ | ❌ |
+| `IN` / `GROUP BY` / `DISTINCT` | desugar to `=` / need an equality term | ❌ | ✅ | ✅ | ❌ | ✅ |
+| `ORDER BY`, `MIN` / `MAX` | `eql_v3.min(col)` / `eql_v3.max(col)` | ❌ | ❌ | ✅ | ❌ | ✅ |
+| `IS NULL` / `IS NOT NULL` | - | ✅ | ✅ | ✅ | ✅ | ✅ |
+
+Blocked cells raise an `operator … is not supported` exception - they never silently return wrong rows. Operands must be typed (`$1::eql_v3.text_eq`), or PostgreSQL resolves the native `jsonb` operator instead of the encrypted one. Both rules are covered in [Core concepts](/reference/eql/core-concepts).
+
+## There is no `LIKE`
+
+`LIKE` and `ILIKE` (`~~` / `~~*`) raise on **every** encrypted-domain variant - including `text_match` and `text_search`. SQL pattern matching is meaningless on ciphertext. Encrypted text matching is bloom-filter token containment - `@>` on a `text_match` or `text_search` column:
+
+```sql
+-- ❌ Raises: operator not supported
+SELECT * FROM users WHERE email LIKE '%alice%';
+
+-- ✅ Encrypted free-text match
+SELECT * FROM users WHERE email @> $1::eql_v3.text_match;
+```
+
+`@>` / `<@` here is **probabilistic ngram-bloom containment** - it tests whether the encrypted text contains the (encrypted) search terms. It is not JSONB containment and not `LIKE`. The client encrypts the search term into a bloom-filter query value; false positives are possible, false negatives are not. There are no `like` / `ilike` function forms either - text matching is `eql_v3.contains` on a `text_match` value.
+
+## Example queries
+
+### Exact lookup
+
+Equality on a `text_eq` column compares HMAC terms. `IN` desugars to `=`:
+
+```sql
+SELECT * FROM users WHERE tax_id = $1::eql_v3.text_eq;
+
+SELECT * FROM users
+WHERE tax_id IN ($1::eql_v3.text_eq, $2::eql_v3.text_eq);
+```
+
+### Free-text match
+
+The client encrypts the search term into the bloom-filter needle:
+
+```sql
+SELECT * FROM users WHERE name @> $1::eql_v3.text_match;
+
+-- Function form, for platforms without custom operators
+SELECT * FROM users WHERE eql_v3.contains(name, $1::eql_v3.text_match);
+```
+
+### The works: `text_search`
+
+A `text_search` column answers exact lookup, free-text match, and ordering - here, all three in one query:
+
+```sql
+SELECT id, email FROM users
+WHERE email @> $1::eql_v3.text_match   -- token containment on bf
+  AND email <> $2::eql_v3.text_eq      -- exclude an exact value via hm
+ORDER BY eql_v3.ord_term(email)        -- sort on ob
+LIMIT 20;
+```
+
+### Sorting text
+
+ORE terms are order-preserving, so `ORDER BY` sorts encrypted text correctly. Write the sort key in extractor form so a btree index can do the ordering instead of a `Sort` node - see [Sorting](/reference/eql/sorting):
+
+```sql
+SELECT * FROM users
+ORDER BY eql_v3.ord_term(email)
+LIMIT 50;
+```
+
+`MIN` / `MAX` work on any ord-capable text column too:
+
+```sql
+SELECT eql_v3.min(email) FROM users;
+```
+
+## Where to next
+
+
+
+  Hash on `eq_term`, btree on `ord_term`, GIN on `match_term`.
+
+
+  WHERE-clause patterns across all encrypted types.
+
+
+  Extractor-form sort keys and index-backed ordering.
+
+
+  Equijoins on encrypted text columns, and the same-keyset rule.
+
+
diff --git a/content/docs/reference/eql/types.mdx b/content/docs/reference/eql/types.mdx
deleted file mode 100644
index e4eb005..0000000
--- a/content/docs/reference/eql/types.mdx
+++ /dev/null
@@ -1,95 +0,0 @@
----
-title: Encrypted types
-description: "The eql_v3 encrypted-domain type families: which domain variant to declare for each scalar type, and what each variant lets you query."
-type: reference
-components: [eql]
-verifiedAgainst:
-  eql: "3.0.0"
----
-
-EQL ships its searchable-encryption surface as PostgreSQL **domains in the `eql_v3` schema**. There are two kinds:
-
-- **Per-scalar encrypted-domain types** - `eql_v3.int4`, `eql_v3.text`, `eql_v3.timestamp`, and so on. One family of domain *variants* per scalar type.
-- **An encrypted-JSON document type** - `eql_v3.json` - for structured encryption of whole JSONB documents. See [JSON support](/reference/eql/json).
-
-A column's query capability is fixed by the **domain variant you type it as**. There is no database-side configuration step: which index terms travel in a value's payload is decided by the encryption client (the [Stack SDK](/reference/stack) or [CipherStash Proxy](/reference/proxy)), and the column's domain variant is what makes the matching operators resolve.
-
-## The family model
-
-Every scalar type `` generates a storage-only variant plus the query variants its capabilities allow. All variants are `jsonb`-backed domains.
-
-| Domain variant | Capability | Index term carried |
-| --- | --- | --- |
-| `eql_v3.` | Storage and decryption only. Every comparison operator is blocked - only `IS NULL` / `IS NOT NULL` work. | none |
-| `eql_v3._eq` | Equality: `=` and `<>` (plus `IN`, `GROUP BY`, `DISTINCT`, equijoins). | `hm` (`eql_v3.hmac_256`) |
-| `eql_v3._ord` / `eql_v3._ord_ore` | Full comparison surface: `=` `<>` `<` `<=` `>` `>=`, `BETWEEN`, `ORDER BY`, and the `eql_v3.min` / `eql_v3.max` aggregates. | `ob` (`eql_v3.ore_block_256`) |
-| `eql_v3.text_match` (text only) | Encrypted free-text token containment via `@>` / `<@`. No equality, no ordering. | `bf` (`eql_v3.bloom_filter`) |
-| `eql_v3.text_search` (text only) | Everything: equality, ordering, and token containment combined. | `hm` + `ob` + `bf` |
-
-Two things worth calling out:
-
-- **The bare variant blocks everything.** `eql_v3.` carries no index term. Querying it with any comparison operator raises an "operator not supported" exception. Use it for columns you only ever store and decrypt. If you later need to query, type the column as a query variant - or cast at the call site (`col::eql_v3.int4_ord`) if the payload already carries the term.
-- **`_ord` and `_ord_ore` are twins.** They are byte-identical surfaces backed by the same ORE block term. Pick the name that documents intent - "ordered" versus "ordered via ORE block". Both support the full ordered surface and `MIN` / `MAX`.
-
-## Type matrix
-
-The scalar tokens that ship in EQL 3.0.0 are `int2`, `int4`, `int8`, `numeric`, `float4`, `float8`, `date`, `timestamp`, `text`, and `bool`.
-
-| Scalar | `eql_v3.` | `_eq` | `_ord` | `_ord_ore` | `text_match` | `text_search` |
-| --- | :---: | :---: | :---: | :---: | :---: | :---: |
-| `int2` | ✅ | ✅ | ✅ | ✅ | - | - |
-| `int4` | ✅ | ✅ | ✅ | ✅ | - | - |
-| `int8` | ✅ | ✅ | ✅ | ✅ | - | - |
-| `float4` | ✅ | ✅ | ✅ | ✅ | - | - |
-| `float8` | ✅ | ✅ | ✅ | ✅ | - | - |
-| `numeric` | ✅ | ✅ | ✅ | ✅ | - | - |
-| `date` | ✅ | ✅ | ✅ | ✅ | - | - |
-| `timestamp` | ✅ | ✅ | ✅ | ✅ | - | - |
-| `text` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| `bool` | ✅ | ❌ | ❌ | ❌ | - | - |
-
-
-**`bool` is storage-only by design.** A two-value column has too little cardinality for any searchable index to be safe - an equality index over `true`/`false` would leak the value distribution outright. EQL ships only `eql_v3.bool`, with no `_eq` or `_ord` variants. Store and decrypt boolean columns; filter on them client-side.
-
-
-## Index terms
-
-Each query variant stores one or more encrypted index terms alongside the ciphertext:
-
-- **`hm`** - an HMAC-256 term (`eql_v3.hmac_256`). Supports exact equality.
-- **`ob`** - an ORE block term (`eql_v3.ore_block_256`). Order-revealing: supports comparison and sorting.
-- **`bf`** - a bloom filter term (`eql_v3.bloom_filter`). Supports probabilistic ngram token containment.
-
-The payload structure - envelope keys plus per-variant term keys - is documented in [Payload format](/reference/eql/payload-format). What each term mathematically reveals about the plaintext (and why you should only carry the terms you need) is covered in [Searchable encryption](/concepts/searchable-encryption).
-
-## Encrypted JSON: `eql_v3.json`
-
-`eql_v3.json` is the encrypted-JSON document domain, built on the structured-encryption ("ste_vec") model: a JSONB document is encrypted into a searchable vector of terms, one per path inside the document, supporting containment (`@>`), field access (`->` / `->>`), and path queries. It has its own operator and function surface - see [JSON support](/reference/eql/json).
-
-## Choosing a variant
-
-Declare only the capabilities you query on. Every index term a value carries is extra material stored in the database, and each term class reveals different structure to an observer - equality terms reveal value repetition, ORE terms reveal ordering, bloom terms reveal token overlap (see [Searchable encryption](/concepts/searchable-encryption)):
-
-- Never queried, only decrypted → bare `eql_v3.`
-- Exact lookup, `IN`, joins, `GROUP BY` → `_eq`
-- Ranges, `ORDER BY`, `MIN`/`MAX` → `_ord`
-- Free-text matching on text → `text_match`
-- Text you need to look up, sort, *and* search → `text_search`
-
-The variant you declare must match the terms the client is configured to emit for that column - the domain makes the operator resolve, but the term in the payload is what makes it answer.
-
-## Example
-
-```sql
-CREATE TABLE users (
-  id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
-  email      eql_v3.text_search,   -- lookup, sort, and free-text match
-  name       eql_v3.text_match,    -- free-text match only
-  tax_id     eql_v3.text_eq,       -- exact lookup only
-  salary     eql_v3.int8_ord,      -- range queries, ORDER BY, MIN/MAX
-  is_active  eql_v3.bool,          -- storage only (by design)
-  created_at eql_v3.timestamp_ord
-);
-```
-
-Once the table exists, add functional indexes on the term extractors so queries engage an index - see [Indexes](/reference/eql/indexes). The operators each variant supports are listed in [Operators](/reference/eql/operators).
diff --git a/v2-redirects.mjs b/v2-redirects.mjs
index 1f068b1..a4d8129 100644
--- a/v2-redirects.mjs
+++ b/v2-redirects.mjs
@@ -73,7 +73,7 @@ export const v2Redirects = [
   },
   {
     source: "/stack/cipherstash/encryption/queries",
-    destination: "/reference/eql/operators",
+    destination: "/reference/eql/filtering",
     permanent: true,
   },
   // configuration, encrypt-decrypt, bulk-operations, models, schema, storing-data
@@ -286,7 +286,7 @@ export const v2Redirects = [
   },
   {
     source: "/stack/reference/cipher-cell",
-    destination: "/reference/eql/payload-format",
+    destination: "/reference/eql/core-concepts",
     permanent: true,
   },
   {

From 9cecb45f043a5c430fa318e262370fa59965cf21 Mon Sep 17 00:00:00 2001
From: Dan Draper
Date: Thu, 2 Jul 2026 20:18:03 +1000
Subject: [PATCH 3/6] refactor(v2): SEM specifiers, Tailwind-style variant enumeration, payload v:3
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Review feedback on the EQL section:

- Variant tables: generic form first, then full enumeration of every
  concrete domain name (Tailwind-style); capability column made concise;
  "index term carried" column dropped - term internals live in
  core-concepts' payload anatomy
- SEM specifiers documented as a concept in core-concepts: a trailing
  mechanism suffix (_ord_ore) pins WHICH searchable-encryption mechanism
  implements a capability; _ord tracks the default (currently ORE).
  Replaces the "twins" framing. Each orderable type page lists its
  specifiers under an "SEM specifiers" heading, noting the OPE specifier
  arriving for all orderable types (incl. text) in the v3 release
- Payload `v` field documented as the EQL version (3) per team decision
  2026-07-02; all payload examples updated from v:2

Claude-Session: https://claude.ai/code/session_01ACPpFPHvKtrV48nbEYuv7P
---
 content/docs/reference/eql/core-concepts.mdx  | 34 +++++++++-----
 content/docs/reference/eql/json.mdx           |  2 +-
 .../docs/reference/eql/numbers-and-dates.mdx  | 46 ++++++++++++++-----
 content/docs/reference/eql/text.mdx           | 31 +++++++++----
 4 files changed, 79 insertions(+), 34 deletions(-)

diff --git a/content/docs/reference/eql/core-concepts.mdx b/content/docs/reference/eql/core-concepts.mdx
index a262fe8..11b6a0f 100644
--- a/content/docs/reference/eql/core-concepts.mdx
+++ b/content/docs/reference/eql/core-concepts.mdx
@@ -17,18 +17,28 @@ There is no database-side configuration table. Earlier EQL versions tracked encr
 
 For any scalar type ``, the family looks like this:
 
-| Domain variant | Capability | Index term carried |
-| --- | --- | --- |
-| `eql_v3.` | Storage and decryption only. Every comparison operator is blocked - only `IS NULL` / `IS NOT NULL` work. | none |
-| `eql_v3._eq` | Equality: `=` and `<>` (plus `IN`, `GROUP BY`, `DISTINCT`, equijoins). | `hm` (`eql_v3.hmac_256`) |
-| `eql_v3._ord` / `eql_v3._ord_ore` | Full comparison surface: `=` `<>` `<` `<=` `>` `>=`, `BETWEEN`, `ORDER BY`, and the `eql_v3.min` / `eql_v3.max` aggregates. | `ob` (`eql_v3.ore_block_256`) |
-| `eql_v3.text_match` (text only) | Encrypted free-text token containment via `@>` / `<@`. No equality, no ordering. | `bf` (`eql_v3.bloom_filter`) |
-| `eql_v3.text_search` (text only) | Everything: equality, ordering, and token containment combined. | `hm` + `ob` + `bf` |
+| Domain variant | Capability |
+| --- | --- |
+| `eql_v3.` | Storage and decryption only. |
+| `eql_v3._eq` | Equality: `=`, `<>`, `IN`, `GROUP BY`, `DISTINCT`, equijoins. |
+| `eql_v3._ord` | Comparisons (`<` … `>=`), `BETWEEN`, `ORDER BY`, `MIN` / `MAX` - plus equality. |
+| `eql_v3._ord_ore` | As `_ord`, with the ORE mechanism pinned - see [SEM specifiers](#sem-specifiers). |
+| `eql_v3.text_match` (text only) | Free-text token containment: `@>` / `<@`. |
+| `eql_v3.text_search` (text only) | Equality + ordering + token containment. |
 
 Two things worth calling out:
 
 - **The bare variant blocks everything.** `eql_v3.` carries no index term. Querying it with any comparison operator raises an "operator not supported" exception. Use it for columns you only ever store and decrypt - [Booleans](/reference/eql/booleans) covers this pattern in full.
-- **`_ord` and `_ord_ore` are twins.** They are byte-identical surfaces backed by the same ORE block term. Pick the name that documents intent - "ordered" versus "ordered via ORE block". Both support the full ordered surface and `MIN` / `MAX`.
+- **Which index term backs each capability** is an implementation detail of the payload - covered in [Anatomy of an encrypted value](#anatomy-of-an-encrypted-value) below.
+
+### SEM specifiers
+
+A trailing mechanism suffix - the `_ore` in `_ord_ore` - is a **SEM specifier**: it pins *which* searchable-encryption mechanism implements the capability, rather than just declaring the capability itself.
+
+- `eql_v3._ord` declares *orderable* and leaves the mechanism to EQL's default - currently ORE (order-revealing encryption).
+- `eql_v3._ord_ore` declares *orderable via ORE, explicitly*. Today the two are byte-identical surfaces backed by the same term.
+
+The distinction earns its keep as mechanisms multiply: the EQL v3 release adds an **OPE** (order-preserving encryption) specifier for every orderable type - including `text` - at which point pinning a specifier documents and freezes a column's mechanism choice, while unspecified variants track the default. Each type page lists its available specifiers under an "SEM specifiers" heading.
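The variant family and the SEM-specifier suffix can be read as a small naming rule. The JavaScript sketch below shows one way to express it, assuming a caller describes the capabilities a column needs. `chooseVariant` and its option names are hypothetical, used for illustration only; EQL itself has no such function, and the only mechanism value accepted here is `"ore"`, since that is the only specifier the text names as shipping today.

```javascript
// Hypothetical helper (not an EQL API): maps declared needs to a domain name
// following the family table. `pin` models a SEM specifier such as "ore".
function chooseVariant(scalar, needs = {}) {
  const { equality = false, ordering = false, match = false, pin = null } = needs;
  if (pin !== null && pin !== "ore") throw new Error(`unknown SEM specifier: ${pin}`);
  if (scalar === "bool" && (equality || ordering || match)) {
    throw new Error("bool is storage-only");
  }
  if (match && scalar !== "text") {
    throw new Error("token containment is text only");
  }
  if (match) {
    // text_search is the only variant combining containment with other terms,
    // so it may carry more capability than was strictly requested.
    return equality || ordering ? "eql_v3.text_search" : "eql_v3.text_match";
  }
  if (ordering) {
    // _ord tracks the default mechanism; a pinned specifier is appended.
    return pin === null ? `eql_v3.${scalar}_ord` : `eql_v3.${scalar}_ord_${pin}`;
  }
  if (equality) return `eql_v3.${scalar}_eq`;
  return `eql_v3.${scalar}`;
}
```

Treating the specifier as an optional suffix keeps the capability choice and the mechanism choice separate, which mirrors how the section frames `_ord` versus `_ord_ore`.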
 Declaring a table is just typing each column as the variant it needs: @@ -55,12 +65,12 @@ Every payload carries three envelope keys. Each `eql_v3` domain's `CHECK` constr | Key | Contents | Notes | | --- | --- | --- | -| `v` | Payload version | Always exactly `2` on the wire. The domain `CHECK`s assert it and raise on any other value. | +| `v` | The EQL version | `3` — the payload version matches the EQL major version. The domain `CHECK`s assert it and raise on any other value. | | `i` | Ident: `{"t": "
", "c": ""}` | Binds the ciphertext to the table and column it was encrypted for. Both keys required. | | `c` | Ciphertext | The opaque, non-deterministic encrypted blob (mp_base85-encoded). Never used in comparisons. | -`eql_v3` names the **SQL schema generation**, not the payload version. The JSON envelope version is still `v: 2` — the wire field names are unchanged from EQL v2, and the domain `CHECK`s assert `v = 2`. +Payloads produced by EQL v2 clients carried `v: 2`; from 3.0.0 the payload version and the EQL version move together. A `k` discriminator (`"ct"` for a scalar ciphertext, `"sv"` for a JSON document) also appears on payloads emitted by the clients, distinguishing the two top-level shapes. @@ -81,7 +91,7 @@ A scalar payload for an `eql_v3.text_search` column (lookup + ordering + free-te ```json { - "v": 2, + "v": 3, "i": { "t": "users", "c": "email" }, "c": "mBbKmsMM%bK#QQOx1yLDBHyD...", "hm": "9c8ec1d2f9932b979b1bf3f09f8a4e2f6a41f8de2f0c8b7a52e1f5c3d4b6a790", @@ -102,7 +112,7 @@ Encrypted JSON documents use a different payload shape — an `sv` array with on The [EQL repository](https://github.com/cipherstash/encrypt-query-language) publishes the format as JSON Schema in two places: - **`crates/eql-bindings/schema/`** — one schema per scalar domain (`$id`s under `https://schemas.cipherstash.com/eql/v3/`), generated from the canonical Rust wire types in the `eql-bindings` crate. TypeScript bindings are generated from the same definitions, so every producer and consumer shares one source of truth. -- **`docs/reference/schema/`** — full-payload schemas covering both the scalar and `sv` document shapes. These files are currently named for the v2.x payload releases (`eql-payload-v2.2.schema.json`, `eql-payload-v2.3.schema.json`) and reference `eql_v2` function names, even though the current SQL surface is `eql_v3` — the v2.3 schema is the applicable document-shape definition, matching the still-`v: 2` envelope.
+- **`docs/reference/schema/`** — full-payload schemas covering both the scalar and `sv` document shapes. These files are still named for the v2.x payload releases (`eql-payload-v2.2.schema.json`, `eql-payload-v2.3.schema.json`); the v2.3 schema describes the document shape, with the payload version field moving to `3` alongside the EQL 3.0.0 release. ## The typed-operand rule diff --git a/content/docs/reference/eql/json.mdx b/content/docs/reference/eql/json.mdx index 4204ca8..93eb920 100644 --- a/content/docs/reference/eql/json.mdx +++ b/content/docs/reference/eql/json.mdx @@ -38,7 +38,7 @@ A document payload for an `eql_v3.json` column: ```json { - "v": 2, + "v": 3, "k": "sv", "i": { "t": "orders", "c": "metadata" }, "sv": [ diff --git a/content/docs/reference/eql/numbers-and-dates.mdx b/content/docs/reference/eql/numbers-and-dates.mdx index 1d01dec..a78052f 100644 --- a/content/docs/reference/eql/numbers-and-dates.mdx +++ b/content/docs/reference/eql/numbers-and-dates.mdx @@ -13,16 +13,29 @@ There is no free-text matching for these types — `_match` and `_search` are [t ## Variants -Each of the eight scalar types generates the same four `jsonb`-backed domain variants: - -| Domain variant | Capability | Index term carried | -| --- | --- | --- | -| `eql_v3.` | Storage and decryption only. Every comparison operator is blocked — only `IS NULL` / `IS NOT NULL` work. | none | -| `eql_v3._eq` | Equality: `=` and `<>` (plus `IN`, `GROUP BY`, `DISTINCT`, equijoins). | `hm` (HMAC-256) | -| `eql_v3._ord` | Full comparison surface: `=` `<>` `<` `<=` `>` `>=`, `BETWEEN`, `ORDER BY`, and `MIN` / `MAX`. | `ob` (ORE block) | -| `eql_v3._ord_ore` | Identical to `_ord` — a twin name that documents intent.
 | `ob` (ORE block) | - -Declare only the capability you query on — each index term class reveals different structure to an observer (see [Searchable encryption](/concepts/searchable-encryption)), and the variant model itself is covered in [Core concepts](/reference/eql/core-concepts): +Each of the eight scalar types generates the same `jsonb`-backed domain variants. The generic form: + +| Domain variant | Capability | +| --- | --- | +| `eql_v3.` | Storage and decryption only. | +| `eql_v3._eq` | Equality: `=`, `<>`, `IN`, `GROUP BY`, `DISTINCT`, equijoins. | +| `eql_v3._ord` | Comparisons, `BETWEEN`, `ORDER BY`, `MIN` / `MAX` — plus equality. | +| `eql_v3._ord_ore` | As `_ord`, with the ORE mechanism pinned — see [SEM specifiers](#sem-specifiers). | + +And every concrete domain this page covers: + +| Type | Variants | +| --- | --- | +| `int2` | `eql_v3.int2` · `eql_v3.int2_eq` · `eql_v3.int2_ord` · `eql_v3.int2_ord_ore` | +| `int4` | `eql_v3.int4` · `eql_v3.int4_eq` · `eql_v3.int4_ord` · `eql_v3.int4_ord_ore` | +| `int8` | `eql_v3.int8` · `eql_v3.int8_eq` · `eql_v3.int8_ord` · `eql_v3.int8_ord_ore` | +| `float4` | `eql_v3.float4` · `eql_v3.float4_eq` · `eql_v3.float4_ord` · `eql_v3.float4_ord_ore` | +| `float8` | `eql_v3.float8` · `eql_v3.float8_eq` · `eql_v3.float8_ord` · `eql_v3.float8_ord_ore` | +| `numeric` | `eql_v3.numeric` · `eql_v3.numeric_eq` · `eql_v3.numeric_ord` · `eql_v3.numeric_ord_ore` | +| `date` | `eql_v3.date` · `eql_v3.date_eq` · `eql_v3.date_ord` · `eql_v3.date_ord_ore` | +| `timestamp` | `eql_v3.timestamp` · `eql_v3.timestamp_eq` · `eql_v3.timestamp_ord` · `eql_v3.timestamp_ord_ore` | + +Declare only the capability you query on — each capability stores extra searchable material with defined leakage (see [Searchable encryption](/concepts/searchable-encryption)), and the variant model itself is covered in [Core concepts](/reference/eql/core-concepts): ```sql CREATE TABLE employees ( @@ -35,13 +48,24 @@ CREATE TABLE
employees ( ); ``` +### SEM specifiers + +All eight types take the same mechanism specifiers on their orderable variant (the concept is defined in [Core concepts](/reference/eql/core-concepts#sem-specifiers)): + +| Specifier | Meaning | +| --- | --- | +| `_ord` | Orderable, using EQL's default mechanism (currently ORE). | +| `_ord_ore` | Orderable via ORE, pinned explicitly. | + +The EQL v3 release adds an OPE specifier for every orderable type; unspecified `_ord` columns keep tracking the default. + ## Payload A value for an `_ord` column carries the shared envelope keys (`v`, `i`, `c` — see [Core concepts](/reference/eql/core-concepts)) plus the `ob` ordering term. Here is a payload for the `eql_v3.int8_ord` `salary` column: ```json { - "v": 2, + "v": 3, "i": { "t": "employees", "c": "salary" }, "c": "mBbKmsMM%bK#QQOx1yLDBHyD...", "ob": [ diff --git a/content/docs/reference/eql/text.mdx b/content/docs/reference/eql/text.mdx index c70ee13..25228c9 100644 --- a/content/docs/reference/eql/text.mdx +++ b/content/docs/reference/eql/text.mdx @@ -13,16 +13,16 @@ Text is the richest encrypted scalar. Beyond the four variants every scalar type All six are `jsonb`-backed domains. Which one you declare fixes the column's query capability — the variant model itself is covered in [Core concepts](/reference/eql/core-concepts): -| Domain variant | Capability | Index terms carried | -| --- | --- | --- | -| `eql_v3.text` | Storage and decryption only. Every comparison operator is blocked — only `IS NULL` / `IS NOT NULL` work. | none | -| `eql_v3.text_eq` | Equality: `=` and `<>` (plus `IN`, `GROUP BY`, `DISTINCT`, equijoins). | `hm` (HMAC-256) | -| `eql_v3.text_ord` | Full comparison surface: `=` `<>` `<` `<=` `>` `>=`, `BETWEEN`, `ORDER BY`, `MIN` / `MAX`. | `ob` (ORE block) | -| `eql_v3.text_ord_ore` | Identical to `text_ord` — a twin name that documents intent. | `ob` (ORE block) | -| `eql_v3.text_match` | Encrypted free-text token containment via `@>` / `<@`.
 No equality, no ordering. | `bf` (bloom filter) | -| `eql_v3.text_search` | Everything: equality, ordering, and token containment combined. | `hm` + `ob` + `bf` | +| Domain variant | Capability | +| --- | --- | +| `eql_v3.text` | Storage and decryption only. | +| `eql_v3.text_eq` | Equality: `=`, `<>`, `IN`, `GROUP BY`, `DISTINCT`, equijoins. | +| `eql_v3.text_ord` | Comparisons, `BETWEEN`, `ORDER BY`, `MIN` / `MAX` — plus equality. | +| `eql_v3.text_ord_ore` | As `text_ord`, with the ORE mechanism pinned — see [SEM specifiers](#sem-specifiers). | +| `eql_v3.text_match` | Free-text token containment: `@>` / `<@`. | +| `eql_v3.text_search` | Equality + ordering + token containment. | -Declare only the capabilities you query on — each term class reveals different structure to an observer: equality terms reveal value repetition, ORE terms reveal ordering, bloom terms reveal token overlap (see [Searchable encryption](/concepts/searchable-encryption)): +Declare only the capabilities you query on — each capability stores extra searchable material with defined leakage (see [Searchable encryption](/concepts/searchable-encryption)): ```sql CREATE TABLE users ( @@ -34,13 +34,24 @@ CREATE TABLE users ( ); ``` +### SEM specifiers + +Text takes the same mechanism specifiers as the other orderable types (the concept is defined in [Core concepts](/reference/eql/core-concepts#sem-specifiers)): + +| Specifier | Meaning | +| --- | --- | +| `_ord` | Orderable, using EQL's default mechanism (currently ORE). | +| `_ord_ore` | Orderable via ORE, pinned explicitly. | + +The EQL v3 release adds an OPE specifier for every orderable type — including `text` — so lexicographic ordering can be pinned to either mechanism; unspecified `_ord` columns keep tracking the default.
+ ## Payload A value for a `text_search` column carries the shared envelope keys (`v`, `i`, `c` — see [Core concepts](/reference/eql/core-concepts)) plus all three index terms: ```json { - "v": 2, + "v": 3, "i": { "t": "users", "c": "email" }, "c": "mBbKmsMM%bK#QQOx1yLDBHyD...", "hm": "9c8ec1d2f9932b979b1bf3f09f8a4e2f6a41f8de2f0c8b7a52e1f5c3d4b6a790", From 127f0a6ce3d55ae6bdb1ffc27c55f75ebdafa4ec Mon Sep 17 00:00:00 2001 From: Dan Draper Date: Thu, 2 Jul 2026 21:27:44 +1000 Subject: [PATCH 4/6] refactor(v2): split numbers/dates pages, Example headings, separate Operators and Functions MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Review feedback: - Dates & times split out of Numbers — same traits, distinct semantics; each page's examples now match its domain (payroll vs audit-event time windows / retention cutoffs / newest-first) - CREATE TABLE examples get an explicit "Example" sub-heading + lead-in - Operators and Functions are separate sections on every type page — operators as the per-variant support matrix, functions as the form-equivalents table (+ MIN/MAX, which only exist as functions) - IA.md: split reflected; query-performance follow-up added (CIP-3351 — the v3 branch already folded the v2 perf guide into database-indexes.md, which our indexes page absorbed) Claude-Session: https://claude.ai/code/session_01ACPpFPHvKtrV48nbEYuv7P --- IA.md | 7 +- content/docs/reference/eql/booleans.mdx | 2 +- content/docs/reference/eql/core-concepts.mdx | 2 +- .../docs/reference/eql/dates-and-times.mdx | 151 ++++++++++++++++++ content/docs/reference/eql/filtering.mdx | 2 +- content/docs/reference/eql/index.mdx | 7 +- content/docs/reference/eql/meta.json | 3 +- .../{numbers-and-dates.mdx => numbers.mdx} | 89 +++++------ content/docs/reference/eql/sorting.mdx | 2 +- content/docs/reference/eql/text.mdx | 41 +++-- 10 files changed, 239 insertions(+), 67 deletions(-) create mode 100644
 content/docs/reference/eql/dates-and-times.mdx rename content/docs/reference/eql/{numbers-and-dates.mdx => numbers.mdx} (59%) diff --git a/IA.md b/IA.md index 6db1dee..d5736a9 100644 --- a/IA.md +++ b/IA.md @@ -137,7 +137,9 @@ live at `/docs/errors/` — permanent, never restructured (CIP-3338). - [x] `/reference/eql` — install (single SQL file, permissions split, dbdev, Docker) - [x] `/reference/eql/core-concepts` — variant model, payload anatomy (absorbs cipher-cell), typed-operand rule, fail-loud blockers, term leakage pointer -- [x] `/reference/eql/numbers-and-dates` — int*/float*/numeric/date/timestamp +- [x] `/reference/eql/numbers` — int*/float*/numeric +- [x] `/reference/eql/dates-and-times` — date/timestamp (same traits as numbers, + distinct semantics) - [x] `/reference/eql/text` — all six text variants; owns the no-LIKE treatment - [x] `/reference/eql/json` — ste_vec + sv payload shape + containment/path queries - [x] `/reference/eql/booleans` — storage-only variants (bool has only that one) @@ -146,6 +148,9 @@ live at `/docs/errors/` — permanent, never restructured (CIP-3338).
 - [x] `/reference/eql/sorting` — ORDER BY, extractor sort-key form, pagination - [x] `/reference/eql/grouping-and-aggregates` — GROUP BY/DISTINCT, min/max, no SUM/AVG - [x] `/reference/eql/joins` — equijoins, the same-keyset constraint +- [ ] ⛔ `/reference/eql/query-performance` — port the EQL repo performance guide once + rewritten for v3 upstream (v3 branch folded it into database-indexes.md; verify + nothing from the v2 guide on main was lost) — see CIP-3351 - **Stack SDK:** - [ ] `/reference/stack` — client + configuration (port encryption/* pages) - [ ] `/reference/stack/schema` diff --git a/content/docs/reference/eql/booleans.mdx b/content/docs/reference/eql/booleans.mdx index 390dc97..5403fc9 100644 --- a/content/docs/reference/eql/booleans.mdx +++ b/content/docs/reference/eql/booleans.mdx @@ -59,4 +59,4 @@ For every type other than `bool`, storage-only is a choice you can walk back. If SELECT * FROM readings WHERE value::eql_v3.int4_ord > $1::eql_v3.int4_ord; ``` -The variant families and what each one enables are covered in [Core concepts](/reference/eql/core-concepts); the per-type specifics live in [Numbers and dates](/reference/eql/numbers-and-dates) and [Text](/reference/eql/text). +The variant families and what each one enables are covered in [Core concepts](/reference/eql/core-concepts); the per-type specifics live in [Numbers](/reference/eql/numbers), [Dates & times](/reference/eql/dates-and-times), and [Text](/reference/eql/text). diff --git a/content/docs/reference/eql/core-concepts.mdx b/content/docs/reference/eql/core-concepts.mdx index 11b6a0f..427bc7b 100644 --- a/content/docs/reference/eql/core-concepts.mdx +++ b/content/docs/reference/eql/core-concepts.mdx @@ -51,7 +51,7 @@ CREATE TABLE users ( ); ``` -Every scalar type — `int2`, `int4`, `int8`, `numeric`, `float4`, `float8`, `date`, `timestamp`, `text`, and `bool` in EQL 3.0.0 — ships some subset of this family.
 The per-category pages list exactly which variants each type has and how to choose between them: [Numbers and dates](/reference/eql/numbers-and-dates), [Text](/reference/eql/text), and [Booleans](/reference/eql/booleans). Encrypted JSON documents use a separate domain, `eql_v3.json`, with its own operator surface — see [JSON](/reference/eql/json). +Every scalar type — `int2`, `int4`, `int8`, `numeric`, `float4`, `float8`, `date`, `timestamp`, `text`, and `bool` in EQL 3.0.0 — ships some subset of this family. The per-category pages list exactly which variants each type has and how to choose between them: [Numbers](/reference/eql/numbers), [Dates & times](/reference/eql/dates-and-times), [Text](/reference/eql/text), and [Booleans](/reference/eql/booleans). Encrypted JSON documents use a separate domain, `eql_v3.json`, with its own operator surface — see [JSON](/reference/eql/json). ## Anatomy of an encrypted value diff --git a/content/docs/reference/eql/dates-and-times.mdx b/content/docs/reference/eql/dates-and-times.mdx new file mode 100644 index 0000000..1014599 --- /dev/null +++ b/content/docs/reference/eql/dates-and-times.mdx @@ -0,0 +1,151 @@ +--- +title: Dates & times +description: "The complete reference for encrypted date and timestamp columns: the domain variants, the ORE-backed payload, and time-window, newest-first, and MIN/MAX queries." +type: reference +components: [eql] +verifiedAgainst: + eql: "3.0.0" +--- + +`date` and `timestamp` columns carry the same capabilities as [encrypted numbers](/reference/eql/numbers) — equality, ranges, ordering, `MIN` / `MAX` — but the queries they serve are temporal: time windows, newest-first listings, retention cutoffs, "when did this last happen". + +## Variants + +Both types generate the same `jsonb`-backed domain variants. The generic form: + +| Domain variant | Capability | +| --- | --- | +| `eql_v3.` | Storage and decryption only.
 | +| `eql_v3._eq` | Equality: `=`, `<>`, `IN`, `GROUP BY`, `DISTINCT`, equijoins. | +| `eql_v3._ord` | Comparisons, `BETWEEN`, `ORDER BY`, `MIN` / `MAX` — plus equality. | +| `eql_v3._ord_ore` | As `_ord`, with the ORE mechanism pinned — see [SEM specifiers](#sem-specifiers). | + +And every concrete domain this page covers: + +| Type | Variants | +| --- | --- | +| `date` | `eql_v3.date` · `eql_v3.date_eq` · `eql_v3.date_ord` · `eql_v3.date_ord_ore` | +| `timestamp` | `eql_v3.timestamp` · `eql_v3.timestamp_eq` · `eql_v3.timestamp_ord` · `eql_v3.timestamp_ord_ore` | + +Time columns are nearly always ranged and sorted, so `_ord` is the usual choice. Declare only the capability you query on — each capability stores extra searchable material with defined leakage (see [Searchable encryption](/concepts/searchable-encryption)), and the variant model itself is covered in [Core concepts](/reference/eql/core-concepts). + +### Example + +An audit-events table where the timestamps drive time-window queries and sorting: + +```sql +CREATE TABLE audit_events ( + id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY, + occurred_at eql_v3.timestamp_ord, -- time windows, newest-first, MIN/MAX + review_due eql_v3.date_ord, -- range filters + sealed_on eql_v3.date -- store and decrypt only +); +``` + +### SEM specifiers + +Both types take the same mechanism specifiers on their orderable variant (the concept is defined in [Core concepts](/reference/eql/core-concepts#sem-specifiers)): + +| Specifier | Meaning | +| --- | --- | +| `_ord` | Orderable, using EQL's default mechanism (currently ORE). | +| `_ord_ore` | Orderable via ORE, pinned explicitly. | + +The EQL v3 release adds an OPE specifier for every orderable type; unspecified `_ord` columns keep tracking the default. + +## Payload + +A value for an `_ord` column carries the shared envelope keys (`v`, `i`, `c` — see [Core concepts](/reference/eql/core-concepts)) plus the `ob` ordering term.
 Here is a payload for the `eql_v3.timestamp_ord` `occurred_at` column: + +```json +{ + "v": 3, + "i": { "t": "audit_events", "c": "occurred_at" }, + "c": "mBbKmsMM%bK#QQOx1yLDBHyD...", + "ob": [ + "7a1fd0c2...", "d24c9be1...", "03fa66b8...", "91b7e04d...", + "5c28aa19...", "e6f3071c...", "48d92ab5...", "0b64cf37...", + "2ce8b1f4...", "a90d57e2...", "6f13c8ba...", "d4720e95..." + ] +} +``` + +- **`ob` is the only index term.** An `_ord` payload carries no `hm`: equality on `_ord` variants compares ORE terms, which collapse to equality — see [Core concepts](/reference/eql/core-concepts). +- **The `ob` block count varies with the plaintext width** — `timestamp` values carry 12 blocks. + +## Operators + +| SQL operator | `eql_v3.` | `_eq` | `_ord` / `_ord_ore` | +| --- | :---: | :---: | :---: | +| `=` / `<>` | ❌ | ✅ | ✅ | +| `<` `<=` `>` `>=` | ❌ | ❌ | ✅ | +| `BETWEEN` (desugars to `>=` and `<=`) | ❌ | ❌ | ✅ | +| `IN` (desugars to `=`) | ❌ | ✅ | ✅ | +| `GROUP BY` / `DISTINCT` | ❌ | ✅ | ✅ | +| `ORDER BY` | ❌ | ❌ | ✅ | +| `IS NULL` / `IS NOT NULL` | ✅ | ✅ | ✅ | + +Blocked cells raise an `operator … is not supported` exception — they never silently return wrong rows. Operands must be typed (`$1::eql_v3.timestamp_ord`), or PostgreSQL resolves the native `jsonb` operator instead of the encrypted one. Both rules are covered in [Core concepts](/reference/eql/core-concepts). + +## Functions + +Every operator has a function form, for managed platforms that disallow custom operators — same typed arguments, identical resolution.
 The `MIN` / `MAX` aggregates only exist as functions: + +| Function | Equivalent | Available on | +| --- | --- | --- | +| `eql_v3.eq(a, b)` / `eql_v3.neq(a, b)` | `=` / `<>` | `_eq`, `_ord` / `_ord_ore` | +| `eql_v3.lt` / `lte` / `gt` / `gte` | `<` `<=` `>` `>=` | `_ord` / `_ord_ore` | +| `eql_v3.min(col)` / `eql_v3.max(col)` | aggregate `MIN` / `MAX` | `_ord` / `_ord_ore` | + +## Example queries + +### Time window + +```sql +SELECT * FROM audit_events +WHERE occurred_at BETWEEN $1::eql_v3.timestamp_ord AND $2::eql_v3.timestamp_ord; + +SELECT * FROM audit_events +WHERE review_due BETWEEN $1::eql_v3.date_ord AND $2::eql_v3.date_ord; +``` + +### Retention cutoff + +```sql +SELECT id FROM audit_events +WHERE occurred_at < $1::eql_v3.timestamp_ord; +``` + +### Newest-first listing + +Write the sort key in extractor form to stream rows out of the index already ordered — at large row counts this is the difference between seconds and milliseconds (see [Sorting](/reference/eql/sorting)): + +```sql +SELECT * FROM audit_events +WHERE occurred_at >= $1::eql_v3.timestamp_ord +ORDER BY eql_v3.ord_term(occurred_at) DESC +LIMIT 10; +``` + +### First and last event + +```sql +SELECT eql_v3.min(occurred_at), eql_v3.max(occurred_at) FROM audit_events; +``` + +## Where to next + + + + The same capabilities on int, float, and numeric columns. + + + Btree recipes on `eql_v3.ord_term` for range, ORDER BY, and MIN/MAX. + + + Why the extractor-form sort key matters, and how to verify with EXPLAIN. + + + WHERE-clause patterns across all encrypted types. + + diff --git a/content/docs/reference/eql/filtering.mdx b/content/docs/reference/eql/filtering.mdx index 6f1b779..9fe45a2 100644 --- a/content/docs/reference/eql/filtering.mdx +++ b/content/docs/reference/eql/filtering.mdx @@ -25,7 +25,7 @@ On `_eq` and `text_search` columns equality compares the HMAC (`hm`) term.
 On `_ SELECT * FROM users WHERE salary = $1::eql_v3.int8_ord; ``` -Bare storage-only variants (`eql_v3.text`, `eql_v3.int4`, …) block every comparison — see the type pages for what each variant supports: [Numbers & dates](/reference/eql/numbers-and-dates), [Text](/reference/eql/text), [Booleans](/reference/eql/booleans). +Bare storage-only variants (`eql_v3.text`, `eql_v3.int4`, …) block every comparison — see the type pages for what each variant supports: [Numbers](/reference/eql/numbers), [Dates & times](/reference/eql/dates-and-times), [Text](/reference/eql/text), [Booleans](/reference/eql/booleans). ## `IN` lists diff --git a/content/docs/reference/eql/index.mdx b/content/docs/reference/eql/index.mdx index e3d7f0f..6b9ce9e 100644 --- a/content/docs/reference/eql/index.mdx +++ b/content/docs/reference/eql/index.mdx @@ -106,8 +106,11 @@ EQL v3 is designed to install without superuser. There are no custom operator cl Domain variants, the encrypted payload, typed operands, and fail-loud blockers — the model every other page assumes. - - Encrypted integers, floats, numerics, dates, and timestamps. + + Encrypted integers, floats, and numerics. + + + Encrypted dates and timestamps: time windows, newest-first, retention cutoffs. Encrypted text: equality, ordering, and free-text token matching — and why there is no `LIKE`.
 diff --git a/content/docs/reference/eql/meta.json b/content/docs/reference/eql/meta.json index 3f4469e..f4268ce 100644 --- a/content/docs/reference/eql/meta.json +++ b/content/docs/reference/eql/meta.json @@ -3,7 +3,8 @@ "pages": [ "core-concepts", "---Types---", - "numbers-and-dates", + "numbers", + "dates-and-times", "text", "json", "booleans", diff --git a/content/docs/reference/eql/numbers-and-dates.mdx b/content/docs/reference/eql/numbers.mdx similarity index 59% rename from content/docs/reference/eql/numbers-and-dates.mdx rename to content/docs/reference/eql/numbers.mdx index a78052f..cda9471 100644 --- a/content/docs/reference/eql/numbers-and-dates.mdx +++ b/content/docs/reference/eql/numbers.mdx @@ -1,19 +1,19 @@ --- -title: Numbers & dates -description: "The complete reference for encrypted numeric and date/time columns: the int, float, numeric, date, and timestamp domain variants, the ORE-backed payload they carry, and range, ORDER BY, and MIN/MAX queries." +title: Numbers +description: "The complete reference for encrypted numeric columns: the int, float, and numeric domain variants, the ORE-backed payload they carry, and range, ORDER BY, and MIN/MAX queries." type: reference components: [eql] verifiedAgainst: eql: "3.0.0" --- -Eight scalar types share one identical query surface: `int2`, `int4`, `int8`, `float4`, `float8`, `numeric`, `date`, and `timestamp`. These are the columns you filter by range, sort newest-first, and take a `MIN` / `MAX` over — salaries, totals, rates, hire dates, timestamps. Everything on this page applies to all eight; only the domain name changes. +Six numeric types share one identical query surface: `int2`, `int4`, `int8`, `float4`, `float8`, and `numeric`. These are the columns you filter by range, sort, and take a `MIN` / `MAX` over — salaries, totals, rates, quantities. -There is no free-text matching for these types — `_match` and `_search` are [text-only variants](/reference/eql/text).
 Boolean columns are a separate, storage-only story — see [Booleans](/reference/eql/booleans). +Date and time columns have the same capabilities but their own semantics — see [Dates & times](/reference/eql/dates-and-times). There is no free-text matching for numeric types — `_match` and `_search` are [text-only variants](/reference/eql/text). ## Variants -Each of the eight scalar types generates the same `jsonb`-backed domain variants. The generic form: +Each numeric type generates the same `jsonb`-backed domain variants. The generic form: | Domain variant | Capability | | --- | --- | @@ -32,25 +32,25 @@ And every concrete domain this page covers: | `float4` | `eql_v3.float4` · `eql_v3.float4_eq` · `eql_v3.float4_ord` · `eql_v3.float4_ord_ore` | | `float8` | `eql_v3.float8` · `eql_v3.float8_eq` · `eql_v3.float8_ord` · `eql_v3.float8_ord_ore` | | `numeric` | `eql_v3.numeric` · `eql_v3.numeric_eq` · `eql_v3.numeric_ord` · `eql_v3.numeric_ord_ore` | -| `date` | `eql_v3.date` · `eql_v3.date_eq` · `eql_v3.date_ord` · `eql_v3.date_ord_ore` | -| `timestamp` | `eql_v3.timestamp` · `eql_v3.timestamp_eq` · `eql_v3.timestamp_ord` · `eql_v3.timestamp_ord_ore` | -Declare only the capability you query on — each capability stores extra searchable material with defined leakage (see [Searchable encryption](/concepts/searchable-encryption)), and the variant model itself is covered in [Core concepts](/reference/eql/core-concepts): +Declare only the capability you query on — each capability stores extra searchable material with defined leakage (see [Searchable encryption](/concepts/searchable-encryption)), and the variant model itself is covered in [Core concepts](/reference/eql/core-concepts).
+ +### Example + +A payroll table mixing the variants by how each column is queried: ```sql CREATE TABLE employees ( id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY, salary eql_v3.int8_ord, -- range queries, ORDER BY, MIN/MAX tax_rate eql_v3.numeric_eq, -- exact lookup only - net_worth eql_v3.numeric, -- store and decrypt only, never queried - hired_on eql_v3.date_ord, - created_at eql_v3.timestamp_ord + net_worth eql_v3.numeric -- store and decrypt only, never queried ); ``` ### SEM specifiers -All eight types take the same mechanism specifiers on their orderable variant (the concept is defined in [Core concepts](/reference/eql/core-concepts#sem-specifiers)): +All six types take the same mechanism specifiers on their orderable variant (the concept is defined in [Core concepts](/reference/eql/core-concepts#sem-specifiers)): | Specifier | Meaning | | --- | --- | @@ -76,25 +76,32 @@ A value for an `_ord` column carries the shared envelope keys (`v`, `i`, `c` — ``` - **`ob` is the only index term.** An `_ord` payload carries no `hm`: equality on `_ord` variants compares ORE terms, which collapse to equality — see [Core concepts](/reference/eql/core-concepts). Only `_eq` payloads carry `hm` (a single hex HMAC-SHA-256 string) instead of `ob`. -- **The `ob` block count varies with the plaintext width**: 8 blocks for the int scalars, 12 for `timestamp`, 14 for `numeric` — the array just carries more block strings. - -## Operators and functions +- **The `ob` block count varies with the plaintext width**: 8 blocks for the int types, 14 for `numeric`. -The function forms exist for managed platforms that disallow custom operators — they take the same typed arguments and resolve identically.
+## Operators -| SQL operator | Function form | `eql_v3.` | `_eq` | `_ord` / `_ord_ore` | -| --- | --- | :---: | :---: | :---: | -| `=` / `<>` | `eql_v3.eq(a, b)` / `eql_v3.neq(a, b)` | ❌ | ✅ | ✅ | -| `<` `<=` `>` `>=` | `eql_v3.lt` / `lte` / `gt` / `gte` | ❌ | ❌ | ✅ | -| `BETWEEN` | desugars to `>=` and `<=` | ❌ | ❌ | ✅ | -| `IN` | desugars to `=` | ❌ | ✅ | ✅ | -| `GROUP BY` / `DISTINCT` | — (needs an equality term) | ❌ | ✅ | ✅ | -| `ORDER BY` | sort key: `eql_v3.ord_term(col)` | ❌ | ❌ | ✅ | -| `MIN` / `MAX` | `eql_v3.min(col)` / `eql_v3.max(col)` | ❌ | ❌ | ✅ | -| `IS NULL` / `IS NOT NULL` | — | ✅ | ✅ | ✅ | +| SQL operator | `eql_v3.` | `_eq` | `_ord` / `_ord_ore` | +| --- | :---: | :---: | :---: | +| `=` / `<>` | ❌ | ✅ | ✅ | +| `<` `<=` `>` `>=` | ❌ | ❌ | ✅ | +| `BETWEEN` (desugars to `>=` and `<=`) | ❌ | ❌ | ✅ | +| `IN` (desugars to `=`) | ❌ | ✅ | ✅ | +| `GROUP BY` / `DISTINCT` | ❌ | ✅ | ✅ | +| `ORDER BY` | ❌ | ❌ | ✅ | +| `IS NULL` / `IS NOT NULL` | ✅ | ✅ | ✅ | Blocked cells raise an `operator … is not supported` exception — they never silently return wrong rows. Operands must be typed (`$1::eql_v3.int8_ord`), or PostgreSQL resolves the native `jsonb` operator instead of the encrypted one. Both rules are covered in [Core concepts](/reference/eql/core-concepts). +## Functions + +Every operator has a function form, for managed platforms that disallow custom operators — same typed arguments, identical resolution.
 The `MIN` / `MAX` aggregates only exist as functions: + +| Function | Equivalent | Available on | +| --- | --- | --- | +| `eql_v3.eq(a, b)` / `eql_v3.neq(a, b)` | `=` / `<>` | `_eq`, `_ord` / `_ord_ore` | +| `eql_v3.lt` / `lte` / `gt` / `gte` | `<` `<=` `>` `>=` | `_ord` / `_ord_ore` | +| `eql_v3.min(col)` / `eql_v3.max(col)` | aggregate `MIN` / `MAX` | `_ord` / `_ord_ore` | + **`SUM`, `AVG`, and other arithmetic aggregates are not supported** on encrypted columns — they would require homomorphic encryption. `MIN` / `MAX` work because they only need comparison; for sums and averages, decrypt at the application boundary and aggregate client-side. ## Example queries @@ -109,35 +116,25 @@ SELECT * FROM employees WHERE salary BETWEEN $1::eql_v3.int8_ord AND $2::eql_v3.int8_ord; ``` -### Date window +### MIN and MAX -`BETWEEN` works the same on `date` and `timestamp` columns: +`eql_v3.min` / `eql_v3.max` compare ORE terms — no decryption happens in the database, and the encrypted result decrypts in the client. `NULL` inputs are skipped; an all-`NULL` input set returns `NULL`: ```sql -SELECT * FROM employees -WHERE hired_on BETWEEN $1::eql_v3.date_ord AND $2::eql_v3.date_ord; +SELECT eql_v3.min(salary) FROM employees; +SELECT eql_v3.max(salary) FROM employees; ``` -### Newest-first listing +### Sorted listing -Bare `ORDER BY created_at` sorts correctly, but the planner doesn't rewrite sort keys, so it adds a `Sort` node even when a btree index exists.
Write the sort key in extractor form to stream rows out of the index already ordered — at large row counts this is the difference between seconds and milliseconds (see [Sorting](/reference/eql/sorting)):
+Write the sort key in extractor form to stream rows out of the index already ordered (see [Sorting](/reference/eql/sorting) for why):

 ```sql
 SELECT * FROM employees
-WHERE created_at >= $1::eql_v3.timestamp_ord
-ORDER BY eql_v3.ord_term(created_at) DESC
+ORDER BY eql_v3.ord_term(salary) DESC
 LIMIT 10;
 ```

-### MIN and MAX
-
-`eql_v3.min` / `eql_v3.max` compare ORE terms — no decryption happens in the database, and the encrypted result decrypts in the client. `NULL` inputs are skipped; an all-`NULL` input set returns `NULL`:
-
-```sql
-SELECT eql_v3.min(salary) FROM employees;
-SELECT eql_v3.max(created_at) FROM employees;
-```
-
 ### Cast at the call site

 On a generic `jsonb` column whose payloads already carry the `ob` term, cast to the right domain in the query:
@@ -149,15 +146,15 @@ SELECT eql_v3.min(salary_jsonb::eql_v3.int8_ord) FROM employees;

 ## Where to next

+
+ The same capabilities on date and timestamp columns.
+
 Btree recipes on `eql_v3.ord_term` for range, ORDER BY, and MIN/MAX.
 WHERE-clause patterns across all encrypted types.
-
- Why the extractor-form sort key matters, and how to verify with EXPLAIN.
-
 GROUP BY, DISTINCT, and the aggregate surface on encrypted columns.
diff --git a/content/docs/reference/eql/sorting.mdx b/content/docs/reference/eql/sorting.mdx
index b684f3e..eb34f12 100644
--- a/content/docs/reference/eql/sorting.mdx
+++ b/content/docs/reference/eql/sorting.mdx
@@ -7,7 +7,7 @@ verifiedAgainst:
 eql: "3.0.0"
 ---

-`ORDER BY` on an encrypted column needs an ORE ordering term: it works on `_ord` / `_ord_ore` variants of every scalar and on `text_search`. ORE terms are order-preserving, so the database sorts ciphertext in exactly the order the plaintext would sort — without decrypting anything.
Which variants carry the term is covered in [Numbers & dates](/reference/eql/numbers-and-dates) and [Text](/reference/eql/text); the variant model itself is in [Core concepts](/reference/eql/core-concepts).
+`ORDER BY` on an encrypted column needs an ORE ordering term: it works on `_ord` / `_ord_ore` variants of every scalar and on `text_search`. ORE terms are order-preserving, so the database sorts ciphertext in exactly the order the plaintext would sort — without decrypting anything. Which variants carry the term is covered in [Numbers](/reference/eql/numbers), [Dates & times](/reference/eql/dates-and-times), and [Text](/reference/eql/text); the variant model itself is in [Core concepts](/reference/eql/core-concepts).

 Sorting a variant *without* an ORE term (`_eq`, `text_match`, bare storage variants) won't raise — but the order is meaningless. Type the column as an `_ord` variant when ordering matters.

diff --git a/content/docs/reference/eql/text.mdx b/content/docs/reference/eql/text.mdx
index 25228c9..9fe995f 100644
--- a/content/docs/reference/eql/text.mdx
+++ b/content/docs/reference/eql/text.mdx
@@ -22,7 +22,11 @@ All six are `jsonb`-backed domains. Which one you declare fixes the column's que
 | `eql_v3.text_match` | Free-text token containment: `@>` / `<@`. |
 | `eql_v3.text_search` | Equality + ordering + token containment. |

-Declare only the capabilities you query on — each capability stores extra searchable material with defined leakage (see [Searchable encryption](/concepts/searchable-encryption)):
+Declare only the capabilities you query on — each capability stores extra searchable material with defined leakage (see [Searchable encryption](/concepts/searchable-encryption)).
+
+### Example
+
+A users table mixing the variants by how each column is queried:

 ```sql
 CREATE TABLE users (
@@ -68,22 +72,33 @@ The narrower variants carry only their own term: a `text_eq` payload carries `hm

 **`bf` positions are signed**: EQL stores the filter as PostgreSQL `smallint[]`, and filters sized above 32768 emit upper-half bit positions as *negative* signed values. Consumers must use a signed 16-bit integer type.

-## Operators and functions
+## Operators

-The function forms exist for managed platforms that disallow custom operators — they take the same typed arguments and resolve identically.
-
-| SQL operator | Function form | `eql_v3.text` | `text_eq` | `text_ord` / `text_ord_ore` | `text_match` | `text_search` |
-| --- | --- | :---: | :---: | :---: | :---: | :---: |
-| `=` / `<>` | `eql_v3.eq(a, b)` / `eql_v3.neq(a, b)` | ❌ | ✅ | ✅ | ❌ | ✅ |
-| `<` `<=` `>` `>=` | `eql_v3.lt` / `lte` / `gt` / `gte` | ❌ | ❌ | ✅ | ❌ | ✅ |
-| `@>` / `<@` | `eql_v3.contains(a, b)` / `eql_v3.contained_by(a, b)` | ❌ | ❌ | ❌ | ✅ | ✅ |
-| `LIKE` / `ILIKE` (`~~` / `~~*`) | none | ❌ | ❌ | ❌ | ❌ | ❌ |
-| `IN` / `GROUP BY` / `DISTINCT` | desugar to `=` / need an equality term | ❌ | ✅ | ✅ | ❌ | ✅ |
-| `ORDER BY`, `MIN` / `MAX` | `eql_v3.min(col)` / `eql_v3.max(col)` | ❌ | ❌ | ✅ | ❌ | ✅ |
-| `IS NULL` / `IS NOT NULL` | — | ✅ | ✅ | ✅ | ✅ | ✅ |
+| SQL operator | `eql_v3.text` | `text_eq` | `text_ord` / `text_ord_ore` | `text_match` | `text_search` |
+| --- | :---: | :---: | :---: | :---: | :---: |
+| `=` / `<>` | ❌ | ✅ | ✅ | ❌ | ✅ |
+| `<` `<=` `>` `>=` | ❌ | ❌ | ✅ | ❌ | ✅ |
+| `@>` / `<@` | ❌ | ❌ | ❌ | ✅ | ✅ |
+| `LIKE` / `ILIKE` (`~~` / `~~*`) | ❌ | ❌ | ❌ | ❌ | ❌ |
+| `IN` / `GROUP BY` / `DISTINCT` | ❌ | ✅ | ✅ | ❌ | ✅ |
+| `ORDER BY` | ❌ | ❌ | ✅ | ❌ | ✅ |
+| `IS NULL` / `IS NOT NULL` | ✅ | ✅ | ✅ | ✅ | ✅ |

 Blocked cells raise an `operator
… is not supported` exception — they never silently return wrong rows. Operands must be typed (`$1::eql_v3.text_eq`), or PostgreSQL resolves the native `jsonb` operator instead of the encrypted one. Both rules are covered in [Core concepts](/reference/eql/core-concepts).

+## Functions
+
+Every operator has a function form, for managed platforms that disallow custom operators — same typed arguments, identical resolution. The `MIN` / `MAX` aggregates only exist as functions:
+
+| Function | Equivalent | Available on |
+| --- | --- | --- |
+| `eql_v3.eq(a, b)` / `eql_v3.neq(a, b)` | `=` / `<>` | `text_eq`, `text_ord` / `text_ord_ore`, `text_search` |
+| `eql_v3.lt` / `lte` / `gt` / `gte` | `<` `<=` `>` `>=` | `text_ord` / `text_ord_ore`, `text_search` |
+| `eql_v3.contains(a, b)` / `eql_v3.contained_by(a, b)` | `@>` / `<@` | `text_match`, `text_search` |
+| `eql_v3.min(col)` / `eql_v3.max(col)` | aggregate `MIN` / `MAX` | `text_ord` / `text_ord_ore`, `text_search` |
+
+There are no `like` / `ilike` function forms — encrypted text matching is `eql_v3.contains` on a `text_match` value.
+
 ## There is no `LIKE`

 `LIKE` and `ILIKE` (`~~` / `~~*`) raise on **every** encrypted-domain variant — including `text_match` and `text_search`. SQL pattern matching is meaningless on ciphertext. Encrypted text matching is bloom-filter token containment — `@>` on a `text_match` or `text_search` column:

From 123a544c36d09d456ae86c1477710db5f743f003 Mon Sep 17 00:00:00 2001
From: Dan Draper
Date: Thu, 2 Jul 2026 21:39:31 +1000
Subject: [PATCH 5/6] docs(v2): track EQL 3.0.0 release-alignment gate in IA.md (CIP-3352)

Claude-Session: https://claude.ai/code/session_01ACPpFPHvKtrV48nbEYuv7P
---
 IA.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/IA.md b/IA.md
index d5736a9..6a13da8 100644
--- a/IA.md
+++ b/IA.md
@@ -184,5 +184,9 @@ live at `/docs/errors/` — permanent, never restructured (CIP-3338).
 - [ ] OG images for v2 pages (route only covers legacy tree)
 - [ ] Correctness CI: snippet type-checking, SQL-vs-EQL-Docker, terminology lint (CIP-3337)
 - [ ] llms.txt curation + Cloudflare AI crawl policy + md-degradation check (CIP-3339)
+- [ ] ⛔ EQL 3.0.0 release alignment (CIP-3352, blocks CIP-3335) — the EQL reference
+  documents the release as decided, ahead of the eql_v3 branch: payload `v: 3`,
+  OPE SEM specifier, Docker tag `:17-3.0.0`, `version()` output, schema files.
+  Each must land upstream or be walked back in the docs before merge
 - [ ] Flip `ENABLE_V2_REDIRECTS=1`, delete `content/stack` + `/stack` routes + legacy loader (CIP-3335)
 - [ ] Consistency sweep + Supabase listing v3 revision (CIP-3335)

From 896eae51206add738221ed869cb8a5e0eec7321f Mon Sep 17 00:00:00 2001
From: Dan Draper
Date: Thu, 2 Jul 2026 22:07:13 +1000
Subject: [PATCH 6/6] docs(v2): address Copilot review on EQL reference (PR #38)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- indexes.mdx: cast query-shape example params to their EQL domain types, consistent with the typed-operand rule
- numbers/dates-and-times/text: the fail-loud note now scopes to operators — ORDER BY on a variant without an ordering term doesn't raise, it silently returns a meaningless order (links Sorting)
---
 content/docs/reference/eql/dates-and-times.mdx | 2 +-
 content/docs/reference/eql/indexes.mdx | 6 +++---
 content/docs/reference/eql/numbers.mdx | 2 +-
 content/docs/reference/eql/text.mdx | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/content/docs/reference/eql/dates-and-times.mdx b/content/docs/reference/eql/dates-and-times.mdx
index 1014599..3af2b9c 100644
--- a/content/docs/reference/eql/dates-and-times.mdx
+++ b/content/docs/reference/eql/dates-and-times.mdx
@@ -85,7 +85,7 @@ A value for an `_ord` column carries the shared envelope keys (`v`, `i`, `c` —
 | `ORDER BY` | ❌ | ❌ | ✅ |
 | `IS NULL` / `IS NOT NULL` | ✅ | ✅ |
✅ |

-Blocked cells raise an `operator … is not supported` exception — they never silently return wrong rows. Operands must be typed (`$1::eql_v3.timestamp_ord`), or PostgreSQL resolves the native `jsonb` operator instead of the encrypted one. Both rules are covered in [Core concepts](/reference/eql/core-concepts).
+Blocked *operator* cells raise an `operator … is not supported` exception — they never silently return wrong rows. `ORDER BY` is the one blocked cell that doesn't raise: it isn't an operator, so sorting a variant without an ordering term runs — but the order is meaningless (see [Sorting](/reference/eql/sorting)). Operands must be typed (`$1::eql_v3.timestamp_ord`), or PostgreSQL resolves the native `jsonb` operator instead of the encrypted one. Both rules are covered in [Core concepts](/reference/eql/core-concepts).

 ## Functions

diff --git a/content/docs/reference/eql/indexes.mdx b/content/docs/reference/eql/indexes.mdx
index 3d1df3e..ff4b1fe 100644
--- a/content/docs/reference/eql/indexes.mdx
+++ b/content/docs/reference/eql/indexes.mdx
@@ -77,7 +77,7 @@ WHERE email = '{"hm":"abc"}'::jsonb;
 ### Equality

 ```sql
-SELECT * FROM users WHERE email = $1;
+SELECT * FROM users WHERE email = $1::eql_v3.text_eq;
 -- Index Scan using users_email_eq
 -- Index Cond: (eql_v3.eq_term(email) = eql_v3.eq_term($1))
 ```
@@ -87,14 +87,14 @@ SELECT * FROM users WHERE email = $1;
 The `<`, `<=`, `>`, `>=` operators inline to comparisons on `eql_v3.ord_term`, so natural-form range predicates match the btree:

 ```sql
-SELECT * FROM users WHERE created_at < $1;
+SELECT * FROM users WHERE created_at < $1::eql_v3.timestamp_ord;
 ```

 `ORDER BY` needs care. The planner inlines operators in *predicates* but does not rewrite *sort keys*: `ORDER BY created_at` uses the index for the `WHERE` clause but still adds a `Sort` node, which scales linearly with the rows passing the filter.
To stream rows out of the btree already ordered, write the sort key in extractor form:

 ```sql
 SELECT * FROM users
-  WHERE created_at < $1
+  WHERE created_at < $1::eql_v3.timestamp_ord
   ORDER BY eql_v3.ord_term(created_at) DESC
   LIMIT 10;
 ```
diff --git a/content/docs/reference/eql/numbers.mdx b/content/docs/reference/eql/numbers.mdx
index cda9471..2d9a2a0 100644
--- a/content/docs/reference/eql/numbers.mdx
+++ b/content/docs/reference/eql/numbers.mdx
@@ -90,7 +90,7 @@ A value for an `_ord` column carries the shared envelope keys (`v`, `i`, `c` —
 | `ORDER BY` | ❌ | ❌ | ✅ |
 | `IS NULL` / `IS NOT NULL` | ✅ | ✅ | ✅ |

-Blocked cells raise an `operator … is not supported` exception — they never silently return wrong rows. Operands must be typed (`$1::eql_v3.int8_ord`), or PostgreSQL resolves the native `jsonb` operator instead of the encrypted one. Both rules are covered in [Core concepts](/reference/eql/core-concepts).
+Blocked *operator* cells raise an `operator … is not supported` exception — they never silently return wrong rows. `ORDER BY` is the one blocked cell that doesn't raise: it isn't an operator, so sorting a variant without an ordering term runs — but the order is meaningless (see [Sorting](/reference/eql/sorting)). Operands must be typed (`$1::eql_v3.int8_ord`), or PostgreSQL resolves the native `jsonb` operator instead of the encrypted one. Both rules are covered in [Core concepts](/reference/eql/core-concepts).

 ## Functions

diff --git a/content/docs/reference/eql/text.mdx b/content/docs/reference/eql/text.mdx
index 9fe995f..f08eafc 100644
--- a/content/docs/reference/eql/text.mdx
+++ b/content/docs/reference/eql/text.mdx
@@ -84,7 +84,7 @@ The narrower variants carry only their own term: a `text_eq` payload carries `hm
 | `ORDER BY` | ❌ | ❌ | ✅ | ❌ | ✅ |
 | `IS NULL` / `IS NOT NULL` | ✅ | ✅ | ✅ | ✅ | ✅ |

-Blocked cells raise an `operator … is not supported` exception — they never silently return wrong rows.
Operands must be typed (`$1::eql_v3.text_eq`), or PostgreSQL resolves the native `jsonb` operator instead of the encrypted one. Both rules are covered in [Core concepts](/reference/eql/core-concepts).
+Blocked *operator* cells raise an `operator … is not supported` exception — they never silently return wrong rows. `ORDER BY` is the one blocked cell that doesn't raise: it isn't an operator, so sorting a variant without an ordering term runs — but the order is meaningless (see [Sorting](/reference/eql/sorting)). Operands must be typed (`$1::eql_v3.text_eq`), or PostgreSQL resolves the native `jsonb` operator instead of the encrypted one. Both rules are covered in [Core concepts](/reference/eql/core-concepts).

 ## Functions