Merged
33 changes: 25 additions & 8 deletions IA.md
@@ -130,14 +130,27 @@ live at `/docs/errors/<code>` — permanent, never restructured (CIP-3338).
## Reference

- [x] Section scaffold 🚧 (eql, stack, auth, cli, proxy, workspace)
- **EQL (v3 rewrite — CIP-3326):**
- [ ] `/reference/eql` — overview + install (single SQL file, permissions split, dbdev, Docker)
- [ ] `/reference/eql/types` — 10 scalar families × variants + `eql_v3.json`
- [ ] `/reference/eql/operators` — per-variant matrix incl. what RAISES; typed-operand rule
- [ ] `/reference/eql/indexes` — functional indexes on extractors; Supabase-compatible
- [ ] `/reference/eql/json` — ste_vec, path queries
- [ ] `/reference/eql/functions` — incl. aggregates (min/max only)
- [ ] `/reference/eql/payload-format` — v/i/c envelope, hm/ob/bf (absorbs cipher-cell)
- **EQL (v3 rewrite — CIP-3326; Tailwind-shaped: install → core concepts → type
categories → indexes → query patterns). Anti-drift rule: shared mechanics
(typed operands, blockers, envelope, variant model, ORE-equality) live ONLY in
core-concepts — category/query pages link, never restate:**
- [x] `/reference/eql` — install (single SQL file, permissions split, dbdev, Docker)
- [x] `/reference/eql/core-concepts` — variant model, payload anatomy (absorbs
cipher-cell), typed-operand rule, fail-loud blockers, term leakage pointer
- [x] `/reference/eql/numbers` — int*/float*/numeric
- [x] `/reference/eql/dates-and-times` — date/timestamp (same traits as numbers,
distinct semantics)
- [x] `/reference/eql/text` — all six text variants; owns the no-LIKE treatment
- [x] `/reference/eql/json` — ste_vec + sv payload shape + containment/path queries
- [x] `/reference/eql/booleans` — storage-only variants (bool has only that one)
- [x] `/reference/eql/indexes` — functional indexes on extractors; Supabase-compatible
- [x] `/reference/eql/filtering` — =, IN, ranges, token match, containment
- [x] `/reference/eql/sorting` — ORDER BY, extractor sort-key form, pagination
- [x] `/reference/eql/grouping-and-aggregates` — GROUP BY/DISTINCT, min/max, no SUM/AVG
- [x] `/reference/eql/joins` — equijoins, the same-keyset constraint
- [ ] ⛔ `/reference/eql/query-performance` — port the EQL repo performance guide once
rewritten for v3 upstream (v3 branch folded it into database-indexes.md; verify
nothing from the v2 guide on main was lost) — see CIP-3351
- **Stack SDK:**
- [ ] `/reference/stack` — client + configuration (port encryption/* pages)
- [ ] `/reference/stack/schema`
@@ -171,5 +184,9 @@ live at `/docs/errors/<code>` — permanent, never restructured (CIP-3338).
- [ ] OG images for v2 pages (route only covers legacy tree)
- [ ] Correctness CI: snippet type-checking, SQL-vs-EQL-Docker, terminology lint (CIP-3337)
- [ ] llms.txt curation + Cloudflare AI crawl policy + md-degradation check (CIP-3339)
- [ ] ⛔ EQL 3.0.0 release alignment (CIP-3352, blocks CIP-3335) — the EQL reference
documents the release as decided, ahead of the eql_v3 branch: payload `v: 3`,
OPE SEM specifier, Docker tag `:17-3.0.0`, `version()` output, schema files.
Each must land upstream or be walked back in the docs before merge
- [ ] Flip `ENABLE_V2_REDIRECTS=1`, delete `content/stack` + `/stack` routes + legacy loader (CIP-3335)
- [ ] Consistency sweep + Supabase listing v3 revision (CIP-3335)
62 changes: 62 additions & 0 deletions content/docs/reference/eql/booleans.mdx
@@ -0,0 +1,62 @@
---
title: Booleans
description: "Encrypted booleans are storage-only by design: eql_v3.bool stores and decrypts, carries no index terms, and blocks every comparison."
type: reference
components: [eql]
verifiedAgainst:
eql: "3.0.0"
---

Every scalar type has a storage-only variant — for `bool` it's the only one. EQL ships `eql_v3.bool` and nothing else: there is no `bool_eq` and no `bool_ord`. An encrypted boolean column can be stored, decrypted, and null-checked; it cannot be filtered, sorted, grouped, or joined on.

## Why there are no query variants

A two-value column has too little cardinality for any searchable index to be safe. An equality term over `true` / `false` would partition the table into two visible buckets — leaking the value distribution (and, with any outside knowledge, the values themselves) outright. Rather than ship an index term that can't keep its promise, EQL omits the query variants entirely. See [Searchable encryption](/concepts/searchable-encryption) for the general analysis of what index terms reveal.
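
To make the bucket argument concrete, the sketch below uses a keyed HMAC-SHA-256 over the plaintext (Node's `createHmac`) as a stand-in for an equality term. The key, the `equalityTerm` helper, and the string encoding of the boolean are assumptions for illustration, not how EQL or the encryption clients derive `hm` terms. Whatever the key, a deterministic term over a boolean takes only two values, so counting rows per term reproduces the exact true/false split without any decryption.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical stand-in for an equality term: a keyed, deterministic HMAC.
// Illustration only; this is not EQL's actual term derivation.
function equalityTerm(key: string, value: boolean): string {
  return createHmac("sha256", key).update(String(value)).digest("hex");
}

// What an observer of the stored terms can compute without the key:
// how many rows share each term.
function termHistogram(key: string, column: boolean[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const value of column) {
    const term = equalityTerm(key, value);
    counts.set(term, (counts.get(term) ?? 0) + 1);
  }
  return counts;
}

const isActive = [true, true, false, true, false, true, true];
const histogram = termHistogram("hypothetical-column-key", isActive);

// Two buckets sized 5 and 2: exactly the plaintext distribution.
console.log([...histogram.values()].sort((a, b) => b - a)); // [ 5, 2 ]
```

Once an observer knows which bucket is the common value (for example, that most users are active), the whole column is readable by inspection.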

## What works, what raises

`eql_v3.bool` follows the bare-variant contract described in [Core concepts](/reference/eql/core-concepts#variants-declare-capability): it carries no index terms, so `IS NULL` / `IS NOT NULL` are the only predicates that work. Every comparison operator routes to a blocker and raises — the [fail-loud behavior](/reference/eql/core-concepts#unsupported-operations-fail-loudly) shared by all encrypted variants:

```sql
-- ❌ Raises: operator = is not supported for eql_v3.bool
SELECT * FROM users WHERE is_active = $1::eql_v3.bool;

-- ✅ Works: SQL NULLs are not encrypted, so null checks need no index term
SELECT * FROM users WHERE is_active IS NOT NULL;
```

## Filter client-side

Query on other columns, decrypt the boolean in your application, and filter there:

```sql
CREATE TABLE users (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
email eql_v3.text_eq, -- exact lookup
created_at eql_v3.timestamp_ord, -- range queries, ORDER BY
is_active eql_v3.bool -- storage only (by design)
);
```

```sql
-- Narrow the result set with the columns that do carry index terms…
SELECT id, email, is_active FROM users
WHERE created_at >= $1::eql_v3.timestamp_ord;
-- …then decrypt is_active in the client and filter on the plaintext.
```

The [Stack SDK](/reference/stack) and [CipherStash Proxy](/reference/proxy) decrypt the payload back to a plain boolean on read, so the client-side filter is an ordinary `if`.
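
A minimal sketch of that second stage, assuming the rows have already been decrypted by whichever client you use. The `DecryptedUser` shape, the `activeUsers` helper, and the sample rows are hypothetical, and the query call itself is omitted because it depends on the client.

```typescript
// Hypothetical shape of a users row after the client has decrypted it.
interface DecryptedUser {
  id: number;
  email: string;
  isActive: boolean | null; // a SQL NULL stays null: NULLs are not encrypted
}

// The database already narrowed rows by created_at; the boolean predicate
// runs here, on plaintext.
function activeUsers(rows: DecryptedUser[]): DecryptedUser[] {
  return rows.filter((row) => row.isActive === true);
}

const rows: DecryptedUser[] = [
  { id: 1, email: "ada@example.com", isActive: true },
  { id: 2, email: "bob@example.com", isActive: false },
  { id: 3, email: "cy@example.com", isActive: null },
];

console.log(activeUsers(rows).map((user) => user.id)); // [ 1 ]
```

Because the filter runs after the query, narrow the result set as far as possible with the indexed columns first; otherwise the client decrypts rows it then throws away.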

If a boolean genuinely needs to be a server-side predicate, that is a data-modelling signal: consider whether the flag is actually sensitive. A non-sensitive flag can stay a plain PostgreSQL `boolean` column alongside your encrypted columns.

## Storing without querying

`bool` is the forced case of a pattern available to every scalar type: the bare variant `eql_v3.<T>` (for example `eql_v3.int4`, `eql_v3.text`, `eql_v3.timestamp`) is storage-and-decryption only. It carries no index terms, and every comparison operator raises — use it for columns you only ever store and decrypt, so the database holds no searchable material for them at all.

For every type other than `bool`, storage-only is a choice you can walk back. If you later need to query, retype the column as a query variant — or, if the payloads already carry the needed term (the client decides which terms travel in the payload), cast at the call site:

```sql
SELECT * FROM readings WHERE value::eql_v3.int4_ord > $1::eql_v3.int4_ord;
```

The variant families and what each one enables are covered in [Core concepts](/reference/eql/core-concepts); the per-type specifics live in [Numbers](/reference/eql/numbers), [Dates & times](/reference/eql/dates-and-times), and [Text](/reference/eql/text).
152 changes: 152 additions & 0 deletions content/docs/reference/eql/core-concepts.mdx
@@ -0,0 +1,152 @@
---
title: Core concepts
description: "The model behind every EQL page: domain variants that declare capability, the encrypted payload envelope, the typed-operand rule, and fail-loud blockers."
type: reference
components: [eql]
verifiedAgainst:
eql: "3.0.0"
---

Everything in the EQL reference builds on four ideas: columns are typed as **domain variants** that declare what they can do, every value is a **`jsonb` payload** carrying encrypted index terms, **operands must be typed** for the encrypted operators to resolve, and anything a column can't do **fails loudly** instead of returning wrong rows. This page is the canonical home for all four — the per-type and per-query pages link back here rather than restating them.

## Variants declare capability

EQL ships its searchable-encryption surface as PostgreSQL **domains in the `eql_v3` schema**, all backed by `jsonb`. Each scalar type generates a *family* of domain variants, and the variant you type a column as fixes its query capability. Each domain carries a `CHECK` constraint that validates the encrypted payload whenever a value is assigned or cast to the domain, so a malformed or wrong-version value is rejected at write time rather than surfacing at query time.

There is no database-side configuration table. Earlier EQL versions tracked encryption config in the database (`config_add_table`, `config_add_column`, and friends) — those are gone in v3. The searchable surface of a column is fixed by the domain variant you type it as, and which index terms travel in a value's payload is decided by the encryption client (the [Stack SDK](/reference/stack) or [CipherStash Proxy](/reference/proxy)). The domain makes the matching operators resolve; the term in the payload is what makes them answer.

For any scalar type `<T>`, the family looks like this:

| Domain variant | Capability |
| --- | --- |
| `eql_v3.<T>` | Storage and decryption only. |
| `eql_v3.<T>_eq` | Equality: `=`, `<>`, `IN`, `GROUP BY`, `DISTINCT`, equijoins. |
| `eql_v3.<T>_ord` | Comparisons (`<` … `>=`), `BETWEEN`, `ORDER BY`, `MIN` / `MAX` — plus equality. |
| `eql_v3.<T>_ord_ore` | As `<T>_ord`, with the ORE mechanism pinned — see [SEM specifiers](#sem-specifiers). |
| `eql_v3.text_match` (text only) | Free-text token containment: `@>` / `<@`. |
| `eql_v3.text_search` (text only) | Equality + ordering + token containment. |

Two things worth calling out:

- **The bare variant blocks every comparison.** `eql_v3.<T>` carries no index term, so querying it with any comparison operator raises an "operator not supported" exception; only `IS NULL` / `IS NOT NULL` work. Use it for columns you only ever store and decrypt. [Booleans](/reference/eql/booleans) covers this pattern in full.
- **Which index term backs each capability** is an implementation detail of the payload — covered in [Anatomy of an encrypted value](#anatomy-of-an-encrypted-value) below.

### SEM specifiers

A trailing mechanism suffix — the `_ore` in `_ord_ore` — is a **SEM specifier**: it pins *which* searchable-encryption mechanism implements the capability, rather than just declaring the capability itself.

- `eql_v3.<T>_ord` declares *orderable* and leaves the mechanism to EQL's default — currently ORE (order-revealing encryption).
- `eql_v3.<T>_ord_ore` declares *orderable via ORE, explicitly*. Today the two are byte-identical surfaces backed by the same term.

The distinction earns its keep as mechanisms multiply: the EQL v3 release adds an **OPE** (order-preserving encryption) specifier for every orderable type — including `text` — at which point pinning a specifier documents and freezes a column's mechanism choice, while unspecified variants track the default. Each type page lists its available specifiers under an "SEM specifiers" heading.
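
The naming scheme is regular enough to parse, which makes the capability/specifier split explicit. A minimal sketch, assuming names follow `eql_v3.<T>[_capability][_specifier]` exactly as in the table above; `parseVariant`, `DEFAULT_ORD_MECHANISM`, and the result shape are illustrative, not part of any CipherStash package. `ope` is accepted only because the text says the v3 release adds that specifier.

```typescript
type Capability = "storage" | "eq" | "ord" | "match" | "search";
type Mechanism = "ore" | "ope";

interface Variant {
  type: string; // scalar type, e.g. "int4"
  capability: Capability; // what the domain name declares
  specifier?: Mechanism; // SEM specifier pinned in the name, if any
  mechanism?: Mechanism; // effective ordering mechanism, for orderable variants
}

// Assumption: mirrors "EQL's default, currently ORE" from the text above.
const DEFAULT_ORD_MECHANISM: Mechanism = "ore";

function parseVariant(name: string): Variant {
  const prefix = "eql_v3.";
  if (!name.startsWith(prefix)) throw new Error(`not an eql_v3 domain: ${name}`);

  const parts = name.slice(prefix.length).split("_");
  if (parts.length > 3) throw new Error(`too many suffixes: ${name}`);
  const type = parts[0];
  const cap: string | undefined = parts[1];
  const spec: string | undefined = parts[2];

  if (cap === undefined) return { type, capability: "storage" }; // bare variant
  if (!["eq", "ord", "match", "search"].includes(cap)) {
    throw new Error(`unknown capability "${cap}" in ${name}`);
  }
  if ((cap === "match" || cap === "search") && type !== "text") {
    throw new Error(`${cap} variants exist for text only: ${name}`);
  }
  // Only _ord takes a specifier in this sketch, matching the _ord_ore naming.
  if (spec !== undefined && (cap !== "ord" || (spec !== "ore" && spec !== "ope"))) {
    throw new Error(`unknown SEM specifier "${spec}" in ${name}`);
  }

  const capability = cap as Capability;
  const specifier = spec as Mechanism | undefined;
  const orderable = capability === "ord" || capability === "search";
  return {
    type,
    capability,
    specifier,
    mechanism: orderable ? (specifier ?? DEFAULT_ORD_MECHANISM) : undefined,
  };
}

console.log(parseVariant("eql_v3.int4_ord").mechanism); // ore (tracks the default)
console.log(parseVariant("eql_v3.int4_ord_ore").specifier); // ore (pinned)
```

The design point the sketch captures: an unspecified `_ord` resolves its mechanism through the default, so it would follow a change of default, while `_ord_ore` carries its mechanism in the name itself.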

Declaring a table is just typing each column as the variant it needs:

```sql
CREATE TABLE users (
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
email eql_v3.text_eq, -- equality only
salary eql_v3.int4_ord, -- equality + range + ORDER BY
created_at eql_v3.timestamp_ord
);
```

Every scalar type — `int2`, `int4`, `int8`, `numeric`, `float4`, `float8`, `date`, `timestamp`, `text`, and `bool` in EQL 3.0.0 — ships some subset of this family. The per-category pages list exactly which variants each type has and how to choose between them: [Numbers](/reference/eql/numbers), [Dates & times](/reference/eql/dates-and-times), [Text](/reference/eql/text), and [Booleans](/reference/eql/booleans). Encrypted JSON documents use a separate domain, `eql_v3.json`, with its own operator surface — see [JSON](/reference/eql/json).

## Anatomy of an encrypted value

Every EQL encrypted value is a `jsonb` payload with a shared envelope plus the index terms that make it queryable. Earlier CipherStash docs called this format the **CipherCell** — this section is the current definition of the same structure.

Payloads are **produced** by the encryption clients — the [Stack SDK](/reference/stack) and [CipherStash Proxy](/reference/proxy) — and **consumed** by EQL's operators and functions inside Postgres. EQL never sees plaintext: it validates, stores, and compares these payloads; it cannot produce or decrypt them. The division is strict: the clients never rely on the database for key material.

### The envelope

Every payload carries three envelope keys. Each `eql_v3` domain's `CHECK` constraint requires them, so a value missing any of these is rejected at write time:

| Key | Contents | Notes |
| --- | --- | --- |
| `v` | The EQL version | `3` — the payload version matches the EQL major version. The domain `CHECK`s assert it and raise on any other value. |
| `i` | Ident: `{"t": "<table>", "c": "<column>"}` | Binds the ciphertext to the table and column it was encrypted for. Both keys required. |
| `c` | Ciphertext | The opaque, non-deterministic encrypted blob (mp_base85-encoded). Never used in comparisons. |

<Callout>
Payloads produced by EQL v2 clients carried `v: 2`; from 3.0.0 the payload version and the EQL version move together.
</Callout>

A `k` discriminator (`"ct"` for a scalar ciphertext, `"sv"` for a JSON document) also appears on payloads emitted by the clients, distinguishing the two top-level shapes.
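
A client can mirror the envelope rules for earlier, friendlier errors. A minimal sketch of the checks in the table above (`v` must be `3`, `i` must carry `t` and `c`, `c` must be present), scoped to scalar payloads because JSON documents carry an `sv` array instead of a root ciphertext. `checkScalarEnvelope` is a hypothetical helper; the domain `CHECK` stays the authority.

```typescript
// Minimal structural view of a payload; envelope keys are read dynamically.
// Scalar payloads only: JSON documents carry an sv array instead of a root c.
type Payload = Record<string, unknown>;

// Returns the envelope problems found; an empty array means the envelope looks valid.
function checkScalarEnvelope(payload: Payload): string[] {
  const problems: string[] = [];

  if (payload.v !== 3) problems.push(`v must be 3, got ${JSON.stringify(payload.v)}`);

  const ident = payload.i;
  if (typeof ident !== "object" || ident === null) {
    problems.push("i must be an object with t and c");
  } else {
    const { t, c } = ident as Record<string, unknown>;
    if (typeof t !== "string") problems.push("i.t (table) is required");
    if (typeof c !== "string") problems.push("i.c (column) is required");
  }

  if (typeof payload.c !== "string") problems.push("c (ciphertext) is required");
  return problems;
}

// A payload from an EQL v2 client fails only the version check.
console.log(checkScalarEnvelope({ v: 2, i: { t: "users", c: "email" }, c: "mBbK..." }));
// [ 'v must be 3, got 2' ]
```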

### Index-term keys

Alongside the envelope, a payload carries the index terms for its column's capability. Each key is backed by a SEM (searchable encrypted metadata) type in the `eql_v3` schema:

| Key | SEM type | Wire shape | Enables | Reveals |
| --- | --- | --- | --- | --- |
| `hm` | `eql_v3.hmac_256` (domain over `text`) | Hex string (HMAC-SHA-256) | `=`, `<>` on `_eq` and `text_search` domains | Whether two values are equal — nothing else |
| `ob` | `eql_v3.ore_block_256` (composite: array of `bytea` block terms) | Array of hex-encoded ORE blocks (block count varies by scalar width) | `<`, `<=`, `>`, `>=`, `ORDER BY` on `_ord` / `_ord_ore` domains — and `=` / `<>`, since ORE comparison collapses to equality | The relative order of two values |
| `bf` | `eql_v3.bloom_filter` (domain over `smallint[]`) | Array of set bit positions (**signed** 16-bit — large filters emit negative positions) | `@>` / `<@` token containment on `_match` domains | Probabilistic token overlap between values |

The capability is encoded as **required keys**: the payload for an `eql_v3.text_eq` column must carry `hm`; an `eql_v3.int4_ord` payload must carry `ob` (and only `ob`); a `text_match` payload must carry `bf`; a `text_search` payload carries all three. A payload missing its term key fails the domain `CHECK` — and fails to deserialize in the client bindings.
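
The required-keys rule is a small lookup. A minimal sketch encoding only what the paragraph above states (`_eq` needs `hm`; `_ord` and `_ord_ore` need `ob`; `text_match` needs `bf`; `text_search` needs all three). `REQUIRED_TERMS` and `missingTerms` are hypothetical names, and the sketch checks presence only, not term contents.

```typescript
type TermKey = "hm" | "ob" | "bf";

// Required index-term keys per declared capability.
const REQUIRED_TERMS: Record<string, TermKey[]> = {
  storage: [], // bare eql_v3.<T>: no terms
  eq: ["hm"],
  ord: ["ob"], // _ord and _ord_ore: equality also runs on ob, so no hm
  match: ["bf"],
  search: ["hm", "ob", "bf"],
};

// Returns the term keys the payload is missing for the given domain.
function missingTerms(domain: string, payload: Record<string, unknown>): TermKey[] {
  const match = /^eql_v3\.[a-z0-9]+(?:_(eq|ord|match|search)(?:_(?:ore|ope))?)?$/.exec(domain);
  if (match === null) throw new Error(`unrecognised domain: ${domain}`);
  const capability = match[1] ?? "storage"; // no capability suffix: bare variant
  return REQUIRED_TERMS[capability].filter((key) => !(key in payload));
}

// A text_search payload that is missing its bloom-filter term:
console.log(missingTerms("eql_v3.text_search", { v: 3, hm: "9c8e...", ob: ["7a1f..."] }));
// [ 'bf' ]
```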

A scalar payload for an `eql_v3.text_search` column (lookup + ordering + free-text match, so all three terms are required):

```json
{
"v": 3,
"i": { "t": "users", "c": "email" },
"c": "mBbKmsMM%bK#QQOx1yLDBHyD...",
"hm": "9c8ec1d2f9932b979b1bf3f09f8a4e2f6a41f8de2f0c8b7a52e1f5c3d4b6a790",
"ob": ["7a1fd0c2...", "d24c9be1...", "03fa66b8..."],
"bf": [42, 1290, -8113, 30201]
}
```

- `v`, `i`, `c` — the envelope
- `hm` — equality term: `WHERE email = $1` compares this
- `ob` — ordering term: `ORDER BY` and range comparisons walk these blocks
- `bf` — bloom-filter term: `@>` token containment tests these bit positions
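
Bloom-filter containment reduces to a subset test on set bit positions: `a @> b` holds when every bit set in `b` is also set in `a`, which can give false positives but never false negatives. A minimal sketch under two stated assumptions: that `@>` on `bf` terms is this subset test, and that negative positions are the two's-complement reading of unsigned bit indexes at or above 32768, which is what a signed 16-bit encoding implies. `toSigned16`, `toUnsigned16`, and `bloomContains` are hypothetical helpers.

```typescript
// Reinterpret an unsigned 16-bit bit index as the signed value a smallint[] stores.
function toSigned16(position: number): number {
  return position >= 0x8000 ? position - 0x10000 : position;
}

// Inverse: recover the unsigned bit index from a stored, possibly negative, value.
function toUnsigned16(stored: number): number {
  return stored < 0 ? stored + 0x10000 : stored;
}

// a @> b: every bit set in b must also be set in a. Because the signed mapping
// is one-to-one, comparing raw stored values would give the same answer;
// normalising just makes the bit indexes explicit.
function bloomContains(a: number[], b: number[]): boolean {
  const bitsInA = new Set(a.map(toUnsigned16));
  return b.every((stored) => bitsInA.has(toUnsigned16(stored)));
}

const stored = [42, 1290, -8113, 30201]; // bf from the example payload above
console.log(toUnsigned16(-8113)); // 57423
console.log(toSigned16(57423)); // -8113
console.log(bloomContains(stored, [1290, -8113])); // true
console.log(bloomContains(stored, [1290, 7])); // false
```

`a <@ b` is the same test with the arguments swapped.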

Encrypted JSON documents use a different payload shape — an `sv` array with one encrypted entry per path in the document instead of a root ciphertext — defined in [JSON](/reference/eql/json).

### Machine-readable schemas

The [EQL repository](https://github.com/cipherstash/encrypt-query-language) publishes the format as JSON Schema in two places:

- **`crates/eql-bindings/schema/`** — one schema per scalar domain (`$id`s under `https://schemas.cipherstash.com/eql/v3/`), generated from the canonical Rust wire types in the `eql-bindings` crate. TypeScript bindings are generated from the same definitions, so every producer and consumer shares one source of truth.
- **`docs/reference/schema/`** — full-payload schemas covering both the scalar and `sv` document shapes. These files are still named for the v2.x payload releases (`eql-payload-v2.2.schema.json`, `eql-payload-v2.3.schema.json`). The v2.3 schema covers the document shape, and its payload version field moves to `3` with the EQL 3.0.0 release.

## The typed-operand rule

The `eql_v3` domains are backed by `jsonb`. When an operand has no known type — a bare string literal, an untyped parameter — PostgreSQL reduces the domain to its `jsonb` base type and resolves the **native `jsonb` operator** instead of the encrypted one. The query doesn't fail; it silently returns native `jsonb` semantics, which are meaningless for encrypted payloads.

```sql
-- ❌ Wrong: untyped parameter. PostgreSQL falls back to the native jsonb `=`,
-- which compares raw payloads — syntactically valid, semantically meaningless.
SELECT * FROM users WHERE email = $1;

-- ✅ Right: typed operand — the encrypted `=` resolves.
SELECT * FROM users WHERE email = $1::eql_v3.text_eq;
```

Always type the operand: a typed parameter (`$1::eql_v3.text_eq`) or an explicit cast (`'…'::eql_v3.int4_ord`). The [Stack SDK](/reference/stack) and [CipherStash Proxy](/reference/proxy) type bound parameters automatically — raw SQL must do it by hand.
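
When writing raw SQL by hand, keeping the cast next to the placeholder makes it hard to forget. A minimal sketch; `typedParam`, `where`, and the `usersColumns` map (mirroring the `users` table declared earlier on this page) are hypothetical helpers, not part of the Stack SDK, and the statement is only built here, not executed.

```typescript
// Builds a positional placeholder with an explicit EQL domain cast, e.g. $1::eql_v3.text_eq.
function typedParam(index: number, domain: string): string {
  if (!Number.isInteger(index) || index < 1) throw new Error("placeholders start at $1");
  if (!/^eql_v3\.[a-z0-9_]+$/.test(domain)) throw new Error(`not an eql_v3 domain: ${domain}`);
  return `$${index}::${domain}`;
}

// Hypothetical column-to-domain map for the users table declared earlier on this page.
const usersColumns: Record<string, string> = {
  email: "eql_v3.text_eq",
  salary: "eql_v3.int4_ord",
  created_at: "eql_v3.timestamp_ord",
};

// Renders "<column> <op> $n::<domain>", so every operand is typed to its column's domain.
function where(column: string, op: string, index: number): string {
  const domain = usersColumns[column];
  if (domain === undefined) throw new Error(`unknown column: ${column}`);
  return `${column} ${op} ${typedParam(index, domain)}`;
}

console.log(`SELECT * FROM users WHERE ${where("email", "=", 1)}`);
// SELECT * FROM users WHERE email = $1::eql_v3.text_eq
```

Deriving the cast from a single column map, rather than typing it at each call site, means a column retyped from `_eq` to `_ord` needs one change.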

This is the one place where a mistake is *silent*. Everything else fails loudly.

## Unsupported operations fail loudly

Unsupported operators are not silent no-ops. Every operator that a variant doesn't support is still *defined* — it routes to a blocker function that raises an `operator … is not supported` exception. A mis-typed query fails loudly instead of silently returning wrong results:

```sql
-- For this example, salary is typed eql_v3.int8_eq (equality only)
SELECT * FROM users WHERE salary > $1::eql_v3.int8_eq;
-- ERROR: operator > is not supported for eql_v3.int8_eq
```

A `NULL` operand still raises — the blockers are deliberately not `STRICT`, so PostgreSQL can't skip the check. (A SQL `NULL` column value is not encrypted, so `IS NULL` / `IS NOT NULL` themselves always work, on every variant.)

`LIKE` and `ILIKE` are blocked on **every** encrypted variant — pattern matching is meaningless on ciphertext. Encrypted text matching is bloom-filter token containment instead; [Text](/reference/eql/text) covers it.

One equality subtlety follows from the term table above: on `_ord` / `_ord_ore` columns, `=` and `<>` compare the **ORE (`ob`) term** — ORE comparison collapses to equality — so `_ord` payloads carry no `hm` term at all. On `_eq` and `text_search` columns, equality compares the HMAC (`hm`) term.
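
The dispatch rules on this page fit in one table-driven function: each operator either resolves to the index term it compares or hits a blocker. A minimal sketch that models the behavior described above rather than reproducing EQL's SQL; `termFor`, `resolveOperator`, and the decision to key only on the declared capability are assumptions, and the error text simply echoes the example message.

```typescript
type TermKey = "hm" | "ob" | "bf";

const EQUALITY = ["=", "<>"];
const RANGE = ["<", "<=", ">", ">="];
const CONTAINMENT = ["@>", "<@"];

// Which term each capability compares for each supported operator.
// LIKE / ILIKE appear in no list, so they are blocked on every variant.
function termFor(capability: string, op: string): TermKey | undefined {
  switch (capability) {
    case "eq":
      return EQUALITY.includes(op) ? "hm" : undefined;
    case "ord":
      // ORE comparison collapses to equality, so = and <> also use ob.
      return EQUALITY.includes(op) || RANGE.includes(op) ? "ob" : undefined;
    case "match":
      return CONTAINMENT.includes(op) ? "bf" : undefined;
    case "search":
      if (EQUALITY.includes(op)) return "hm";
      if (RANGE.includes(op)) return "ob";
      return CONTAINMENT.includes(op) ? "bf" : undefined;
    default:
      return undefined; // bare variant: no terms, nothing resolves
  }
}

function resolveOperator(domain: string, op: string): TermKey {
  const m = /^eql_v3\.[a-z0-9]+(?:_(eq|ord|match|search)(?:_(?:ore|ope))?)?$/.exec(domain);
  if (m === null) throw new Error(`unrecognised domain: ${domain}`);
  const term = termFor(m[1] ?? "storage", op.toUpperCase());
  if (term === undefined) {
    // Models the blocker: the operator exists, but raises instead of returning rows.
    throw new Error(`operator ${op} is not supported for ${domain}`);
  }
  return term;
}

console.log(resolveOperator("eql_v3.int4_ord", "=")); // ob
console.log(resolveOperator("eql_v3.text_search", "=")); // hm
try {
  resolveOperator("eql_v3.int8_eq", ">");
} catch (e) {
  console.log((e as Error).message); // operator > is not supported for eql_v3.int8_eq
}
```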

## What the terms reveal

Every index term a value carries is extra material stored in the database, and each term class reveals defined structure to an observer who can read the stored payloads: equality terms reveal *value repetition* (which rows share a value), ORE terms reveal *ordering* (which of two values is larger), and bloom terms reveal *probabilistic token overlap*. None of them reveal the plaintext — but you should only carry the terms you actually query on. The full analysis of what each term does and doesn't leak is in [Searchable encryption](/concepts/searchable-encryption).