
[RNE Rewrite] Add text and image embeddings pipelines #1292

Open
msluszniak wants to merge 7 commits into rne-rewrite from @ms/add-embeddings

@msluszniak msluszniak commented Jun 30, 2026

Member

Description

Adds text and image embeddings pipelines to the new architecture, achieving parity with the old flow. Embeddings are pure-TypeScript tasks (pooling and L2 normalization stay baked into the .pte): the text task tokenizes the input and runs forward; the image task reuses the existing image preprocessor. To run the existing int64-input embedding models unchanged, this PR adds an int64/Long tensor dtype to the core (the tensor data path is byte-oriented, so this is a small change to dtype.{h,cpp} and tensor.ts).
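As a rough illustration of why an int64 dtype fits a byte-oriented data path, the sketch below packs token IDs into a `BigInt64Array` (8 bytes per element). The `Int64TensorLike` interface and `tokenIdsToInt64Tensor` helper are hypothetical names made up for this example; they are not the library's API, and the `[1, seqLen]` shape is an assumption.

```typescript
// Hypothetical tensor shape for illustration only; not the library's Tensor type.
interface Int64TensorLike {
  dtype: "int64";
  shape: number[];
  data: BigInt64Array;
}

// Packs tokenizer output into an assumed [1, seqLen] int64 buffer, with no padding.
function tokenIdsToInt64Tensor(tokenIds: number[]): Int64TensorLike {
  if (tokenIds.length === 0) {
    // Guard against empty input instead of converting an undefined id later.
    throw new Error("Input tokenized to zero tokens; nothing to embed.");
  }
  const data = new BigInt64Array(tokenIds.length);
  tokenIds.forEach((id, i) => {
    if (!Number.isSafeInteger(id)) {
      throw new RangeError(`Token id at index ${i} is not a safe integer: ${id}`);
    }
    data[i] = BigInt(id);
  });
  return { dtype: "int64", shape: [1, tokenIds.length], data };
}
```

The underlying `data.buffer` is a plain byte buffer of `8 * seqLen` bytes, which is the kind of payload a byte-oriented tensor path can pass through without element-level conversion.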

Text inputs are fed at their exact token length (no padding). model.execute validates dynamically-shaped forward inputs against the [min, max, step] bounds exposed by an optional get_dynamic_dims method; models without it keep exact per-dimension validation. This fixes scale-sensitive pooling heads (e.g. DistilUSE's tanh projection), which padding otherwise corrupts.
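The validation rule described above can be sketched in TypeScript as follows. This is only an illustration of the idea, not the code in model.cpp: `isShapeAllowed` and `DimBounds` are hypothetical names, and the sketch assumes that `step` means the allowed sizes are `min, min + step, min + 2 * step, ...` up to `max`.

```typescript
// One [min, max, step] triple per input dimension (assumed layout).
type DimBounds = [number, number, number];

// Returns true when `shape` is acceptable for an input exported with `exportedShape`.
// Without bounds, every dimension must match exactly; with bounds, each dimension
// must lie in [min, max] and sit on the step grid starting at min (an assumption).
function isShapeAllowed(
  shape: number[],
  exportedShape: number[],
  bounds?: DimBounds[],
): boolean {
  if (shape.length !== exportedShape.length) return false;
  if (bounds === undefined) {
    return shape.every((size, i) => size === exportedShape[i]);
  }
  if (bounds.length !== shape.length) return false;
  return shape.every((size, i) => {
    const [min, max, step] = bounds[i] as DimBounds;
    return size >= min && size <= max && (size - min) % step === 0;
  });
}
```

For example, with bounds `[[1, 1, 1], [1, 128, 1]]` a 7-token input of shape `[1, 7]` passes without padding, while `[1, 129]` is rejected.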

Includes createTextEmbeddings / createImageEmbeddings tasks, useTextEmbeddings / useImageEmbeddings hooks, models.textEmbeddings / models.imageEmbeddings registry entries, an interactive text-embeddings demo in apps/nlp, and a CLIP zero-shot image-embeddings demo in apps/computer-vision.

Introduces a breaking change?

  • Yes
  • No

Type of change

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Documentation update (improves or adds clarity to existing documentation)
  • Other (chores, tests, code style improvements etc.)

Tested on

  • iOS
  • Android

Testing instructions

  • nlp app → Text Embeddings: the screen seeds a sentence library; type a query and tap Find similar to rank the sentences by cosine similarity, and switch models via the chips. Verified on a physical Android device (arm64): all-MiniLM-L6-v2 returns 384-dim L2-normalized embeddings (~25 ms/forward on XNNPACK); DistilUSE ranks correctly with a wide similarity spread (previously compressed by padding).
  • computer-vision app → Image Embeddings: pick an image and rank editable text labels via CLIP zero-shot (image vs. text embeddings). Verified on device.
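Both demos rank candidates by similarity to a query embedding. A minimal sketch of that ranking step is shown below, assuming every embedding is already L2-normalized so that cosine similarity reduces to a dot product. The `Candidate` type and the `dot` and `rankBySimilarity` helpers are hypothetical names for this example, not code from the demo screens.

```typescript
// Hypothetical candidate: a label (sentence or text prompt) plus its embedding.
interface Candidate {
  text: string;
  embedding: Float32Array;
}

// Dot product; equals cosine similarity only when both vectors have unit L2 norm.
function dot(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) {
    throw new Error(`Embedding size mismatch: ${a.length} vs ${b.length}`);
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

// Returns the candidates with their scores, highest similarity first.
function rankBySimilarity(
  query: Float32Array,
  candidates: Candidate[],
): { text: string; score: number }[] {
  return candidates
    .map((c) => ({ text: c.text, score: dot(query, c.embedding) }))
    .sort((x, y) => y.score - x.score);
}
```

The same function covers the zero-shot case: the query is the image embedding and the candidates are the embedded text labels.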

Screenshots

Related issues

#1247

Checklist

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated the documentation accordingly
  • My changes generate no new warnings

Additional notes

DistilUSE and CLIP (text) are re-exported with the get_dynamic_dims method and pinned to v0.10.0; the remaining text-embedding models (all-MiniLM-L6-v2, all-mpnet-base-v2, multi-qa MiniLM/MPNet, paraphrase-ML) still need re-export to v0.10.0.

Add int64/Long tensor dtype support and text/image embeddings tasks,
hooks, and model registry entries, plus an interactive text-embeddings
demo screen in apps/nlp.

Closes #1247
@msluszniak msluszniak self-assigned this Jun 30, 2026
@msluszniak msluszniak linked an issue Jun 30, 2026 that may be closed by this pull request
@msluszniak msluszniak added the feature PRs that implement a new feature label Jun 30, 2026
model.execute now validates dynamically-shaped forward inputs against the
model-declared [min, max, step] bounds exposed by an optional
get_dynamic_dims method, instead of requiring an exact shape match; models
without it keep exact per-dimension validation. Text embeddings feed the
exact token length with no padding, which fixes scale-sensitive pooling
heads (e.g. DistilUSE's tanh projection).

Point DistilUSE at v0.10.0 (re-exported with get_dynamic_dims).
Comment thread apps/nlp/app/text-embeddings/index.tsx Outdated
Comment thread apps/nlp/app/text-embeddings/index.tsx Outdated
Comment thread apps/nlp/app/text-embeddings/index.tsx Outdated
Comment thread packages/react-native-executorch/cpp/core/model.cpp Outdated
Comment thread packages/react-native-executorch/cpp/core/model.cpp Outdated
Comment thread apps/nlp/app/text-embeddings/index.tsx
…mbeddings demo

- Simplify text-embeddings cosine to a dot product (all models L2-normalize)
  and drop redundant inline comments.
- Move the get_dynamic_dims / input-validation contract into the
  ModelHostObject class docs; trim the inline narration in model.cpp.
- Add an Image Embeddings example to the computer-vision app: pick two images
  and compare their CLIP embeddings by cosine similarity.
Rework the computer-vision Image Embeddings screen (based on main's CLIP demo):
pick an image and rank editable text labels by CLIP image/text embedding
similarity, instead of the uninformative two-image score. Pads the scroll
content past the Android nav bar.

Point CLIP text + image at v0.10.0 (text re-exported with get_dynamic_dims;
image unchanged) and declare the textEmbeddings feature in the app.
- model.{h,cpp}: read get_dynamic_dims once per model and cache it instead
  of re-executing the method on every forward() call; reject a present-but-
  malformed declaration (wrong dtype/rank/shape, bad min/max/step, or row
  count not matching forward's tensor input dims) with an explicit error
  instead of silently falling back to exact validation.
- textEmbeddings: throw a clear error when input tokenizes to zero tokens
  (was BigInt(undefined)); fix docstring to match no-padding behavior.
- useTextEmbeddings: expose localPath/tokenizerPath like sibling hooks.
- computer-vision: extract shared skImageToBuffer helper, dedup from
  classification and imageEmbeddings screens.
- Use unordered_set::contains instead of count()==0
  (readability-container-contains).
- Keep the new dynamic-bounds cache members public so the class stays an
  all-public data carrier; adding private member variables had broken the
  non-private-member exemption and flagged the existing public members.
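The read-once-and-cache behavior and the rejection of malformed declarations can be illustrated with the TypeScript sketch below. It is not a port of model.{h,cpp}: `RawBoundsDeclaration`, `parseBoundsDeclaration`, and `CachedBounds` are hypothetical names, the `[rows, 3]` layout and the `0 <= min <= max`, `step >= 1` rules are assumptions, and the dtype check is omitted.

```typescript
// Assumed raw form of the declaration: a row-major [rows, 3] tensor with one
// [min, max, step] row per input dimension of forward.
interface RawBoundsDeclaration {
  shape: number[];
  data: number[];
}

type BoundsRow = [number, number, number]; // [min, max, step]

// Throws an explicit error for a malformed declaration instead of ignoring it.
function parseBoundsDeclaration(raw: RawBoundsDeclaration, forwardInputDims: number): BoundsRow[] {
  if (raw.shape.length !== 2 || raw.shape[1] !== 3) {
    throw new Error(`get_dynamic_dims: expected shape [rows, 3], got [${raw.shape.join(", ")}]`);
  }
  const rows = raw.shape[0] as number;
  if (raw.data.length !== rows * 3) {
    throw new Error("get_dynamic_dims: data length does not match the declared shape");
  }
  if (rows !== forwardInputDims) {
    throw new Error(`get_dynamic_dims: ${rows} rows for ${forwardInputDims} input dims`);
  }
  const parsed: BoundsRow[] = [];
  for (let r = 0; r < rows; r++) {
    const [min, max, step] = raw.data.slice(r * 3, r * 3 + 3) as BoundsRow;
    const integers = [min, max, step].every((v) => Number.isInteger(v));
    if (!integers || min < 0 || min > max || step < 1) {
      throw new Error(`get_dynamic_dims: bad [min, max, step] in row ${r}`);
    }
    parsed.push([min, max, step]);
  }
  return parsed;
}

// Reads the declaration at most once; null means the model does not declare
// bounds and exact per-dimension validation applies.
class CachedBounds {
  loaded = false;
  bounds: BoundsRow[] | null = null;
  read: () => RawBoundsDeclaration | null;
  forwardInputDims: number;

  constructor(read: () => RawBoundsDeclaration | null, forwardInputDims: number) {
    this.read = read;
    this.forwardInputDims = forwardInputDims;
  }

  get(): BoundsRow[] | null {
    if (!this.loaded) {
      const raw = this.read();
      this.bounds = raw === null ? null : parseBoundsDeclaration(raw, this.forwardInputDims);
      this.loaded = true;
    }
    return this.bounds;
  }
}
```

Calling `get()` before each forward pass then costs a field read after the first call, and a broken declaration surfaces as an error rather than as a silent switch to exact validation.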
@msluszniak msluszniak marked this pull request as ready for review July 1, 2026 16:07
@msluszniak msluszniak requested a review from barhanc July 1, 2026 16:07

Labels

feature (PRs that implement a new feature), refactoring

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[RNE Rewrite] Add image and text embeddings pipelines

1 participant