Skip to content

[RNE Rewrite] Extend core functionality: tensor views, dtypes#1293

Open
IgorSwat wants to merge 3 commits into
rne-rewritefrom
@is/core
Open

[RNE Rewrite] Extend core functionality: tensor views, dtypes#1293
IgorSwat wants to merge 3 commits into
rne-rewritefrom
@is/core

Conversation

@IgorSwat

Copy link
Copy Markdown
Contributor

Description

This PR introduces the following changes to the core of the package:

  • Adds tensor view implementation and Typescript API for creating views
  • Extends core support for new dtypes: int64, bool

Introduces a breaking change?

  • Yes
  • No

Type of change

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Documentation update (improves or adds clarity to existing documentation)
  • Other (chores, tests, code style improvements etc.)

Tested on

  • iOS
  • Android

Testing instructions

Run existing demo apps / tests (if there are any).

Screenshots

Related issues

Checklist

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated the documentation accordingly
  • My changes generate no new warnings

Additional notes

Some of the changes are there because of the followup commit with text-to-speech pipeline implementation.

@barhanc barhanc self-requested a review June 30, 2026 13:26
@barhanc barhanc added refactoring feature PRs that implement a new feature labels Jun 30, 2026
@msluszniak msluszniak self-requested a review June 30, 2026 16:22

@msluszniak msluszniak left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Long one, but please read it carefully. Regarding the whole PR, here is my take on that:

  • Addition of new types namely bool and int64 is definitely needed and we should keep it.

  • Regarding the view, I think that eventually we should also keep it, but this one needs big design changes. The object you call "a view" is not what view usually means. You cannot reshape that as in NumPy or Torch. It has arbitrary offset and shape in comparison to parent, actual tensor. It's literally: "reinterpret a contiguous byte sub-range as a new tensor". So it should be rather call it slice or subTensor. Another thing is that offset is in bytes and shape is in elements number. You should keep the offset in number of elements, otherwise you:

    1. User has to compute N * dtype.size() in JS - strange, mixing concepts,
    2. It's unidiomatic (NumPy and Torch offset are in elements),
    3. A non-modulo size of element offset in bytes causes misaligned ptr -> undefined behaviour.

    The third problem: This codebase cares about thread safety. Bartek designed this carefully to make it possible, namely copyTo/setData/getData/dispose does try_to_lock on mutex_. But each TensorHostObject has its own mutex_, and a view is a separate object pointing to the parent's bytes. So parent.setData() (locks parent.mutex_) and view.getData() (locks view.mutex_) can run concurrently on the same memory — each happily holding its own lock. The "one mutex guards one memory region" rule is broken the moment a view exists. That's a data race the current single-tensor code can't have. You should rather:

    1. Hold a shared_ptr to the parent in the view → fixes the use-after-free and makes the "remember to dispose the parent last" comment unnecessary (ownership enforces it),
    2. Share the parent's mutex_ (the view should lock the same mutex) → fixes the aliasing race.

    Yet another thing, you should add the check: offset_elems + numel <= parent.numel → fixes OOB and the alignment hazard.

  • Regarding the DType rewrite (free functions → class): I'd drop this one. As mentioned, the new bool/int64 values are needed, but those don't require the class at all. You get 100% of the dtype benefit with none of the class's churn. The class itself doesn't pull its weight, and I think it's actually a small step back:

    1. It wraps a single uint8_t and protects no invariant the validated free functions didn't already guarantee. The comment "leaves space for additional logic if needed" is speculative — YAGNI.
    2. The implicit conversions are footguns, especially the string pair. With both DType(const std::string&) and operator std::string() implicit, dtype == "float32" now compiles — it resolves via DTypestd::string and does a heap-allocating string compare instead of an enum compare, silently bypassing the type system. On top of that, operator std::string() and operator ScalarType() can throw, so an innocuous std::string s = dtype; carries a throw.
    3. dtype.size() is ambiguous sitting next to TensorView::size_ (total tensor bytes) and numel_. elementSize(dtype) said exactly what it returned. Same for construction: parseDType(s) / fromScalarType(s) documented "parses, may throw" / "maps from aten" at the call site; DType(x) hides that behind a constructor.

    So: keep the new enum values, drop the class. If we ever do want a wrapper later, it should at minimum be explicit on the string boundaries — but I don't think we need it now.

  • We cannot just disable the input / output checks for all models and just merge it. W need to figure out some solution for this one before progressing further. One way is to reexport models with meta methods that returns dynamic shapes of the model. This way we can reimplement our checks to work with models such as Kokoro.

  • If we do not enforce linebreaks in clang-format, maybe we should increase the suggested max signs in a row to a higher value than 60, so we do not imply so many incorrectly formatted lines as here in tensor.cpp, cc: @barhanc

  • I know that these warnings raise by readability-implicit-bool-conversion are preexisting, but we should fix them.

@barhanc

barhanc commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

I think @msluszniak's analysis is spot on. My inherent reluctance to adding views is exactly because it makes reasoning about tensor lifetimes and concurrency difficult -- both of which are the most fragile aspects of the new architecture implementation and must be taken with utmost care, since if they aren't ironclad everything else can fall apart. At the same time I agree that it is something we actually might want to add to Tensor since it might make some things cleaner. My two cents regarding views' implementation:

  • The name and the signature should be changed to better convey what this does. While I agree that in terms of the underlying data_ all it does is taking a contiguous slice, the reshape aspect is also important, because it interprets this slice using passed shape in the ExecuTorch tensor. This allows e.g. for clean squeeze/unsqueeze (interpreting e.g. [H,W,C] as [1,H,W,C]) without totally redundant copy as it is done right now for example in semantic segmentation #L190 and which could be done using view with offset=0. I was thinking something along the lines sliceAs(shape, { offset? }) but this is certainly not ideal.
  • I would drop the TensorView entirely. I don't really understand the reason it exists in the first place, it definitely doesn't separate concerns as in the SRP since TensorHostObject inherits from TensorView, those are not separate entities with different responsibilities.
  • The most important aspect is how we handle lifetimes and concurrency. In my opinion the behaviour in TS should be something like this:
    • We need an additional property in Tensor that tells us if the given tensor object is owning or non-owning.
    • .view() returns a standard tensor with this property set as non-owning.
    • Trying to dispose a non-owning tensor should throw.
    • Disposing the parent tensor should invalidate all views (trying to use it after free should throw on all the existing checks 'has already been disposed')
    • Acquiring a unique lock on a view should prevent any access to other views on the same memory and the parent (they should throw on existing 'tensor is currently in use' checks).

Ideally it would be possible to add views as purely additive change that wouldn't require changing any of the validation code in cpp ops outside core/.


Regarding DType I have nothing to add, the analysis perfectly conveys what I tried to say in the office. One additional thing though is that I would appreciate if the introduced changes (these and ones in the future) were mostly additive. We want to prevent situations like in the old architecture where addition of new task had to touch a lot of shared code since this makes it prone to introduce regressions, unnecessary coupling, etc. We should first thing, try to work around what already is implemented and add new helpers, primitives, etc., not change existing shared code (especially the core). The core was designed this way to be easily extensible and it proved to work well for all cv models, tokenizers and poc llms. There certainly will be moments (e.g. recent need for dynamic methods validation) where the change of shared code is unavoidable but such changes should be well thought-out and justified (and in the case of core introduced as separate PRs); they shouldn't be introduced because "I prefer it this way" or "it will maybe make something faster in the future" (YAGNI).


If we do not enforce linebreaks in clang-format, maybe we should increase the suggested max signs in a row to a higher value than 60, so we do not imply so many incorrectly formatted lines as here in tensor.cpp

Do you mean the lines already present in the code or the whitespace changes introduced in this PR? The whitespace changes shouldn't be there and they are probably because the files were first formatted with columnWidth set to something and then after rebase introduced our column-width-lenient .clang-format config the formatter didn't change them back. They should be manually reverted.


I know that these warnings raise by readability-implicit-bool-conversion are preexisting, but we should fix them.

These are not preexisting (the clang-tidy is clean when run on the current rne-rewrite branch). Before, the ->data_ was a smart pointer, this PR changed it so that ->data_ is a raw uint8_t* pointer, thus the warnings.

@msluszniak

Copy link
Copy Markdown
Member

Do you mean the lines already present in the code or the whitespace changes introduced in this PR? The whitespace changes shouldn't be there and they are probably because the files were first formatted with columnWidth set to something and then after rebase introduced our column-width-lenient .clang-format config the formatter didn't change them back. They should be manually reverted.

Right, so let's just reformat the code.

These are not preexisting (the clang-tidy is clean when run on the current rne-rewrite branch). Before, the ->data_ was a smart pointer, this PR changed it so that ->data_ is a raw uint8_t* pointer, thus the warnings.

Oh, forget that it was smart pointer which has an explicit bool operator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature PRs that implement a new feature refactoring

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants