21 changes: 20 additions & 1 deletion docs/t-sql/statements/create-external-model-transact-sql.md
Expand Up @@ -388,8 +388,27 @@ Next, download a version of [ONNX Runtime](https://github.com/microsoft/onnxrunt

Download and build [the `tokenizers-cpp` library](https://github.com/mlc-ai/tokenizers-cpp/tree/main) from GitHub. After you build the DLL, place it in the `C:\onnx_runtime` directory.

The tokenizer must be compiled as a dynamic-link library (DLL) with MSVC, and the DLL must export the following entry point:

```cpp
#include "tokenizers_cpp.h" // for example: `tokenizers-cpp\include\tokenizers_cpp.h`
#include <string>
#include <vector>

extern "C" __declspec(dllexport)
void LoadBlobJsonAndEncode(
    const std::string& json_blob, // contents of `tokenizer.json`
    const std::string& text,      // input text to tokenize
    std::vector<int>& out_ids     // output: token IDs produced for `text`
) {
    // ~~ Implement according to current API of `tokenizers-cpp` ~~
    // auto tok = tokenizers::Tokenizer::FromBlobJSON(json_blob);
    // out_ids = tok->Encode(text);
}
```
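
To make the calling contract of the export concrete, the following is a minimal, self-contained sketch that compiles without `tokenizers-cpp`. It is an illustration under stated assumptions, not the required implementation: `StubEncode` and the `TOKENIZER_EXPORT` macro are hypothetical names introduced here, and the stub returns word lengths in place of real token IDs. Only the exported name and parameter list mirror the entry point shown above.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper macro: expands to __declspec(dllexport) on Windows and
// to nothing elsewhere, so the sketch also compiles outside MSVC.
#if defined(_WIN32)
#define TOKENIZER_EXPORT __declspec(dllexport)
#else
#define TOKENIZER_EXPORT
#endif

// Hypothetical stand-in for a real tokenizer: splits on whitespace and uses
// each word's length as a placeholder "token ID". A real build would call
// into tokenizers-cpp here instead.
static std::vector<int> StubEncode(const std::string& text) {
    std::vector<int> ids;
    std::istringstream words(text);
    std::string word;
    while (words >> word) {
        ids.push_back(static_cast<int>(word.size()));
    }
    return ids;
}

// Same name and parameter list as the entry point described above.
// extern "C" keeps the exported symbol name unmangled.
extern "C" TOKENIZER_EXPORT
void LoadBlobJsonAndEncode(
    const std::string& json_blob, // contents of tokenizer.json (ignored by the stub)
    const std::string& text,      // input text to tokenize
    std::vector<int>& out_ids     // output: overwritten with the resulting IDs
) {
    (void)json_blob;              // the stub needs no tokenizer definition
    out_ids = StubEncode(text);
}
```

In this sketch the function overwrites `out_ids` rather than appending to it, which matches the assignment in the commented-out lines of the snippet above. Swapping `StubEncode` for the real `tokenizers-cpp` calls should leave the exported signature unchanged.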

> [!NOTE]
> The exact signature of this export might change. Ensure the created DLL is named **tokenizers_cpp.dll**.

### Step 5: Download the ONNX model
