added einops for embedding models and simplified accuracy description by dtrawins · Pull Request #4207 · openvinotoolkit/model_server

dtrawins · 2026-05-13T15:07:25Z

🛠 Summary

CVS-186324

🧪 Checklist

Unit tests added.
The documentation updated.
Change follows security best practices.

Copilot

Pull request overview

Updates demo documentation around accuracy evaluation and model export, and adds a missing Python dependency (einops) needed by some embedding/export workflows.

Changes:

- Simplifies continuous batching accuracy demo instructions by linking to other deployment demos and updates the VLM evaluation command.
- Adds einops to the export-models demo Python requirements.
- Replaces a long CLI help “Expected Output” block in the export-models README with a short compatibility note about transformers versions.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| `demos/continuous_batching/accuracy/README.md` | Simplifies server startup guidance (links to other demos) and adjusts VLM eval command; retains example outputs. |
| `demos/common/export_models/requirements.txt` | Adds `einops` dependency to export-model requirements. |
| `demos/common/export_models/README.md` | Removes verbose help output and adds a note about potential `transformers` version requirements. |

 ## Starting the model server

-### With Docker
-```bash
-docker run -d --rm -p 8000:8000 -v $(pwd)/models:/workspace:ro openvino/model_server:latest --rest_port 8000 --config_path /workspace/config.json
-```

-### On Baremetal
-```bash
-ovms --rest_port 8000 --config_path ./models/config.json
-```
+Example of LLM and VLM models deployment is documented in other demos like
+[Agentic usage for LLM models](../agentic_ai/README.md) 
+[Using VLM models](../vlm/README.md)


 python -m lmms_eval \
    --model openai_compatible \
-    --model_args model_version=OpenGVLab/InternVL2_5-8B,max_retries=1 \
+    --model_args model_version=OpenVINO/InternVL2_5-8B_int4-ov,max_retries=1 \
    --tasks mme,mmmu_val \
    --batch_size 1 \


-  --enable_tool_guided_generation
-                        Enables enforcing tool schema during generation. Requires setting tool_parser
-```
+> Note: Exporting some models might require different transformers version than specified in requirements.txt Check [supported models](https://openvinotoolkit.github.io/openvino.genai/docs/supported-models/). If custom transformers version is required, install it afterwards via `pip install transformers==<version>`


ngrozae · 2026-05-18T10:13:35Z

Do we want to check the `pip install` command if no other command is checked?

ngrozae · 2026-05-18T10:29:23Z

 sentencepiece  # Required by: transformers`
 torchvision
 requests
+einops


Alibaba model still wasn't exported:
python3 export_model.py embeddings_ov --source_model Alibaba-NLP/gte-large-en-v1.5 --extra_quantization_params "--library sentence_transformers" --weight-format fp16 --config_file_path models/config_all.json

RuntimeError: Couldn't get TorchScript module by tracing.
Exception:
index 2314885530818453536 is out of bounds for dimension 0 with size 16
Please check correctness of provided 'example_input'. Sometimes models can be converted in scripted mode, please try running conversion without 'example_input'.
You can also provide TorchScript module that you obtained yourself, please refer to PyTorch documentation: https://pytorch.org/tutorials/beginner/Intro_to_TorchScript_tutorial.html.
Traceback (most recent call last):
File "/opt/home/k8sworker/ngroza/test/model_server/demos/common/export_models/export_model.py", line 687, in
export_embeddings_model_ov(args['model_repository_path'], args['source_model'], args['model_name'], args['precision'], template_parameters, args['config_file_path'], args['truncate'])
File "/opt/home/k8sworker/ngroza/test/model_server/demos/common/export_models/export_model.py", line 520, in export_embeddings_model_ov
raise ValueError("Failed to export embeddings model", source_model)
ValueError: ('Failed to export embeddings model', 'Alibaba-NLP/gte-large-en-v1.5')

That is one of the models that require `transformers<5`.
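
A minimal sketch of how the `transformers<5` constraint mentioned above could be checked before running an export. The helper names are hypothetical and are not part of `export_model.py`; the check compares only the major version, which is an assumption made to keep the example short.

```python
from importlib import metadata
from typing import Optional


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '4.53.2'."""
    return int(version.split(".", 1)[0])


def satisfies_upper_bound(version: str, below_major: int = 5) -> bool:
    """True when `version` is strictly below the given major version."""
    return major_version(version) < below_major


def installed_transformers_version() -> Optional[str]:
    """Look up the installed transformers version, or None if it is absent."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if found is None:
        print("transformers is not installed")
    elif satisfies_upper_bound(found):
        print(f"transformers {found} satisfies transformers<5")
    else:
        print(f"transformers {found} is too new for models that require transformers<5")
```

If the check fails, the note added to the export-models README applies: install the required version afterwards via `pip install transformers==<version>`.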

pgladkows · 2026-05-18T11:20:08Z

 python -m lmms_eval \
    --model openai_compatible \
-    --model_args model_version=OpenGVLab/InternVL2_5-8B,max_retries=1 \
+    --model_args model_version=OpenVINO/InternVL2_5-8B_int4-ov,max_retries=1 \


There is no such model in the OpenVINO collection: https://huggingface.co/OpenVINO/models?search=intern

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

mzegla · 2026-05-20T10:02:49Z

 ```text
 export OPENAI_BASE_URL=http://localhost:8000/v3
-bfcl generate --model ovms-model-stream --test-category simple_python,multiple --temperature 0.0 --num-threads 100 -o --result-dir model_name_dir
+bfcl generate --model ovms-model-stream --test-category simple_python,multiple,multi_turn_base --temperature 0.0 --num-threads 10 -o --result-dir model_name_dir


Won't this be too much for a demo? Time-wise, it will take much longer to execute with multi-turn.
Also, you only add it for the streaming path. Shouldn't we align the unary path as well if we choose to go with multi-turn?

This command is excluded from regular tests, so CI is not impacted. For new models, the simple categories all show high and similar results. The real difference is visible in complex scenarios. That is the reason for adding multi-turn to the command example.

How about the unary path a few lines above? I think it should be modified too.
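
One way to keep the unary and streaming commands aligned is to generate both from a single list of test categories. The sketch below is only an illustration: the flags and values are copied from the commands in this thread, while the `bfcl_command` helper and the constant names are hypothetical.

```python
import shlex

# Categories and thread count as they appear in the updated streaming command.
TEST_CATEGORIES = ["simple_python", "multiple", "multi_turn_base"]
NUM_THREADS = 10


def bfcl_command(model: str, result_dir: str = "model_name_dir") -> str:
    """Build a `bfcl generate` command line for the given model name."""
    args = [
        "bfcl", "generate",
        "--model", model,
        "--test-category", ",".join(TEST_CATEGORIES),
        "--temperature", "0.0",
        "--num-threads", str(NUM_THREADS),
        "-o",
        "--result-dir", result_dir,
    ]
    return shlex.join(args)


# Both paths share the same categories, so they cannot drift apart.
unary = bfcl_command("ovms-model")
streaming = bfcl_command("ovms-model-stream")
print(unary)
print(streaming)
```

With this arrangement, adding or dropping a category such as `multi_turn_base` changes both commands at once.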

mzegla · 2026-05-20T10:03:39Z

                    dest='dataset')
+parser.add_argument('--embed_dim', type=int, default=None, help='Embedding dimension. Auto-detected if not provided.',
+                    dest='embed_dim')
+parser.add_argument('--max_tokens', type=int, default=999999, help='Max input tokens for truncation. default: 512',


The default (999999) does not match the help description (512).
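
A possible way to prevent this kind of mismatch is to let `argparse` fill the default into the help text through its `%(default)s` placeholder. The sketch below is an assumption about how the two arguments could be declared, not the code that was merged; the value 512 is taken from the help string for illustration only, and `build_parser` is a hypothetical wrapper.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the two arguments with help text derived from the default."""
    parser = argparse.ArgumentParser(description="Illustrative parser, not the demo script itself")
    parser.add_argument('--embed_dim', type=int, default=None,
                        help='Embedding dimension. Auto-detected if not provided.',
                        dest='embed_dim')
    # argparse expands %(default)s when rendering help, so the documented
    # value always follows whatever is passed as `default`.
    parser.add_argument('--max_tokens', type=int, default=512,
                        help='Max input tokens for truncation. default: %(default)s',
                        dest='max_tokens')
    return parser


if __name__ == "__main__":
    build_parser().print_help()
```

Changing `default=512` to any other value updates the rendered help automatically, so the two cannot disagree.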

Co-authored-by: Trawinski, Dariusz <dariusz.trawinski@intel.com>

pgladkows · 2026-05-27T08:44:42Z

 export OPENAI_API_KEY="unused"
 git clone https://github.com/EvolvingLMMs-Lab/lmms-eval
 cd lmms-eval
-git checkout 88b23e2bfa16a1edbc16e9e238ed82130b3a4f56


Why is it removed?

dtrawins · 2026-05-27T09:37:41Z

-export CHAT_TEMPLATE_KWARGS='{"enable_thinking":false, "reasoning_effort":"low"}'
+export CHAT_TEMPLATE_KWARGS='{"enable_thinking":false, "reasoning_effort":"low", "preserve_reasoning":false}'

 bfcl generate --model ovms-model --test-category simple_python,multiple --temperature 0.0 --num-threads 100 -o --result-dir model_name_dir


Suggested change

-bfcl generate --model ovms-model --test-category simple_python,multiple --temperature 0.0 --num-threads 100 -o --result-dir model_name_dir
+bfcl generate --model ovms-model --test-category simple_python,multiple,multi_turn_base --temperature 0.0 --num-threads 10 -o --result-dir model_name_dir

Co-authored-by: Miłosz Żeglarski <milosz.zeglarski@intel.com>

added einops for embedding models and simplified accuracy description

59fa97a

dtrawins requested review from Copilot and ngrozae May 13, 2026 15:07

Copilot started reviewing on behalf of dtrawins May 13, 2026 15:08

Copilot AI reviewed May 13, 2026

dtrawins requested review from michalkulakowski and pgladkows May 15, 2026 13:51

ngrozae reviewed May 18, 2026

pgladkows reviewed May 18, 2026

review fixes

4fb1e5b

dtrawins requested review from ngrozae and pgladkows May 19, 2026 09:03

Apply suggestions from code review

f524f2f

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

michalkulakowski approved these changes May 19, 2026

dtrawins added 4 commits May 19, 2026 13:52

update readme

9f6683b

merge

2255a83

exception and skip tests for gte model

1b627f4

update to latest mteb

1b1eed2

dtrawins requested a review from mzegla May 20, 2026 09:48

mzegla reviewed May 20, 2026

dtrawins commented May 20, 2026

Comment thread demos/embeddings/README.md Outdated

dtrawins commented May 20, 2026

Comment thread demos/embeddings/README.md Outdated

Apply suggestions from code review

1dd3308

Co-authored-by: Trawinski, Dariusz <dariusz.trawinski@intel.com>

dtrawins commented May 27, 2026

Comment thread demos/embeddings/ovms_mteb.py Outdated

dtrawins added 2 commits May 27, 2026 10:31

Apply suggestion from @dtrawins

1badbfb

Merge branch 'main' into CVS-186324

a61b622

ngrozae approved these changes May 27, 2026

pgladkows reviewed May 27, 2026

mzegla reviewed May 27, 2026

Comment thread demos/continuous_batching/accuracy/README.md Outdated

dtrawins commented May 27, 2026

Apply suggestions from code review

b2daabb

Co-authored-by: Miłosz Żeglarski <milosz.zeglarski@intel.com>

mzegla approved these changes May 27, 2026

dtrawins merged commit 09b7911 into main May 27, 2026
1 check passed
