added einops for embedding models and simplified accuracy description #4207
Pull request overview
Updates demo documentation around accuracy evaluation and model export, and adds a missing Python dependency (einops) needed by some embedding/export workflows.
Changes:
- Simplifies continuous batching accuracy demo instructions by linking to other deployment demos and updates the VLM evaluation command.
- Adds `einops` to the export-models demo Python requirements.
- Replaces a long CLI help “Expected Output” block in the export-models README with a short compatibility note about `transformers` versions.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| demos/continuous_batching/accuracy/README.md | Simplifies server startup guidance (links to other demos) and adjusts VLM eval command; retains example outputs. |
| demos/common/export_models/requirements.txt | Adds einops dependency to export-model requirements. |
| demos/common/export_models/README.md | Removes verbose help output and adds a note about potential transformers version requirements. |
| ## Starting the model server | ||
|
|
||
| ### With Docker | ||
| ```bash | ||
| docker run -d --rm -p 8000:8000 -v $(pwd)/models:/workspace:ro openvino/model_server:latest --rest_port 8000 --config_path /workspace/config.json | ||
| ``` | ||
|
|
||
| ### On Baremetal | ||
| ```bash | ||
| ovms --rest_port 8000 --config_path ./models/config.json | ||
| ``` | ||
| Example of LLM and VLM models deployment is documented in other demos like | ||
| [Agentic usage for LLM models](../agentic_ai/README.md) | ||
| [Using VLM models](../vlm/README.md) |
| python -m lmms_eval \ | ||
| --model openai_compatible \ | ||
| --model_args model_version=OpenGVLab/InternVL2_5-8B,max_retries=1 \ | ||
| --model_args model_version=OpenVINO/InternVL2_5-8B_int4-ov,max_retries=1 \ | ||
| --tasks mme,mmmu_val \ | ||
| --batch_size 1 \ |
| --enable_tool_guided_generation | ||
| Enables enforcing tool schema during generation. Requires setting tool_parser | ||
| ``` | ||
| > Note: Exporting some models might require different transformers version than specified in requirements.txt Check [supported models](https://openvinotoolkit.github.io/openvino.genai/docs/supported-models/). If custom transformers version is required, install it afterwards via `pip install transformers==<version>` |
| @@ -14,33 +14,17 @@ Install the framework via pip: | |||
Do we want to check the pip install command if no other command is checked?
| sentencepiece # Required by: transformers` | ||
| torchvision | ||
| requests | ||
| einops |
Alibaba model still wasn't exported:
python3 export_model.py embeddings_ov --source_model Alibaba-NLP/gte-large-en-v1.5 --extra_quantization_params "--library sentence_transformers" --weight-format fp16 --config_file_path models/config_all.json
RuntimeError: Couldn't get TorchScript module by tracing.
Exception:
index 2314885530818453536 is out of bounds for dimension 0 with size 16
Please check correctness of provided 'example_input'. Sometimes models can be converted in scripted mode, please try running conversion without 'example_input'.
You can also provide TorchScript module that you obtained yourself, please refer to PyTorch documentation: https://pytorch.org/tutorials/beginner/Intro_to_TorchScript_tutorial.html.
Traceback (most recent call last):
File "/opt/home/k8sworker/ngroza/test/model_server/demos/common/export_models/export_model.py", line 687, in
export_embeddings_model_ov(args['model_repository_path'], args['source_model'], args['model_name'], args['precision'], template_parameters, args['config_file_path'], args['truncate'])
File "/opt/home/k8sworker/ngroza/test/model_server/demos/common/export_models/export_model.py", line 520, in export_embeddings_model_ov
raise ValueError("Failed to export embeddings model", source_model)
ValueError: ('Failed to export embeddings model', 'Alibaba-NLP/gte-large-en-v1.5')
that is one of the models that require transformers<5
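A quick pre-flight check along these lines could flag the incompatibility before a long export run. This is only a sketch: the `transformers<5` bound comes from the comment above, while the helper names `parse_major` and `transformers_major_ok` are hypothetical and not part of `export_model.py`.

```python
import re
from importlib import metadata


def parse_major(version: str) -> int:
    """Return the leading integer of a version string such as '4.53.2'."""
    match = re.match(r"\d+", version)
    if match is None:
        raise ValueError(f"Cannot parse major version from {version!r}")
    return int(match.group(0))


def transformers_major_ok(max_major_exclusive: int = 5) -> bool:
    """True if transformers is installed and its major version is below the bound.

    The default bound of 5 is an assumption taken from the review comment.
    """
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return False
    return parse_major(installed) < max_major_exclusive


if __name__ == "__main__":
    if not transformers_major_ok():
        print("This model may need transformers<5; see the note in the export README.")
```

If the check fails, the README note in this PR suggests installing the required version afterwards via `pip install transformers==<version>`.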
| python -m lmms_eval \ | ||
| --model openai_compatible \ | ||
| --model_args model_version=OpenGVLab/InternVL2_5-8B,max_retries=1 \ | ||
| --model_args model_version=OpenVINO/InternVL2_5-8B_int4-ov,max_retries=1 \ |
there is no such model in OV collection: https://huggingface.co/OpenVINO/models?search=intern
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
| ```text | ||
| export OPENAI_BASE_URL=http://localhost:8000/v3 | ||
| bfcl generate --model ovms-model-stream --test-category simple_python,multiple --temperature 0.0 --num-threads 100 -o --result-dir model_name_dir | ||
| bfcl generate --model ovms-model-stream --test-category simple_python,multiple,multi_turn_base --temperature 0.0 --num-threads 10 -o --result-dir model_name_dir |
Won't this be too much for a demo? Time-wise, it will take much longer to execute with multi turn.
Also, you only add it for the streaming path. Shouldn't we align unary as well if we choose to go with multi turn?

This command is excluded from regular tests, so CI is not impacted. For new models, the simple categories all show high and similar results. The real difference is visible in complex scenarios. That is the reason for adding multi turn in the command example.

How about the unary path a few lines above? I think it should be modified too.
| dest='dataset') | ||
| parser.add_argument('--embed_dim', type=int, default=None, help='Embedding dimension. Auto-detected if not provided.', | ||
| dest='embed_dim') | ||
| parser.add_argument('--max_tokens', type=int, default=999999, help='Max input tokens for truncation. default: 512', |
default does not match help description
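One way to keep the two from drifting apart is argparse's `%(default)s` placeholder, which interpolates the actual default into the help text. A minimal sketch, assuming the 999999 value from the diff is the intended default (it may not be); `DEFAULT_MAX_TOKENS` and `build_parser` are illustrative names only.

```python
import argparse

# Assumption: value copied from the diff; the intended default may be 512 instead.
DEFAULT_MAX_TOKENS = 999999


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose help text always reports the real default."""
    parser = argparse.ArgumentParser(description="Hypothetical sketch, not the PR's script.")
    parser.add_argument(
        '--max_tokens',
        type=int,
        default=DEFAULT_MAX_TOKENS,
        # %(default)s is expanded by argparse when the help is rendered.
        help='Max input tokens for truncation. default: %(default)s',
        dest='max_tokens',
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.max_tokens)
```

With this pattern, changing the constant updates both the behavior and the `--help` output in one place.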
Co-authored-by: Trawinski, Dariusz <dariusz.trawinski@intel.com>
| export OPENAI_API_KEY="unused" | ||
| git clone https://github.com/EvolvingLMMs-Lab/lmms-eval | ||
| cd lmms-eval | ||
| git checkout 88b23e2bfa16a1edbc16e9e238ed82130b3a4f56 |
| export CHAT_TEMPLATE_KWARGS='{"enable_thinking":false, "reasoning_effort":"low"}' | ||
| export CHAT_TEMPLATE_KWARGS='{"enable_thinking":false, "reasoning_effort":"low", "preserve_reasoning":false}' | ||
|
|
||
| bfcl generate --model ovms-model --test-category simple_python,multiple --temperature 0.0 --num-threads 100 -o --result-dir model_name_dir |
| bfcl generate --model ovms-model --test-category simple_python,multiple --temperature 0.0 --num-threads 100 -o --result-dir model_name_dir | |
| bfcl generate --model ovms-model --test-category simple_python,multiple,multi_turn_base --temperature 0.0 --num-threads 10 -o --result-dir model_name_dir |
Co-authored-by: Miłosz Żeglarski <milosz.zeglarski@intel.com>
🛠 Summary
CVS-186324