
Commit 242c80d

committed
save
1 parent 43b00ca commit 242c80d

File tree

  • demos/continuous_batching/agentic_ai

1 file changed: +12 additions, -5 deletions

demos/continuous_batching/agentic_ai/README.md

Lines changed: 12 additions & 5 deletions
@@ -131,7 +131,7 @@ ovms.exe --pull --model_repository_path models --source_model OpenVINO/Qwen3-8B-
:sync: Mistral-7B-Instruct-v0.3-int4-ov
```bat
ovms.exe --pull --model_repository_path models --source_model OpenVINO/Mistral-7B-Instruct-v0.3-int4-ov --task text_generation --tool_parser mistral
-curl -L -o models\OpenVINO\Mistral-7B-Instruct-v0.3-int4-ov\chat_template.jinja https://raw.githubusercontent.com/vllm-project/vllm/refs/tags/v0.9.0/examples/tool_chat_template_mistral.jinja
+curl -L -o models\OpenVINO\Mistral-7B-Instruct-v0.3-int4-ov\chat_template.jinja https://raw.githubusercontent.com/vllm-project/vllm/refs/tags/v0.10.1.1/examples/tool_chat_template_mistral_parallel.jinja
```
:::
:::{tab-item} Phi-4-mini-instruct-int4-ov
@@ -411,7 +411,7 @@ Run the agentic application:
:::{tab-item} Qwen3-8B
:sync: Qwen3-8B
```bash
-python openai_agent.py --query "What is the current weather in Tokyo?" --model Qwen/Qwen3-8B --base-url http://localhost:8000/v3 --mcp-server-url http://localhost:8080/sse --mcp-server all --stream --enable-thinking
+python openai_agent.py --query "What is the current weather in Tokyo?" --model Qwen/Qwen3-8B --base-url http://localhost:8000/v3 --mcp-server-url http://localhost:8080/sse --mcp-server weather --stream --enable-thinking
```
```bash
python openai_agent.py --query "List the files in folder /root" --model Qwen/Qwen3-8B --base-url http://localhost:8000/v3 --mcp-server-url http://localhost:8080/sse --mcp-server all
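For reference, the `--stream` flag used above corresponds to a streaming chat completion against the same OVMS OpenAI-compatible endpoint the agent talks to. The sketch below is not the demo's openai_agent.py; it is a minimal illustration, assuming the server started earlier is reachable at http://localhost:8000/v3 and that the API key is ignored.

```python
# Minimal sketch (not the demo's openai_agent.py): stream tokens from the
# OVMS OpenAI-compatible endpoint used in the commands above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v3", api_key="unused")  # key value assumed to be ignored by the server

stream = client.chat.completions.create(
    model="Qwen/Qwen3-8B",
    messages=[{"role": "user", "content": "What is the current weather in Tokyo?"}],
    stream=True,
)
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
```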
@@ -420,7 +420,7 @@ python openai_agent.py --query "List the files in folder /root" --model Qwen/Qwe
:::{tab-item} Qwen3-4B
:sync: Qwen3-4B
```bash
-python openai_agent.py --query "What is the current weather in Tokyo?" --model Qwen/Qwen3-4B --base-url http://localhost:8000/v3 --mcp-server-url http://localhost:8080/sse --mcp-server all --stream
+python openai_agent.py --query "What is the current weather in Tokyo?" --model Qwen/Qwen3-4B --base-url http://localhost:8000/v3 --mcp-server-url http://localhost:8080/sse --mcp-server weather --stream
```
```bash
python openai_agent.py --query "List the files in folder /root" --model Qwen/Qwen3-4B --base-url http://localhost:8000/v3 --mcp-server-url http://localhost:8080/sse --mcp-server all
@@ -435,13 +435,13 @@ python openai_agent.py --query "List the files in folder /root" --model meta-lla
:::{tab-item} Mistral-7B-Instruct-v0.3
:sync: Mistral-7B-Instruct-v0.3
```bash
-python openai_agent.py --query "List the files in folder /root" --model mistralai/Mistral-7B-Instruct-v0.3 --base-url http://localhost:8000/v3 --mcp-server-url http://localhost:8080/sse --mcp-server weather
+python openai_agent.py --query "List the files in folder /root" --model mistralai/Mistral-7B-Instruct-v0.3 --base-url http://localhost:8000/v3 --mcp-server-url http://localhost:8080/sse --mcp-server all --tool_choice required
```
:::
:::{tab-item} Llama-3.2-3B-Instruct
:sync: Llama-3.2-3B-Instruct
```bash
-python openai_agent.py --query "List the files in folder /root" --model meta-llama/Llama-3.2-3B-Instruct --base-url http://localhost:8000/v3 --mcp-server-url http://localhost:8080/sse --mcp-server weather
+python openai_agent.py --query "List the files in folder /root" --model meta-llama/Llama-3.2-3B-Instruct --base-url http://localhost:8000/v3 --mcp-server-url http://localhost:8080/sse --mcp-server all
```
:::
:::{tab-item} Phi-4-mini-instruct
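The `--tool_choice required` flag added to the Mistral command maps onto the `tool_choice="required"` field of an OpenAI-style chat completions request, which forces the model to answer with a tool call rather than free text. A minimal sketch, assuming the OVMS endpoint at http://localhost:8000/v3 and using a hypothetical tool definition (in the demo, the actual tools come from the MCP servers):

```python
# Minimal sketch of tool_choice="required" against the OVMS OpenAI-compatible endpoint.
# The list_files tool below is hypothetical and used only for illustration.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v3", api_key="unused")

tools = [{
    "type": "function",
    "function": {
        "name": "list_files",
        "description": "List files in a directory",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    messages=[{"role": "user", "content": "List the files in folder /root"}],
    tools=tools,
    tool_choice="required",  # the model must return a tool call
)
print(response.choices[0].message.tool_calls)
```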
@@ -537,9 +537,16 @@ input_num_tokens 50.0 2298.92 973.02 520.00 1556.50 2367.00 3100.75
Testing model accuracy is critical for successful adoption in an AI application. The recommended methodology is to use the BFCL tool as described in the [testing guide](../accuracy/README.md#running-the-tests-for-agentic-models-with-function-calls).
Here is an example of the responses from the OpenVINO/Qwen3-8B-int4-ov model:
```
+--test-category simple
{"accuracy": 0.9525, "correct_count": 381, "total_count": 400}
+
+--test-category multiple
{"accuracy": 0.89, "correct_count": 178, "total_count": 200}
+
+--test-category parallel
{"accuracy": 0.89, "correct_count": 178, "total_count": 200}
+
+--test-category irrelevance
{"accuracy": 0.825, "correct_count": 198, "total_count": 240}
```
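The BFCL score lines above are reported per test category; an overall figure can be obtained by pooling the correct and total counts. A small helper, not part of BFCL, assuming the four JSON lines were collected exactly as shown:

```python
# Hedged helper (not part of BFCL): pool the per-category results shown above
# into a single overall accuracy.
import json

score_lines = [
    '{"accuracy": 0.9525, "correct_count": 381, "total_count": 400}',  # simple
    '{"accuracy": 0.89, "correct_count": 178, "total_count": 200}',    # multiple
    '{"accuracy": 0.89, "correct_count": 178, "total_count": 200}',    # parallel
    '{"accuracy": 0.825, "correct_count": 198, "total_count": 240}',   # irrelevance
]
results = [json.loads(line) for line in score_lines]
correct = sum(r["correct_count"] for r in results)
total = sum(r["total_count"] for r in results)
print(f"overall accuracy: {correct / total:.4f} ({correct}/{total})")
```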
