Fix max_prompt_length #3732

atobiszei · 2025-10-27T08:05:48Z

Add `--model_distribution_policy`.
Allow setting an arbitrary `plugin_config` for text generation models via `--plugin_config`.
Allow using the `--cache_dir` parameter for text generation models.

Ticket:CVS-175054

atobiszei · 2025-10-28T14:48:05Z

docs/parameters.md

 | `--dynamic_split_fuse`                | `bool`       | Enables dynamic split fuse algorithm. Default: true.                                                                       |
 | `--max_prompt_len`                    | `integer`    | Sets NPU specific property for maximum number of tokens in the prompt.                                                     |
 | `--kv_cache_precision`                | `string`     | Reduced kv cache precision to `u8` lowers the cache size consumption. Accepted values: `u8` or empty (default).            |
+| `--model_distribution_policy`         | `string`     | TENSOR_PARALLEL distributes tensor to multiple sockets/devices and processes it in parallel. PIPELINE_PARALLEL distributes different tensors to process by each device. Accepted values: `TENSOR_PARALLEL`, `PIPELINE_PARALLEL` or empty (default). |


Suggested change

-| `--model_distribution_policy` | `string` | TENSOR_PARALLEL distributes tensor to multiple sockets/devices and processes it in parallel. PIPELINE_PARALLEL distributes different tensors to process by each device. Accepted values: `TENSOR_PARALLEL`, `PIPELINE_PARALLEL` or empty (default). |
+| `--model_distribution_policy` | `string` | `TENSOR_PARALLEL` distributes tensor to multiple sockets/devices and processes it in parallel. `PIPELINE_PARALLEL` distributes different tensors to process by each device. Accepted values: `TENSOR_PARALLEL`, `PIPELINE_PARALLEL` or empty (default). |
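
For readers following the parameter semantics, here is a minimal sketch of how the accepted values in this row could be validated and folded into a plugin config alongside user-supplied entries. It is not the code from this PR: the helper `build_plugin_config` is hypothetical, and the key names `MODEL_DISTRIBUTION_POLICY` and `MAX_PROMPT_LEN` are assumptions about how the options map to device properties.

```python
import json

# Accepted values as listed in the docs/parameters.md row under review.
ACCEPTED_POLICIES = {"TENSOR_PARALLEL", "PIPELINE_PARALLEL"}


def build_plugin_config(model_distribution_policy="", max_prompt_len=None,
                        extra_plugin_config=None):
    """Hypothetical helper: merge dedicated options into one plugin config dict.

    Entries passed through extra_plugin_config (the --plugin_config analogue)
    are kept as-is; dedicated options are written on top of them.
    """
    config = dict(extra_plugin_config or {})

    # An empty string means "use the default", so no key is written.
    if model_distribution_policy:
        if model_distribution_policy not in ACCEPTED_POLICIES:
            raise ValueError(
                f"Unsupported model_distribution_policy: {model_distribution_policy!r}; "
                f"accepted values: {sorted(ACCEPTED_POLICIES)} or empty"
            )
        # Assumed key name, not confirmed by this PR.
        config["MODEL_DISTRIBUTION_POLICY"] = model_distribution_policy

    if max_prompt_len is not None:
        if max_prompt_len <= 0:
            raise ValueError("max_prompt_len must be a positive integer")
        # Assumed key name, not confirmed by this PR.
        config["MAX_PROMPT_LEN"] = max_prompt_len

    return config


if __name__ == "__main__":
    merged = build_plugin_config(
        model_distribution_policy="TENSOR_PARALLEL",
        extra_plugin_config={"NUM_STREAMS": "1"},
    )
    print(json.dumps(merged, sort_keys=True))
```

Rejecting unknown values up front, rather than passing them through to the device plugin, is one way to get the clearer error messages this PR aims for.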

atobiszei force-pushed the atobisze_max_prompt_len branch 4 times, most recently from e045a0d to 79b6523 Compare October 28, 2025 12:55

atobiszei requested review from dtrawins and rasapala October 28, 2025 14:38

atobiszei added 6 commits October 29, 2025 16:26

Fix max_prompt_length

dc59371

* Add model_distribution_policy

Extend fix to ov cache dir

ba1db3a

Fix test

76bc42c

Improve error messaging

e946ed7

Apply suggestion from @atobiszei

d59d47e

Self-review

2702650

atobiszei force-pushed the atobisze_max_prompt_len branch from b39e465 to 2702650 Compare October 29, 2025 15:26
