
Conversation

Collaborator

@atobiszei atobiszei commented Oct 27, 2025

  • Add `--model_distribution_policy`
  • Enable setting an arbitrary plugin config for text generation models via `--plugin_config`
  • Enable the `--cache_dir` parameter for text generation models

Ticket: CVS-175054
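The second and third bullets can be sketched as a server launch. Only `--plugin_config` and `--cache_dir` come from this PR; the model name, model path, port flag, and the `NUM_STREAMS` property value are illustrative placeholders, not taken from this change:

```shell
# Hypothetical OpenVINO Model Server launch using the options added in this PR.
# --plugin_config forwards arbitrary plugin properties (as JSON) to the device plugin;
# --cache_dir points the runtime at a directory for caching compiled models.
ovms --rest_port 8000 \
     --model_name my-llm \
     --model_path /models/my-llm \
     --plugin_config '{"NUM_STREAMS": "2"}' \
     --cache_dir /tmp/ovms_cache
```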

@atobiszei atobiszei force-pushed the atobisze_max_prompt_len branch 4 times, most recently from e045a0d to 79b6523 Compare October 28, 2025 12:55
| `--dynamic_split_fuse` | `bool` | Enables the dynamic split fuse algorithm. Default: true. |
| `--max_prompt_len` | `integer` | Sets the NPU-specific property for the maximum number of tokens in the prompt. |
| `--kv_cache_precision` | `string` | Reducing the KV cache precision to `u8` lowers cache size consumption. Accepted values: `u8` or empty (default). |
| `--model_distribution_policy` | `string` | TENSOR_PARALLEL distributes tensor to multiple sockets/devices and processes it in parallel. PIPELINE_PARALLEL distributes different tensors to process by each device. Accepted values: `TENSOR_PARALLEL`, `PIPELINE_PARALLEL` or empty (default). |
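The table's options can be combined in one invocation. A hedged sketch follows; the flags documented above are from this PR, while the model name, model path, and port flag are assumed placeholders:

```shell
# Hypothetical invocation combining the options documented in the table:
# u8 KV cache precision to shrink the cache, plus TENSOR_PARALLEL to split
# tensors across multiple sockets/devices and process them in parallel.
ovms --rest_port 8000 \
     --model_name my-llm \
     --model_path /models/my-llm \
     --kv_cache_precision u8 \
     --model_distribution_policy TENSOR_PARALLEL
```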
Collaborator Author
Suggested change
| `--model_distribution_policy` | `string` | TENSOR_PARALLEL distributes tensor to multiple sockets/devices and processes it in parallel. PIPELINE_PARALLEL distributes different tensors to process by each device. Accepted values: `TENSOR_PARALLEL`, `PIPELINE_PARALLEL` or empty (default). |
| `--model_distribution_policy` | `string` | `TENSOR_PARALLEL` distributes tensor to multiple sockets/devices and processes it in parallel. `PIPELINE_PARALLEL` distributes different tensors to process by each device. Accepted values: `TENSOR_PARALLEL`, `PIPELINE_PARALLEL` or empty (default). |

@atobiszei atobiszei force-pushed the atobisze_max_prompt_len branch from b39e465 to 2702650 Compare October 29, 2025 15:26
