site-src/guides/epp-configuration
1 file changed: +3 -4 lines changed
@@ -51,7 +51,6 @@ shows a detailed analysis on how to estimate this.
  ```
  max_kv_tokens_per_server = (HBM_size - model_size)/ kv_size_per_token
  lru_indexer_capacity_per_server = (max_kv_tokens_per_server * avg_chars_per_token)/prefix_indexer_hash_block_size
- lru_indexer_capacity_total = max_num_servers * lru_indexer_capacity_per_server
  ```

  Let's take an example:
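
The two per-server formulas in this hunk can be checked with a few lines of Python. This is a minimal sketch, not code from the repository: the function names and the input values (80 GB of HBM, a 16 GB model, 128 KB of KV cache per token, 4 characters per token, a hash block size of 64) are illustrative assumptions.

```python
# Hypothetical helpers mirroring the formulas in the guide; names are assumptions.

def max_kv_tokens_per_server(hbm_size_bytes: int,
                             model_size_bytes: int,
                             kv_size_per_token_bytes: int) -> int:
    """Tokens whose KV cache fits in the HBM left over after loading the model."""
    return (hbm_size_bytes - model_size_bytes) // kv_size_per_token_bytes


def lru_indexer_capacity_per_server(max_kv_tokens: int,
                                    avg_chars_per_token: int,
                                    prefix_indexer_hash_block_size: int) -> int:
    """Number of hash blocks needed to index one server's cached prefix characters."""
    return (max_kv_tokens * avg_chars_per_token) // prefix_indexer_hash_block_size


# Assumed example inputs (decimal units), not values taken from the guide.
HBM_SIZE = 80 * 10**9            # 80 GB accelerator memory
MODEL_SIZE = 16 * 10**9          # 16 GB of model weights
KV_SIZE_PER_TOKEN = 128 * 10**3  # 128 KB of KV cache per token
AVG_CHARS_PER_TOKEN = 4
HASH_BLOCK_SIZE = 64

tokens = max_kv_tokens_per_server(HBM_SIZE, MODEL_SIZE, KV_SIZE_PER_TOKEN)
capacity = lru_indexer_capacity_per_server(tokens, AVG_CHARS_PER_TOKEN, HASH_BLOCK_SIZE)
print(tokens, capacity)
```

With these assumed inputs, each server holds 500,000 KV tokens, which gives an LRU indexer capacity of 31,250 per server.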
@@ -78,9 +77,9 @@ Use the following reference command to install an inferencepool with the prefix
  cache plugin environment variable configurations:

  ``` txt
- $ helm install triton-llama3-8b-instruct \
- --set inferencePool.modelServers.matchLabels.app=triton-llama3-8b-instruct \
- --set inferencePool.modelServerType=triton-tensorrt-llm \
+ $ helm install vllm-llama3-8b-instruct \
+ --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
+ --set inferencePool.modelServerType=vllm \
  --set provider.name=[none|gke] \
  --set inferenceExtension.env.EXPERIMENTAL_USE_SCHEDULER_V2=true \
  --set inferenceExtension.env.ENABLE_PREFIX_CACHE_SCHEDULING=true \