Commit 15b4ff7

The LRU cache capacity can be configured via an env variable
Signed-off-by: noemotiovon <[email protected]>
1 parent 4c9b10a commit 15b4ff7

2 files changed: 14 additions, 1 deletion

docs/backend/CANN.md

Lines changed: 4 additions & 0 deletions
@@ -314,3 +314,7 @@ Converting the matmul weight format from ND to NZ to improve performance. Enable
 ### GGML_CANN_ACL_GRAPH
 
 Operators are executed using ACL graph execution, rather than in op-by-op (eager) mode. Enabled by default.
+
+### GGML_CANN_GRAPH_CACHE_CAPACITY
+
+Maximum number of compiled CANN graphs kept in the LRU cache, default is 12. When the number of cached graphs exceeds this capacity, the least recently used graph will be evicted.
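
The eviction rule described for GGML_CANN_GRAPH_CACHE_CAPACITY can be illustrated with a small, self-contained sketch. It is not the ggml-cann implementation: `demo_graph` and `demo_lru_cache` are hypothetical names, and only the list-based layout (front is most recently used, back is least recently used) is taken from the header changed in this commit.

```cpp
#include <cstddef>
#include <list>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a compiled graph; not the real ggml_cann_graph.
struct demo_graph {
    std::string name;
};

// Minimal LRU container: front = most recently used, back = least recently used.
struct demo_lru_cache {
    std::size_t capacity;
    std::list<std::shared_ptr<demo_graph>> cache_list;

    explicit demo_lru_cache(std::size_t cap) : capacity(cap) {}

    // Insert at the front, then evict from the back while over capacity.
    void push(std::shared_ptr<demo_graph> g) {
        cache_list.push_front(std::move(g));
        while (cache_list.size() > capacity) {
            cache_list.pop_back();
        }
    }

    // Mark an existing entry as most recently used.
    // The pointer is taken by value so it stays valid while the list is modified.
    void move_to_front(std::shared_ptr<demo_graph> g) {
        cache_list.remove(g);
        cache_list.push_front(std::move(g));
    }
};
```

With a capacity of 2, pushing a third graph drops whichever entry was touched least recently; a larger capacity trades memory for fewer rebuilds of evicted graphs.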

ggml/src/ggml-cann/common.h

Lines changed: 10 additions & 1 deletion
@@ -368,12 +368,21 @@ struct ggml_cann_graph {
  * move existing graphs to the front (most recently used), and clear the cache.
  */
 struct ggml_cann_graph_lru_cache {
-    size_t capacity = 12; /**< Maximum number of graphs in the cache. */
+    size_t capacity; /**< Maximum number of graphs in the cache. */
 
     std::list<std::shared_ptr<ggml_cann_graph>> cache_list; /**< List storing cached graphs. */
 
     std::shared_ptr<ggml_cann_graph> matched_graph = nullptr; /**< Pointer to a recently matched graph. */
 
+    ggml_cann_graph_lru_cache() {
+        std::string env_val = get_env("GGML_CANN_GRAPH_CACHE_CAPACITY").value_or("12");
+        try {
+            capacity = std::stoul(env_val);
+        } catch (...) {
+            capacity = 12; // fallback to default if invalid
+        }
+    }
+
     /**
      * @brief Push a new graph to the front of the cache.
      * If the cache exceeds capacity, the least recently used graph is removed.
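
For readers who want to experiment with the environment parsing outside the backend, the following is a minimal sketch of a stricter variant. `std::stoul` skips leading whitespace, accepts a sign, and stops at the first non-digit, so inputs such as "12abc" or "-1" do not throw. The helpers `demo_get_env`, `demo_parse_capacity`, and `demo_cache_capacity` are hypothetical and not part of this commit, and treating 0 as invalid is an assumption made for this illustration only.

```cpp
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical helper: read an environment variable; std::nullopt if unset.
inline std::optional<std::string> demo_get_env(const char * name) {
    const char * val = std::getenv(name);
    if (val == nullptr) {
        return std::nullopt;
    }
    return std::string(val);
}

// Hypothetical helper: accept only a plain run of decimal digits.
// Returns `fallback` for "", "-1", "12abc", " 12", "0" and out-of-range values.
inline std::size_t demo_parse_capacity(const std::string & text, std::size_t fallback) {
    if (text.empty() || text.find_first_not_of("0123456789") != std::string::npos) {
        return fallback;
    }
    try {
        unsigned long value = std::stoul(text);
        // Assumption for this sketch: a capacity of 0 is treated as invalid.
        return value == 0 ? fallback : static_cast<std::size_t>(value);
    } catch (...) {
        return fallback;  // std::out_of_range for very large numbers
    }
}

// Mirrors the constructor logic above, using the stricter parser.
inline std::size_t demo_cache_capacity() {
    return demo_parse_capacity(
        demo_get_env("GGML_CANN_GRAPH_CACHE_CAPACITY").value_or("12"), 12);
}
```

Whether the extra strictness is worthwhile depends on how the variable is expected to be set; the constructor in the diff already falls back to 12 when the value is not a number at all.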
