CANN: implement LRU cache for ACL graphs in CANN backend #15814
Conversation
- Introduce ggml_cann_graph_lru_cache to store multiple ggml_cann_graph objects.
- Graphs are loaded on demand and evicted using an LRU policy when capacity is exceeded.
- Updated push, move_to_front, and clear methods to manage cached graphs efficiently.
- Ensures reuse of graphs, reducing graph reconstruction overhead in the CANN backend.
Signed-off-by: noemotiovon <[email protected]>
Force-pushed from 32b25b7 to 15b4ff7
Thanks for this awesome feature. It does improve the performance.
```cpp
 * @param node Shared pointer to the ggml_cann_graph to move.
 */
void move_to_front(std::shared_ptr<ggml_cann_graph> node) {
    cache_list.remove(node);
```
Removing an element from a list requires traversing all of its elements. It would be better to use a priority queue.
The current implementation has a time complexity of O(n), but even switching to a priority queue would still require a full traversal. I plan to add a map member variable to reduce the time complexity to O(1).
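A minimal sketch of the map-based approach described above (names such as `graph_lru_cache` and the use of `std::shared_ptr<int>` as a stand-in for `ggml_cann_graph` are illustrative, not the PR's actual code): the list keeps recency order, while an `unordered_map` stores each node's list iterator so `move_to_front` can erase in O(1) instead of scanning with `std::list::remove`.

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <unordered_map>

// Hypothetical sketch: std::shared_ptr<int> stands in for the real
// std::shared_ptr<ggml_cann_graph>. Most recently used entry is at the front.
struct graph_lru_cache {
    size_t capacity;
    std::list<std::shared_ptr<int>> cache_list;
    // Maps a node to its position in cache_list, enabling O(1) erase.
    std::unordered_map<int *, std::list<std::shared_ptr<int>>::iterator> pos;

    explicit graph_lru_cache(size_t cap) : capacity(cap) {}

    void push(std::shared_ptr<int> node) {
        cache_list.push_front(node);
        pos[node.get()] = cache_list.begin();
        if (cache_list.size() > capacity) {
            // Evict the least recently used entry from the back.
            pos.erase(cache_list.back().get());
            cache_list.pop_back();
        }
    }

    void move_to_front(std::shared_ptr<int> node) {
        auto it = pos.find(node.get());
        if (it != pos.end()) {
            cache_list.erase(it->second);  // O(1) erase via stored iterator
        }
        cache_list.push_front(node);
        pos[node.get()] = cache_list.begin();
    }

    void clear() {
        cache_list.clear();
        pos.clear();
    }
};
```

With capacity 2, pushing a third graph evicts the least recently used one, and a `move_to_front` on an existing graph protects it from eviction.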
Test 1: Compiled with ACL graph
With ACL Graph = on
With ACL Graph = off
Test 2: Compiled without ACL graph
Just a little more needs to be modified; it's very close to perfect.
Signed-off-by: noemotiovon <[email protected]>
Test 3: Graph capture counts
GGML_CANN_GRAPH_CACHE_CAPACITY=1: falls back to the old single-graph scenario.
GGML_CANN_GRAPH_CACHE_CAPACITY=32 (default is 12): uses the new LRU cache; make sure the configured value is greater than the parallel size.
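A sketch of how the capacity could be read from the environment variable mentioned above (the function name `get_graph_cache_capacity` is hypothetical; the default of 12 comes from the PR description):

```cpp
#include <cstdlib>

// Hypothetical helper: parse GGML_CANN_GRAPH_CACHE_CAPACITY, falling back to
// the default of 12 when the variable is unset or invalid.
static size_t get_graph_cache_capacity() {
    const char * env = std::getenv("GGML_CANN_GRAPH_CACHE_CAPACITY");
    if (env == nullptr) {
        return 12;  // default capacity per the PR description
    }
    int v = std::atoi(env);
    return v > 0 ? (size_t) v : 12;  // reject non-positive or unparsable values
}
```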
* CANN: implement LRU cache for ACL graphs in CANN backend
  - Introduce ggml_cann_graph_lru_cache to store multiple ggml_cann_graph objects.
  - Graphs are loaded on demand and evicted using LRU policy when capacity is exceeded.
  - Updated push, move_to_front, and clear methods to manage cached graphs efficiently.
  - Ensures reuse of graphs, reducing graph reconstruction overhead in CANN backend.
* fix typo
* The LRU cache capacity can be configured via an env variable
* refactor acl graph
* refactor && fix review comments
---------
Signed-off-by: noemotiovon <[email protected]>
What does this PR do?
Implement an LRU cache for ACL graphs in the CANN backend.