Hi
Very cool framework, and I like this direction a lot!
Just want to point out that in KBLaM, we considered a similar approach for augmenting a pre-trained LLM with external memory:
https://arxiv.org/abs/2410.10450
https://www.microsoft.com/en-us/research/blog/introducing-kblam-bringing-plug-and-play-external-knowledge-to-llms/
where we concatenate the KV of the compressed external knowledge in front of the input text's QKV for knowledge augmentation (using attention as the retrieval mechanism).
There are also some follow-up works that try to further improve the scalability, e.g. https://arxiv.org/abs/2510.17934
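
For concreteness, here is a rough sketch of the "prepend knowledge KV, let attention do the retrieval" idea. This is not the actual KBLaM code: it is a single-head, pure-Python toy with no learned projections, and the names (`attend_with_knowledge`, `kb_k`, `kb_v`) and the toy vectors are made up for illustration. It assumes each input token attends to all knowledge entries plus the input tokens up to and including itself.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend_with_knowledge(q, k, v, kb_k, kb_v):
    """Toy single-head attention with knowledge KV prepended to the input KV.

    q, k, v    : per-token query/key/value vectors of the input text
    kb_k, kb_v : one key/value vector per compressed knowledge entry
                 (hypothetical names, assumed to be precomputed elsewhere)
    Returns (outputs, weights), one entry per input token.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    outputs, weights = [], []
    for i, q_i in enumerate(q):
        # Knowledge entries come first, then the causal slice of the input.
        keys = kb_k + k[: i + 1]
        vals = kb_v + v[: i + 1]
        w = softmax([dot(q_i, key) * scale for key in keys])
        out = [sum(w_j * val[c] for w_j, val in zip(w, vals))
               for c in range(len(vals[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights

# Two knowledge entries and one input token whose query matches entry 1.
kb_k = [[4.0, 0.0], [0.0, 4.0]]
kb_v = [[1.0, 0.0], [0.0, 1.0]]
q = [[0.0, 4.0]]
k = [[0.0, 0.0]]
v = [[0.0, 0.0]]

outputs, weights = attend_with_knowledge(q, k, v, kb_k, kb_v)
retrieved = max(range(len(weights[0])), key=lambda j: weights[0][j])
```

In this toy setup `retrieved` is the index of the knowledge entry that gets the most attention mass, which is the sense in which attention acts as the retrieval step.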