I need to extend the context length of llama3.1-8b from, say, 8k up to 128k, and do the same for gemma2.
I see that there is an example for gemma2 here.
However, I don't know how to select these values. Is there a formula to follow?
Gemma2:
original_gemma_forward = GemmaAttention.forward
self_extend_forward = partial(
    SE.Gemma.flash_self_extend_forward, group_size_1=8, group_size_2=1024
)
And for llama3.1:
window_size = 1024
group_size = 32
use_flash = True
SelfExtend.apply(model, group_size, window_size, enable_flash_attention=use_flash, flash_attention_impl="flash_attn") ## flash_attention_impl="triton" or "flash_attn"
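As a rough sanity check on values like these: the SelfExtend paper gives the maximum extended context length as (L - w_n) * G + w_n, where L is the pretraining context length, w_n is the neighbor window size, and G is the group size. Below is a minimal sketch, assuming that formula applies here, that `window_size` and `group_size` correspond to w_n and G, and that 8k and 128k mean 8192 and 131072 tokens. The helper names are hypothetical and not part of the SelfExtend package.

```python
def max_extended_length(pretrain_len: int, window_size: int, group_size: int) -> int:
    """Upper bound on context length from the formula (L - w_n) * G + w_n."""
    return (pretrain_len - window_size) * group_size + window_size


def min_group_size(pretrain_len: int, window_size: int, target_len: int) -> int:
    """Smallest integer group size whose bound covers target_len (hypothetical helper)."""
    if window_size >= pretrain_len:
        raise ValueError("window_size must be smaller than pretrain_len")
    if target_len <= pretrain_len:
        return 1  # no extension needed
    span = pretrain_len - window_size
    # Integer ceiling division of (target_len - window_size) by span.
    return -(-(target_len - window_size) // span)


# Assumed values: 8k = 8192 tokens, 128k = 131072 tokens.
print(max_extended_length(8192, 1024, 32))
print(min_group_size(8192, 1024, 131072))
```

Under these assumptions, the smallest group size that satisfies the bound is only a lower limit. It says nothing about output quality, so the chosen values should still be validated on a long-context task.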
By the way, what do these Passkey logs mean?
#Tokens of Prompt: 9992 Passkey target: 58328