ariG23498

In this PR I have added an example for Gemma and reported the results in the README.

h-lunah commented Aug 11, 2024

This will also severely increase RAM usage, because your code upcasts BF16 to FP32.

@hahmad2008

What do the parameters group_size_1 and group_size_2 represent?

    SE.Gemma.flash_self_extend_forward, group_size_1=8, group_size_2=1024

And how do I get the desired extended context length from the equation in the paper?
new_context_length = (old_context_length - neighbor_window) * group_size + neighbor_window
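To make the equation concrete, here is a minimal sketch that evaluates it in both directions. It rests on assumptions not confirmed in this thread: that group_size_1 plays the role of group_size and group_size_2 the role of neighbor_window, and that the model's original context length is 8192 tokens. The function names are hypothetical and are not part of the repository.

```python
def extended_context_length(old_context_length: int,
                            neighbor_window: int,
                            group_size: int) -> int:
    """Evaluate the equation quoted above from the paper."""
    return (old_context_length - neighbor_window) * group_size + neighbor_window


def min_group_size(target_context_length: int,
                   old_context_length: int,
                   neighbor_window: int) -> int:
    """Smallest integer group_size whose extended length reaches the target.

    Obtained by solving the same equation for group_size and rounding up.
    """
    if target_context_length <= old_context_length:
        return 1  # no extension needed
    numerator = target_context_length - neighbor_window
    denominator = old_context_length - neighbor_window
    return -(-numerator // denominator)  # ceiling division on integers


# Assumed values: old context 8192, group_size_1=8 as group size,
# group_size_2=1024 as neighbor window.
print(extended_context_length(8192, 1024, 8))   # 58368
print(min_group_size(32768, 8192, 1024))        # 5
```

Under these assumptions, the settings in the snippet above would give (8192 - 1024) * 8 + 1024 = 58368 tokens, and a 32768-token target would need a group size of at least 5.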
