Add support for Gemma 3 models within Fastchat #3705
Conversation
```python
if device_map == "sequential":
    device_map = "auto"
# print("From pretrained kwargs", from_pretrained_kwargs)
tokenizer = AutoTokenizer.from_pretrained(model_path, revision=revision)
```
Hi,
I have a small suggestion:
```diff
- tokenizer = AutoTokenizer.from_pretrained(model_path, revision=revision)
+ tokenizer = AutoTokenizer.from_pretrained(model_path, revision=revision, pad_to_multiple_of=8)
```
See this similar issue in huggingface/transformers: huggingface/transformers#36815
Some prompts may trigger an error similar to the following:
ERROR | stderr | Exception in thread Thread-5 (<lambda>):
ERROR | stderr | Traceback (most recent call last):
ERROR | stderr | File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
ERROR | stderr | self.run()
ERROR | stderr | File "/usr/lib/python3.10/threading.py", line 953, in run
ERROR | stderr | self._target(*self._args, **self._kwargs)
ERROR | stderr | File "/home/example/projects/FastChat/fastchat/model/model_gemma3.py", line 81, in <lambda>
ERROR | stderr | target=lambda: model.generate(input_ids=input_ids, **generate_kwargs)
ERROR | stderr | File "/home/example/projects/fastchat-venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR | stderr | return func(*args, **kwargs)
ERROR | stderr | File "/home/example/projects/fastchat-venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 2465, in generate
ERROR | stderr | result = self._sample(
ERROR | stderr | File "/home/example/projects/fastchat-venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 3434, in _sample
ERROR | stderr | outputs = model_forward(**model_inputs, return_dict=True)
ERROR | stderr | File "/home/example/projects/fastchat-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
ERROR | stderr | return self._call_impl(*args, **kwargs)
ERROR | stderr | File "/home/example/projects/fastchat-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
ERROR | stderr | return forward_call(*args, **kwargs)
ERROR | stderr | File "/home/example/projects/fastchat-venv/lib/python3.10/site-packages/transformers/utils/generic.py", line 965, in wrapper
ERROR | stderr | output = func(self, *args, **kwargs)
ERROR | stderr | File "/home/example/projects/fastchat-venv/lib/python3.10/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
ERROR | stderr | return func(*args, **kwargs)
ERROR | stderr | File "/home/example/projects/fastchat-venv/lib/python3.10/site-packages/transformers/models/gemma3/modeling_gemma3.py", line 942, in forward
ERROR | stderr | outputs: BaseModelOutputWithPast = self.model(
ERROR | stderr | File "/home/example/projects/fastchat-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
ERROR | stderr | return self._call_impl(*args, **kwargs)
ERROR | stderr | File "/home/example/projects/fastchat-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
ERROR | stderr | return forward_call(*args, **kwargs)
ERROR | stderr | File "/home/example/projects/fastchat-venv/lib/python3.10/site-packages/transformers/utils/generic.py", line 965, in wrapper
ERROR | stderr | output = func(self, *args, **kwargs)
ERROR | stderr | File "/home/example/projects/fastchat-venv/lib/python3.10/site-packages/transformers/models/gemma3/modeling_gemma3.py", line 722, in forward
ERROR | stderr | layer_outputs = decoder_layer(
ERROR | stderr | File "/home/example/projects/fastchat-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
ERROR | stderr | return self._call_impl(*args, **kwargs)
ERROR | stderr | File "/home/example/projects/fastchat-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
ERROR | stderr | return forward_call(*args, **kwargs)
ERROR | stderr | File "/home/example/projects/fastchat-venv/lib/python3.10/site-packages/transformers/models/gemma3/modeling_gemma3.py", line 420, in forward
ERROR | stderr | hidden_states, self_attn_weights = self.self_attn(
ERROR | stderr | File "/home/example/projects/fastchat-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
ERROR | stderr | return self._call_impl(*args, **kwargs)
ERROR | stderr | File "/home/example/projects/fastchat-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
ERROR | stderr | return forward_call(*args, **kwargs)
ERROR | stderr | File "/home/example/projects/fastchat-venv/lib/python3.10/site-packages/transformers/models/gemma3/modeling_gemma3.py", line 342, in forward
ERROR | stderr | attn_output, attn_weights = attention_interface(
ERROR | stderr | File "/home/example/projects/fastchat-venv/lib/python3.10/site-packages/transformers/integrations/sdpa_attention.py", line 54, in sdpa_attention_forward
ERROR | stderr | attn_output = torch.nn.functional.scaled_dot_product_attention(
ERROR | stderr | RuntimeError: p.attn_bias_ptr is not correctly aligned
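For reference, the `pad_to_multiple_of=8` idea can also be applied at encode time rather than at tokenizer construction. A minimal sketch, assuming a text-only Gemma 3 checkpoint; the model name and prompt below are placeholders, not code from this PR:

```python
# Sketch of the reviewer's suggestion applied at encode time, assuming a
# text-only Gemma 3 checkpoint. The model name and prompt are placeholders.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-4b-it")

# Padding the sequence length to a multiple of 8 keeps the SDPA attention
# bias aligned, which is what the RuntimeError above complains about.
inputs = tokenizer(
    "Explain the difference between a list and a tuple.",
    return_tensors="pt",
    padding=True,
    pad_to_multiple_of=8,
)
print(inputs["input_ids"].shape)  # sequence length is now a multiple of 8
```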
Hi,
Thanks for this. We actually ended up creating https://www.github.com/transformerlab/transformerlab-inference and use that instead, since FastChat has stopped merging PRs and halted new development.
This model is supported there and works without flash attention, which was causing your original issue. Please let me know if the error still occurs for you without flash attention.
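For anyone hitting the same error before switching, one way to run "without flash attention" in plain transformers is to force the eager attention path. A hedged sketch, with a placeholder checkpoint name; this is not code from this PR or from transformerlab-inference:

```python
# Sketch: force the eager attention implementation so neither flash-attention
# nor the memory-efficient SDPA kernel is selected. The checkpoint name is a
# placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "google/gemma-3-4b-it"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    attn_implementation="eager",  # sidesteps the aligned attn-bias requirement
    device_map="auto",
)
```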
Hello, I am currently on vacation and unable to reply to your email in person. I will reply as soon as possible after my vacation ends.
Why are these changes needed?
Adds support for the text-only Gemma 3 models, enabling inference with all Gemma 3 variants (base as well as instruct) in text-only mode.
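For context, text-only inference with an instruct checkpoint looks roughly like the following sketch; the checkpoint name and prompt are illustrative, and this is not the PR's implementation:

```python
# Illustrative text-only inference with a Gemma 3 instruct checkpoint; the
# model name and prompt are examples only, not the PR's code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "google/gemma-3-4b-it"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)

# Instruct checkpoints ship a chat template that formats user/assistant turns.
messages = [{"role": "user", "content": "Write a haiku about GPUs."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```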
Related issue number (if applicable)
Closes #3697