Adding full weight finetuning #499
base: main
Conversation

I think it would be wiser to merge this first, then gradually add the other parts (GRPO, DPO). What do you think @Blaizzy?

Was finally able to train the first model for 100 steps:

Usually it crashed with OOM after 10 steps :D

You can now train the vision part too, and 4-bit quantized training works as well (only on Qwen2 though; Qwen2.5 gives a NaN loss): `python -m mlx_vlm.lora`
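Since a NaN loss like the Qwen2.5 one can otherwise waste many training steps before anyone notices, a guard in the training loop helps when debugging. This is a generic illustrative sketch, not part of the mlx-vlm code:

```python
import math

def check_loss(step, loss):
    """Abort training early if the loss becomes NaN or Inf.

    Generic debugging helper (hypothetical, not an mlx-vlm API):
    raising immediately pinpoints the first bad step instead of
    letting the run silently diverge.
    """
    if math.isnan(loss) or math.isinf(loss):
        raise RuntimeError(f"Non-finite loss at step {step}: {loss}")
    return loss
```

Calling `check_loss(step, loss.item())` once per iteration is cheap and makes the first bad step obvious in the logs.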

So I updated some things, and it's a lot faster and uses a lot less RAM: `Iter 29: Train loss 8.377, Learning Rate 1.000e-04, It/sec 7.091, Tokens/sec 226.918, Trained Tokens 928, Peak mem 1.839 GB`

For comparison, data from step 40 before: `It/sec 1.851, Peak mem 7.643 GB`
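Quick arithmetic on the two reports above to put the improvement in relative terms:

```python
# Throughput and peak-memory numbers copied from the two reports above.
old_it_per_sec, new_it_per_sec = 1.851, 7.091   # iterations per second
old_peak_gb, new_peak_gb = 7.643, 1.839         # peak memory in GB

speedup = new_it_per_sec / old_it_per_sec        # ~3.8x faster
mem_reduction = old_peak_gb / new_peak_gb        # ~4.2x less peak memory
print(f"{speedup:.1f}x faster, {mem_reduction:.1f}x less peak memory")
# → 3.8x faster, 4.2x less peak memory
```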

Qwen2 models work, both quantized and full precision, with both LoRA and full-weight training.

Qwen2.5 is added too.

@Blaizzy here is a test LoRA adapter for the LLM: https://huggingface.co/Goekdeniz-Guelmez/MLX-VLM-Qwen2-VL-2B-Instruct-bf16-VisualWebInstruct-lora/blob/main/README.md The command used here is: `python -m mlx_vlm.lora --model-path mlx-community/Qwen2-VL-2B-Instruct-bf16 --dataset TIGER-Lab/VisualWebInstruct --dataset-config 'example' --output-path Desktop/Qwen2-VL-2B-Instruct-bf16-VisualWebInstruct-lora --batch-size 1 --epochs 1 --learning-rate 1e-6 --grad-checkpoint --train-on-completions --steps-per-report 1` @Blaizzy it would be great if you could try out a larger model using this command.

@Goekdeniz-Guelmez and @Blaizzy: any update on this, please?

@sachinraja13 I will continue working on it later this week, after finishing all the Gabliteration project to-dos.

Many thanks for all your contributions, @Goekdeniz-Guelmez! Very helpful! Looking forward to it!

In the meantime, you can try Qwen2, 2.5, 3, and Gemma 3 and let me know how they work.
…ypes and update supported models list
…els list adding Qwen3 Omni MoE

How are things going? Any updates here? I'm making some major changes in #681, and after that I will add vision attention chunking to reduce peak memory usage and OOM errors when processing images at 2K resolution and above.
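The chunking idea mentioned above can be sketched in plain NumPy: processing the queries one chunk at a time keeps the attention-score matrix at `chunk_size × seq_len` instead of `seq_len × seq_len`, which is where the peak memory for high-resolution images comes from. The real PR would use MLX ops; the shapes and chunk size here are illustrative only:

```python
import numpy as np

def chunked_attention(q, k, v, chunk_size=256):
    """Compute softmax(q @ k.T / sqrt(d)) @ v one query chunk at a time.

    Illustrative sketch of attention chunking: the score matrix held in
    memory at any moment is (chunk_size, n) instead of (n, n).
    """
    d = q.shape[-1]
    out = np.empty_like(q)
    for start in range(0, q.shape[0], chunk_size):
        qc = q[start:start + chunk_size]               # (c, d) query chunk
        scores = qc @ k.T / np.sqrt(d)                 # (c, n) scores
        scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
        out[start:start + chunk_size] = weights @ v    # (c, d) output chunk
    return out
```

The result is numerically identical to unchunked attention; only the peak intermediate size changes, so chunk size is purely a memory/looping trade-off.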

Awesome! Could you fix the tests? Also, I have a 512GB Studio we can run training tests on.

Hey @Blaizzy, would you mind trying bigger Qwen models? The whole Qwen family is implemented. Also, the user can now pass a custom prompt format.
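The exact option name and format string for the custom prompt format are not quoted in the thread, so the following is a purely hypothetical illustration of the general idea: a custom prompt format is essentially a per-sample template filled in before tokenization.

```python
# Hypothetical illustration only: the option name and template syntax in
# this PR are not shown in the thread above.
def apply_prompt_format(template, question, answer=""):
    """Fill a user-supplied template with the sample's fields."""
    return template.format(question=question, answer=answer)

# Example template in a ChatML-like style (Qwen models use ChatML markers).
template = (
    "<|im_start|>user\n{question}<|im_end|>\n"
    "<|im_start|>assistant\n{answer}"
)
prompt = apply_prompt_format(template, "Describe the image.")
```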

Sure, I can do that. However, I would recommend we adapt to the new pattern in #681 first; it will be merged in the next hour or two.
Those are awesome additions, great work! Let's make those changes and tests, then we merge and release today 🚀
…ekdeniz-Guelmez/mlx-vlm into adding-full-weight-finetuning
This is a new branch, since the old one was not comprehensible and led to too many errors; the old PR will be closed later. Full-weight finetuning works on the Qwen models, including quantized ones.