Commit 2d6c134

Merge pull request #37 from codefuse-ai/v0.3.0_dev
V0.3.0 dev merge
2 parents bdef124 + c561f7b

148 files changed: 9628 additions & 4097 deletions

.gitignore

Lines changed: 4 additions & 1 deletion
@@ -1,2 +1,5 @@
 .idea/
-.DS_Store
+.DS_Store
+*.log
+*/__pycache__/
+*.pyc

README.md

Lines changed: 59 additions & 37 deletions
@@ -1,4 +1,4 @@
-# MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning
+# MFTCoder: High Accuracy and Efficiency Multi-task Fine-Tuning Framework
 
 <p align="center">
   <img src="./assets/github-codefuse-logo-update.jpg" width="50%" />
@@ -21,7 +21,11 @@
     <img alt="Open Issues" src="https://img.shields.io/github/issues-raw/codefuse-ai/MFTCoder" />
   </a>
 </p>
-
+<p>
+    🤗 <a href="https://huggingface.co/codefuse-ai" target="_blank">HuggingFace
+    </a>• 🤖<a href="https://modelscope.cn/organization/codefuse-ai" target="_blank"> ModelScope
+    </a>
+</p>
 
 [[中文]](README_cn.md) [**English**]
 
@@ -41,6 +45,12 @@
 
 
 ## News
+🔥🔥 [2024/01/17] We released MFTCoder v0.3.0, mainly for MFTCoder-accelerate. It now supports new models such as Mixtral (MoE), DeepSeek-Coder and ChatGLM3, adds FSDP as an option, and introduces Self-paced Loss to balance convergence across tasks in multi-task fine-tuning.
+
+🔥🔥 [2024/01/17] [CodeFuse-DeepSeek-33B](https://huggingface.co/codefuse-ai/CodeFuse-DeepSeek-33B) has been released, achieving a pass@1 (greedy decoding) score of 78.7% on HumanEval and the top-1 win rate on the Bigcode Leaderboard.
+
+🔥🔥 [2024/01/17] [CodeFuse-Mixtral-8x7B](https://huggingface.co/codefuse-ai/CodeFuse-Mixtral-8X7B) has been released, achieving a pass@1 (greedy decoding) score of 56.1% on HumanEval.
+
 🔥🔥 [2023/11/07] [MFTCoder Paper](https://arxiv.org/abs/2311.02303) has been released on Arxiv, which discloses technique details of multi-task-fine-tuning.
 
 🔥🔥 [2023/10/20] [CodeFuse-QWen-14B](https://huggingface.co/codefuse-ai/CodeFuse-QWen-14B) has been released, achieving a pass@1 (greedy decoding) score of 48.8% on HumanEval, which gains 16% absolute improvement over the base model [Qwen-14b](https://huggingface.co/Qwen/Qwen-14B)
@@ -56,54 +66,58 @@
 ### HumanEval Performance
 | Model | HumanEval(Pass@1) | Date |
 |:----------------------------|:-----------------:|:-------:|
-| **CodeFuse-CodeLlama-34B** | **74.4%** | 2023/09 |
-| **CodeFuse-CodeLlama-34B-4bits** | **73.8%** | 2023/09 |
-| WizardCoder-Python-34B-V1.0 | 73.2% | 2023/08 |
-| GPT-4(zero-shot) | 67.0% | 2023/03 |
-| PanGu-Coder2 15B | 61.6% | 2023/08 |
-| **CodeFuse-StarCoder-15B** | **54.9%** | 2023/08 |
-| CodeLlama-34b-Python | 53.7% | 2023/08 |
-| **CodeFuse-QWen-14B** | **48.8%** | 2023/10 |
-| CodeLlama-34b | 48.8% | 2023/08 |
-| GPT-3.5(zero-shot) | 48.1% | 2022/11 |
-| OctoCoder | 46.2% | 2023/08 |
-| StarCoder-15B | 33.6% | 2023/05 |
-| QWen-14B | 32.3% | 2023/10 |
+| **CodeFuse-DeepSeek-33B** | **78.7%** | 2024/01 |
+| **CodeFuse-Mixtral-8x7B** | **56.1%** | 2024/01 |
+| **CodeFuse-CodeLlama-34B** | **74.4%** | 2023/09 |
+| **CodeFuse-CodeLlama-34B-4bits** | **73.8%** | 2023/09 |
+| WizardCoder-Python-34B-V1.0 | 73.2% | 2023/08 |
+| GPT-4(zero-shot) | 67.0% | 2023/03 |
+| PanGu-Coder2 15B | 61.6% | 2023/08 |
+| **CodeFuse-StarCoder-15B** | **54.9%** | 2023/08 |
+| CodeLlama-34b-Python | 53.7% | 2023/08 |
+| **CodeFuse-QWen-14B** | **48.8%** | 2023/10 |
+| CodeLlama-34b | 48.8% | 2023/08 |
+| GPT-3.5(zero-shot) | 48.1% | 2022/11 |
+| OctoCoder | 46.2% | 2023/08 |
+| StarCoder-15B | 33.6% | 2023/05 |
+| QWen-14B | 32.3% | 2023/10 |
 
 
 ## Articles
 [MFT Arxiv paper](https://arxiv.org/abs/2311.02303)
 
 ## Introduction
 
-**High Accuracy and efficiency multi-task fine-tuning framework for Code LLMs.**
+**High Accuracy and Efficiency Multi-task Fine-tuning framework for Code LLMs.**
+
+**CodeFuse-MFTCoder** is an open-source project of CodeFuse for accurate and efficient Multi-task Fine-tuning (MFT) of Large Language Models (LLMs), especially Code-LLMs (large language models for code tasks).
+Moreover, we open-source Code LLM models and code-related datasets along with the MFTCoder framework.
 
-**CodeFuse-MFTCoder** is an open-source project of CodeFuse for multitasking Code-LLMs(large language model for code tasks), which includes models, datasets, training codebases and inference guides.
 In MFTCoder, we released two codebases for finetuning Large Language Models:
-- ```mft_peft_hf``` is based on the HuggingFace Accelerate and deepspeed framework.
-- ```mft_atorch``` is based on the [ATorch frameworks](https://github.com/intelligent-machine-learning/dlrover), which is a fast distributed training framework of LLM.
+- ```MFTCoder-accelerate``` is a framework built on accelerate and DeepSpeed/FSDP. Its entire tech stack is open-source and actively maintained. We highly recommend this framework for accurate and efficient fine-tuning.
+- ```MFTCoder-atorch``` is based on [ATorch](https://github.com/intelligent-machine-learning/dlrover), a fast distributed training framework for LLMs.
 
 The aim of this project is to foster collaboration and share advancements in large language models, particularly within the domain of code development.
 
 ### Frameworks
-![img.png](./assets/img.png)
+![img.jpg](./assets/img.jpg)
 
 ### Highlights
 :white_check_mark: **Multi-task**: Train models on multiple tasks while maintaining a balance between them. The models can even generalize to new, previously unseen tasks.
 
 :white_check_mark: **Multi-model**: It integrates state-of-the-art open-source models such as gpt-neox, llama, llama-2, baichuan, Qwen, chatglm2, and more. (These finetuned models will be released in the near future.)
 
-:white_check_mark: **Multi-framework**: It provides support for both HuggingFace Accelerate (with deepspeed) and [ATorch](https://github.com/intelligent-machine-learning/dlrover).
+:white_check_mark: **Multi-framework**: It provides support for both Accelerate (with DeepSpeed and FSDP) and ATorch.
 
-:white_check_mark: **Efficient fine-tuning**: It supports LoRA and QLoRA, enabling fine-tuning of large models with minimal resources. The training speed meets the demands of almost all fine-tuning scenarios.
+:white_check_mark: **Efficient fine-tuning**: It supports LoRA, QLoRA as well as full-parameter training, enabling fine-tuning of large models with minimal resources. The training speed meets the demands of almost all fine-tuning scenarios.
 
 The main components of this project include:
 - Support for both SFT (Supervised FineTuning) and MFT (Multi-task FineTuning). The current MFTCoder achieves data balance among multiple tasks, and future releases will achieve a balance between task difficulty and convergence speed during training.
-- Support for QLoRA instruction fine-tuning, as well as LoRA fine-tuning.
-- Support for most mainstream open-source large models, particularly those relevant to Code-LLMs, such as Code-LLaMA, Starcoder, Codegeex2, Qwen, GPT-Neox, and more.
+- Support for QLoRA instruction fine-tuning, LoRA fine-tuning, as well as full-parameter fine-tuning.
+- Support for most mainstream open-source large models, particularly those relevant to Code-LLMs, such as DeepSeek-coder, Mistral, Mixtral, Chatglm3, Code-LLaMA, Starcoder, Codegeex2, Qwen, GPT-Neox, and more.
 - Support for weight merging between the LoRA adaptor and base models, simplifying the inference process.
 - Release of 2 high-quality code-related instruction fine-tuning datasets: [Evol-instruction-66k](https://huggingface.co/datasets/codefuse-ai/Evol-instruction-66k) and [CodeExercise-Python-27k](https://huggingface.co/datasets/codefuse-ai/CodeExercise-Python-27k).
-- Release of 2 models: [CodeFuse-13B](https://huggingface.co/codefuse-ai/CodeFuse-13B) and [CodeFuse-CodeLlama-34B](https://huggingface.co/codefuse-ai/CodeFuse-CodeLlama-34B).
+- Release of many Code LLMs; please refer to our organizations: [codefuse-ai on huggingface](https://huggingface.co/codefuse-ai) or [codefuse-ai on modelscope](https://modelscope.cn/organization/codefuse-ai).
 
 
 ## Requirements
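The hunk above describes LoRA/QLoRA (and full-parameter) fine-tuning plus merging the LoRA adaptor back into the base model. As orientation only, here is a minimal, generic sketch of those two steps using the Hugging Face transformers/peft/bitsandbytes APIs; it is not MFTCoder's own training code, and the base model name, adapter paths, and hyperparameters are placeholders.

```python
# Generic LoRA/QLoRA + weight-merging sketch (illustration only, not MFTCoder code).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, PeftModel, get_peft_model

BASE = "codellama/CodeLlama-34b-Python-hf"  # placeholder base model

# QLoRA-style setup: 4-bit quantized base weights plus small trainable LoRA adapters.
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
base = AutoModelForCausalLM.from_pretrained(BASE, quantization_config=bnb)

lora = LoraConfig(
    r=64, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # only the adapter parameters are trainable
# ... run the usual training loop / Trainer on `model` here ...

# Weight merging: fold a trained LoRA adapter back into a full-precision base model
# so inference needs no adapter-specific code.
full_base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)
merged = PeftModel.from_pretrained(full_base, "path/to/lora-adapter").merge_and_unload()
merged.save_pretrained("path/to/merged-model")
```

MFTCoder drives this kind of setup through its own configuration files and launch scripts; the sketch only shows the underlying mechanics of training a small set of adapter weights and producing a plain, adapter-free checkpoint for inference.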
@@ -113,28 +127,36 @@ Next, we have provided an init_env.sh script to simplify the installation of req
 ```bash
 sh init_env.sh
 ```
-If you require flash attention, please refer to the following link for installation instructions: https://github.com/Dao-AILab/flash-attention
+We highly recommend training with flash attention (version >= 2.1.0, preferably 2.3.6); please refer to the following link for installation instructions: https://github.com/Dao-AILab/flash-attention
 
 
 ## Training
-🚀 [Huggingface accelerate + deepspeed Codebase for MFT(Multi-task Finetuning)](./mft_peft_hf/README.md)
+As mentioned above, we open-source two training frameworks. Please refer to their respective READMEs for more details, as follows.
 
-🚀 [Atorch Codebase for MFT(Multi-task Finetuning)](./mft_atorch/README.md)
+If you are familiar with open-source ```transformers```, ```DeepSpeed``` or ```FSDP```, we highly recommend you try:
 
+🚀🚀 [MFTCoder-accelerate: Accelerate + DeepSpeed/FSDP Codebase for MFT (Multi-task Finetuning)](mftcoder_accelerate/README.md)
 
-## Models
 
-We are excited to release the following two CodeLLMs trained by MFTCoder, now available on Hugging Face:
+If you want to explore a newer framework like ATorch, you could check:
+
+🚀 [MFTCoder-atorch: ATorch Codebase for MFT (Multi-task Finetuning)](mftcoder_atorch/README.md)
+
+
+## Models
 
+We are excited to release the following CodeLLMs trained by MFTCoder, now available on both HuggingFace and ModelScope:
 
-| Model | Base Model | Num of examples trained | Batch Size | Seq Length |
-|--------------------------------------------------------------------------------------------|--------------------|-------------------------|------------|------------|
-| [🔥🔥🔥 CodeFuse-CodeLlama-34B](https://huggingface.co/codefuse-ai/CodeFuse-CodeLlama-34B) | CodeLlama-34b-Python | 600k | 80 | 4096 |
-| [🔥🔥🔥 CodeFuse-CodeLlama-34B-4bits](https://huggingface.co/codefuse-ai/CodeFuse-CodeLlama-34B-4bits) | CodeLlama-34b-Python | | | 4096 |
-| [🔥🔥🔥 CodeFuse-StarCoder-15B](https://huggingface.co/codefuse-ai/CodeFuse-StarCoder-15B) | Starcoder | 600k | 256 | 4096 |
-| [🔥🔥🔥 CodeFuse-QWen-14B](https://huggingface.co/codefuse-ai/CodeFuse-QWen-14B) | Qwen-14b | 1100k | 256 | 4096 |
-| [🔥 CodeFuse-13B](https://huggingface.co/codefuse-ai/CodeFuse-13B) | CodeFuse-13B | 66k | 64 | 4096 |
 
+| Model | HuggingFace Links | ModelScope Links | Base Model | Num of examples trained | Batch Size | Seq Length |
+|-----------------------------------|-------------------|------------------|----------------------|-------|------------|------------|
+| 🔥🔥 CodeFuse-DeepSeek-33B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-DeepSeek-33B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-DeepSeek-33B) | DeepSeek-coder-33B | 600k | 80 | 4096 |
+| 🔥🔥 CodeFuse-Mixtral-8x7B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-Mixtral-8x7B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-Mixtral-8x7B) | Mixtral-8x7B | 600k | 80 | 4096 |
+| 🔥🔥 CodeFuse-CodeLlama-34B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-CodeLlama-34B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeLlama-34B) | CodeLlama-34b-Python | 600k | 80 | 4096 |
+| 🔥🔥 CodeFuse-CodeLlama-34B-4bits | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-CodeLlama-34B-4bits) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeLlama-34B-4bits) | CodeLlama-34b-Python | | | 4096 |
+| 🔥🔥 CodeFuse-StarCoder-15B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-StarCoder-15B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-StarCoder-15B) | StarCoder-15B | 600k | 80 | 4096 |
+| 🔥🔥 CodeFuse-QWen-14B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-QWen-14B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-QWen-14B) | Qwen-14b | 1100k | 256 | 4096 |
+| 🔥🔥 CodeFuse-CodeGeex2-6B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-CodeGeex2-6B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeGeex2-6B) | CodeGeex2-6B | 1100k | 256 | 4096 |
 
 
 ## Datasets
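The updated Requirements hunk above recommends training with flash attention (>= 2.1.0). As a hedged illustration of how that is typically enabled once the flash-attn package is installed, here is a minimal transformers loading sketch; the model name is a placeholder, and the `attn_implementation` flag assumes a reasonably recent transformers release rather than anything specific to this repository.

```python
# Minimal sketch: load a causal LM with FlashAttention-2 enabled, assuming the
# flash-attn package has been installed per the linked instructions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "codefuse-ai/CodeFuse-DeepSeek-33B"  # placeholder; any supported causal LM

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    torch_dtype=torch.bfloat16,               # flash attention requires fp16/bf16
    attn_implementation="flash_attention_2",  # falls back to SDPA/eager if unavailable
    device_map="auto",
)
```

For the actual training entry points, configs, and supported flags, the MFTCoder-accelerate and MFTCoder-atorch READMEs linked in the diff are the authoritative references.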
