1-1 Fine-tuning method
LoRA was used as the fine-tuning method; P-Tuning v2 was also tried, but dropped because time was tight.

In the LoRA runs, wq and wv of every attention layer were fine-tuned (in hindsight: I would actually recommend tuning all four projections q, k, v and o; some papers report that tuning only q and v gives the same result, but there may be tricks involved). Because I did not know how to select or exclude specific layers (based on later experience, certain layers can be frozen so that more parameters can be tuned elsewhere), no layers were excluded — the brute-force option.
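As a back-of-the-envelope illustration of how much the choice of target modules changes the number of trainable parameters: the sketch below assumes the commonly cited Llama3-8B shapes, and the rank is only a placeholder, not the value from Table 1-1.

```python
# Rough LoRA trainable-parameter estimate for the Llama3-8B attention projections.
# Assumed shapes (not taken from this report): hidden size 4096, 32 layers,
# grouped-query attention so wk/wv project down to 1024.
hidden, kv_dim, layers, rank = 4096, 1024, 32, 8

def lora_params(d_in, d_out, r):
    # An adapted weight W (d_out x d_in) gains A (r x d_in) and B (d_out x r).
    return r * (d_in + d_out)

qv_only = layers * (lora_params(hidden, hidden, rank)       # wq
                    + lora_params(hidden, kv_dim, rank))    # wv
qkvo = layers * (2 * lora_params(hidden, hidden, rank)      # wq, wo
                 + 2 * lora_params(hidden, kv_dim, rank))   # wk, wv
print(f"q+v only : {qv_only / 1e6:.2f}M trainable parameters")
print(f"q,k,v,o  : {qkvo / 1e6:.2f}M trainable parameters")
```

Either way the adapters are a tiny fraction of the 8B base parameters, and the count scales linearly with lora_rank, which is why lora_rank matters as much as the module choice (see the hyperparameter section below).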
The LoRA parameter settings are shown in Table 1-1.

LoRA parameter settings:
1-2 Dataset processing
The data was filtered during dataset processing. After repeated attempts, I found that mixing in English QA data degraded the model's ability: although the English dataset is more explanatory in format, it did not perform well for fine-tuning on simple problems, and related papers have pointed this out as well. The training data therefore used Chinese datasets only. I also explored the dataset; the overall proportion of each problem type is shown in Figure 1-1. Following these proportions and some early test results, the fine-tuning dataset size was finally set to 106,000 samples; the exact count for each problem type is listed in Table 1-2. This also kept the overall training time well under control.

Postscript: when entering the second round, I realized a big mistake from the first round — the data was never cleaned and contained a very large number of duplicate samples, as can also be seen in other contestants' open-sourced code.
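For reference, a minimal dedup pass along these lines would be enough to catch exact duplicates; this is only a sketch and assumes JSONL input with a "problem" field, which may not match the real raw-data schema.

```python
import json

# Keep the first occurrence of each question; drop exact duplicates.
seen, kept = set(), []
with open('train.json', encoding='utf-8') as f:
    for line in f:
        item = json.loads(line)
        key = item.get('problem', '').strip()   # assumed field name
        if key and key not in seen:
            seen.add(key)
            kept.append(item)

with open('train_dedup.json', 'w', encoding='utf-8') as f:
    for item in kept:
        f.write(json.dumps(item, ensure_ascii=False) + '\n')
```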
![Figure 1-1](1.png)

Figure 1-1: Proportions of problem types in the math fine-tuning dataset

The important hyperparameters here are runner_config, optimizer and lr_schedule, listed in turn below.

Besides the choice of tuned layers above and the parameters below, lora_rank also has a large effect on the fine-tuning result. I overlooked lora_rank in round one, but it is very important!
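For intuition about what the lr_schedule controls, the function below traces a generic warm-up-then-cosine-decay curve; the numbers are placeholders, and the schedule actually used is whatever this section's tables specify, not this sketch.

```python
import math

def lr_at_step(step, total_steps, warmup_steps=100, lr_max=1e-4, lr_min=1e-6):
    # Linear warm-up, then cosine decay from lr_max down to lr_min.
    if step < warmup_steps:
        return lr_max * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

print([round(lr_at_step(s, 1000), 6) for s in (0, 50, 100, 500, 1000)])
```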

2-1 runner_config
runner_config parameter settings:

"metadata": {},
"source": [
"# 设置比赛所需要的环境\n",
"### 其基本过程如下:\n",
"**因为每次重启netobook后都会设置新的环境,利用jupyter脚本可以一键执行,非常方便**\n",
"## 其基本过程如下:\n",
"- 1. mindspore安装\n",
"- 2. mindformers安装\n",
"- 3. 环境变量和其他依赖安装\n",
"- 4. 模型权重和tokenizer文件准备\n",
"- 5. 数据集准备(这一步可以直接下载Mindrecord格式的数据集)\n",
"- 6. 开始微调"
"- 3. 模型权重和tokenizer文件准备\n",
"- 4. 数据集准备(这一步可以直接下载Mindrecord格式的数据集)\n",
"- 5. 开始微调\n",
"- 6. 模型合并\n",
"- 7. 原有能力评估\n",
"- 8. 微调结果测试"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1.Mindspore安装"
]
},
{
"! pip install mindspore==2.3.0RC2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.Mindformers安装"
]
},
{
"cell_type": "code",
"execution_count": null,
"pip install tiktoken"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3.模型权重和tokenizer文件准备"
]
},
{
"cell_type": "code",
"execution_count": null,
"wget https://2024-ascend-innovation-contest-mindspore.obs.cn-southwest-2.myhuaweicloud.com/topic2-finetune/tokenizer.model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4.数据集准备"
]
},
{
"cell_type": "code",
"execution_count": null,
"cell_type": "markdown",
"metadata": {},
"source": [
"# 数据处理"
"**也可以下载MindRecord数据集**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"wget https://2024-ascend-innovation-contest-mindspore.obs.cn-southwest-2.myhuaweicloud.com/topic2-finetune/train-fastchat256-mindrecore.zip\n",
"unzip train-fastchat256-mindrecore.zip"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**处理自己抽取出的数据集**"
]
},
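{
"cell_type": "markdown",
"metadata": {},
"source": [
"A hedged sketch of the kind of conversion this step performs: turning raw problem/solution records into the conversation-style JSON used for the MindRecord preprocessing. The field names ('problem', 'solution') are assumptions for illustration, not the official schema; the actual processing cell follows."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"# Illustration only: field names are assumed, adjust to the real raw data.\n",
"records = []\n",
"with open('train.json', encoding='utf-8') as f:\n",
"    for i, line in enumerate(f):\n",
"        item = json.loads(line)\n",
"        records.append({\n",
"            'id': str(i),\n",
"            'conversations': [\n",
"                {'from': 'human', 'value': item['problem']},\n",
"                {'from': 'gpt', 'value': item['solution']},\n",
"            ],\n",
"        })\n",
"\n",
"with open('train-conversation.json', 'w', encoding='utf-8') as f:\n",
"    json.dump(records, f, ensure_ascii=False)"
]
},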
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 下载MindRecord数据集"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"wget https://2024-ascend-innovation-contest-mindspore.obs.cn-southwest-2.myhuaweicloud.com/topic2-finetune/train-fastchat256-mindrecore.zip\n",
"unzip train-fastchat256-mindrecore.zip"
"## 5. 开始微调"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 启动4卡微调"
"### 4卡微调"
]
},
{
"--train_data /home/ma-user/work/math_problem/90k-train-fastchat256.mindrecord\" 4"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from datetime import datetime\n",
"import pytz\n",
"print(datetime.now(pytz.timezone('Asia/Shanghai')).strftime('%Y-%m-%d %H:%M:%S'),'changke')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 模型合并"
"## 6.模型合并"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 下载评估数据集"
"## 7.原有能力评估\n",
"**下载评估数据集**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 评估模型的原有能力"
"**评估模型的原有能力**"
]
},
{
"outputs": [],
"source": [
"%%bash\n",
"wget https://2024-ascend-innovation-contest-mindspore.obs.cn-southwest-2.myhuaweicloud.com/topic2-finetune/run_llama3_8b_8k_800T_A2_64G_lora_256_base_eval.yaml -P /home/ma-user/work/mindformers/research/llama3/\n"
"wget https://2024-ascend-innovation-contest-mindspore.obs.cn-southwest-2.myhuaweicloud.com/topic2-finetune/run_llama3_8b_8k_800T_A2_64G_lora_256_base_eval.yaml -P /home/ma-user/work/mindformers/research/llama3/"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 测试模型"
"**这里在测试的时候,因为需要测试5个模型文件,比较耗时,使用4卡机器可以并行测试**"
]
},
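{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch (not what was actually run) of how the evals could be launched in parallel from Python, one checkpoint per NPU. The command line mirrors the run_mindformer.py eval invocation used in this notebook; the checkpoint paths in the dict are placeholders, not the real file names."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import shlex\n",
"import subprocess\n",
"\n",
"# Placeholder checkpoint paths: replace with the five real LoRA checkpoints.\n",
"checkpoints = {\n",
"    0: '/home/ma-user/work/mycheckpoint/rank_0/ckpt-a.ckpt',\n",
"    1: '/home/ma-user/work/mycheckpoint/rank_0/ckpt-b.ckpt',\n",
"}\n",
"\n",
"procs = []\n",
"for device_id, ckpt in checkpoints.items():\n",
"    cmd = (\n",
"        'python run_mindformer.py '\n",
"        '--config research/llama3/run_llama3_8b_8k_800T_A2_64G_lora_256_base_eval.yaml '\n",
"        '--eval_dataset_dir /home/ma-user/work/origin_dataset/squad8192.mindrecord '\n",
"        '--run_mode eval '\n",
"        f'--load_checkpoint {ckpt} '\n",
"        '--epochs 1 --batch_size 1 --use_parallel False '\n",
"        f'--device_id {device_id}'\n",
"    )\n",
"    log = open(f'/home/ma-user/work/origin_dataset/eval_dev{device_id}.log', 'w')\n",
"    procs.append(subprocess.Popen(shlex.split(cmd),\n",
"                                  cwd='/home/ma-user/work/mindformers',\n",
"                                  stdout=log, stderr=subprocess.STDOUT))\n",
"\n",
"for p in procs:\n",
"    p.wait()"
]
},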
{
"--device_id 7 | tee /home/ma-user/work/origin_dataset/2024-07-31-3_log.txt"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"cd /home/ma-user/work/mindformers/ \n",
"python run_mindformer.py \\\n",
"--config research/llama3/run_llama3_8b_8k_800T_A2_64G_lora_256_base_eval.yaml \\\n",
"--eval_dataset_dir /home/ma-user/work/origin_dataset/squad8192.mindrecord \\\n",
"--run_mode eval \\\n",
"--load_checkpoint /home/ma-user/work/mycheckpoint/rank_0/2024-07-29-lora-llama3-0.ckpt \\\n",
"--epochs 1 \\\n",
"--batch_size 1 \\\n",
"--use_parallel False \\\n",
"--device_id 0 | tee /home/ma-user/work/origin_dataset/2024-07-29_log.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 模型数学计算结果推理"
"## 微调结果测试"
]
},
{
"source": [
"import moxing as mox\n",
"\n",
"#下载一个OBS文件夹sub_dir_0,从OBS下载至Notebook\n",
"#mox.file.copy_parallel('obs://bucket_name/sub_dir_0', '/home/ma-user/work/sub_dir_0')\n",
"#下载一个OBS文件obs_file.txt,从OBS下载至Notebook\n",
"#mox.file.copy('obs://bucket_name/obs_file.txt', '/home/ma-user/work/obs_file.txt')\n",
"\n",
"#上传一个OBS文件夹sub_dir_0,从Notebook上传至OBS\n",
"mox.file.copy_parallel('/home/ma-user/work/20240718-output', 'obs://modelart-kervin/fine_tuning/result-2')\n",
"#上传一个OBS文件obs_file.txt,从Notebook上传至OBS\n",
"#mox.file.copy('/home/ma-user/work/obs_file.txt', 'modelart-kervin.obs.cn-southwest-2.myhuaweicloud.com://bucket_name/obs_file.txt')\n"
"mox.file.copy_parallel('/home/ma-user/work/20240718-output', 'obs://xxx/xxx/result-2')\n"
]
}
],
"metadata": {