
Commit 6bda90b

Merge branch 'main' into release/3.10

2 parents: d20dbc3 + 7507db0

File tree: 4 files changed (+4, -4 lines)

docs/source/Megatron-SWIFT/Quick-start.md

Lines changed: 1 addition & 1 deletion (content translated from Chinese)

@@ -161,7 +161,7 @@ I am a language model developed by swift, you can call me swift-robot. How can I
 
 
 ## Training tips
-- Ways to increase training throughput: use packing, increase DP, reduce recomputation, and increase compute-communication overlap. MoE can additionally be accelerated by dropping tokens.
+- Ways to increase training throughput: use packing (do not enable streaming), increase DP, reduce recomputation, and increase compute-communication overlap. MoE can additionally be accelerated by dropping tokens.
 - Choosing parallelism techniques:
   - Megatron-SWIFT combines ZeRO-1 (use_distributed_optimizer is enabled by default) with the various parallelism techniques.
   - DP is the fastest but uses more memory; use other parallelism techniques to reduce memory usage.

docs/source_en/Megatron-SWIFT/Quick-start.md

Lines changed: 1 addition & 1 deletion

@@ -164,7 +164,7 @@ I am a language model developed by swift, you can call me swift-robot. How can I
 
 
 ## Training Tips
-- Methods to increase training throughput: use packing, increase data parallelism (DP), reduce recomputation, and increase compute-communication overlap. MoE models can also be accelerated by dropping tokens.
+- Methods to increase training throughput: use packing (do not enable streaming), increase data parallelism (DP), reduce recomputation, and increase compute-communication overlap. MoE models can also be accelerated by dropping tokens.
 - Parallelism choices:
   - Megatron-SWIFT uses ZeRO-1 (use_distributed_optimizer enabled by default) combined with various parallelism techniques.
   - DP is the fastest but consumes the most memory; use other parallel techniques to reduce memory usage.
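The packing tip above can be illustrated with a small sketch. This is a generic first-fit packing routine, not Megatron-SWIFT's implementation: it greedily concatenates variable-length samples into fixed-capacity rows to reduce padding waste. Packing like this needs the sample lengths up front, which a streaming dataset does not provide — plausibly why the tip says not to enable streaming together with packing.

```python
def pack_sequences(lengths, max_len):
    """Greedy first-fit packing: assign each sample (by index) to the
    first row with enough remaining capacity, opening a new row if none
    fits. Returns a list of rows, each a list of sample indices."""
    rows, row_lens = [], []
    for i, n in enumerate(lengths):
        for r, used in enumerate(row_lens):
            if used + n <= max_len:
                rows[r].append(i)
                row_lens[r] += n
                break
        else:
            # No existing row can hold this sample; start a new one.
            rows.append([i])
            row_lens.append(n)
    return rows
```

For example, samples of lengths [5, 3, 4, 2] packed into rows of capacity 8 fill two rows completely instead of wasting four rows' worth of padding.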

swift/llm/dataset/preprocessor/core.py

Lines changed: 1 addition & 1 deletion

@@ -375,7 +375,7 @@ def preprocess(self, row: Dict[str, Any]) -> Optional[Dict[str, Any]]:
         if isinstance(response, (list, tuple)):
             from transformers.utils import strtobool
             # sometimes response is a list, pick one randomly
-            if strtobool(os.environ.get('RANDOM_DATASET_RESPONSE', 'True')):
+            if strtobool(os.environ.get('RANDOM_DATASET_RESPONSE', 'False')):
                 response = self.random_state.choice(response)
             else:
                 response = response[0]

swift/megatron/model/gpt_model.py

Lines changed: 1 addition & 1 deletion

@@ -217,7 +217,7 @@ def _preprocess(
             rotary_seq_len,
             packed_seq=packed_seq,
         )
-        if packed_seq:
+        if packed_seq and not self.config.apply_rope_fusion:
             assert position_ids.shape[0] == 1, f'position_ids.shape: {position_ids.shape}'
             rotary_pos_emb = rotary_pos_emb[position_ids[0]]
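This change restricts the per-token rotary-embedding gather to the case where fused RoPE is disabled; presumably the fused kernel consumes position information itself, so pre-indexing `rotary_pos_emb` would be redundant when `apply_rope_fusion` is on. A numpy sketch of the gather itself, with illustrative shapes (the array sizes and the two-subsequence layout are assumptions for the example, not the real tensors):

```python
import numpy as np

# Assumed layout: rotary embeddings precomputed per absolute position,
# shape [max_seq_len, dim]. With packing, position_ids restart at 0 for
# each packed sub-sequence, so we gather one row per token.
max_seq_len, dim = 8, 4
rotary_pos_emb = np.arange(max_seq_len * dim, dtype=np.float32).reshape(max_seq_len, dim)

# One packed batch row holding two sub-sequences of lengths 3 and 2.
position_ids = np.array([[0, 1, 2, 0, 1]])
assert position_ids.shape[0] == 1  # same invariant the patch asserts

# Fancy indexing maps each token to the embedding for its local position.
gathered = rotary_pos_emb[position_ids[0]]  # shape [total_tokens, dim]
```

Tokens at the start of each packed sub-sequence (indices 0 and 3 here) receive identical embeddings, which is exactly what restarting positions per sub-sequence is meant to achieve.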