It looks like DDP is not being triggered in your case for some reason: if you are not changing the batch_size, the total number of batches shown in the progress bar should be reduced when running DDP on 4 GPUs.

Did you see any logs like this when you called trainer.fit?

Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/4
Initializing distributed: GLOBAL_RANK: 2, MEMBER: 3/4
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/4
Initializing distributed: GLOBAL_RANK: 3, MEMBER: 4/4
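As a rough illustration of why the progress bar shrinks under DDP: each process gets its own shard of the dataset, so each rank iterates over roughly `dataset_size / world_size` samples. The helper below is a hypothetical sketch (not Lightning code), assuming the default `DistributedSampler` behavior of padding so every rank gets a full shard:

```python
import math

def batches_per_rank(dataset_size: int, batch_size: int, world_size: int) -> int:
    # With DDP, DistributedSampler splits the dataset across ranks,
    # so each process sees roughly dataset_size / world_size samples.
    samples_per_rank = math.ceil(dataset_size / world_size)
    # Each rank then forms its own batches from its shard.
    return math.ceil(samples_per_rank / batch_size)

# Example: 10,000 samples, batch_size 32
print(batches_per_rank(10_000, 32, 1))  # 313 batches on a single GPU
print(batches_per_rank(10_000, 32, 4))  # 79 batches per rank with DDP on 4 GPUs
```

So with DDP active on 4 GPUs, the progress bar (which counts one rank's batches) should show roughly a quarter as many steps per epoch; if it still shows the single-GPU count, DDP likely never started.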

Answer selected by akihironitta