I am facing an issue when back-propagating the loss because of the large effective batch size accumulated across the 8 GPUs in DataParallel. Is there a way to divide the loss calculation into 8 parts, normalize the loss, and then call loss.backward() on the smaller pieces so each backward pass is smaller? Or how else can I work around this?
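One common workaround for this is gradient accumulation: split the large batch into micro-batches, scale each micro-batch loss by the number of accumulation steps, and call backward() per micro-batch so gradients add up to the full-batch gradient. A minimal sketch (the model, data shapes, and `accum_steps` here are hypothetical placeholders):

```python
import torch
import torch.nn as nn

# Hypothetical model and data purely for illustration.
model = nn.Linear(16, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

batch = torch.randn(32, 16)           # one large batch
targets = torch.randint(0, 4, (32,))
accum_steps = 8                       # split into 8 micro-batches

optimizer.zero_grad()
for x, y in zip(batch.chunk(accum_steps), targets.chunk(accum_steps)):
    # Divide by accum_steps so the summed gradients match the
    # gradient of the mean loss over the full batch.
    loss = loss_fn(model(x), y) / accum_steps
    loss.backward()                   # gradients accumulate in p.grad
optimizer.step()                      # one optimizer step for the whole batch
```

This trades memory for extra forward/backward passes; the optimizer still takes a single step per large batch.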
I assume you are using DataParallel. I would suggest switching to DDP: each worker computes the loss separately on its own shard of the batch, and the gradients are synchronized (averaged) across workers during the backward pass, so no single process has to handle the full batch.
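A minimal DDP sketch might look like this (the model, batch shapes, and addresses are placeholders; a real multi-GPU job would be launched with torchrun, use the nccl backend, and pass `device_ids`):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def run(rank: int, world_size: int) -> float:
    # In a real job torchrun sets these; hard-coded here for a local demo.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = torch.nn.Linear(16, 4)   # hypothetical model
    ddp_model = DDP(model)           # gradients are all-reduced inside backward()
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.1)
    loss_fn = torch.nn.CrossEntropyLoss()

    x = torch.randn(8, 16)           # each rank sees only its shard of the batch
    y = torch.randint(0, 4, (8,))
    optimizer.zero_grad()
    loss = loss_fn(ddp_model(x), y)  # loss computed per rank
    loss.backward()                  # DDP averages gradients across ranks here
    optimizer.step()

    dist.destroy_process_group()
    return loss.item()

final_loss = run(rank=0, world_size=1)  # single-process CPU demo
```

Because each rank only ever sees its own shard, the per-process memory footprint stays small even as the global batch grows.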
@cbalioglu Can you recommend some resources to learn how to efficiently apply DDP to my large model across multiple GPUs?