Memory issue when using nn.DataParallel

Hi guys,

I’m currently using nn.DataParallel for multi-GPU (8-GPU) training on a single node. However, when I put the data and model on devices[0], the memory usage on GPU 0 becomes huge and the program exits with a CUDA out-of-memory error at the beginning of training. Can anyone help?
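For reference, here is a minimal sketch of the pattern I mean (the model and batch size are just placeholders):

```python
import torch
import torch.nn as nn

# Placeholder model, moved to GPU 0 before wrapping in DataParallel
model = nn.Sequential(nn.Conv2d(3, 64, 3), nn.ReLU()).to('cuda:0')
model = nn.DataParallel(model, device_ids=list(range(8)))

inputs = torch.randn(256, 3, 224, 224).to('cuda:0')  # full batch sits on GPU 0
outputs = model(inputs)          # scattered to 8 GPUs, outputs gathered back on GPU 0
loss = outputs.mean()            # loss and backward also run on GPU 0
loss.backward()
```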

BTW, I find that if I use DistributedDataParallel, the memory usage is fine.

Environment:
pytorch 1.0.1
cuda9.0


This effect is described by @Thomas_Wolf in this blog post.

We generally recommend using DDP. :wink:
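With DDP you run one process per GPU, so each process only allocates memory on its own device instead of gathering everything on GPU 0. A rough sketch of a single-node setup (assuming the processes are launched with something like torch.distributed.launch, which sets the rendezvous environment variables; the model and batch size are illustrative):

```python
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train(local_rank):
    # One process per GPU; each process only touches its own device.
    dist.init_process_group(backend='nccl')
    torch.cuda.set_device(local_rank)

    model = nn.Sequential(nn.Conv2d(3, 64, 3), nn.ReLU()).to(local_rank)
    model = DDP(model, device_ids=[local_rank], output_device=local_rank)

    inputs = torch.randn(32, 3, 224, 224).to(local_rank)  # per-GPU batch stays local
    loss = model(inputs).mean()
    loss.backward()  # gradients are all-reduced across processes
```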

Thanks. Which synchronized batch normalization do you recommend when using DDP? I’m not sure whether the default nn.BatchNorm2d synchronizes its statistics across multiple GPUs.
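For context, the conversion I’m considering looks roughly like this, assuming a PyTorch version that ships torch.nn.SyncBatchNorm (1.1+):

```python
import torch.nn as nn

# Convert existing BatchNorm layers before wrapping the model with DDP,
# so batch statistics are synchronized across processes rather than
# computed independently on each GPU.
model = nn.Sequential(nn.Conv2d(3, 64, 3), nn.BatchNorm2d(64), nn.ReLU())
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
# then: model.to(local_rank); model = DDP(model, device_ids=[local_rank])
```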