Hi, I have 4 separate nn.Modules which I instantiate inside my actual final model class (this is also an nn.Module), so I have something like:

```python
class M(nn.Module):
    def __init__(self):
        super(M, self).__init__()
        self.A = A()
        self.B = B()
        self.C = C()
        self.D = D()
```
For training I do the following:

```python
model = M().to(device)
model = nn.DataParallel(model)
```
However, on a 4-GPU machine I do not see the other 3 GPUs being utilised at all.
I have run `export CUDA_VISIBLE_DEVICES=0,1,2,3` before starting training.
Can someone give me some insight into what I am doing wrong?
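For reference, here is a minimal self-contained version of my setup. The classes A/B/C/D below are hypothetical stubs (simple Linear layers) standing in for my real sub-modules, just so the snippet runs end to end; it falls back to CPU when no GPU is visible:

```python
import torch
import torch.nn as nn

# Hypothetical stub standing in for my real sub-modules A, B, C, D
class A(nn.Module):
    def __init__(self):
        super(A, self).__init__()
        self.fc = nn.Linear(8, 8)

    def forward(self, x):
        return self.fc(x)

B = C = D = A  # identical stubs for brevity

class M(nn.Module):
    def __init__(self):
        super(M, self).__init__()
        self.A = A()
        self.B = B()
        self.C = C()
        self.D = D()

    def forward(self, x):
        # Chain the four sub-modules
        return self.D(self.C(self.B(self.A(x))))

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = M().to(device)
# DataParallel replicates the wrapped module across all visible GPUs
# and splits each input batch along dim 0; with 0 or 1 visible devices
# it simply runs the underlying module directly.
model = nn.DataParallel(model)

out = model(torch.randn(16, 8).to(device))
print(out.shape)  # torch.Size([16, 8])
```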