Hi, I have 4 separate nn.Modules which I instantiate inside my actual final model class (which is also an nn.Module), so I have something like:
class M(nn.Module):
    def __init__(self):
        super(M, self).__init__()
        self.A = A()
        self.B = B()
        self.C = C()
        self.D = D()
For training I do the following:
model = M().to(device)
model = nn.DataParallel(model)
However, I do not see the other 3 GPUs being utilised at all on a 4-GPU machine.
I have run export CUDA_VISIBLE_DEVICES=0,1,2,3 before starting training.
Can someone give me some insight into what I am doing wrong?
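In case it helps, here is a minimal self-contained version of my setup. The A/B/C/D submodules are replaced by small hypothetical nn.Linear stand-ins (my real modules are larger), and the forward just chains them; the DataParallel wrapping is exactly what I do in training:

```python
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super(M, self).__init__()
        # Hypothetical stand-ins for my real A/B/C/D submodules
        self.A = nn.Linear(8, 8)
        self.B = nn.Linear(8, 8)
        self.C = nn.Linear(8, 8)
        self.D = nn.Linear(8, 4)

    def forward(self, x):
        # The submodules are chained sequentially; DataParallel
        # only parallelises across GPUs inside this forward call,
        # by splitting the input batch along dim 0
        x = torch.relu(self.A(x))
        x = torch.relu(self.B(x))
        x = torch.relu(self.C(x))
        return self.D(x)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = M().to(device)
model = nn.DataParallel(model)  # replicates M on each visible GPU

x = torch.randn(32, 8, device=device)  # batch of 32, scattered across GPUs
out = model(x)
print(out.shape)  # torch.Size([32, 4])
```

My understanding is that nn.DataParallel should scatter each batch along dim 0 across GPUs 0-3 and gather the outputs back on GPU 0, so I expected all four GPUs to show activity during the forward/backward passes.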