Hi,
I am using nn.DataParallel on 4 GTX 1080 GPUs, with
net = Net().cuda()
net = nn.DataParallel(net)
optimizer = torch.optim.Adam(net.parameters())
criterion = nn.CrossEntropyLoss().cuda()
and the training loop is
pred = net(data)
loss = criterion(pred,label)
optimizer.zero_grad()
loss.backward()
optimizer.step()
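For reference, here is a minimal self-contained sketch of the setup above, with a toy Net and random tensors standing in for the real model and data (those names are hypothetical placeholders, not the actual code):

```python
import torch
import torch.nn as nn

# Fall back to CPU when no GPU is available, so the sketch runs anywhere;
# nn.DataParallel simply calls the wrapped module directly in that case.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class Net(nn.Module):  # hypothetical stand-in for the real model
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        return self.fc(x)

net = Net().to(device)
net = nn.DataParallel(net)  # splits each input batch across visible GPUs
optimizer = torch.optim.Adam(net.parameters())
criterion = nn.CrossEntropyLoss()

data = torch.randn(8, 10, device=device)          # dummy batch
label = torch.randint(0, 2, (8,), device=device)  # dummy targets

pred = net(data)
loss = criterion(pred, label)
optimizer.zero_grad()  # note: zero_grad() is a method call on the optimizer
loss.backward()
optimizer.step()
```

One thing worth remembering with nn.DataParallel: each GPU sees only batch_size / num_gpus samples per forward pass, so the effective per-device batch size differs from the single-GPU run.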
In this case, both the training and validation loss plateau around 0.4, while training on a single 1080 Ti GPU, both losses drop to about 0.05.
Does anyone know what’s wrong? Any suggestion is appreciated.