So, how does DataParallel work in the backward pass when I only wrap my network (without the loss) in DataParallel?
https://discuss.pytorch.org/t/is-the-loss-function-paralleled-when-using-dataparallel/3346/2?u=bigxiuixu
By the way, I am following that discussion. I have also tried computing the loss as part of the model's forward function; here is the code:
def forward(self, x, target):
    x = F.relu(F.max_pool2d(self.conv1(x), 2))
    x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
    x = x.view(-1, self.num_flat_features(x))
    x = F.relu(self.fc1(x))
    x = F.dropout(x, training=self.training)
    x = self.fc2(x)
    output = F.log_softmax(x, dim=1)
    # return the loss together with the output so it is computed per GPU
    return F.nll_loss(output, target), output
and wrapped the network + loss with DataParallel:

model = torch.nn.DataParallel(model, device_ids=[0, 1, 2, 3, 4, 5, 6, 7]).cuda()

But it finally fails with:
Traceback (most recent call last):
  File "main.py", line 135, in <module>
    train(epoch)
  File "main.py", line 98, in train
    loss.backward()
  File "/home/lab/anaconda2/lib/python2.7/site-packages/torch/autograd/variable.py", line 143, in backward
    'backward should be called only on a scalar (i.e. 1-element tensor) '
RuntimeError: backward should be called only on a scalar (i.e. 1-element tensor) or with gradient w.r.t. the variable
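(Edit, in case it helps anyone hitting the same error: when the loss is computed inside forward, DataParallel gathers one loss value per replica, so with 8 GPUs the returned "loss" is a tensor of 8 elements, not a scalar, and backward refuses it. Reducing it first with .mean() (or .sum()) fixes this. A minimal CPU-only sketch of the same situation, with the 8 per-replica losses simulated by a random tensor:)

```python
import torch

# Simulate what DataParallel returns with 8 GPUs: one loss per replica,
# i.e. a 1-D tensor of 8 elements instead of a scalar.
per_replica_loss = torch.randn(8, requires_grad=True) ** 2

# Calling backward() directly on a non-scalar tensor raises a RuntimeError,
# just like in the traceback above.
try:
    per_replica_loss.backward()
except RuntimeError as e:
    print("backward on non-scalar failed:", e)

# Reduce the per-replica losses to a single scalar first, then backward works.
loss = per_replica_loss.mean()
loss.backward()
print("scalar backward ok, loss =", loss.item())
```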