So, how does DataParallel work in the backward pass when I only wrap my network (without the loss) in DataParallel?
https://discuss.pytorch.org/t/is-the-loss-function-paralleled-when-using-dataparallel/3346/2?u=bigxiuixu
By the way, I am following that discussion. I have also tried computing the loss as part of the model's forward function; here is the code:
def forward(self, x, target):
    x = F.relu(F.max_pool2d(self.conv1(x), 2))
    x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
    x = x.view(-1, self.num_flat_features(x))
    x = F.relu(self.fc1(x))
    x = F.dropout(x, training=self.training)
    x = self.fc2(x)
    output = F.log_softmax(x, dim=1)
    # return the loss together with the output so it is computed per GPU
    return F.nll_loss(output, target), output
and wrapped the network + loss with DataParallel:

model = torch.nn.DataParallel(model, device_ids=[0, 1, 2, 3, 4, 5, 6, 7]).cuda()

But it finally fails with:
Traceback (most recent call last):
  File "main.py", line 135, in <module>
    train(epoch)
  File "main.py", line 98, in train
    loss.backward()
  File "/home/lab/anaconda2/lib/python2.7/site-packages/torch/autograd/variable.py", line 143, in backward
    'backward should be called only on a scalar (i.e. 1-element tensor) '
RuntimeError: backward should be called only on a scalar (i.e. 1-element tensor) or with gradient w.r.t. the variable
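(Edit, in case it helps anyone hitting the same error: when the loss is computed inside forward, DataParallel gathers one loss value per replica, so with 8 GPUs the returned "loss" is a tensor of 8 elements, not a scalar, and backward refuses it. Reducing it first with .mean() (or .sum()) fixes this. A minimal CPU-only sketch of the same situation, with the 8 per-replica losses simulated by a random tensor:)

```python
import torch

# Simulate what DataParallel returns with 8 GPUs: one loss per replica,
# i.e. a 1-D tensor of 8 elements instead of a scalar.
per_replica_loss = torch.randn(8, requires_grad=True) ** 2

# Calling backward() directly on a non-scalar tensor raises a RuntimeError,
# just like in the traceback above.
try:
    per_replica_loss.backward()
except RuntimeError as e:
    print("backward on non-scalar failed:", e)

# Reduce the per-replica losses to a single scalar first, then backward works.
loss = per_replica_loss.mean()
loss.backward()
print("scalar backward ok, loss =", loss.item())
```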