Gradient of Tensor Loss Not Scalar Loss

To generate adversarial examples quickly, I would like to craft them in batches rather than one at a time, for parallelism. This means I need the loss's dimension to match the batch size. However, the code

logits = net(x)
net.zero_grad()
prediction = logits.detach().argmax(1, keepdim=True)  # shape (batch, 1), as scatter_ expects
one_hot = torch.zeros_like(logits).scatter_(1, prediction, 1.0)
loss = -torch.sum(F.log_softmax(logits, dim=1) * one_hot, 1)  # per-example loss, shape (batch,)
loss.backward()

gives the error

backward should be called only on a scalar (i.e. 1-element tensor) or with gradient w.r.t. the variable

How can I differentiate a loss expressed as a tensor?

Can you just sum or average it?

In adversarial images, each perturbation is crafted for an individual example, not for a batch of examples.

Yes, but if you sum the losses and then call backward(), the gradients will still propagate to the individual examples.
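To make that concrete, here is a minimal sketch (the tensor x and the quadratic "loss" are just illustrative): summing the tensor loss first and calling backward() produces the same gradients as calling backward() on the tensor loss with an explicit all-ones gradient argument.

```python
import torch

x = torch.randn(3, requires_grad=True)

# Option 1: sum the per-element losses into a scalar, then backward().
loss = x ** 2
loss.sum().backward()
grad_sum = x.grad.clone()

# Option 2: keep the tensor loss and pass an explicit gradient of ones.
x.grad = None
loss = x ** 2
loss.backward(torch.ones_like(loss))

assert torch.allclose(x.grad, grad_sum)  # both equal 2 * x
```

Passing torch.ones_like(loss) tells autograd to weight every element of the tensor loss equally, which is exactly what differentiating the sum does.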

I suppose you are right: if we have two examples (x_0, y_0) and (x_1, y_1), then \nabla_{x_0} loss(x_1, y_1) = 0, so summing does not mix the examples' gradients.
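A quick numerical check of that claim, with a hypothetical toy linear net standing in for the real model: the gradient of the summed loss with respect to each input matches the gradient computed for that example on its own.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-ins for the real net and batch (assumed names, not from the thread).
net = torch.nn.Linear(4, 3)
x = torch.randn(2, 4, requires_grad=True)
target = torch.tensor([0, 2])

# Per-example cross-entropy loss (shape: batch size), summed to a scalar.
logits = net(x)
loss = F.cross_entropy(logits, target, reduction="none")  # shape (2,)
loss.sum().backward()
grad_batched = x.grad.clone()

# Compare against computing each example's gradient individually.
for i in range(2):
    xi = x[i].detach().clone().requires_grad_(True)
    F.cross_entropy(net(xi).unsqueeze(0), target[i:i + 1]).backward()
    assert torch.allclose(grad_batched[i], xi.grad, atol=1e-6)
```

Each row of grad_batched depends only on its own example, so batching the perturbation computation is safe.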