To generate adversarial examples quickly, I would like to compute them in batches rather than one at a time, so I can exploit parallelism. This means the loss needs one element per example in the batch instead of a single scalar. However, the code
import torch
import torch.nn.functional as F
from torch.autograd import Variable

logits = net(x)
net.zero_grad()
prediction = logits.data.max(1)[1]  # predicted class indices, shape (batch, 1)
one_hot = Variable(torch.FloatTensor(x.size(0), 1000).zero_().scatter_(1, prediction, 1))
loss = -torch.sum(F.log_softmax(logits) * one_hot, 1)  # per-example loss, shape (batch,)
loss.backward()
gives the error
backward should be called only on a scalar (i.e. 1-element tensor) or with gradient w.r.t. the variable
How can I differentiate a loss expressed as a tensor?
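For reference, here is a minimal sketch of the two usual workarounds, using a toy linear "net" (the `w` matrix, 10 classes, and batch size 4 are stand-ins, not from the original code): either pass an explicit `gradient` tensor to `backward()` (a vector-Jacobian product, where a vector of ones reproduces the gradient of the summed loss), or reduce the per-example losses to a scalar first.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(4, 8, requires_grad=True)  # toy batch of 4 examples
w = torch.randn(8, 10)                     # stand-in for net's weights

logits = x @ w
prediction = logits.argmax(1)
one_hot = F.one_hot(prediction, num_classes=10).float()
# Per-example loss, one value per batch element, shape (4,)
loss = -torch.sum(F.log_softmax(logits, dim=1) * one_hot, dim=1)

# Option 1: backward with an explicit gradient (vector-Jacobian product).
# torch.ones_like(loss) weights every example equally, matching loss.sum().
loss.backward(torch.ones_like(loss))
grad_batched = x.grad.clone()

# Option 2: reduce the per-example losses to a scalar first.
x.grad = None
loss2 = -torch.sum(F.log_softmax(x @ w, dim=1) * one_hot, dim=1).sum()
loss2.backward()

assert torch.allclose(grad_batched, x.grad)  # both routes agree
```

Since each example's loss depends only on its own row of `x`, the per-example gradients do not mix, which is why the two options give identical `x.grad`.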