nn.CrossEntropyLoss(reduce=False)

In PyTorch 0.3, the loss functions can compute per-sample losses when you set reduce=False. But when I backpropagate the loss, I now need to pass a Variable to loss.backward(). What should I pass in?

For example:

>>> import torch
>>> import torch.nn as nn
>>> from torch import autograd
>>> loss = nn.CrossEntropyLoss(reduce=False)
>>> input = autograd.Variable(torch.randn(3, 5), requires_grad=True)
>>> target = autograd.Variable(torch.LongTensor(3).random_(5))
>>> output = loss(input, target)  # per-sample losses, shape [3]
>>> output.backward(?)  # what should I pass in here? Any tutorial?

Thanks.


For simplicity, in PyTorch, if a Variable contains a single value, var.backward() is equivalent to var.backward(torch.Tensor([1])).
If you use the loss without reduction (so that you can use the value of each sample in your batch independently?), the output is a Variable containing one entry per sample in your batch, so you need to provide a gradient for each of those entries.

All of the following are equivalent:

# With reduce=True (the default), loss is a single value:
# loss.size() == [1]
loss.backward()
loss.backward(torch.Tensor([1]))
loss.backward(torch.ones_like(loss.data))

# With reduce=False, loss holds one value per sample:
# loss.size() == [batch_size, 1] or [batch_size]
loss.sum().backward()
loss.backward(torch.ones_like(loss.data))
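
Putting it together, here is a minimal self-contained sketch using the same 0.3-era Variable / reduce=False API as the snippets above (loss_fn and weights are names I made up for this example). It backpropagates the per-sample losses once with a uniform gradient and once with per-sample weights, which is equivalent to (loss * weights).sum().backward():

import torch
import torch.nn as nn
from torch import autograd

loss_fn = nn.CrossEntropyLoss(reduce=False)    # per-sample losses
input = autograd.Variable(torch.randn(3, 5), requires_grad=True)
target = autograd.Variable(torch.LongTensor(3).random_(5))

# Uniform weighting: same result as loss.sum().backward()
loss = loss_fn(input, target)                  # shape [3]
loss.backward(torch.ones_like(loss.data))

# Per-sample weighting: pass the weights as the gradient of each entry,
# equivalent to (loss * weights).sum().backward()
input.grad.data.zero_()                        # reset accumulated gradients
loss = loss_fn(input, target)                  # fresh graph for the 2nd backward
weights = torch.Tensor([0.5, 1.0, 2.0])        # hypothetical per-sample weights
loss.backward(weights)

On newer PyTorch versions the same idea applies, except that reduce=False is spelled reduction='none' and Variables are plain tensors.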