I am looking at the example here: https://pytorch.org/docs/_modules/torch/nn/modules/loss.html . I noticed that the input data is passed as:
input = autograd.Variable(torch.randn(3, 5), requires_grad=True)
I would have expected requires_grad to be set to False, since this is the input data and not the weights.
Here is the example:
>>> m = nn.LogSoftmax()
>>> loss = nn.NLLLoss()
>>> # input is of size nBatch x nClasses = 3 x 5
>>> input = autograd.Variable(torch.randn(3, 5), requires_grad=True)
>>> # each element in target has to have 0 <= value < nclasses
>>> target = autograd.Variable(torch.LongTensor([1, 0, 4]))
>>> output = loss(m(input), target)
With most NN code, you don't want to set requires_grad=True unless you explicitly want the gradient w.r.t. your input. In this example, however, requires_grad=True is necessary because otherwise there would be no gradients to compute, since there are no model parameters.
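To make that concrete, here is a minimal sketch extending the example above: with requires_grad=True on the input, calling backward() on the loss populates input.grad. One assumption on my part: I pass dim=1 to nn.LogSoftmax explicitly, since newer PyTorch versions warn when it is omitted.

```python
import torch
from torch import nn, autograd

m = nn.LogSoftmax(dim=1)  # dim=1 added to silence the deprecation warning
loss = nn.NLLLoss()

# requires_grad=True asks autograd to track operations on the input,
# so d(loss)/d(input) can be computed by backward().
input = autograd.Variable(torch.randn(3, 5), requires_grad=True)
target = autograd.Variable(torch.LongTensor([1, 0, 4]))

output = loss(m(input), target)
output.backward()

# input.grad now holds the gradient of the loss w.r.t. the input data,
# with the same shape as the input itself.
print(input.grad.shape)
```

Since there are no nn.Module parameters anywhere in this graph, the input is the only leaf that can receive a gradient; with requires_grad=False, backward() would have nothing to fill in.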
I see. So in this example that makes sense.
By the way, does requires_grad default to True? I do not see it in the documentation.
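For what it's worth, a quick check suggests the default is False for user-created Variables/tensors, which is why the example has to set it explicitly:

```python
import torch
from torch import autograd

# A user-created Variable without an explicit flag does not track gradients.
x = autograd.Variable(torch.randn(3, 5))
print(x.requires_grad)  # False by default

# Setting the flag explicitly is what opts the input into autograd tracking.
y = autograd.Variable(torch.randn(3, 5), requires_grad=True)
print(y.requires_grad)  # True
```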