Loss function doesn't compute gradient

The network outputs a number of vectors, for which I compute distances and derive a SoftMin probability distribution (a type of clustering algorithm). The output has the form (total_clustering_prob, target_classes):

[0.32, 0.33, 0.35] [1]
[0.31, 0.39, 0.30] [2]

I plug it into NLLLoss:

nllloss = NLLLoss(size_average=False, reduce=True)
total_clustering_loss = nllloss(total_clustering_prob, target_classes)
and the resulting loss doesn’t have a grad_fn, so none of the previous layers compute gradients either. Why does this happen? Do I need to do all computations within an nn.Module subclass?

No, the computations do not all need to be inside an nn.Module subclass.
You have to trace back through your code and check two things: that the computation graph stays connected (calls such as detach() or .item() disconnect it), and that at least one operand in the graph (a parameter or an input) has requires_grad=True.
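
One way to do that trace is to print requires_grad and grad_fn for each intermediate tensor. Here is a minimal sketch, assuming a setup like the one in the question. The names embeddings and centers are hypothetical stand-ins for your network output and cluster centers, and torch.cdist is just one possible distance. Note that NLLLoss expects log-probabilities, so the sketch uses log_softmax of the negated distances (the log of a SoftMin).

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-ins for the network output and the cluster centers.
embeddings = torch.randn(2, 4, requires_grad=True)  # 2 samples, 4 features
centers = torch.randn(3, 4)                         # 3 clusters
target_classes = torch.tensor([1, 2])

# Pairwise Euclidean distances, shape (2, 3).
dist = torch.cdist(embeddings, centers)

# Log of SoftMin over the distances: log_softmax of the negated distances.
log_prob = F.log_softmax(-dist, dim=1)

# reduction="sum" corresponds to the older size_average=False, reduce=True.
loss = F.nll_loss(log_prob, target_classes, reduction="sum")

# Walk through the intermediates; the first one that reports
# "has grad_fn: False" is where the graph got disconnected.
for name, t in [("dist", dist), ("log_prob", log_prob), ("loss", loss)]:
    print(f"{name}: requires_grad={t.requires_grad}, has grad_fn: {t.grad_fn is not None}")

loss.backward()
print(embeddings.grad.shape)
```

If every intermediate reports a grad_fn, backward() should populate the gradients of the leaf tensors, without any of this living inside an nn.Module.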


I didn’t realize .item() detaches the graph. I owe you a beer!
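
For anyone else landing here, a small sketch of what goes wrong with .item(): it returns a plain Python number, so any tensor rebuilt from that number has no link back to the original graph. The tensor w is a made-up example, not the code from this thread.

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Disconnected: .item() yields a Python float, and wrapping it in a new
# tensor starts from scratch with no history.
broken = torch.tensor((w * 2).sum().item())

# Also disconnected: detach() returns a tensor cut off from the graph.
detached = (w * 2).sum().detach()

# Connected: keep the result as a tensor all the way to the loss.
connected = (w * 2).sum()

print(broken.grad_fn, broken.requires_grad)      # None False
print(detached.grad_fn, detached.requires_grad)  # None False
print(connected.grad_fn is not None)             # True

connected.backward()
print(w.grad)  # tensor([2., 2., 2.])
```

Using .item() is fine for logging a loss value, as long as the tensor passed to backward() never goes through it.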