The network outputs a number of vectors, for each of which I compute a distance and derive a SoftMin probability distribution (a kind of clustering). This output is of the form (total_clustering_prob, target_classes):

[0.32, 0.33, 0.35] [1]
...
[0.31, 0.39, 0.30] [2]

I plug it into NLLLoss:

nllloss = NLLLoss(size_average=False, reduce=True)
total_clustering_loss = nllloss(total_clustering_prob, target_classes)
but the resulting loss has no grad_fn, so none of the previous layers compute gradients either. Why does this happen? Do I need to put all computations inside an nn.Module subclass?
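For reference, here is a minimal sketch of what I expected to happen (simplified; the tensor values and names are placeholders, and I use reduction='sum' in place of the deprecated size_average/reduce arguments):

```python
import torch
import torch.nn as nn

# Placeholder probabilities and targets standing in for my network output
probs = torch.tensor([[0.32, 0.33, 0.35],
                      [0.31, 0.39, 0.30]], requires_grad=True)
targets = torch.tensor([1, 2])

# NLLLoss expects log-probabilities, so take the log first
log_probs = torch.log(probs)

nllloss = nn.NLLLoss(reduction='sum')  # equivalent to size_average=False, reduce=True
loss = nllloss(log_probs, targets)

# In this standalone sketch the loss does carry a grad_fn;
# in my real code it is None
print(loss.grad_fn)
```

In my actual code the loss prints no grad_fn at all, even though the computation looks like the above.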