The network outputs a number of vectors; for each one I compute a distance and derive a SoftMin probability distribution over cluster assignments. This output is of the form `(total_clustering_prob, target_classes)`, e.g.:

```
[0.32, 0.33, 0.35] [1]
....
[0.31, 0.39, 0.3] [2]
```

I plug it into `NLLLoss`:

```
nllloss = NLLLoss(size_average=False, reduce=True)
total_clustering_loss = nllloss(total_clustering_prob, target_classes)
```

and the resulting loss doesn’t have a `grad_fn`, so none of the previous layers compute gradients either. Why does this happen? Do I need to do all computations within an `nn.Module` subclass?
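For reference, here is a minimal, self-contained sketch of what I am doing (the variable names, shapes, and the use of `torch.cdist` for the distances are illustrative, not my actual code):

```python
import torch
import torch.nn as nn

# Two hypothetical output vectors of dimension 3, three cluster centers.
outputs = torch.randn(2, 3, requires_grad=True)
centers = torch.randn(3, 3)

# Distance of each output vector to each center, then SoftMin over centers.
distances = torch.cdist(outputs, centers)              # shape (2, 3)
total_clustering_prob = torch.softmax(-distances, dim=1)

# NLLLoss expects log-probabilities, so take the log before the loss.
nllloss = nn.NLLLoss(reduction='sum')
target_classes = torch.tensor([1, 2])
total_clustering_loss = nllloss(torch.log(total_clustering_prob),
                                target_classes)

print(total_clustering_loss.grad_fn)
```

In this isolated version the loss does carry a `grad_fn`, so I suspect the difference is in how the probabilities are produced in my full model.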