Domain adaptation loss: loss.backward() fails

Hi!
I am trying to implement the DSNE paper, which describes a domain adaptation technique, and to reproduce its results with two domains: a source domain (MNIST in my case) and a target domain (SVHN in my case).

The idea of the paper is to compute a loss that is the difference between two terms. The first term is the maximum of the L2 distances between samples from the source domain and samples from the target domain that have the same label. The second term is the minimum of the L2 distances between samples from the source domain and samples from the target domain that have different labels.
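
In symbols, my reading of the paper is (with f_s, f_t the feature vectors and y_s, y_t the corresponding labels):

    loss = max(0,  max_{y_s = y_t} ||f_s - f_t||_2  -  min_{y_s != y_t} ||f_s - f_t||_2)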

I tried to implement this loss like this:

import torch

class HausdorffLoss(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, features_s, features_t, labels_s, labels_t):
        # Mask of pairs (i, j) whose source/target labels match:
        # the pairwise L1 distance between labels is 0 exactly when the labels are equal
        m = torch.cdist(labels_s.unsqueeze(-1).double(), labels_t.unsqueeze(-1).double(), p=1)
        m = torch.eq(m, 0)
        # Pairwise L2 distances between source and target features
        dist = torch.cdist(features_s.double(), features_t.double(), p=2)
        # Largest distance among same-label pairs ...
        pos = torch.max(torch.masked_select(dist, m))
        # ... and smallest distance among different-label pairs
        neg = torch.min(torch.masked_select(dist, ~m))
        # Hinge on the difference between the two terms
        loss = torch.relu(pos - neg)
        return loss

where "features_s" and "features_t" are the feature vectors computed by a VGG-16 network for the samples of the source domain and target domain respectively, and "labels_s" and "labels_t" are the corresponding class labels.
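
For reference, here is a minimal, self-contained sketch of how I call the loss (the VGG-16 feature extractor is replaced by random tensors here; the batch size, feature dimension and label values are just placeholders):

import torch

loss_fn = HausdorffLoss()

# Stand-ins for the VGG-16 features of a source batch and a target batch
features_s = torch.randn(8, 512, requires_grad=True)
features_t = torch.randn(8, 512, requires_grad=True)
# Labels chosen so that both same-label and different-label pairs exist
labels_s = torch.tensor([0, 1, 2, 3, 0, 1, 2, 3])
labels_t = torch.tensor([0, 1, 2, 3, 4, 5, 6, 7])

loss = loss_fn(features_s, features_t, labels_s, labels_t)
loss.backward()  # in my real training loop, this call raises the error below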

However, in my training loop the call to loss.backward() throws this error:
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

I am not sure why this is happening. I am still a beginner in PyTorch, so I would be glad if someone could help me.

Thanks!