Custom Loss Function - Heatmaps

Hey, I am doing keypoint detection with heatmaps. I have x, y coordinates for each image and I made heatmaps from them. I want to use an MSE loss function, but I can't use it on the heatmaps alone since the values are very low (between -1 and 1). I wrote a custom loss function that extracts the coordinates from each of the heatmaps and then calculates the MSE on those x and y coordinates. But I get this error:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn.

Here is my loss function:

import torch


class HourGlassLoss(torch.nn.Module):
    def __init__(self, keypoint_number, batch_size):
        super(HourGlassLoss, self).__init__()
        self.keypoint_number = keypoint_number
        self.batch_size = batch_size

    def forward(self, output, target):
        predicted = output.clone().detach().requires_grad_(True)
        true = target.clone().detach().requires_grad_(True)

        # argmax over the flattened H*W plane, then convert to (row, col) coordinates
        _, max_indices = torch.max(predicted.view(predicted.size(0), predicted.size(1), -1), dim=2)
        keypoints = torch.stack([max_indices // predicted.size(3), max_indices % predicted.size(3)], dim=2).float()
        keypoints = keypoints[0]
        predicted = keypoints.unsqueeze(0)

        # same coordinate extraction for the target heatmaps
        _, max_indices = torch.max(true.view(true.size(0), true.size(1), -1), dim=2)
        keypoints = torch.stack([max_indices // true.size(3), max_indices % true.size(3)], dim=2).float()
        keypoints = keypoints[0]
        true = keypoints.unsqueeze(0)

        # MSE between predicted and ground-truth keypoint coordinates
        loss = torch.mean((predicted - true)**2)

        return loss
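
For reference, calling the loss like this raises the RuntimeError at loss.backward(). This is only a sketch: the Conv2d model, the 16 keypoints, the 64x64 heatmaps and the batch size 2 are placeholders, not my real setup.

import torch
import torch.nn as nn

model = nn.Conv2d(1, 16, kernel_size=3, padding=1)      # toy stand-in for the hourglass network
criterion = HourGlassLoss(keypoint_number=16, batch_size=2)

images = torch.randn(2, 1, 64, 64)
target_heatmaps = torch.randn(2, 16, 64, 64)

loss = criterion(model(images), target_heatmaps)
loss.backward()   # RuntimeError: element 0 of tensors does not require grad ...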

You are explicitly detaching the output tensor in:

predicted = output.clone().detach().requires_grad_(True)

which will cut the computation graph.
Could you explain why you are doing this?
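
Here is a minimal standalone sketch (plain tensors, nothing to do with your model) of what detach() does to the autograd graph:

import torch

w = torch.randn(3, requires_grad=True)   # stands in for the model parameters
out = w * 2                              # stands in for the model output

print(out.grad_fn)              # <MulBackward0 ...>, still connected to w
print(out.detach().grad_fn)     # None, detach() drops the history

loss = (out.detach() ** 2).mean()
print(loss.requires_grad)       # False, so calling loss.backward() here raises
                                # the same "does not require grad" RuntimeError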

Thanks for replying. I tried something like this:

def forward(self, predicted, true):
    print(predicted.is_leaf)
    print(true.is_leaf)
    _, max_indices = torch.max(predicted.view(predicted.size(0), predicted.size(1), -1), dim=2)
    keypoints = torch.stack([max_indices // predicted.size(3), max_indices % predicted.size(3)], dim=2).float()
    keypoints = keypoints[0]
    predicted = keypoints.unsqueeze(0)


    _, max_indices = torch.max(true.view(true.size(0), true.size(1), -1), dim=2)
    keypoints = torch.stack([max_indices // true.size(3), max_indices % true.size(3)], dim=2).float()
    keypoints = keypoints[0]
    true = keypoints.unsqueeze(0)

    loss = torch.mean((predicted - true)**2)

    return loss

When I check whether the variables are leaves, it says that the model output variable (predicted) is not a leaf. But when I use any other built-in PyTorch loss function I don't have the same problem, even though it also says that the output is not a leaf.

def forward(self, predicted, true):

    loss = nn.MSELoss()
    loss = loss(predicted, true)

    return loss

This works, but as I said, since I am comparing two heatmaps the loss value is really low after the first epoch (0.048).

predicted is not supposed to be a leaf tensor since it’s created as the model output, isn’t it?
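
As a quick sanity check (toy model and arbitrary shapes below), leaf status is not the issue: the model output is expected to be a non-leaf tensor that carries a grad_fn, and that grad_fn is what backward() needs.

import torch
import torch.nn as nn

model = nn.Conv2d(1, 16, kernel_size=3, padding=1)   # toy stand-in for the hourglass
x = torch.randn(1, 1, 64, 64)                        # input tensor created by hand
out = model(x)

print(x.is_leaf)                 # True, created directly by the user
print(model.weight.is_leaf)      # True, parameters are leaves
print(out.is_leaf)               # False, produced by an operation
print(out.grad_fn)               # something like <ConvolutionBackward0 ...>,
                                 # so gradients can flow back through it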