Thanks for taking the time to help me out. For the target, the heatmaps are just Gaussian distributions centred on each keypoint. In the paper, they should be compositions of spatial distributions, as you mentioned. My impression is that when I use torch.nn.Softmax(dim=1)(decoder), the model does learn some features of the heatmaps, but the loss stays stuck at around the same value.
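Here is a minimal sketch of the two pieces discussed above. It assumes a decoder output of shape (N, K, H, W) with one channel per keypoint. `gaussian_heatmaps` and `spatial_softmax` are hypothetical helpers written for illustration, and `sigma=2.0` is an arbitrary choice. The sketch shows that `torch.nn.Softmax(dim=1)` normalises across the keypoint channels at each pixel, while a spatial softmax normalises each channel over its H*W positions. It also shows that a Gaussian target with a peak of 1 is on a different scale from any softmax output. That mismatch may be worth checking when the MSE loss plateaus, though this is a guess and not something confirmed here.

```python
import torch
import torch.nn.functional as F


def gaussian_heatmaps(keypoints, height, width, sigma=2.0):
    """Hypothetical helper: one Gaussian heatmap per keypoint.

    keypoints: tensor of shape (K, 2) holding (x, y) pixel coordinates.
    Returns a tensor of shape (K, height, width) whose value is 1 at each keypoint.
    """
    ys = torch.arange(height, dtype=torch.float32).view(1, height, 1)
    xs = torch.arange(width, dtype=torch.float32).view(1, 1, width)
    kx = keypoints[:, 0].float().view(-1, 1, 1)
    ky = keypoints[:, 1].float().view(-1, 1, 1)
    # Squared distance of every pixel to each keypoint, broadcast to (K, H, W)
    d2 = (xs - kx) ** 2 + (ys - ky) ** 2
    return torch.exp(-d2 / (2 * sigma ** 2))


def spatial_softmax(logits):
    """Hypothetical helper: softmax over the H*W positions of each channel separately."""
    n, k, h, w = logits.shape
    flat = logits.reshape(n, k, h * w)
    return F.softmax(flat, dim=-1).reshape(n, k, h, w)


if __name__ == "__main__":
    torch.manual_seed(0)
    # Two keypoints at (x=10, y=12) and (x=40, y=30) on a 64x64 grid -> (1, 2, 64, 64)
    target = gaussian_heatmaps(torch.tensor([[10.0, 12.0], [40.0, 30.0]]), 64, 64).unsqueeze(0)
    logits = torch.randn(1, 2, 64, 64)  # stand-in for the decoder output

    channel_sm = torch.nn.Softmax(dim=1)(logits)  # sums to 1 across channels at each pixel
    spatial_sm = spatial_softmax(logits)          # sums to 1 over pixels within each channel
    print(channel_sm.sum(dim=1).mean().item())
    print(spatial_sm.sum(dim=(2, 3)))

    # Rescale the target so each channel also sums to 1 before comparing with MSE
    target_norm = target / target.sum(dim=(2, 3), keepdim=True)
    print(F.mse_loss(spatial_sm, target_norm).item())
```

If you go with the spatial version, normalising the target so each channel sums to 1 keeps prediction and target on the same scale. A common alternative in heatmap-based keypoint models is to drop the softmax and regress the raw Gaussian targets with MSE directly.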
Ref: Compute mse_loss() with softmax() - #8 by Mukesh1729
See you later, then.