Difference between torch.mean(torch.square(Tensor)) and Tensor.pow_(2).sum()/D

I am using two versions of my loss function.

In version 1, I am using torch.mean(torch.square(Tensor)) to calculate the loss

In version 2, I am using Tensor.pow_(2).sum()/D to calculate the loss

But the behavior of the loss in the two versions is completely different: version 2 works fine, but with version 1 I get horrible results.

Both experiments use exactly the same hyper-parameter configuration.

Don't versions 1 and 2 represent the same thing?
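
For reference, here is a minimal sketch of the two versions (the shapes here are placeholders, not my actual ones):

```python
import torch

x = torch.randn(8, 16)  # placeholder shape; my real tensor is different
D = x.shape[1]

# Version 1: mean of the squared values
loss_v1 = torch.mean(torch.square(x))

# Version 2: sum of the squared values, divided by D
# (note: pow_ is in-place, so x itself holds the squared values afterwards)
loss_v2 = x.pow_(2).sum() / D
```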

Note that torch.mean() divides the sum by the total number of elements across all dimensions. I.e., if the tensor is 4D (B, D, H, W), the division factor is B*D*H*W; if the tensor is 2D (B, D), the division factor is B*D. Your Tensor.pow_(2).sum()/D, on the other hand, divides only by D.
I am not sure about the tensor dimensions in your case. Check that, maybe.
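
A quick check along these lines (with made-up shapes) shows how far apart the two division factors can be:

```python
import torch

B, D, H, W = 8, 16, 32, 32
x = torch.randn(B, D, H, W)

v1 = torch.mean(torch.square(x))  # sum of squares / (B*D*H*W)
v2 = x.pow(2).sum() / D           # the same sum of squares / D

print((v2 / v1).item())  # ratio is B*H*W = 8192, a large constant factor
```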

Yes. That was the issue. Thanks for pointing it out.