I am using two versions of my loss function.
In version 1, I calculate the loss as torch.mean(torch.square(Tensor)).
In version 2, I calculate the loss as Tensor.pow_(2).sum()/D.
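For reference, here is a minimal sketch of what I mean. The names pred and target and the size D = 128 are just placeholders standing in for my real tensors, with D taken to be the total number of elements:

```python
import torch

# Placeholder setup: pred, target, and D are stand-ins, not my real data.
D = 128
pred = torch.randn(D, requires_grad=True)   # stand-in for the model output
target = torch.randn(D)                     # stand-in for the ground truth

# Version 1: mean of the squared elements
err1 = pred - target
loss_v1 = torch.mean(torch.square(err1))

# Version 2: square in place, then sum and divide by D
# (pow_ overwrites err2 with its squared values)
err2 = pred - target
loss_v2 = err2.pow_(2).sum() / D

print(loss_v1.item(), loss_v2.item())       # forward values come out identical here
```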
But the behavior of the loss in the two versions is completely different: version 2 works fine, while version 1 gives me horrible results.
Both experiments use exactly the same hyper-parameter configuration.
Don't versions 1 and 2 represent the same thing?