Questions about torchmetrics MSE

Dear all,

Good day.
I computed the MSE using torchmetrics after testing, and I found that the value looks strange after I descale (inverse-transform) the outputs.

MSE value before descale is 0.0018.

MSE after descale is 0.7052.

May I know the correct way to compute the MSE?
Should I compute it before descaling, since training, validation, and testing are all done on scaled data?

I suspect that the MSE is amplified by the descaling.

Please give me some feedback or comments if anyone knows about this.

Thank you very much

Could you describe your use case a bit more and what “descale” refers to, please?

Hi @ptrblck,

I use the MinMaxScaler from sklearn:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(-1, 1))
scaler = scaler.fit(features_df)

# refit on the observed range widened by 5 on each side
dummy_data = np.stack([scaler.data_max_ + 5, scaler.data_min_ - 5])
scaler = scaler.fit(dummy_data)

train_scaled = scaler.transform(train_df)
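The refit on dummy_data widens the fitted range by 5 units on each side, so test values slightly outside the original data range still map inside [-1, 1]; continuing the snippet above, this can be checked with:

print(scaler.data_min_)  # original minima minus 5
print(scaler.data_max_)  # original maxima plus 5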

After training and testing:

# build a single-feature scaler for the target column (index 2)
descaler = MinMaxScaler()
descaler.min_, descaler.scale_ = scaler.min_[2], scaler.scale_[2]

def descale(descaler, values):
    # inverse_transform expects a 2D array of shape (n_samples, 1)
    values_2d = np.array(values)[:, np.newaxis]
    return descaler.inverse_transform(values_2d)

predictions_descaled = descale(descaler, sq)

speed_labels_descaled = descale(descaler, speed_labels)

The high MSE loss appears after I compute it on predictions_descaled and speed_labels_descaled.
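For reference, since inverse_transform is a linear map (x = (x_scaled - min_) / scale_), the offset cancels in the residuals and the MSE is simply divided by scale_**2. A quick sanity check (a sketch, assuming sq and speed_labels hold the scaled 1D predictions and targets):

import numpy as np

mse_scaled = np.mean((np.asarray(sq) - np.asarray(speed_labels)) ** 2)
mse_descaled = np.mean((predictions_descaled - speed_labels_descaled) ** 2)

# the descaled MSE equals the scaled MSE divided by the squared scale factor
assert np.isclose(mse_descaled, mse_scaled / descaler.scale_ ** 2)

With 0.0018 / 0.7052 ≈ 0.0026, this would correspond to a scale factor of about 0.05.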

Thanks

The change in the loss value would be expected, since you are comparing the MSE loss for tensors on different scales.
Here is a very simple example:

import torch

a = torch.randn(10)
b = torch.randn(10)
mse_loss = ((a - b)**2).mean()
> tensor(1.9161)

a_scaled = a * 1000.
b_scaled = b * 1000.
mse_loss_scaled = ((a_scaled - b_scaled)**2).mean()
> tensor(1916108.7500)

As you can see, the loss difference is created by the multiplication with 1000. for a and b: since the MSE is quadratic in the residuals, scaling both tensors by a factor k scales the loss by k**2 (here 1000**2 = 1e6, and indeed 1.9161 * 1e6 ≈ 1916108.75).
This doesn’t mean that one error is “better” or “worse” than the other; it depends on how you interpret the output values.
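The quadratic scaling can be verified directly with a self-contained check (a sketch; the seed and factor are arbitrary):

import torch

torch.manual_seed(0)
a = torch.randn(10)
b = torch.randn(10)
k = 1000.

mse_loss = ((a - b) ** 2).mean()
mse_loss_scaled = ((a * k - b * k) ** 2).mean()

# scaling both inputs by k multiplies the MSE by k**2
print(torch.allclose(mse_loss_scaled, mse_loss * k ** 2))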

E.g. scaling of target and model prediction values is sometimes done in regression tasks, e.g. keypoint detection. During training it might be beneficial for the model to predict the keypoint coordinates in the range [0, 1], while the original coordinate space could have been [0, 224].
While calculating the MSELoss on these scaled outputs and targets allows the model to train properly, its value might be hard to interpret, since you won’t be able to easily map it to a location error in the “pixel space”. Unscaling the predictions and comparing them to the targets in the original range would allow you to use the MSELoss in this pixel space.
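As a concrete illustration of that setup (a minimal sketch with made-up keypoints; 224 stands in for the image size):

import torch
import torch.nn.functional as F

image_size = 224.

# made-up keypoint targets and predictions in the scaled [0, 1] space
targets_scaled = torch.rand(8, 2)
preds_scaled = targets_scaled + 0.01 * torch.randn(8, 2)

# loss used during training; hard to interpret in pixels
mse_scaled = F.mse_loss(preds_scaled, targets_scaled)

# unscale both to pixel space [0, 224] to get an interpretable error
mse_pixel = F.mse_loss(preds_scaled * image_size, targets_scaled * image_size)

# for a purely multiplicative scaling the two losses differ by image_size**2
print(torch.allclose(mse_pixel, mse_scaled * image_size ** 2))
print(mse_pixel.sqrt())  # RMSE in pixels is often easier to read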

Hi @ptrblck,

Thanks for the reply.
I am not sure about the last sentence of your reply:
“Unscaling the predictions and comparing them to the targets in the original range would allow you to use the MSELoss in this pixel space.”

Does this mean unscaling the predictions and labels during training, just before computing the MSELoss?
I understand, and have seen in many implementations, that unscaling is done only after testing.
Journal papers often publish MSELoss values, so I wonder whether the reported MSELoss is computed before or after unscaling.
That is the part that confuses me.

Thanks

The last sentence explained how the MSELoss can be interpreted.
I.e. I would claim you are free to scale the loss during training to make the model converge, but the loss value itself might not be meaningful.

E.g. considering the previous example of keypoint detection, I would like to know how many pixels my predicted keypoints are away from the target keypoints. To do so, I would need predictions in the “pixel space”, i.e. values in the ranges [0, image_height] and [0, image_width] (the same as the target). If I’ve scaled the predictions during training to e.g. the range [0, 1] (for better convergence), I would need to unscale them in order to interpret the loss in my target domain (i.e. the pixel space or image coordinates).
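Applied to the original question, the evaluation step could look roughly like this (a sketch with illustrative names; it assumes a MinMaxScaler fitted on the target column alone and scaled test predictions/targets as (n, 1) arrays):

import numpy as np
import torch
from torchmetrics import MeanSquaredError
from sklearn.preprocessing import MinMaxScaler

# illustrative setup: fit a scaler on the target column alone
targets = np.random.rand(100, 1) * 40.0  # made-up speeds in original units
target_scaler = MinMaxScaler(feature_range=(-1, 1))
targets_scaled = target_scaler.fit_transform(targets)

# pretend these are the model's (scaled) test predictions
preds_scaled = targets_scaled + 0.01 * np.random.randn(100, 1)

metric = MeanSquaredError()

# MSE in the scaled space: fine for training and monitoring convergence
mse_scaled = metric(torch.from_numpy(preds_scaled), torch.from_numpy(targets_scaled))

# MSE in the original space: inverse-transform both before the metric
preds = target_scaler.inverse_transform(preds_scaled)
metric.reset()
mse_original = metric(torch.from_numpy(preds), torch.from_numpy(targets))

Only mse_original is expressed in the units of the target, which makes it the directly interpretable value.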

I don’t know, but would assume the authors explain it in their paper.
