Hi all,
I would like to use the RMSE loss instead of MSE. From what I saw in the PyTorch documentation, there is no built-in function. Any ideas how this could be implemented?
Wouldn’t it work if you just call torch.sqrt() on the output of nn.MSELoss?
import torch
import torch.nn as nn

x = torch.randn(5, 10, requires_grad=True)
y = torch.randn(5, 10)

criterion = nn.MSELoss()
loss = torch.sqrt(criterion(x, y))
loss.backward()
print(x.grad)
The solution of @ptrblck is the best I think (because it’s the simplest one).
Just for fun, you can also do the following:
# create a function (this is my favorite choice)
def RMSELoss(yhat, y):
    return torch.sqrt(torch.mean((yhat - y)**2))

criterion = RMSELoss
loss = criterion(yhat, y)
# create an nn.Module class (just-for-fun choice :-)
class RMSELoss(nn.Module):
    def __init__(self):
        super().__init__()
        self.mse = nn.MSELoss()

    def forward(self, yhat, y):
        return torch.sqrt(self.mse(yhat, y))

criterion = RMSELoss()
loss = criterion(yhat, y)
You should be careful with NaN, which will appear if the MSE is 0. Something like this would probably be better:
class RMSELoss(nn.Module):
    def __init__(self, eps=1e-6):
        super().__init__()
        self.mse = nn.MSELoss()
        self.eps = eps

    def forward(self, yhat, y):
        loss = torch.sqrt(self.mse(yhat, y) + self.eps)
        return loss
The sqrt of 0 is 0, not NaN:
>>> torch.sqrt(torch.zeros(1))
tensor([0.])
Of course, the issue is during the backward pass, where you end up multiplying 0 by infinity (the derivative of sqrt at 0), which gives NaN:
>>> mse = nn.MSELoss()
>>> yhat = torch.zeros(1, requires_grad=True)
>>> y = torch.zeros(1)
>>> loss = torch.sqrt(mse(yhat,y))
>>> loss.backward()
>>> yhat.grad
tensor([nan])
Using the simple module I wrote above:
>>> rmse = RMSELoss()
>>> yhat = torch.zeros(1, requires_grad=True)
>>> y = torch.zeros(1)
>>> loss = rmse(yhat,y)
>>> loss.backward()
>>> yhat.grad
tensor([0.])
Hi, I wonder if that’s exactly the same as RMSE when dealing with a batch size greater than 1,
i.e. target and prediction are [2, 0, 256, 256] tensors.
MSE_0 = MSE(prediction[0,:,:,:], target[0,:,:,:])
MSE_1 = MSE(prediction[1,:,:,:], target[1,:,:,:])
The RMSE we want is:
SQRT(MSE_0) + SQRT(MSE_1)
torch.sqrt(nn.MSELoss()(x, y)) will give:
SQRT(MSE_0 + MSE_1)
so:
sqrt(M1 + M2) is not equal to sqrt(M1) + sqrt(M2)
Even with reduction off, what we want is:
Mean[ Mean(sqrt(MSE_0)) + Mean(sqrt(MSE_1)) ]
What we will get with reduction='mean' instead, I think, is:
sqrt( Mean(MSE_0) + Mean(MSE_1) )
so:
[sqrt(M1)/N + sqrt(M2)/N] / 2 is not equal to sqrt(M1/N + M2/N)
Please correct me if my understanding is wrong. Thanks!
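For what it’s worth, here is a minimal sketch of the difference (the shapes are assumed, a batch of 2 single-channel 256x256 images): it compares the batch-mean of per-sample RMSEs against the square root of the MSE taken over the whole batch, and the two values generally differ.

import torch
import torch.nn as nn

# assumed example shapes: batch of 2 single-channel 256x256 images
prediction = torch.randn(2, 1, 256, 256)
target = torch.randn(2, 1, 256, 256)

mse = nn.MSELoss()

# mean over the batch of per-sample RMSEs
per_sample_rmse = torch.stack(
    [torch.sqrt(mse(prediction[i], target[i])) for i in range(prediction.size(0))]
).mean()

# sqrt of the MSE over the whole batch, i.e. torch.sqrt(criterion(x, y))
global_rmse = torch.sqrt(mse(prediction, target))

print(per_sample_rmse, global_rmse)  # generally not equal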
Try to add eps, such as eps = 1e-8, depending on your precision.
This implementation follows the definition of RMSE:
class RMSELoss(nn.Module):
    def __init__(self):
        super().__init__()
        self.eps = 1e-6

    def forward(self, ground_truth, prediction):
        # per-sample RMSE over the last dimension, averaged over the batch;
        # eps inside the sqrt avoids NaN gradients when the error is exactly zero
        loss = torch.mean(torch.sqrt(torch.sum(torch.square(ground_truth - prediction), dim=-1) + self.eps))
        return loss
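As a quick usage sketch (the shapes are assumed; the class above takes each sample’s RMSE over the last dimension before averaging over the batch):

# assumed shapes: 8 samples with 3 target values each
ground_truth = torch.randn(8, 3)
prediction = torch.randn(8, 3, requires_grad=True)

criterion = RMSELoss()
loss = criterion(ground_truth, prediction)
loss.backward()               # no NaN, even if prediction equals ground_truth, thanks to eps
print(prediction.grad.shape)  # torch.Size([8, 3])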
So in summary, one should use ptrblck’s implementation and simply take the square root of the mean squared error. However, you need to be aware of NaN appearing in the backward pass.
Hence it is important to follow YannDubs1’s advice and add a very small non-zero number like 1e-6, or even smaller like 1e-8. This does not noticeably affect the weights, since the number is so small that it is effectively ignored.
So the simplest solution is to add the following to ptrblck’s solution:
eps = 1e-6  # small constant to avoid NaN gradients when the error is zero
criterion = nn.MSELoss()
loss = torch.sqrt(criterion(x, y) + eps)
Correct this Noob if wrong!