RMSE loss function

Hi all,

I would like to use the RMSE loss instead of MSE. From what I saw in the PyTorch documentation, there is no built-in function. Any ideas how this could be implemented?

Wouldn’t it work if you just call torch.sqrt() on the output of nn.MSELoss?

import torch
import torch.nn as nn

x = torch.randn(5, 10, requires_grad=True)
y = torch.randn(5, 10)

criterion = nn.MSELoss()
loss = torch.sqrt(criterion(x, y))
loss.backward()
print(x.grad)

@ptrblck’s solution is the best one, I think, because it is the simplest.
Just for fun, you can also do the following:

# create a function (this is my favorite choice)
def RMSELoss(yhat, y):
    return torch.sqrt(torch.mean((yhat - y) ** 2))

criterion = RMSELoss
loss = criterion(yhat, y)

# create an nn.Module class (just-for-fun choice :-)
class RMSELoss(nn.Module):
    def __init__(self):
        super().__init__()
        self.mse = nn.MSELoss()

    def forward(self, yhat, y):
        return torch.sqrt(self.mse(yhat, y))

criterion = RMSELoss()
loss = criterion(yhat, y)

You should be careful with NaN, which will appear if mse = 0. Something like this would probably be better:

class RMSELoss(nn.Module):
    def __init__(self, eps=1e-6):
        super().__init__()
        self.mse = nn.MSELoss()
        self.eps = eps  # keeps the sqrt argument strictly positive

    def forward(self, yhat, y):
        loss = torch.sqrt(self.mse(yhat, y) + self.eps)
        return loss

The sqrt of 0 is 0, not NaN:

>>> torch.sqrt(torch.zeros(1))
tensor([0.])

Of course; the issue is during the backward pass, where you multiply 0 by infinity: by the chain rule the gradient of sqrt(mse) contains 1 / (2 * sqrt(mse)), which is infinite when mse = 0.

>>> mse = nn.MSELoss()
>>> yhat = torch.zeros(1, requires_grad=True)
>>> y = torch.zeros(1)
>>> loss = torch.sqrt(mse(yhat,y))
>>> loss.backward()
>>> yhat.grad
tensor([nan])

Using the simple module I wrote above:

>>> rmse = RMSELoss()
>>> yhat = torch.zeros(1, requires_grad=True)
>>> y = torch.zeros(1)
>>> loss = rmse(yhat,y)
>>> loss.backward()
>>> yhat.grad
tensor([0.])

Hi, I wonder if that’s exactly the same as RMSE when dealing with a batch size larger than 1.
E.g., target and prediction are [2, C, 256, 256] tensors (a batch of 2):
MSE_0 = MSE(prediction[0,:,:,:], target[0,:,:,:])
MSE_1 = MSE(prediction[1,:,:,:], target[1,:,:,:])

The RMSE we want is the mean of the per-sample RMSEs:
[sqrt(MSE_0) + sqrt(MSE_1)] / 2
while torch.sqrt(nn.MSELoss()(x, y)) with the default reduction=‘mean’ gives:
sqrt((MSE_0 + MSE_1) / 2)
so:
sqrt(M1 + M2) is not equal to sqrt(M1) + sqrt(M2)

Writing M1 and M2 for the per-sample sums of squared errors over N elements each, the difference is:
[sqrt(M1/N) + sqrt(M2/N)] / 2 is not equal to sqrt((M1 + M2) / (2N))

Please correct me if my understanding is wrong. Thanks :wink:
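
A minimal sketch of the per-sample version I mean (batchwise_rmse is just an illustrative name, assuming [N, C, H, W] tensors and a small eps as suggested above):

import torch
import torch.nn as nn

def batchwise_rmse(yhat, y, eps=1e-6):
    se = nn.MSELoss(reduction='none')(yhat, y)            # elementwise squared errors
    per_sample_mse = se.flatten(start_dim=1).mean(dim=1)  # one MSE per sample
    return torch.sqrt(per_sample_mse + eps).mean()        # mean of the per-sample RMSEs

prediction = torch.randn(2, 3, 256, 256, requires_grad=True)
target = torch.randn(2, 3, 256, 256)
loss = batchwise_rmse(prediction, target)
loss.backward()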

Try adding an eps, such as eps = 1e-8, chosen according to your precision.
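
For instance, a quick illustrative sketch (the numbers are arbitrary) showing that near a perfect fit the gradient of sqrt(mse + eps) stays finite, and that a smaller eps lets through a larger gradient:

import torch

for eps in (1e-6, 1e-8):
    yhat = torch.full((1,), 1e-4, requires_grad=True)  # almost equal to the target
    y = torch.zeros(1)
    loss = torch.sqrt(((yhat - y) ** 2).mean() + eps)
    loss.backward()
    print(eps, yhat.grad)  # finite, but larger for the smaller eps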
