Hi all,

I would like to use the RMSE loss instead of MSE. From what I saw in the PyTorch documentation, there is no built-in function. Any ideas how this could be implemented?

1 Like

Wouldn’t it work if you just call `torch.sqrt()` on the output of `nn.MSELoss`?

```
import torch
import torch.nn as nn

x = torch.randn(5, 10, requires_grad=True)
y = torch.randn(5, 10)
criterion = nn.MSELoss()
loss = torch.sqrt(criterion(x, y))  # sqrt of the scalar MSE
loss.backward()
print(x.grad)
```

15 Likes

I think the solution from @ptrblck is the best one (because it is the simplest).

Just for fun, you can also do the following:

```
# create a function (this is my favorite choice)
def RMSELoss(yhat, y):
    return torch.sqrt(torch.mean((yhat - y) ** 2))

criterion = RMSELoss
loss = criterion(yhat, y)
```

```
# create an nn class (just-for-fun choice :-)
class RMSELoss(nn.Module):
    def __init__(self):
        super().__init__()
        self.mse = nn.MSELoss()

    def forward(self, yhat, y):
        return torch.sqrt(self.mse(yhat, y))

criterion = RMSELoss()
loss = criterion(yhat, y)
```

7 Likes

You should be careful with `NaN`, which will appear if `mse = 0`. Something like this would probably be better:

```
class RMSELoss(nn.Module):
    def __init__(self, eps=1e-6):
        super().__init__()
        self.mse = nn.MSELoss()
        self.eps = eps

    def forward(self, yhat, y):
        loss = torch.sqrt(self.mse(yhat, y) + self.eps)
        return loss
```

10 Likes

The sqrt of 0 is 0, not NaN:

```
>>> import torch
>>> torch.sqrt(torch.zeros(1))
tensor([0.])
```

3 Likes

Of course, the issue is during the backward pass: the derivative of sqrt(u) is 1/(2·sqrt(u)), which is infinite at u = 0, so autograd ends up multiplying the zero gradient of the MSE by infinity, which produces NaN.

```
>>> import torch
>>> import torch.nn as nn
>>> mse = nn.MSELoss()
>>> yhat = torch.zeros(1, requires_grad=True)
>>> y = torch.zeros(1)
>>> loss = torch.sqrt(mse(yhat, y))
>>> loss.backward()
>>> yhat.grad
tensor([nan])
```

Using the simple module I wrote above:

```
>>> rmse = RMSELoss()
>>> yhat = torch.zeros(1, requires_grad=True)
>>> y = torch.zeros(1)
>>> loss = rmse(yhat,y)
>>> loss.backward()
>>> yhat.grad
tensor([0.])
```

14 Likes

Hi, I wonder if that’s exactly the same as RMSE when dealing with a batch size larger than 1.

e.g. prediction and target are [2, C, 256, 256] tensors (batch size 2):

MSE_0 = MSE(prediction[0,:,:,:], target[0,:,:,:])

MSE_1 = MSE(prediction[1,:,:,:], target[1,:,:,:])

The RMSE we want is the per-sample one:

sqrt(MSE_0) + sqrt(MSE_1)

while torch.sqrt(nn.MSELoss()(x, y)) will give:

sqrt(MSE_0 + MSE_1)

so:

sqrt(M1 + M2) is not equal to sqrt(M1) + sqrt(M2)

Even with reduction turned off, what we want is:

Mean[ sqrt(MSE_0) + sqrt(MSE_1) ]

while with reduction = 'mean' I think we get:

sqrt( Mean(MSE_0) + Mean(MSE_1) )

so:

( sqrt(M1)/N + sqrt(M2)/N ) / 2 is not equal to sqrt( M1/N + M2/N )

Please correct me if my understanding is wrong. Thanks!
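
A quick sketch of the difference (the shapes here are illustrative, assuming a batch of 2 with 3 channels):

```
import torch

# illustrative shapes: batch of 2, 3 channels, 256x256
prediction = torch.randn(2, 3, 256, 256)
target = torch.randn(2, 3, 256, 256)

# sqrt of the MSE over the whole batch
# (what torch.sqrt(criterion(prediction, target)) computes)
batch_rmse = torch.sqrt(torch.mean((prediction - target) ** 2))

# per-sample MSE, then sqrt, then mean over the batch
per_sample_mse = torch.mean((prediction - target) ** 2, dim=(1, 2, 3))
mean_per_sample_rmse = torch.sqrt(per_sample_mse).mean()

# the two values generally differ
print(batch_rmse.item(), mean_per_sample_rmse.item())
```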

1 Like

Try adding an eps, such as eps = 1e-8, chosen according to your precision.
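
For example, a minimal sketch reusing the RMSELoss module with eps defined above; tying eps to the machine epsilon of your dtype is just one option:

```
import torch

# one option: derive eps from the machine epsilon of the training dtype
eps = torch.finfo(torch.float32).eps  # ~1.19e-7 for float32

criterion = RMSELoss(eps=eps)  # the RMSELoss module defined above
yhat = torch.zeros(1, requires_grad=True)
y = torch.zeros(1)
loss = criterion(yhat, y)
loss.backward()
print(yhat.grad)  # tensor([0.]) -- finite instead of NaN
```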