Why are all my parameters nan after using torch.nanmean in my loss function?

Background:
I had a model that always returned nan, and I traced the problem to my loss function, so I reproduced it with a very simple linear model.

Data:
My x data are weather variables such as temperature, precipitation, etc. (each column is a different variable); they contain no nan. My y data are soil moisture values, and they contain some nan.
train_x has shape (75, 3); train_y has shape (75, 1).

Problem:
My loss function is RMSE. Because my target contains nan, I can't use torch.nn.MSELoss directly, since it just returns nan. I implemented the RMSE in two ways: the first masks out all nan entries and then computes the RMSE; the second computes the RMSE directly with torch.nanmean. Before using them as loss functions, I tested both on data generated with torch.rand, and they produced the same value.
The first method trains the model fine, but the second returns nan for everything. Can anyone tell me why? I want to use the second method. What should I do?
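
To narrow it down, here is a minimal standalone check (separate from the full code below) of just the backward pass of the nanmean version. The forward value is fine, but the gradient at the nan position comes out nan:

import torch

obs = torch.tensor([1.0, float("nan"), 2.0])
pred = torch.zeros(3, requires_grad=True)

loss = torch.sqrt(torch.nanmean((pred - obs) ** 2))
print(loss)        # tensor(1.5811, grad_fn=<SqrtBackward0>) -- forward looks fine
loss.backward()
print(pred.grad)   # the middle entry (where obs is nan) is nan

So the nan seems to enter through the backward pass, even though torch.nanmean skips nan in the forward pass.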

Code:

import torch
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

# generate data
x, y, coef = datasets.make_regression(n_samples=100, n_features=3, n_targets=1, n_informative=2, bias=3.5, noise=10,
                                      coef=True, random_state=42)

train_x, val_x, train_y, val_y = train_test_split(x, y)
# replace a few random entries of the training target with nan
train_y[np.random.randint(0, train_y.shape[0], 5)] = np.nan

# using nn.Linear
class LinearRegression(torch.nn.Module):
    def __init__(self, input_dim, outp_dim):
        super(LinearRegression, self).__init__()
        self.linear = torch.nn.Linear(input_dim, outp_dim)

    def forward(self, x):
        out = self.linear(x)
        return out


# Loss function: method 1
class RMSELoss(torch.nn.Module):

    __name__ = "RMSELoss"
    def __init__(self):
        super(RMSELoss, self).__init__()

    def forward(self, pred, obs):
        # keep only positions where both pred and obs are non-nan
        mask1 = torch.logical_not(torch.isnan(pred))
        mask2 = torch.logical_not(torch.isnan(obs))
        mask = torch.logical_and(mask1, mask2)
        pred = pred[mask]
        obs = obs[mask]
        loss = torch.sqrt(((pred - obs) ** 2).mean())
        return loss

# Loss function: method 2
class RMSELoss_nanmean(torch.nn.Module):

    __name__ = "RMSELoss_nanmean"
    def __init__(self):
        super(RMSELoss_nanmean, self).__init__()

    def forward(self, pred, obs):
        # nan entries are skipped when averaging
        loss = torch.sqrt(torch.nanmean((pred - obs) ** 2))
        return loss

# test two loss functions
test_loss_x = torch.rand(3,4)
test_loss_y = torch.rand(3,4)
test_loss_y[0,0] = torch.nan
print("RMSE", RMSELoss()(test_loss_x, test_loss_y))
print("RMSE_nanmean", RMSELoss_nanmean()(test_loss_x, test_loss_y))

inp_dim = 3
opt_dim = 1
LR = 0.01
EPOCHS = 100

model = LinearRegression(inp_dim, opt_dim)
loss_fn = RMSELoss_nanmean()
optimizer = torch.optim.SGD(model.parameters(), lr=LR)

train_x = torch.from_numpy(train_x).to(torch.float32)
train_y = torch.from_numpy(train_y).to(torch.float32)
val_x = torch.from_numpy(val_x).to(torch.float32)
val_y = torch.from_numpy(val_y).to(torch.float32)

# make the target 2-D, (75, 1), to match the model output
if len(train_y.shape) == 1: train_y = train_y[:, None]


for epoch in range(EPOCHS):
    optimizer.zero_grad()
    pred = model(train_x)

    loss = loss_fn(pred, train_y)
    loss.backward()
    optimizer.step()
    print(loss)

Method 1 results:

RMSE tensor(0.4241)
RMSE_nanmean tensor(0.4241)
tensor(28.7153, grad_fn=<SqrtBackward0>)
tensor(28.7081, grad_fn=<SqrtBackward0>)
tensor(28.7009, grad_fn=<SqrtBackward0>)
...
...

Method 2 results:

RMSE tensor(0.3006)
RMSE_nanmean tensor(0.3006)
tensor(30.1395, grad_fn=<SqrtBackward0>)
tensor(nan, grad_fn=<SqrtBackward0>)
tensor(nan, grad_fn=<SqrtBackward0>)
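
For completeness, this is how I confirmed that the parameters themselves, not just the loss, are nan with method 2 (a quick check run right after the first optimizer.step()):

for name, p in model.named_parameters():
    print(name, p)

With method 2, both the weight and the bias are already nan after the first update, so every later forward pass, and hence every later loss, is nan as well.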