No gradient calculated for custom loss function

Hi, I designed my custom loss function using only PyTorch operations, but I don't get any gradients when I call backward(). Can anybody help me fix this? Many thanks.


The gradient is zero when I use my custom loss function (log_rmse). If I switch to the standard nn.MSELoss, backprop works fine, but I can't figure out what's wrong with my custom loss function.

Could you post the custom loss function by wrapping it into three backticks ```, please, as it would make debugging easier?

Okay, here is my loss function, together with the training loop:

import torch
from torch import nn
from torch.utils import data


def log_rmse(y, y_hat):
    # Clamp predictions to [1, inf) so torch.log stays finite.
    clipped_preds = torch.clamp(y_hat, 1, float('inf'))
    # Sum of squared differences between log predictions and log targets.
    sum_sq_log_error = torch.sum((torch.log(clipped_preds) - torch.log(y)) ** 2)
    rmse = torch.sqrt(sum_sq_log_error / y.shape[0])
    return rmse

def train_network(net, train_data, train_label, val_data, val_label, iteration, lr, batch_size):
    train_ls, val_ls = [], []
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)
    dataset = data.TensorDataset(train_data, train_label)
    data_iter = data.DataLoader(dataset, batch_size=batch_size, shuffle=True)
    mse = nn.MSELoss()  # standard loss used for comparison; not used in the loop below
    for i in range(iteration):
        for X, y in data_iter:
            optimizer.zero_grad()
            loss = log_rmse(y, net(X))
            loss.backward()
            optimizer.step()
        # Record the epoch loss on the full training/validation sets without
        # building a computation graph.
        with torch.no_grad():
            train_ls.append(log_rmse(train_label, net(train_data)).item())
            if val_data is not None and val_label is not None:
                val_ls.append(log_rmse(val_label, net(val_data)).item())
    return train_ls, val_ls

Thanks for the update!
Your method works for me and, as expected, calculates gradients only for the outputs that are not clamped:

import torch
from torch import nn

model = nn.Linear(10, 10)
data = torch.randn(1, 10)
target = torch.rand(1, 10)

output = model(data)
print(output)
# tensor([[ 0.9157, -0.3162, -0.6357,  0.5272, -0.9480,  1.1473,  0.3734,  0.1775,
#           0.4148, -0.9100]], grad_fn=<AddmmBackward0>)
loss = log_rmse(target, output)
print(loss)
# tensor(2.3577, grad_fn=<SqrtBackward0>)

loss.backward()
print(model.weight.grad)
# tensor([[-0.0000,  0.0000, -0.0000,  0.0000,  0.0000,  0.0000, -0.0000,  0.0000,
#          -0.0000,  0.0000],
#         [-0.0000,  0.0000, -0.0000,  0.0000,  0.0000,  0.0000, -0.0000,  0.0000,
#          -0.0000,  0.0000],
#         [-0.0000,  0.0000, -0.0000,  0.0000,  0.0000,  0.0000, -0.0000,  0.0000,
#          -0.0000,  0.0000],
#         [-0.0000,  0.0000, -0.0000,  0.0000,  0.0000,  0.0000, -0.0000,  0.0000,
#          -0.0000,  0.0000],
#         [-0.0000,  0.0000, -0.0000,  0.0000,  0.0000,  0.0000, -0.0000,  0.0000,
#          -0.0000,  0.0000],
#         [-0.2285,  0.1268, -0.1046,  0.0679,  0.2792,  0.0948, -0.3489,  0.1557,
#          -0.0147,  0.0082],
#         [-0.0000,  0.0000, -0.0000,  0.0000,  0.0000,  0.0000, -0.0000,  0.0000,
#          -0.0000,  0.0000],
#         [-0.0000,  0.0000, -0.0000,  0.0000,  0.0000,  0.0000, -0.0000,  0.0000,
#          -0.0000,  0.0000],
#         [-0.0000,  0.0000, -0.0000,  0.0000,  0.0000,  0.0000, -0.0000,  0.0000,
#          -0.0000,  0.0000],
#         [-0.0000,  0.0000, -0.0000,  0.0000,  0.0000,  0.0000, -0.0000,  0.0000,
#          -0.0000,  0.0000]])

Thank you so much for the feedback. I think I got zero gradients because my initial weights were very small, so during the forward pass torch.clamp() clipped most predictions to 1, and clamped values receive no gradient. After initializing the network with larger weights, backprop worked.
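
Here is a minimal example (made-up values, just to illustrate the effect) showing why the clamping kills the gradient: elements clipped by torch.clamp get a zero gradient, so when most predictions fall below 1, almost all weight gradients vanish.

import torch

# Predictions below the clamp minimum of 1 lose their gradient entirely.
x = torch.tensor([0.5, 2.0, 0.8, 3.0], requires_grad=True)
clipped = torch.clamp(x, 1, float('inf'))
loss = torch.log(clipped).sum()
loss.backward()
print(x.grad)
# tensor([0.0000, 0.5000, 0.0000, 0.3333])  -> zero gradient for the clamped entries

With larger initial weights the predictions start above 1, so they stay in the region where clamp acts as the identity and gradients flow normally.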