Can my custom loss function cause problems?

Hi there. I’m combining several MSE terms in one experiment, and I suspect that the formulation of my cost function may be causing my problems.

import torch

def n_MSE(outputs, indexes):
    """
    outputs: list of outputs of the model (y_hat); the first is the target
    indexes: list of tuples with the indexes of the outputs to compute MSE between
    """
    loss = 0
    for i, j in indexes:
        loss += torch.mean((outputs[i] - outputs[j])**2)

    return loss

Thanks for reading.


There are two issues that can potentially be solved at once:
(1) I’m not sure autograd will like the in-place updates to loss.
(2) The iterative updates to loss needlessly add a bunch of nodes to the autograd graph, which could increase memory usage and slow things down.

It might be better to do everything at once without a for loop, e.g.:

def n_MSE(outputs, indexes):
    # "unzip" the index pairs: gather the left- and right-hand rows in one indexing op
    lhs = outputs[indexes[:, 0]]
    rhs = outputs[indexes[:, 1]]
    return torch.mean((lhs - rhs)**2)

Note that the above assumes outputs and indexes are tensors rather than lists, so you might need to call torch.stack (or torch.tensor for the index pairs) to convert them first.
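For illustration, here is a minimal sketch of that conversion, assuming outputs is a Python list of equally shaped tensors, indexes is a list of (i, j) tuples as in the original function, and n_MSE is the vectorized version above (the example data is made up):

import torch

# hypothetical example data: three model outputs and two comparison pairs
outputs_list = [torch.randn(32) for _ in range(3)]   # list of tensors, same shape
indexes_list = [(0, 1), (0, 2)]                      # list of (i, j) tuples

# stack the list of output tensors into one (num_outputs, features) tensor
outputs = torch.stack(outputs_list)
# turn the list of index pairs into a (num_pairs, 2) integer tensor
indexes = torch.tensor(indexes_list, dtype=torch.long)

loss = n_MSE(outputs, indexes)   # the vectorized version above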

eqy, thanks for taking the time to respond.

Wouldn’t your formulation be explicitly assigning specific errors to each output?
I think that way the backpropagation would not be as specific.
Anyway, I’m not sure; if you could explain it, I’d appreciate it.

Good catch! Yes, you are correct that there should be a sum to preserve the original formulation, but I believe the same approach should still work, e.g.:

import torch
import time

output_size = (4096, 32)

def n_MSE(outputs, indexes):
    """"
    outputs: list of outputs of the model (y_hat) the first is the target
    indexes: list of tuples with the indexes of the outputs to compute mse
    """
    loss = 0
    for i, j in indexes:
        loss += torch.mean((outputs[i] - outputs[j])**2)
    return loss

def n_MSE_2(outputs, indexes):
    # "unzip" the index pairs: gather the left- and right-hand rows in one indexing op
    lhs = outputs[indexes[:, 0]]
    rhs = outputs[indexes[:, 1]]
    return torch.sum(torch.mean((lhs - rhs)**2, dim=1))

outputs = torch.randn(output_size, requires_grad=True, device='cuda')
indexes = torch.randint(low=0, high=output_size[0], size=(output_size[0], 2), device='cuda')

outputs2 = outputs.detach().clone()
outputs2.requires_grad = True

n_MSE(outputs, indexes).backward()
n_MSE_2(outputs2, indexes).backward()
torch.testing.assert_close(outputs.grad, outputs2.grad)

torch.cuda.synchronize()
t1 = time.perf_counter()
n_MSE(outputs, indexes).backward()
torch.cuda.synchronize()
t2 = time.perf_counter()
n_MSE_2(outputs2, indexes).backward()
torch.cuda.synchronize()
t3 = time.perf_counter()
print(f"n_MSE took {t2 - t1} seconds, n_MSE_2 took {t3 - t2} seconds")
n_MSE took 1.624450863339007 seconds, n_MSE_2 took 0.0011639557778835297 seconds

eqy, thanks again for taking the time to respond.

I apologize, I was not very clear in my explanation.
The arguments of the function are:
outputs: a list of tensors which are outputs of the model,
indexes: a list of tuples that says which output is compared with which other.

For example, if the tuple is (0, 1), the loss function should compute the MSE between the tensor at position 0 of outputs and the tensor at position 1.
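To make that concrete, here is a small hypothetical usage of the original n_MSE, assuming three output tensors where index 0 is the target (the tensor names and sizes are made up):

import torch

# hypothetical outputs: index 0 is the target, 1 and 2 are predictions
target = torch.randn(8)
y_hat_a = torch.randn(8, requires_grad=True)
y_hat_b = torch.randn(8, requires_grad=True)
outputs = [target, y_hat_a, y_hat_b]

# compare output 0 with output 1, and output 0 with output 2
indexes = [(0, 1), (0, 2)]

loss = n_MSE(outputs, indexes)   # sum of the two per-pair MSEs
loss.backward()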

thanks for reading

After debugging the code, I found the error. It was not in the cost function, so the in-place operations and the for loop do not affect the autograd calculation.