Hi there. I’m combining several MSE terms in one experiment, and I suspect that the formulation of my cost function may be causing problems.

def n_MSE(outputs, indexes):
    """
    outputs: list of outputs of the model (y_hat); the first is the target
    indexes: list of tuples with the indexes of the outputs to compute the MSE over
    """
    loss = 0
    for i, j in indexes:
        loss += torch.mean((outputs[i] - outputs[j]) ** 2)
    return loss

There are two issues that can potentially be solved at once:
(1) I’m not sure autograd will like the in-place updates to loss.
(2) The iterative updates to loss needlessly add a bunch of nodes to the autograd graph, which could increase memory usage and slow things down.
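On point (1), a quick sketch suggests this pattern is actually fine for autograd: `loss` starts as the Python int 0, so the first `+=` simply rebinds the name to a new tensor, and subsequent in-place adds on that non-leaf tensor are tracked correctly. A minimal check (made-up values, just to illustrate):

```python
import torch

# accumulate a loss with `+=` in a Python loop, as in n_MSE above
x = torch.tensor([1.0, 2.0], requires_grad=True)
loss = 0
for _ in range(3):
    loss += (x ** 2).sum()  # loss ends up as 3 * sum(x^2)
loss.backward()
print(x.grad)  # gradient of 3*sum(x^2) is 6x -> tensor([ 6., 12.])
```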

It might be better to do everything at once, without a for loop.
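Something along these lines (a rough sketch; `n_MSE_vec` is a hypothetical name, and `indexes` is assumed here to be an integer tensor of shape `(num_pairs, 2)` rather than a list of tuples):

```python
import torch

def n_MSE_vec(outputs, indexes):
    # gather both members of every pair in one shot via fancy indexing
    lhs = outputs[indexes[:, 0]]
    rhs = outputs[indexes[:, 1]]
    # one mean over all pairwise squared differences
    return torch.mean((lhs - rhs) ** 2)
```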

Wouldn’t your formulation stop explicitly assigning a specific error to each output?
I think that way the backpropagation would not be as specific.
Anyway, I’m not sure; if you could explain it, I’d appreciate it.

Good catch! Yes, you are correct that there should be a sum to preserve the original formulation, but I believe the same approach still works, e.g.:

import torch
import time

output_size = (4096, 32)

def n_MSE(outputs, indexes):
    """
    outputs: list of outputs of the model (y_hat); the first is the target
    indexes: list of tuples with the indexes of the outputs to compute the MSE over
    """
    loss = 0
    for i, j in indexes:
        loss += torch.mean((outputs[i] - outputs[j]) ** 2)
    return loss

def n_MSE_2(outputs, indexes):
    # "unzip" the pairs: gather all left-hand and right-hand rows at once
    lhs = outputs[indexes[:, 0]]
    rhs = outputs[indexes[:, 1]]
    # per-pair mean over the feature dimension, then sum over pairs,
    # matching the loop formulation above
    return torch.sum(torch.mean((lhs - rhs) ** 2, dim=1))

outputs = torch.randn(output_size, requires_grad=True, device='cuda')
indexes = torch.randint(low=0, high=output_size[0], size=(output_size[0], 2), device='cuda')
outputs2 = outputs.detach().clone()
outputs2.requires_grad = True

# correctness check: both versions produce the same gradients
n_MSE(outputs, indexes).backward()
n_MSE_2(outputs2, indexes).backward()
torch.testing.assert_close(outputs.grad, outputs2.grad)

# timing (synchronize so the async CUDA kernels are included in the measurement)
torch.cuda.synchronize()
t1 = time.perf_counter()
n_MSE(outputs, indexes).backward()
torch.cuda.synchronize()
t2 = time.perf_counter()
n_MSE_2(outputs2, indexes).backward()
torch.cuda.synchronize()
t3 = time.perf_counter()
print(f"n_MSE took {t2 - t1} seconds, n_MSE_2 took {t3 - t2} seconds")

n_MSE took 1.624450863339007 seconds, n_MSE_2 took 0.0011639557778835297 seconds

I apologize, I was not very clear in my explanation.
The arguments of the function are:
outputs: a list of tensors which are outputs of the model,
indexes: a list of tuples that says which output is compared with which other one.

For example, if the tuple is (0, 1), the loss function should compute the MSE between the tensor at position 0 of outputs and the tensor at position 1.
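As a tiny sketch of the intended call (shapes and values here are made up for illustration):

```python
import torch

# outputs: list of model output tensors; outputs[0] is the target
outputs = [torch.randn(8) for _ in range(3)]
# compare the target (position 0) against each of the other outputs
indexes = [(0, 1), (0, 2)]

loss = 0
for i, j in indexes:
    loss += torch.mean((outputs[i] - outputs[j]) ** 2)
print(loss)  # scalar tensor: sum of the two pairwise MSEs
```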

After debugging the code, I found the error. It was not in the cost function, so neither the in-place updates nor the for loop affects the autograd calculation.