I have a problem and I feel like the way I’m currently solving it is extremely inefficient so I’m hoping that it might be possible to speed this up.

The simplified setup is the following: I have a set of N parameters {x_i} which all go into some complicated function. From the output of this function, I produce N different losses L_i (each of which can in principle depend on all of the initial parameters), but I want to optimise each parameter x_i with respect to its corresponding loss L_i only. What is the best way to do this?

Currently I am doing:
for i in range(N):
    optimiser[i].zero_grad()
    # retain_graph=True keeps the shared graph alive for the remaining losses
    loss[i].backward(retain_graph=True)
    optimiser[i].step()

but this is extremely slow. Is there a better way of doing this?

Unfortunately, if you only want the ith loss to affect the ith set of parameters, you cannot really do anything else, as the backward of a single loss will “pollute” the gradients of all the parameters.
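
To make the “pollution” concrete, here is a minimal sketch. The two scalar parameters and the two toy losses are made up for illustration and are not your actual function; the point is only that one `backward()` call writes into the `.grad` of every parameter the loss depends on, and a second call accumulates on top.

```python
import torch

# Two hypothetical scalar parameters standing in for x_1 and x_2.
x1 = torch.tensor(1.0, requires_grad=True)
x2 = torch.tensor(2.0, requires_grad=True)

# Two toy losses, each depending on both parameters (assumed for illustration).
loss1 = x1 * x2          # dL1/dx1 = x2 = 2, dL1/dx2 = x1 = 1
loss2 = x1 + 3.0 * x2    # dL2/dx1 = 1,      dL2/dx2 = 3

# backward() on loss1 alone fills .grad for every leaf it depends on.
loss1.backward()
grad_x1_after_loss1 = x1.grad.item()
grad_x2_after_loss1 = x2.grad.item()  # x2 received a gradient from loss1 too

# A second backward() without zeroing accumulates into the same .grad fields.
loss2.backward()
grad_x1_after_both = x1.grad.item()
grad_x2_after_both = x2.grad.item()
```

This is why the loop zeroes the gradients before each `backward()` and steps only the ith optimiser afterwards.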

It seems like I should be able to make all of the losses into a tensor and then take the gradient with respect to all parameters (essentially a Jacobian), and then just choose the relevant gradients. But it doesn’t seem that PyTorch has a good way of computing Jacobians from what I’ve read so far.

Unfortunately, the only thing reverse-mode autograd can compute is vector-Jacobian products. So if you want to reconstruct a full Jacobian, you have to call backward once per output with the one-hot tensors [1, 0, 0], then [0, 1, 0], and then [0, 0, 1].
You can search the forum as well. There are a few posts discussing how to compute a full Jacobian that might help you speed things up a bit.
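
As a rough sketch of the one-hot approach, assuming scalar parameters and a toy stand-in for your function (`compute_losses` and the loss formula are hypothetical): each call to `torch.autograd.grad` with a one-hot `grad_outputs` returns one row of the Jacobian, and the diagonal then holds the derivatives of L_i with respect to x_i.

```python
import torch

N = 3
# Hypothetical scalar parameters with values 1, 2, 3.
xs = [torch.tensor(float(i + 1), requires_grad=True) for i in range(N)]

def compute_losses(params):
    # Toy stand-in for the complicated function: L_i = x_i * sum_j x_j
    total = torch.stack(params).sum()
    return torch.stack([p * total for p in params])

losses = compute_losses(xs)  # tensor of shape (N,)

jacobian_rows = []
for i in range(N):
    # One-hot vector selecting the ith loss.
    one_hot = torch.zeros(N)
    one_hot[i] = 1.0
    # Vector-Jacobian product: returns dL_i/dx_j for every j, without touching .grad.
    row = torch.autograd.grad(losses, xs, grad_outputs=one_hot, retain_graph=True)
    jacobian_rows.append(torch.stack(row))

jacobian = torch.stack(jacobian_rows)  # shape (N, N), entry [i, j] = dL_i/dx_j
own_grads = torch.diagonal(jacobian)   # dL_i/dx_i for each i
```

Note that this still performs N backward passes, so it is not faster by itself; it only shows how the rows are assembled and how the relevant entries are picked out.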