Is manually manipulating the gradient of a tensor a bad idea?

This is a figure from NeRF in the Wild. (You may not need to know exactly what it is.)
In brief, the "static" RGB and "transient" RGB values are each compared to the ground-truth value, and a loss is computed for each.
What I want is for the transient embedding, for instance, to be updated based only on the loss from the static value, not the transient one. So I wrote the code below.

static_loss.backward(retain_graph=True)
static_grad = transient_embedding.grad
transient_loss.backward()
transient_embedding.grad = static_grad
optimizer.step()

If the code works as intended, all the other parameters should be updated based on both losses, while the transient embedding is updated based only on the static loss. The result is not quite satisfactory, but I want to know whether the bad result comes from the approach above being a bad idea in itself (or simply wrong).

Your transient embedding is also being updated based on both losses. You should change your code to:

static_loss.backward(retain_graph=True)
static_grad = transient_embedding.grad.clone()   # <---- added a clone
transient_loss.backward()
transient_embedding.grad = static_grad
optimizer.step()

The clone is necessary because otherwise static_grad still points to the same gradient tensor, and the transient loss's backward pass would add its gradients to that tensor in place.
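
For reference, here is a minimal, self-contained sketch of that in-place accumulation, using a hypothetical toy tensor w in place of the actual NeRF-W embedding:

import torch

# Toy parameter standing in for transient_embedding; not the real NeRF-W model.
w = torch.ones(3, requires_grad=True)

static_loss = (2 * w).sum()      # d(static_loss)/dw    = 2 per element
transient_loss = (5 * w).sum()   # d(transient_loss)/dw = 5 per element

static_loss.backward(retain_graph=True)
saved_ref = w.grad               # same tensor object as w.grad
saved_copy = w.grad.clone()      # independent snapshot

transient_loss.backward()        # accumulates into w.grad in place

print(saved_ref)    # tensor([7., 7., 7.]) -- polluted by the transient gradients
print(saved_copy)   # tensor([2., 2., 2.]) -- still only the static gradients

Without the clone, the gradient you restore before optimizer.step() already contains both losses' contributions, which is exactly what you were trying to avoid.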