Backprop via layers that did not compute forward pass

By default, PyTorch uses autograd to compute gradients of the loss with respect to every parameter that contributed to it, meaning that the weights of layers which did not take part in computing the loss get no gradients at all.

For a rather unusual research project I need to backprop gradients through a network that didn't compute the output. Essentially I want to define a custom, architecture-specific backward pass. My question is: can I do this in PyTorch? If not, what library would you recommend?

I know it sounds very unintuitive, so here’s an expanded explanation in case it’s unclear:

I have a process P that generates a tensor Z.
I have a network N that maps S to X. The dimensions of X and Z are identical.
I have a ground-truth label Y (same dimensions as X and Z) and I pass it to the loss function together with Z to compute the loss.
I wish to backpropagate the loss through network N and update its weights as if it had computed Z.


I’m a bit confused in the sense that the gradient flowing back depends on the value that was forwarded. So the gradient will be different depending on whether the last value here is X or Z. Is that fine?

If it is, then you can simply compute the gradient of the loss with respect to Z first, and then backprop it through the network with X.backward(grad_Z).
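A minimal sketch of that recipe. The shapes, the Linear module standing in for N, and the MSE loss are all made up for illustration; substitute your actual P, N, S, Z, and Y:

```python
import torch

torch.manual_seed(0)

N = torch.nn.Linear(4, 3)                   # stand-in for network N mapping S to X
S = torch.randn(2, 4)
X = N(S)                                    # forward pass through N

Z = torch.randn(2, 3, requires_grad=True)   # tensor produced by process P
Y = torch.randn(2, 3)                       # ground-truth label, same shape as X and Z

loss = torch.nn.functional.mse_loss(Z, Y)

# Gradient of the loss with respect to Z only (does not touch N's parameters):
grad_Z, = torch.autograd.grad(loss, Z)

# Feed that gradient into N's graph as if N had produced Z:
X.backward(grad_Z)

# N's parameters now hold gradients and can be stepped by an optimizer
print(N.weight.grad.shape)
```

Note that `X.backward(grad_Z)` is just the vector-Jacobian product with `grad_Z` as the incoming gradient, so N is updated exactly as it would have been had it emitted Z.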