Hello. I want to write a custom optimizer, which requires the input activations and the output error (with respect to the loss function) for each parameter.

I’ll try to explain which “output error” I refer to. Consider linear regression without bias:

`xW = y`

and MSE loss function

`loss = (xW - y).square().mean() / 2`

The gradient is computed (up to the mean normalization) as

`dW = (xW - y) * x = error * x`

So autograd can compute `dW`, but I also want `error`. For this simple case with the MSE loss I can compute it manually alongside autograd, but for a multilayer network this becomes complicated. Can torch autograd preserve this error matrix during the backward pass for each parameter?
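To make the simple case concrete, here is a minimal sketch of what I mean by computing `error` manually alongside autograd (the sizes `N`, `D_in`, `D_out` are arbitrary placeholders):

```python
import torch

torch.manual_seed(0)
N, D_in, D_out = 8, 3, 2              # arbitrary example sizes
x = torch.randn(N, D_in)
W = torch.randn(D_in, D_out, requires_grad=True)
y = torch.randn(N, D_out)

pred = x @ W
loss = (pred - y).square().mean() / 2

# "error" = d loss / d pred, written out by hand for the MSE loss
error = (pred - y) / pred.numel()

loss.backward()
# dW = x^T @ error reproduces autograd's result
assert torch.allclose(W.grad, x.t() @ error)
```

This works here only because I know the closed form of `d loss / d pred` for MSE; for deeper networks I would need autograd to hand me that matrix per layer.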

(A related question: is it possible to access the input activations `x` for each parameter using torch built-in features?)