Hello. I want to write a custom optimizer, which requires the input activations and the output error (with respect to the loss function) for each parameter.

I’ll try to explain which “output error” I refer to. Consider linear regression without bias:

`xW = y`

and MSE loss function

`loss = (xW - y).square().mean() / 2`

The gradient is computed (up to the mean normalization) as

`dW = (xW - y) * x = error * x`

So autograd can compute `dW`, but I also want `error`. For this simple case with the MSE loss I can compute it manually alongside autograd, but for a multilayer network this becomes complicated. Can torch autograd preserve this error matrix during the backward pass for each parameter?
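To make the simple case concrete, here is a minimal sketch of what I mean by computing `error` manually alongside autograd (the sizes `N`, `D_in`, `D_out` are arbitrary placeholders):

```python
import torch

torch.manual_seed(0)
N, D_in, D_out = 8, 3, 2              # arbitrary example sizes
x = torch.randn(N, D_in)
W = torch.randn(D_in, D_out, requires_grad=True)
y = torch.randn(N, D_out)

pred = x @ W
loss = (pred - y).square().mean() / 2

# "error" = d loss / d pred, written out by hand for the MSE loss
error = (pred - y) / pred.numel()

loss.backward()
# dW = x^T @ error reproduces autograd's result
assert torch.allclose(W.grad, x.t() @ error)
```

This works here only because I know the closed form of `d loss / d pred` for MSE; for deeper networks I would need autograd to hand me that matrix per layer.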

(A related question: is it possible to access the input activations `x` for each parameter using torch built-in features?)