Omit loss.backward() for forward-only algorithms?

In modern deep learning, training with backpropagation is the mainstream way of getting started. In PyTorch, the parameter update is done by calling loss.backward(). I am currently exploring "forward-only" algorithms, which, as the name implies, do not use backpropagation to update the weights.

A few questions regarding this:

  1. In PyTorch, is there an alternative way to update the parameters that does not use loss.backward()?

  2. If I do not plan to update my weights via loss.backward(), is it safe to remove that line altogether?

My concern with (2) is that even Hinton’s “forward-forward” PyTorch implementation (one of many forward-only algorithms) uses loss.backward(), which confuses me and makes me think backward() does more than just update the parameters. (I understand the use of detach(), but if we detach all parameters, is that the same as not calling loss.backward() at all?)

loss.backward() is actually only responsible for populating the .grad fields of the parameters. The optimizer step is what updates the parameters using those .grad fields.
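
A minimal sketch of that separation (the toy nn.Linear model, loss, and SGD settings here are just placeholders for illustration):

```python
import torch

# Toy model and optimizer, purely for illustration.
model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 4)
loss = model(x).pow(2).mean()

print(model.weight.grad)    # None -- no gradients yet
loss.backward()             # fills .grad, does NOT change the weights
print(model.weight.grad)    # now a gradient tensor

before = model.weight.detach().clone()
opt.step()                  # this call is what actually updates the parameters
print(torch.allclose(before, model.weight))  # False -- weights changed only here
```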

So, re (2): if you are not performing backprop, it is fine to omit .backward() entirely, but if you still want to use the existing optimizers, you are now responsible for setting the .grad fields yourself.
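
A minimal sketch of that, assuming you have some forward-only rule that produces a per-parameter update direction (random tensors stand in for it here):

```python
import torch

model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# No loss.backward() anywhere: fill .grad by hand instead.
for p in model.parameters():
    # Placeholder: replace with whatever your forward-only
    # algorithm estimates as the update direction for p.
    p.grad = torch.randn_like(p)

opt.step()       # consumes the .grad fields you just set
opt.zero_grad()  # clear them before the next update, as usual
```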

That makes sense! Thank you :slight_smile: