Manually/explicitly calculate gradients of Conv kernels

Hi Bin!

To get the gradient of the result of one function applied to the result of
another function, that is, of the composition of two functions, you would
use the chain rule. This is how autograd computes the gradient when
many functions are composed together, such as the successive layers
in a network.
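
To make this concrete, here is a small sketch (the function `sin(x)**2` is just an illustrative choice) comparing the gradient autograd computes for a composition against the chain rule worked out by hand:

```python
import torch

x = torch.tensor(0.7, requires_grad=True)
z = torch.sin(x) ** 2   # composition: square applied to sin
z.backward()            # autograd applies the chain rule for you

# chain rule by hand: d/dx [sin(x)^2] = 2 * sin(x) * cos(x)
manual = 2 * torch.sin(x) * torch.cos(x)
print(torch.allclose(x.grad, manual))
```

The two agree to floating-point precision, which is the point: autograd is already doing exactly this bookkeeping for every layer in your network.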

It is true that floating-point round-off error can accumulate during
backpropagation (as it can during the forward pass, as well). Underflow
and overflow "errors" can occur as well. Nonetheless, autograd does
an altogether solid job of performing these numerical computations.
It is unlikely that you will be able to do better calculating your own
gradients, or, in effect, writing your own version of autograd.

If you really are having problems with numerical stability during
backpropagation, you would be better off identifying the root cause
and addressing it directly, presumably by using more numerically
stable functions or implementations in your forward pass.
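
As one classic example of such a reformulation (a sketch in plain python; the same idea underlies functions such as pytorch's `logsumexp()` and `log_softmax()`), the "log-sum-exp trick" subtracts the maximum before exponentiating so that the forward pass never overflows:

```python
import math

def logsumexp(xs):
    # subtracting the max makes every exp() argument <= 0,
    # so exp() can never overflow
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

# the naive log(sum(exp(x))) overflows for inputs like these,
# but the stable version is fine
xs = [1000.0, 1000.5]
print(logsumexp(xs))
```

Fixing the forward pass this way keeps both the forward values and their gradients well behaved, with no need to hand-compute anything in the backward pass.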

Best.

K. Frank
