Calculating gradient second moment


I’m trying to implement a different kind of second-moment estimate for Adam-like optimizers. The basic idea is that, in a convolutional layer, each pixel contributes one realization of the weight gradient (gW_i). Normally we sum these realizations to get the gradient of W, gW, and the optimizer computes its first and second moments from gW. Instead, I want to estimate the second moment from the per-pixel realizations, as E(gW_i ** 2).
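To make the idea concrete, here is a minimal pure-Python sketch for a 1-D, single-channel convolution with no padding and stride 1. The helper names (`correlate_valid`, `per_pixel_moments`) and the numbers are made up for illustration, and `g` stands for the gradient with respect to the layer output. It shows that the per-pixel contributions sum to gW, and that the mean of their squares is what you get by correlating the squared input with the squared gradient.

```python
def correlate_valid(x, g):
    """Valid 1-D cross-correlation: out[k] = sum_i x[i + k] * g[i]."""
    n_out = len(x) - len(g) + 1
    return [sum(x[i + k] * g[i] for i in range(len(g))) for k in range(n_out)]


def per_pixel_moments(x, g):
    """Return (gW, mean over pixels of gW_i ** 2) for a 1-D conv weight."""
    n_pix = len(g)  # number of output pixels = realizations per weight element
    gW = correlate_valid(x, g)
    sq = correlate_valid([v * v for v in x], [v * v for v in g])
    return gW, [s / n_pix for s in sq]


x = [1.0, 2.0, -1.0, 0.5, 3.0]  # layer input (illustrative values)
g = [0.5, -1.0, 2.0]            # gradient w.r.t. the layer output
gW, second_moment = per_pixel_moments(x, g)

# Brute-force check: every per-pixel contribution is gW_i[k] = x[i + k] * g[i].
contrib = [[x[i + k] * g[i] for i in range(len(g))] for k in range(len(gW))]
assert all(abs(sum(c) - a) < 1e-12 for c, a in zip(contrib, gW))
assert all(abs(sum(v * v for v in c) / len(g) - m) < 1e-12
           for c, m in zip(contrib, second_moment))
```

Note that this estimate is generally different from gW ** 2, which is what a standard Adam update would accumulate.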

I have a few implementation problems:

  1. E(gW_i ** 2) can be calculated in the backward pass if I have both the layer input (x) and g_input (the gradient arriving at the layer during the backward pass): gW = conv(x, g_input), and E(gW_i ** 2) = conv(x ** 2, g_input ** 2), up to normalization by the number of pixels. (See How does Backpropagation work in a CNN? | Medium.) However, I couldn’t find a hook that gives me both x and g_input.

  2. How can I expose the second moment to the optimizer? It usually gets just the gradient.

  3. The solution should work on a generic CNN without too many internal changes to the code.