How can I simply get gradient statistics (for example minibatch variance) during training, to monitor them?
For intra-batch “per-sample contributions”: no, there isn’t a way in general, because autograd accumulates the gradients over the batch before you ever see them, though there are some tricks you could try.
For inter-batch statistics, you can do something similar to what the Adam optimizer (and related optimizers like LAMB) does: keep running estimates of the first and second moments of the gradient. That is essentially one of the Welford-style online algorithms for the variance, with more or less sophistication around subtracting the mean.
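As a minimal sketch of the inter-batch idea (the class name and the choice of tracking a scalar per step are my own, not from any library):

```python
class RunningStats:
    """Welford's online algorithm for the mean and variance of a stream
    of values -- here, one gradient statistic logged per minibatch."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; undefined for fewer than two observations.
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


stats = RunningStats()
for g in [1.0, 2.0, 3.0, 4.0]:  # pretend these are per-step gradient norms
    stats.update(g)
print(stats.mean, stats.variance)  # → 2.5 1.6666666666666667
```

In a training loop you would call something like `stats.update(p.grad.norm().item())` after each backward pass (per parameter, or summed over parameters, whatever you want to monitor). Adam itself uses exponential moving averages instead of the exact running mean, which weights recent batches more heavily but is the same idea.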
Best regards
Thomas