Gradient statistics

Is there a simple way to get gradient statistics (for example, the minibatch variance) during training so that I can monitor them?

For intra-batch “per sample contributions”, no, there isn’t a way to do this in general, though there are some tricks you could try.
For inter-batch statistics, you can do something similar to what the Adam optimizer (and related optimizers like LAMB) does: keep running averages of the gradient and of its square across steps. That is essentially one of the Welford-style online algorithms for the variance, with more or less sophistication around subtracting the mean.
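
To make the inter-batch idea concrete, here is a minimal sketch in plain Python. It is not Adam's actual code: the class name RunningGradStats and the single decay factor beta are my own assumptions for illustration (Adam uses two different decay rates and does not subtract the mean). It keeps exponential moving averages of the gradient and the squared gradient and derives a per-coordinate variance estimate from them.

```python
class RunningGradStats:
    """Hypothetical helper: Adam-style moving averages of a gradient vector."""

    def __init__(self, size, beta=0.9):
        self.beta = beta          # decay factor (assumed; same for both averages)
        self.steps = 0
        self.m = [0.0] * size     # moving average of g
        self.v = [0.0] * size     # moving average of g * g

    def update(self, grad):
        """Fold one minibatch gradient (a flat sequence of floats) into the averages."""
        b = self.beta
        self.steps += 1
        for i, g in enumerate(grad):
            self.m[i] = b * self.m[i] + (1.0 - b) * g
            self.v[i] = b * self.v[i] + (1.0 - b) * g * g

    def _correction(self):
        # Bias correction for the zero initialisation, as Adam does.
        if self.steps == 0:
            raise ValueError("call update() at least once first")
        return 1.0 - self.beta ** self.steps

    def mean(self):
        c = self._correction()
        return [m / c for m in self.m]

    def variance(self):
        # E[g^2] - E[g]^2 per coordinate, clamped against rounding error.
        c = self._correction()
        return [max(v / c - (m / c) ** 2, 0.0) for m, v in zip(self.m, self.v)]


stats = RunningGradStats(size=2)
for grad in ([1.0, 2.0], [1.2, 1.8], [0.9, 2.1]):
    stats.update(grad)
print(stats.mean(), stats.variance())
```

In a real training loop you would probably keep m and v as tensors and call update with the flattened gradients after each backward pass, rather than using Python lists. Note that this measures variation between minibatches over time, so a drifting mean (because the parameters change) shows up as variance too.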

Best regards

Thomas