Standard deviation of the gradients?


Calling .backward() on a loss that is averaged over an input batch leaves the mean of the per-sample gradients in each parameter's .grad. I was wondering whether there is also an efficient way to get the standard deviation of those gradients.

As no one else has replied yet: no, that is not possible at the moment. If you need the standard deviation over a large set (i.e., many minibatches), you can multiply the standard deviation of the minibatch averages by batch_size**0.5 to estimate the per-sample standard deviation, using the central limit theorem (CLT) scaling behaviour.
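A minimal sketch of that CLT estimate, assuming the minibatches are independent draws from the same distribution (during training the parameters change between steps, so this is only approximate). The synthetic Gaussian values are a hypothetical stand-in for the per-sample gradients of one parameter, and estimate_per_sample_std is an illustrative name, not a library function:

```python
import math
import random
import statistics


def estimate_per_sample_std(samples, batch_size):
    """Estimate the per-sample std from the spread of minibatch means.

    By the CLT, std(minibatch mean) ~= std(sample) / sqrt(batch_size),
    so multiplying by sqrt(batch_size) undoes that shrinkage.
    """
    n_batches = len(samples) // batch_size
    batch_means = [
        statistics.fmean(samples[i * batch_size:(i + 1) * batch_size])
        for i in range(n_batches)
    ]
    return statistics.stdev(batch_means) * math.sqrt(batch_size)


rng = random.Random(0)
true_std = 3.0
# Hypothetical stand-in for scalar per-sample gradients of a single parameter.
samples = [rng.gauss(0.0, true_std) for _ in range(64_000)]

estimate = estimate_per_sample_std(samples, batch_size=32)
exact = statistics.stdev(samples)
print(f"estimated: {estimate:.3f}, exact: {exact:.3f}")
```

Only the minibatch means are used here, which is all that the averaged .grad exposes; the estimate gets tighter as more minibatches are collected.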
There is a way to get stdev for linear layers with a bit of hackery, but conv layers seem out of (my) reach at the moment.

However, it would be great to be able to compute the exact standard deviation instead of an approximation.
Is there a way to apply a user-defined function to each per-sample gradient in the minibatch? If so, I could compute the exact std with two backward passes (e.g., the first computing the mean of the gradients and the second the mean of their squares).
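The two-pass idea rests on the identity below, applied elementwise to every parameter, where g_i is the gradient of the loss for sample i in a minibatch of size B. The second pass needs the mean of the squared per-sample gradients; squaring the already-averaged .grad would only give the squared mean, which is why access to each sample's gradient is required. The last factor is the usual Bessel correction if an unbiased estimate is wanted:

```latex
\bar g = \frac{1}{B}\sum_{i=1}^{B} g_i, \qquad
\sigma^2 = \frac{1}{B}\sum_{i=1}^{B} g_i^2 - \bar g^{\,2}, \qquad
s^2 = \frac{B}{B-1}\,\sigma^2
```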

You can use hooks. For linear layers, you could do this:
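A minimal sketch of the hook approach for a single nn.Linear layer, assuming PyTorch 1.8 or later (for register_full_backward_hook) and a 2-D input of shape (B, in_features). A forward hook stores the layer input and a backward hook stores the gradient with respect to the layer output; since y = x W^T + b, sample i's weight gradient is the outer product of its output gradient and its input. The loss is summed over samples so that each row of grad_output is that sample's own gradient (with a mean-reduced loss every per-sample gradient would be scaled by 1/B). The saved dict and the hook function names are illustrative:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(4, 3)
saved = {}  # illustrative storage for the captured tensors


def save_input(module, inputs, output):
    # inputs is a tuple; inputs[0] has shape (B, in_features)
    saved["x"] = inputs[0].detach()


def save_grad_output(module, grad_input, grad_output):
    # grad_output[0] has shape (B, out_features): dLoss/dy for each sample
    saved["gy"] = grad_output[0].detach()


layer.register_forward_hook(save_input)
layer.register_full_backward_hook(save_grad_output)

x = torch.randn(8, 4)
target = torch.randn(8, 3)
# Sum (not mean) so each sample's gradient is not scaled by 1/B.
loss = ((layer(x) - target) ** 2).sum()
loss.backward()

# Per-sample weight gradients: outer(gy_i, x_i) -> shape (B, out, in)
per_sample_w = torch.einsum("bo,bi->boi", saved["gy"], saved["x"])
# Per-sample bias gradients are just the output gradients: shape (B, out)
per_sample_b = saved["gy"]

# Exact per-parameter statistics over the minibatch
mean_w = per_sample_w.mean(dim=0)
std_w = per_sample_w.std(dim=0)  # unbiased (Bessel-corrected) by default
std_b = per_sample_b.std(dim=0)
```

Summing per_sample_w over the batch dimension reproduces layer.weight.grad, which is a convenient sanity check that the per-sample decomposition is right.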

For convolutions, you would need to instantiate an object of the ConvNdBackwardBackward class, which I don't think is possible at the moment (though that might change if standard deviations of gradients become more popular).

