Standard deviation of the gradients?


#1

When the loss is averaged over an input batch, calling .backward() accumulates the mean of the per-sample gradients into each parameter's .grad. I was wondering if there is an efficient way to also get the standard deviation of the per-sample gradients.


(Thomas V) #2

Hi.

As no one else has replied yet: no, that is not possible at the moment. If you need the standard deviation over a large set (i.e., many mini-batches), you could multiply the standard deviation of the mini-batch gradient averages by batch_size**0.5 to estimate the per-sample standard deviation, using the CLT scaling behaviour (see the sketch below).
There is a way to get the standard deviation for linear layers with a bit of hackery, but conv layers seem out of (my) reach at the moment.
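
For the CLT estimate, something along these lines could work (a rough sketch; model, loss_fn, and loader are placeholder names, and it assumes the loss is averaged over each mini-batch):

```python
import torch

# Hypothetical setup: `model`, `loss_fn`, and `loader` (mini-batches of
# size batch_size) stand in for your own training objects.
batch_size = 32
grad_means = []  # one flattened mean-gradient vector per mini-batch

for x, y in loader:
    model.zero_grad()
    loss_fn(model(x), y).backward()  # .grad now holds the mini-batch mean
    grad_means.append(torch.cat([p.grad.flatten() for p in model.parameters()]))

stacked = torch.stack(grad_means)  # (num_batches, num_params)
# Std of the mini-batch means, scaled up by sqrt(batch_size) per the CLT.
per_sample_std_estimate = stacked.std(dim=0) * batch_size ** 0.5
```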

Best regards

Thomas


#3

Thanks for the reply!
However, it would be great to be able to compute the exact standard deviation rather than an approximation.
Is there a way to apply a self-defined function to each per-sample gradient of the mini-batch? If so, I could compute the exact std with two backward passes (first computing the mean of the gradients, then the mean of their squares).
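(Concretely, via the identity Var(g) = E[g**2] - E[g]**2: the first pass would give the mean gradient, the second the mean of the squared per-sample gradients, and then std = (mean_of_squares - mean**2)**0.5, up to the usual N/(N-1) correction.)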


(Thomas V) #4

You can use hooks. For linear layers, you could do something like this:
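
A minimal sketch, not battle-tested: it uses the standard register_forward_hook / register_full_backward_hook module hooks, assumes a 2-D input of shape (batch, in_features), and assumes the loss is a mean of per-sample losses; the class and variable names are just for illustration.

```python
import torch
import torch.nn as nn

# Per-sample weight-gradient std for a Linear layer.
# For y = x @ W.T + b, the weight gradient contributed by sample i
# is the outer product of grad_output[i] and input[i].

class LinearGradStd:
    def __init__(self, linear):
        self._input = None
        linear.register_forward_hook(self._save_input)
        linear.register_full_backward_hook(self._grad_std)

    def _save_input(self, module, inputs, output):
        self._input = inputs[0].detach()       # (batch, in_features)

    def _grad_std(self, module, grad_input, grad_output):
        go = grad_output[0].detach()           # (batch, out_features)
        batch = go.shape[0]
        # A loss averaged over the batch puts a 1/batch factor into
        # grad_output; undo it to recover the per-sample gradients.
        per_sample = batch * go.unsqueeze(2) * self._input.unsqueeze(1)
        self.weight_std = per_sample.std(dim=0)   # same shape as W
        self.bias_std = (batch * go).std(dim=0)   # same shape as b

lin = nn.Linear(4, 3)
tracker = LinearGradStd(lin)
x = torch.randn(8, 4)
lin(x).pow(2).sum(dim=1).mean().backward()  # per-sample losses, batch mean
print(tracker.weight_std.shape)             # torch.Size([3, 4])
```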

For convolutions, you would need to instantiate an object of the ConvNdBackwardBackward class, which I don’t think is possible at the moment. (But it might become possible in the future, if standard deviations become more popular.)

Best regards

Thomas