Standard deviation of the gradients?

swiedema · October 6, 2017, 8:47am

Calling .backward() on a loss averaged over an input batch gives the mean of the per-sample gradients. I was wondering if there is an efficient way to also get the standard deviation of those gradients.

tom · October 12, 2017, 5:54pm

Hi.

As no one else has replied yet: no, that is not possible at the moment. If you need the standard deviation over a large set (i.e. many minibatches), you could multiply the standard deviation of the minibatch averages by batch_size**0.5 to estimate the per-sample standard deviation, using the CLT scaling behaviour.
There is a way to get stdev for linear layers with a bit of hackery, but conv layers seem out of (my) reach at the moment.
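
The CLT scaling mentioned above can be checked numerically. The following is only a sketch: the i.i.d. Gaussian draws stand in for per-sample gradients of a single parameter, and batch_size, num_batches and true_std are illustrative values, not anything from a real model.

```python
import random
import statistics

random.seed(0)

batch_size = 16
num_batches = 5000
true_std = 2.0  # assumed per-sample std of the stand-in "gradient" values

# Each minibatch contributes one averaged value, which is all that
# .backward() on a mean-reduced loss would expose.
batch_means = []
for _ in range(num_batches):
    batch = [random.gauss(0.5, true_std) for _ in range(batch_size)]
    batch_means.append(sum(batch) / batch_size)

# Std of the minibatch averages shrinks by sqrt(batch_size), so scale it back up.
std_of_means = statistics.stdev(batch_means)
estimated_per_sample_std = std_of_means * batch_size ** 0.5

print(estimated_per_sample_std)  # should land close to true_std
```

The estimate only becomes tight with many minibatches, and it assumes the samples within and across batches are independent.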

Best regards

Thomas

swiedema · October 12, 2017, 6:25pm

Thanks for the reply!
However, it would be great to be able to calculate the exact std instead of an approximation.
Is there a way to apply a self-defined function to each sample gradient of the minibatch? If so, I could calculate the exact std with two backward passes (first computing the mean of the gradients, then the mean of their squares).
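
For reference, the brute-force version of this idea is one backward pass per sample, accumulating the gradients and their squares. This is a minimal sketch written against a recent PyTorch release (plain tensors rather than the 2017 Variable API); the model, data shapes and MSE loss are placeholders chosen for illustration.

```python
import torch

torch.manual_seed(0)

# Placeholder model and data, for illustration only.
model = torch.nn.Linear(3, 2)
x = torch.randn(8, 3)
y = torch.randn(8, 2)

params = list(model.parameters())
grad_sum = [torch.zeros_like(p) for p in params]
grad_sumsq = [torch.zeros_like(p) for p in params]

# One backward pass per sample: exact, but batch_size times slower.
for i in range(x.size(0)):
    loss_i = torch.nn.functional.mse_loss(model(x[i:i + 1]), y[i:i + 1])
    grads = torch.autograd.grad(loss_i, params)
    for s, sq, g in zip(grad_sum, grad_sumsq, grads):
        s += g
        sq += g ** 2

n = x.size(0)
grad_mean = [s / n for s in grad_sum]
# Population std: sqrt(E[g^2] - E[g]^2); the clamp guards against tiny
# negative values caused by floating-point rounding.
grad_std = [(sq / n - m ** 2).clamp(min=0).sqrt()
            for sq, m in zip(grad_sumsq, grad_mean)]
```

Here grad_mean matches what a single backward pass on the batch-averaged loss would give, and grad_std has the same shape as each parameter.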

tom · October 13, 2017, 7:46am

You can use hooks. So for linear layers, you could do this:

https://gist.github.com/t-vi/f3437d31b3e4680cc78d9999ea5a8af6#file-variance_of_grad-py

variance_of_grad.py

import torch
from torch.autograd import Variable

def linear_with_sumsq(inp, weight, bias=None):
    def provide_sumsq(inp,w,b):
        def _h(i):
            if not hasattr(w, 'grad_sumsq'):
                w.grad_sumsq = 0
            w.grad_sumsq += ((i**2).t().matmul(inp**2))*i.size(0)
            if b is not None:

(The embedded file is truncated here; the full version is at the gist link above.)
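
Since the embedded file is cut off, here is a separate sketch of the identity that makes the linear case tractable. It is not the gist's code. For out = x @ weight.t(), the per-sample weight gradient is an outer product of that sample's output gradient and its input, so the squared per-sample gradients can be summed with a single matmul of the squared tensors. The sketch assumes the loss is a mean over the batch and uses a recent PyTorch release; the names captured, mean_sq and grad_std are illustrative.

```python
import torch

torch.manual_seed(0)

n, d_in, d_out = 8, 3, 2
x = torch.randn(n, d_in)
y = torch.randn(n, d_out)
weight = torch.randn(d_out, d_in, requires_grad=True)

captured = {}
out = x @ weight.t()
# Tensor hook: record the gradient flowing into the layer output.
# dict.update returns None, so the gradient itself is left unchanged.
out.register_hook(lambda g: captured.update(grad_out=g))

# Assumption: the loss is averaged over the batch dimension.
loss = ((out - y) ** 2).sum(dim=1).mean()
loss.backward()

g = captured["grad_out"]  # shape (n, d_out), already carries the 1/n factor
# Per-sample gradient of sample i is outer(n * g[i], x[i]); the mean of its
# elementwise square over the batch collapses to one matmul.
mean_sq = n * (g ** 2).t() @ (x ** 2)
mean_grad = weight.grad   # mean of the per-sample gradients
grad_std = (mean_sq - mean_grad ** 2).clamp(min=0).sqrt()
```

The factor n plays the same role as the i.size(0) factor visible in the gist excerpt: it undoes the 1/n that the batch mean puts into the output gradient.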

For convolutions, you would need to instantiate an object of the ConvNdBackwardBackward class, which I don’t think is possible at the moment. (But might be in the future if standard deviations become more popular.)

Best regards

Thomas