Gradient of Standard Deviation is nan

I encountered a strange behaviour that I don't really understand. When I want to normalize a Variable, I run into a division by zero during the backward pass when the standard deviation is zero.

For example:

import torch
from torch.autograd import Variable

a = Variable(torch.ones(1), requires_grad=True)
b = (a - a.mean()) / (a.std() + 1e-4)
b.backward()
print(a.grad)  # gives nan

I tracked the problem down, and it even occurs if I just divide by the std after adding a fairly large epsilon:

a = Variable(torch.ones(1), requires_grad=True)
b = 1/(a.std() + 1)
b.backward()
print(a.grad) # gives nan

But the epsilon should prevent the division by zero in the derivative, right?
So what is happening here? And how can I normalize without running into this error?

Well, sqrt(x)'s derivative at x=0 is undefined.
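Here is a minimal sketch of that point, using the current tensor API instead of Variable (the tensor sizes are just chosen for illustration, and the printed values are what recent PyTorch versions give):

import torch

# The derivative of sqrt(x) is 1/(2*sqrt(x)), which blows up as x -> 0.
x = torch.zeros(1, requires_grad=True)
torch.sqrt(x).backward()
print(x.grad)  # tensor([inf])

# std() is sqrt(var()); in the backward pass the inf from sqrt at 0 gets
# multiplied by the zero coming from (a - mean), and inf * 0 = nan.
a = torch.ones(5, requires_grad=True)
(a.std() + 1e-4).backward()
print(a.grad)  # nan for every element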

Thank you, of course. I totally forgot about the sqrt(x).

Sorry, could you share a link where I can read about that? I thought we could take the derivative of the square root.

the left limit doesn’t exist…


sqrt(x) is not defined for negative values (at least for real numbers), and evaluating it there results in nan. At x = 0 the value itself is defined, but the derivative 1/(2*sqrt(x)) is unbounded there, so the derivative only exists for x > 0.


Hi, how did you get rid of this issue? I need to use std; is there any solution to avoid nan in the backward pass?
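One common workaround (a sketch, not an official PyTorch recipe) is to put the epsilon inside the square root, i.e. compute the standard deviation from the variance yourself as sqrt(var + eps). The example below assumes an input with more than one element so that var() is well defined:

import torch

# A constant tensor: its variance is exactly zero, the case that produces nan.
x = torch.ones(5, requires_grad=True)

eps = 1e-6
# sqrt(var + eps) is differentiable even when var == 0,
# whereas sqrt(var) + eps still takes the derivative of sqrt at 0.
std = torch.sqrt(x.var() + eps)
y = ((x - x.mean()) / std).sum()
y.backward()
print(x.grad)  # tensor([0., 0., 0., 0., 0.]) -- finite, no nan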