Is there a way to make PyTorch return zero instead of None?

(Alire Sa Ranjbar) #1
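import torch
from torch.autograd import grad

# x = 2 matches the printed output below; the value of u is an assumption
x = torch.tensor([2.], requires_grad=True)
u = torch.tensor([3.], requires_grad=True)
def f(x, u):
    return x.pow(2) + u.pow(2)
y = f(x, u)
yx = grad(y, x, create_graph=True)[0]    # first derivative dy/dx = 2x
print(yx)
yxu = grad(yx, u, allow_unused=True)[0]  # mixed second derivative d2y/dxdu
print(yxu)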

output:
        tensor([4.], grad_fn=<ThMulBackward>)
        None

Mathematically the above second derivative should be zero, right?

(Alban D) #2

A convention we use in PyTorch to reduce memory usage and improve speed during backward is that None is equivalent to a Tensor full of zeros.
That allows many optimizations during the backward pass.
Is it a problem in your use case?
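
If an explicit zero is needed, the convention above can be undone by hand. Here is a minimal sketch, assuming the same f as in the first post; the helper name grad_or_zeros and the value of u are hypothetical, chosen only for illustration:

```python
import torch
from torch.autograd import grad


def grad_or_zeros(output, inp, **kwargs):
    """Hypothetical helper: like grad(), but returns zeros instead of None."""
    (g,) = grad(output, inp, allow_unused=True, **kwargs)
    # None means autograd found no path from `output` to `inp`,
    # which by convention stands for a tensor full of zeros.
    return torch.zeros_like(inp) if g is None else g


x = torch.tensor([2.0], requires_grad=True)
u = torch.tensor([3.0], requires_grad=True)  # assumed value, not from the post

y = x.pow(2) + u.pow(2)
yx = grad(y, x, create_graph=True)[0]  # dy/dx = 2x
yxu = grad_or_zeros(yx, u)             # d2y/dxdu, materialized as zeros
print(yxu)
```

Recent PyTorch releases also document a materialize_grads argument on torch.autograd.grad that is meant to do the same thing; whether it is available depends on the installed version, so check the documentation for your release before relying on it.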

(Alire Sa Ranjbar) #3

I guess there are other reasons a None could be returned for such a gradient, for example a mistake in the code. But apparently a None is also returned when the function’s gradient is independent of the target variable (which mathematically should be a zero).

So the main problem is that one cannot be sure whether the None comes from a bug in the code or from the function being independent of the variable.

(Alban D) #4

I’m not sure what you mean by “a bug in the code”?

(Alire Sa Ranjbar) #5

I mean, for example, that I made a mistake in the code I wrote, so my function returns a None although it should return a valid derivative. But maybe it’s not a mistake in my code and rather the fact that my function does not depend on the variable I am differentiating with respect to.

So when I get a None, I wouldn’t know whether the reason is a mistake in my code or the fact that my function is independent of the variable.
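
One way to surface the situation instead of silently getting a None is to leave allow_unused at its default, in which case autograd raises an error when the input was not used in the graph. This is a rough sketch, assuming the same f as in the first post and an arbitrary value for u; it does not tell a bug apart from genuine independence, it only makes the case visible:

```python
import torch
from torch.autograd import grad

x = torch.tensor([2.0], requires_grad=True)
u = torch.tensor([3.0], requires_grad=True)  # assumed value, not from the post

y = x.pow(2) + u.pow(2)
yx = grad(y, x, create_graph=True)[0]  # dy/dx = 2x, no dependence on u

try:
    grad(yx, u)  # allow_unused left at its default
    raised = False
except RuntimeError as err:
    # autograd reports that u does not appear in the graph of yx
    raised = True
    print("autograd refused:", err)
```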

(Alban D) #6

But from PyTorch’s point of view, the two are exactly the same :smiley:
Whether there is no link between two variables because of an error or because of the function you implemented, the two cases are indistinguishable.

(Alire Sa Ranjbar) #7

From a mathematical point of view, if the first derivative of a function is a constant, the second derivative is zero, not None.

(Alban D) #8

But if you make a mistake and implement the wrong function, then ask for d f(x) / d y (that is, you use the wrong variable in your function), the derivative is mathematically also 0.
I’m not sure I see where the difference is.
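
To make the comparison concrete, here is a small sketch of the two situations side by side. The "buggy" function is a made-up example of a typo (u accidentally left out), and the value of u is an assumption:

```python
import torch
from torch.autograd import grad

x = torch.tensor([2.0], requires_grad=True)
u = torch.tensor([3.0], requires_grad=True)  # assumed value, not from the post

# Case 1: intended function; dy/dx = 2x genuinely does not depend on u.
y = x.pow(2) + u.pow(2)
yx = grad(y, x, create_graph=True)[0]
case_independent = grad(yx, u, allow_unused=True)[0]

# Case 2: hypothetical typo; the author meant to use u but left it out.
y_bug = x.pow(2)
case_bug = grad(y_bug, u, allow_unused=True)[0]

# Both results are None: autograd only sees that u is absent from the graph.
print(case_independent, case_bug)
```

In both cases the mathematical derivative is zero, and in both cases autograd reports the missing link the same way.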