A convention we use in pytorch to reduce memory usage and improve speed during backward is that None is equal to a Tensor full of zeros.
That allows many optimizations during the backward pass.
Is it a problem in your usecase?
There are I guess other reasons that a None is returned for such gradient, for example, because of a mistake in the code. But, apparently if the function’s gradient is independent of the
target variable, a None is also returned (which should actually be a zero mathematically).
So the main problem is that one will not be sure if the None is because of a bug in the code or because of the independence of the function with respect to the variable.
I mean for example I have a mistake in the code that I have written and although my function should return a valid derivative, it returns a None. But maybe it’s not my mistake in my code and is rather the fact that my function is not dependent on the variable I am calculating the derivative for.
So when I get a None, I wouldn’t know if the reason is because of a mistake in my code or the fact that my function is independent of the variable.
But from pytorch’s point of view, the two are exactly the same
If there is not link between two variables because of an error or because of the function you implement are indistinguishable.
But if you make a mistake and implement the wrong function and ask for d f(x) / d y (use the wrong variable in your function), then mathematically, the derivative is also 0.
I’m not sure to see where the difference is?
@albanD , I see your point and your concern about memory usage, etc.; but what about adding an optional named parameter default to “False”? e.g. “none_to_zero=False”. It would not hurt your framework, would it? By doing so, we could make good use of it within our own custom differentiation stuff.