It's not well defined formally, I agree (you can't take a limit of the original function over values where it is not defined). However, the function does not depend on x[0], so one handwavy guess at its behaviour would be a 0 derivative when differentiating with respect to a variable it does not depend on, i.e. d log(x) / dy |_{x=-1} = 0. Nonetheless, the correct thing here, I would think, is to return either an error or a value that clearly indicates something is wrong, such as "nan".

> We just use `1/x`
>
> for the gradient of log. So when x is 0, it gets to inf/nan. But when it is negative, you just get other wrong values.

This is my current confusion: it's not like you are returning "random" wrong values; it is returning the gradient as though the function were defined for the provided out-of-range values (see the example above and the OP's). This is problematic from the user's viewpoint, as one can think this is the "correct gradient" when in fact the notion of gradient here is not well defined. What would be the issue with returning nan, as opposed to the current value, which is rather deceiving?

> We just use `1/x`
>
> for the gradient of log. So when x is 0, it gets to inf/nan.

I think it's possible you might have missed a detail in the OP's example. With y = [log(x1), log(x2)], the formal derivative of log(x2) with respect to x1 (which is what the OP computes) is not 1/x, it is 0. So I don't think this comment is particularly relevant. No disagreement with what the derivative of log is, of course… but without thinking about reverse-mode diff, you would expect the first snippet to return 0, not nan, since that derivative is not 1/x, it is 0.

Where it becomes confusing is that the gradient becomes 0 (the value you would expect) when you pass log a negative number; this feels like completely inconsistent / unexpected behaviour.
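The inconsistency can be reproduced without PyTorch at all; it falls out of IEEE arithmetic in the reverse-mode rule for log. A minimal NumPy sketch (the `log_vjp` helper is an illustrative name of mine, not a PyTorch internal):

```python
import numpy as np

def log_vjp(x, upstream):
    # Reverse-mode rule for elementwise log: multiply the incoming
    # (upstream) gradient by 1/x. This mirrors what an AD engine does.
    return upstream / x

# Differentiating y[1] = log(x[1]) w.r.t. x: y[1] does not depend on
# x[0], so the upstream gradient at index 0 is exactly 0.
upstream = np.array([0.0, 1.0])

with np.errstate(divide="ignore", invalid="ignore"):
    print(log_vjp(np.array([0.0, 1.0]), upstream))   # 0/0  -> nan at index 0
    print(log_vjp(np.array([-1.0, 1.0]), upstream))  # 0/-1 -> -0. at index 0
```

At x = 0 the rule evaluates 0 · (1/0) = 0/0 = nan, while at x = -1 it evaluates 0 · (1/(-1)) = 0, which is exactly the "negative input looks fine, zero input gives nan" behaviour described above.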

I understand the point being made: in order to be efficient, PyTorch decides to compute some derivative (which is wrong) in these particular scenarios so that compute is optimised, but I think our point is that these particular scenarios give quite deceitful answers.

Also note the OP is asking for a way to make the gradients of the first snippet 0 (where x is not negative); the response you have provided focuses on the second snippet, where the gradients are already 0, which is what the OP wants. So I think it's possible this point was missed in both responses, and it's still something we are a bit confused about. A user could argue that these wrong answers seem correct mathematically, since d log(x) / dx = 1/x as a function can be evaluated for negative x, and this is exactly what torch is doing when provided with negative values:

d log(x) / dx |_{x=-1,y=1} = -1 and d log(x) / dy |_{x=-1,y=1} = 0 (this is what PyTorch is outputting)

meanwhile in the first snippet (when x=0):

d log(x) / dy |_{x=0, y=1} = nan (when it should be 0?)

In shorter terms: I would expect the Jacobian of log([x, y]) at x=0, y=1 to be diagonal, but that does not seem to be the case, while the Jacobian at x=-1, y=1 is in fact diagonal. I don't understand how this inconsistency can be expected behaviour?
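To make the diagonality claim concrete, here is a sketch that builds the Jacobian of elementwise log the way reverse mode would, with one unit-seed backward pass per output (`log_jacobian_reverse` is a hypothetical helper of mine, not a PyTorch function):

```python
import numpy as np

def log_jacobian_reverse(x):
    # One reverse-mode pass per output: seed with a unit vector,
    # then apply the log backward rule (upstream * 1/x, elementwise).
    n = len(x)
    J = np.empty((n, n))
    for i in range(n):
        seed = np.zeros(n)
        seed[i] = 1.0
        with np.errstate(divide="ignore", invalid="ignore"):
            J[i] = seed / x
    return J

print(log_jacobian_reverse(np.array([-1.0, 1.0])))
# diagonal, as expected: [[-1, 0], [0, 1]]
print(log_jacobian_reverse(np.array([0.0, 1.0])))
# the log(y) row has nan in its off-diagonal slot: [[inf, 0], [nan, 1]]
```

So at x = -1 the off-diagonal entries come out as 0 only because 0/(-1) happens to be 0, while at x = 0 the same entry is 0/0 = nan, producing the non-diagonal-looking Jacobian.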

Why is this the case? As mentioned earlier, we expect this to be 0. I am not sure the question is being addressed; I think some details of the OP's example have been lost in the responses.