I have a function f = x+y-(x*y), and I want to calculate the derivative of f with respect to the product xy: df/d(xy). How can I do that using PyTorch?
I tried defining xy as a separate variable, the product of x and y, and then used retain_grad() to compute its gradient, as shown below. The problem is that the new variable is treated as an independent variable, but it isn’t, since it depends on x and y.
import torch
x = torch.tensor([0.0], requires_grad=True)
y = torch.tensor([0.0], requires_grad=True)
xy = x * y
xy.retain_grad()  # keep the gradient of this non-leaf tensor after backward()
f = x + y - xy
f.backward()
print(xy.grad)  # outputs a value of -1
The output is -1, but I know from how the function behaves that the gradient at x=0,y=0 should be positive. For the curious, f is the algebraic form of the logical OR function.
The result looks correct to me, no? If you write b = x*y, then f = x+y - b and df/db = -1. So df/dxy = -1.
I don’t know how you print the function, but f(x, y) and f(x, y, b) are two completely different functions, so you cannot extrapolate from knowing the first one how the second one will behave.
That is true if b is an independent variable, but we have b=g(x,y). So, effectively, we have f(x,y,g(x,y)). The question is how to compute df/dg. I guess it’s more of a math question than a PyTorch question. Yet, I wondered if it can be answered by writing a piece of PyTorch code. Any ideas?
The problem I have with your question is about which definition of derivative you use.
You can only take derivatives wrt an argument of the function you consider. Not another function (if you have a functional that takes a function as argument, you can do it of course, but it’s not the case here).
If you want the derivative wrt the result of that function, as you mentioned in your question, then that is what your first sample with .retain_grad() gives you: -1.
Hi @albanD, thank you for the reply. It does make sense to me to take the derivative of a function w.r.t another function even if we are not talking about functionals. After all, every variable is an identity function of itself: f(x,y) = x + y = I(x) + I(y), where I(.) is the identity function. So, df(x,y)/dx is just df(x,y)/dI(x). In the same vein, it does make sense to ask what df(x,y)/dg(x,y) is, where f(x,y) = x + y + xy = x + y + g(x,y), for example. The problem is that if the product xy changes, x and y could change individually or jointly, and there may be multiple ways by which this can happen for a given change in xy. This means that it’s not clear how to calculate df(x,y)/dg(x,y), since it’s not clear how to calculate dx/dg(x,y) and dy/dg(x,y) in the above example. It feels like a very simple problem and yet I’m at a loss.
The way I see it is the following:
I think there is a confusion here between a function and its value.
Derivatives have to be defined wrt an input of your function (you can check the definitions you linked).
So when you write df(x,y)/dg(x,y), the only way to make sense of this is to write b = g(x,y) and f(x, y, b). Your derivative is then written as df(x,y,b)/db, to clearly show that we differentiate wrt an input.
Given this new formulation, the derivative that you want to compute is clearer: when computing it, you ignore any dependency of b on x or y.
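To make this concrete, here is a minimal sketch using the f = x + y - b from the original question. The point (x=2, y=3) is an arbitrary choice for illustration, not something from the thread. The gradient stored on b is the partial derivative wrt b as an input, while the gradients on x and y also include the path through b.

```python
import torch

# Arbitrary evaluation point, chosen only for illustration.
x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)

b = x * y
b.retain_grad()  # b is a non-leaf tensor, so its grad is not kept by default
f = x + y - b
f.backward()

df_db = b.grad.item()  # partial derivative wrt the input b: -1
df_dx = x.grad.item()  # total derivative wrt x: 1 - y = -2
df_dy = y.grad.item()  # total derivative wrt y: 1 - x = -1
print(df_db, df_dx, df_dy)  # -1.0 -2.0 -1.0
```

So the -1 on b is not a bug: autograd reports how f changes with its direct input b, and the dependence of b on x and y shows up only in x.grad and y.grad.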
Does this make sense?
Hi again. Yes, I do fully understand why the output was -1 in my case. Clearly, that’s coming from the assumption that the variable b is an independent variable. But is it? Clearly, it’s not and that’s my principal concern. Isn’t it wrong to assume that by simply replacing the product xy with b in the equation f = x + y + xy, b becomes independent of x and y?
One can be more careful, and say that f = b/y + b/x + b, so df/db = 1/y + 1/x + 1. Still, by implying that d(b/y)/db = 1/y, we would have assumed that y is a constant – why should that be true?
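As a rough check of this particular rewriting, the sketch below treats b as an independent leaf tensor and holds x and y constant, which is exactly the assumption being questioned. The values x=2, y=4, b=8 are arbitrary and chosen only so that b equals x*y numerically.

```python
import torch

# Arbitrary values for illustration; x and y are plain constants here.
x = torch.tensor(2.0)
y = torch.tensor(4.0)
# b equals x*y numerically, but autograd sees it as an independent leaf.
b = torch.tensor(8.0, requires_grad=True)

f = b / y + b / x + b
(df_db,) = torch.autograd.grad(f, b)

expected = 1 / y + 1 / x + 1  # the hand-derived 1/y + 1/x + 1
print(df_db.item(), expected.item())  # 1.75 1.75
```

Autograd agrees with the hand-derived expression, but only because the code, like the formula, keeps x and y fixed while b varies.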
This is another completely different function indeed.
But the fact that you have multiple possible interpretations only comes from the fact that your notation df(x,y)/dg(x,y) is not a valid mathematical notation. You need to interpret it. And whichever way you interpret it, you will get a different function and so potentially different derivatives.
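One way to see this numerically is to pick a direction in which (x, y) moves and compare the change in f with the change in g = x*y along it. This is only one possible reading of the notation, not a standard definition. The helper name rate_along and the point (2, 4) are hypothetical, and f = x + y + xy is the example from the posts above.

```python
import torch

def rate_along(direction, x0=2.0, y0=4.0):
    """Change in f per unit change in g = x*y when (x, y) moves along `direction`.

    Hypothetical helper for illustration; (x0, y0) is an arbitrary point.
    """
    p = torch.tensor([x0, y0], requires_grad=True)
    f = p[0] + p[1] + p[0] * p[1]
    g = p[0] * p[1]
    (grad_f,) = torch.autograd.grad(f, p)  # (1 + y, 1 + x)
    (grad_g,) = torch.autograd.grad(g, p)  # (y, x)
    v = torch.tensor(direction)
    return (grad_f @ v / (grad_g @ v)).item()

only_x_moves = rate_along([1.0, 0.0])  # (1 + y) / y = 1.25
only_y_moves = rate_along([0.0, 1.0])  # (1 + x) / x = 1.5
print(only_x_moves, only_y_moves)
```

The two directions give different values, and both differ from the 1 obtained by treating b as an independent input of f = x + y + b. This is the ambiguity described above: the answer depends on the interpretation chosen.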