How to implement partial derivatives of a function?

According to the documentation - “Backward - … should return as many Variables as there were inputs, with each of them containing the gradient w.r.t. its corresponding input.”

What if I have multiple outputs? Can I implement a different partial derivative for each output?

You can only call backward on a single scalar value at a time. Most of the time, this value represents a distance between your vector of outputs and the expected targets.

The thing is, you don’t want to compute the derivative of your outputs, but the derivative of an error function of your outputs.

Of course, you may have several different errors (as in a GAN), or maybe a vector of distances (I have never seen it, but why not?). In that case, the common approach is to call backward and take one gradient descent step for each scalar component, one at a time.
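As a rough illustration of calling backward once per scalar error, here is a minimal sketch. The tensor `w` and the two losses are made-up toy values, not taken from any real model, and the optimizer step that would normally follow each call is left out.

```python
import torch

# Hypothetical toy parameter and losses, for illustration only.
w = torch.tensor([1.0, 2.0], requires_grad=True)
outputs = w * 3.0
loss_a = outputs.sum()          # first scalar error
loss_b = (outputs ** 2).sum()   # second scalar error

# retain_graph=True keeps the shared graph alive for the second call.
loss_a.backward(retain_graph=True)
grad_a = w.grad.clone()         # d(loss_a)/dw = [3, 3]

w.grad.zero_()                  # gradients accumulate, so reset between calls
loss_b.backward()
grad_b = w.grad.clone()         # d(loss_b)/dw = 18 * w = [18, 36]
```

Each backward call starts from one scalar, and `w.grad` holds the gradient of that scalar only because it is zeroed in between.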

Thanks for the reply, though you misinterpreted my question.
I was referring to torch.autograd.Function.backward(), that is, the backward method of a new class extending torch.autograd.Function (see the link from the original post).

Oh indeed, that’s different.

In that case, the example proposed in the doc works for a single Tensor (with several values), but may not work if you have multiple tensors with different dimensions.

But it’s a matter of how you design your function. If you want a function that outputs different things, you can create one function for each output, and one backward method for each of them. Then, you combine all the ‘sub-functions’ together in a Module:

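from torch import nn
from torch.autograd import Function

class Function_1(Function):
    @staticmethod
    def forward(ctx, input):
        return input.clone()  # placeholder: compute output 1 here

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # placeholder: gradient w.r.t. input

class Function_2(Function):
    @staticmethod
    def forward(ctx, input):
        return input.clone()  # placeholder: compute output 2 here

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # placeholder: gradient w.r.t. input

class Module_12(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, input):
        # Custom Functions are invoked through .apply, not called directly
        output1 = Function_1.apply(input)
        output2 = Function_2.apply(input)
        return output1, output2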

Yeah, that makes sense. Though in my case the two forwards are identical, so this would require, I suppose, strange workarounds to avoid redundant computation. I was hoping for a more built-in solution.

By the way, the context is nearest-embed layer for the VQ-VAE model.
I actually need two identical outputs: one with the encoder input detached (stop-gradient) and one with the dictionary input detached.

Then, why don’t you stack your two outputs? Then you can treat them as a single tensor, and just split them apart inside backward in order to treat them differently:

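import torch

class MyFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        output1, output2 = input.clone(), input.clone()  # placeholders
        return torch.stack([output1, output2], 0)

    @staticmethod
    def backward(ctx, grad_output):
        grad_output1, grad_output2 = grad_output.unbind(0)  # one per output
        return grad_output1 + grad_output2  # placeholder combination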

Wow, I didn’t know that paper, the results with videos are amazing :open_mouth: I’m looking forward to seeing your implementation working as well!

By the way, I’m now thinking about applying this nearest-embedding idea to reinforcement learning, if someday I have the time.

The difference between the two outputs is the computational graph I want attached to them. I don’t think the concatenation achieves this effect.
Either way, if there isn’t a straightforward method of achieving this, I’ll keep the double forward pass for now…
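For what it's worth, a single torch.autograd.Function can return several tensors, in which case its backward receives one gradient per output. The sketch below relies on that to run one forward pass and route each output's gradient to a different input. The class name `TwoOutputs`, the inputs `x` and `emb`, and the `x + emb` stand-in computation are all hypothetical, and this is not the actual nearest-embed layer.

```python
import torch
from torch.autograd import Function

class TwoOutputs(Function):
    """Hypothetical sketch: one forward pass, two identical outputs."""

    @staticmethod
    def forward(ctx, x, emb):
        # Stand-in computation; a real nearest-embed lookup would go here.
        result = x + emb
        # Return two distinct tensors so each gets its own gradient slot.
        return result.clone(), result.clone()

    @staticmethod
    def backward(ctx, grad_out1, grad_out2):
        # One gradient per forward input, in the same order (x, emb).
        # grad_out1 flows only to x, grad_out2 flows only to emb.
        return grad_out1, grad_out2

x = torch.ones(3, requires_grad=True)
emb = torch.ones(3, requires_grad=True)
out1, out2 = TwoOutputs.apply(x, emb)
(2.0 * out1.sum() + 5.0 * out2.sum()).backward()
```

After the backward call, `x.grad` only reflects the loss term built on `out1` and `emb.grad` only the term built on `out2`, which behaves like detaching the other input for each output without a second forward pass.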

Yeah, very nice concept with intriguing results. Hoping my implementation will get close to that :slight_smile:
