# How to implement partial derivatives of a function?

According to the documentation - “Backward - … should return as many Variables as there were inputs, with each of them containing the gradient w.r.t. its corresponding input.”

What if I have multiple outputs? Can I implement a different partial derivative for each output?

You can only call `backward` on a single scalar value at a time. Most of the time, this value represents a distance between your vector of outputs and the expected targets.

The thing is, you don’t want to compute the derivative of your outputs, but the derivative of an error function of your outputs.

Of course, you may have several different errors (as in GANs), or maybe a vector of distances (I've never seen that, but why not?). In that case, the common approach is to call `backward` and take one step of gradient descent for each scalar component, one at a time.
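To illustrate the point above, here is a minimal sketch with two made-up scalar losses computed from the same parameters, each backpropagated with its own `backward()` call (the losses are hypothetical, chosen only so the gradients are easy to check):

```python
import torch

# Two separate scalar errors derived from the same parameter vector.
w = torch.tensor([1.0, 2.0], requires_grad=True)

loss_a = (w ** 2).sum()   # d(loss_a)/dw = 2 * w
loss_b = (3.0 * w).sum()  # d(loss_b)/dw = 3

loss_a.backward()
grad_a = w.grad.clone()   # gradient of the first scalar: [2., 4.]

w.grad.zero_()            # reset before the second scalar, as in separate steps
loss_b.backward()
grad_b = w.grad.clone()   # gradient of the second scalar: [3., 3.]
```

Without the `zero_()` call in between, the second `backward` would accumulate into `w.grad` instead of replacing it, which is why each gradient-descent step is usually preceded by zeroing the gradients.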

Thanks for the reply, though you misinterpreted my question.
I was referring to `torch.autograd.Function.backward()`, that is, the `backward` method of a new class extending `torch.autograd.Function` (see the link in the original post).

Oh indeed, that’s different.

In that case, the example proposed in the doc works for a single Tensor (with several values), but may not work if you have multiple tensors with different dimensions.

But it’s a matter of how you design your function. If you want a function that outputs different things, you can create one function for each output, and one backward method for each of them. Then, you combine all the ‘sub-functions’ together in a `Module`:

```python
class Function_1(Function):
    def forward(self, input):
        # ...

    # ...

class Function_2(Function):
    def forward(self, input):
        # ...

    # ...

class Module_12(nn.Module):
    def __init__(self):
        super(Module_12, self).__init__()
        # ...

    def forward(self, input):
        output1 = Function_1()(input)  # legacy-style Function: instantiate, then call
        output2 = Function_2()(input)
        return output1, output2
```
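As a self-contained version of the skeleton above, here is a sketch using the modern static-method `Function` API, with trivial bodies filled in (the actual forwards are my assumption for illustration, not the poster's functions):

```python
import torch
from torch import nn
from torch.autograd import Function

class DoubleFn(Function):
    @staticmethod
    def forward(ctx, input):
        return input * 2

    @staticmethod
    def backward(ctx, grad_output):
        # d(2x)/dx = 2
        return grad_output * 2

class SquareFn(Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input ** 2

    @staticmethod
    def backward(ctx, grad_output):
        # d(x^2)/dx = 2x
        (input,) = ctx.saved_tensors
        return grad_output * 2 * input

class TwoOutputs(nn.Module):
    """Combines the two 'sub-functions' into one module with two outputs."""
    def forward(self, input):
        return DoubleFn.apply(input), SquareFn.apply(input)

x = torch.tensor([3.0], requires_grad=True)
a, b = TwoOutputs()(x)
(a + b).sum().backward()  # x.grad = 2 + 2*x = [8.]
```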

Yeah, that makes sense. Though in my case the two forwards are identical, so this would require, I suppose, awkward workarounds to avoid redundant computation. I was hoping for a more built-in solution.

By the way, the context is a nearest-embedding layer for the VQ-VAE model.
I actually need two identical outputs: one with the encoder input detached (stop-gradient) and one with the dictionary input detached.

Then, why don’t you stack your two outputs? You can then treat them as a single tensor, and just split them inside `backward` in order to treat them differently:

```python
class MyFunction(Function):
    def forward(self, input):
        # ...
```
Wow, I didn’t know that paper; the results with videos are amazing. I’m looking forward to seeing your implementation working as well!
Yeah, very nice concept with intriguing results. Hoping my implementation will get close to that.