Backward operation on an output tensor w.r.t. a batch input tensor

Hi, I need to compute the backward derivative of an output tensor with respect to a batch of input tensors.

Here are the details:

The input shape is 64x1x28x28 (a batch of MNIST images) and the output shape is 64x1. The output is calculated by some logic applied to the outputs of the feedforward pass.

So for each image of shape 1x1x28x28, I have a scalar value in the output.

I can do it when using only one image tensor, but batch operations are faster and more efficient, so I need to implement it for batches.

The point is that the backward derivative of each element in the output should correspond to the related image tensor in the input batch.

I know that the backward operation is based on a scalar value, so when I call the backward method on the output tensor, I get an error. I couldn’t handle this in PyTorch.

Can you please help me on this?


The backward operation accepts a gradient argument, so you could pass e.g. torch.ones_like(output) to it.
Would this help in your case?
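As a minimal sketch of that suggestion (the toy tensor `x` and the squared-sum output are assumptions for illustration, not the model from this thread): calling `backward()` without arguments on a non-scalar tensor raises a `RuntimeError`, while passing a `gradient` of the same shape as the output works.

```python
import torch

# Toy input: 4 samples with 3 features each (hypothetical stand-in for the image batch).
x = torch.randn(4, 3, requires_grad=True)

# One scalar per sample, so `out` has shape 4x1 and is not a scalar tensor.
out = (x ** 2).sum(dim=1, keepdim=True)

# backward() without arguments only works for scalar outputs.
try:
    out.backward()
    raised = False
except RuntimeError:
    raised = True

# Passing a gradient with the same shape as `out` is accepted.
out.backward(gradient=torch.ones_like(out))

print(raised)
print(x.grad.shape)
```

Here row `i` of `x.grad` equals `2 * x[i]`, i.e. the derivative of `out[i]` with respect to `x[i]`, because each output row depends only on its own input row.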


I might not have explained my problem clearly, so let me state it again:

Assume that I take a batch of MNIST images as input x (64x1x28x28) and that my model outputs a tensor (Out) of shape 64x1, which contains 64 scalar values, each obtained from one 1x28x28 image.

What I need to implement is, for example, computing the backward derivative of the first output of my model with respect to the first input in the batch. Of course, the same applies to each element in the batch.

I would expect to get a tensor (Batch_Backward) of shape 64x1x28x28 after this batch backward derivative operation. The important thing here is that Batch_Backward[0] should be calculated from the backward derivative of Out[0], which is the output for X[0].

Something like this in the code:

def backward_batch(model, image):
    # Image is of shape 64x1x28x28

    output = model(image)
    output = softmax(output)

    Out = some_func(output)  #Out is of shape 64x1
    Out.backward()  # This does not work!

    return image.grad  # should be 64x1x28x28 as explained in the previous post

Hi ptrblck,

Can you please check my additional explanations and comment whether it is possible or not?

Thanks in advance…

I’m still unsure how to interpret this use case and what exactly is not working.
In your previous post you mentioned that Out.backward() is not working. Could you explain whether you are seeing an error or whether this method would break your use case somehow?

Hi @ptrblck

Just forget the Out.backward() command and instead understand my actual need please.

Let me try to explain again in two parts.

Assume that I am using a batch of images from the MNIST test loader and the batch size is 64.

I feed this batch image tensor (size 64x1x28x28) to my CNN model and get an output of size 64x10.

I then apply some operation to this output tensor and get a final output tensor of shape 64x1.

That means for every input image tensor of shape 1x1x28x28, I get a scalar output of shape 1.

Is it clear until this part? If yes I continue with the second part.

Now, what I need to implement is a backward derivative operation on my final output tensor. What I want is the derivative of the first output scalar in the final output tensor with respect to the first input image in the input batch tensor, and likewise for all 64 output scalars and 64 input images.

The example code I provide is only a demonstration, so please don’t get stuck on it. The backward operation there should be applied to every element of the final output tensor one by one because, for example, the first output is calculated based on the first input image and x.

All the final derivatives should be stored in a tensor of shape 64x1x28x28.

(It is composed of something like: d Out_1/d x_1, d Out_2/d x_2, …, d Out_64/d x_64)

If the batch size were 1, I could do it without a problem. But when the batch size is 64, for example, I can’t. Can you please help me implement this in PyTorch?

Is my problem clear to you now?

Maybe I should use 64 different ‘x’ tensors of shape 1x28x28, because the backward operation will be based on each one of these tensors… I don’t know…

def backward_batch(model, image):
    # Image is of shape 64x1x28x28

    x = torch.zeros_like(image, requires_grad=True)   # x is of shape 64x1x28x28

    output = model(image + x)
    output = softmax(output)

    Out = some_func(output)  #Final output is of shape 64x1
    Out.backward()  # How to handle here?

    return x.grad  # should be 64x1x28x28

That would be the usual use case, wouldn’t it? The samples in a batch are independent from each other (besides batch-dependent layers such as batchnorm layers).
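The independence argument can be written out as a short derivation. The symbols x_i and Out_i follow the notation used earlier in the thread; the assumption (not guaranteed for batch-dependent layers such as batchnorm in training mode) is that Out_j does not depend on x_i for j ≠ i.

```latex
\frac{\partial}{\partial x_i} \sum_{j=1}^{64} \mathrm{Out}_j
  = \sum_{j=1}^{64} \frac{\partial \mathrm{Out}_j}{\partial x_i}
  = \frac{\partial \mathrm{Out}_i}{\partial x_i},
\qquad \text{since } \frac{\partial \mathrm{Out}_j}{\partial x_i} = 0 \text{ for } j \neq i .
```

Under that assumption, backpropagating the sum of all outputs (equivalently, calling backward with a gradient of ones) leaves d Out_i/d x_i at index i of the input gradient.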

If I understand it correctly, your single sample approach would look like this:

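# setup
import torch
import torch.nn as nn

x = torch.randn(64, 2, requires_grad=True)
model = nn.Sequential(
    nn.Linear(2, 10),
    nn.Linear(10, 10))

# single sample approach
for idx, x_ in enumerate(x):
    out = model(x_.unsqueeze(0))
    # reduce to a scalar and accumulate its gradient into x.grad[idx]
    out.mean().backward()
grad_ref = x.grad.clone()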

I.e. in each iteration a single sample from the batch would be used and the gradient would be accumulated to the input tensor in the corresponding index. Note how the x.grad attribute is filled during the loop execution.

If the previous approach is what you are targeting, my previous suggestion of using backward(torch.ones_like(out)) should still work:

# clear gradients
x.grad = None

# pass complete batch to model
out = model(x)
out = out.mean(1)
# backward on the non-scalar output, passing a gradient of ones
out.backward(torch.ones_like(out))

# compare
print(torch.allclose(grad_ref, x.grad))
> True

As you can see, both approaches yield the same gradients.
If my code snippet is still not matching your use case, you would have to provide a code snippet that shows what exactly is causing the issue, as I might not understand the terminology clearly.
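For the MNIST-shaped case described earlier in the thread, the same idea might look like the sketch below. The small CNN and the body of `some_func` (largest class probability per sample) are hypothetical stand-ins, since the actual model and logic were not posted. The sketch also assumes the model has no batch-dependent layers such as batchnorm in training mode.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the CNN in the thread (no batchnorm, no dropout).
model = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(4 * 28 * 28, 10),
)


def some_func(probs):
    # Hypothetical per-sample logic: largest class probability, shape Nx1.
    return probs.max(dim=1, keepdim=True).values


def backward_batch(model, image):
    # Work on a fresh leaf tensor so gradients w.r.t. the input can be taken.
    x = image.detach().clone().requires_grad_(True)
    probs = torch.softmax(model(x), dim=1)
    out = some_func(probs)  # shape Nx1, one scalar per image
    # out[i] depends only on x[i], so the gradient of the sum
    # holds d out[i] / d x[i] at index i.
    (grad,) = torch.autograd.grad(out.sum(), x)
    return grad


image = torch.randn(64, 1, 28, 28)
batch_grad = backward_batch(model, image)  # shape 64x1x28x28

# Reference: the same computation, one image at a time.
loop_grad = torch.cat([backward_batch(model, img.unsqueeze(0)) for img in image])

print(batch_grad.shape)
print((batch_grad - loop_grad).abs().max())
```

`torch.autograd.grad(out.sum(), x)` is used here instead of `out.backward(torch.ones_like(out))` so that the gradient is returned directly rather than accumulated in `x.grad`; both compute the same quantity.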

Many thanks for your support!