I have a neural network with a scalar output and I want to compute the gradient of the output with respect to the input. I know I can use torch.autograd.grad for this, but out of the box it only works when the batch size is one, so that the output is a single-element tensor.

However, to boost the speed, I want to work with mini-batches and then compute the derivative of each y[i] (output) w.r.t. each x[i] (input). How can I achieve that?

Minimal example:

import torch
from torch.autograd import grad
# batch size = 1
x = torch.tensor([[1.]], requires_grad=True) # shape (1, 1)
y = x**2 # shape (1, 1)
y_x = grad(y, x) # this works because y has a single element
# batch size = 2, first attempt
x = torch.tensor([[1.], [1.]], requires_grad=True) # shape (2, 1)
y = x**2 # shape (2, 1)
y_x = grad(y, x) # RuntimeError: grad can be implicitly created only for scalar outputs
# batch size = 2, second attempt
x = torch.tensor([[1.], [1.]], requires_grad=True) # shape (2, 1)
y = x**2 # shape (2, 1)
y_x = []
for i in range(len(y)):
    y_x.append(grad(y[i], x[i])) # RuntimeError: One of the differentiated Tensors appears to not have been used in the graph.

RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.
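For reference, one variant of the loop that does run (a sketch, not necessarily fast): differentiate each y[i] with respect to the full leaf tensor x (slicing x creates a non-leaf tensor, which causes the first error), keep the graph alive with retain_graph=True, and pick out row i of the result:

```python
import torch
from torch.autograd import grad

x = torch.tensor([[1.], [2.]], requires_grad=True)  # shape (2, 1)
y = x**2                                            # shape (2, 1)

# Differentiate each y[i] w.r.t. the whole leaf tensor x, then keep
# only row i; retain_graph=True avoids the "backward through the graph
# a second time" error on later iterations.
y_x = [grad(y[i], x, retain_graph=True)[0][i] for i in range(len(y))]
```

This is correct but does one backward pass per sample, which defeats the purpose of batching.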

I would like to use the grad_outputs parameter to avoid the loop:
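A sketch of that idea: since each y[i] here depends only on x[i], the Jacobian is diagonal, so a vector-Jacobian product with a vector of ones returns every per-sample derivative in a single call:

```python
import torch
from torch.autograd import grad

x = torch.tensor([[1.], [2.]], requires_grad=True)  # shape (2, 1)
y = x**2                                            # shape (2, 1)

# grad_outputs supplies the vector v in the vector-Jacobian product
# v^T J; with v = ones and a diagonal Jacobian, the result is exactly
# dy[i]/dx[i] for every sample, in one backward pass.
y_x, = grad(y, x, grad_outputs=torch.ones_like(y))
```

Note this assumes no cross-sample dependence (true for a network applied sample-wise); otherwise the rows of the result mix contributions from different samples.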