Gradient of scalar output w.r.t. input in batches

I have a neural network with a scalar output, and I want to compute the gradient of the output with respect to the input. I know I can use torch.autograd.grad for this, but it only works when the batch size is one, so that the output is a single-element tensor.

However, to boost the speed, I want to work with mini-batches and then compute the derivative of each y[i] (output) w.r.t. the corresponding x[i] (input). How can I achieve that?

Minimal example:

import torch
from torch.autograd import grad

# batch size = 1
x = torch.tensor([[1.]], requires_grad=True)  # shape (1, 1)
y = x**2  # shape (1, 1)
y_x = grad(y, x)  # this works because y has a single element

# batch size = 2, first attempt
x = torch.tensor([[1.], [1.]], requires_grad=True)  # shape (2, 1)
y = x**2  # shape (2, 1)
y_x = grad(y, x)  # RuntimeError: grad can be implicitly created only for scalar outputs

# batch size = 2, second attempt
x = torch.tensor([[1.], [1.]], requires_grad=True)  # shape (2, 1)
y = x**2  # shape (2, 1)
y_x = []
for i in range(len(y)):
  # x[i] is a fresh tensor produced by indexing; it is not part of y's graph, hence:
  y_x.append(grad(y[i], x[i]))  # RuntimeError: One of the differentiated Tensors appears to not have been used in the graph.
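The loop can be made to work by differentiating each y[i] with respect to the *whole* x (which is in the graph) and keeping the graph alive between iterations, at the cost of one backward pass per sample. A sketch:

```python
import torch
from torch.autograd import grad

x = torch.tensor([[1.], [1.]], requires_grad=True)  # shape (2, 1)
y = x**2  # shape (2, 1)

# Differentiate each y[i] w.r.t. the full x. retain_graph=True keeps the
# graph alive for the next iteration. Each call returns the full (2, 1)
# gradient, with zeros in the rows belonging to the other samples.
rows = [grad(y[i], x, retain_graph=True)[0] for i in range(len(y))]

# Keep only row i of the i-th gradient to get the per-sample derivatives.
y_x = torch.stack([r[i] for i, r in enumerate(rows)])  # shape (2, 1)
```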

Never mind, this does the job:

x = torch.tensor([[1.], [1.]], requires_grad=True)  # shape (2, 1)
y = x**2  # shape (2, 1)
y_x = grad(y, x, grad_outputs=torch.ones_like(y), create_graph=True)
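Why this works: since each y[i] depends only on x[i], the Jacobian dy/dx is (block-)diagonal, so the vector-Jacobian product with a vector of ones recovers exactly the per-sample derivatives in a single backward pass. A quick sanity check against the full Jacobian:

```python
import torch
from torch.autograd import grad

x = torch.tensor([[1.], [2.], [3.]], requires_grad=True)  # shape (3, 1)
y = x**2  # shape (3, 1)

# Vector-Jacobian product with ones: because the Jacobian is diagonal,
# this yields each dy[i]/dx[i] in one pass.
(y_x,) = grad(y, x, grad_outputs=torch.ones_like(y), create_graph=True)

# Cross-check: compute the full Jacobian (shape (3, 1, 3, 1)) and read
# off its diagonal entries.
full_jac = torch.autograd.functional.jacobian(lambda t: t**2, x.detach())
diag = torch.stack([full_jac[i, 0, i, 0] for i in range(len(x))])
```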

What you just proposed gives me this error as soon as I call grad a second time on the same graph:

RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.

I would have used the grad_outputs parameter to avoid the loop:

x = torch.tensor([[1.], [1.]], requires_grad=True) 
y = x**2  # shape (2, 1)
y_x = grad(y, x, grad_outputs=torch.ones_like(y)) 

"""
tensor([[2.], [2.]])
"""

y = x*torch.tensor([[3.], [2.]]) 
y_x = grad(y, x, grad_outputs=torch.ones_like(y)) 
"""
tensor([[3.], [2.]])
"""

Yeah, I forgot to add create_graph=True, sorry about that. I've fixed it and written a cleaner solution.