Gradient of scalar output w.r.t. input in batches

I have a neural network with scalar output and I want to compute the gradient of the output with respect to the input. I know I can use torch.autograd.grad for this purpose, but it only works when the batch size is one and hence the output is a scalar tensor.

However, to speed things up, I want to work with mini-batches and compute the derivative of each y[i] (output) w.r.t. the corresponding x[i] (input). How can I achieve that?

Minimal example:

import torch
from torch.autograd import grad

# batch size = 1
x = torch.tensor([[1.]], requires_grad=True)  # shape (1, 1)
y = x**2  # shape (1, 1)
y_x = grad(y, x)  # this works because y is a scalar

# batch size = 2, first attempt
x = torch.tensor([[1.], [1.]], requires_grad=True)  # shape (2, 1)
y = x**2  # shape (2, 1)
y_x = grad(y, x)  # RuntimeError: grad can be implicitly created only for scalar outputs

# batch size = 2, second attempt
x = torch.tensor([[1.], [1.]], requires_grad=True)  # shape (2, 1)
y = x**2  # shape (2, 1)
y_x = []
for i in range(len(y)):
  y_x.append(grad(y[i], x[i]))  # RuntimeError: One of the differentiated Tensors appears to not have been used in the graph.
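The second attempt fails because x[i] creates a new view tensor, and y[i] was never computed from that view, so autograd reports it as unused. A minimal sketch of a loop that does work, assuming you differentiate each y[i] with respect to the whole x and then keep row i of the result. retain_graph=True is needed because the same graph is traversed once per sample. Since it calls grad once per sample, it is mainly useful as a reference to check faster versions against.

```python
import torch
from torch.autograd import grad

x = torch.tensor([[1.], [2.]], requires_grad=True)  # shape (2, 1)
y = x**2  # shape (2, 1)

y_x = []
for i in range(len(y)):
    # differentiate y[i] w.r.t. the full x, which *is* part of the graph ...
    g = grad(y[i], x, grad_outputs=torch.ones_like(y[i]), retain_graph=True)[0]  # shape (2, 1)
    # ... then keep only the row belonging to sample i
    y_x.append(g[i])
y_x = torch.stack(y_x)  # shape (2, 1): dy[i]/dx[i] for each i
print(y_x)  # 2 * x for each sample
```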

Nvm, this does the job:

x = torch.tensor([[1.], [1.]], requires_grad=True)  # shape (2, 1)
y = x**2  # shape (2, 1)
y_x = grad(y, x, grad_outputs=torch.ones_like(y), create_graph=True)
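Why this works: with grad_outputs=torch.ones_like(y), grad computes a vector-Jacobian product, i.e. the gradient of y.sum(), whose row j is the sum over i of dy[i]/dx[j]. That equals the per-sample derivative dy[j]/dx[j] only when each y[i] depends on x[i] alone, with no mixing across the batch (for example, no batch norm in training mode). The sketch below checks this against the diagonal of the full Jacobian; the helper names and the coupled function are illustrative assumptions, not part of the original code.

```python
import torch
from torch.autograd import grad
from torch.autograd.functional import jacobian

def square(t):
    return t**2  # y[i] depends only on x[i]

def coupled(t):
    return t * t.sum()  # every y[i] depends on the whole batch

def via_ones(f, x):
    # one backward pass: gradient of f(x).sum() w.r.t. x
    x = x.detach().requires_grad_(True)
    y = f(x)
    return grad(y, x, grad_outputs=torch.ones_like(y))[0]

def via_jacobian_diag(f, x):
    # full Jacobian has shape (B, 1, B, 1); entry [i, 0, j, 0] is dy[i]/dx[j]
    B = x.shape[0]
    J = jacobian(f, x).reshape(B, B)
    return J.diagonal().unsqueeze(1)  # dy[i]/dx[i], shape (B, 1)

x = torch.tensor([[1.], [2.]])
print(via_ones(square, x), via_jacobian_diag(square, x))    # identical: 2 * x
print(via_ones(coupled, x), via_jacobian_diag(coupled, x))  # different
```

For independent samples the single backward pass gives exactly the per-sample derivatives; for coupled samples it silently returns the summed quantity instead.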

What you just proposed gives me this error:

RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.
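This error appears when the same graph is differentiated a second time after the first grad or backward call has freed its saved tensors. A minimal sketch of the three cases, using a toy function rather than the original model: a plain second call fails, retain_graph=True keeps the saved tensors, and create_graph=True (which implies retain_graph=True) additionally records the gradient computation so the result can itself be differentiated.

```python
import torch
from torch.autograd import grad

x = torch.tensor([[1.], [2.]], requires_grad=True)

# 1) differentiating the same graph twice without retain_graph fails
y = x**2
grad(y, x, grad_outputs=torch.ones_like(y))  # frees the graph's saved tensors
try:
    grad(y, x, grad_outputs=torch.ones_like(y))
    second_call_failed = False
except RuntimeError:
    second_call_failed = True

# 2) retain_graph=True keeps the saved tensors, so a second call works
y = x**2
g1 = grad(y, x, grad_outputs=torch.ones_like(y), retain_graph=True)[0]
g2 = grad(y, x, grad_outputs=torch.ones_like(y))[0]

# 3) create_graph=True builds a graph for the gradient itself,
#    so it can be differentiated again (second derivative)
y = x**3
dy_dx = grad(y, x, grad_outputs=torch.ones_like(y), create_graph=True)[0]  # 3 * x**2
d2y_dx2 = grad(dy_dx, x, grad_outputs=torch.ones_like(dy_dx))[0]          # 6 * x
```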

I would have used the grad_outputs parameter to avoid the loop:

x = torch.tensor([[1.], [1.]], requires_grad=True)
y = x**2  # shape (2, 1)
y_x = grad(y, x, grad_outputs=torch.ones_like(y))[0]  # grad returns a tuple; take its first element
# tensor([[2.], [2.]])

y = x * torch.tensor([[3.], [2.]])
y_x = grad(y, x, grad_outputs=torch.ones_like(y))[0]
# tensor([[3.], [2.]])

Yeah, I forgot to pass create_graph=True (which also implies retain_graph=True), sorry about that. I fixed it and I’ve written a cleaner solution.

I am not sure this works for me. My input has shape (batch_size, *), my loss has shape (batch_size,), and I have a model net. I call autograd like this:

grad_list = torch.autograd.grad(loss_val, inputs=[p for p in net.parameters() if p.requires_grad], grad_outputs=torch.ones_like(loss_val), create_graph=True)

However, grad_list seems to sum the gradients over all elements of the minibatch: none of its tensors has a batch_size dimension.
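The summing is expected: every sample's loss depends on the same parameters, so with grad_outputs=torch.ones_like(loss_val) autograd returns the gradient of loss_val.sum(), which is the sum over samples of d loss[i] / d theta and has the shape of theta. To keep one gradient per sample, one option (again assuming PyTorch 2.0+ and its torch.func module) is to write the loss of a single sample as a function of the parameters and vmap its gradient over the batch. In the sketch below, net, the squared-error loss, and the targets T are hypothetical stand-ins for your own model and loss.

```python
import torch
import torch.nn as nn
from torch.func import functional_call, grad, vmap

torch.manual_seed(0)
# hypothetical model and data: 5 samples, 3 features, scalar regression target
net = nn.Sequential(nn.Linear(3, 4), nn.Tanh(), nn.Linear(4, 1))
X = torch.randn(5, 3)
T = torch.randn(5)

# parameters as a plain dict so they can be passed in explicitly
params = {name: p.detach() for name, p in net.named_parameters()}

def loss_one(params, xi, ti):
    # evaluate net with the given params on a "batch" holding only sample xi
    out = functional_call(net, params, (xi.unsqueeze(0),))  # shape (1, 1)
    return (out.squeeze() - ti) ** 2  # 0-dim loss for this sample

# grad w.r.t. the first argument (params); vmap over samples, sharing params
per_sample_grads = vmap(grad(loss_one), in_dims=(None, 0, 0))(params, X, T)
for name, g in per_sample_grads.items():
    print(name, tuple(g.shape))  # leading dimension 5 = batch size
```

Summing each entry of per_sample_grads over its first dimension should reproduce the grad_list obtained with grad_outputs=torch.ones_like(loss_val).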