Autograd with respect to input?

This is my network:

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, device, channels=[2, 10, 10, 10, 10, 10, 1]):
        super(Net, self).__init__()
        weights = []
        biases = []
        gammas = []
        N = len(channels)
        self.layers = N - 1
        for i in range(self.layers):
            # one weight, bias and gamma parameter per layer
            # (the exact initialization was lost from the original post)
            weights.append(nn.Parameter(torch.randn(channels[i], channels[i + 1]).to(device)))
            biases.append(nn.Parameter(torch.zeros(1, channels[i + 1]).to(device)))
            gammas.append(nn.Parameter(torch.ones(1).to(device)))
        self.weights = weights
        self.biases = biases
        self.gammas = gammas
        self.activation = torch.tanh

    def forward(self, inp):
        for i in range(self.layers - 1):
            x_1 = torch.matmul(inp, self.weights[i])
            x_2 = x_1 + self.biases[i]
            x_2 = x_2 * self.gammas[i]
            inp = self.activation(x_2)
        x_1 = torch.matmul(inp, self.weights[self.layers - 1]) + self.biases[self.layers - 1]
        out = x_1 * self.gammas[self.layers - 1]
        return out

Now when I try to use autograd on the input:

out_inp = torch.autograd.grad(out, inp, create_graph=True)

I get two errors.

  1. If I use a batch size greater than one, say a 4x2 matrix as inp (4 samples used as inputs), I get the following error:
RuntimeError: grad can be implicitly created only for scalar outputs

However, for such a network these are 4 different samples, so it should be possible to compute 4 different gradients. Is there a way to get this effect?

  2. If I use a batch size of 1, I get the following error:
RuntimeError: One of the differentiated Tensors does not require grad

So, I guess I have to set requires_grad=True for inp. However, I cannot do that, because it is a tensor object that I use as input. Is there any way I can do this?

Probably I should elaborate: when I try to set requires_grad to True, I get the following error:

TypeError: as_tensor() got an unexpected keyword argument 'requires_grad'

Thanks in advance!

  1. If you only need gradients w.r.t. the inputs (and don’t have things like batch norm that mix samples), summing the outputs and then calling .backward() will get you the gradients. For weights it is trickier to avoid gradients being accumulated.
  2. Calling inp.requires_grad_() before passing it into the network will tell PyTorch that you will ask to differentiate by inp.
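Putting both points together, a minimal sketch (using a plain nn.Linear as a stand-in for the network above):

    import torch

    # Stand-in for the network in the question; any differentiable module works.
    net = torch.nn.Linear(2, 1)

    inp = torch.randn(4, 2)   # batch of 4 samples, 2 features each
    inp.requires_grad_()      # tell autograd to track gradients w.r.t. inp

    out = net(inp)            # shape (4, 1)

    # Summing turns the batched output into a scalar. Because each sample only
    # influences its own output, the rows of the result are exactly the 4
    # per-sample gradients.
    grads, = torch.autograd.grad(out.sum(), inp, create_graph=True)

    print(grads.shape)  # torch.Size([4, 2])

Using torch.autograd.grad here (instead of .backward()) returns the gradients directly without writing anything into .grad attributes.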

Best regards


Hi @tom, sorry for reviving such an old thread, but I was wondering whether there are any examples of how to use autograd to get the gradient w.r.t. the weights (for all samples) without the gradients being accumulated?

Some time ago, GitHub - cybertronai/autograd-lib had something for this.
The new way is supposed to be with vmap, but it seems it is not quite there yet [feature request] Simple and Efficient way to get gradients of each element of a sum · Issue #7786 · pytorch/pytorch · GitHub .
I did it manually when I needed it.
I also want to write a source-to-source differentiation thing for TorchScript if I ever find the time.
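For reference, the manual route looks roughly like this (a sketch of the loop-over-samples approach, not the autograd-lib or vmap machinery): run each sample through separately and call torch.autograd.grad once per sample, so nothing accumulates in .grad.

    import torch

    net = torch.nn.Linear(2, 1)
    inp = torch.randn(4, 2)
    params = tuple(net.parameters())

    per_sample_grads = []
    for i in range(inp.shape[0]):
        out_i = net(inp[i:i + 1]).sum()
        # autograd.grad returns fresh tensors instead of accumulating into .grad
        g = torch.autograd.grad(out_i, params)
        per_sample_grads.append(g)

    # per_sample_grads[i] holds (d out_i / d weight, d out_i / d bias) for sample i

This is O(batch size) backward passes, which is exactly the inefficiency the vmap feature request linked above is about.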

Best regards

