# Autograd with respect to input?

This is my network:

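```python
import torch
import torch.nn as nn


class Net(nn.Module):
    def __init__(self, device, channels=[2, 10, 10, 10, 10, 10, 1]):
        super(Net, self).__init__()
        weights = []
        biases = []
        gammas = []

        N = len(channels)

        self.layers = N - 1

        for i in range(self.layers):
            # Create each tensor on the target device before wrapping it, so the
            # nn.Parameter itself is the leaf tensor (calling .to(device) on a
            # parameter can return a plain non-leaf tensor instead).
            weights.append(nn.Parameter(torch.randn((channels[i], channels[i + 1]), device=device)))
            biases.append(nn.Parameter(torch.randn((channels[i + 1]), device=device)))
            gammas.append(nn.Parameter(torch.randn((channels[i + 1]), device=device)))

        # nn.ParameterList registers the entries with the module; a plain Python
        # list would hide them from net.parameters() and from optimizers.
        self.weights = nn.ParameterList(weights)
        self.biases = nn.ParameterList(biases)
        self.gammas = nn.ParameterList(gammas)

        self.nl = torch.tanh

    def forward(self, inp):
        # Hidden layers: affine map, per-channel scaling, then tanh.
        for i in range(self.layers - 1):
            x_1 = torch.mm(inp, self.weights[i])
            x_2 = x_1 + self.biases[i]
            x_2 = x_2 * self.gammas[i]
            inp = self.nl(x_2)

        # Output layer: same affine map and scaling, without the nonlinearity.
        x_1 = torch.mm(inp, self.weights[self.layers - 1]) + self.biases[self.layers - 1]
        out = x_1 * self.gammas[self.layers - 1]

        return out
```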

Now when I try to use autograd on the input:

```python
out = net(inp)
```

I get two errors.

1. If I pass more than one sample in the batch, say a 4x2 matrix as `inp` (4 samples with 2 features each), I get the following error:

```
RuntimeError: grad can be implicitly created only for scalar outputs
```

However, for such a network these are 4 different samples so it should be able to compute 4 different gradients. Is there a way to get this effect?

2. If I use a batch size of 1, I get the following error:

```
RuntimeError: One of the differentiated Tensors does not require grad
```

So, I guess I have to set `requires_grad` to True for `inp`. However, I cannot do that because it is a tensor object that I use as input. Is there any way I can do this?

Probably I should elaborate. When I try to make requires_grad true I get the following error:

```
TypeError: as_tensor() got an unexpected keyword argument 'requires_grad'
```

1. If you only need the gradients w.r.t. the inputs (and don’t have bad things like batch norm, which mix samples within a batch), summing the outputs and then calling `.backward()` will get you the per-sample gradients. For the weights it is trickier, because the gradients of all samples are accumulated into the same `.grad`.
2. Calling `inp.requires_grad_()` before passing `inp` into the network tells PyTorch that you will ask to differentiate with respect to `inp`.
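
The two points above can be sketched together as follows. This is only a minimal illustration: the small `nn.Sequential` model is a hypothetical stand-in for the network in the question, and it assumes there is no layer (such as batch norm) that couples samples within a batch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the network in the question. It has no batch norm,
# so each output row depends only on the matching input row.
net = nn.Sequential(nn.Linear(2, 10), nn.Tanh(), nn.Linear(10, 1))

inp = torch.randn(4, 2)  # batch of 4 samples with 2 features each
inp.requires_grad_()     # in-place: mark inp as a tensor to differentiate by

out = net(inp)           # shape (4, 1), not a scalar

# Summing produces a scalar, so backward() needs no explicit grad argument.
# Because the samples are independent, row i of inp.grad is the gradient of
# out[i] with respect to inp[i].
out.sum().backward()
grad_inp = inp.grad      # shape (4, 2)
```

Note that `backward()` also accumulates into the `.grad` of the network parameters; `torch.autograd.grad(out.sum(), inp)` is an alternative that returns only the input gradient.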

Best regards

Thomas

Hi @tom, sorry for reviving such an old thread, but I was wondering whether there are any examples of how to use autograd to get the gradient w.r.t. the weights for each sample separately, without the gradients being accumulated?

Some time ago, cybertronai/autograd-lib on GitHub had something for this.
The new way is supposed to be with vmap, but it seems it is not quite there yet (see pytorch/pytorch issue #7786, "[feature request] Simple and Efficient way to get gradients of each element of a sum").
I did it manually when I needed it.
I also want to write a source-to-source differentiation thing for TorchScript if I ever find the time.
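
For illustration, one way the manual route could look is a plain loop with one `torch.autograd.grad` call per sample. This is a sketch under assumptions, not necessarily what was meant above: the small model is a hypothetical stand-in, and the loop is simple rather than efficient.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical small model standing in for the network in this thread.
net = nn.Sequential(nn.Linear(2, 10), nn.Tanh(), nn.Linear(10, 1))
params = list(net.parameters())

inp = torch.randn(4, 2)
out = net(inp)  # shape (4, 1)

# One torch.autograd.grad call per sample. The call returns the gradients
# instead of accumulating them into .grad, so the samples stay separate.
per_sample = []
for i in range(out.shape[0]):
    grads = torch.autograd.grad(out[i].sum(), params, retain_graph=True)
    per_sample.append(grads)

# Regroup by parameter: per_sample_grads[k][i] is the gradient of out[i]
# with respect to params[k].
per_sample_grads = [torch.stack(g) for g in zip(*per_sample)]
```

Summing each entry of `per_sample_grads` over its first dimension should reproduce the usual accumulated gradient of `out.sum()`, which is a convenient sanity check.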

Best regards

Thomas

