How to include the gradient of a NN w.r.t. the input in the loss function?

Hello all,

I want to include the gradient of the NN output w.r.t. the input in the loss function. Here is my code:

....
dydx  = torch.zeros(Nminibatch,Nin*2)
Y_net = torch.zeros(Nminibatch,Nout)
optimizer = optim.Adam(net.parameters(), lr=LR)

for epoch in range(Nepochs):
  print('starting epoch ' + str(epoch) + ', Learning rate = ' + str(LR))
  for batch_idx, (X, Y) in enumerate(loader):
    X.requires_grad = True
    Y_net = net(X)
    # loop over the minibatch: per-sample gradient of the first output w.r.t. the input
    for idx in range(Y_net.size()[0]):
      dydx[idx,:] = torch.autograd.grad(Y_net[idx,0],X,create_graph=True)[0][idx,:]
    loss = loss_f(Y_net, Y) + (dydx[:,torch.arange(0,Nin)]).sum()
    optimizer.zero_grad()
    loss.backward(retain_graph=True)
    optimizer.step()

But I got this error message:

Traceback (most recent call last):
  File "<stdin>", line 14, in <module>
  File "/home/truhlard/ning0035/anaconda3/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/truhlard/ning0035/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 132, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [50, 50]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Does anyone know how to fix this?

Thanks a lot!

Xiaodong

I guess the error is raised by using retain_graph=True and trying to update the parameters multiple times, which would be wrong, since the gradients would be stale.
Could you explain why you are using this argument?
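
As a small illustration (a toy example of my own, not your code): after optimizer.step() has modified the weights in place, a second backward through a graph that still references the old weights fails with exactly this kind of version error.

import torch

lin = torch.nn.Linear(2, 2)
opt = torch.optim.SGD(lin.parameters(), lr=0.1)

x = torch.randn(4, 2, requires_grad=True)
out = lin(x).sum()

out.backward(retain_graph=True)   # first backward works
opt.step()                        # updates lin.weight in place (bumps its version)
out.backward()                    # RuntimeError: a variable needed for gradient
                                  # computation has been modified by an inplace operation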


Thanks for your reply! I think you are right. In my code, the parameters inside the mini-batch loop could be updated multiple times:

for idx in range(Y_net.size()[0]):
  dydx[idx,:] = torch.autograd.grad(Y_net[idx,0],X,create_graph=True)[0][idx,:]

I have changed the code by defining another tensor loss_g, and it can run smoothly now.
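
Roughly, the fix builds the gradient term in a fresh tensor inside every batch instead of writing into the pre-allocated dydx, so no graph from a previous iteration is reused. A sketch (reusing the names from my first post; the exact code may differ):

for batch_idx, (X, Y) in enumerate(loader):
  X.requires_grad = True
  Y_net = net(X)
  # accumulate the gradient term in a fresh tensor each batch
  loss_g = torch.zeros(())
  for idx in range(Y_net.size(0)):
    g = torch.autograd.grad(Y_net[idx,0], X, create_graph=True)[0][idx,:]
    loss_g = loss_g + g[:Nin].sum()
  loss = loss_f(Y_net, Y) + loss_g
  optimizer.zero_grad()
  loss.backward()   # no retain_graph: a new graph is built every batch
  optimizer.step()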

The reason for this implementation is that I want to make sure the gradient of the NN output w.r.t. the input is similar to some known values. I used (dydx[:,torch.arange(0,Nin)]).sum() here just for simplicity.
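
For the actual gradient-matching term (just a sketch; dydx_target is a hypothetical tensor holding the known gradient values, shape (Nminibatch, Nin)), the accumulation line in the sketch above would become something like:

# hypothetical known target gradients; replaces g[:Nin].sum() in the sketch above
loss_g = loss_g + ((g[:Nin] - dydx_target[idx,:])**2).mean()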

Xiaodong