How to include the gradient of a NN w.r.t. the input in the loss function?

Hello all,

I want to include the gradient of the NN output w.r.t. the input in the loss function. Here is my code:

....
dydx  = torch.zeros(Nminibatch,Nin*2)
Y_net = torch.zeros(Nminibatch,Nout)
optimizer = optim.Adam(net.parameters(), lr=LR)

for epoch in range(Nepochs):
  print('starting epoch ' + str(epoch) + ', Learning rate = ' + str(LR))
  for batch_idx, (X, Y) in enumerate(loader):
    X.requires_grad = True
    Y_net = net(X)
    # loop over the minibatch: per-sample gradient of the first output w.r.t. the input
    for idx in range(Y_net.size()[0]):
      dydx[idx,:] = torch.autograd.grad(Y_net[idx,0],X,create_graph=True)[0][idx,:]
    loss = loss_f(Y_net, Y) + (dydx[:,torch.arange(0,Nin)]).sum()
    optimizer.zero_grad()
    loss.backward(retain_graph=True)
    optimizer.step()

But I got this error message:

Traceback (most recent call last):
  File "<stdin>", line 14, in <module>
  File "/home/truhlard/ning0035/anaconda3/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/truhlard/ning0035/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 132, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [50, 50]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Does anyone know how to fix this?

Thanks a lot!

Xiaodong

I guess the error is raised by using retain_graph=True and trying to update the parameters multiple times, which would be wrong, since the gradients would be stale.
Could you explain why you are using this argument?
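
As a small illustration (a toy example of my own, not your code): after optimizer.step() has modified the weights in place, a second backward through a graph that still references the old weights fails with exactly this kind of version error.

import torch

lin = torch.nn.Linear(2, 2)
opt = torch.optim.SGD(lin.parameters(), lr=0.1)

x = torch.randn(4, 2, requires_grad=True)
out = lin(x).sum()

out.backward(retain_graph=True)   # first backward works
opt.step()                        # updates lin.weight in place (bumps its version)
out.backward()                    # RuntimeError: a variable needed for gradient
                                  # computation has been modified by an inplace operation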


Thanks for your reply! I think you are right. In my code, the parameters inside the mini-batch loop could be updated multiple times:

for idx in range(Y_net.size()[0]):
  dydx[idx,:] = torch.autograd.grad(Y_net[idx,0],X,create_graph=True)[0][idx,:]

I have changed the code by defining another tensor loss_g, and it can run smoothly now.
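
Roughly, the fix builds the gradient term in a fresh tensor inside every batch instead of writing into the pre-allocated dydx, so no graph from a previous iteration is reused. A sketch (reusing the names from my first post; the exact code may differ):

for batch_idx, (X, Y) in enumerate(loader):
  X.requires_grad = True
  Y_net = net(X)
  # accumulate the gradient term in a fresh tensor each batch
  loss_g = torch.zeros(())
  for idx in range(Y_net.size(0)):
    g = torch.autograd.grad(Y_net[idx,0], X, create_graph=True)[0][idx,:]
    loss_g = loss_g + g[:Nin].sum()
  loss = loss_f(Y_net, Y) + loss_g
  optimizer.zero_grad()
  loss.backward()   # no retain_graph: a new graph is built every batch
  optimizer.step()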

The reason for this implementation is that I want to make sure the gradient of the NN output w.r.t. the input is similar to some known values. I used (dydx[:,torch.arange(0,Nin)]).sum() here just for simplicity.
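
For the actual gradient-matching term (just a sketch; dydx_target is a hypothetical tensor holding the known gradient values, shape (Nminibatch, Nin)), the accumulation line in the sketch above would become something like:

# hypothetical known target gradients; replaces g[:Nin].sum() in the sketch above
loss_g = loss_g + ((g[:Nin] - dydx_target[idx,:])**2).mean()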

Xiaodong