How to get around the in-place operation error when indexing a leaf variable for a gradient update?

sceamms · March 7, 2018, 8:02pm

Hello all, I am encountering an in-place operation error when I try to index a leaf variable to update gradients with a customized shrink function. I cannot work around it. Any help is highly appreciated!

import torch.nn as nn
import torch
import numpy as np
from torch.autograd import Variable, Function

# hyper parameters
batch_size = 100 # batch size of images
ld = 0.2 # sparse penalty
lr = 0.1 # learning rate

x = Variable(torch.from_numpy(np.random.normal(0,1,(batch_size,10,10))), requires_grad=False)  # original

# depends on size of the dictionary, number of atoms.
D = Variable(torch.from_numpy(np.random.normal(0,1,(500,10,10))), requires_grad=True)

# hx sparse representation
ht = Variable(torch.from_numpy(np.random.normal(0,1,(batch_size,500,1,1))), requires_grad=True)

ht_ori = ht

# Dictionary loss function
loss = nn.MSELoss()

# customized shrink function to update gradient
shrink_ht = lambda x: torch.stack([torch.sign(i)*torch.max(torch.abs(i)-lr*ld,0)[0] for i in x])

### sparse representation optimizer_ht single image.
# optimizer_ht = torch.optim.SGD([ht,D], lr=lr, momentum=0.9)
optimizer_ht = torch.optim.SGD([ht], lr=lr, momentum=0.9) # optimizer for sparse representation
#optimizer_ht.zero_grad() # clear up gradients

## update from reconstruction
#loss_ht = 0.5*torch.norm((x-(D*ht).sum(dim=0)),p=2)**2
#loss_ht.backward() # back propagation and calculate gradients
#optimizer_ht.step() # update parameters with gradients

## update for the batch
for idx in range(len(x)):
    optimizer_ht.zero_grad() # clear up gradients
    loss_ht = 0.5*torch.norm((x[idx]-(D*ht[idx]).sum(dim=0)),p=2)**2
    loss_ht.backward() # back propagation and calculate gradients
    optimizer_ht.step() # update parameters with gradients
    ht[idx] = shrink_ht(ht[idx])  # customized shrink function.

RuntimeError Traceback (most recent call last)
in ()
15 loss_ht.backward() # back propagation and calculate gradients
16 optimizer_ht.step() # update parameters with gradients
---> 17 ht[idx] = shrink_ht(ht[idx]) # customized shrink function.
18
19

/home/miniconda3/lib/python3.6/site-packages/torch/autograd/variable.py in __setitem__(self, key, value)
85 return MaskedFill.apply(self, key, value, True)
86 else:
---> 87 return SetItem.apply(self, key, value)
88
89 def __deepcopy__(self, memo):

RuntimeError: a leaf Variable that requires grad has been used in an in-place operation.

Specifically, the line below seems to raise the error because it indexes and updates the leaf variable at the same time.

 ht[idx] = shrink_ht(ht[idx])  # customized shrink function.

Could you help? Thanks!

W.S.

jpeg729 · March 7, 2018, 10:55pm

You could sidestep the issue by modifying the underlying tensor directly.

ht.data[idx] = shrink_ht(ht[idx])

Normally, in-place modification of the underlying tensor data is a bad idea because such operations aren’t recorded in the computation graph, and backpropagation needs a coherent graph. But this line of code is a parameter update. The built-in optimizers update parameters by in-place modification of the parameter’s underlying tensor, so in this case you can too.
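For readers on PyTorch 0.4 or later, where Variable has been merged into Tensor, the same kind of parameter update is usually written under torch.no_grad() rather than through .data. The following is only a sketch under that assumption, not the code from this thread: the sizes are reduced to keep it quick, and soft_threshold is a hypothetical helper standing in for shrink_ht. It clamps at zero with torch.clamp, whereas torch.max(t, 0)[0] in the question reduces along dimension 0.

```python
import torch

torch.manual_seed(0)

# Smaller shapes than in the question, chosen only to keep the sketch fast.
batch_size, n_atoms = 4, 50
ld = 0.2  # sparsity penalty
lr = 0.1  # learning rate

x = torch.randn(batch_size, 10, 10)
D = torch.randn(n_atoms, 10, 10)
ht = torch.randn(batch_size, n_atoms, 1, 1, requires_grad=True)


def soft_threshold(t, threshold):
    # Hypothetical stand-in for shrink_ht: sign(t) * max(|t| - threshold, 0).
    return torch.sign(t) * torch.clamp(torch.abs(t) - threshold, min=0)


optimizer_ht = torch.optim.SGD([ht], lr=lr, momentum=0.9)

for idx in range(len(x)):
    optimizer_ht.zero_grad()
    loss_ht = 0.5 * torch.norm(x[idx] - (D * ht[idx]).sum(dim=0), p=2) ** 2
    loss_ht.backward()
    optimizer_ht.step()
    # The shrink step is a parameter update, so autograd tracking is switched
    # off while the leaf tensor is modified in place.
    with torch.no_grad():
        ht[idx] = soft_threshold(ht[idx], lr * ld)
```

After the loop, ht is still a leaf tensor with requires_grad=True, so the optimizer can keep using it in later iterations.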

sceamms · March 8, 2018, 11:49am

@jpeg729 Thank you! I passed shrink_ht(ht[idx].data) as the input to make it work, but the gradients do not seem to be updated. Please see my reply below.

sceamms · March 8, 2018, 12:21pm

Sorry, for some reason ht is not updated when I compare the start and the end of one cycle. Nothing has changed.

End result after one cycle:

ht[1]
Variable containing:
( 0 ,.,.) = 
   17.8900

( 1 ,.,.) = 
   87.1190

Compared to the beginning, there is no change. Do you know why?

ht_ori[1]
Variable containing:
( 0 ,.,.) = 
   17.8900

( 1 ,.,.) = 
   87.1190

jpeg729 · March 8, 2018, 12:36pm

ht_ori is not a copy of the original ht; it references the same memory storage as ht.

You should have done

ht_ori = ht.clone()
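The difference between a second name for the same tensor and a real copy can be seen in a small sketch. It assumes PyTorch 0.4 or later; the names alias and snapshot are made up for illustration, and detach() is added here only so that the saved copy stays outside the autograd graph.

```python
import torch

ht = torch.ones(3, requires_grad=True)

alias = ht                      # second name for the same tensor and storage
snapshot = ht.detach().clone()  # independent copy of the current values

with torch.no_grad():
    ht[0] = 5.0                 # in-place parameter update

print(alias is ht)                # True
print(torch.equal(alias, ht))     # True: the alias sees the update
print(torch.equal(snapshot, ht))  # False: the snapshot kept the old values
```

Comparing ht against alias will therefore always report no change, which matches what was observed with ht_ori.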

sceamms · March 8, 2018, 1:01pm

You are right. Thank you! I had only made a reference, not a copy of the original.