RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation (new to this)

Hi! I'm trying to build an adaptive equalizer out of 4 FIR filters implemented as 4 Conv1d modules.
The class of the MIMO filter is:

import torch
import torch.nn as nn

class MIMO(nn.Module):
    def __init__(self, L, oversampling):
        super().__init__()
        self.L = L
        self.oversampling = oversampling
        # 2x2 butterfly of complex FIR filters; the direct paths (fir11, fir22)
        # start as identity taps, the cross paths (fir12, fir21) start at zero
        self.fir11 = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=L, bias=False, dtype=torch.complex128)
        self.fir11.weight.data.zero_()
        self.fir11.weight.data[0, 0, 0] = 1
        self.fir12 = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=L, bias=False, dtype=torch.complex128)
        self.fir12.weight.data.zero_()
        self.fir22 = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=L, bias=False, dtype=torch.complex128)
        self.fir22.weight.data.zero_()
        self.fir22.weight.data[0, 0, 0] = 1
        self.fir21 = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=L, bias=False, dtype=torch.complex128)
        self.fir21.weight.data.zero_()

    def forward(self, x):
        L = self.L

        # last L samples of each input channel, shaped (batch=1, channels=1, L)
        x1 = x[0][-L:].view(1, 1, -1)
        x2 = x[1][-L:].view(1, 1, -1)

        # each output mixes both inputs
        y1 = self.fir11(x1) + self.fir12(x2)
        y2 = self.fir21(x1) + self.fir22(x2)

        return y1.view(-1), y2.view(-1)
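A quick shape check (a sketch; oversampling = 4 as in the training code below) shows that feeding a window of L samples per channel produces one equalized sample per channel:

oversampling = 4
mimo = MIMO(15 * oversampling, oversampling)
x = torch.randn(2, 15 * oversampling, dtype=torch.complex128)  # two channels, L samples each
y1, y2 = mimo(x)
print(y1.shape, y2.shape)  # torch.Size([1]) torch.Size([1]): one output sample per channel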

My training code is the following:

Z2 = torch.from_numpy(Z2)  # input data, two channels: Z2[0] and Z2[1]

# creating the MIMO layer
mimo = MIMO(15 * oversampling, 4)

Output = torch.tensor([[], []])  # accumulates the outputs of all iterations

optimizer = torch.optim.SGD(mimo.parameters(), lr=1e-4)

# output = torch.vstack((y1, y2))
# Output = torch.hstack((Output, output))
transient = 15 * oversampling
loss_save = []
torch.autograd.set_detect_anomaly(True)

for n in range(transient, 1_000_000):
    optimizer.zero_grad()
    input1 = Z2[0][n - transient:n]
    input2 = Z2[1][n - transient:n]

    input = torch.vstack([input1, input2])

    y1, y2 = mimo(input)
    output = torch.vstack((y1, y2))
    Output = torch.hstack((Output, output))

    loss = torch.sum(torch.square(torch.abs(Output[0]) - 1))  # checked
    loss.backward(retain_graph=True)
    optimizer.step()
    loss_save.append(loss.detach().numpy())
    print('=> epoch: {}, loss: {}'.format(n - transient, loss.item()))

    if n == 5000:
        exit(1)

The purpose of the adaptive equalizer is to update the four filters' weights while time-series data is passed through it, and I try to update the weights after every sample of the input data.

However, when I try to train the network, I get the error below, and even after backtracing where it comes from I can't figure out what the problem is (I think I should use the clone() method or something).

The backtrace is:

y1, y2 = mimo(input)
y1 = self.fir11(x1) + self.fir12(x2)
return self._conv_forward(input, self.weight, self.bias)
The full error is: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [1, 1, 60]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
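From what I understand of the message, the failure pattern in isolation looks like this toy sketch (not my actual code): a tensor that an earlier backward pass saved gets modified in place before the graph is used again.

w = torch.ones(3, requires_grad=True)
y = (w ** 2).sum()            # pow saves w for its backward pass
y.backward(retain_graph=True)
with torch.no_grad():
    w += 1                    # in-place update (like optimizer.step()), bumps w's version counter
y.backward()                  # RuntimeError: ... modified by an inplace operation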

These issues are often caused by using retain_graph=True when it's not needed.
Could you explain why you are using this argument?
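If the weights should be updated from the current sample only, the usual pattern is to build a fresh graph every iteration and compute the loss from the current output alone, e.g. (a sketch reusing the variables from your posted loop; Output is kept purely for logging):

for n in range(transient, 1_000_000):
    optimizer.zero_grad()
    input = torch.vstack([Z2[0][n - transient:n], Z2[1][n - transient:n]])

    y1, y2 = mimo(input)

    # loss over the current sample only: no graph from earlier iterations is involved
    loss = torch.sum(torch.square(torch.abs(y1) - 1))
    loss.backward()       # no retain_graph needed
    optimizer.step()

    # store for plotting only, detached from the graph
    Output = torch.hstack((Output, torch.vstack((y1, y2)).detach()))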


I previously obtained this error:
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.

I specified retain_graph=True to solve that runtime error.
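As far as I can tell, that first error comes from calling backward twice through the same graph after its buffers were freed; a toy example of the pattern (not my actual code):

x = torch.ones(3, requires_grad=True)
y = (x * x).sum()
y.backward()    # frees the graph's saved tensors
y.backward()    # RuntimeError: Trying to backward through the graph a second time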

This error is raised if you are keeping the computation graph from a previous iteration and are thus calling backward on the first computation graph again.
Based on your code I would guess it's caused by accumulating the outputs in Output and computing the loss over the whole tensor: every backward call then also goes through the graphs of all previous iterations, whose weights optimizer.step() has since modified in place. You could try to detach() the accumulated tensor in each iteration.
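For example, detaching the accumulated tensor before appending the new output should cut the link to the old graphs (an untested sketch of your loop body):

y1, y2 = mimo(input)
output = torch.vstack((y1, y2))
Output = torch.hstack((Output.detach(), output))  # only the current output keeps a graph

loss = torch.sum(torch.square(torch.abs(Output[0]) - 1))
loss.backward()   # retain_graph no longer needed
optimizer.step()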
