[Solved][Pytorch1.5] RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

Hi,
I’m facing this issue when I call backward() with two models, action_model and value_model. I’ve already searched related topics: they say that PyTorch 1.5 automatically checks for in-place modifications during backward(). However, it still reports the same error. How can I call backward() without losing the previous parameters in the model?

Thanks~~

This is error message:

File "train_test.py", line 234, in <module>
    flatten_imaginated_gru_hiddens).view(imagination_horizon+1, -1)
  File "/home/jeff/VirtualEnv/openai/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jeff/dockers/pick_and_place/models.py", line 168, in forward
    state_value = self.fc4(hidden)
  File "/home/jeff/VirtualEnv/openai/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jeff/VirtualEnv/openai/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/jeff/VirtualEnv/openai/lib/python3.7/site-packages/torch/nn/functional.py", line 1610, in linear
    ret = torch.addmm(bias, input, weight.t())
 (print_stack at /pytorch/torch/csrc/autograd/python_anomaly_mode.cpp:60)
Traceback (most recent call last):
  File "train_test.py", line 255, in <module>
    action_loss.backward()
  File "/home/jeff/VirtualEnv/openai/lib/python3.7/site-packages/torch/tensor.py", line 198, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/jeff/VirtualEnv/openai/lib/python3.7/site-packages/torch/autograd/__init__.py", line 100, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [400, 1]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
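For reference, the forward-call trace in the top half of this output comes from autograd’s anomaly detection, which can be enabled to print the forward op whose saved tensors were later modified:

import torch

torch.autograd.set_detect_anomaly(True)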

The problem comes from action_model and value_model when I call backward():

imaginated_values = value_model(flatten_imaginated_states,
                                flatten_imaginated_gru_hiddens).view(imagination_horizon+1, -1)

## compute lambda target
lambda_target_values = utils.lambda_target(imaginated_rewards, imaginated_values,
                                           gamma, lambda_)

## update value model
value_loss = 0.5 * mse_loss(imaginated_values, lambda_target_values.detach())
value_optimizer.zero_grad()
value_loss.backward(retain_graph=True)
#--------------------------------------------------------------
# retain_graph=True :
# keep the graph alive after backpropagation, since a second
# backward pass through the same forward computation follows
#--------------------------------------------------------------
clip_grad_norm_(value_model.parameters(), clip_grad_norm)
value_optimizer.step()

## update action model (multiply by -1 for gradient ascent)
action_loss = -1 * (lambda_target_values.mean())
action_optimizer.zero_grad()
action_loss.backward()
clip_grad_norm_(action_model.parameters(), clip_grad_norm)
action_optimizer.step()

Hi,

This is most likely happening because value_optimizer.step() actually modifies the weights of the model inplace while the original value of these weights is needed to compute action_loss.backward().
Is that the issue?
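For what it’s worth, a minimal sketch of that failure mode (a toy example, not the poster’s models): step() updates the weight in place, and the second backward() still needs its original value.

import torch

w = torch.nn.Parameter(torch.ones(2))
opt = torch.optim.SGD([w], lr=0.1)

loss1 = (w ** 2).sum()  # the graph saves w for the backward pass
loss2 = (w ** 2).sum()

loss1.backward(retain_graph=True)
opt.step()        # updates w in place and bumps its version counter

loss2.backward()  # RuntimeError: ... modified by an inplace operation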


Right. But when I downgrade my PyTorch version from 1.5.1 to 1.4.0, it works… I don’t know what’s going on. Even though I use retain_graph=True in backward(), it still reports the same issue.
Thanks.

Thanks albanD~
I found a way to avoid the issue by rearranging the code as follows:

value_loss = 0.5 * mse_loss(imaginated_values, lambda_target_values.detach())
value_optimizer.zero_grad()

action_loss = -1*(lambda_target_values.mean())
action_optimizer.zero_grad()

value_loss.backward(retain_graph=True)
action_loss.backward()

clip_grad_norm_(value_model.parameters(), clip_grad_norm)
clip_grad_norm_(action_model.parameters(), clip_grad_norm)

value_optimizer.step()
action_optimizer.step()

It no longer hits the issue of the modified weights. Thanks~
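As I understand it, the reordering works because both backward() calls read the graph’s saved tensors before any optimizer mutates the parameters in place. The same pattern in a minimal toy example:

import torch

w = torch.nn.Parameter(torch.ones(2))
opt = torch.optim.SGD([w], lr=0.1)

loss_a = (w ** 2).sum()
loss_b = (w ** 3).sum()

# both backward passes see w at its original version ...
loss_a.backward(retain_graph=True)
loss_b.backward()

# ... and only afterwards is w modified in place
opt.step()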


Before 1.5, these checks were not working properly for the optimizers. That’s why you didn’t see any error, but the computed gradients were not correct.


Hi albanD,

I think I’m running into a very similar problem.

I’m working on a Policy Gradient method. In this algorithm you typically use some memory you sample from; the memory contains, let’s say, the last 10 episodes the algorithm played. What you store are the log-probabilities of the actions it chose. In this case you actually reuse data points even though they have already been used for an update, knowing that their gradients are slightly wrong, since the model would predict a slightly different probability after the first update. To be mathematically correct you would need importance sampling (p_old/p_new), but since this fraction is close to 1 if you use a sufficiently small learning rate, it is typically dropped.

So some pseudo code would roughly look like this:

for episode in range(episodes_to_play):
    episode = play_episode()
    store_episode_in_memory(episode)
    drop_some_really_old_stuff_from_memory()
    log_probability, rewards = sample_some_memory()
    loss = REINFORCELOSS(log_probability, rewards)
    loss.backward(retain_graph=True)

I might have understood your explanation wrongly, but as far as I can tell this is not possible anymore? The memory contains episodes that have already been used for training, but are still “fresh” enough not to be too far from the real distribution.

So my two questions:

  1. Do I understand your explanation correctly, i.e. is calling loss.backward() repeatedly over the same stored graph the reason this throws errors, even if I use retain_graph=True?
  2. How can I fix this and get back to the old behavior?

Thank you very much for your time!

I initially thought this would solve the problem but unfortunately it didn’t. I will leave the comment though.

I think I have also verified that this happens due to this particular change; with PyTorch 1.4 it works perfectly fine.


I think I have found a solution, assuming my understanding of the problem is accurate: simply cloning the loss before calling .backward(). Is that an accurate solution, or did I miss something?

for episode in range(episodes_to_play):
    episode = play_episode()
    store_episode_in_memory(episode)
    drop_some_really_old_stuff_from_memory()
    log_probability, rewards = sample_some_memory()
    loss = REINFORCELOSS(log_probability, rewards).clone()
    loss.backward(retain_graph=True)
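Note that cloning the scalar loss does not change which intermediate tensors the graph has saved, which matches the observation above that it did not help. A pattern that sidesteps the stale graph entirely (a hedged sketch with hypothetical names, not the original code) is to store states and actions rather than graph-carrying log-probabilities, recompute the log-probabilities under the current policy at update time, and apply the importance weight explicitly:

import torch
import torch.nn as nn

policy = nn.Linear(4, 2)  # toy policy network
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

states = torch.randn(10, 4)  # replayed transitions
actions = torch.randint(0, 2, (10,))
returns = torch.randn(10)

with torch.no_grad():  # what the memory would hold
    old_log_probs = torch.distributions.Categorical(
        logits=policy(states)).log_prob(actions)

# recompute log-probs under the *current* policy: a fresh graph on every
# update, so no retain_graph and no stale saved tensors
new_log_probs = torch.distributions.Categorical(
    logits=policy(states)).log_prob(actions)
ratio = torch.exp(new_log_probs - old_log_probs)  # p_new / p_old
loss = -(ratio * returns).mean()                  # IS-weighted surrogate

optimizer.zero_grad()
loss.backward()
optimizer.step()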

Hi,
I implemented a triplet loss and trained the model with it.
However, when I pass the three inputs through the model, the same error occurs.
The code is shown below; the model is a ResNet or EfficientNet.
Whether I use PyTorch 1.5, 1.4, or 1.3, all of them fail, and I don’t know the reason.
Even after I changed the in-place operations in the ResNet code, the error still shows up while running.

def train(epoch):
    model.train()
    total_loss = 0.0
    total_size = 0
    for batch_idx, (data, target) in enumerate(train_loader):  # one batch
        if args.cuda:
            data[0], target[0] = data[0].cuda(), target[0].cuda()
            data[1], target[1] = data[1].cuda(), target[1].cuda()
            data[2], target[2] = data[2].cuda(), target[2].cuda()

        data[0], target[0] = Variable(data[0]), Variable(target[0])
        data[1], target[1] = Variable(data[1]), Variable(target[1])
        data[2], target[2] = Variable(data[2]), Variable(target[2])

        optimizer.zero_grad()

        anchor = model.forward(data[0])
        positive = model.forward(data[1])
        negative = model.forward(data[2])

        loss_cls_0 = criterion(anchor, target[0].long())
        loss_cls_1 = criterion(positive, target[1].long())
        loss_cls_2 = criterion(negative, target[2].long())
        loss_cls = loss_cls_0 + loss_cls_1 + loss_cls_2

        loss_tri = triplet_loss.forward(anchor, positive, negative)

        loss = loss_tri + loss_cls

        total_loss += loss.item()  # loss.data.cpu()[0] fails on 0-dim tensors in recent versions
        total_size += data[0].size(0)

        loss.backward()
        optimizer.step()

Thank you very very much!! I’ve also had this problem and tried many times but all failed until I saw your post. That was really helpful!

This is very helpful!
I would like to provide a tiny failing example, though I’m not fully convinced this gradient checking isn’t overkill.

import torch
import torch.nn as nn
import torch.nn.functional as F

enc = nn.Conv2d(3, 5, 3, 1, 1)
dec = nn.Conv2d(5, 3, 3, 1, 1)

e_opt = torch.optim.Adam(enc.parameters(), lr=1e-4)
g_opt = torch.optim.Adam(dec.parameters(), lr=1e-4)

N = 1
H = W = 8

for i in range(2):
    g_opt.zero_grad()
    e_opt.zero_grad()

    image = torch.randn(N, 3, H, W)
    z = enc(image)
    recon = dec(z)

    c1_loss = F.l1_loss(recon, image)

    # encoder
    c1_loss.backward(retain_graph=True)
    e_opt.step()

    # decoder
    g_opt.zero_grad()
    # must do this line:
    # c1_loss = F.l1_loss(dec(z.detach()), image)
    c1_loss.backward()
    g_opt.step()

It’s a simple encoder-decoder (placeholder) network. Here I have two separate optimizers for the encoder and the decoder, respectively, and both optimize c1_loss. I first backprop c1_loss and update the encoder; then I clear the gradients in the decoder and backprop c1_loss again. This triggers the same runtime error. However, shouldn’t the resulting gradients be correct in this case?

Of course I wrote some unnecessary steps for this simple example, but it is pretty common to optimize the encoder with c1_loss + enc_loss and the decoder with c1_loss + dec_loss.

The error is raised because the second c1_loss.backward() call would also try to calculate the gradients in enc using the already updated parameters.
Based on the code snippet it seems the second backward call is needed to calculate the gradients in dec only, which would then be updated in g_opt.step() (explained to me by @albanD :wink: ).
If so, then you could use this neat approach:

    c1_loss.backward(inputs=list(dec.parameters()))
    g_opt.step()

which would only accumulate the gradients into the passed leaf tensors.
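Put into the earlier toy example, the loop might look like this (my arrangement of the suggestion above, PyTorch >= 1.8):

import torch
import torch.nn as nn
import torch.nn.functional as F

enc = nn.Conv2d(3, 5, 3, 1, 1)
dec = nn.Conv2d(5, 3, 3, 1, 1)
e_opt = torch.optim.Adam(enc.parameters(), lr=1e-4)
g_opt = torch.optim.Adam(dec.parameters(), lr=1e-4)

for i in range(2):
    e_opt.zero_grad()
    image = torch.randn(1, 3, 8, 8)
    recon = dec(enc(image))
    c1_loss = F.l1_loss(recon, image)

    # encoder update, keeping the graph for the second backward
    c1_loss.backward(retain_graph=True)
    e_opt.step()

    # decoder update: gradients are accumulated only into dec's leaves,
    # so enc's already-updated weights are never differentiated through
    g_opt.zero_grad()
    c1_loss.backward(inputs=list(dec.parameters()))
    g_opt.step()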


I have been facing this problem for over 8 hours without being able to understand it.
Could anyone try to help?
This is my class performing the REINFORCE policy gradient:

def update_policy(self, rewards, log_probs, entropies, gamma):
    R = torch.zeros(1, 1)
    loss = 0
    for i in reversed(range(len(rewards))):
        R = gamma * R + rewards[i]
        loss = loss - (log_probs[i] * (Variable(R).expand_as(log_probs[i]))).sum() \
               - (constant * entropies[i]).sum()
    loss = loss / len(rewards)

    optimizer.zero_grad()  # zero out gradients, since PyTorch accumulates them in backward()
    loss.backward(retain_graph=True)
    nn.utils.clip_grad_norm_(self.parameters(), 40)
    optimizer.step()

def act(self, state):
    mu, sigma = self.forward(Variable(state))
    sigma = F.softplus(sigma)
    epsilon = torch.randn(mu.size())
    action = (mu + sigma.sqrt() * Variable(epsilon)).data
    prob = normal(action, mu, sigma)
    entropy = -0.5 * ((sigma + 2 * pi.expand_as(sigma)).log() + 1)
    log_prob = prob.log()
    return action, prob, log_prob, entropy

def forward(self, state):
    x = state
    x = F.relu(self.linear1(x), inplace=True)
    mu = self.linear2(x)
    sigma = self.linear2_(x)
    return mu, sigma

From the logs it looks like the problem derives from sigma = self.linear2_(x), but I am not entirely sure of that.

Thanks a lot!!

Thanks for this solution, but it seems it only works for PyTorch versions 1.8 and higher (earlier releases have no `inputs` argument for backward()). Is there any other elegant way to restrict the scope of the gradients when calling backward()?

I don’t think the same functionality was available before the inputs argument was added, so you would need to update to the latest release. CC @albanD to correct me


Hi,

Indeed, the only way to get something similar with earlier releases is to use autograd.grad() and then populate the .grad fields manually with the gradient it returned.
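A minimal sketch of that workaround, reusing the encoder/decoder toy setup from earlier in the thread:

import torch
import torch.nn as nn
import torch.nn.functional as F

enc = nn.Conv2d(3, 5, 3, 1, 1)
dec = nn.Conv2d(5, 3, 3, 1, 1)
g_opt = torch.optim.Adam(dec.parameters(), lr=1e-4)

image = torch.randn(1, 3, 8, 8)
c1_loss = F.l1_loss(dec(enc(image)), image)

# autograd.grad() returns gradients w.r.t. the given inputs only ...
grads = torch.autograd.grad(c1_loss, list(dec.parameters()))

# ... which are then written into .grad manually, since step() reads from there
for p, g in zip(dec.parameters(), grads):
    p.grad = g
g_opt.step()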


I met the same problem as you: when I pass two inputs to the backbone, the error appears, but with only one input it disappears. May I ask how you solved this problem?
This is my code:

inputs = inputs.cuda(cfg['GPU'], non_blocking=True)
labels = labels.cuda(cfg['GPU'], non_blocking=True)
inputs_ = inputs_.cuda(cfg['GPU'], non_blocking=True)
labels_ = labels_.cuda(cfg['GPU'], non_blocking=True)
features = backbone(inputs)
features_ = backbone(inputs_)
outputs = head(features, labels)
outputs_ = head(features_, labels_)
lossx = loss(outputs, labels) + loss(outputs_, labels_)
optimizer.zero_grad()
lossx.backward()
optimizer.step()

Thanks! It worked in my code.
But I have a question: doesn’t action_loss have any effect on value_optimizer?
To my limited knowledge, I don’t understand why the ordering
“value_loss.backward() → action_loss.backward() → value_optimizer.step() → action_optimizer.step()” gives each optimizer the correct values to optimize.

Could anyone explain this briefly? Thanks.

Thanks for all the wonderful discussions.

I just wanted to confirm that the following two solutions would give the same, correct optimization results (PyTorch 1.9.0).

In such a pipeline:

G = Model1()
D = Model2()
optim1 = optim.Adam(G.parameters())
optim2 = optim.Adam(D.parameters())

recons, z = G(input)
loss1 = loss_func1(recons)
diff = D(z)
loss2 = loss_func2(diff)
loss3 = loss_func3(diff)
loss_G = loss1 + loss2  # we don't want to update D's parameters here
loss_D = loss3

Solution #1

optim1.zero_grad()
loss_G.backward(retain_graph=True)
optim2.zero_grad()
loss_D.backward()
optim1.step()
optim2.step()

Solution #2

optim1.zero_grad()
loss_G.backward(retain_graph=True, inputs=list(G.parameters()))
optim1.step()
optim2.zero_grad()
loss_D.backward(inputs=list(D.parameters()))
optim2.step()

Both of the solutions come from the previous solutions. Thanks again.


I am facing the same issue and have been stuck for a day.
Here is my code:

class RNN(nn.Module):
    
    def __init__(self,input_size, output_size, hidden_size=64):

        super().__init__()

        self.input_size  = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        
        self.xh = nn.Linear(self.input_size, self.hidden_size, bias=False)


        self.hh = nn.Linear(self.hidden_size, self.hidden_size)
        self.hy = nn.Linear(self.hidden_size, self.output_size)
        
        self.h = torch.zeros(self.hidden_size, requires_grad=True)
        
        self.tanh = nn.Tanh()
        self.softmax = nn.Softmax(dim=1)
        self.sigmoid = nn.Sigmoid()

    def rnn_cell(self, x_t):  
        
        first_h = self.hh(self.h)
        
        second_x = self.xh(x_t)

        act = second_x + first_h
        
        self.h = self.tanh(act)

        updated_c = self.sigmoid(self.hy(self.h))

        return updated_c


    def forward(self, inp):
        return self.rnn_cell(inp)

here is training code

def train(train_x,  valid_x, lr, epochs, hidden_units, net='RNN'):
    
    for step, (data, label) in enumerate(train_x):
        inputs = np.array(data)
        break

    if net=='RNN':
        model = RNN(inputs.shape[1], 1, hidden_units)
    elif net == 'LSTM':
        h = torch.zeros(hidden_units).requires_grad_()
        c = torch.zeros(hidden_units).requires_grad_()
        model = LSTM(inputs.shape[1], 1, hidden_units)
    elif net == 'GRU':
        St_1 = torch.zeros(hidden_units).requires_grad_()
        model = GRUModel(inputs.shape[1], 1, hidden_units)
    model.to(device)
    
    
    train_loss, val_loss = [],[]
    train_accuracy, val_accuracy = [], []
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    criterion = nn.BCELoss()

    
    
    for ep in range(epochs):
        running_loss, correct = 0, 0
        for i, (data, label) in enumerate(train_x):
            data, label = Variable(data), Variable(label)
            data, label = data.to(device), label.to(device)
            
            optimizer.zero_grad()
            
            if net == 'RNN':
                net_out= model(data)
            elif net == 'LSTM':
                net_out, h, c = model(data, h, c)
            elif net == 'GRU':
                net_out, St_1 = model(data, St_1)
                
            
            label = torch.reshape(label, (label.shape[0], 1))
            net_out = torch.reshape(net_out, (label.shape[0], 1))
            label = label.float()
            loss = criterion(net_out, label)
            loss.backward(retain_graph=True, inputs=list(model.parameters()))
            optimizer.step()

            running_loss += loss.item()
#             pred = torch.argmax(net_out, axis=1)  # get the index of the max log-probability
#             actual = torch.argmax(label, axis=1)
            out = (net_out>0.5).float()
            correct += out.eq(label).sum()



        print(running_loss)
        print("Epoch:", ep)
        print(correct.item())
        print("Training Accuracy:", 100. * correct.item() / len(train_x.dataset))
        print("Train Loss:", running_loss / len(train_x.dataset))
        train_loss.append(running_loss / len(train_x.dataset))
        train_accuracy.append(correct / len(train_x.dataset))


#         test_loss = 0
#         correct = 0
#         with torch.no_grad():
#             for batch_idx, (data, target) in enumerate(valid_x):
#                 data, target = Variable(data), Variable(target)
#                 data, target = data.to(device), target.to(device)
#     #                 data = data.view(-1, 784)
#                 if net == 'RNN':
#                     net_out, _ = model(data, h)
#                 elif net == 'LSTM':
#                     net_out, _, _ = model(data, h, c)
#                 elif net == 'GRU':
#                     net_out, _ = model(data, St_1)
#                 net_out = torch.reshape(net_out, (net_out.shape[0],))
#                 # sum up batch loss
#                 target = target.float()
#                 test_loss += criterion(net_out, target).item()
#     #                 pred = torch.argmax(net_out, axis=1)  # get the index of the max log-probability
#     #                 actual = torch.argmax(label, axis=1)
#                 out = (net_out>0.5).float()
#                 correct += out.eq(target).sum()
#             val_loss.append(test_loss / len(valid_x.dataset))
#             val_accuracy.append(correct / len(valid_x.dataset))

#         print("Validation Accuracy:" , 100. * correct.item() / len(valid_x.dataset))
#         print("Validation Loss:", test_loss / len(valid_x.dataset)) 
#         print("----------------------------------------------------------")
    
    return model, train_loss, train_accuracy, val_loss, val_accuracy
        
        
    
    

Here is the error list:

/Users/arslan/anaconda3/envs/torch_env/lib/python3.8/site-packages/torch/autograd/__init__.py:147: UserWarning: Error detected in MmBackward. Traceback of forward call that caused the error:
  File "/Users/arslan/anaconda3/envs/torch_env/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/arslan/anaconda3/envs/torch_env/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/arslan/anaconda3/envs/torch_env/lib/python3.8/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/Users/arslan/anaconda3/envs/torch_env/lib/python3.8/site-packages/traitlets/config/application.py", line 845, in launch_instance
    app.start()
  File "/Users/arslan/anaconda3/envs/torch_env/lib/python3.8/site-packages/ipykernel/kernelapp.py", line 619, in start
    self.io_loop.start()
  File "/Users/arslan/anaconda3/envs/torch_env/lib/python3.8/site-packages/tornado/platform/asyncio.py", line 199, in start
    self.asyncio_loop.run_forever()
  File "/Users/arslan/anaconda3/envs/torch_env/lib/python3.8/asyncio/base_events.py", line 570, in run_forever
    self._run_once()
  File "/Users/arslan/anaconda3/envs/torch_env/lib/python3.8/asyncio/base_events.py", line 1859, in _run_once
    handle._run()
  File "/Users/arslan/anaconda3/envs/torch_env/lib/python3.8/asyncio/events.py", line 81, in _run
    self._context.run(self._callback, *self._args)
  File "/Users/arslan/anaconda3/envs/torch_env/lib/python3.8/site-packages/tornado/ioloop.py", line 688, in <lambda>
    lambda f: self._run_callback(functools.partial(callback, future))
  File "/Users/arslan/anaconda3/envs/torch_env/lib/python3.8/site-packages/tornado/ioloop.py", line 741, in _run_callback
    ret = callback()
  File "/Users/arslan/anaconda3/envs/torch_env/lib/python3.8/site-packages/tornado/gen.py", line 814, in inner
    self.ctx_run(self.run)
  File "/Users/arslan/anaconda3/envs/torch_env/lib/python3.8/site-packages/tornado/gen.py", line 775, in run
    yielded = self.gen.send(value)
  File "/Users/arslan/anaconda3/envs/torch_env/lib/python3.8/site-packages/ipykernel/kernelbase.py", line 358, in process_one
    yield gen.maybe_future(dispatch(*args))
  File "/Users/arslan/anaconda3/envs/torch_env/lib/python3.8/site-packages/tornado/gen.py", line 234, in wrapper
    yielded = ctx_run(next, result)
  File "/Users/arslan/anaconda3/envs/torch_env/lib/python3.8/site-packages/ipykernel/kernelbase.py", line 261, in dispatch_shell
    yield gen.maybe_future(handler(stream, idents, msg))
  File "/Users/arslan/anaconda3/envs/torch_env/lib/python3.8/site-packages/tornado/gen.py", line 234, in wrapper
    yielded = ctx_run(next, result)
  File "/Users/arslan/anaconda3/envs/torch_env/lib/python3.8/site-packages/ipykernel/kernelbase.py", line 536, in execute_request
    self.do_execute(
  File "/Users/arslan/anaconda3/envs/torch_env/lib/python3.8/site-packages/tornado/gen.py", line 234, in wrapper
    yielded = ctx_run(next, result)
  File "/Users/arslan/anaconda3/envs/torch_env/lib/python3.8/site-packages/ipykernel/ipkernel.py", line 302, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/Users/arslan/anaconda3/envs/torch_env/lib/python3.8/site-packages/ipykernel/zmqshell.py", line 539, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/Users/arslan/anaconda3/envs/torch_env/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 2898, in run_cell
    result = self._run_cell(
  File "/Users/arslan/anaconda3/envs/torch_env/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 2944, in _run_cell
    return runner(coro)
  File "/Users/arslan/anaconda3/envs/torch_env/lib/python3.8/site-packages/IPython/core/async_helpers.py", line 68, in _pseudo_sync_runner
    coro.send(None)
  File "/Users/arslan/anaconda3/envs/torch_env/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3169, in run_cell_async
    has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
  File "/Users/arslan/anaconda3/envs/torch_env/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3361, in run_ast_nodes
    if (await self.run_code(code, result,  async_=asy)):
  File "/Users/arslan/anaconda3/envs/torch_env/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3441, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-61-f6fbdf7371e3>", line 2, in <module>
    net, train_loss, train_accuracy, val_loss, val_accuracy = train(train_loader,  valid_loader, lr=0.0001, epochs=10,  hidden_units=64, net='RNN')
  File "<ipython-input-60-3bb8b3924f63>", line 35, in train
    net_out= model(data)
  File "/Users/arslan/anaconda3/envs/torch_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "<ipython-input-59-cdbb099f3af3>", line 39, in forward
    return self.rnn_cell(inp)
  File "<ipython-input-59-cdbb099f3af3>", line 25, in rnn_cell
    first_h = self.hh(self.h)
  File "/Users/arslan/anaconda3/envs/torch_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/arslan/anaconda3/envs/torch_env/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 96, in forward
    return F.linear(input, self.weight, self.bias)
  File "/Users/arslan/anaconda3/envs/torch_env/lib/python3.8/site-packages/torch/nn/functional.py", line 1847, in linear
    return torch._C._nn.linear(input, weight, bias)
 (Triggered internally at  /Users/distiller/project/conda/conda-bld/pytorch_1623459044803/work/torch/csrc/autograd/python_anomaly_mode.cpp:104.)
  Variable._execution_engine.run_backward(
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-61-f6fbdf7371e3> in <module>
      1 torch.autograd.set_detect_anomaly(True)
----> 2 net, train_loss, train_accuracy, val_loss, val_accuracy = train(train_loader,  valid_loader, lr=0.0001, epochs=10,  hidden_units=64, net='RNN')

<ipython-input-60-3bb8b3924f63> in train(train_x, valid_x, lr, epochs, hidden_units, net)
     44             label = label.float()
     45             loss = criterion(net_out, label)
---> 46             loss.backward(retain_graph=True, inputs=list(model.parameters()))
     47             optimizer.step()
     48 

~/anaconda3/envs/torch_env/lib/python3.8/site-packages/torch/_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
    253                 create_graph=create_graph,
    254                 inputs=inputs)
--> 255         torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
    256 
    257     def register_hook(self, hook):

~/anaconda3/envs/torch_env/lib/python3.8/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    145         retain_graph = create_graph
    146 
--> 147     Variable._execution_engine.run_backward(
    148         tensors, grad_tensors_, retain_graph, create_graph, inputs,
    149         allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [64, 64]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Can you please help me, @ptrblck @albanD? I’ll be very thankful to all of you.

I guess the error is raised since you are not detaching the hidden state, are retaining the graph, and are updating the parameters in each iteration. This causes the intermediate forward activations from previous iterations to become “stale”, so the gradient computation fails.
I don’t know what your exact use case is, but you might want to detach() the hidden and cell states in each iteration and remove retain_graph=True as well.
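A minimal sketch of that suggestion applied to the RNN above (the toy training loop, shapes, and constants are my own):

import torch
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, input_size, output_size, hidden_size=64):
        super().__init__()
        self.xh = nn.Linear(input_size, hidden_size, bias=False)
        self.hh = nn.Linear(hidden_size, hidden_size)
        self.hy = nn.Linear(hidden_size, output_size)
        self.h = torch.zeros(hidden_size)  # plain buffer, no requires_grad

    def forward(self, x_t):
        # detach so this step's graph cannot reach into previous
        # iterations, whose parameters step() has already modified
        h = self.h.detach()
        self.h = torch.tanh(self.xh(x_t) + self.hh(h))
        return torch.sigmoid(self.hy(self.h))

model = RNN(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.BCELoss()

for _ in range(3):  # toy loop standing in for train()
    data = torch.randn(4, 8)
    target = torch.rand(4, 1)
    optimizer.zero_grad()
    loss = criterion(model(data), target)
    loss.backward()  # no retain_graph needed anymore
    optimizer.step()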