RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation (Meta Learning)

Hi everybody!
I’m trying to implement the MAML algorithm (meta-learning). The idea of my code is to train a language model like BERT on several sets of data for a few training steps, optimizing the parameters over each set, and finally update the model parameters once more over the sum of the losses computed in those training steps. However, my implementation hits the error below. I googled it, and I think something goes wrong while performing the backward pass over the sum of losses, but I have not solved it so far. Can you help me? Thank you in advance!

# Model definition
import torch
import torch.nn as nn
from transformers import AutoModel

class BertClassifier(nn.Module):
    def __init__(self, pretrain_path, num_labels):
        super().__init__()
        self.bert = AutoModel.from_pretrained(pretrain_path)
        self.linear_out = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        output = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # classify from the [CLS] token representation
        output = self.linear_out(output.last_hidden_state[:, 0, :])
        return output


# Training configuration
num_epochs = 2
MAX_ITER = 5  
optim1 = torch.optim.AdamW(model.parameters(), lr=1e-3)
optim2 = torch.optim.AdamW(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()


# Training loop
for epoch in range(num_epochs):

    loss_list = []
    num_iter = 0
    model.train()
    optim2.zero_grad()
    
    while num_iter < MAX_ITER:
        optim1.zero_grad()
        batch_supp, _, _ = next(train_data_loader)
        supp_labels = batch_supp['label'].to(device)
        supp_ids = batch_supp['ids'].to(device)
        supp_mask = batch_supp['mask'].to(device)

        # forward pass
        logits = model(input_ids=supp_ids, attention_mask=supp_mask)

        # compute loss and take an inner optimization step
        loss = loss_fn(logits, supp_labels)
        loss.backward(retain_graph=True)  # graph is reused by total_loss.backward() below
        loss_list.append(loss)
        optim1.step()
        
        num_iter += 1
        print(">> Loss at iter {}: {}".format(num_iter, loss.item()))

    
    total_loss = sum(loss_list)
    total_loss.backward()
    optim2.step()
    print(">> Overall loss: {}".format(total_loss.item()))

Hi, why have you used retain_graph=True in this backward call? Any specific reason to do that explicitly?

If not, please remove this and re-run the code and let me know if you still face the error.

Hi @srishti-git1110, if we call loss.backward() at each training step and then perform total_loss.backward() without setting retain_graph=True in the earlier calls, that error message pops up. I think it is because total_loss is computed as the sum of all the losses in loss_list, and by default each backward() frees the graph it just went through. You can find more details at this link:

By the way, I’ll give you the error message that appears if I do not set retain_graph=True in loss.backward().
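
To illustrate (a minimal sketch, not the original code): once a graph has been freed by a first backward() call, any later backward pass through it fails.

import torch

x = torch.tensor([2.0], requires_grad=True)
y = x ** 2     # non-linear, so backward needs tensors saved in the graph
y.backward()   # first backward frees the graph by default
y.backward()   # RuntimeError: Trying to backward through the graph a second time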

@hduc-le You are right, I’m sorry. I just saw retain_graph=True and was tempted to point it out as it generally becomes a source of error.

As for your code, I’m not able to pinpoint exactly where the in-place modification is taking place.

But could you please try another approach: initialise total_loss = 0.0 as a float variable instead of a list.

And instead of the appending step, use:

total_loss = total_loss + loss

I’ve simulated this with a simple example on my end and it’s working fine. Let me know if you still face the error, please.

My simulation:

import torch
z = torch.tensor([5.0], requires_grad=True) # simulates the model parameters 
optim1 = torch.optim.AdamW([z], lr=1e-3)
optim2 = torch.optim.AdamW([z], lr=1e-2)
num_epochs = 2
MAX_ITER = 5 
for epoch in range(num_epochs):
    total_loss = 0.0
    num_iter = 0
    optim2.zero_grad()
    while num_iter < MAX_ITER:
        optim1.zero_grad()
        loss = z*0.1
        total_loss = total_loss + loss
        loss.backward(retain_graph=True)
        optim1.step()
        num_iter += 1
        print(">> Loss at iter {}: {}".format(num_iter, loss.item()))
    total_loss.backward()
    optim2.step()
    print(">> Overall loss: {}".format(total_loss.item()))

Output:

>> Loss at iter 1: 0.5
>> Loss at iter 2: 0.49989500641822815
>> Loss at iter 3: 0.4997900128364563
>> Loss at iter 4: 0.49968501925468445
>> Loss at iter 5: 0.4995799958705902
>> Overall loss: 2.4989500045776367
>> Loss at iter 1: 0.4984250068664551
>> Loss at iter 2: 0.4983200132846832
>> Loss at iter 3: 0.4982150197029114
>> Loss at iter 4: 0.4981100261211395
>> Loss at iter 5: 0.4980050027370453
>> Overall loss: 2.491075038909912

I also tried the same simulation with the list approach, and it works fine for me with that, too. In the screenshot in your first post, it shows sum(losses) while it should be sum(loss_list) according to your code.

Hi, I have just edited my previous reply.
Please see that.

Firstly, I very much appreciate your help. I’m sorry for my mistake: I took the screenshot in the first post before renaming the variable from losses to loss_list, so my code is in fact fine with sum(loss_list). But the list approach does not solve my problem.

I’ve tried your suggestion of using total_loss = total_loss + loss and, unfortunately, I still get the same error. Do you mind if I give you the link to the notebook I’m working on for more details? Thanks

Sorry, my reply was flagged as spam, so I’m putting the link to my notebook here, thanks: Google Colab

Sure, please link your notebook.

I see. If you are still getting the error, my simulation didn’t exactly replicate the situation. There might be more nuances to it; I’ll look at your notebook.
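
One guess at the missing nuance (an assumption on my part, not something I’ve verified against your notebook): optim1.step() updates the parameters in place, and the toy loss z * 0.1 is linear in z, so its backward pass never needs the saved value of z. With a loss that is non-linear in the parameters, the same pattern does reproduce the error:

import torch

z = torch.tensor([5.0], requires_grad=True)
optim1 = torch.optim.AdamW([z], lr=1e-3)

total_loss = 0.0
for _ in range(2):
    optim1.zero_grad()
    loss = z ** 2              # non-linear: backward needs the value of z saved in the graph
    total_loss = total_loss + loss
    loss.backward(retain_graph=True)
    optim1.step()              # modifies z in place, bumping its version counter

total_loss.backward()          # RuntimeError: one of the variables needed for gradient
                               # computation has been modified by an inplace operation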

Again, I really appreciate your consideration. Here is the link to the notebook I’m working on: Google Colab. If you figure out any mistake in my implementation, please let me know!

Hi @albanD, I have been stuck on this problem for a day. I did a Google search and saw your comment at [Solved][Pytorch1.5] RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation - #2 by albanD. But my scenario is a little different:
I need to optimize the model parameters over a few iterations with the cross-entropy loss and optimizer optim1, and finally take the sum of the losses computed over those iterations to update the model parameters one last time with optimizer optim2. Unfortunately, my code hits the error described above. I hope you can help me solve it, and I appreciate that. Thank you in advance!
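
For anyone who lands on this thread: below is a minimal sketch of one way to express this inner/outer pattern without in-place parameter updates. It is an illustration under stated assumptions, not a confirmed fix for the notebook above: it replaces the AdamW inner optimizer with a plain SGD update (so the step can be written out functionally) and uses torch.func.functional_call, which requires PyTorch 2.0+; model, loss_fn, train_data_loader, device, optim2, and MAX_ITER are the names from the posts above, and inner_lr is a made-up inner-loop learning rate.

import torch
from torch.func import functional_call

inner_lr = 1e-3  # hypothetical inner-loop step size

# start from the model's current ("slow") weights
fast_params = dict(model.named_parameters())
total_loss = 0.0

for _ in range(MAX_ITER):
    batch_supp, _, _ = next(train_data_loader)
    supp_labels = batch_supp['label'].to(device)
    supp_ids = batch_supp['ids'].to(device)
    supp_mask = batch_supp['mask'].to(device)

    # run the forward pass with the current fast weights
    logits = functional_call(model, fast_params,
                             kwargs={'input_ids': supp_ids, 'attention_mask': supp_mask})
    loss = loss_fn(logits, supp_labels)
    total_loss = total_loss + loss

    # inner SGD step: build NEW tensors instead of mutating the parameters in place
    grads = torch.autograd.grad(loss, list(fast_params.values()), create_graph=True)
    fast_params = {name: p - inner_lr * g
                   for (name, p), g in zip(fast_params.items(), grads)}

# outer step: backpropagate the summed loss into the original (slow) parameters
optim2.zero_grad()
total_loss.backward()
optim2.step()

Here create_graph=True keeps each inner graph alive for the outer backward, and because every inner update creates new tensors rather than mutating the parameters, the version-counter error should not come up.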