Training different stage of model with different loss

I’m trying to train a two-stage model in an end-to-end way. However, I want to update the different stages of models with different losses. For example, suppose the end to end model is composed of two models:model1 and model2. The output is calculated through running

features = model1(inputs)
output = model2(features)

I want to update the parameters of model1 with loss1, while keeping the parameter of model2 unchanged. Next, I want to update the parameters of model2 with loss2, while keeping the parameter of model1 unchanged. My full implementation is something like:

import torch
import torch.nn as nn

# Define the first model
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Linear(20, 10)
        self.conv2 = nn.Linear(10, 5)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        return x

# Define the second model
class Net1(nn.Module):
    def __init__(self):
        super(Net1, self).__init__()
        self.conv1 = nn.Linear(5, 1)

    def forward(self, x):
        x = self.conv1(x)
        return x

# Initialize models
model1 = Net()
model2 = Net1()

# Initialize separate optimizers for each model
optimizer = torch.optim.SGD(model1.parameters(), lr=0.1)
optimizer1 = torch.optim.SGD(model2.parameters(), lr=0.1)

optimizer.zero_grad() 
optimizer1.zero_grad()

criterion = nn.CrossEntropyLoss()

# Sample inputs and labels
inputs = torch.randn(2, 20)
labels = torch.randn(2,1)

features = model1(inputs)         
outputs_model = model2(features) 

loss1 = criterion(outputs_model[0], labels[0]) 
loss2 = criterion(outputs_model, labels) 
   
loss1.backward(retain_graph=True)  
optimizer.step()  
optimizer.zero_grad()       
optimizer1.zero_grad()  

 
loss2.backward()        
optimizer1.step()
optimizer.zero_grad()   
optimizer1.zero_grad()   

print(f"Loss1 (Net): {loss1.item()}")
print(f"Loss2 (Net1): {loss2.item()}")

However, this will return

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [10, 5]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

The full error message is

Traceback (most recent call last):
  File ", line 55, in <module>
    loss2.backward()        
    ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/torch/_tensor.py", line 521, in backward
    torch.autograd.backward(
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/torch/autograd/__init__.py", line 289, in backward
    _engine_run_backward(
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/torch/autograd/graph.py", line 769, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [10, 5]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

I kinda understand why this is happening, but is there a way to address this? Any help is appreciated

1 Like

You are most likely running into this issue which fails to compute the gradients since the forward activations are stale after a parameter update.

1 Like

Exactly, thanks a lot for that.