I’m trying to train a two-stage model end-to-end, but I want to update the two stages with different losses. For example, suppose the end-to-end model is composed of two models, model1 and model2, and the output is computed by running
features = model1(inputs)
output = model2(features)
I want to update the parameters of model1 with loss1 while keeping the parameters of model2 unchanged, and then update the parameters of model2 with loss2 while keeping the parameters of model1 unchanged. My full implementation is something like:
import torch
import torch.nn as nn

# Define the first model
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Linear(20, 10)
        self.conv2 = nn.Linear(10, 5)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        return x

# Define the second model
class Net1(nn.Module):
    def __init__(self):
        super(Net1, self).__init__()
        self.conv1 = nn.Linear(5, 1)

    def forward(self, x):
        x = self.conv1(x)
        return x

# Initialize models
model1 = Net()
model2 = Net1()

# Initialize separate optimizers for each model
optimizer = torch.optim.SGD(model1.parameters(), lr=0.1)
optimizer1 = torch.optim.SGD(model2.parameters(), lr=0.1)
optimizer.zero_grad()
optimizer1.zero_grad()

criterion = nn.CrossEntropyLoss()

# Sample inputs and labels
inputs = torch.randn(2, 20)
labels = torch.randn(2, 1)

# Forward pass through both stages
features = model1(inputs)
outputs_model = model2(features)
loss1 = criterion(outputs_model[0], labels[0])
loss2 = criterion(outputs_model, labels)

# Update model1 with loss1 (model2 should stay fixed)
loss1.backward(retain_graph=True)
optimizer.step()
optimizer.zero_grad()
optimizer1.zero_grad()

# Update model2 with loss2 (model1 should stay fixed)
loss2.backward()
optimizer1.step()
optimizer.zero_grad()
optimizer1.zero_grad()

print(f"Loss1 (Net): {loss1.item()}")
print(f"Loss2 (Net1): {loss2.item()}")
However, this fails with
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [10, 5]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
The full error message is
Traceback (most recent call last):
  File ", line 55, in <module>
    loss2.backward()
    ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/torch/_tensor.py", line 521, in backward
    torch.autograd.backward(
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/torch/autograd/__init__.py", line 289, in backward
    _engine_run_backward(
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/torch/autograd/graph.py", line 769, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [10, 5]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
I roughly understand why this is happening (the first optimizer.step() modifies model1's parameters in place before loss2.backward() runs), but is there a clean way to address it? Any help is appreciated.
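One direction I've been considering, though I'm not sure it's the intended pattern, is to compute all the gradients before either optimizer.step() runs, so that no parameter has been modified in place yet, and to use torch.autograd.grad so each loss only produces gradients for the model it is supposed to update. A rough sketch, reusing the models, optimizers, and data defined above:

# Forward pass and losses, as before
features = model1(inputs)
outputs_model = model2(features)
loss1 = criterion(outputs_model[0], labels[0])
loss2 = criterion(outputs_model, labels)

# Compute both sets of gradients before any in-place parameter update
grads1 = torch.autograd.grad(loss1, list(model1.parameters()), retain_graph=True)
grads2 = torch.autograd.grad(loss2, list(model2.parameters()))

# Assign the gradients manually so each loss only affects its own model, then step
for p, g in zip(model1.parameters(), grads1):
    p.grad = g
for p, g in zip(model2.parameters(), grads2):
    p.grad = g

optimizer.step()
optimizer1.step()

Is something like this the recommended way to do it, or is there a cleaner approach (for example, passing inputs= to backward())?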