Custom loss using the output of another model

I have a loss function for a model (model 1) that includes a term computed from the output of another model (model 2). Model 2 is not trained, so I want to disable grad while computing its output to make it faster, but I want to keep grad enabled for model 1.
input = …
output1 = model1(input)

# disable grad
output2 = model2(input)

# enable grad again
loss = criterion(output1, target) + function(output1,output2)

Thank you

You can disable gradient calculation for model 2 by wrapping its forward pass in with torch.no_grad(), as seen here:

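import torch
import torch.nn as nn

model1 = nn.Linear(1, 1)
model2 = nn.Linear(1, 1)

input = torch.randn(1, 1)
target = torch.randn(1, 1)
criterion = nn.MSELoss()
function = criterion

output1 = model1(input)

# No autograd graph is recorded for model2's forward pass
with torch.no_grad():
    output2 = model2(input)

loss = criterion(output1, target) + function(output1, output2)
loss.backward()  # populates .grad for model1 only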

for name, param in model1.named_parameters():
    print(name, param.grad)
> weight tensor([[-0.8400]])
  bias tensor([4.2485])

for name, param in model2.named_parameters():
    print(name, param.grad)
> weight None
  bias None
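
To see why model 2 ends up without gradients, it can help to inspect the tensors themselves. The following is only a minimal sketch that reuses the toy nn.Linear models from above (the variable name x and the use of mse_loss are assumptions for illustration); it checks that an output computed under torch.no_grad() is not attached to the autograd graph:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model1 = nn.Linear(1, 1)
model2 = nn.Linear(1, 1)
x = torch.randn(1, 1)  # illustrative input, not from the original question

output1 = model1(x)        # tracked by autograd
with torch.no_grad():
    output2 = model2(x)    # not tracked: no graph is built here

# output1 carries a grad_fn, output2 does not
print(output1.requires_grad, output1.grad_fn is not None)  # True True
print(output2.requires_grad, output2.grad_fn is None)      # False True

# output2 acts as a constant in the loss, so backward() only reaches model1
loss = nn.functional.mse_loss(output1, output2)
loss.backward()
```

Because output2 has no grad_fn, there is no path from the loss back to model 2's parameters, which is also where the speed and memory savings come from.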

Thank you for the reply.

Now I wonder whether model 2 should be in eval() mode or not. I do not want to train this model; I am trying to distill knowledge from it into model 1, so model 1 should produce the same output as model 2 for the same batches.


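model.eval() changes the behavior of certain layers: it disables e.g. dropout layers and makes batchnorm layers use their running stats instead of the batch stats. It does not disable gradient tracking, so you would still use torch.no_grad() for that.
Depending on your use case, you might need to call it. If model 2 contains such layers and should give stable outputs for the same batch, calling model2.eval() is likely what you want.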
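
As a rough sketch of how the two pieces could fit together for distillation: the teacher and student architectures, the SGD optimizer, and the random data below are all hypothetical stand-ins, not taken from the question. The teacher (model 2) is put in eval mode and run under torch.no_grad(), while the student (model 1) is trained on the combined loss:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical teacher (model 2) with dropout, and a smaller student (model 1)
teacher = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(8, 2))
student = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

teacher.eval()    # dropout off; batchnorm layers (if any) would use running stats
student.train()

optimizer = torch.optim.SGD(student.parameters(), lr=0.1)  # assumed optimizer
criterion = nn.MSELoss()

x = torch.randn(16, 4)       # made-up batch
target = torch.randn(16, 2)  # made-up targets

# Teacher forward pass without building an autograd graph
with torch.no_grad():
    teacher_out = teacher(x)
    teacher_out_again = teacher(x)

# In eval mode the teacher returns the same output for the same batch
deterministic = torch.equal(teacher_out, teacher_out_again)

# One student update: task loss plus a term pulling it toward the teacher
optimizer.zero_grad()
student_out = student(x)
loss = criterion(student_out, target) + criterion(student_out, teacher_out)
loss.backward()
optimizer.step()

teacher_has_grads = any(p.grad is not None for p in teacher.parameters())
print(deterministic, teacher_has_grads)
```

If the teacher were left in train mode here, its dropout layer would give a different output on each call, so the student would be chasing a noisy target.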