Custom loss using output of other model

I have a loss function for a model (model 1) with a term that is computed from the output of another model (model 2). Model 2 is not trained, so I want to disable gradient tracking to get its output faster, while keeping gradients enabled for model 1:
input = …
output1 = model1(input)

# disable grad
output2 = model2(input)

# enable again
loss = criterion(output1, target) + function(output1, output2)

Thank you

You could disable the gradient calculation by wrapping the forward pass of model2 in with torch.no_grad(), as seen here:

import torch
import torch.nn as nn

model1 = nn.Linear(1, 1)
model2 = nn.Linear(1, 1)

input = torch.randn(1, 1)
target = torch.randn(1, 1)
criterion = nn.MSELoss()
function = criterion

output1 = model1(input)

# no computation graph is built for model2's forward pass
with torch.no_grad():
    output2 = model2(input)


loss = criterion(output1, target) + function(output1, output2)
loss.backward()

for name, param in model1.named_parameters():
    print(name, param.grad)
> weight tensor([[-0.8400]])
  bias tensor([4.2485])

for name, param in model2.named_parameters():
    print(name, param.grad)
> weight None
  bias None
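
Since output2 is computed inside the no_grad() block, it is detached from the computation graph, so loss.backward() only populates gradients for model1's parameters. That is why the .grad attributes of model2 stay None.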

Thank you for the reply.

Now I am wondering whether model 2 should be in eval() mode or not. I do not want to train this model; I am trying to distill knowledge from it into model 1, so model 1 should produce the same output as model 2 for the same batches.

Bests

model.eval() would disable e.g. dropout layers and would use the running stats of batchnorm layers.
Depending on your use case, you might need to call it.
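
For a distillation-style setup like yours, the teacher is usually switched to eval() once and kept frozen, with its forward pass wrapped in no_grad(). A minimal sketch putting both pieces together (the layer sizes, the SGD optimizer, and the MSE-based distillation term distill_fn are placeholder assumptions, not from the thread):

import torch
import torch.nn as nn

model1 = nn.Linear(10, 2)   # student, trained
model2 = nn.Linear(10, 2)   # teacher, frozen

model2.eval()               # use running stats of batchnorm / disable dropout, if present
for p in model2.parameters():
    p.requires_grad_(False) # optional, makes the freeze explicit

criterion = nn.MSELoss()
distill_fn = nn.MSELoss()   # hypothetical distillation term
optimizer = torch.optim.SGD(model1.parameters(), lr=0.1)

input = torch.randn(4, 10)
target = torch.randn(4, 2)

output1 = model1(input)
with torch.no_grad():       # no graph is built for the teacher's forward pass
    output2 = model2(input)

loss = criterion(output1, target) + distill_fn(output1, output2)

optimizer.zero_grad()
loss.backward()             # only model1's parameters receive gradients
optimizer.step()

Only model1's parameters are updated; model2 just provides the targets for the distillation term.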