Split loss across two different models and calculate gradients


I have two separate models, each of which outputs a probability distribution. I weight each model's probability distribution by a dynamic scaling factor and sum them to get the combined output. I compare the combined output against the true labels and compute the overall loss.

However, I want to split the loss across both models by the same scaling factors, essentially a weighted average. Once each model receives its scaled share of the loss, it backpropagates and computes its gradients. The learning rates and optimisers can be different for the two models.

At a high level, the (pseudo)code looks like this:

import torch

modelA = Model(paramsA)
modelB = Model(paramsB)

optimiserA = getOptimiser(modelA.parameters(), lr=lr_A)
optimiserB = getOptimiser(modelB.parameters(), lr=lr_B)

outputs = modelA(inputs)*scalingFactor_A + modelB(inputs)*scalingFactor_B
loss = criterion(outputs, labels)

# unsure how exactly loss.backward() and optimiserA.step() and optimiserB.step() should work

Any help is appreciated. Thanks!

Your pseudocode looks alright, and loss.backward() will calculate the gradients for both models.
For Autograd it doesn’t matter whether the scaling and addition are performed “outside of the model” or inside the forward pass; it’s just a mathematical operation that loss.backward() will backpropagate through.
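A minimal sketch of this point, using two small stand-in linear models (the layer sizes and scaling factors here are just example values): a single backward call through the weighted sum populates .grad on the parameters of both models.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two stand-in models producing class scores
modelA = nn.Linear(4, 3)
modelB = nn.Linear(4, 3)

inputs = torch.randn(2, 4)
labels = torch.tensor([0, 2])

# Weighted combination of both outputs (example scaling factors)
outputs = modelA(inputs) * 0.7 + modelB(inputs) * 0.3
loss = nn.CrossEntropyLoss()(outputs, labels)

# One backward call fills .grad on BOTH models' parameters
loss.backward()

print(modelA.weight.grad is not None)  # True
print(modelB.weight.grad is not None)  # True
```

Note that because the scaling is applied to the outputs, each model's gradients are implicitly scaled by its own factor via the chain rule, which is exactly the weighted split you describe.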

optimizerX.step() will update the parameters that were passed to it, using their .grad attribute.
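Putting it together, one full training step could look like the sketch below. The models, optimiser choices, learning rates, and scaling factors are placeholder assumptions standing in for Model(paramsA)/Model(paramsB) and getOptimiser from your pseudocode:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-ins for Model(paramsA) / Model(paramsB)
modelA = nn.Linear(4, 3)
modelB = nn.Linear(4, 3)

# Different optimisers and learning rates per model are fine
optimiserA = torch.optim.SGD(modelA.parameters(), lr=0.1)
optimiserB = torch.optim.Adam(modelB.parameters(), lr=0.01)

criterion = nn.CrossEntropyLoss()
inputs = torch.randn(8, 4)
labels = torch.randint(0, 3, (8,))
scalingFactor_A, scalingFactor_B = 0.6, 0.4  # example values

# Clear stale gradients on both models
optimiserA.zero_grad()
optimiserB.zero_grad()

outputs = modelA(inputs) * scalingFactor_A + modelB(inputs) * scalingFactor_B
loss = criterion(outputs, labels)

# A single backward computes gradients for both models' parameters
loss.backward()

# Each optimiser updates only the parameters it was constructed with
optimiserA.step()
optimiserB.step()
```

Since each optimiser was given only its own model's parameters, the two step() calls never interfere with each other.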