Hi,

I have two separate models, each of which outputs a probability distribution. I weight each model's distribution by a dynamic scaling factor and sum the results to get the combined outputs. I then compare the outputs against the true labels to compute the overall loss.

However, I want to split the loss between the two models in proportion to the same scaling factors, essentially a weighted average. Once each model receives its scaled share of the loss, it should backpropagate and compute its own gradients. The learning rates and optimisers can be different for the two models.

At a high level, the (pseudo)code looks like this:

```
import torch
modelA = Model(paramsA)
modelB = Model(paramsB)
optimiserA = getOptimiser(modelA.parameters(), lr=lr_A)
optimiserB = getOptimiser(modelB.parameters(), lr=lr_B)
outputs = modelA(inputs)*scalingFactor_A + modelB(inputs)*scalingFactor_B
loss = criterion(outputs, labels)
# unsure how exactly loss.backward() and optimiserA.step() and optimiserB.step() should work
```
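
For reference, here is a minimal runnable sketch of the same setup. The small `nn.Linear` + softmax models, the SGD/Adam optimisers, the fixed factors 0.7 and 0.3, and `NLLLoss` as the criterion are all placeholder assumptions, not my real configuration. As far as I understand autograd, a single `loss.backward()` on the combined loss should already give each model a gradient multiplied by its own scaling factor (chain rule), so the loss may not need to be split by hand:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for Model(paramsA) / Model(paramsB): each maps
# 8 input features to a probability distribution over 3 classes.
def make_model():
    return nn.Sequential(nn.Linear(8, 3), nn.Softmax(dim=1))

modelA, modelB = make_model(), make_model()

# Different optimisers and learning rates per model (arbitrary choices).
optimiserA = torch.optim.SGD(modelA.parameters(), lr=0.1)
optimiserB = torch.optim.Adam(modelB.parameters(), lr=0.01)

inputs = torch.randn(16, 8)
labels = torch.randint(0, 3, (16,))

# Placeholder scaling factors. If they are computed from tensors that
# should not receive gradients, detach them before using them here.
scalingFactor_A, scalingFactor_B = 0.7, 0.3

optimiserA.zero_grad()
optimiserB.zero_grad()

# Weighted mixture of the two distributions.
outputs = modelA(inputs) * scalingFactor_A + modelB(inputs) * scalingFactor_B

# NLLLoss expects log-probabilities, so take the log of the mixture.
loss = nn.NLLLoss()(torch.log(outputs), labels)

# One backward pass: each model's gradient carries its own scaling factor.
loss.backward()

# Each optimiser updates only the parameters it was constructed with.
optimiserA.step()
optimiserB.step()
```

In this sketch the two optimisers never interact: `optimiserA.step()` only touches `modelA`'s parameters and `optimiserB.step()` only touches `modelB`'s, so different learning rates or optimiser types should be fine.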

Any help is appreciated. Thanks!