Using multiple models and optimizers in training

Hello,

I’m trying to use multiple models and optimizers in training.
During training, the loss does not seem to be decreasing.

Does anyone know what the problem is?
The code is below.

I have multiple models and optimizers. In the for c in range(classnum): loop, I compute the loss from each model and optimize that model with the optimizer that corresponds to it.

Note that each optimizer is a different algorithm.

models = [model1, model2, ..., model10]
optimizers = [optimizer1, optimizer2, ..., optimizer10]

for model in models:
    model.train()

for step, inputs in enumerate(tqdm(train_loader)):
    inputs = inputs.to(device)

    for c in range(classnum):
        optimizers[c].zero_grad()

        # ... (omitted: build each_input and each_weights for class c)

        outputs, mean, logvar = models[c](each_input)
        kld, recon = loss_function(outputs, each_input, mean, logvar, each_weights)
        loss = kld + recon

        loss.backward()
        optimizers[c].step()

        running_loss += loss.item()
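
For reference, here is a stripped-down, runnable version of the same pattern with tiny placeholder models and a stand-in loss_function (these are not my real models, just an illustration of the loop structure):

import torch
import torch.nn as nn

device = torch.device("cpu")
classnum = 3  # 10 in my real setup

# Tiny VAE-style placeholder model (stand-in for my real architecture).
class TinyVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(100 * 100, 32)
        self.fc_mean = nn.Linear(32, 8)
        self.fc_logvar = nn.Linear(32, 8)
        self.dec = nn.Linear(8, 100 * 100)

    def forward(self, x):
        b = x.size(0)
        h = torch.relu(self.enc(x.view(b, -1)))
        mean, logvar = self.fc_mean(h), self.fc_logvar(h)
        z = mean + torch.randn_like(mean) * torch.exp(0.5 * logvar)
        out = torch.sigmoid(self.dec(z)).view(b, 1, 100, 100)
        return out, mean, logvar

def loss_function(outputs, target, mean, logvar, weights):
    # Stand-in loss: weighted BCE reconstruction term (summed) plus the usual KLD term.
    recon = nn.functional.binary_cross_entropy(outputs, target, weight=weights, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mean.pow(2) - logvar.exp())
    return kld, recon

models = [TinyVAE().to(device) for _ in range(classnum)]
optimizers = [torch.optim.Adam(m.parameters(), lr=1e-3) for m in models]

# Dummy data: each channel is a probability map, one channel per class.
train_loader = [torch.rand(4, classnum, 100, 100) for _ in range(5)]

for model in models:
    model.train()

running_loss = 0.0
for step, inputs in enumerate(train_loader):
    inputs = inputs.to(device)
    for c in range(classnum):
        optimizers[c].zero_grad()
        each_input = inputs[:, c:c + 1]              # (B, 1, H, W) slice for class c
        each_weights = torch.ones_like(each_input)   # placeholder per-pixel weights
        outputs, mean, logvar = models[c](each_input)
        kld, recon = loss_function(outputs, each_input, mean, logvar, each_weights)
        loss = kld + recon
        loss.backward()
        optimizers[c].step()
        running_loss += loss.item()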

Thank you.

Hi,

This approach looks good to me. You should not have any problem with it.
Does training a single model with your loss work?

Thank you for your reply.
@albanD

Are you asking whether the training loss decreased with a single model and a single optimizer?

Yes, it did.

@albanD
When I look at the training loss graph, it looks like the loss is decreasing.

[training loss graph attached]

Although the difference in loss between epochs is very small…

1st epoch: 72170060.70782048
2nd epoch: 72124209.65666279
3rd epoch: 72124202.77555539
4th epoch: 72124197.6106462
5th epoch: 72124195.73002665
     :
     :

Oh.

So it is training, just very, very slowly? Have you tried different learning rates? What about using only 2 models?
I don’t know what your loss is, but is it properly averaged over the samples?
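
For example, a quick thing to try (just a sketch based on your snippet above, assuming kld and recon are summed over the whole batch) is to divide by the batch size before calling backward:

outputs, mean, logvar = models[c](each_input)
kld, recon = loss_function(outputs, each_input, mean, logvar, each_weights)
# Normalize the summed loss by the batch size so the gradient scale
# does not depend on how many samples are in the batch.
loss = (kld + recon) / each_input.size(0)
loss.backward()
optimizers[c].step()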

I’m using Adam as the optimizer and have tried several learning rates, but nothing actually changes. I haven’t tried training with 2 models yet.

The loss function is torch.nn.BCELoss, and the reason the loss is so large is that the reduction is 'sum'.

The original data’s shape is (batch size, C, H, W) = (32, 10, 100, 100), and during training I train each channel with an independent model and optimizer, which means the shape of the input to each model is (32, 1, 100, 100).
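
Roughly, the per-channel input is produced by slicing along the channel dimension, something like this (a minimal self-contained sketch, not my exact code):

import torch

inputs = torch.rand(32, 10, 100, 100)     # dummy batch, one channel per class
c = 0
each_input = inputs[:, c:c + 1, :, :]     # keeps the channel dim -> (32, 1, 100, 100)
print(each_input.shape)                   # torch.Size([32, 1, 100, 100])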

Also, the original data is the output of a segmentation model, so each channel contains probabilities.