Ensemble Learning - Losses

PabloRR100 · September 14, 2018, 11:32pm

I am training an ensemble of Resnets on CIFAR10.
I have created a list of optimizers, one per model, since each optimizer must be built from its own model's model.parameters():

names = []
optimizers = []
ensemble = []
for i in range(ensemble_size):
    
    model = ResNet20()
    names.append(model.name + '_' + str(i+1))
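    # Pass hyperparameters by keyword: positionally, the argument after momentum is dampening, not weight_decay
    opt = optim.SGD(model.parameters(), lr=learning_rate, momentum=momentum, weight_decay=weight_decay)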
    optimizers.append(opt)
    
    model.to(device)
    if gpus: model = nn.DataParallel(model)
    ensemble.append(model)

My question is whether I also need to create a list of losses in order to call loss.backward() separately for each model:

for m in ensemble:
    m.train()

for epoch in range(1, epochs+1):

    # Training
    # --------
    for i, (images, labels) in enumerate(trainloader):

        # Load images and labels

        outs = []  # Outputs of each individual model, to be averaged later
        for n, m in enumerate(ensemble):

            ## Individual forward pass

            m.zero_grad()                      # Zero the gradients of this individual model
            output = m(images)                 # Forward pass through this model
            outs.append(output)                # Append to the list of outputs

            # Calculate loss for individual
            loss = criterion(output, labels)   # criterion is assumed to be defined elsewhere, e.g. nn.CrossEntropyLoss()

            ## Individual backward pass

            loss.backward()                    #### Does loss.backward() know which model it should backprop through? ####
            optimizers[n].step()

        ## Ensemble forward pass

        outputs = torch.mean(torch.stack(outs), dim=0)  # Average estimates

        # Calculate loss for ensemble
        # Calculate accuracy for ensemble

Thank you!

ptrblck · September 15, 2018, 1:21pm

In your code example it looks like you want to use the current batch sequentially on all your models.
This means that loss is overwritten for each new model, so you don’t need to store the losses in a list.
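
To illustrate the point above, here is a minimal sketch, not the original poster's actual setup. It assumes tiny nn.Linear layers as stand-ins for the ResNets, a random fake batch, and nn.CrossEntropyLoss as the criterion. The name grads_seen is hypothetical and exists only to record which models received gradients. The loss variable is simply rebound on every iteration, and loss.backward() only walks the graph of the model that produced that loss.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Assumption: small linear layers stand in for ResNet20 to keep the sketch fast.
ensemble_size = 3
ensemble = [nn.Linear(8, 4) for _ in range(ensemble_size)]
optimizers = [optim.SGD(m.parameters(), lr=0.1, momentum=0.9) for m in ensemble]
criterion = nn.CrossEntropyLoss()  # assumed loss function

# One fake batch: 16 samples, 8 features, 4 classes.
images = torch.randn(16, 8)
labels = torch.randint(0, 4, (16,))

grads_seen = []  # hypothetical helper: which models hold a gradient after each backward()
outs = []
for n, m in enumerate(ensemble):
    # Clear every model's gradients so the check below is unambiguous.
    for other in ensemble:
        for p in other.parameters():
            p.grad = None

    output = m(images)                # forward pass of model n only
    loss = criterion(output, labels)  # same variable, rebound on each iteration
    loss.backward()                   # backprop through the graph that produced this loss
    grads_seen.append([other.weight.grad is not None for other in ensemble])

    optimizers[n].step()              # update only model n
    outs.append(output.detach())      # keep the prediction for the ensemble average

# Ensemble prediction: average of the individual outputs.
ensemble_output = torch.mean(torch.stack(outs), dim=0)
ensemble_loss = criterion(ensemble_output, labels)

print(grads_seen)
print(ensemble_output.shape)
```

In this sketch each row of grads_seen has a single True entry, at the index of the model that was just trained, so a single loss variable together with one optimizer per model is sufficient. The outputs are detached before averaging because here the ensemble loss is only monitored, not backpropagated.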