Help with propagating a flexible number of models in ensembles

I’m using PyTorch for a research project that involves model ensembles. My code is based on the Crystal Graph Network (cgcnn) project. My modification propagates and prunes an ensemble of models. To keep the code as adaptable as possible, I made a “ModelInstance” class that creates new child models using a parent model and parent optimizer as a template. I also had to bundle the scheduler inside the class, since optimizer steps weren’t being tracked through model propagation (I couldn’t just load the state dicts of propagated optimizers into a head optimizer bound to a single scheduler).

#My ModelInstance template
class ModelInstance():
    #All models in the program descend from the first b_model and optimizer0; models are copied using this template
    def __init__(self, index, parent_model, parent_optim, parent_scheduler):
        # Create a new copy of the parent CrystalGraphConvNet
        self.model = copy.deepcopy(parent_model)

        if args.cuda:
            self.model.cuda()

        self.index = index
        self.optimizer = self.create_new_optim(self.model, parent_optim)
        self.scheduler = copy.deepcopy(parent_scheduler)

    #Copies the hyperparameters from the parent optimizer into a fresh SGD instance
    def create_new_optim(self, model, parent_optim):
        parent_lr = parent_optim.param_groups[0]["lr"]
        parent_momen = parent_optim.param_groups[0]["momentum"]
        parent_wd = parent_optim.param_groups[0]["weight_decay"]
        return optim.SGD(self.model.parameters(), lr=parent_lr, momentum=parent_momen, weight_decay=parent_wd)

    #Standard way of running a step
    def run_step(self, input_var, target, criterion):
        output = self.model(*input_var)
        loss = criterion(output, target)
        return output, loss
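As a sanity check on the deepcopy approach itself: a copied model should start out numerically identical to its parent but share no parameter storage with it. A minimal sketch, using a stand-in `nn.Linear` rather than the real CrystalGraphConvNet:

```python
import copy
import torch
import torch.nn as nn

# Stand-in for CrystalGraphConvNet; any nn.Module behaves the same here.
parent = nn.Linear(4, 2)
child = copy.deepcopy(parent)

# The copy starts out numerically identical...
assert torch.equal(parent.weight, child.weight)

# ...but mutating the child must not touch the parent.
with torch.no_grad():
    child.weight.add_(1.0)
assert not torch.equal(parent.weight, child.weight)
```

If the second assertion ever failed, the copies would be sharing tensors, but deepcopy on an nn.Module does produce fully independent parameters, so the problem described below is unlikely to be the model copy itself.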

When I do a first propagation from the first ModelInstance template for the whole program, model0, I start getting very bad training results. When I refactor the code to do no propagation and use the original model0 object directly, I get the same normal results as the vanilla cgcnn project I was building on top of. I currently have the code set up to propagate only 1 model each step, which keeps the number of models at 1 for comparison with the vanilla code.

#How the template model, optimizer, and scheduler get formed (the following all happens in the main() function)
b_model0 = CrystalGraphConvNet(orig_atom_fea_len, nbr_fea_len,
                               classification=True if args.task ==
                                                      'classification' else False)
optimizer0 = optim.SGD(b_model0.parameters(), args.lr,
                       momentum=args.momentum, weight_decay=args.weight_decay)
scheduler0 = MultiStepLR(optimizer0, milestones=args.lr_milestones,
                         gamma=0.1)

#model0 is the first ModelInstance object and will be the template for the first propagation in the train function
model0 = ModelInstance(0, b_model0, optimizer0, scheduler0)

for epoch in range(args.start_epoch, args.epochs):
    #ModelInstance model0 gets sent into train() each epoch
    train(train_loader, model0, criterion, epoch, normalizer)

#First propagation code in train(), propagate_num is number of models propagated from template.
#propagate_num is 1 for my reproducibility testing

model_list = []
for i in range(propagate_num):
    #Training goes haywire if the line below is uncommented
    # model_list.append(ModelInstance(i, model0.model, model0.optimizer, model0.scheduler))
    #But works perfectly if hardcoded to just append the original model0:
    model_list.append(model0)

    model_i = model_list[i]
    #Code for loading and normalizing points from train_loader omitted
    init_output, init_loss = model_i.run_step(init_input, init_target, criterion)
#More training code happens afterwards, but error seems to occur at this propagation "initialization step"

I strongly suspect the copy.deepcopy() calls used to propagate new objects in ModelInstance are malfunctioning, but I am not sure what alternatives I have. I have tried .clone().detach(), but it doesn’t work on the CrystalGraphConvNet objects my project has to use (and wouldn’t work for copying the scheduler either). I am testing creating blank object templates and loading the parent object’s state dict into them, e.g.

self.model = CrystalGraphConvNet(parameters) 

but my initial results are not reassuring. What is the cleanest way to make fresh copies of models, optimizers, and schedulers for model ensembles in PyTorch? Ideally, the number of models propagated per step should be easy to change (e.g. it’s easy to just hardcode 2 models and 2 optimizers for propagating 2 models per step, but it’s not flexible if one wants to test propagating 3 or 4 models per step).
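For reference, the state-dict route can be sketched as follows. This is a minimal sketch with a hypothetical `clone_instance` helper and `nn.Linear`/SGD stand-ins rather than the real CrystalGraphConvNet; the key detail is that each copy is *created bound to the new model’s parameters first*, and only then has the parent state loaded into it:

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import MultiStepLR

def clone_instance(parent_model, parent_optim, parent_sched):
    """Hypothetical helper: fresh objects, then state_dict transfer."""
    # New model with identical architecture, then copy the weights.
    model = nn.Linear(4, 2)  # stand-in for CrystalGraphConvNet(...)
    model.load_state_dict(parent_model.state_dict())

    # New optimizer bound to the NEW model's parameters; load_state_dict
    # then copies both hyperparameters and per-parameter state
    # (e.g. SGD momentum buffers), remapped onto the new parameters.
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    optimizer.load_state_dict(parent_optim.state_dict())

    # New scheduler bound to the new optimizer, then copy its counters.
    scheduler = MultiStepLR(optimizer, milestones=[100])
    scheduler.load_state_dict(parent_sched.state_dict())
    return model, optimizer, scheduler
```

Because the optimizer’s `load_state_dict` restores the momentum buffers, a clone made this way should continue training exactly where the parent left off, which a plain re-created SGD instance will not.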

Fix: I added a method to ModelInstance that also copies the internal per-parameter state of the optimizer. create_new_optim only copied the hyperparameters (lr, momentum, weight_decay), so every propagation silently discarded the SGD momentum buffers, which is what was wrecking training.

    def copy_optimizer_state(self, source_optimizer, target_optimizer):
        for target_group, source_group in zip(target_optimizer.param_groups, source_optimizer.param_groups):
            for target_p, source_p in zip(target_group['params'], source_group['params']):
                source_state = source_optimizer.state[source_p]
                #Deep-copy the per-parameter state (e.g. SGD momentum buffers) so the child shares no tensors with the parent
                target_optimizer.state[target_p] = copy.deepcopy(source_state)

#Called at the end of __init__, after self.optimizer has been created:
        self.copy_optimizer_state(parent_optim, self.optimizer)
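The same state-copy idea can be demonstrated standalone. This sketch uses `nn.Linear` stand-ins instead of CrystalGraphConvNet and a free function instead of the method, but the loop body is the same:

```python
import copy
import torch
import torch.nn as nn
import torch.optim as optim

def copy_optimizer_state(source_optimizer, target_optimizer):
    # Walk matching param_groups and deep-copy each parameter's
    # per-parameter state into the target optimizer.
    for t_group, s_group in zip(target_optimizer.param_groups, source_optimizer.param_groups):
        for t_p, s_p in zip(t_group['params'], s_group['params']):
            target_optimizer.state[t_p] = copy.deepcopy(source_optimizer.state[s_p])

parent = nn.Linear(4, 2)
parent_opt = optim.SGD(parent.parameters(), lr=0.01, momentum=0.9)
parent(torch.randn(3, 4)).sum().backward()
parent_opt.step()  # populates the momentum buffers

child = copy.deepcopy(parent)
child_opt = optim.SGD(child.parameters(), lr=0.01, momentum=0.9)
copy_optimizer_state(parent_opt, child_opt)

# The child's momentum buffer matches the parent's but is a separate tensor.
p_buf = parent_opt.state[parent.weight]['momentum_buffer']
c_buf = child_opt.state[child.weight]['momentum_buffer']
print(torch.equal(p_buf, c_buf), p_buf is c_buf)  # → True False
```

The deepcopy inside the loop matters: assigning the source state dict directly would make parent and child share momentum tensors, so each child’s updates would corrupt its siblings.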