Using a single optimizer object for multiple models

Hi,

I was wondering whether it is feasible to use a single optimizer object to train multiple models. Specifically, I would like to use one torch.optim.SGD object for several models derived from torch.nn.Module.

I would like to make sure that the models are optimized independently. Is passing the parameters of each model as a separate parameter group the correct approach? Will the gradients be calculated for each model separately, or will the optimizer search for a common minimum over all models’ parameters? I checked the code for SGD but I’m not entirely sure.

Yes, passing the parameters of each model into its own param_group allows you to specify the optimizer arguments separately for each group, such as the learning rate. If you don’t want to use different arguments for each model, you can also pass all parameters into a single param_group.
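
A minimal sketch of both setups (model_a, model_b, and the layer sizes are just hypothetical placeholders, not your actual models):

```python
import torch
import torch.nn as nn

# Two independent models (hypothetical stand-ins for the actual models)
model_a = nn.Linear(10, 1)
model_b = nn.Linear(10, 1)

# One SGD optimizer with a separate param_group per model,
# so each model can use its own learning rate
optimizer = torch.optim.SGD(
    [
        {"params": model_a.parameters(), "lr": 1e-2},
        {"params": model_b.parameters(), "lr": 1e-3},
    ],
    momentum=0.9,
)

# If both models should share the same hyperparameters,
# all parameters can go into a single param_group instead:
# optimizer = torch.optim.SGD(
#     list(model_a.parameters()) + list(model_b.parameters()), lr=1e-2
# )
```

Options passed outside the param_group dicts (momentum here) act as defaults for any group that doesn’t override them.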

The optimizer does not calculate the gradients; it uses the already populated .grad attributes to update the parameters. The gradients are computed during the backward() call, and which gradients are computed depends on the forward pass (in particular, which parameters were used to calculate the loss or model output).
Note that some optimizers keep internal running statistics and will update parameters even when their gradient is zero. If this behavior is not desired, you could delete the gradients via optimizer.zero_grad(set_to_none=True).
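
As a minimal sketch of what this looks like in practice (reusing the placeholder models and optimizer from above; the data and losses are made up):

```python
# Each model gets its own loss, and backward() only populates .grad
# for the parameters that were actually used in that loss.
x = torch.randn(4, 10)        # dummy input matching the toy models
target = torch.randn(4, 1)    # dummy target
criterion = nn.MSELoss()

optimizer.zero_grad(set_to_none=True)   # drop stale grads; .grad becomes None

loss_a = criterion(model_a(x), target)  # graph contains only model_a's parameters
loss_b = criterion(model_b(x), target)  # graph contains only model_b's parameters

(loss_a + loss_b).backward()  # fills each model's .grad from its own loss only
optimizer.step()              # each param_group is updated with its own gradients
```

Since model_b’s parameters never appear in loss_a’s graph (and vice versa), the updates stay independent even though a single backward() and a single optimizer.step() are used.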

I see, thank you for the detailed explanation!