Requires_grad or two different optimizers


I have a network composed of two sequential subnetworks. I want to perform partial training on each of them (i.e., update the parameters of the first subnetwork on the first part of the data while freezing the second subnetwork, and the converse for the second subnetwork and the second part of the data). As far as I know, there are two options for updating the parameters:

1- set requires_grad to False on the parameters of the subnetwork to be frozen and update the other one
2- create a separate optimizer for each subnetwork's parameters and step only the relevant one

I wanted to know: are both of these approaches possible in PyTorch? Which one do you suggest for this task, and what is the difference between them?
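For concreteness, option 2 could look something like this (a minimal sketch; the subnetwork names and sizes are made up for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical two-stage network; names/sizes are illustrative only.
net1 = nn.Linear(10, 20)
net2 = nn.Linear(20, 2)
model = nn.Sequential(net1, net2)

# Option 2: one optimizer per subnetwork.
opt1 = torch.optim.Adam(net1.parameters(), lr=1e-3)
opt2 = torch.optim.Adam(net2.parameters(), lr=1e-3)

# Phase 1: update only the first subnetwork.
x = torch.randn(4, 10)
loss = model(x).pow(2).mean()
opt1.zero_grad()
loss.backward()
opt1.step()  # only net1's parameters are updated
```

One caveat with this approach: backward() still computes and accumulates gradients for net2's parameters, so if you later call opt2.step() without zeroing first, those stale gradients get applied.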

I would go with option 1: freeze what you don't want to update with this

for param in child.parameters():
    param.requires_grad = False

and then

optimizer = torch.optim.Adam(filter(lambda p: p.requires_grad, whole_model.parameters()), amsgrad=True)
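Putting it together, a minimal end-to-end sketch of this approach (the subnetworks and sizes here are made up for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical model split into two sequential subnetworks.
whole_model = nn.Sequential(
    nn.Linear(10, 20),  # first subnetwork
    nn.Linear(20, 2),   # second subnetwork
)
first, second = whole_model[0], whole_model[1]

# Freeze the second subnetwork.
for param in second.parameters():
    param.requires_grad = False

# Build the optimizer over trainable parameters only.
optimizer = torch.optim.Adam(
    filter(lambda p: p.requires_grad, whole_model.parameters()),
    amsgrad=True,
)

# One training step on the first part of the data.
x = torch.randn(4, 10)
loss = whole_model(x).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()

# The frozen subnetwork received no gradients at all.
print(all(p.grad is None for p in second.parameters()))  # True
```

An advantage of option 1 over option 2 is that autograd never computes gradients for the frozen parameters, which saves a bit of memory and compute; to train the second subnetwork instead, flip the requires_grad flags and rebuild the optimizer.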