Optimizer construction for updating specific layers

akr90 · July 8, 2017, 1:30am

I have a network in which I want to update the parameters of only a fixed set of layers (say layer3 and layer7). So I set the requires_grad attribute to False for the parameters of the other layers.

Now, is it okay to construct the optimizer as
optimizer = optim.SGD(net.parameters(), lr=0.01), or do I have to make an iterable of the layer3 and layer7 parameters, as in

optimizer = optim.SGD([net.layer3.parameters(), net.layer7.parameters()], lr=0.01)? I would assume that the first option should be fine, because gradients are not being computed for the other layers in any case. So is there any difference?

akr90 · July 10, 2017, 7:42pm

Found out that you cannot really use option 1.
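
One way around this, as a minimal sketch: freeze everything, re-enable the two layers, and pass the optimizer only the parameters that still require gradients. The Net class below is a hypothetical toy model; only the attribute names layer3 and layer7 come from the question, and the layer sizes are made up.

```python
import torch
import torch.nn as nn
import torch.optim as optim


class Net(nn.Module):
    # Hypothetical stand-in model; the real network is not shown in the thread.
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(4, 4)
        self.layer3 = nn.Linear(4, 4)
        self.layer7 = nn.Linear(4, 2)

    def forward(self, x):
        x = torch.relu(self.layer1(x))
        x = torch.relu(self.layer3(x))
        return self.layer7(x)


net = Net()

# Freeze every parameter first.
for p in net.parameters():
    p.requires_grad = False

# Re-enable gradients only for the layers that should be updated.
for layer in (net.layer3, net.layer7):
    for p in layer.parameters():
        p.requires_grad = True

# Hand the optimizer only the parameters that still require gradients.
trainable = [p for p in net.parameters() if p.requires_grad]
optimizer = optim.SGD(trainable, lr=0.01)
```

Filtering on requires_grad keeps the optimizer in sync with the freezing logic, so the list of layers only has to be written once.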

chsasank · July 11, 2017, 3:15am

I’d think this should work fine
optimizer = optim.SGD(list(net.layer3.parameters()) + list(net.layer7.parameters()), lr=0.01) (despite the extra computation if you didn’t set requires_grad=False)
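
A rough way to check this, sketched on a hypothetical toy model (TinyNet and its layer sizes are assumptions, not the original network): build the optimizer from the concatenated lists, take one step, and compare the weights before and after.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)


class TinyNet(nn.Module):
    # Hypothetical model; only the names layer3 and layer7 come from the thread.
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(4, 4)
        self.layer3 = nn.Linear(4, 4)
        self.layer7 = nn.Linear(4, 2)

    def forward(self, x):
        x = torch.relu(self.layer1(x))
        x = torch.relu(self.layer3(x))
        return self.layer7(x)


net = TinyNet()

# Concatenate the parameter lists of the layers that should be updated.
params = list(net.layer3.parameters()) + list(net.layer7.parameters())
optimizer = optim.SGD(params, lr=0.01)

# Snapshot one excluded and one included parameter.
layer1_before = net.layer1.weight.detach().clone()
layer7_before = net.layer7.bias.detach().clone()

# One training step on random data with a dummy loss.
loss = net(torch.randn(8, 4)).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()

# layer1 is not in the optimizer, so its weights stay put.
layer1_unchanged = torch.equal(layer1_before, net.layer1.weight)
# layer7 is in the optimizer, so its bias should have moved.
layer7_changed = not torch.equal(layer7_before, net.layer7.bias)
# requires_grad was left True on layer1, so backward still filled in its
# gradient: this is the extra computation mentioned above.
layer1_grad_computed = net.layer1.weight.grad is not None
```

Setting requires_grad=False on the excluded layers as well would skip that unused gradient computation.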