I have a network in which I want to update the parameters of only a fixed set of layers (say, layer3 and layer7). So I set the requires_grad attribute to False for the parameters of all the other layers.
Now, is it okay to construct the optimizer as
optimizer = optim.SGD(net.parameters(), lr=0.01)
or do I have to pass an iterable containing only the layer3 and layer7 parameters, as in
optimizer = optim.SGD(list(net.layer3.parameters()) + list(net.layer7.parameters()), lr=0.01)
I would assume that the first option should be fine, because gradients are not computed for the other layers in any case. So is there any difference?
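For concreteness, here is a minimal sketch of the two setups I am comparing. The Net class, the layer sizes, the freeze_all_but and train_step helpers, and the dummy input are all made-up placeholders, not my real model. It assumes plain SGD with no momentum or weight decay.

```python
import itertools

import torch
import torch.nn as nn
import torch.optim as optim


class Net(nn.Module):
    # Hypothetical stand-in for the real network; only the layer names matter.
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(4, 4)
        self.layer3 = nn.Linear(4, 4)
        self.layer7 = nn.Linear(4, 2)

    def forward(self, x):
        x = torch.relu(self.layer1(x))
        x = torch.relu(self.layer3(x))
        return self.layer7(x)


def freeze_all_but(net, trainable=("layer3", "layer7")):
    # Set requires_grad=False on every parameter outside the chosen layers.
    for name, p in net.named_parameters():
        p.requires_grad = name.split(".")[0] in trainable


def train_step(net, optimizer):
    # One SGD step on a fixed dummy batch.
    optimizer.zero_grad()
    loss = net(torch.ones(8, 4)).sum()
    loss.backward()
    optimizer.step()


torch.manual_seed(0)
net_a = Net()
net_b = Net()
net_b.load_state_dict(net_a.state_dict())  # start both copies from identical weights
freeze_all_but(net_a)
freeze_all_but(net_b)

# Option 1: hand every parameter to the optimizer, frozen ones included.
opt_a = optim.SGD(net_a.parameters(), lr=0.01)

# Option 2: hand over only the parameters of the layers to be trained.
opt_b = optim.SGD(
    itertools.chain(net_b.layer3.parameters(), net_b.layer7.parameters()),
    lr=0.01,
)

# Snapshots taken before the step, used to check what actually changed.
frozen_before = net_a.layer1.weight.detach().clone()
bias_before = net_a.layer7.bias.detach().clone()

train_step(net_a, opt_a)
train_step(net_b, opt_b)

print("frozen layer1 unchanged:", torch.equal(net_a.layer1.weight, frozen_before))
print("layer7 bias updated:", not torch.equal(net_a.layer7.bias, bias_before))
print("both options agree on layer3:", torch.allclose(net_a.layer3.weight, net_b.layer3.weight))
```

In this toy case the frozen parameters never receive a .grad, so the two optimizers appear to do the same thing. What I am unsure about is whether that still holds in general, for example with momentum or weight decay, or if a layer is frozen only after it has already accumulated gradients.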