Passing Params to an Optimizer


A quick question regarding the code snippet below from the PyTorch Autograd Mechanics Tutorial:

import torch.nn as nn
import torch.optim as optim
import torchvision

model = torchvision.models.resnet18(pretrained=True)
# Freeze every pretrained parameter
for param in model.parameters():
    param.requires_grad = False
# Replace the last fully-connected layer
# Parameters of newly constructed modules have requires_grad=True by default
model.fc = nn.Linear(512, 100)

# Optimize only the classifier
optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)

What would be the difference if I were to just pass in the entire model’s parameters? E.g.:

optimizer = optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

instead of

optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)

I am a bit confused about why this would matter, since we set requires_grad=False on every parameter of the model except those of the model.fc layer (that layer is created as a new module, whose parameters have requires_grad=True by default).

Thanks, and I apologize if this has been asked already.

In this example it wouldn’t change the behavior: the optimizer just skips all parameters without a gradient (i.e. those whose .grad is None).
However, “explicit is better than implicit” according to the Zen of Python, so the first approach (passing only model.fc.parameters()) might be cleaner.
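
To see the skipping behavior for yourself, here is a minimal sketch. It uses a tiny nn.Sequential as a hypothetical stand-in for the pretrained resnet18 (so nothing has to be downloaded), freezes the first layer, and still passes all parameters to the optimizer. The layer sizes and the dummy loss are my own assumptions, not part of the tutorial.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Tiny stand-in for the pretrained network (hypothetical, not resnet18)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Freeze the "backbone" (first layer); the last layer stays trainable
for param in model[0].parameters():
    param.requires_grad = False

frozen_before = model[0].weight.clone()
head_bias_before = model[2].bias.clone()

# Pass *all* parameters, including the frozen ones
optimizer = optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

# One training step with a dummy loss
loss = model(torch.randn(16, 4)).sum()
loss.backward()
optimizer.step()

# Frozen parameters never received a gradient, so the optimizer skipped them
frozen_grad_is_none = model[0].weight.grad is None
frozen_unchanged = torch.equal(model[0].weight, frozen_before)
head_changed = not torch.equal(model[2].bias, head_bias_before)

print(frozen_grad_is_none, frozen_unchanged, head_changed)
```

The frozen layer's .grad stays None after backward(), which is what the optimizer checks before updating a parameter.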

There are some edge cases, e.g. where a parameter was trained, then frozen, and should therefore not be updated anymore.
However, if you are using weight decay, this “frozen” parameter might still be updated: if its .grad is still a (zeroed) tensor rather than None, the optimizer does not skip it, and the weight decay term alone yields a nonzero update.
As you can see, this is not the usual use case, but it might create some headaches while debugging. :wink:
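
Here is a rough sketch of that edge case, again with a small hypothetical model instead of resnet18. I am calling zero_grad(set_to_none=False) explicitly so the gradients are zeroed in place instead of being reset to None; newer PyTorch releases reset them to None by default, in which case a frozen parameter would simply be skipped. The model, learning rate, and weight decay value are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Small hypothetical model; all parameters start out trainable
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
optimizer = optim.SGD(model.parameters(), lr=1e-2, weight_decay=0.1)

def train_step():
    # Zero gradients in place: existing .grad tensors become zeros, not None
    optimizer.zero_grad(set_to_none=False)
    model(torch.randn(16, 4)).sum().backward()
    optimizer.step()

# 1) Train everything once, so every parameter gets a .grad tensor
train_step()

# 2) Freeze the first layer afterwards
for param in model[0].parameters():
    param.requires_grad = False

# 3) Train again: the frozen weight has a zero .grad, but weight decay
#    adds weight_decay * weight to it, so the weight still shrinks
before = model[0].weight.clone()
train_step()
frozen_moved = not torch.equal(model[0].weight, before)

# 4) Clearing .grad to None makes the optimizer skip the frozen layer
for param in model[0].parameters():
    param.grad = None
before = model[0].weight.clone()
train_step()
frozen_moved_after_clearing = not torch.equal(model[0].weight, before)

print(frozen_moved, frozen_moved_after_clearing)
```

So if you freeze parameters mid-training and keep them in the optimizer, either clear their .grad to None or pass only the trainable parameters to the optimizer.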


Wow, very interesting!

Just to confirm, the edge case you are talking about would be in favor of the 2nd option, right? (I.e. pass all model parameters so you have the ability to freeze them during training, which is risky due to weight decay.)

Thanks for the reply :slight_smile:

Yes, that’s correct. However, this edge case showed up as an “error” in some posts and is rather unwanted behavior. :wink:

I see, thanks for the clarification :slight_smile: