import torch.nn as nn
import torch.optim as optim
import torchvision

model = torchvision.models.resnet18(pretrained=True)
for param in model.parameters():
    param.requires_grad = False
# Replace the last fully-connected layer
# Parameters of newly constructed modules have requires_grad=True by default
model.fc = nn.Linear(512, 100)
# Optimize only the classifier
optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)

What’s the difference if I use optimizer = optim.SGD(model.parameters(), lr=1e-2, momentum=0.9) instead?

Also, suppose I have a network conv1-ReLU-conv2-ReLU-conv3 and I want to fix the weights of conv2.

# Set requires_grad = False on the weights of conv2,
# then pass the parameters of the whole network to SGD?

Setting requires_grad to False means that no gradients will be computed for this Tensor. So your_tensor.grad will keep its previous value (None or a Tensor) and will never be updated.
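A minimal illustration of this behavior (the tensors here are hypothetical, not part of the model above):

```python
import torch

w = torch.randn(3)                      # requires_grad defaults to False
x = torch.randn(3, requires_grad=True)

loss = (w * x).sum()
loss.backward()

print(w.grad)  # None: no gradient is ever computed for w
print(x.grad)  # d(loss)/dx = w
```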

The thing is that when you use momentum (or L2 regularization / weight decay), even a gradient of 0 will change your weights. So if you don’t want to optimize these weights, you should exclude them from the optimizer to be sure. (Note that some optimizers completely ignore tensors whose grad field is None, in which case what you proposed would work, but this is not true for all optimizers and thus not safe.)
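Both points can be sketched in one place. The Sequential below is a hypothetical stand-in for the conv1-ReLU-conv2-ReLU-conv3 network, with made-up channel sizes:

```python
import torch
import torch.nn as nn
import torch.optim as optim

# 1) Momentum moves a weight even when its gradient is exactly zero.
w = nn.Parameter(torch.ones(1))
opt = optim.SGD([w], lr=0.1, momentum=0.9)
w.grad = torch.tensor([1.0])
opt.step()                 # builds up the momentum buffer
w.grad = torch.tensor([0.0])
before = w.item()
opt.step()                 # zero gradient, but momentum still updates w
print(w.item() != before)  # True

# 2) The safe pattern: exclude the frozen parameters from the optimizer.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3), nn.ReLU(),
    nn.Conv2d(8, 8, 3), nn.ReLU(),  # model[2] plays the role of conv2
    nn.Conv2d(8, 4, 3),
)
for p in model[2].parameters():
    p.requires_grad = False

optimizer = optim.SGD(
    [p for p in model.parameters() if p.requires_grad],
    lr=1e-2, momentum=0.9,
)
```

This way conv2 is guaranteed never to be touched, regardless of which optimizer you use or how it treats a None grad field.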