Hello,

Consider the following 2-layer NN:

```
import torch.nn as nn
from collections import OrderedDict

model = nn.Sequential(OrderedDict([
    ('1', nn.Linear(3, 3)),
    ('2', nn.Linear(3, 3)),
]))
```

Suppose that I want to freeze the second layer, and train only the first layer.

I think there are two methods to achieve this:

- Set `requires_grad` of the second layer's parameters to `False`, then train:

```
import torch.optim as optim

for param in model[1].parameters():
    param.requires_grad = False
optimizer = optim.AdamW(model.parameters(), lr=0.0001)
```

- Restrict the optimizer to the parameters of the first layer:

`optimizer = optim.AdamW(model[0].parameters(), lr=0.0001)`

Both methods yield the same values for the first layer's parameters.
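To double-check that claim, here is a minimal comparison I ran in my head (a sketch, assuming random data and an MSE loss; `make_model` is just a helper to get two identically initialized copies):

```python
import copy
import torch
import torch.nn as nn
import torch.optim as optim
from collections import OrderedDict

torch.manual_seed(0)

def make_model():
    return nn.Sequential(OrderedDict([
        ('1', nn.Linear(3, 3)),
        ('2', nn.Linear(3, 3)),
    ]))

m1 = make_model()
m2 = copy.deepcopy(m1)  # same initial weights in both models

x = torch.randn(8, 3)
y = torch.randn(8, 3)
loss_fn = nn.MSELoss()

# Method 1: freeze layer 2 via requires_grad, hand ALL params to the optimizer
for p in m1[1].parameters():
    p.requires_grad = False
opt1 = optim.AdamW(m1.parameters(), lr=0.0001)

# Method 2: hand only layer-1 params to the optimizer
opt2 = optim.AdamW(m2[0].parameters(), lr=0.0001)

w0 = m1[0].weight.clone()  # first-layer weights before training

for model, opt in ((m1, opt1), (m2, opt2)):
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()

# First-layer weights match across methods; second-layer weights stay untouched in both
print(torch.allclose(m1[0].weight, m2[0].weight))  # True
print(torch.allclose(m1[1].weight, m2[1].weight))  # True
```

With method 1 the frozen parameters never receive a `.grad`, and AdamW skips parameters whose gradient is `None`, so the first-layer updates come out identical either way.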

So, which method is preferred? Or should we combine both:

```
for param in model[1].parameters():
    param.requires_grad = False
optimizer = optim.AdamW(model[0].parameters(), lr=0.0001)
```
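For context, one observable difference between the two ingredients (a sketch, assuming a single backward pass): with `requires_grad = False`, autograd does not compute a gradient for the second layer at all, whereas restricting the optimizer alone still computes that gradient and simply never applies it.

```python
import torch
import torch.nn as nn
from collections import OrderedDict

model = nn.Sequential(OrderedDict([
    ('1', nn.Linear(3, 3)),
    ('2', nn.Linear(3, 3)),
]))

# Freeze the second layer
for p in model[1].parameters():
    p.requires_grad = False

model(torch.randn(4, 3)).sum().backward()

print(model[0].weight.grad is None)  # False: the first layer received a gradient
print(model[1].weight.grad is None)  # True: no gradient was computed for the frozen layer
```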