Consider the following 2-layer NN:

```
from collections import OrderedDict
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(OrderedDict([
    ('1', nn.Linear(3, 3)),
    ('2', nn.Linear(3, 3)),
]))
```

Suppose that I want to freeze the second layer, and train only the first layer.

I think there are two ways to achieve this:

- Set `requires_grad` of the second layer's parameters to `False`, then train:

```
# Freeze the second layer: autograd no longer computes gradients for it.
for param in model[1].parameters():
    param.requires_grad = False
optimizer = optim.AdamW(model.parameters(), lr=0.0001)
```

- Restrict the optimizer to the parameters of the first layer:

`optimizer = optim.AdamW(model[0].parameters(), lr=0.0001)`

Both methods yield the same values for the parameters of the first layer.
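A check along the following lines can confirm this. It is only a sketch: the helper names `make_model` and `train_first_layer`, the random data, the seed, the MSE loss, and the number of steps are all assumptions made for illustration. It also shows one observable difference between the two methods: whether `.grad` gets populated for the second layer.

```python
import copy
from collections import OrderedDict

import torch
import torch.nn as nn
import torch.optim as optim


def make_model():
    # Same 2-layer model as in the question.
    return nn.Sequential(OrderedDict([
        ('1', nn.Linear(3, 3)),
        ('2', nn.Linear(3, 3)),
    ]))


def train_first_layer(model, freeze_grad, restrict_optimizer, x, y, steps=5):
    # Hypothetical helper: trains the model with either (or both) methods.
    if freeze_grad:
        for param in model[1].parameters():
            param.requires_grad = False
    params = model[0].parameters() if restrict_optimizer else model.parameters()
    optimizer = optim.AdamW(params, lr=0.0001)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()
    return model


torch.manual_seed(0)
base = make_model()
x, y = torch.randn(8, 3), torch.randn(8, 3)  # made-up data

# Start both runs from identical weights.
m1 = train_first_layer(copy.deepcopy(base), True, False, x, y)   # method 1
m2 = train_first_layer(copy.deepcopy(base), False, True, x, y)   # method 2

# First-layer parameters should match between the two methods.
same_first = all(
    torch.allclose(a, b)
    for a, b in zip(m1[0].parameters(), m2[0].parameters())
)
# Second-layer parameters should be untouched in both runs.
second_unchanged = all(
    torch.equal(a, b) and torch.equal(c, b)
    for a, b, c in zip(m1[1].parameters(), base[1].parameters(), m2[1].parameters())
)
print(same_first, second_unchanged)

# With requires_grad=False no gradient is stored for the second layer;
# with the optimizer-only restriction, backward() still computes one.
print(m1[1].weight.grad is None, m2[1].weight.grad is None)
```

In this sketch the second layer stays fixed either way, but with method 2 alone `backward()` still computes and stores gradients for the second layer, which costs some extra compute and memory.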

So, which method is preferred? Or should the two be combined:

```
# Freeze the second layer and also give the optimizer only the first layer.
for param in model[1].parameters():
    param.requires_grad = False
optimizer = optim.AdamW(model[0].parameters(), lr=0.0001)
```