alihasn
Say that we have a simple model:

```
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(2, 4)
        self.fc2 = nn.Linear(4, 4)
        self.fc3 = nn.Linear(4, 2)

    def forward(self, x):
        x = torch.tanh(self.fc1(x))  # F.tanh is deprecated in favor of torch.tanh
        x = torch.tanh(self.fc2(x))
        x = self.fc3(x)
        x = torch.sigmoid(x)
        return x

model = Net()
loss_func = nn.MSELoss()
# LEARNING_RATE, MOMENTUM and L2_REG are hyperparameters defined elsewhere
optimizer = torch.optim.SGD(model.parameters(), lr=LEARNING_RATE,
                            momentum=MOMENTUM, weight_decay=L2_REG)
```

and we then modify the model, for example by normalizing the weights of the last layer by their maximum:

```
model.fc3.weight = nn.Parameter(model.fc3.weight/torch.max(model.fc3.weight))
```

Would the optimizer be aware of this replacement, i.e. when I call

```
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

will the correct gradients be computed and the optimization step be applied to the new weights?

ptrblck
No, as can be seen using your code snippet:

```
import torch
import torch.nn as nn

# Net is defined as in the question
model = Net()
loss_func = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1.)

# This assigns a *new* Parameter object to fc3.weight
model.fc3.weight = nn.Parameter(model.fc3.weight / torch.max(model.fc3.weight))
out = model(torch.randn(1, 2))
out.mean().backward()
print(model.fc3.weight)
# Parameter containing:
# tensor([[ 0.9226, -0.6592, -0.6971,  0.8911],
#         [-0.8582,  0.9190,  0.2713,  1.0000]], requires_grad=True)

optimizer.step()
print(model.fc3.weight)  # unchanged: the optimizer does not update the new parameter
# Parameter containing:
# tensor([[ 0.9226, -0.6592, -0.6971,  0.8911],
#         [-0.8582,  0.9190,  0.2713,  1.0000]], requires_grad=True)

print(model.fc3.weight.grad)  # the gradient itself was computed...
# tensor([[ 0.0622, -0.0140, -0.0225, -0.0629],
#         [ 0.0505, -0.0114, -0.0183, -0.0511]])

optimizer.zero_grad()
print(model.fc3.weight.grad)  # ...but zero_grad() does not reset it either
# tensor([[ 0.0622, -0.0140, -0.0225, -0.0629],
#         [ 0.0505, -0.0114, -0.0183, -0.0511]])
```
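To see why neither `step()` nor `zero_grad()` touches the new weight, one can check whether the optimizer's `param_groups` still contain the current `fc3.weight` object. A minimal sketch, assuming the `Net` from the question (re-declared so the snippet runs standalone); `optimizer_tracks` is a hypothetical helper name introduced only for this illustration:

```python
import torch
import torch.nn as nn


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(2, 4)
        self.fc2 = nn.Linear(4, 4)
        self.fc3 = nn.Linear(4, 2)

    def forward(self, x):
        x = torch.tanh(self.fc1(x))
        x = torch.tanh(self.fc2(x))
        return torch.sigmoid(self.fc3(x))


def optimizer_tracks(optimizer, param):
    # Hypothetical helper: True if this exact tensor object (by identity)
    # appears in any of the optimizer's parameter groups.
    return any(p is param for group in optimizer.param_groups for p in group["params"])


model = Net()
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)
old_weight = model.fc3.weight

# Assigning a new nn.Parameter replaces the registered object on the module,
# but the optimizer keeps the references it was constructed with.
model.fc3.weight = nn.Parameter(model.fc3.weight / torch.max(model.fc3.weight))

print(optimizer_tracks(optimizer, old_weight))        # True
print(optimizer_tracks(optimizer, model.fc3.weight))  # False
```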

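The optimizer still holds a reference to the old `fc3.weight`, so the new parameter is neither updated by `optimizer.step()` nor reset by `optimizer.zero_grad()`. You would have to add the new parameter to the optimizer again.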
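A minimal sketch of one way to do that, assuming the `Net` from the question (re-declared so the snippet runs on its own) and assuming it is acceptable to discard any optimizer state: re-create the optimizer after replacing the parameter, then check that `step()` now changes `fc3.weight`.

```python
import torch
import torch.nn as nn


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(2, 4)
        self.fc2 = nn.Linear(4, 4)
        self.fc3 = nn.Linear(4, 2)

    def forward(self, x):
        x = torch.tanh(self.fc1(x))
        x = torch.tanh(self.fc2(x))
        return torch.sigmoid(self.fc3(x))


torch.manual_seed(0)
model = Net()
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)

# Replace fc3.weight with a normalized copy (a new Parameter object).
model.fc3.weight = nn.Parameter(model.fc3.weight / torch.max(model.fc3.weight))

# Re-create the optimizer so it references the current parameters.
# Note: this discards any optimizer state (e.g. momentum buffers).
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)

before = model.fc3.weight.detach().clone()
optimizer.zero_grad()
out = model(torch.randn(1, 2))
out.mean().backward()
optimizer.step()

changed = not torch.equal(before, model.fc3.weight.detach())
print(changed)  # True
```

If the existing optimizer state should be kept, `optimizer.add_param_group({"params": [model.fc3.weight]})` registers the new parameter in an additional group instead, although the stale old weight remains in the first group. Another option, not covered above, is to normalize the existing weight in place inside a `torch.no_grad()` block (e.g. `model.fc3.weight.div_(model.fc3.weight.max())`), which keeps the same Parameter object so the optimizer never needs to be updated.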


alihasn
I appreciate you taking the time to answer. Thanks!