Do optimizers track tensors that are updated even after the forward pass through the network?

Say that we have a simple model:

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(2, 4)
        self.fc2 = nn.Linear(4, 4)
        self.fc3 = nn.Linear(4, 2)

    def forward(self, x):
        x = torch.tanh(self.fc1(x))
        x = torch.tanh(self.fc2(x))
        x = self.fc3(x)
        x = torch.sigmoid(x)
        return x

model = Net()
loss_func = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=LEARNING_RATE, momentum=MOMENTUM, weight_decay=L2_REG)

and we then perform some operation on the weights, e.g. a normalization of the last layer's weights:

model.fc3.weight = nn.Parameter(model.fc3.weight/torch.max(model.fc3.weight))

Would this last operation be tracked by the optimizer, and when I call

optimizer.zero_grad()
loss.backward()
optimizer.step()

will the appropriate gradients be computed and the appropriate optimization step be taken?

No, as can be seen using your code snippet. Assigning a new nn.Parameter to model.fc3.weight creates a new tensor object, while the optimizer still holds a reference to the old one, so step() and zero_grad() no longer affect the parameter that is actually used in the forward pass:

model = Net()
loss_func = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1.)

model.fc3.weight = nn.Parameter(model.fc3.weight/torch.max(model.fc3.weight))

out = model(torch.randn(1, 2))
out.mean().backward()

print(model.fc3.weight)
> Parameter containing:
tensor([[ 0.9226, -0.6592, -0.6971,  0.8911],
        [-0.8582,  0.9190,  0.2713,  1.0000]], requires_grad=True)

optimizer.step()
print(model.fc3.weight)
> Parameter containing:
tensor([[ 0.9226, -0.6592, -0.6971,  0.8911],
        [-0.8582,  0.9190,  0.2713,  1.0000]], requires_grad=True)

print(model.fc3.weight.grad)
> tensor([[ 0.0622, -0.0140, -0.0225, -0.0629],
        [ 0.0505, -0.0114, -0.0183, -0.0511]])

optimizer.zero_grad()
print(model.fc3.weight.grad)
> tensor([[ 0.0622, -0.0140, -0.0225, -0.0629],
        [ 0.0505, -0.0114, -0.0183, -0.0511]])

You would have to add the new parameter to the optimizer again.
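
For example (a minimal sketch, assuming the same Net definition as above), you could either recreate the optimizer after swapping the parameter, or normalize the weight in place under torch.no_grad() so that the Parameter object the optimizer already references stays the same:

model = Net()

# Option 1: replace the parameter, then rebuild the optimizer so it
# references the new nn.Parameter object.
model.fc3.weight = nn.Parameter(model.fc3.weight / torch.max(model.fc3.weight))
optimizer = torch.optim.SGD(model.parameters(), lr=1.)

# Option 2: modify the existing parameter in place (outside of autograd),
# so the optimizer's existing reference to model.fc3.weight remains valid.
with torch.no_grad():
    model.fc3.weight.div_(torch.max(model.fc3.weight))

Option 2 never creates a new parameter, so nothing has to be re-registered with the optimizer.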


I appreciate you taking the time to answer. Thanks!