Can I do a backward step in optimization?

Hello everyone!

I was thinking: is there a way to make a backward optimization step?

Say the learning rate was too high, or I have not understood what momentum is and put in a nonsense value. After performing the optimization step, I re-evaluate the loss function and decide that I'm not happy with the result.

So I would like to take a step backwards, change the hyperparameters and try again, so that I end up with a better optimization trajectory. Is it possible to do something like this?

For now my code is pretty basic, using the built-in Adam optimizer:

outputs = model(images)
loss = lFun(input=outputs, target=targets)

optimizer.zero_grad()
loss.backward()   # compute gradients
optimizer.step()  # update the parameters

In other words is there a way of doing optimizer.backward_step()?
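
Something like the following is what I imagine (purely hypothetical, since backward_step() does not exist; loss_before and loss_after are just names to illustrate the idea):

# hypothetical usage, backward_step() does not exist in PyTorch
loss_before = lFun(input=model(images), target=targets)
optimizer.zero_grad()
loss_before.backward()
optimizer.step()

loss_after = lFun(input=model(images), target=targets)
if loss_after > loss_before:
    optimizer.backward_step()  # undo the last step, then adjust lr/momentum and retry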

I don’t quite understand your question. Would you like to implement your own backward method?

The optimizer actually explicitly doesn't consider the computation graph of the gradients, so you would have to implement that yourself.

I think he’s actually looking for a way to undo the optimizer.step() operation.

Exactly!

I see the optimizer as a blindfolded person walking randomly in the mountains and trying to get to the lowest point. And sometimes I would like to tell him to step back because he is not going in the right direction.

It seems I will have to create my own gradient descent function to see if something comes out of this idea.
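
For a plain gradient descent step (no momentum), the update could be reversed exactly by adding back what was subtracted. A rough sketch of what I have in mind (sgd_step and undo_sgd_step are just names I made up, not a PyTorch API):

import torch

def sgd_step(params, lr):
    # plain gradient descent: p <- p - lr * grad
    with torch.no_grad():
        for p in params:
            if p.grad is not None:
                p -= lr * p.grad

def undo_sgd_step(params, lr):
    # reverse the last step by adding the update back;
    # only valid while p.grad still holds the gradients used for that step
    with torch.no_grad():
        for p in params:
            if p.grad is not None:
                p += lr * p.grad

# usage: ps = list(model.parameters()); sgd_step(ps, 0.01); ...; undo_sgd_step(ps, 0.01)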

Thanks!

Actually it seems to be surprisingly easy, once you work with the state_dict instead of the optimizer. To test my idea I created a very small neural network:

import copy

import torch
import torch.nn as nn
from torch.autograd import Variable


class small_perceptron(nn.Module):
    def __init__(self, inp=3, mid=2, out=2):
        super(small_perceptron, self).__init__()
        self.lin1 = nn.Linear(inp, mid, bias=True)
        #self.lin2 = nn.Linear(mid, out, bias=True)

    def forward(self, x):
        x = self.lin1(x)
        #x = self.lin2(x)
        return x

And then performed a one step optimization process:

lr = 0.01
neuronet = small_perceptron()
neuronet.cuda()
loss = torch.nn.MSELoss().cuda()

print('copy initial state to k')
k = copy.deepcopy(neuronet.state_dict())  # k contains the neuronet state before optim
print(k)

optim = torch.optim.Adam(neuronet.parameters(), lr=lr, weight_decay=0.1)

A = Variable(torch.randn(3).cuda())  # random numbers, after all the goal is not convergence
B = Variable(torch.randn(2).cuda())
C = neuronet(A)
print('C before optim')
print(C)

L = loss(C, B)
optim.zero_grad()
L.backward(retain_graph=True)
optim.step()
C = neuronet(A)
print('C after optim')
print(C)  # C should be different

neuronet.load_state_dict(k)
C = neuronet(A)
print('C state_dict')
print(C)  # C should be the same again

And to my great pleasure:
C before optim is (0.1776, 0.1483),
C after optim is (0.1989, 0.1696),
and C after load_state_dict is (0.1776, 0.1483), the same as before the optimization.

So even though you can't do optim.step_backwards(), you can act as if optim.step() never happened, which is pretty much the same!
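
For convenience the pattern could be wrapped in a small helper, something like this sketch (step_with_rollback and compute_loss are names I invented, not an existing API):

import copy

def step_with_rollback(model, optim, compute_loss):
    # take one optimizer step, but restore the previous weights if the loss got worse;
    # compute_loss() should run a forward pass and return the loss tensor
    backup = copy.deepcopy(model.state_dict())  # weights before the step
    loss_before = compute_loss()
    optim.zero_grad()
    loss_before.backward()
    optim.step()
    loss_after = compute_loss()
    if loss_after.item() > loss_before.item():
        model.load_state_dict(backup)  # as if the step never happened
        return False
    return True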

Even though you restore the previous model parameters, I think you are still updating the internal state of your optimizer when calling optim.step(). Adam keeps running estimates of the first and second moments of the gradients, which are updated at each call to step(). The same is true for e.g. SGD with momentum.

Therefore, the next call to optim.step() will not give the same result (0.1989, 0.1696) even if the loss and everything else is the same. You could also store the optimizer's state_dict and reload it as part of the "back-stepping".
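
As a sketch, reusing the neuronet and optim objects from your example (the variable names are just for illustration):

import copy

# snapshot both the model and the optimizer before stepping
model_backup = copy.deepcopy(neuronet.state_dict())
optim_backup = copy.deepcopy(optim.state_dict())

optim.step()

# ... re-evaluate the loss, and if you are not happy with the result:
neuronet.load_state_dict(model_backup)
optim.load_state_dict(optim_backup)  # also rewinds Adam's running moment estimates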