I am trying to write a function to test different optimizers using the Rosenbrock function. The following code works (at least for some optimizers), but I don’t understand why we need to clone x and why it doesn’t work without cloning.
import math

import numpy as np
import torch


def rosenbrock(x):
    return 100*(x[1]-x[0]**2)**2 + (1-x[0])**2


def optimize(x, optimizer, l):
    if not isinstance(optimizer, torch.optim.Optimizer):
        raise ValueError("You should pass an optimizer")
    # replaces the parameters of a given optimizer with x
    rosenbrock_test_params = torch.nn.Parameter(torch.tensor(x, dtype=torch.float, requires_grad=True))
    optimizer.param_groups[0]['params'] = [rosenbrock_test_params]
    positions = []
    while rosenbrock(x) > l:
        # ! Why did we have to clone?
        x = optimizer.param_groups[0]['params'][0].clone()
        y = rosenbrock(x)
        # np.float was removed in NumPy 1.24; np.float64 is the equivalent dtype
        positions.append((x.detach().numpy().astype(np.float64), y.detach().numpy().astype(np.float64)))
        y.backward()
        optimizer.step()
        optimizer.zero_grad()
    if positions == []:
        print("Parameter was already good enough")
        x = optimizer.param_groups[0]['params'][0]
        y = rosenbrock(x)
        positions.append((x.detach().numpy().astype(np.float64), y.detach().numpy().astype(np.float64)))
        return positions
    if math.isnan(positions[-1][1].item()):
        raise ValueError("Optimizer diverges")
    # split the recorded (position, value) pairs into coordinate arrays
    positions = np.array(positions, dtype=object)
    x_y, Z = positions[:,0], positions[:,1]
    X = np.array([i[0] for i in x_y])
    Y = np.array([i[1] for i in x_y])
    return X, Y, Z


lr = 0.0001
l = 4.5
x = (-1.5,1.5)
optimizer = torch.optim.SGD([torch.tensor([])], lr=lr, momentum=0.1, nesterov=True)
X, Y, Z = optimize(x, optimizer, l)
The structure of param_groups is an implementation detail of the optimizer, and the way you use it might work for some optimizers but not in general.
The parameters should be passed when you initialize the optimizer, as some optimizers do non-trivial work with them during initialization.
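For illustration, here is a minimal sketch of the usual order of operations: the parameter is created first and then handed to the optimizer's constructor, so nothing needs to be swapped into param_groups afterwards. The names p, before and moved are made up for this example, and the hyperparameters are simply the ones from the question.

```python
import torch

# The parameter exists before the optimizer and is passed to its constructor.
p = torch.nn.Parameter(torch.tensor([-1.5, 1.5]))
optimizer = torch.optim.SGD([p], lr=1e-4, momentum=0.1, nesterov=True)

before = p.detach().clone()  # copy of the starting point, for comparison

# One optimization step on the Rosenbrock function.
loss = 100 * (p[1] - p[0] ** 2) ** 2 + (1 - p[0]) ** 2
loss.backward()
optimizer.step()       # updates p in place
optimizer.zero_grad()

# True if the step changed the parameter values.
moved = not torch.equal(before, p.detach())
```

Because the optimizer updates p in place, the training loop can keep using p directly and never has to read it back out of the optimizer.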
But then how can I create a function which takes an optimizer and some parameters as input?
Let’s say in this example, I pass the optimizer with the parameter x already inside, but then how can I continue iterating over the parameters without accessing them via param_groups?
In this case, you won’t be able to pass an already created optimizer.
You can pass the constructor and its arguments instead, though: optimize(x, optim.SGD, optim_args, l)
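A rough sketch of what that signature could look like is below. This is only one possible reading of the suggestion, not a tested replacement for the original function: optimizer_cls, optim_args, threshold and max_steps are names chosen for this example, and max_steps is an extra safety cap that the original code does not have.

```python
import torch


def rosenbrock(x):
    return 100 * (x[1] - x[0] ** 2) ** 2 + (1 - x[0]) ** 2


def optimize(x0, optimizer_cls, optim_args, threshold, max_steps=10_000):
    # Create the parameter first, then let the optimizer's constructor see it.
    params = torch.nn.Parameter(torch.tensor(x0, dtype=torch.float))
    optimizer = optimizer_cls([params], **optim_args)

    positions = []
    for _ in range(max_steps):  # hypothetical cap so the loop always ends
        optimizer.zero_grad()
        y = rosenbrock(params)
        # Store a copy of the values so later in-place updates cannot alter the record.
        positions.append((params.detach().clone().numpy(), y.item()))
        if y.item() <= threshold:
            break
        y.backward()
        optimizer.step()
    return positions


positions = optimize(
    (-1.5, 1.5),
    torch.optim.SGD,
    {"lr": 1e-4, "momentum": 0.1, "nesterov": True},
    threshold=4.5,
)
```

The function keeps its own reference to the parameter, so it never needs to touch param_groups, and any optimizer class that accepts a list of parameters plus keyword arguments can be plugged in the same way.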
So, it looks like passing a reference is causing the need to clone. I don’t understand why though.
Any idea about that, or do you think that it’s still an implementation detail depending on the optimizer?
It definitely is. Other optimizers will just plain error out if you try to add the parameters after they are initialized. So it is 100% an implementation detail.
I tried some built-in optimizers, and they either work in both cases or give an error in both cases. A lot of them seem to work for this specific task.
But I’d be happy if I could understand why we needed clone() and why passing a reference didn’t work. I tried to replicate this problem without using an optimizer but couldn’t.