Weights don't update without using clone

I am trying to write a function to test different optimizers on the Rosenbrock function. The following code works (at least for some optimizers), but I don’t understand why we need to clone x, or why it doesn’t work without cloning.

```python
import math

import numpy as np
import torch

def rosenbrock(x):
    return 100*(x[1]-x[0]**2)**2 + (1-x[0])**2

def optimize(x, optimizer, l):

    if not isinstance(optimizer, torch.optim.Optimizer):
        raise ValueError("You should pass an optimizer")
    # replaces the parameters of a given optimizer with x
    rosenbrock_test_params = torch.nn.Parameter(torch.tensor(x, dtype=torch.float, requires_grad=True))
    optimizer.param_groups[0]['params'] = [rosenbrock_test_params]

    positions = []

    # np.float was removed in NumPy 1.24; np.float64 is the equivalent type
    while rosenbrock(x) > l:
        # ! Why did we have to clone?
        x = optimizer.param_groups[0]['params'][0].clone()
        y = rosenbrock(x)
        positions.append((x.detach().numpy().astype(np.float64), y.detach().numpy().astype(np.float64)))

    if positions == []:
        print("Parameter was already good enough")
        x = optimizer.param_groups[0]['params'][0]
        y = rosenbrock(x)
        positions.append((x.detach().numpy().astype(np.float64), y.detach().numpy().astype(np.float64)))
        return positions

    if math.isnan(positions[-1][1].item()):
        raise ValueError("Optimizer diverges")

    positions = np.array(positions, dtype=object)
    x_y, Z = positions[:,0], positions[:,1]
    X = np.array([i[0] for i in x_y])
    Y = np.array([i[1] for i in x_y])
    return X, Y, Z

lr = 0.0001
l = 4.5
x = (-1.5,1.5)

optimizer = torch.optim.SGD([torch.tensor([])], lr=lr, momentum=0.1, nesterov=True)
X, Y, Z = optimize(x, optimizer, l)
```

Any help would be appreciated!


The structure of param_groups is an implementation detail of the optimizer, and the way you use it might work for some optimizers but not in general.
The parameters should be passed when you construct the optimizer, because some optimizers do non-trivial work with them during initialization.
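A small sketch of how overwriting param_groups can work for one optimizer and silently fail for another. It assumes a recent PyTorch where SGD re-reads `param_groups` on every `step()`, while LBFGS keeps its own reference to the original parameter list, taken in its `__init__`. That is an internal detail and may change between versions. The helper name `swap_in_and_step` is hypothetical.

```python
import torch


def rosenbrock(x):
    return 100 * (x[1] - x[0] ** 2) ** 2 + (1 - x[0]) ** 2


def swap_in_and_step(optimizer_cls, **kwargs):
    """Hypothetical helper: build an optimizer on a dummy parameter, overwrite
    param_groups with a new parameter (the pattern from the question, not a
    supported API), take one step and return the new parameter's values."""
    dummy = torch.nn.Parameter(torch.zeros(2))
    opt = optimizer_cls([dummy], **kwargs)
    new = torch.nn.Parameter(torch.tensor([-1.5, 1.5]))
    opt.param_groups[0]["params"] = [new]

    def closure():
        opt.zero_grad()
        loss = rosenbrock(new)
        loss.backward()
        return loss

    opt.step(closure)  # both SGD and LBFGS accept a closure
    return new.detach()


start = torch.tensor([-1.5, 1.5])
print(swap_in_and_step(torch.optim.SGD, lr=1e-4))   # moved away from start
print(swap_in_and_step(torch.optim.LBFGS, lr=0.1))  # unchanged
```

In this sketch LBFGS looks at the gradient of the dummy parameter it captured at construction, finds it zero, and returns without touching the swapped-in one.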

Thanks for the reply!

But then how can I create a function which takes an optimizer and some parameters as input?
Let’s say that in this example I pass the optimizer with the parameter x already inside; then how can I keep iterating over the parameters without accessing them via param_groups?

In this case, you won’t be able to pass an already created optimizer.
You can pass the constructor instead, though: optimize(x, optim.SGD, optim_args, l)
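A minimal sketch of the constructor-passing approach, assuming the keyword arguments arrive as a dict `optim_args` and adding a hypothetical `max_iter` safety cap. It also includes the `zero_grad()`/`backward()`/`step()` calls, which do not appear in the loop of the question as posted. Optimizers whose `step()` requires a closure, such as LBFGS, would need `optimizer.step(closure)` instead.

```python
import torch


def rosenbrock(x):
    return 100 * (x[1] - x[0] ** 2) ** 2 + (1 - x[0]) ** 2


def optimize(x0, optimizer_cls, optim_args, l, max_iter=100_000):
    """Minimize rosenbrock from x0 until the loss is <= l (or max_iter steps).

    The optimizer is constructed here, around the parameter, so no
    param_groups manipulation is needed.
    """
    x = torch.nn.Parameter(torch.tensor(x0, dtype=torch.float))
    optimizer = optimizer_cls([x], **optim_args)

    positions = []  # (x, loss) snapshots as plain Python values
    for _ in range(max_iter):
        optimizer.zero_grad()
        loss = rosenbrock(x)
        # tolist()/item() copy the values out, so later in-place
        # updates of x do not change what was recorded
        positions.append((x.detach().tolist(), loss.item()))
        if loss.item() <= l:
            break
        loss.backward()
        optimizer.step()
    return positions


positions = optimize((-1.5, 1.5), torch.optim.SGD,
                     {"lr": 1e-4, "momentum": 0.1, "nesterov": True}, 4.5)
print(len(positions), positions[-1])
```

If the starting point is already below `l`, this returns a single position without stepping, so the "already good enough" case needs no special branch.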

Thank you, indeed passing optim.SGD worked, and so did the simpler approach of calling

```python
x = torch.nn.Parameter(torch.tensor([-1.5,1.5], dtype=torch.float, requires_grad=True))
optimizer = torch.optim.SGD([x], lr=lr, momentum=0.1, nesterov=True)
```

and removing .clone(), instead of calling it with

```python
x = (-1.5,1.5)
optimizer = torch.optim.SGD([torch.tensor([])], lr=lr, momentum=0.1, nesterov=True)
```

and setting the parameter manually.

An even simpler solution also worked: removing .clone() and rosenbrock_test_params, and setting the parameter directly with

```python
optimizer.param_groups[0]['params'] = [torch.nn.Parameter(torch.tensor(x, dtype=torch.float, requires_grad=True))]
```

So, it looks like passing a reference is causing the need to clone. I don’t understand why though.
Any idea about that, or do you think that it’s still an implementation detail depending on the optimizer?
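One general way a missing clone() can matter, shown as a sketch: `detach()` (and `numpy()` on a CPU tensor) returns a view that shares memory with the parameter, so anything kept that way changes when `optimizer.step()` updates the parameter in place, while `clone()` takes an independent copy. Whether this is what happened in the code above is not clear, since its `astype(np.float64)` call already copies the values; this only illustrates the reference-versus-copy behavior.

```python
import torch

p = torch.nn.Parameter(torch.tensor([-1.5, 1.5]))
opt = torch.optim.SGD([p], lr=1e-4)

view = p.detach()              # shares storage with p
snapshot = p.detach().clone()  # independent copy of the current values

loss = 100 * (p[1] - p[0] ** 2) ** 2 + (1 - p[0]) ** 2
loss.backward()
opt.step()  # SGD updates p in place

print(view)      # shows the updated values
print(snapshot)  # still holds the values from before the step
```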

> Any idea about that, or do you think that it’s still an implementation detail depending on the optimizer?

It definitely is. Other optimizers will just plain error out if you try to add the parameters after they are initialized. So 100% implementation detail :slight_smile:

I tried some built-in optimizers, and they either work in both cases or give an error in both cases. Many of them seem to work for this specific task.
But I’d still like to understand why we needed clone(), i.e. why passing a reference didn’t work. I tried to replicate this problem without using an optimizer but couldn’t.

To be honest, I am not sure why the .clone() changes the behavior. You will most likely have to dive into the optimizer implementation to find out :confused:
