Weights don't update without using clone

Emre_Yalcinoglu · December 7, 2020, 9:39pm

I am trying to write a function to test different optimizers on the Rosenbrock function. The following code works (at least for some optimizers), but I don’t understand why we need to clone x, or why it doesn’t work without cloning.

import math

import numpy as np
import torch

def rosenbrock(x):
	return 100*(x[1]-x[0]**2)**2 + (1-x[0])**2


def optimize(x, optimizer, l):

	if not isinstance(optimizer, torch.optim.Optimizer):
		raise ValueError("You should pass an optimizer")
    
	# replaces the parameters of a given optimizer with x
	rosenbrock_test_params = torch.nn.Parameter(torch.tensor(x, dtype=torch.float, requires_grad=True))
	optimizer.param_groups[0]['params'] = [rosenbrock_test_params]

	positions = []

	while rosenbrock(x) > l:
		# ! Why did we have to clone?
		x = optimizer.param_groups[0]['params'][0].clone() 
		y = rosenbrock(x)
        
		# np.float was removed from NumPy; np.float64 is the equivalent dtype
		positions.append((x.detach().numpy().astype(np.float64), y.detach().numpy().astype(np.float64)))
        
		y.backward()
		optimizer.step()
		optimizer.zero_grad()

	if not positions:
		print("Parameter was already good enough")
		x = optimizer.param_groups[0]['params'][0]
		y = rosenbrock(x)
		positions.append((x.detach().numpy().astype(np.float64), y.detach().numpy().astype(np.float64)))
		return positions

	if math.isnan(positions[-1][1].item()):
		raise ValueError("Optimizer diverges")

	positions = np.array(positions, dtype=object)
	x_y, Z = positions[:,0], positions[:,1]
	X = np.array([i[0] for i in x_y])
	Y = np.array([i[1] for i in x_y])
    
	return X, Y, Z

lr = 0.0001
l = 4.5
x = (-1.5,1.5)

optimizer = torch.optim.SGD([torch.tensor([])], lr=lr, momentum=0.1, nesterov=True)
X, Y, Z = optimize(x, optimizer, l)

Any help would be appreciated!

albanD · December 8, 2020, 5:22pm

Hi,

The structure of param_groups is an implementation detail of the optimizer, so the way you use it might work for some optimizers but not in general.
The parameters should be passed when you initialize the optimizer, because some optimizers do non-trivial work with them during initialization.

Emre_Yalcinoglu · December 8, 2020, 5:50pm

Thanks for the reply!

But then how can I create a function which takes an optimizer and some parameters as input?
Let’s say in this example, I pass the optimizer with the parameter x already inside, but then how can I continue iterating over the parameters without accessing them via param_groups?

albanD · December 8, 2020, 6:49pm

In this case, you won’t be able to pass an already created optimizer.
You can pass the construction method instead, though: optimize(x, optim.SGD, optim_args, l)
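
One way to read this suggestion, as a minimal sketch rather than a definitive implementation: optimize receives the optimizer class plus its keyword arguments and builds the optimizer itself around a freshly created parameter, so nothing has to be swapped into param_groups afterwards. The names optim_cls, optim_kwargs, and max_steps are illustrative assumptions and do not come from the thread; max_steps is only there so the loop cannot run forever if an optimizer stalls.

```python
import math

import torch


def rosenbrock(x):
    return 100 * (x[1] - x[0] ** 2) ** 2 + (1 - x[0]) ** 2


def optimize(x0, optim_cls, optim_kwargs, l, max_steps=10_000):
    # The parameter exists first, so it can be handed to the optimizer's constructor.
    params = torch.nn.Parameter(torch.tensor(x0, dtype=torch.float))
    optimizer = optim_cls([params], **optim_kwargs)

    positions = []
    for _ in range(max_steps):  # max_steps is an assumed safety cap
        optimizer.zero_grad()
        y = rosenbrock(params)
        # Store a copy of the current point together with its loss value.
        positions.append((params.detach().clone().numpy(), y.item()))
        if math.isnan(y.item()):
            raise ValueError("Optimizer diverges")
        if y.item() <= l:
            break
        y.backward()
        optimizer.step()
    return positions


positions = optimize(
    (-1.5, 1.5),
    torch.optim.SGD,
    {"lr": 1e-4, "momentum": 0.1, "nesterov": True},
    l=4.5,
)
print(len(positions), positions[-1])
```

Because the function owns both the parameter and the optimizer, the same call should work for any optimizer class that accepts a list of parameters plus keyword arguments.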

Emre_Yalcinoglu · December 9, 2020, 5:44pm

Thank you. Indeed, passing optim.SGD or, more simply, calling

x = torch.nn.Parameter(torch.tensor([-1.5,1.5], dtype=torch.float, requires_grad=True))
optimizer = torch.optim.SGD([x], lr=lr, momentum=0.1, nesterov=True)

and removing .clone(), instead of calling it with

x = (-1.5,1.5)
optimizer = torch.optim.SGD([torch.tensor([])], lr=lr, momentum=0.1, nesterov=True)

and setting the parameter manually, worked.

An even simpler solution also worked: removing .clone(), removing rosenbrock_test_params, and setting the parameter directly with

optimizer.param_groups[0]['params'] = [torch.nn.Parameter(torch.tensor(x, dtype=torch.float, requires_grad=True))]

So, it looks like passing a reference is causing the need to clone. I don’t understand why though.
Any idea about that, or do you think that it’s still an implementation detail depending on the optimizer?

albanD · December 9, 2020, 5:45pm

> Any idea about that, or do you think that it’s still an implementation detail depending on the optimizer?

It definitely is. Other optimizers will just plain error out if you try to add the parameters after they are initialized. So it is 100% an implementation detail.

Emre_Yalcinoglu · December 9, 2020, 7:06pm

I tried some built-in optimizers, and each of them either works in both cases or gives an error in both cases. A lot of them seem to work for this specific task.
But I’d be happy if I could understand why we needed clone() and why passing a reference didn’t work. I tried to replicate this problem without using an optimizer but couldn’t.
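
For reference, .clone() is itself a differentiable operation: the clone is a non-leaf tensor, and gradients computed through it flow back to the original parameter. The sketch below deliberately uses no optimizer and checks that the gradient stored on the parameter is the same whether rosenbrock is evaluated on the parameter or on its clone. It does not explain the behavior seen with the optimizer; it only narrows down what .clone() does on the autograd side. The helper name grad_at is hypothetical and introduced only for illustration.

```python
import torch


def rosenbrock(x):
    return 100 * (x[1] - x[0] ** 2) ** 2 + (1 - x[0]) ** 2


def grad_at(point, use_clone):
    # Hypothetical helper: gradient of rosenbrock w.r.t. a fresh parameter.
    p = torch.nn.Parameter(torch.tensor(point, dtype=torch.float))
    x = p.clone() if use_clone else p
    rosenbrock(x).backward()
    # x is a leaf tensor only when it is the parameter itself.
    return p.grad.clone(), x.is_leaf


grad_ref, leaf_ref = grad_at((-1.5, 1.5), use_clone=False)
grad_clone, leaf_clone = grad_at((-1.5, 1.5), use_clone=True)

print(grad_ref, leaf_ref)      # gradient obtained through the parameter
print(grad_clone, leaf_clone)  # gradient obtained through the clone
```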

albanD · December 10, 2020, 1:14am

To be honest, I am not sure why the .clone() changes the behavior. You will most likely have to dive into the optimizer implementation to find out.
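
As a possible starting point for that investigation, here is a small sketch, assuming SGD with momentum is the optimizer of interest. It swaps a parameter into param_groups as in the original post, takes one step, and prints what the optimizer holds afterwards. It relies only on the param_groups and state attributes; the exact keys stored in state (for example a momentum buffer) are an implementation detail and may differ between optimizers and versions.

```python
import torch


def rosenbrock(x):
    return 100 * (x[1] - x[0] ** 2) ** 2 + (1 - x[0]) ** 2


# Same construction as in the original post: a placeholder tensor, then a swap.
optimizer = torch.optim.SGD([torch.tensor([])], lr=1e-4, momentum=0.1, nesterov=True)
p = torch.nn.Parameter(torch.tensor((-1.5, 1.5), dtype=torch.float))
optimizer.param_groups[0]["params"] = [p]

before = p.detach().clone()
rosenbrock(p).backward()
optimizer.step()

# Is the optimizer updating the very tensor object we hold a reference to?
same_object = optimizer.param_groups[0]["params"][0] is p
# Did the single step actually move the parameter?
changed = not torch.equal(before, p.detach())
# Per-parameter state is looked up with the parameter tensor as the key.
state_keys = list(optimizer.state[p].keys())

print(same_object, changed, state_keys)
```

Running the same inspection inside the original loop, once with and once without .clone(), would show whether the tensor being evaluated and the tensor being updated are still the same object.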