Optimizer won't update weights

So I'm trying to write my own optimizer and I found out that the parameters aren't being updated. I tried to follow the example of optim.SGD (https://github.com/pytorch/pytorch/blob/master/torch/optim/sgd.py).

from copy import deepcopy

import torch
from torch.optim import Optimizer


class AGCD(Optimizer):
    def __init__(self, params, theta=1):
        defaults = dict(theta=theta)
        super(AGCD, self).__init__(params, defaults)
        self.x_update_params = deepcopy(self.param_groups)
        self.z_update_params = deepcopy(self.param_groups)
        self.theta_update_params = deepcopy(self.param_groups)
        for group in self.theta_update_params:
            for i in range(len(group['params'])):
                group['params'][i] = torch.ones(1)

    def __setstate__(self, state):
        super(AGCD, self).__setstate__(state)

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()

        for group_y, group_x, group_z, group_theta in zip(self.param_groups,
                                                          self.x_update_params,
                                                          self.z_update_params,
                                                          self.theta_update_params):

            for y, x, z, theta in zip(group_y['params'],
                                      group_x['params'],
                                      group_z['params'],
                                      group_theta['params']):
                if y.grad is None:
                    continue

                d_y = y.grad
                j1 = j2 = my_argmax(d_y)  # my_argmax and update_theta are defined elsewhere
                y = (x * (1 - theta).expand_as(x)).add(z * theta.expand_as(z))  # use pytorch functions for math
                temp = torch.zeros(x.size())
                temp[j1] = 1 * 0.2 * x[j1]
                x = y.sub(temp)
                temp = torch.zeros(z.size())
                temp[j2] = 5 * 0.2 * z[j2]
                theta = update_theta(theta)

        return loss

BTW this is my first time posting on a forum so please excuse any sins.
Edit: To be more specific, the parameters used to compute the update aren't themselves being updated (theta stays at 1 at all times).
Edit2: I think it's because I have to use PyTorch operations instead of "=", so that the original object gets modified and not just a local copy of it.
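A minimal sketch of what I mean, with a made-up list standing in for a param group:

import torch

params = [torch.ones(3)]
p = params[0]

p = p - 1          # "=" only rebinds the local name p; params[0] is untouched
print(params[0])   # tensor([1., 1., 1.])

params[0].sub_(1)  # an in-place op modifies the stored tensor itself
print(params[0])   # tensor([0., 0., 0.])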

Hi, just a short comment: you are doing a deep copy of the param groups. The copies no longer point to the memory addresses of the model weights.
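A small standalone check of that (not your optimizer, just the deepcopy behaviour):

import torch
from copy import deepcopy

p = torch.nn.Parameter(torch.zeros(3))
groups = [{'params': [p]}]
copied = deepcopy(groups)

print(copied[0]['params'][0] is p)                        # False: a different object
print(copied[0]['params'][0].data_ptr() == p.data_ptr())  # False: it has its own storage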


Thank you @JuanFMontesinos, but that's why I'm doing the deepcopy. The copies (x, z and theta) are used to update the original parameters (y) in step(), and they each need to be their own separate thing. Every time I call step() each of the parameter groups should be updated, but they stay the same.

I'm a bit lost with your code.
What I see is that you are assigning y, for example here:

y = (x*(1-theta).expand_as(x)).add(z*theta.expand_as(z)) #use pytorch functions for math

Thus y is no longer pointing to the deep copy you made but to an inner variable in the scope.

x = y.sub(temp)

Same there: you are reassigning x, thus overwriting the reference into the param group again.
You can either use in-place operators (in PyTorch some functions have two versions, with and without a trailing underscore; the underscored one is in-place).
Example:

a = torch.tensor(5.)  # a tensor
a.add(4)   # out-of-place: returns a new tensor
# a is still 5
a.add_(4)  # in-place
# a is now 9

or just assign to an attribute of y, like y.grad = whatever_you_want


Well, y should be pointing to the inner variable, but every step I update the inner variables based on the deep copies. Anyway, I found the problem. Thanks a lot!!!
The thing is, I was using e.g.:
y = x+3
instead of
y.copy_(x+3)
so none of the variables were being updated.
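For completeness, here is a sketch of the inner loop with in-place ops instead of "=" (my_argmax and update_theta are my own helpers from the full code, and this still runs under the @torch.no_grad() on step()):

d_y = y.grad
j1 = j2 = my_argmax(d_y)
y.copy_(x * (1 - theta).expand_as(x) + z * theta.expand_as(z))
temp = torch.zeros_like(x)
temp[j1] = 1 * 0.2 * x[j1]
x.copy_(y - temp)
temp = torch.zeros_like(z)
temp[j2] = 5 * 0.2 * z[j2]
theta.copy_(update_theta(theta))  # assuming update_theta returns a tensor of the same shape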