I have written a simple SGD optimiser in 1.1.4 as:
class Optimiser():
    def __init__(self, params, lr):
        self.params = params
        self.lr = lr

    def step(self):
        for p in self.params:
            p.data.add_(-self.lr, p.grad)

    def zero_grad(self):
        for p in self.params:
            p.grad.zero_()
My train loop is:
sgd = Optimiser(model.parameters(), 0.01)
for epoch in range(epochs):
    for i, data in enumerate(trainloader):
        xb = data[0]
        yb = data[1]
        pred = model(xb)
        loss = loss_func(pred, yb)
        loss.backward()
        sgd.step()
        sgd.zero_grad()
And the loss does not decrease at all. In fact I put a print statement in:
def step(self):
    for p in self.params:
        p.data.add_(-self.lr, p.grad)
        print('Here')
And 'Here' was only printed in the first step and stopped being printed after that, meaning somehow I stop entering the parameter-update loop. I don't understand why this would happen.
If instead of sgd.step() I call
for p in model.parameters(): p.data.add_(-0.01, p.grad)
my model does start learning. So what is happening?
I would say that it's not even entering the loop because you have exhausted the iterator. Pass the model instead and call model.parameters() inside step.
But PyTorch's optimisers take in model.parameters() and seem to work. Why would mine not?
Because you are not doing the same as they do. The code is quite different; having the same input doesn't mean it will behave the same way.
In simple words, a generator 'extracts' parameters (internally) on demand. Once all the parameters have been extracted, the generator is empty.
That leads to the situation where, in
for p in self.params:
self.params is already empty, and that's why 'Here' is never printed again. Compare with torch.optim.Optimizer's __init__:
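The exhaustion behaviour can be reproduced with a plain generator, independent of PyTorch:

```python
# A generator can only be consumed once; a list can be iterated repeatedly.
gen = (x * x for x in range(3))
print(list(gen))  # [0, 1, 4] -- first pass consumes the generator
print(list(gen))  # []        -- second pass finds it empty

lst = [x * x for x in range(3)]
print(list(lst))  # [0, 1, 4]
print(list(lst))  # [0, 1, 4] -- lists restart on every iteration
```

This is exactly what happens to self.params after the first call to step() (or to zero_grad(), whichever runs first).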
def __init__(self, params, defaults):
    torch._C._log_api_usage_once("python.optimizer")
    self.defaults = defaults

    if isinstance(params, torch.Tensor):
        raise TypeError("params argument given to the optimizer should be "
                        "an iterable of Tensors or dicts, but got " +
                        torch.typename(params))

    self.state = defaultdict(dict)
    self.param_groups = []

    param_groups = list(params)  # HERE THEY CONVERT THE GENERATOR INTO A LIST; THE GENERATOR IS ITERATED AND STORED IN THAT LIST
    if len(param_groups) == 0:
        raise ValueError("optimizer got an empty parameter list")
    if not isinstance(param_groups[0], dict):
        param_groups = [{'params': param_groups}]

    for param_group in param_groups:
        self.add_param_group(param_group)
First of all, they save the iterator into a list. Lists are iterables that restart on every iteration, even after being exhausted.
Generators (which is what calling model.parameters() gives you internally) are iterators that remain empty once consumed.
You can google the differences between generators and lists. Besides, you will also hit a bug if a gradient is None: something like "'NoneType' object has no attribute 'data'".
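A minimal fix along these lines (a sketch, not the full torch.optim machinery; it uses the alpha= keyword of add_ under torch.no_grad() rather than the older two-argument form in the question) is to materialise the generator in __init__ and guard against None gradients:

```python
import torch

class Optimiser:
    def __init__(self, params, lr):
        # list(params) consumes the generator once and stores the
        # parameters, so step()/zero_grad() can iterate repeatedly.
        self.params = list(params)
        self.lr = lr

    def step(self):
        with torch.no_grad():
            for p in self.params:
                if p.grad is None:  # parameter never received a gradient
                    continue
                p.add_(p.grad, alpha=-self.lr)  # p <- p - lr * grad

    def zero_grad(self):
        for p in self.params:
            if p.grad is not None:
                p.grad.zero_()
```

With this change the same training loop as in the question keeps updating the parameters on every step, because self.params is a list rather than an exhausted generator.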
Ah, thanks. I thought it would create a new generator in every instance.