Hi! I'm having a peculiar issue. I'm running inner optimization steps using this code:
inner_model = NN(param.detach(), self.kwargs).to(self.device).train()
inner_opt = torch.optim.SGD(inner_model.model.parameters(), lr=100, momentum=0.7, weight_decay=1e-6)

for opt_step in range(20):
    out_inner = inner_model(images)
    # masked per-element BCE, summed into a scalar loss
    loss = torch.sum(
        mask * F.binary_cross_entropy_with_logits(
            out_inner.squeeze(), inner_cur_lbl[idx].float(), reduction='none'
        )
    )
    loss.backward()
    inner_opt.step()
    inner_opt.zero_grad()
There are a few important things to point out:
1. The grads look fine.
2. The weights are updating; I check them using (see the sketch right after this list for how I compare them):
inner_model.model.regressor.weight
3. The output, a.k.a. out_inner, is the same(!) as before the updates!!
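To be concrete about point 2, this is roughly how I verify that the weights really change (a simplified sketch of what I run, not copied verbatim from my code):

# snapshot the regressor weights before the 20 inner SGD steps
w_before = inner_model.model.regressor.weight.detach().clone()

# ... run the inner loop shown above ...

w_after = inner_model.model.regressor.weight.detach().clone()
print(torch.allclose(w_before, w_after))   # False -> the weights did change
print((w_after - w_before).abs().max())    # clearly non-zero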
So I stepped into nn.Linear's forward method:
def forward(self, input: Tensor) -> Tensor:
    return F.linear(input, self.weight, self.bias)
And checked self.weight: it is exactly as expected (i.e. it changed exactly as I saw in point 2 above), so from inside forward I tried to run:
return F.linear(input, self.weight, self.bias)
But I got the same unchanged results, so I ran another experiment and evaluated:
torch.mm(input, self.weight.T)  # I have no bias
And got different results!! Results that show self.weight has changed, i.e.
torch.mm(input, self.weight.T) != F.linear(input, self.weight, self.bias)
I think something I did improperly is making F.linear use an old copy of the weights; once again, this is despite the fact that self.weight has changed.
The final experiment, and maybe the most important one, was running:
F.linear(input, self.weight.clone(), self.bias)
This line produced the same result as the matmul, but not the same result as:
F.linear(input, self.weight, self.bias)
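To summarize what I see from the debugger inside nn.Linear.forward (a rough sketch; input / self.weight / self.bias are the names visible inside forward, and self.bias is None in my case):

# all three evaluated at the same breakpoint inside nn.Linear.forward
out_plain = F.linear(input, self.weight, self.bias)           # stale result, as if the weights never updated
out_mm    = torch.mm(input, self.weight.T)                    # reflects the updated weights
out_clone = F.linear(input, self.weight.clone(), self.bias)   # also reflects the updated weights

print(torch.allclose(out_mm, out_clone))   # True
print(torch.allclose(out_mm, out_plain))   # False(!)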
I know it's not an easy post to follow; I tried my best to illustrate the issue. Does anyone have any idea what is going on?
Edit - a few more hints:
1. Switching from DDP to a single GPU didn't help.
2. This code runs inside the training_step method of a PyTorch Lightning module (a rough sketch of the surrounding structure is at the end of the post).
3. When switching to CPU it works!!!
This is even more puzzling! What am I missing?
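For context on hint 2, the surrounding structure looks roughly like this. This is a heavily simplified, hypothetical stand-in (the class name OuterModule and the plain nn.Linear inner model are placeholders; my real NN, mask and idx handling are omitted), not a verified reproduction:

import torch
import torch.nn as nn
import torch.nn.functional as F
import pytorch_lightning as pl

class OuterModule(pl.LightningModule):  # placeholder name for my real module
    def __init__(self):
        super().__init__()
        self.dummy = nn.Linear(8, 1)  # Lightning needs at least one registered parameter

    def training_step(self, batch, batch_idx):
        images, inner_cur_lbl = batch  # simplified unpacking of my real batch

        # a fresh inner model + optimizer is built on every training_step,
        # exactly as in the snippet at the top of the post
        inner_model = nn.Linear(8, 1, bias=False).to(self.device).train()
        inner_opt = torch.optim.SGD(inner_model.parameters(), lr=100, momentum=0.7, weight_decay=1e-6)

        for opt_step in range(20):
            out_inner = inner_model(images)
            loss = F.binary_cross_entropy_with_logits(out_inner.squeeze(), inner_cur_lbl.float())
            loss.backward()
            inner_opt.step()
            inner_opt.zero_grad()

        # on GPU, out_inner here is identical to the very first forward pass in my real code

        # in my real code an outer loss is computed from the adapted inner model;
        # here just return something Lightning can backprop through
        outer_loss = self.dummy(images).mean()
        return outer_loss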