Consider the following trivial model:

    import torch

    class Net(torch.nn.Module):
        def __init__(self, img_size):
            super(Net, self).__init__()
            self.l1 = torch.nn.Linear(1, 1)
            self.optimizer = torch.optim.Adadelta(self.parameters())
            self.loss_function = torch.nn.MSELoss()

        def forward(self):
            # one-element input (value assumed)
            return self.l1(torch.tensor([1.0], dtype=torch.float))

        def backward(self, loss):
            self.optimizer.zero_grad()
            loss.backward()
            if self.l1.weight.grad is not None:
                # (*) changing the grad to 0
                self.l1.weight.grad = torch.tensor([[0.0]], dtype=torch.float)
            self.optimizer.step()

In the backward method, at the line marked (*), the gradient of the weight is overwritten with 0 before the optimizer step.
My understanding was that a zero gradient tells the optimizer the weight does not contribute to the loss, so the optimizer should leave the weight unchanged. The network's output changes anyway.
Why does the network keep learning even though the grad has been set to zero?
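One way to narrow this down might be to compare every parameter before and after a single step. The snippet below is only a sketch under assumptions: it uses a bare Linear(1, 1) layer in place of Net, an input of 1.0 and a target of 5.0 (both values are made up), and the names layer, target, before and changed are hypothetical. It reports which parameters of the layer actually moved.

```python
import torch

torch.manual_seed(0)

# Stand-in for Net: the same layer, optimizer and loss, without the wrapper class.
layer = torch.nn.Linear(1, 1)
optimizer = torch.optim.Adadelta(layer.parameters())
loss_function = torch.nn.MSELoss()

x = torch.tensor([1.0])       # assumed input value
target = torch.tensor([5.0])  # assumed target value

# Snapshot every parameter before the update.
before = {name: p.detach().clone() for name, p in layer.named_parameters()}

optimizer.zero_grad()
loss = loss_function(layer(x), target)
loss.backward()
# Same idea as (*): overwrite only the weight's gradient with zeros.
layer.weight.grad = torch.zeros_like(layer.weight)
optimizer.step()

# Record, per parameter, whether its value differs from the snapshot.
changed = {}
for name, p in layer.named_parameters():
    changed[name] = not torch.equal(before[name], p.detach())
    print(f"{name}: changed={changed[name]}, grad={p.grad}")
```

Listing the parameters by name this way makes it visible whether the layer holds anything besides l1.weight that still receives a non-zero gradient.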