Modifying the Weight of a Pretrained Model in `forward()` Makes Training Slower

Hi @ptrblck, in another discussion I read that you suggest using the no_grad() context when modifying model parameters. In my case, though, I didn't use it, because I want B to be updated by the optimizer through the add_ operation.
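
To make the setup concrete, here is a minimal sketch of the kind of module I mean (the layer, the shapes, and the name `B` are just placeholders, not my actual model):

```python
import torch
import torch.nn as nn

class Wrapped(nn.Module):
    def __init__(self, in_features=16, out_features=16):
        super().__init__()
        # "pretrained" layer whose weight gets modified; the weight itself is frozen
        self.linear = nn.Linear(in_features, out_features)
        self.linear.weight.requires_grad_(False)
        # trainable offset that the optimizer should update
        self.B = nn.Parameter(torch.zeros(out_features, in_features))

    def forward(self, x):
        # in-place update of the pretrained weight inside forward();
        # done outside no_grad() so that gradients can flow back to B
        self.linear.weight.add_(self.B)
        return self.linear(x)
```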

I did try no_grad(), and it made the training time stable (no longer slowing down), but, as expected, B was no longer updated. I'm not sure which of the two approaches is closer to the solution.
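
For completeness, the no_grad() variant I tried looks roughly like this (same module as above, with only `forward()` changed):

```python
    def forward(self, x):
        # wrapping the in-place update in no_grad() keeps the per-iteration
        # time stable, but autograd no longer records the add_, so B gets
        # no gradient from this op and the optimizer never changes it
        with torch.no_grad():
            self.linear.weight.add_(self.B)
        return self.linear(x)
```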