Alternately training a multi-task learning model in PyTorch - weight updating question

Thanks a lot for pointing that out. Indeed, the momentum term carries over the gradients from previous steps.

Even if we zero the gradient at step t, the update that produces the weight at step t+1 still includes the momentum term from step t, which accumulates the earlier gradients.
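
To check this, here is a minimal sketch, assuming plain `torch.optim.SGD` with `momentum=0.9` and a single toy parameter `w` (the names and values are made up for illustration, not taken from the model in this thread). It also shows a related detail that may matter here: a gradient that is an all-zero tensor still triggers a momentum update, while a gradient of `None` makes the optimizer skip the parameter.

```python
import torch

# Toy parameter and optimizer (hypothetical values, for illustration only).
w = torch.nn.Parameter(torch.tensor([1.0]))
opt = torch.optim.SGD([w], lr=0.1, momentum=0.9)

# Step 1: a real gradient. The momentum buffer is initialised to the gradient.
w.grad = torch.tensor([1.0])
opt.step()
after_first = w.item()   # about 0.9  (1.0 - 0.1 * 1.0)

# Step 2: the gradient is an explicit zero tensor,
# but the buffer (0.9 * 1.0 + 0) still moves the weight.
w.grad = torch.zeros_like(w)
opt.step()
after_second = w.item()  # about 0.81 (0.9 - 0.1 * 0.9)

# Step 3: the gradient is None, so SGD skips this parameter entirely.
w.grad = None
opt.step()
after_third = w.item()   # unchanged from step 2

print(after_first, after_second, after_third)
```

So whether the weights of the inactive task keep moving likely depends on whether their gradients are zero tensors or `None` when `opt.step()` is called. Using a separate optimizer per task is one way to keep the momentum buffers from mixing, if that is the behaviour you want.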
