Alternately training a multi-task learning model in PyTorch - weight updating question

linlin · September 8, 2020, 2:00pm

Thanks a lot for pointing that out. Indeed, the momentum term carries gradients from previous steps into the current update.
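To make this concrete, here is the update I believe `torch.optim.SGD` uses when momentum $\mu$ is set, assuming the defaults `dampening=0` and `nesterov=False` and no weight decay. $g_t$ is the gradient at step $t$, $\eta$ is the learning rate, and the buffer starts at $v_1 = g_1$:

$$
v_t = \mu\, v_{t-1} + g_t, \qquad \theta_{t+1} = \theta_t - \eta\, v_t
\quad\Longrightarrow\quad
v_t = \sum_{k=1}^{t} \mu^{\,t-k}\, g_k .
$$

So the step that produces $\theta_{t+1}$ mixes in the gradients of every earlier step, with geometrically decaying weights. Even if $g_t = 0$, we still get $v_t = \mu\, v_{t-1} \neq 0$.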

Even if we zero the gradient at step t, the update that produces the weights at step t+1 still uses the momentum buffer, which holds the gradients accumulated up to step t.
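Here is a minimal sketch of that effect in alternating multi-task training. Everything in it is hypothetical and chosen for easy arithmetic: `run`, the two scalar "heads" `head_a` and `head_b`, the toy losses, `lr=0.1` and `momentum=0.9`. The sketch trains task A for one step and then task B for one step, and checks whether task A's head keeps moving. One nuance is worth checking against your PyTorch version. As far as I know, `SGD` skips parameters whose `.grad` is `None`. The result therefore depends on `zero_grad(set_to_none=...)`. A zero tensor lets momentum keep moving the parameter, while `None` leaves it frozen. The default became `set_to_none=True` in PyTorch 2.0.

```python
import torch


def run(set_to_none: bool) -> float:
    """Train task A for one step, then task B for one step; return head_a."""
    # Hypothetical task-specific parameters (scalars for easy arithmetic).
    head_a = torch.nn.Parameter(torch.tensor(1.0))
    head_b = torch.nn.Parameter(torch.tensor(1.0))
    opt = torch.optim.SGD([head_a, head_b], lr=0.1, momentum=0.9)

    # Step t: task A. Toy loss 3 * head_a, so d/d(head_a) = 3.
    opt.zero_grad(set_to_none=set_to_none)
    (3.0 * head_a).backward()
    opt.step()  # buffer_a = 3, head_a = 1.0 - 0.1 * 3 = 0.7

    # Step t+1: task B. Its loss does not involve head_a at all.
    opt.zero_grad(set_to_none=set_to_none)
    (2.0 * head_b).backward()
    opt.step()  # head_a is only touched here if its .grad is not None
    return head_a.item()


# Zeroed (not None) grad: buffer_a = 0.9 * 3 = 2.7, head_a = 0.7 - 0.27
print(run(set_to_none=False))  # ~0.43, head_a still moves
# None grad: SGD skips head_a, so it stays where task A left it
print(run(set_to_none=True))   # ~0.7
```

With zeroed gradients, task A's head still drifts during task B's step purely because of momentum. If you want to avoid this, one option is a separate optimizer (and thus a separate momentum buffer) per task. Another is to rely on `None` grads so that untouched parameters are skipped. Shared parameters receive gradients from both tasks, so they always mix momentum across tasks either way.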