What happens when you call zero_grad() before the forward pass? I.e., res101.zero_grad(); loss1 = res101(normed_input).sum()