Gradients are accumulated in PyTorch by default. So in the repeat case, what you are seeing is the gradient of the first call plus the gradient of the second call. Before the line `# repeat without updating the weight`, insert this call:
modL.zero_grad()
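A minimal sketch of the accumulation behavior described above (the module and tensor names here are illustrative, not from the original post):

```python
import torch

# A tiny linear module; calling backward() twice without zeroing
# accumulates gradients in .grad.
lin = torch.nn.Linear(2, 1, bias=False)
x = torch.ones(1, 2)

# First backward call populates .grad.
lin(x).sum().backward()
g1 = lin.weight.grad.clone()

# Second backward call WITHOUT zeroing: .grad now holds the sum
# of the gradients from both calls.
lin(x).sum().backward()
assert torch.allclose(lin.weight.grad, 2 * g1)

# Calling zero_grad() first resets .grad, so the next backward
# yields the gradient of a single call again.
lin.zero_grad()
lin(x).sum().backward()
assert torch.allclose(lin.weight.grad, g1)
```

This is why the repeated forward/backward pass shows a doubled gradient unless `zero_grad()` is called between them.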