I may have missed it, but one possible reason is that you forgot to zero the gradients before/after running each batch, so gradients from earlier batches keep accumulating. You only seem to do it once at the start. Try adding the following INSIDE your training loop:
optimizer.zero_grad()
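For context, here is a minimal sketch of where that call usually sits in a training loop. The toy model, data, and names below are just placeholders for illustration, not taken from your code:

import torch
import torch.nn as nn

# Toy setup purely to show placement of zero_grad()
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

inputs = torch.randn(32, 10)
targets = torch.randn(32, 1)

for epoch in range(5):
    optimizer.zero_grad()               # clear gradients left over from the previous step
    outputs = model(inputs)             # forward pass
    loss = criterion(outputs, targets)
    loss.backward()                     # compute gradients for this batch only
    optimizer.step()                    # update the weights

Without the zero_grad() call inside the loop, loss.backward() keeps adding new gradients on top of the old ones, which usually makes training diverge or behave strangely.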
Does this solve your issue?
See here for an example or here for the reason why this is needed.