Using sub-batches to avoid busting the memory?

Hi,

I think the post "Why do we need to set the gradients manually to zero in pytorch?" will give you all the details you need on how to do this in a memory-efficient way (you want approach 2 from that post, i.e. calling backward() on each sub-batch so gradients accumulate without keeping every sub-batch's graph in memory).
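
In case it helps, here is a minimal sketch of what that gradient-accumulation pattern can look like. The model, optimizer, loss, and data loader below are just placeholders, and `accumulation_steps` is an assumed name for the number of sub-batches per optimizer step:

```python
import torch

# Placeholder model, optimizer, criterion, and data -- swap in your own.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = torch.nn.CrossEntropyLoss()
loader = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(16)]

accumulation_steps = 4  # number of sub-batches accumulated per optimizer step

optimizer.zero_grad()
for i, (inputs, targets) in enumerate(loader):
    outputs = model(inputs)
    # Scale the loss so the accumulated gradient matches the full-batch average.
    loss = criterion(outputs, targets) / accumulation_steps
    # backward() adds into .grad and frees this sub-batch's graph,
    # so peak memory stays proportional to one sub-batch.
    loss.backward()

    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

The key point is that each `backward()` call only needs the activations for its own sub-batch, so the effective batch size can be large while memory usage stays bounded by the sub-batch size.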