Error: CUDA Out of Memory after training on 2.5 million images; works fine on 150K images

Sure, although in this particular case I wouldn’t suspect any of the zero_grad operations to be the culprit, as they should be in-place operations that don’t allocate any memory.
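
If you want to check this yourself, here is a minimal sketch (not taken from your training script). The tiny `Linear` model and the SGD optimizer are placeholders I made up for illustration. It compares the address of each gradient buffer before and after `zero_grad(set_to_none=False)`; if the addresses match, the gradients were zeroed in place and no new tensors were allocated. Note that recent PyTorch releases default to `set_to_none=True`, which drops the gradient tensors instead of zeroing them, so that path shouldn't allocate anything either.

```python
import torch

# Hypothetical tiny model and optimizer, used only for illustration.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Run one backward pass so that every parameter has a .grad tensor.
model(torch.randn(8, 4)).sum().backward()

# Record where each gradient buffer lives before clearing it.
ptrs_before = [p.grad.data_ptr() for p in model.parameters()]

# set_to_none=False zeroes the existing gradient buffers in place.
optimizer.zero_grad(set_to_none=False)

ptrs_after = [p.grad.data_ptr() for p in model.parameters()]

# Same addresses mean the buffers were reused rather than reallocated.
same_buffers = ptrs_before == ptrs_after
all_zero = all(bool((p.grad == 0).all()) for p in model.parameters())
print("buffers reused:", same_buffers, "| grads zeroed:", all_zero)
```

On the GPU you could additionally compare `torch.cuda.memory_allocated()` before and after the call to confirm that the number doesn't grow.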