I am encountering a bizarre CUDA out-of-memory error on Linux (but not on Windows). I have two different machines: a Windows box with an NVIDIA GeForce GTX 1050 (4 GB of GPU memory) and a Google Cloud Linux VM with an NVIDIA Tesla T4 (16 GB). My model (linked below) is a transformer, a class of models with a reputation for chewing up a lot of memory (see the Reformer paper, for example).
On the Windows machine, running the latest PyTorch release, the model trains fine in GPU memory for most training examples with a batch size of 4 (the examples are variable-sized). When I take the exact same model and training data over to the Linux VM, I start getting CUDA out-of-memory errors after around 40 steps, and increasing the batch size decreases the number of steps it takes to hit one. That pattern suggested a memory leak to me, so I made sure I wasn't holding onto any tensor references so the garbage collector could do its thing, and I added calls to clear the GPU cache, run the Python garbage collector, etc. None of that helped.
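For reference, the cleanup I added between steps looks roughly like this (a minimal sketch; the helper name is just for illustration):

```python
import gc
import torch

def free_gpu_memory():
    """Drop unreachable Python objects, then release PyTorch's cached GPU blocks.

    empty_cache() can only hand back blocks that are no longer referenced,
    so running the Python garbage collector first matters.
    """
    gc.collect()                  # reclaim unreachable Python-side tensors
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # return cached, unused blocks to the driver
```

Note the ordering: collecting first turns dead tensors into cached blocks, which `empty_cache()` can then actually release.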
The super weird part is that if I insert a call to
print(torch.cuda.memory_summary(device=None, abbreviated=True))
inside my exception handler for the CUDA out-of-memory error, the issue goes away! I can even bump the batch size to 12 on the Linux instance, which is what I would expect from a GPU with that much more memory.
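For context, the handler is structured roughly like this (a sketch; `run_step` is a stand-in for my real forward/backward/optimizer call):

```python
import torch

def guarded_step(run_step, batch):
    """Run one training step, catching only CUDA out-of-memory errors.

    run_step is a placeholder for the actual forward/backward pass.
    """
    try:
        return run_step(batch)
    except RuntimeError as e:
        if "out of memory" not in str(e):
            raise  # only swallow OOM; re-raise everything else
        # This print is the line that, bizarrely, makes the OOMs stop recurring:
        if torch.cuda.is_available():
            print(torch.cuda.memory_summary(device=None, abbreviated=True))
        return None  # skip this batch and carry on
```

With this in place, skipped batches return `None` and training continues, and on Linux the OOMs simply stop showing up at all.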
Model is here:
Any ideas?