I’m trying to debug a situation where running the same model/training code on a new instance uses roughly 4x the GPU memory.
Instance 1:
P2, CUDA 10.2 (driver 440.33.01), PyTorch 1.5
Instance 2:
P2 (SageMaker), CUDA 11.0 (driver 450.51.05), PyTorch 1.6
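In case the exact library versions matter, this is roughly what I’d run on each instance to confirm what’s actually in the container (just a sketch, I haven’t captured this output yet):

```python
import torch

# Dump the versions each instance is actually using, since the two
# environments differ in CUDA / driver / PyTorch builds.
print("PyTorch:", torch.__version__)
print("CUDA (build):", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())
print("Device:", torch.cuda.get_device_name(0))
```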
The model I’m running is an LSTM. On instance 1, training with a batch size of 512 uses 2.3 GB of GPU memory. On instance 2, training with the same batch size of 512 uses 9.6 GB of GPU memory.
Both instances are running the same code (same commit) and the same model.
Does anyone know how I might go about debugging this?
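If it helps, I could collect a more detailed breakdown on both instances with something along these lines (a rough sketch; the function name and where it’s called are just illustrative, not what I’ve measured with so far):

```python
import torch

def log_gpu_memory(tag):
    # memory_allocated: memory occupied by live tensors
    # memory_reserved: memory held by PyTorch's caching allocator
    # (nvidia-smi additionally counts the CUDA context itself)
    mib = 1024 ** 2
    print(f"[{tag}] allocated={torch.cuda.memory_allocated() / mib:.0f} MiB "
          f"reserved={torch.cuda.memory_reserved() / mib:.0f} MiB "
          f"peak={torch.cuda.max_memory_allocated() / mib:.0f} MiB")

# Illustrative placement inside the training step
# (model/batch/loss names are placeholders):
# output, _ = model(batch)
# log_gpu_memory("after forward")
# loss.backward()
# log_gpu_memory("after backward")
```

I believe `torch.cuda.memory_summary()` is also available on both PyTorch 1.5 and 1.6, which would give a fuller allocator report to compare.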