CUDA error with batch size 16, but training works with batch size 32

Hi everyone,

I wasn’t able to train a model with batch size 16, so I had been running it with batch size 8. Today I tried different batch sizes to see whether memory was the issue, since the error I was getting was:

RuntimeError: Creating MTGP constants failed. at /pytorch/aten/src/THC/THCTensorRandom.cu:34

However, when I tried batch size 32, it worked! Does anyone know what might be the problem with batch sizes 15 and 16?

Additional info:
nvidia-smi for batch size = 14: 6393/22919 MBs used
nvidia-smi for batch size = 15: 11195/22919 MBs used
nvidia-smi for batch size = 16: 11237/22919 MBs used
nvidia-smi for batch size = 32: 13867/22919 MBs used
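
To make the jumps between these readings easier to see, here is a small Python sketch that tabulates them. It only does arithmetic on the four numbers above; the names TOTAL_MB, used_mb, and summarize are made up for illustration and are not part of my training code.

```python
# Readings copied from the nvidia-smi lines above: batch size -> MB used.
TOTAL_MB = 22919
used_mb = {14: 6393, 15: 11195, 16: 11237, 32: 13867}


def summarize(readings, total):
    """Return (batch_size, used, percent_of_total, delta_from_previous) rows."""
    rows = []
    previous = None
    for batch_size in sorted(readings):
        used = readings[batch_size]
        percent = round(100.0 * used / total, 1)
        # The first reading has no smaller batch size to compare against.
        delta = None if previous is None else used - previous
        rows.append((batch_size, used, percent, delta))
        previous = used
    return rows


for batch_size, used, percent, delta in summarize(used_mb, TOTAL_MB):
    delta_text = "n/a" if delta is None else f"{delta:+d}"
    print(f"batch {batch_size:>2}: {used:>5} MB ({percent}% of total), change {delta_text} MB")
```

The largest jump (4802 MB) is between batch sizes 14 and 15, while going from 15 to 16 adds only 42 MB, and even batch size 32 stays at roughly 60% of the card. So, at least from these readings, it does not look like I am simply running out of memory.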

Thanks!