The GPU type is V100 with 16G memory. The trained model is our internal model. For single GPU training, it can work well.
I have also tried it in several different clusters. For another two clusters, which have the GPUs of P100 and V100 respectively, the training can work well.
I have also tried to use smaller batch size, even though our GPU memory is enough for the normal batch size. But for smaller batch size, it also raised the CUDA out of memory error.
I have tried to use smaller input image size, and it can works well.
Hi I am facing the same issue: RuntimeError: CUDA out of memory. Tried to allocate 1.86 GiB (GPU 0; 15.75 GiB total capacity; 6.25 GiB already allocated; 8.44 GiB free; 17.78 MiB cached)