I am training a classification model on 4 GPUs. I see 3 extra processes running on GPU 0, so it won't fit a batch size larger than 2, while the other GPUs can still take a batch size of 4. I don't understand what the 3 extra processes are, and is manually setting the batch size per GPU correct?
if gpu == 0:
    batch_size = 2
It seems you might have created multiple CUDA contexts on the default device (GPU0). Are you launching the script via
torchrun? If so, make sure that each process sets the proper device, e.g. via torch.cuda.set_device(local_rank).
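A minimal sketch of that per-process device setup, assuming a torchrun-style launch where each process receives its index in the LOCAL_RANK environment variable (the helper name `setup_device` is my own):

```python
import os

import torch


def setup_device() -> torch.device:
    # torchrun exports LOCAL_RANK for each process it launches; the
    # assumption here is one process per GPU.
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    device = torch.device(f"cuda:{local_rank}")
    # Pin this process to its own GPU *before* any CUDA work, so it
    # never creates a context on the default device (GPU0).
    if torch.cuda.is_available():
        torch.cuda.set_device(device)
    return device
```

Calling this at the very top of the training script (before building the model or moving any tensor to the GPU) prevents the stray GPU0 contexts, and then every GPU can use the same batch size.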
+1 to @ptrblck’s answer
You can also set
CUDA_VISIBLE_DEVICES to make sure that each process only sees one device, so that it won't unintentionally create a CUDA context on GPU0.
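A sketch of that masking approach, assuming the variable is set before any CUDA initialization happens in the process (the helper name `restrict_to_gpu` is hypothetical):

```python
import os


def restrict_to_gpu(local_rank: int) -> None:
    # Must run before the first CUDA call in this process: once a CUDA
    # context exists, changing CUDA_VISIBLE_DEVICES has no effect.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(local_rank)
    # From this point on, the only device visible to this process is
    # "cuda:0", which maps to physical GPU `local_rank`, so even code
    # that defaults to device 0 cannot touch the real GPU0.
```

Note the trade-off: inside each process all device indices become 0, so any code that hard-codes a rank-based device index needs adjusting.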
I am using mp.spawn, as suggested in another thread, for a single-machine multi-GPU setup.
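With mp.spawn the same fix applies: the process index that mp.spawn passes as the first argument can serve as the device index. A hedged sketch, with the training logic elided (the `worker`/`main` names are illustrative, not from the original thread):

```python
import torch
import torch.multiprocessing as mp


def worker(rank: int, world_size: int) -> None:
    # mp.spawn passes the process index as the first argument; using it
    # as the device index keeps every worker off GPU0's context.
    torch.cuda.set_device(rank)
    # ... init the process group, build the model, train ...


def main() -> None:
    world_size = torch.cuda.device_count()  # 4 in the setup above
    # Spawns world_size processes, each running worker(rank, world_size).
    mp.spawn(worker, args=(world_size,), nprocs=world_size, join=True)


if __name__ == "__main__":
    main()
```

With the device pinned per process this way, no worker should create an extra context on GPU0, and all four GPUs can use the same batch size.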