Discussion here might be helpful.
This is likely because some tensors or a CUDA context are unintentionally created on the first GPU, e.g., when calling torch.cuda.empty_cache() without a device guard. There are two possible solutions: 1) carefully walk through the libraries and code to make sure no state leaks to cuda:0, or 2) set CUDA_VISIBLE_DEVICES so that each process sees only one GPU. The second approach might be easier.
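
As a rough sketch of both options, assuming one process per GPU and a launcher (such as torchrun) that exports LOCAL_RANK: the helper names pin_process_to_gpu and empty_cache_on are hypothetical, and the sketch assumes the physical GPU index equals the local rank.

```python
import os


def pin_process_to_gpu(local_rank: int) -> None:
    """Option 2: make this process see only one physical GPU.

    This has to run before CUDA is initialized; the safest place is
    before torch is imported at all. Assumption: the physical GPU
    index is the same as the local rank.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(local_rank)


def empty_cache_on(device_index: int) -> None:
    """Option 1: wrap the call in a device guard so that no CUDA
    context is created on cuda:0 as a side effect."""
    import torch  # imported lazily so the env var above can be set first

    if torch.cuda.is_available():
        with torch.cuda.device(device_index):
            torch.cuda.empty_cache()


def main() -> None:
    # Defaulting to rank 0 when LOCAL_RANK is unset is an assumption.
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    pin_process_to_gpu(local_rank)
    # After pinning, the single visible GPU is addressed as index 0.
    empty_cache_on(0)
```

Calling main() at the very start of each worker process applies the second option. With it in place, the only visible device is cuda:0 inside every process, so the rest of the code should use cuda:0 rather than cuda:<local_rank>.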