Hi,
I am having a similar issue here. If I use 1 GPU with batch_size 128, the job runs fine, but when I use 2 GPUs with batch_size 256 I get this error:
```
RuntimeError: CUDA out of memory. Tried to allocate 192.00 MiB (GPU 1; 15.75 GiB total capacity…
```
I am using DistributedDataParallel as follows:
```python
torch.distributed.init_process_group(backend='nccl')
model = nn.parallel.DistributedDataParallel(
    model,
    device_ids=list(range(n_gpu))[::-1],
    find_unused_parameters=True,
)
```
and I am launching the job with `python3 -m torch.distributed.launch main.py`.
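For context, here is a minimal sketch of the one-process-per-GPU pattern that torch.distributed.launch is designed around, as I understand it from the docs; the `nn.Linear` model is just a placeholder, not my actual model in main.py:

```python
import argparse

import torch
import torch.distributed as dist
import torch.nn as nn

# torch.distributed.launch injects --local_rank into each spawned process
parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=0)
args = parser.parse_args()

# The launcher also sets MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE,
# which init_process_group reads from the environment
dist.init_process_group(backend="nccl")
torch.cuda.set_device(args.local_rank)

# Placeholder model; each process builds its own replica on its own GPU
model = nn.Linear(512, 10).cuda(args.local_rank)
model = nn.parallel.DistributedDataParallel(
    model,
    device_ids=[args.local_rank],   # one device per process, not all GPUs
    output_device=args.local_rank,
    find_unused_parameters=True,
)
```

launched with one process per GPU, e.g. `python3 -m torch.distributed.launch --nproc_per_node=2 main.py`.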
Thanks in advance,
Giorgio