A question concerning batch size and multiple GPUs in PyTorch

Hi,
I am having a similar issue here. If I use 1 GPU with batch_size 128, the job runs fine. When I use 2 GPUs with batch_size 256, I get this error:
RuntimeError: CUDA out of memory. Tried to allocate 192.00 MiB (GPU 1; 15.75 GiB total capacity…
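For context, my data loading is a single DataLoader built with the global batch size; the sketch below is a stand-in for my actual pipeline (dataset and num_workers are placeholders):

from torch.utils.data import DataLoader

# batch_size is the value I change between runs (128 on 1 GPU, 256 on 2 GPUs)
train_loader = DataLoader(dataset, batch_size=256, shuffle=True, num_workers=4)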

I am using DistributedDataParallel as per below:

torch.distributed.init_process_group(backend='nccl')
# a single process wraps the model across both GPUs; n_gpu = 2 here
model = nn.parallel.DistributedDataParallel(model, device_ids=list(range(n_gpu))[::-1], find_unused_parameters=True)

and I am running the job with: python3 -m torch.distributed.launch main.py
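Or should I instead be launching one process per GPU, each with batch_size 128 (so the effective global batch stays at 256)? Here is a minimal sketch of what I think that would look like, assuming the --local_rank argument that torch.distributed.launch passes to each process, and with MyModel and dataset as placeholders for my actual code:

import argparse
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

parser = argparse.ArgumentParser()
parser.add_argument('--local_rank', type=int, default=0)  # injected by torch.distributed.launch
args = parser.parse_args()

torch.distributed.init_process_group(backend='nccl')
torch.cuda.set_device(args.local_rank)

model = MyModel().to(args.local_rank)  # placeholder for my model
model = nn.parallel.DistributedDataParallel(model, device_ids=[args.local_rank], find_unused_parameters=True)

sampler = DistributedSampler(dataset)  # each process sees a disjoint shard of the data
loader = DataLoader(dataset, batch_size=128, sampler=sampler)  # 128 per process -> 256 global

launched with:

python3 -m torch.distributed.launch --nproc_per_node=2 main.py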

Thanks in advance
Giorgio