I am trying to use 2 GPUs for training.
I followed this tutorial and tested 1 GPU with a batch size of 8 and an image size of 512, and got the output:
In Model: input size torch.Size([8, 1, 512, 512]) output size torch.Size([8, 1, 512, 512])
This is the maximum capacity of one of my GPUs. My idea is to train on 2 GPUs and increase the batch size to 16, so that each GPU gets a batch of 8.
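For context, my setup follows the tutorial's pattern: the model is wrapped in nn.DataParallel, which splits each incoming batch across the visible GPUs. A minimal sketch (the Model class here is a placeholder, not my actual network, and it falls back to CPU when no GPU is present):

```python
import torch
import torch.nn as nn

class Model(nn.Module):
    """Placeholder model; my real network is larger but wrapped the same way."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 1, kernel_size=3, padding=1)

    def forward(self, x):
        out = self.conv(x)
        # This print produces the "In Model: ..." lines shown above.
        print("In Model: input size", x.size(), "output size", out.size())
        return out

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = Model()
if torch.cuda.device_count() > 1:
    # DataParallel splits each batch along dim 0 across the GPUs,
    # so a batch of 16 becomes two chunks of 8.
    model = nn.DataParallel(model)
model.to(device)
```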
When I try to run this setup, about 10 iterations run smoothly with the expected output
In Model: input size torch.Size([8, 1, 512, 512]) output size torch.Size([8, 1, 512, 512])
In Model: input size torch.Size([8, 1, 512, 512]) output size torch.Size([8, 1, 512, 512])
but then I get an error:
RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 7.80 GiB total capacity; 5.51 GiB already allocated; 191.31 MiB free;
This error is raised when loss.backward() is called during training.
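To show exactly where it fails, here is a stripped-down stand-in for my training step (the model, optimizer, criterion, and the dummy batch are placeholders for my real setup, with a smaller image size so the sketch runs anywhere):

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholders standing in for my actual model, optimizer, and loss.
model = nn.Conv2d(1, 1, kernel_size=3, padding=1).to(device)
optimizer = torch.optim.Adam(model.parameters())
criterion = nn.MSELoss()

# One dummy batch in place of my real DataLoader.
inputs = torch.zeros(8, 1, 64, 64, device=device)
targets = torch.zeros(8, 1, 64, 64, device=device)

optimizer.zero_grad()
outputs = model(inputs)             # forward pass completes fine
loss = criterion(outputs, targets)
loss.backward()                     # <-- the CUDA out of memory error is raised here
optimizer.step()
```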
This puzzles me, since to my knowledge there should be enough VRAM on one card to handle a batch size of 8 with an image size of 512.
Is there some extra memory overhead when using 2 GPUs that I am not aware of, which makes this less straightforward?
Just to mention: I have tested running my training on 2 GPUs with batch size 8 and image size 512, and everything works fine. The GPUs split the batch and each gets 4 images, with the output:
In Model: input size torch.Size([4, 1, 512, 512]) output size torch.Size([4, 1, 512, 512])
In Model: input size torch.Size([4, 1, 512, 512]) output size torch.Size([4, 1, 512, 512])
Why wouldn't this work when I increase the batch size to 16?