My goal is the same as in this question.
Basically, I want to train a CNN with batch size 16 and image size 256. Since one of my GPUs can only handle a batch size of 8 at image size 256, my idea is to split the work between 2 GPUs so that they share each batch of 16 as 2 batches of 8.
I tried to do this with DataParallel model wrapping with no success, as this adds extra memory overhead that my GPUs can't handle.
I switched to DistributedDataParallel and I am getting the same memory error, RuntimeError: CUDA out of memory, as soon as I try to compute predictions with y_pred = unet(x).
My way of splitting the dataset is the same as in the docs:
```python
sampler = DistributedSampler(dataset_train, num_replicas=world_size, rank=rank,
                             shuffle=False, drop_last=False)
dataloader = DataLoader(dataset_train, batch_size=params['batch_size'],
                        pin_memory=pin_memory, num_workers=num_workers,
                        drop_last=False, sampler=sampler)
```
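To make the question concrete, here is a CPU-only toy version of my setup (a dummy 32-sample dataset and hard-coded world_size=2 stand in for my real data and process group; DistributedSampler accepts explicit num_replicas/rank without initializing a process group), showing what each rank receives from its DataLoader:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Dummy dataset of 32 samples standing in for my real training set.
dataset_train = TensorDataset(torch.arange(32).float())
world_size = 2
batch_size = 8  # this is the value I am unsure about: per GPU, or total?

for rank in range(world_size):
    # Each rank gets a disjoint shard of the dataset from the sampler.
    sampler = DistributedSampler(dataset_train, num_replicas=world_size,
                                 rank=rank, shuffle=False, drop_last=False)
    dataloader = DataLoader(dataset_train, batch_size=batch_size,
                            sampler=sampler, drop_last=False)
    sizes = [len(b[0]) for b in dataloader]
    print(f"rank {rank}: {len(sampler)} samples in batches of {sizes}")
    # → rank 0: 16 samples in batches of [8, 8]  (and likewise for rank 1)
```

So with batch_size=8 in the loader, each rank processes batches of 8 and the two ranks together cover 16 samples per step, which is what makes me unsure whether the docs intend batch_size to be the per-GPU or the total value.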
My question is: what should the batch_size parameter of the DataLoader be? Should it be the total batch size I want split across GPUs (in my case 16), or the batch size for each GPU (in my case 8)?
I tried to find the answer in the docs with no luck.