I'm very new to deep learning and am trying to understand how to use data parallelism for my semantic segmentation training work.
My data loader gives me a batch of 16 files, each consisting of an input image and its ground truth image. I understand that I need to transfer the batch to the GPU and then train the model. The following code looks right to me, and when I run it I can see three 16 GB GPUs being used. Soon afterwards, though, it crashes with an out-of-memory error. Is the code below right?
One thing I'm confused about is whether, when training the model, the entire batch can be sent in at once, or whether we need to send the images one at a time. We used to send each image one by one using a custom data loader, and that worked but was very slow, hence this attempt to speed up training using data parallelism.
These are three 16 GB GPUs.
for epoch in range(0, num_epochs):
    optimizer = get_optimizer(trainable_model, epoch)  # optimizer for current epoch
    total_train_loss = 0
    for i_batch, sample_batched in tqdm(enumerate(training_generator)):
        optimizer.zero_grad()
        rgb, mask = sample_batched
        var_rgb = Variable(rgb.float())
        print(var_rgb.shape)  # <-- this is (16, 3, 363, 400)
        var_rgb = var_rgb.cuda()
        var_mask = Variable(mask.float())
        var_mask = var_mask.cuda()
        output = trainable_model(var_rgb)
        loss = criterion(output, var_mask.long()) / settings.opt['batch_size']
        total_train_loss += loss.data
        loss.backward()
        del var_mask, var_rgb
        optimizer.step()
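
To check my own understanding of the batch question: as far as I can tell, if the model is wrapped in `torch.nn.DataParallel`, the whole batch is passed in at once and the wrapper splits it along the first dimension, one chunk per GPU. The sketch below is only arithmetic, not PyTorch code. `chunk_sizes` and `tensor_mib` are helper names I made up for illustration, and the `torch.chunk`-style split is my assumption about how the batch gets divided.

```python
import math


def chunk_sizes(batch_size, num_gpus):
    """Per-GPU chunk sizes, ASSUMING a torch.chunk-style split along dim 0."""
    per_chunk = math.ceil(batch_size / num_gpus)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        take = min(per_chunk, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


def tensor_mib(shape, bytes_per_element=4):
    """Size in MiB of a dense tensor with this shape (float32 by default)."""
    return math.prod(shape) * bytes_per_element / 2**20


batch_shape = (16, 3, 363, 400)          # shape printed in the training loop
sizes = chunk_sizes(batch_shape[0], 3)   # 3 GPUs
print(sizes)                             # [6, 6, 4]
print(round(tensor_mib(batch_shape), 1)) # 26.6
```

If this is right, the input batch itself is only about 27 MiB, so the inputs alone cannot be what fills 16 GB. My guess is that the memory goes to whatever the model keeps around during the forward and backward pass, but I have not confirmed that.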