I'm training a 3D U-Net-like architecture with a patch size of 128³ on a 16 GB Tesla V100, and it runs out of memory at the `loss.backward()` step. The forward pass completes, but the next line, `loss.backward()`, throws the following CUDA out-of-memory (OOM) error:
```
Traceback (most recent call last):
  File "trainer.py", line 200, in <module>
    loss.cpu().backward()
  File "/cbica/external/python/anaconda/3/envs/pytorch/1.0/lib/python3.6/site-packages/torch/tensor.py", line 102, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/cbica/external/python/anaconda/3/envs/pytorch/1.0/lib/python3.6/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 15.78 GiB total capacity; 14.70 GiB already allocated; 74.62 MiB free; 54.63 MiB cached)
```
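For scale, the failed 512 MiB request is exactly the size of one float32 tensor of shape (1, 64, 128, 128, 128), e.g. the output of a 64-channel convolution on a single 128³ patch (or batch 2 with 32 channels). The batch size and channel counts are assumptions, since they aren't stated above; the sketch only does the arithmetic. Because autograd keeps activations like this for every full-resolution layer until `backward()` finishes, a few such layers already account for several GiB.

```python
def tensor_mib(shape, bytes_per_element=4):
    """Size in MiB of a dense tensor with the given shape (float32 by default)."""
    numel = 1
    for dim in shape:
        numel *= dim
    return numel * bytes_per_element / 2**20

# Hypothetical activation: batch 1, 64 channels, one 128^3 patch, float32.
print(tensor_mib((1, 64, 128, 128, 128)))      # 512.0
# The same activation in float64, e.g. after a .double() cast:
print(tensor_mib((1, 64, 128, 128, 128), 8))   # 1024.0
```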
The traceback shows `loss.cpu().backward()` because I tried moving the loss to the CPU before backpropagating, but I still get the CUDA OOM error.
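As far as autograd is concerned, `.cpu()` is itself a differentiable operation: it copies only the tensor it is called on and records a copy node, so `backward()` still traverses every activation the forward pass saved on the GPU. A tiny sketch with hypothetical stand-in tensors `w` and `x` (not from the model above):

```python
import torch

# Use the GPU if present; on a CPU-only machine the graph is built the same way.
device = "cuda" if torch.cuda.is_available() else "cpu"

w = torch.randn(8, device=device, requires_grad=True)  # stand-in for a model parameter
x = torch.randn(8, device=device)                       # stand-in for an input patch
activation = (w * x).relu()    # saved on `device` for the backward pass
loss = activation.sum().cpu()  # .cpu() moves only the final scalar

print(loss.grad_fn)            # on a GPU: a copy node (exact name varies by version)
loss.backward()                # gradients flow back through the copy onto `device`
print(w.grad.device)           # same device as w
```

Moving the scalar loss therefore doesn't move any of the stored activations, which are what fill the GPU during `backward()`.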
The crux of my training script is given below:
```python
import time
from torch.autograd import Variable  # deprecated since PyTorch 0.4, kept as in my code

# model, optimizer, scheduler, train_loader, loss_fn, MCD_loss, n_classes and
# num_epochs are defined earlier in trainer.py
for ep in range(num_epochs):
    start = time.time()
    model.train()
    # Running totals for this epoch's averages
    total_loss = 0.0
    total_dice = 0.0
    for batch_idx, subject in enumerate(train_loader):
        # Load the subject and its ground truth
        image = subject['image']
        mask = subject['gt']
        # Load the images onto the GPU, ignoring the affine
        image, mask = image.float().cuda(), mask.float().cuda()
        # Variable is deprecated: Variable(t, requires_grad=True) just returns a tensor that requires grad
        image, mask = Variable(image, requires_grad=True), Variable(mask, requires_grad=True)
        # Reset the gradients accumulated in the previous iteration
        optimizer.zero_grad()
        # Forward pass to get the model output
        output = model(image.float())
        # Compute the loss
        loss = loss_fn(output.cpu().double(), mask.cpu().double(), n_classes)
        print(loss)
        # Backward pass, after moving the loss to the CPU
        loss = loss.cpu()
        loss.cpu().backward()
        # Update the weights
        optimizer.step()
        # Move the Dice loss to the CPU and keep only its value
        curr_loss = MCD_loss(output.double(), mask.double(), n_classes).cpu().data.item()
        # train_loss_list.append(loss.cpu().data.item())
        total_loss += curr_loss
        # Average loss so far
        average_loss = total_loss / (batch_idx + 1)
        # Dice score for this batch
        curr_dice = 1 - curr_loss
        total_dice += curr_dice
        # Average Dice so far
        average_dice = total_dice / (batch_idx + 1)
    scheduler.step()
```
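For comparison, here is a minimal sketch of a leaner version of the loop, under several assumptions: `soft_dice_loss` is a hypothetical stand-in for `loss_fn`/`MCD_loss` (assuming both compute the same Dice loss), the mask is one-hot with the same shape as the output, and the loader yields dicts with `'image'` and `'gt'`. It keeps the loss on the same device as the graph, drops `Variable` and `requires_grad` on the inputs (with `requires_grad=True` on `image` and `mask`, autograd also has to compute gradients with respect to them), and logs `loss.item()` instead of building a second graph with `MCD_loss`. This only trims overhead; whether it is enough to fit a 128³ patch depends on the network itself.

```python
import torch
import torch.nn as nn

def soft_dice_loss(logits, target, eps=1e-6):
    """Hypothetical multi-class soft Dice loss; target is one-hot, same shape as logits."""
    probs = torch.softmax(logits, dim=1)
    dims = (0, 2, 3, 4)                      # sum over batch and spatial dims
    intersection = (probs * target).sum(dims)
    denom = probs.sum(dims) + target.sum(dims)
    dice_per_class = (2 * intersection + eps) / (denom + eps)
    return 1 - dice_per_class.mean()

def train_one_epoch(model, loader, optimizer, device):
    model.train()                            # must be called; `model.train` alone does nothing
    total_loss = 0.0
    num_batches = 0
    for subject in loader:
        # Only the parameters need gradients, so the inputs don't require grad.
        image = subject['image'].float().to(device)
        mask = subject['gt'].float().to(device)
        optimizer.zero_grad()
        output = model(image)
        loss = soft_dice_loss(output, mask)  # stays on the same device as the graph
        loss.backward()
        optimizer.step()
        total_loss += loss.item()            # a Python float, so no graph is kept alive
        num_batches += 1
    average_loss = total_loss / num_batches
    return average_loss, 1 - average_loss    # average loss and average Dice

# Usage with a toy model and fake data on whatever device is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Conv3d(1, 2, kernel_size=3, padding=1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
image = torch.randn(1, 1, 16, 16, 16)
labels = torch.randint(0, 2, (1, 16, 16, 16))
mask = nn.functional.one_hot(labels, 2).permute(0, 4, 1, 2, 3)  # (N, C, D, H, W)
loader = [{'image': image, 'gt': mask}]
print(train_one_epoch(model, loader, optimizer, device))
```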
Any information would be of great help. Thanks in advance.