Hello everybody!

I am trying to write and train a deep multiple instance learning model for 3D image classification, but I keep running into an “out of memory” error and cannot figure out what causes it.

Since the 3D images are too large to process whole, I split each image into overlapping patches. The input is therefore an array of 3D patches, and the output is a single label.
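To make the setup concrete, here is a minimal sketch of this kind of patch extraction. It is not my exact pipeline: the function name `extract_patches` and the stride of 12 are assumptions for illustration, and only the patch size of 24 and the 3 channels match my data.

```python
import numpy as np

def extract_patches(volume, patch=24, stride=12):
    """Cut a [C, D, H, W] volume into overlapping cubic patches.

    A stride smaller than the patch size makes neighbouring patches overlap.
    Voxels past the last full patch along an axis are ignored in this sketch.
    """
    _, d, h, w = volume.shape
    patches = [
        volume[:, z:z + patch, y:y + patch, x:x + patch]
        for z in range(0, d - patch + 1, stride)
        for y in range(0, h - patch + 1, stride)
        for x in range(0, w - patch + 1, stride)
    ]
    return np.stack(patches)  # [patch_num, C, patch, patch, patch]

volume = np.random.rand(3, 48, 48, 48).astype(np.float32)
patches = extract_patches(volume)
print(patches.shape)  # (27, 3, 24, 24, 24)
```

With a 48-voxel axis, a 24-voxel patch and a stride of 12, there are three start positions per axis, hence 27 patches. Real images give a different `patch_num` each, which is why the number of patches varies per sample.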

During training, each patch is fed to a 3D CNN with a softmax layer, which outputs the class probabilities for that patch. Once the probabilities of all patches of a 3D image are available, the deep multiple instance learning model selects the patches with the maximum probability and back-propagates through those only. During back-propagation I always get “cudaError: out of memory”. My code is below:

```
import math

import numpy as np
import torch

# Body of my per-epoch training method (excerpt from a class).
self.model.train(True)
running_loss, running_corrects = 0, 0
patch_unit_size = 200  # number of patches per forward pass
# NOTE: the slices below use self.patch_size, which has to equal patch_unit_size.
for pat_batch in self.dcm_datasets['train']:
    inputs, labels, data_dir = pat_batch
    # inputs has shape [batch_num, patch_num, channel_num, patch_height,
    # patch_width, patch_length], where batch_num=1, patch_num varies between
    # 3D images, channel_num=3, and patch_height=patch_width=patch_length=24.
    for input_each_batch in inputs:
        patches_size = input_each_batch.shape[0]
        num_patches = math.ceil(1.0 * patches_size / patch_unit_size)
        patch_out_max, patch_out_prob_max = None, None
        labels_new = labels.cuda(self.cuda_ids[0])
        for i in range(num_patches):  # find the patches with the maximum probability
            # Slicing clamps at the end, so the last chunk needs no special case.
            inputs_tmp = input_each_batch[i * self.patch_size:(i + 1) * self.patch_size]
            inputs_new = inputs_tmp.cuda(self.cuda_ids[0])
            outputs = self.model(inputs_new)  # class scores, one row per patch
            out_probs = torch.nn.functional.softmax(outputs, dim=1).detach()
            patch_out_prob_max = out_probs if patch_out_prob_max is None else torch.cat(
                (out_probs, patch_out_prob_max), dim=0)
            # Find all row indices that attain the maximum probability.
            patch_prob_max = patch_out_prob_max.cpu().numpy()
            inds_x, inds_y = np.where(patch_prob_max == np.max(patch_prob_max))
            patch_out_prob_max = patch_out_prob_max[inds_x]
            inds_x_out = inds_x[inds_x < outputs.shape[0]]    # rows from this chunk
            inds_x_rest = inds_x[inds_x >= outputs.shape[0]]  # rows kept from earlier chunks
            if inds_x_rest.shape[0] != 0:
                inds_x_rest = inds_x_rest - outputs.shape[0]
                patch_out_max = patch_out_max[inds_x_rest]
            if inds_x_out.shape[0] != 0:
                patch_out_max_tmp = outputs[inds_x_out]
                if inds_x_rest.shape[0] != 0:
                    # New rows first, matching the row order of patch_out_prob_max.
                    patch_out_max = torch.cat((patch_out_max_tmp, patch_out_max), dim=0)
                else:
                    patch_out_max = patch_out_max_tmp
            outputs, out_probs, inputs_new = 0, 0, 0  # drop references to this chunk
        # Repeat the image label once per selected patch.
        labels_new_1 = torch.cat([labels_new] * patch_out_max.shape[0], dim=0)
        loss = self.criterion(patch_out_max, labels_new_1)
        self.optimizer.zero_grad()  # zero the parameter gradients
        loss.backward()
        self.optimizer.step()
        preds = torch.argmax(patch_out_max[0].detach())  # preds is still a tensor
        running_loss += loss.item()  # loss.item() is a Python float
        running_corrects += int(preds.item() == labels_new.item())
        loss, labels_new_1 = 0, None
        torch.cuda.empty_cache()
        torch.cuda.memory_allocated()  # return value is not used here
data_len = len(self.dcm_datasets['train'])
epoch_loss = running_loss / data_len
epoch_acc = running_corrects / data_len
return epoch_acc, epoch_loss
```

Since patch_num can be as high as 1500, I wonder whether the “cudaError: out of memory” is caused by a large computation graph, but I am not sure.
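For a sense of scale, here is a rough back-of-the-envelope calculation of the memory taken by the input patches alone. It assumes float32 inputs, which is an assumption on my side. The intermediate activations stored inside the CNN for back-propagation come on top of this and depend on the architecture, so this is only a lower bound.

```python
bytes_per_float = 4  # assumption: float32 inputs
voxels_per_patch = 3 * 24 * 24 * 24  # channel_num * height * width * length
bytes_per_patch = voxels_per_patch * bytes_per_float

def input_mib(num_patches):
    """Memory of the raw input patches in MiB, ignoring all activations."""
    return num_patches * bytes_per_patch / 2**20

chunk_mib = input_mib(200)    # one chunk of 200 patches
image_mib = input_mib(1500)   # all patches of a large image
print(f"{chunk_mib:.1f} MiB per chunk, {image_mib:.1f} MiB per image")
# prints "31.6 MiB per chunk, 237.3 MiB per image"
```

The inputs themselves are small, so if memory runs out, the stored activations of the chunks whose outputs are still referenced are the more likely culprit.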

So my question is: how large is my computation graph? If the graph for one chunk of self.patch_size patches has size O, is my total graph of size O or num_patches*O?

If the graph is only of size O, the CUDA memory usage should be modest. In that case, what causes the “CUDA out of memory” error?
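Whatever the answer is, one possible way to bound the graph size is a two-pass scheme: score all chunks under `torch.no_grad()` so that no activations are stored, then run a second forward pass with gradients on the selected patch only. The following is a minimal sketch under assumptions, not my actual model: the function name `select_then_train_step`, the toy network and the tiny patch count are all made up for illustration, it runs on the CPU, and it keeps a single best patch (the first one on ties) instead of all tied patches.

```python
import torch
import torch.nn as nn

def select_then_train_step(model, criterion, optimizer, patches, label, chunk_size=200):
    """Score all patches without a graph, then backpropagate through the best one."""
    # Pass 1: under no_grad, autograd stores no activations for backward.
    with torch.no_grad():
        probs = torch.cat([
            torch.softmax(model(chunk), dim=1)
            for chunk in torch.split(patches, chunk_size)
        ])
    # Patch whose highest class probability is the largest overall.
    best = int(probs.max(dim=1).values.argmax())

    # Pass 2: build the graph for the selected patch only.
    logits = model(patches[best:best + 1])
    loss = criterion(logits, label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item(), best

torch.manual_seed(0)
# Hypothetical stand-in for the 3D CNN; the real architecture is not shown here.
model = nn.Sequential(
    nn.Conv3d(3, 4, kernel_size=3), nn.ReLU(),
    nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(4, 2),
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
patches = torch.randn(10, 3, 24, 24, 24)  # 10 patches instead of up to 1500
label = torch.tensor([1])
loss_value, best = select_then_train_step(
    model, criterion, optimizer, patches, label, chunk_size=4)
print(loss_value, best)
```

The cost is one extra forward pass for the selected patch. The scoring pass here runs with the model in whatever mode it is already in, so dropout or batch-norm layers, if present, would behave as in training during scoring.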