I am trying to write and train a deep multiple instance learning model for 3D image classification, but I keep running into an “out of memory” error and cannot figure out what causes it.
Since the 3D images are too large, I split each 3D image into overlapping patches. My input is therefore an array of 3D patches, and my output is a single label.
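To make the patch layout concrete, here is a minimal sketch of how the start offsets of overlapping patches along one axis could be computed. The patch size of 24 matches the shape given in my code; the stride of 12, the volume side of 60, and the helper name `patch_starts` are assumptions for illustration only, not my actual preprocessing.

```python
def patch_starts(axis_len, patch=24, stride=12):
    """Start offsets of overlapping patches that cover one axis.

    `patch` follows the post (24); `stride` is an assumed value.
    """
    if axis_len <= patch:
        return [0]
    starts = list(range(0, axis_len - patch + 1, stride))
    # Add a final patch flush with the end if the stride does not land there.
    if starts[-1] != axis_len - patch:
        starts.append(axis_len - patch)
    return starts


# Hypothetical cubic volume of side 60: the same offsets apply on each axis,
# so the number of 3D patches is the cube of the per-axis count.
starts = patch_starts(60)
num_patches_3d = len(starts) ** 3
```

With a smaller stride or a larger volume the patch count grows quickly, which is how a single image can end up with well over a thousand patches.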
During training, each patch is fed to a 3D CNN with a softmax layer, which outputs a probability for each patch. Once the probabilities of all patches of a 3D image have been obtained, the deep multiple instance learning model selects the patches with the maximum probability and uses only those for back-propagation. During back-propagation I always get “cudaError: out of memory”. My code is below:
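The selection step described above can be sketched in plain Python, without the CNN. The logits are made-up numbers and the helper names `softmax` and `select_max_patches` are hypothetical; the sketch only illustrates the rule "keep every patch whose highest class probability equals the maximum over the whole image".

```python
import math


def softmax(logits):
    """Numerically stable softmax over one patch's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_max_patches(patch_logits):
    """Indices of patches whose top probability equals the bag-wide maximum."""
    probs = [softmax(l) for l in patch_logits]
    best = max(max(p) for p in probs)
    # Ties are kept, mirroring np.where(probs == probs.max()) in my code.
    return [i for i, p in enumerate(probs) if max(p) == best]


# Three hypothetical patches with two-class logits.
logits = [[0.2, -0.1], [2.0, -1.0], [0.0, 0.5]]
selected = select_max_patches(logits)
```

Only the outputs of the selected patches are passed to the loss; all other patches contribute nothing to the gradient.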
    import math
    import numpy as np
    import torch
    from torch.autograd import Variable

    self.model.train(True)
    running_loss, running_corrects = 0, 0
    patch_unit_size = 200
    for pat_batch in self.dcm_datasets['train']:
        inputs, labels, data_dir = pat_batch
        # inputs has shape [batch_num, patch_num, channel_num, patch_height, patch_width, patch_length],
        # where batch_num = 1, patch_num varies between 3D images, channel_num = 3,
        # and patch_height = patch_width = patch_length = 24
        for input_each_batch in inputs:
            patches_size = input_each_batch.shape[0]
            num_patches = math.ceil(1.0 * patches_size / patch_unit_size)
            patch_out_max, patch_out_prob_max = None, None
            for i in range(num_patches):
                # Find the patches with the maximum probability
                if i < num_patches - 1:
                    inputs_tmp = input_each_batch[i * self.patch_size: (i + 1) * self.patch_size]
                else:
                    inputs_tmp = input_each_batch[i * self.patch_size:]
                with torch.cuda.device(self.cuda_ids):
                    inputs_new = Variable(inputs_tmp.cuda()).to(self.cuda_ids)
                    labels_new = Variable(labels.cuda()).to(self.cuda_ids)
                    outputs = self.model(inputs_new)  # 1*2
                    out_probs = torch.nn.functional.softmax(outputs, dim=1).data
                    patch_out_prob_max = out_probs if patch_out_prob_max is None else torch.cat(
                        (out_probs, patch_out_prob_max), dim=0)
                    # Find all the indices of the maximum
                    patch_prob_max = patch_out_prob_max.cpu().numpy()
                    inds_x, inds_y = np.where(patch_prob_max == np.max(patch_prob_max))
                    patch_out_prob_max = patch_out_prob_max[inds_x]
                    inds_x_out = inds_x[inds_x < outputs.shape[0]]
                    inds_x_rest = inds_x[inds_x >= outputs.shape[0]]
                    if inds_x_rest.shape[0] != 0:
                        inds_x_rest = inds_x_rest - outputs.shape[0]
                        patch_out_max = patch_out_max[inds_x_rest]
                    if inds_x_out.shape[0] != 0:
                        patch_out_max_tmp = outputs[inds_x_out]
                        if inds_x_rest.shape[0] != 0:
                            patch_out_max = torch.cat((patch_out_max, patch_out_max_tmp), dim=0)
                        else:
                            patch_out_max = patch_out_max_tmp
                    outputs, out_probs, inputs_new = 0, 0, 0
            # Repeat the image label once per selected patch
            labels_new_1 = None
            for i in range(patch_out_max.shape[0]):
                labels_new_1 = labels_new if labels_new_1 is None else torch.cat((labels_new_1, labels_new), dim=0)
            loss = self.criterion(patch_out_max, labels_new_1)
            self.optimizer.zero_grad()  # zero the parameter gradients
            loss.backward()
            self.optimizer.step()
            preds = torch.argmax(patch_out_max.data)  # preds is still a tensor
            running_loss += loss.item()  # running_loss is a Python number
            running_corrects += np.sum(preds.item() == labels_new.item())
            loss, labels_new_1 = 0, None
            torch.cuda.empty_cache()
            torch.cuda.memory_allocated()
    data_len = len(self.dcm_datasets['train'])
    epoch_loss = running_loss / data_len
    epoch_acc = running_corrects / data_len
    return epoch_acc, epoch_loss
Since patch_num can be as high as 1500, I wonder whether the “cudaError: out of memory” is caused by a large computation graph, but I am not sure.
So my question is: how large is my computation graph? If the computation graph for one chunk of self.patch_size patches has size O, is my computation graph of size O or num_patches*O?
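One way to probe this empirically, without a GPU, might be to count the autograd nodes reachable from a tensor. The sketch below is not my real model: the small `Sequential` network, the tensor sizes, and the helper `count_graph_nodes` are assumptions for illustration. It relies only on `grad_fn` and `next_functions`, which PyTorch exposes on tensors that are part of a graph. Node count is a proxy, not a memory measurement, but saved activations grow along with it.

```python
import torch


def count_graph_nodes(tensor):
    """Count autograd nodes reachable from `tensor` (0 if it has no graph)."""
    seen, stack = set(), [tensor.grad_fn]
    while stack:
        fn = stack.pop()
        if fn is None or fn in seen:
            continue
        seen.add(fn)
        stack.extend(next_fn for next_fn, _ in fn.next_functions)
    return len(seen)


# Stand-in for the 3D CNN: any module with parameters behaves the same way here.
model = torch.nn.Sequential(
    torch.nn.Linear(8, 4), torch.nn.ReLU(), torch.nn.Linear(4, 2)
)
chunks = [torch.randn(5, 8) for _ in range(3)]

# Graph behind the output of a single chunk.
nodes_one = count_graph_nodes(model(chunks[0]))

# Graph behind outputs that were kept from every chunk and concatenated.
kept = torch.cat([model(c) for c in chunks], dim=0)
nodes_all = count_graph_nodes(kept)

# Outputs computed under no_grad carry no graph at all.
with torch.no_grad():
    detached = model(chunks[0])
nodes_detached = count_graph_nodes(detached)
```

In this toy setup the graph behind the concatenated outputs is larger than the graph behind one chunk, because every chunk whose output is still referenced keeps its own forward graph alive. Whether that is what happens in my training loop depends on which chunk outputs `patch_out_max` still references when `loss.backward()` is called.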
If my computation graph is just O, the CUDA memory usage should not be large. In that case, what causes the “CUDA out of memory” error?